1. File discovery
unch index walks the repository, applies .gitignore plus explicit --exclude patterns, and decides which source files should be processed.
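As a rough illustration of the discovery step, the sketch below walks a directory tree and drops excluded paths. It is not unch's implementation: `discover_files` and `is_excluded` are hypothetical names, the extension list is an assumption based on the supported languages, and patterns are matched with `fnmatch` rather than full .gitignore semantics (no negation, anchoring, or `**`).

```python
import fnmatch
import os
from pathlib import Path


def is_excluded(rel_path, patterns):
    """True if the relative path, or any of its components, matches a pattern."""
    parts = rel_path.split("/")
    return any(
        fnmatch.fnmatch(rel_path, p) or any(fnmatch.fnmatch(part, p) for part in parts)
        for p in patterns
    )


def discover_files(root, exclude_patterns=(), extensions=(".go", ".rs", ".ts", ".js", ".py")):
    """Yield relative POSIX paths of candidate source files under root."""
    root = Path(root)
    extensions = tuple(extensions)
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = Path(dirpath).relative_to(root)
        # Prune excluded directories in place so os.walk never descends into them.
        dirnames[:] = sorted(
            d for d in dirnames
            if not is_excluded((rel_dir / d).as_posix(), exclude_patterns)
        )
        for name in sorted(filenames):
            rel = (rel_dir / name).as_posix()
            if name.endswith(extensions) and not is_excluded(rel, exclude_patterns):
                yield rel
```

Pruning `dirnames` in place matters for large repositories: an excluded directory is skipped entirely instead of being walked and filtered file by file.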
2. File hashing
Each candidate file is hashed. The active file-hash state in .semsearch/filehashes.db is used to decide whether the file can be reused or needs to be reindexed.
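The reuse decision can be pictured as a comparison between fresh digests and the digests stored by the previous run. The sketch below is a minimal model under stated assumptions: SHA-256 is assumed as the hash, the stored state is a plain dict rather than the real filehashes.db schema, and `file_digest` and `plan_reindex` are hypothetical names.

```python
import hashlib


def file_digest(data: bytes) -> str:
    """Content hash of one file (SHA-256 is an assumption, not unch's documented choice)."""
    return hashlib.sha256(data).hexdigest()


def plan_reindex(current_files, stored_hashes):
    """Split files into reusable, stale, and removed sets.

    current_files: mapping of relative path -> file bytes
    stored_hashes: mapping of relative path -> digest from the previous run
    """
    reuse, reindex, new_hashes = [], [], {}
    for path, data in sorted(current_files.items()):
        digest = file_digest(data)
        new_hashes[path] = digest
        if stored_hashes.get(path) == digest:
            reuse.append(path)      # unchanged: previous results can be kept
        else:
            reindex.append(path)    # new or modified: must be processed again
    # Paths known to the previous run that no longer exist on disk.
    removed = sorted(set(stored_hashes) - set(current_files))
    return reuse, reindex, removed, new_hashes
```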
3. Symbol extraction
For supported languages, unch uses Tree-sitter to extract top-level symbols and attached docs.
That currently covers:
- Go
- Rust
- TypeScript
- JavaScript
- Python
index can still use the legacy prefix fallback.
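To make "top-level symbols and attached docs" concrete, here is a small analogue for Python source only. It deliberately swaps in Python's standard `ast` module for Tree-sitter so the example stays self-contained; the function name and the `(kind, name, doc)` tuple shape are illustrative assumptions, not unch's data model.

```python
import ast


def extract_top_level_symbols(source: str):
    """Return (kind, name, doc) for each top-level function or class."""
    symbols = []
    for node in ast.parse(source).body:  # .body holds only module-level statements
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            kind = "function"
        elif isinstance(node, ast.ClassDef):
            kind = "class"
        else:
            continue  # assignments, imports, etc. are not treated as symbols here
        symbols.append((kind, node.name, ast.get_docstring(node) or ""))
    return symbols
```

Nested functions and methods are skipped because only the module body is inspected, which mirrors the "top-level" restriction described above.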
4. Embedding generation
Each extracted symbol is flattened into an indexed document and embedded with the selected provider. Provider options today:
- llama.cpp for local GGUF models through yzma
- openrouter for remote embedding APIs
Models: embeddinggemma, qwen3
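The flatten-then-embed step might look roughly like the sketch below. Both the document layout produced by `flatten_symbol` and the `toy_embed` function are hypothetical: the real layout is not described here, and `toy_embed` is a deterministic stand-in that only fixes the shape of a provider interface (text in, fixed-size vector out). It carries no semantic meaning.

```python
import hashlib
import math


def flatten_symbol(path, kind, name, doc):
    """Join a symbol's fields into one text document (hypothetical layout)."""
    lines = [f"{kind} {name}", f"file: {path}"]
    if doc:
        lines.append(doc.strip())
    return "\n".join(lines)


def toy_embed(text, dim=8):
    """Stand-in for a real embedding provider.

    Hashes each token into one of `dim` buckets and L2-normalizes the counts.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        bucket = int.from_bytes(hashlib.sha256(token.encode()).digest()[:4], "big") % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero on empty text
    return [v / norm for v in vec]
```

A real provider would replace `toy_embed` with a call into a local model or a remote API, while the flattening step stays the same regardless of provider.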
5. Snapshot activation
Embeddings and symbols are written into provider-scoped and model-scoped snapshots in .semsearch/index.db.
When indexing succeeds, the new snapshot becomes active for that provider/model pair. Other provider/model snapshots are left untouched.
Embedding vector tables are also separated by dimension, so models with different embedding sizes can coexist in one .semsearch directory.
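One way to picture snapshot activation and per-dimension vector tables is the SQLite sketch below. The schema is entirely hypothetical (the `snapshots` table, the `vectors_<dim>` naming, and all function names are assumptions made for illustration), but it shows the two properties described above: activation only touches one provider/model pair, and vectors of different sizes live in separate tables.

```python
import sqlite3


def open_index(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS snapshots ("
        " id INTEGER PRIMARY KEY, provider TEXT, model TEXT,"
        " dim INTEGER, active INTEGER DEFAULT 0)"
    )
    return db


def write_snapshot(db, provider, model, dim, rows):
    """Insert an inactive snapshot and store its vectors in a per-dimension table."""
    table = f"vectors_{int(dim)}"  # int() keeps the table name safe to interpolate
    db.execute(
        f"CREATE TABLE IF NOT EXISTS {table} (snapshot_id INTEGER, symbol TEXT, vector BLOB)"
    )
    cur = db.execute(
        "INSERT INTO snapshots (provider, model, dim) VALUES (?, ?, ?)",
        (provider, model, dim),
    )
    snapshot_id = cur.lastrowid
    db.executemany(
        f"INSERT INTO {table} VALUES (?, ?, ?)",
        [(snapshot_id, symbol, blob) for symbol, blob in rows],
    )
    return snapshot_id


def activate(db, snapshot_id):
    """Make one snapshot active for its provider/model pair only."""
    provider, model = db.execute(
        "SELECT provider, model FROM snapshots WHERE id = ?", (snapshot_id,)
    ).fetchone()
    with db:  # commit both updates together, or roll back on error
        db.execute(
            "UPDATE snapshots SET active = 0 WHERE provider = ? AND model = ?",
            (provider, model),
        )
        db.execute("UPDATE snapshots SET active = 1 WHERE id = ?", (snapshot_id,))
```

Because a snapshot is written as inactive and flipped only afterwards, a failed indexing run in this model leaves the previously active snapshot in place.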
6. Search
unch search supports:
- semantic
- lexical
- auto
auto stays semantic-first, but can prefer lexical results when the query looks more like a symbol name or code fragment.
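The kind of check that auto mode could apply is sketched below. The specific signals (camelCase boundaries, snake_case, operators, and call syntax) are guesses at what "looks like a symbol name or code fragment" might mean; unch's actual heuristics are not documented here, and `looks_like_code` and `preferred_ranking` are hypothetical names.

```python
import re

# Hypothetical signals that a query is a symbol name or code fragment.
_CODE_PATTERNS = [
    re.compile(r"[a-z0-9][A-Z]"),                # camelCase boundary
    re.compile(r"\w_\w"),                        # snake_case
    re.compile(r"::|->|\.\w+\(|[(){}\[\];=]"),   # operators and call syntax
]


def looks_like_code(query: str) -> bool:
    query = query.strip()
    if not query:
        return False
    return any(p.search(query) for p in _CODE_PATTERNS)


def preferred_ranking(query: str, requested: str = "auto") -> str:
    """Resolve 'auto' to the ranking to prefer; pass explicit modes through."""
    if requested != "auto":
        return requested
    # Semantic-first: lexical is preferred only when a code-like signal is present.
    return "lexical" if looks_like_code(query) else "semantic"
```

Under these assumptions, a query such as "how are files hashed" stays semantic, while "parseConfig" or "file_hash" would prefer lexical results.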
7. Optional remote restore
If the manifest is bound to remote CI, unch search can restore the latest compatible published state before executing the query.