In LanceDB Enterprise, the create_fts_index API returns immediately, but the index is built asynchronously.
Creating FTS Indexes
Synchronous API
Use create_fts_index with synchronous LanceDB connections.
Check FTS index status using the API:
wait_for_index(...) waits until the named FTS index exists and index_stats(...) reports num_unindexed_rows == 0. It can time out if writes keep adding rows faster than the index catches up. If a table has multiple FTS indexes, specify the target text column when querying instead of relying on implicit selection.
Asynchronous API
When using async connections (connect_async), use create_index with the FTS configuration.
The create_fts_index method is not available on AsyncTable. Use create_index with an FTS config instead.
Nested field paths
FTS indexes can target text leaves inside struct columns by passing a dotted path (for example, payload.text). The same path works for MatchQuery and PhraseQuery, and for the columns argument on async nearest_to_text queries.
You can point an index at any string leaf nested in a struct, regardless of depth. The struct container itself isn’t indexable: you have to name a specific text field.
LanceDB rejects paths that don’t resolve to a text leaf:
- A struct container (for example, payload) raises ValueError: FTS index cannot be created ....
- A non-text leaf such as an integer or float (for example, payload.count) raises the same error.
- A path that doesn't exist in the schema (for example, payload.missing) raises ValueError: Field path ... not found.
Configuration Options
FTS Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
with_position | bool | False | Store token positions (required for phrase queries) |
base_tokenizer | str | "simple" | Text splitting method (simple, whitespace, raw, ngram, jieba/*, or lindera/*) |
language | str | "English" | Language for stemming/stop words |
max_token_length | int | 40 | Maximum token size; longer tokens are omitted |
lower_case | bool | True | Lowercase tokens |
stem | bool | True | Apply stemming (running → run) |
remove_stop_words | bool | True | Drop common stop words |
ascii_folding | bool | True | Normalize accented characters |
custom_stop_words | list[str] | None | Extra stop words to drop in addition to the language defaults. Requires remove_stop_words=True. |
ngram_min_length | int | 3 | Minimum n-gram length. Applies only when base_tokenizer="ngram". |
ngram_max_length | int | 3 | Maximum n-gram length. Applies only when base_tokenizer="ngram". |
prefix_only | bool | False | Index only prefix n-grams rather than all substrings. Applies only when base_tokenizer="ngram". |
- max_token_length can filter out base64 blobs or long URLs.
- Disabling with_position reduces index size but disables phrase queries.
- ascii_folding helps with international text (e.g., “café” → “cafe”).
jieba/default and lindera/ipadic require tokenizer model files in Lance’s language model home. Lance looks under the default platform data directory for lance/language_models, or you can set LANCE_LANGUAGE_MODEL_HOME to point to another model root. For example, jieba/default is resolved under <model-home>/jieba/default/....
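Before creating a jieba/default or lindera/ipadic index, it can help to confirm that the model files are where Lance will look. The helper below is hypothetical and not part of LanceDB: it only covers an explicit LANCE_LANGUAGE_MODEL_HOME, assumes lindera/* tokenizers follow the same `<model-home>/<family>/<model>` layout documented for jieba/default, and does not try to reproduce the default platform data directory. The /opt/lance-models path is an example value.

```python
import os
from pathlib import Path
from typing import Optional


def expected_model_dir(base_tokenizer: str, model_home: Optional[str] = None) -> Path:
    """Hypothetical helper: where a jieba/* or lindera/* model should live.

    Only handles an explicit model_home or LANCE_LANGUAGE_MODEL_HOME; the
    lindera layout is assumed to match the documented jieba layout.
    """
    home = model_home or os.environ.get("LANCE_LANGUAGE_MODEL_HOME")
    if not home:
        raise RuntimeError("LANCE_LANGUAGE_MODEL_HOME is not set")
    family, _, model = base_tokenizer.partition("/")
    if family not in ("jieba", "lindera") or not model:
        raise ValueError(f"{base_tokenizer!r} does not load model files")
    # e.g. jieba/default -> <model-home>/jieba/default
    return Path(home) / family / model


# Usage: point Lance at a model root, then check the directory before indexing
os.environ["LANCE_LANGUAGE_MODEL_HOME"] = "/opt/lance-models"  # example path
model_dir = expected_model_dir("jieba/default")
print(model_dir, "exists" if model_dir.is_dir() else "is missing")
```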
Phrase Query Configuration
Enable phrase queries by setting:

| Parameter | Required Value | Purpose |
|---|---|---|
with_position | True | Track token positions for phrase matching |
remove_stop_words | False | Preserve stop words for exact phrase matching |
Indexing nested string fields
You can build an FTS index on a string field inside a struct by passing its full dotted path, like nested.text. The same path is used when you query the index through fts_columns, and the indexed column is reported back as the full path from list_indices().
Use the canonical Lance path: dot-separate each struct field from root to leaf (for example, metadata.author.name). The same convention applies to scalar and vector indexes.