Fulltext

Fulltext DB Indexer module.

FulltextDBIndexer(data_store_map=None, cache_size=DEFAULT_CACHE_SIZE, retryable_exceptions=None)

Bases: BaseIndexer

Index elements into a fulltext datastore capability (no embeddings required).

Initialize the indexer with mappings for fulltext datastore capabilities.

Parameters:

Name Type Description Default
data_store_map dict[str, Type[BaseDataStore]] | None

Mapping of db_engine strings to BaseDataStore classes. If not provided, uses DEFAULT_DATA_STORE_MAP which includes "chroma", "elasticsearch", and "opensearch". Defaults to None.

None
cache_size int

Maximum number of fulltext datastore instances to cache using LRU policy. Defaults to DEFAULT_CACHE_SIZE (128).

DEFAULT_CACHE_SIZE
retryable_exceptions tuple[type[Exception], ...] | None

Tuple of exception types to retry on during batch processing. Defaults to DEFAULT_RETRYABLE_EXCEPTIONS.

None
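
The cache_size parameter bounds how many datastore instances are kept under an LRU policy. The sketch below illustrates that policy in isolation. LRUStoreCache, its method names, and the cache key are hypothetical and are not taken from the library; they only show what "least recently used" eviction means here.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable


class LRUStoreCache:
    """Hypothetical LRU cache; not the indexer's actual implementation."""

    def __init__(self, max_size: int) -> None:
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self._max_size = max_size
        self._items = OrderedDict()

    def get_or_create(self, key: Hashable, factory: Callable[[], Any]) -> Any:
        """Return the cached instance for key, building it on a miss."""
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            return self._items[key]
        value = factory()
        self._items[key] = value
        if len(self._items) > self._max_size:
            self._items.popitem(last=False)  # evict the least recently used
        return value

    def __len__(self) -> int:
        return len(self._items)

    def __contains__(self, key: Hashable) -> bool:
        return key in self._items
```

A key such as (db_engine, index_name) would let repeated calls against the same index reuse one datastore instance, though how the indexer actually derives its cache keys is not stated here.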

delete_chunk(chunk_id, file_id, **kwargs)

Delete a single chunk by chunk ID and file ID.

Parameters:

Name Type Description Default
chunk_id str

The ID of the chunk to delete.

required
file_id str

The ID of the file the chunk belongs to.

required
**kwargs Any

Additional keyword arguments for customization.

{}

Returns:

Type Description
dict[str, Any]

dict[str, Any]: The response from the deletion process. Should include:
1. success (bool): True if deletion succeeded, False otherwise.
2. error_message (str): Error message if deletion failed, empty string otherwise.

Raises:

Type Description
NotImplementedError

This method is not yet implemented.

delete_file_chunks(file_id, **kwargs)

Delete all chunks for a specific file.

A missing index or collection is treated as success, since there is nothing to delete. On version conflicts, the delete is retried up to max_retries times. If conflicts persist after all attempts, a RuntimeError is raised internally and the response reports success=False.

Parameters:

Name Type Description Default
file_id str

The ID of the file to delete chunks from.

required
**kwargs Any

Additional keyword arguments for customization.

{}
Kwargs

db_engine (str): The database engine to use (e.g., "chroma", "elasticsearch", "opensearch").
db_config (dict[str, Any]): Datastore config (index_name, url, etc.).
max_retries (int, optional): Maximum retry attempts on version conflicts. Defaults to 3.

Returns:

Type Description
dict[str, Any]

dict[str, Any]: Response with success status and error message.

Raises:

Type Description
ValueError

If invalid params or unsupported config is provided.

KeyError

If missing required kwargs.
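
The retry behavior described for delete_file_chunks can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's code: VersionConflictError and IndexNotFoundError are hypothetical stand-ins for whatever the datastore raises, delete_fn is a hypothetical callable that performs the delete, and max_retries is read as the total number of attempts.

```python
from typing import Any, Callable


class VersionConflictError(Exception):
    """Hypothetical stand-in for a datastore version-conflict error."""


class IndexNotFoundError(Exception):
    """Hypothetical stand-in for a missing index or collection error."""


def delete_with_retries(
    delete_fn: Callable[[], None], max_retries: int = 3
) -> dict[str, Any]:
    """Run delete_fn, retrying on version conflicts only."""
    last_error = ""
    for _ in range(max_retries):
        try:
            delete_fn()
            return {"success": True, "error_message": ""}
        except IndexNotFoundError:
            # Nothing to delete, so this counts as success.
            return {"success": True, "error_message": ""}
        except VersionConflictError as exc:
            last_error = str(exc)  # remember the conflict and try again
    return {
        "success": False,
        "error_message": (
            f"version conflicts persisted after {max_retries} attempts: "
            f"{last_error}"
        ),
    }
```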

get_chunk(chunk_id, file_id, **kwargs)

Get a single chunk by chunk ID and file ID.

Parameters:

Name Type Description Default
chunk_id str

The ID of the chunk to retrieve.

required
file_id str

The ID of the file the chunk belongs to.

required
**kwargs Any

Additional keyword arguments for customization.

{}

Returns:

Type Description
dict[str, Any] | None

dict[str, Any] | None: The chunk data following the Element structure with 'text' and 'metadata' keys, or None if the chunk is not found.

Raises:

Type Description
NotImplementedError

This method is not yet implemented.

get_file_chunks(file_id, page=0, size=20, **kwargs)

Get chunks for a specific file with pagination support.

Parameters:

Name Type Description Default
file_id str

The ID of the file to get chunks from.

required
page int

The page number (0-indexed). Defaults to 0.

0
size int

The number of chunks per page. Defaults to 20.

20
**kwargs Any

Additional keyword arguments for customization.

{}

Returns:

Type Description
dict[str, Any]

dict[str, Any]: Response containing:
1. chunks (list[dict[str, Any]]): List of chunks (elements) with text, structure, and metadata.
2. pagination (dict[str, Any]): Pagination metadata with:
   - page (int): Current page number.
   - size (int): Number of items per page.
   - total_chunks (int): Total number of chunks for the file.
   - total_pages (int): Total number of pages.
   - has_next (bool): Whether there is a next page.
   - has_previous (bool): Whether there is a previous page.

Raises:

Type Description
NotImplementedError

This method is not yet implemented.
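
Since get_file_chunks is not yet implemented, the following is only a sketch of how the documented pagination fields relate to one another for a 0-indexed page. The function name build_pagination is hypothetical, and rounding total_pages up is an assumption.

```python
import math
from typing import Any


def build_pagination(page: int, size: int, total_chunks: int) -> dict[str, Any]:
    """Compute pagination metadata for a 0-indexed page number."""
    if page < 0 or size < 1 or total_chunks < 0:
        raise ValueError("page and total_chunks must be >= 0 and size >= 1")
    # Assumption: a partially filled last page still counts as a page.
    total_pages = math.ceil(total_chunks / size)
    return {
        "page": page,
        "size": size,
        "total_chunks": total_chunks,
        "total_pages": total_pages,
        "has_next": page + 1 < total_pages,
        "has_previous": page > 0,
    }
```

With 45 chunks and the default size of 20, pages 0 and 1 are full and page 2 holds the remaining 5 chunks.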

index_chunk(element, **kwargs)

Index a single chunk.

Note: This method only indexes the chunk. It does NOT update the metadata of neighboring chunks (previous_chunk/next_chunk). The caller is responsible for maintaining chunk relationships by updating adjacent chunks' metadata separately.

Parameters:

Name Type Description Default
element dict[str, Any]

The chunk to be indexed. Should follow the Element structure with 'text' and 'metadata' keys. Metadata must include 'file_id' and 'chunk_id'.

required
**kwargs Any

Additional keyword arguments for customization.

{}

Returns:

Type Description
dict[str, Any]

dict[str, Any]: The response from the indexing process. Should include:
1. success (bool): True if indexing succeeded, False otherwise.
2. error_message (str): Error message if indexing failed, empty string otherwise.
3. chunk_id (str): The ID of the indexed chunk.

Raises:

Type Description
NotImplementedError

This method is not yet implemented.
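
Because index_chunk leaves neighbor metadata untouched, the caller has to keep previous_chunk and next_chunk consistent. The sketch below shows one way to do that for an ordered list of elements. It assumes these two fields hold the neighboring chunk IDs (or None at the ends), which the text above does not specify; link_neighbors is a hypothetical helper.

```python
from typing import Any


def link_neighbors(elements: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return copies of the elements with neighbor IDs set in metadata."""
    linked = []
    last_index = len(elements) - 1
    for i, element in enumerate(elements):
        metadata = dict(element["metadata"])  # copy so the input is not mutated
        # Assumption: neighbor fields store chunk IDs, None at either end.
        metadata["previous_chunk"] = (
            elements[i - 1]["metadata"]["chunk_id"] if i > 0 else None
        )
        metadata["next_chunk"] = (
            elements[i + 1]["metadata"]["chunk_id"] if i < last_index else None
        )
        linked.append({**element, "metadata": metadata})
    return linked
```

After inserting a chunk in the middle of a file, only the two adjacent chunks need their metadata updated, which is the case update_chunk_metadata is meant to cover.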

index_chunks(elements, **kwargs)

Index multiple chunks.

This method indexes multiple chunks in a single operation without file replacement semantics: it inserts or overwrites the provided chunks directly and does not delete existing chunks first. The chunks may belong to different files.

Parameters:

Name Type Description Default
elements list[dict[str, Any]]

The chunks to be indexed. Each dict should follow the Element structure with 'text' and 'metadata' keys. Metadata must include 'file_id' and 'chunk_id'.

required
**kwargs Any

Additional keyword arguments for customization.

{}

Returns:

Type Description
dict[str, Any]

dict[str, Any]: The response from the indexing process. Should include:
1. success (bool): True if indexing succeeded, False otherwise.
2. error_message (str): Error message if indexing failed, empty string otherwise.
3. total (int): The total number of chunks indexed.

Raises:

Type Description
NotImplementedError

This method is not yet implemented.
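
The Element requirements stated above ('text' and 'metadata' keys, with 'file_id' and 'chunk_id' in the metadata) can be checked before a mixed-file batch is sent for indexing. The helpers below are hypothetical and only illustrate that check together with a per-file grouping.

```python
from collections import defaultdict
from typing import Any

REQUIRED_METADATA_KEYS = ("file_id", "chunk_id")


def validate_element(element: dict[str, Any]) -> None:
    """Raise ValueError if the element lacks a documented required key."""
    for key in ("text", "metadata"):
        if key not in element:
            raise ValueError(f"element is missing '{key}'")
    missing = [k for k in REQUIRED_METADATA_KEYS if k not in element["metadata"]]
    if missing:
        raise ValueError(f"metadata is missing {missing}")


def group_by_file(
    elements: list[dict[str, Any]],
) -> dict[str, list[dict[str, Any]]]:
    """Validate every element and group the batch by file_id."""
    groups = defaultdict(list)
    for element in elements:
        validate_element(element)
        groups[element["metadata"]["file_id"]].append(element)
    return dict(groups)
```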

index_file_chunks(elements, file_id, **kwargs)

Index chunks for a specific file, replacing any existing chunks for that file.

Parameters:

Name Type Description Default
elements list[dict[str, Any]]

The chunks to be indexed.

required
file_id str

The ID of the file these chunks belong to.

required
**kwargs Any

Additional keyword arguments for customization.

{}
Kwargs

db_engine (str): The database engine to use (e.g., "chroma", "elasticsearch", "opensearch").
db_config (dict[str, Any]): Datastore config (index_name, url, etc.).
batch_size (int, optional): Number of elements to process in each batch. Defaults to 100.
max_retries (int, optional): Maximum number of retry attempts for failed batches. Defaults to 3.

Returns:

Type Description
dict[str, Any]

dict[str, Any]: Response with success status, error message, and total count.

Raises:

Type Description
ValueError

If invalid params or unsupported config is provided.

KeyError

If missing required kwargs.
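
How batch_size, max_retries, and retryable exceptions interact can be sketched as below. This is an illustration under assumptions, not the indexer's implementation: write_batch is a hypothetical callable that writes one batch, the default exception tuple is a placeholder rather than DEFAULT_RETRYABLE_EXCEPTIONS, and max_retries is read as the total number of attempts per batch.

```python
from typing import Any, Callable, Iterator, Sequence


def iter_batches(items: Sequence[Any], batch_size: int = 100) -> Iterator[Sequence[Any]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def index_in_batches(
    elements: Sequence[Any],
    write_batch: Callable[[Sequence[Any]], None],
    batch_size: int = 100,
    max_retries: int = 3,
    retryable_exceptions: tuple = (ConnectionError, TimeoutError),
) -> dict[str, Any]:
    """Write elements batch by batch, retrying only retryable failures."""
    total = 0
    for batch in iter_batches(elements, batch_size):
        for attempt in range(1, max_retries + 1):
            try:
                write_batch(batch)
                total += len(batch)
                break  # this batch is done, move on to the next one
            except retryable_exceptions as exc:
                if attempt == max_retries:
                    return {
                        "success": False,
                        "error_message": str(exc),
                        "total": total,
                    }
    return {"success": True, "error_message": "", "total": total}
```

Exceptions outside retryable_exceptions propagate immediately in this sketch, so a permanent error such as a malformed document is not retried.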

update_chunk(element, **kwargs)

Update a chunk by chunk ID.

This method updates both the text content and metadata of a chunk.

Fails if the chunk is not found.

Parameters:

Name Type Description Default
element dict[str, Any]

The updated chunk data. Should follow the Element structure with 'text' and 'metadata' keys. Metadata must include 'file_id' and 'chunk_id'.

required
**kwargs Any

Additional keyword arguments for customization.

{}

Returns:

Type Description
dict[str, Any]

dict[str, Any]: The response from the update process. Should include:
1. success (bool): True if update succeeded, False otherwise.
2. error_message (str): Error message if update failed, empty string otherwise.
3. chunk_id (str): The ID of the updated chunk.

Raises:

Type Description
NotImplementedError

This method is not yet implemented.

update_chunk_metadata(chunk_id, file_id, metadata, **kwargs)

Update metadata for a specific chunk.

This method patches the provided metadata into the chunk's existing metadata. Fields that already exist are overwritten with the new values, new fields are added, and fields not included in the patch are left unchanged.

Fails if the chunk is not found.

Parameters:

Name Type Description Default
chunk_id str

The ID of the chunk to update.

required
file_id str

The ID of the file the chunk belongs to.

required
metadata dict[str, Any]

The metadata fields to update. Only the provided fields will be updated; other existing metadata will remain unchanged.

required
**kwargs Any

Additional keyword arguments for customization.

{}

Returns:

Type Description
dict[str, Any]

dict[str, Any]: The response from the update process. Should include:
1. success (bool): True if update succeeded, False otherwise.
2. error_message (str): Error message if update failed, empty string otherwise.

Raises:

Type Description
NotImplementedError

This method is not yet implemented.
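
The patch semantics described for update_chunk_metadata amount to a dictionary merge. The sketch below assumes a shallow merge, where a nested value is replaced as a whole rather than merged key by key; the text above does not say which applies, and patch_metadata is a hypothetical helper.

```python
from typing import Any


def patch_metadata(existing: dict[str, Any], patch: dict[str, Any]) -> dict[str, Any]:
    """Return existing metadata with the patch applied on top (shallow)."""
    merged = dict(existing)  # copy so the stored metadata is not mutated
    merged.update(patch)     # overwrite matching keys, add new ones
    return merged
```

For example, patching {"next_chunk": "c9"} into a chunk's metadata changes only that field and leaves file_id and chunk_id as they were.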