Indexer

Document Processing Orchestrator Indexer Package.

Modules:

Name	Description
`BaseIndexer`	Abstract base class for indexing documents.

`BaseIndexer`

Bases: ABC

Base class for document indexers.

`delete(**kwargs)` `abstractmethod`

Delete document from a vector DB.

The arguments are not fixed by this base class; they depend on the implementation. Some vector databases, for example, require db_url, index_name, and document_id.

Parameters:

Name	Type	Description	Default
`**kwargs`	`Any`	Additional keyword arguments for customization.	`{}`

Returns:

Name	Type	Description
`Any`	`Any`	The response from the deletion process.

`delete_chunk(chunk_id, file_id, **kwargs)` `abstractmethod`

Delete a single chunk by chunk ID and file ID.

Parameters:

Name	Type	Description	Default
`chunk_id`	`str`	The ID of the chunk to delete.	required
`file_id`	`str`	The ID of the file the chunk belongs to.	required
`**kwargs`	`Any`	Additional keyword arguments for customization.	`{}`

Returns:

Type	Description
`dict[str, Any]`	The response from the deletion process. Should include: 1. success (bool): True if deletion succeeded, False otherwise. 2. error_message (str): Error message if deletion failed, empty string otherwise.

`delete_file_chunks(file_id, **kwargs)` `abstractmethod`

Delete all chunks for a specific file.

Parameters:

Name	Type	Description	Default
`file_id`	`str`	The ID of the file whose chunks should be deleted.	required
`**kwargs`	`Any`	Additional keyword arguments for customization.	`{}`

Returns:

Type	Description
`dict[str, Any]`	The response from the deletion process. Should include: 1. success (bool): True if deletion succeeded, False otherwise. 2. error_message (str): Error message if deletion failed, empty string otherwise.

`get_chunk(chunk_id, file_id, **kwargs)` `abstractmethod`

Get a single chunk by chunk ID and file ID.

Parameters:

Name	Type	Description	Default
`chunk_id`	`str`	The ID of the chunk to retrieve.	required
`file_id`	`str`	The ID of the file the chunk belongs to.	required
`**kwargs`	`Any`	Additional keyword arguments for customization.	`{}`

Returns:

Type	Description
`dict[str, Any] \| None`	The chunk data following the Element structure with 'text' and 'metadata' keys, or None if the chunk is not found.

`get_file_chunks(file_id, page=0, size=20, **kwargs)` `abstractmethod`

Get chunks for a specific file with pagination support.

Parameters:

Name	Type	Description	Default
`file_id`	`str`	The ID of the file to get chunks from.	required
`page`	`int`	The page number (0-indexed). Defaults to 0.	`0`
`size`	`int`	The number of chunks per page. Defaults to 20.	`20`
`**kwargs`	`Any`	Additional keyword arguments for customization.	`{}`

Returns:

Type	Description
`dict[str, Any]`	The response containing chunks and pagination metadata. Should include: 1. chunks (list[dict[str, Any]]): List of chunks, each following the Element structure. 2. total (int): Total number of chunks for the file. 3. page (int): Current page number. 4. size (int): Number of chunks per page. 5. total_pages (int): Total number of pages.

Note

Chunks should be sorted by their metadata.order field (position within the file).
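
The pagination response and the ordering note above can be sketched as follows. `paginate_chunks` is a hypothetical helper that works on a plain list of chunks already fetched for one file. Computing `total_pages` as the ceiling of `total / size` is an assumption, because the source only names the field.

```python
import math
from typing import Any


def paginate_chunks(
    chunks: list[dict[str, Any]], page: int = 0, size: int = 20
) -> dict[str, Any]:
    """Build the documented response for one 0-indexed page of chunks."""
    if page < 0 or size <= 0:
        raise ValueError("page must be >= 0 and size must be > 0")
    # Sort by position within the file, as the note above requires.
    ordered = sorted(chunks, key=lambda chunk: chunk["metadata"]["order"])
    total = len(ordered)
    start = page * size
    return {
        "chunks": ordered[start:start + size],
        "total": total,
        "page": page,
        "size": size,
        # Assumption: total_pages is ceil(total / size), so 0 for no chunks.
        "total_pages": math.ceil(total / size),
    }
```

A page past the end yields an empty `chunks` list while `total` and `total_pages` stay accurate. A real backend would normally push the sort and the offset into the query instead of loading every chunk.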

`index(elements, **kwargs)` `abstractmethod`

Index data from a source file into Elasticsearch.

Parameters:

Name	Type	Description	Default
`elements`	`Any`	The information to be indexed. Ideally a `list[dict]` in which each dict follows the structure of the 'Element' model.	required
`**kwargs`	`Any`	Additional keyword arguments for customization.	`{}`

Returns:

Name	Type	Description
`Any`	`Any`	The response from the indexing process.

`index_chunk(element, **kwargs)` `abstractmethod`

Index a single chunk.

Note: This method only indexes the chunk. It does NOT update the metadata of neighboring chunks (previous_chunk/next_chunk). The caller is responsible for maintaining chunk relationships by updating adjacent chunks' metadata separately.

Parameters:

Name	Type	Description	Default
`element`	`dict[str, Any]`	The chunk to be indexed. Should follow the Element structure with 'text' and 'metadata' keys. Metadata must include 'file_id' and 'chunk_id'.	required
`**kwargs`	`Any`	Additional keyword arguments for customization.	`{}`

Returns:

Type	Description
`dict[str, Any]`	The response from the indexing process. Should include: 1. success (bool): True if indexing succeeded, False otherwise. 2. error_message (str): Error message if indexing failed, empty string otherwise. 3. chunk_id (str): The ID of the indexed chunk.
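
Because `index_chunk` leaves neighboring chunks untouched, the caller has to work out which metadata patches to apply around a newly inserted chunk. The sketch below plans those patches. `plan_neighbor_updates` is a hypothetical helper, and it assumes `previous_chunk` and `next_chunk` hold chunk IDs (or `None` at the ends of a file), which the source does not specify.

```python
from typing import Any


def plan_neighbor_updates(
    ordered_ids: list[str], new_id: str, position: int
) -> dict[str, dict[str, Any]]:
    """Return metadata patches, keyed by chunk ID, for inserting new_id.

    ordered_ids lists the file's existing chunk IDs in order; position is
    the index at which the new chunk will sit after insertion.
    """
    if not 0 <= position <= len(ordered_ids):
        raise ValueError("position is out of range")
    prev_id = ordered_ids[position - 1] if position > 0 else None
    next_id = ordered_ids[position] if position < len(ordered_ids) else None
    # Patch for the new chunk itself (assumed to store neighbor chunk IDs).
    patches: dict[str, dict[str, Any]] = {
        new_id: {"previous_chunk": prev_id, "next_chunk": next_id}
    }
    # Patches for the chunks on either side, when they exist.
    if prev_id is not None:
        patches[prev_id] = {"next_chunk": new_id}
    if next_id is not None:
        patches[next_id] = {"previous_chunk": new_id}
    return patches
```

The patch for the new chunk would be merged into its element's metadata before calling `index_chunk`, and each remaining patch would be passed to `update_chunk_metadata` for the corresponding neighbor.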

`index_file_chunks(elements, file_id, **kwargs)` `abstractmethod`

Index chunks for a specific file.

The indexer is responsible for deleting any existing chunks for the file_id before indexing the new ones, so that the file's chunks are completely replaced by the new set.

Parameters:

Name	Type	Description	Default
`elements`	`list[dict[str, Any]]`	The chunks to be indexed. Each dict should follow the Element structure with 'text' and 'metadata' keys. Metadata must include 'file_id' and 'chunk_id'.	required
`file_id`	`str`	The ID of the file these chunks belong to.	required
`**kwargs`	`Any`	Additional keyword arguments for customization.	`{}`

Returns:

Type	Description
`dict[str, Any]`	The response from the indexing process. Should include: 1. success (bool): True if indexing succeeded, False otherwise. 2. error_message (str): Error message if indexing failed, empty string otherwise. 3. total (int): The total number of chunks indexed.

`update_chunk(element, **kwargs)` `abstractmethod`

Update a chunk by chunk ID.

This method updates both the text content and metadata of a chunk.

Parameters:

Name	Type	Description	Default
`element`	`dict[str, Any]`	The updated chunk data. Should follow the Element structure with 'text' and 'metadata' keys. Metadata must include 'file_id' and 'chunk_id'.	required
`**kwargs`	`Any`	Additional keyword arguments for customization.	`{}`

Returns:

Type	Description
`dict[str, Any]`	The response from the update process. Should include: 1. success (bool): True if update succeeded, False otherwise. 2. error_message (str): Error message if update failed, empty string otherwise. 3. chunk_id (str): The ID of the updated chunk.

`update_chunk_metadata(chunk_id, file_id, metadata, **kwargs)` `abstractmethod`

Update metadata for a specific chunk.

This method patches the provided metadata into the chunk's existing metadata: fields that already exist are overwritten with the new values, and fields that do not exist yet are added.

Parameters:

Name	Type	Description	Default
`chunk_id`	`str`	The ID of the chunk to update.	required
`file_id`	`str`	The ID of the file the chunk belongs to.	required
`metadata`	`dict[str, Any]`	The metadata fields to update. Only the provided fields will be updated; other existing metadata will remain unchanged.	required
`**kwargs`	`Any`	Additional keyword arguments for customization.	`{}`

Returns:

Type	Description
`dict[str, Any]`	The response from the update process. Should include: 1. success (bool): True if update succeeded, False otherwise. 2. error_message (str): Error message if update failed, empty string otherwise.
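
The patch semantics can be illustrated with a plain dictionary merge. `patch_metadata` is a hypothetical helper, and the shallow (top-level only) merge is an assumption: the source does not say how nested metadata values should be combined.

```python
from typing import Any


def patch_metadata(
    existing: dict[str, Any], patch: dict[str, Any]
) -> dict[str, Any]:
    """Return existing metadata with the patch applied on top.

    Keys in patch overwrite or extend existing; all other keys are kept.
    Assumption: the merge is shallow, so nested dicts are replaced whole.
    """
    return {**existing, **patch}
```

For example, patching `{"file_id": "f", "order": 1, "title": "old"}` with `{"title": "new", "lang": "en"}` keeps `file_id` and `order`, overwrites `title`, and adds `lang`.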
