Graph

Package containing Graph Indexer modules.

Modules:

Name	Description
`LlamaIndexGraphRAGIndexer`	A class for indexing elements using LlamaIndex.
`LightRAGGraphRAGIndexer`	A class for indexing elements using LightRAG.

`LightRAGGraphRAGIndexer(graph_store)`

Bases: BaseGraphRAGIndexer

Indexer for LightRAG-based graph RAG.

How to run LightRAG with PostgreSQL using Docker:

docker run -p 5455:5432 -d --name postgres-LightRag shangor/postgres-for-rag:v1.0 sh -c "service postgresql start && sleep infinity"

Example

from gllm_inference.em_invoker import OpenAIEMInvoker
from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_docproc.indexer.graph.light_rag_graph_rag_indexer import LightRAGGraphRAGIndexer
from gllm_datastore.graph_data_store.light_rag_postgres_data_store import LightRAGPostgresDataStore

# Create the LightRAGPostgresDataStore instance
graph_store = LightRAGPostgresDataStore(
    lm_invoker=OpenAILMInvoker(model_name="gpt-4o-mini"),
    em_invoker=OpenAIEMInvoker(model_name="text-embedding-3-small"),
    postgres_db_host="localhost",
    postgres_db_port=5455,
    postgres_db_user="rag",
    postgres_db_password="rag",
    postgres_db_name="rag",
    postgres_db_workspace="default",
)


# Create the indexer
indexer = LightRAGGraphRAGIndexer(graph_store=graph_store)

# Create elements to index
elements = [
    {
        "text": "This is a sample document about AI.",
        "structure": "uncategorized",
        "metadata": {
            "source": "sample.txt",
            "source_type": "TEXT",
            "loaded_datetime": "2025-07-10T12:00:00",
            "chunk_id": "chunk_001",
            "file_id": "file_001"
        }
    }
]

# Index the elements; file_id must be passed as a keyword argument
indexer.index(elements, file_id="file_001")

Attributes:

Name	Type	Description
`_graph_store`	`BaseLightRAGDataStore`	The LightRAG data store used for indexing and querying.

Initialize the LightRAGGraphRAGIndexer.

Parameters:

Name	Type	Description	Default
`graph_store`	`BaseLightRAGDataStore`	The LightRAG data store to use for indexing.	required

`delete(file_id=None, chunk_id=None, entity_id=None, **kwargs)`

Delete entities from the LightRAG system and graph.

Supports multiple deletion modes based on the provided keyword arguments. Exactly one of the supported deletion parameters must be provided. If file_id is provided, delegates to delete_file_chunks.

Parameters:

Name	Type	Description	Default
`file_id`	`str`	Delete a file and all its associated chunks. Defaults to None.	`None`
`chunk_id`	`str`	Delete a specific chunk entity. Defaults to None.	`None`
`entity_id`	`str`	Delete a specific entity or node. Defaults to None.	`None`
`**kwargs`	`Any`	Additional keyword arguments.	`{}`

Returns:

Type	Description
`dict[str, Any]`	Response with success status and error message: success (bool) is True if deletion succeeded, False otherwise; error_message (str) is the error message if deletion failed, empty string otherwise.

Raises:

Type	Description
`ValueError`	If no deletion parameter is provided or multiple are provided.
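
The "exactly one deletion parameter" rule can be pictured with a small standalone check. This is only a sketch: `pick_deletion_mode` is a hypothetical helper written for illustration, not a function of the library, and the real method may word its error differently.

```python
from typing import Optional, Tuple


def pick_deletion_mode(
    file_id: Optional[str] = None,
    chunk_id: Optional[str] = None,
    entity_id: Optional[str] = None,
) -> Tuple[str, str]:
    """Return the single (name, value) pair that was provided.

    Hypothetical helper for illustration; not part of the library.
    """
    candidates = (("file_id", file_id), ("chunk_id", chunk_id), ("entity_id", entity_id))
    # Keep only the parameters the caller actually set.
    provided = {name: value for name, value in candidates if value is not None}
    if len(provided) != 1:
        raise ValueError(
            "Exactly one of file_id, chunk_id, or entity_id must be provided; "
            f"got {sorted(provided) or 'none'}."
        )
    return next(iter(provided.items()))
```

Under this sketch, passing only `file_id` selects the file-level mode, which the documented method then hands to `delete_file_chunks`; passing none or several raises `ValueError`.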

`delete_chunk(chunk_id, file_id, **kwargs)`

Delete a single chunk by chunk ID and file ID.

Parameters:

Name	Type	Description	Default
`chunk_id`	`str`	The ID of the chunk to delete.	required
`file_id`	`str`	The ID of the file the chunk belongs to.	required
`**kwargs`	`Any`	Additional keyword arguments for customization.	`{}`

Returns:

Type	Description
`dict[str, Any]`	Response with success status and error message.

Raises:

Type	Description
`NotImplementedError`	This method is not yet implemented.

`delete_file_chunks(file_id, **kwargs)`

Delete all chunks for a specific file.

Parameters:

Name	Type	Description	Default
`file_id`	`str`	The ID of the file whose chunks should be deleted.	required
`**kwargs`	`Any`	Additional keyword arguments for customization.	`{}`

Returns:

Type	Description
`dict[str, Any]`	Response with success status and error message.

`get_chunk(chunk_id, file_id, **kwargs)`

Get a single chunk by chunk ID and file ID.

Parameters:

Name	Type	Description	Default
`chunk_id`	`str`	The ID of the chunk to retrieve.	required
`file_id`	`str`	The ID of the file the chunk belongs to.	required
`**kwargs`	`Any`	Additional keyword arguments for customization.	`{}`

Returns:

Type	Description
`dict[str, Any] \| None`	The chunk data, or None if not found.

Raises:

Type	Description
`NotImplementedError`	This method is not yet implemented.

`get_file_chunks(file_id, page=0, size=20, **kwargs)`

Get chunks for a specific file with pagination support.

Parameters:

Name	Type	Description	Default
`file_id`	`str`	The ID of the file to get chunks from.	required
`page`	`int`	The page number (0-indexed). Defaults to 0.	`0`
`size`	`int`	The number of chunks per page. Defaults to 20.	`20`
`**kwargs`	`Any`	Additional keyword arguments for customization.	`{}`

Returns:

Type	Description
`dict[str, Any]`	Response with chunks list, total count, and pagination info.

Raises:

Type	Description
`NotImplementedError`	This method is not yet implemented.
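
Since this method is not yet implemented, the following is only a sketch of what 0-indexed pagination with `page` and `size` means. The function name `paginate_chunks` and the response keys (`chunks`, `total`, `page`, `size`) are assumptions for illustration, not the library's actual response format.

```python
from typing import Any, Dict, List


def paginate_chunks(chunks: List[Dict[str, Any]], page: int = 0, size: int = 20) -> Dict[str, Any]:
    """Slice an in-memory chunk list using 0-indexed pages (illustrative only)."""
    if page < 0 or size <= 0:
        raise ValueError("page must be >= 0 and size must be > 0.")
    start = page * size  # page 0 starts at offset 0
    return {
        "chunks": chunks[start:start + size],
        "total": len(chunks),  # total count across all pages
        "page": page,
        "size": size,
    }
```

With 45 chunks and the default `size=20`, pages 0 and 1 would hold 20 chunks each and page 2 the remaining 5.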

`index(elements, **kwargs)`

Index elements into the LightRAG system and create graph relationships.

This method validates that file_id is present in kwargs and delegates to index_file_chunks.

Parameters:

Name	Type	Description	Default
`elements`	`list[dict[str, Any]]`	List of elements containing text and metadata. Each element should have a metadata key holding a chunk_id and a file_id.	required
`**kwargs`	`Any`	Additional keyword arguments.	`{}`

Kwargs

file_id (str): The ID of the file these chunks belong to.

Returns:

Type	Description
`dict[str, Any]`	Response with success status, error message, and total count: success (bool) is True if indexing succeeded, False otherwise; error_message (str) is the error message if indexing failed, empty string otherwise; total (int) is the total number of chunks indexed.

Raises:

Type	Description
`ValueError`	If file_id is not provided in kwargs.
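
The validate-then-delegate behavior described above can be sketched with a stand-in class. `SketchIndexer` is hypothetical and does not touch any data store; it only mirrors the documented contract that `index` requires `file_id` in kwargs and forwards to `index_file_chunks`.

```python
from typing import Any, Dict, List


class SketchIndexer:
    """Hypothetical stand-in; not the real LightRAGGraphRAGIndexer."""

    def index(self, elements: List[Dict[str, Any]], **kwargs: Any) -> Dict[str, Any]:
        # file_id travels in kwargs, so it has to be checked explicitly.
        file_id = kwargs.pop("file_id", None)
        if not file_id:
            raise ValueError("file_id must be provided in kwargs.")
        return self.index_file_chunks(elements, file_id, **kwargs)

    def index_file_chunks(
        self, elements: List[Dict[str, Any]], file_id: str, **kwargs: Any
    ) -> Dict[str, Any]:
        # A real implementation would write to the graph store here.
        return {"success": True, "error_message": "", "total": len(elements)}
```

With the real indexer, the equivalent call would be `indexer.index(elements, file_id="file_001")`; omitting `file_id` raises `ValueError`.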

`index_chunk(element, **kwargs)`

Index a single chunk.

This method only indexes the chunk. It does NOT update the metadata of neighboring chunks (previous_chunk/next_chunk). The caller is responsible for maintaining chunk relationships by updating adjacent chunks' metadata separately.

Parameters:

Name	Type	Description	Default
`element`	`dict[str, Any]`	The chunk to be indexed.	required
`**kwargs`	`Any`	Additional keyword arguments for customization.	`{}`

Returns:

Type	Description
`dict[str, Any]`	Response with success status, error message, and chunk_id.

Raises:

Type	Description
`NotImplementedError`	This method is not yet implemented.
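
Because neighbor links are left to the caller, one way to prepare them is to compute the metadata patches for the adjacent chunks before indexing the new one. The sketch below assumes that `previous_chunk` and `next_chunk` are metadata fields holding chunk IDs; `neighbor_patches` is a hypothetical helper, not part of the library.

```python
from typing import Dict, Optional


def neighbor_patches(
    new_chunk_id: str,
    previous_chunk_id: Optional[str] = None,
    next_chunk_id: Optional[str] = None,
) -> Dict[str, Dict[str, str]]:
    """Map each adjacent chunk ID to the metadata patch it needs (illustrative only)."""
    patches: Dict[str, Dict[str, str]] = {}
    if previous_chunk_id is not None:
        # The chunk before the new one should now point forward to it.
        patches[previous_chunk_id] = {"next_chunk": new_chunk_id}
    if next_chunk_id is not None:
        # The chunk after the new one should now point back to it.
        patches[next_chunk_id] = {"previous_chunk": new_chunk_id}
    return patches
```

Each resulting patch could then be applied through whatever metadata update path is available to the caller.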

`index_file_chunks(elements, file_id, **kwargs)`

Index chunks for a specific file.

This method extracts text and chunk IDs from the provided elements, inserts them into the LightRAG system, and creates a graph structure connecting files to chunks.

Parameters:

Name	Type	Description	Default
`elements`	`list[dict[str, Any]]`	The chunks to be indexed.	required
`file_id`	`str`	The ID of the file these chunks belong to.	required
`**kwargs`	`Any`	Additional keyword arguments for customization.	`{}`

Returns:

Type	Description
`dict[str, Any]`	Response with success status, error message, and total count.
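
As a rough picture of the extraction step, the sketch below pulls the text and chunk ID out of each element and derives file-to-chunk edges. The element layout follows the example earlier on this page; the function name and the `HAS_CHUNK` relation label are assumptions for illustration and may not match what the library stores.

```python
from typing import Any, Dict, List, Tuple


def extract_texts_and_edges(
    elements: List[Dict[str, Any]], file_id: str
) -> Tuple[List[str], List[str], List[Tuple[str, str, str]]]:
    """Collect texts, chunk IDs, and file-to-chunk edges (illustrative only)."""
    texts: List[str] = []
    chunk_ids: List[str] = []
    for element in elements:
        texts.append(element["text"])
        chunk_ids.append(element["metadata"]["chunk_id"])
    # One edge per chunk, connecting the file node to the chunk node.
    edges = [(file_id, "HAS_CHUNK", chunk_id) for chunk_id in chunk_ids]
    return texts, chunk_ids, edges
```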

`resolve_entities()`

Resolve entities from the graph.

Currently, this method does nothing because entity resolution is already handled implicitly by the LightRAG instance.

`update_chunk(element, **kwargs)`

Update a chunk by chunk ID.

This method updates both the text content and metadata of a chunk. When text content is updated, the chunk should be re-processed through data generators and re-indexed with updated vector embeddings.

Parameters:

Name	Type	Description	Default
`element`	`dict[str, Any]`	The updated chunk data.	required
`**kwargs`	`Any`	Additional keyword arguments for customization.	`{}`

Returns:

Type	Description
`dict[str, Any]`	Response with success status, error message, and chunk_id.

Raises:

Type	Description
`NotImplementedError`	This method is not yet implemented.

`update_chunk_metadata(chunk_id, file_id, metadata, **kwargs)`

Update metadata for a specific chunk.

This method patches new metadata into the existing chunk metadata. Existing metadata fields will be overwritten, and new fields will be added. System-managed metadata fields (file_id, chunk_id, etc.) should be preserved and not overwritten.

Parameters:

Name	Type	Description	Default
`chunk_id`	`str`	The ID of the chunk to update.	required
`file_id`	`str`	The ID of the file the chunk belongs to.	required
`metadata`	`dict[str, Any]`	The metadata fields to update.	required
`**kwargs`	`Any`	Additional keyword arguments for customization.	`{}`

Returns:

Type	Description
`dict[str, Any]`	Response with success status and error message.

Raises:

Type	Description
`NotImplementedError`	This method is not yet implemented.
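
The patch semantics described above (overwrite existing fields, add new ones, leave system-managed fields alone) can be sketched as a plain dictionary merge. The method itself is not yet implemented, so `patch_metadata` and the `SYSTEM_FIELDS` set are hypothetical; only `file_id` and `chunk_id` are named by the documentation, and the full protected list is not known here.

```python
from typing import Any, Dict, FrozenSet

# Assumption: only the two fields named in the documentation are protected.
SYSTEM_FIELDS: FrozenSet[str] = frozenset({"file_id", "chunk_id"})


def patch_metadata(
    existing: Dict[str, Any],
    updates: Dict[str, Any],
    protected: FrozenSet[str] = SYSTEM_FIELDS,
) -> Dict[str, Any]:
    """Return a new metadata dict with updates applied (illustrative only)."""
    patched = dict(existing)  # copy so the caller's dict is not mutated
    for key, value in updates.items():
        if key in protected:
            continue  # system-managed fields keep their original value
        patched[key] = value
    return patched
```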

`LlamaIndexGraphRAGIndexer(graph_store, llama_index_llm=None, allowed_entity_types=None, allowed_relation_types=None, kg_validation_schema=None, strict_mode=False, kg_extractors=None, embed_model=None, vector_store=None, max_triplets_per_chunk=10, num_workers=4, **kwargs)`

Bases: BaseGraphRAGIndexer

Indexer for graph RAG using LlamaIndex.

Attributes:

Name	Type	Description
`_index`	`PropertyGraphIndex`	Property graph index.
`_graph_store`	`LlamaIndexGraphRAGDataStore`	Storage for property graph.
`_strict_mode`	`bool`	Whether strict schema validation is enabled.

Initialize the LlamaIndexGraphRAGIndexer.

Parameters:

Name	Type	Description	Default
`graph_store`	`LlamaIndexGraphRAGDataStore`	Storage for property graph.	required
`llama_index_llm`	`BaseLLM \| None`	Language model for LlamaIndex. Defaults to None. Deprecated: Use graph_store.llm instead. Instantiate the LLM via LlamaIndexGraphRAGDataStore (e.g., LlamaIndexGraphRAGDataStore(lm_invoker=...)).	`None`
`allowed_entity_types`	`list[str] \| None`	List of allowed entity types. When strict_mode=True, only these types are extracted. When strict_mode=False, serves as hints. Defaults to None.	`None`
`allowed_relation_types`	`list[str] \| None`	List of allowed relationship types. Behavior depends on strict_mode. Defaults to None.	`None`
`kg_validation_schema`	`dict[str, list[str]] \| None`	Validation schema for strict mode. Maps entity types to their allowed outgoing relationship types. Format: `{"ENTITY_TYPE": ["ALLOWED_REL1", "ALLOWED_REL2"], ...}`. Example: `{"PERSON": ["WORKS_AT", "FOUNDED"], "ORGANIZATION": ["LOCATED_IN"]}`. Defaults to None.	`None`
`strict_mode`	`bool`	If True, uses SchemaLLMPathExtractor with strict validation. If False (default), uses DynamicLLMPathExtractor with optional guidance. Defaults to False.	`False`
`kg_extractors`	`list[TransformComponent] \| None`	Custom list of extractors. If provided, overrides automatic extractor selection based on strict_mode. Defaults to None.	`None`
`embed_model`	`BaseEmbedding \| None`	Embedding model for vector representations. Defaults to None. Deprecated: Use graph_store.embed_model instead. Instantiate the embedding model via LlamaIndexGraphRAGDataStore (e.g., LlamaIndexGraphRAGDataStore(em_invoker=...)).	`None`
`vector_store`	`BasePydanticVectorStore \| None`	Storage for vector data. Defaults to None.	`None`
`max_triplets_per_chunk`	`int`	Maximum triplets to extract per chunk. Defaults to 10.	`10`
`num_workers`	`int`	Number of parallel workers. Defaults to 4.	`4`
`**kwargs`	`Any`	Additional keyword arguments.	`{}`
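
To make the shape of `kg_validation_schema` concrete, the sketch below checks a triplet against the example schema from the table. In the library, strict validation is performed by the extractor selected through `strict_mode`; `is_allowed` is a hypothetical helper that only illustrates the "entity type to allowed outgoing relationship types" mapping.

```python
from typing import Dict, List, Tuple

# Example schema taken from the parameter description above.
SCHEMA: Dict[str, List[str]] = {
    "PERSON": ["WORKS_AT", "FOUNDED"],
    "ORGANIZATION": ["LOCATED_IN"],
}


def is_allowed(triplet: Tuple[str, str, str], schema: Dict[str, List[str]]) -> bool:
    """Check a (subject_type, relation, object_type) triplet (illustrative only)."""
    subject_type, relation, _object_type = triplet
    # Only the subject's entity type decides which outgoing relations are valid.
    return relation in schema.get(subject_type, [])
```

Under this reading, `("PERSON", "WORKS_AT", "ORGANIZATION")` passes, while `("ORGANIZATION", "FOUNDED", "PERSON")` does not, because `FOUNDED` is not listed as an outgoing relation for `ORGANIZATION`.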

`delete(**kwargs)`

Delete elements from the knowledge graph.

It validates that file_id or document_id is present in kwargs and delegates to delete_file_chunks. If only document_id is provided, it is used as file_id.

Parameters:

Name	Type	Description	Default
`**kwargs`	`Any`	Additional keyword arguments.	`{}`

Kwargs

file_id (str, optional): The ID of the file whose chunks should be deleted.

document_id (str, optional): The document ID. If file_id is not provided, document_id is used as file_id.

Returns:

Type	Description
`dict[str, Any]`	Response with success status and error message: success (bool) is True if deletion succeeded, False otherwise; error_message (str) is the error message if deletion failed, empty string otherwise.

Raises:

Type	Description
`ValueError`	If neither file_id nor document_id is provided in kwargs.
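
The fallback from `document_id` to `file_id` can be sketched as a small resolver. `resolve_file_id` is a hypothetical name used for illustration; the library's internal handling may differ in detail.

```python
from typing import Any


def resolve_file_id(**kwargs: Any) -> str:
    """Prefer file_id, fall back to document_id (illustrative only)."""
    file_id = kwargs.get("file_id") or kwargs.get("document_id")
    if not file_id:
        raise ValueError("Either file_id or document_id must be provided in kwargs.")
    return file_id
```

When both are given, this sketch uses `file_id`, which matches the documented rule that `document_id` is used only if `file_id` is not provided.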

`delete_chunk(chunk_id, file_id, **kwargs)`

Delete a single chunk by chunk ID and file ID.

Parameters:

Name	Type	Description	Default
`chunk_id`	`str`	The ID of the chunk to delete.	required
`file_id`	`str`	The ID of the file the chunk belongs to.	required
`**kwargs`	`Any`	Additional keyword arguments for customization.	`{}`

Returns:

Type	Description
`dict[str, Any]`	Response with success status and error message.

Raises:

Type	Description
`NotImplementedError`	This method is not yet implemented.

`delete_file_chunks(file_id, **kwargs)`

Delete all chunks for a specific file.

This method deletes all chunks from the knowledge graph based on the provided file_id.

Parameters:

Name	Type	Description	Default
`file_id`	`str`	The ID of the file whose chunks should be deleted.	required
`**kwargs`	`Any`	Additional keyword arguments for customization.	`{}`

Returns:

Type	Description
`dict[str, Any]`	Response with success status and error message.

`get_chunk(chunk_id, file_id, **kwargs)`

Get a single chunk by chunk ID and file ID.

Parameters:

Name	Type	Description	Default
`chunk_id`	`str`	The ID of the chunk to retrieve.	required
`file_id`	`str`	The ID of the file the chunk belongs to.	required
`**kwargs`	`Any`	Additional keyword arguments for customization.	`{}`

Returns:

Type	Description
`dict[str, Any] \| None`	The chunk data, or None if not found.

Raises:

Type	Description
`NotImplementedError`	This method is not yet implemented.

`get_file_chunks(file_id, page=0, size=20, **kwargs)`

Get chunks for a specific file with pagination support.

Parameters:

Name	Type	Description	Default
`file_id`	`str`	The ID of the file to get chunks from.	required
`page`	`int`	The page number (0-indexed). Defaults to 0.	`0`
`size`	`int`	The number of chunks per page. Defaults to 20.	`20`
`**kwargs`	`Any`	Additional keyword arguments for customization.	`{}`

Returns:

Type	Description
`dict[str, Any]`	Response with chunks list, total count, and pagination info.

Raises:

Type	Description
`NotImplementedError`	This method is not yet implemented.

`index(elements, **kwargs)`

Index elements into the graph.

It validates that file_id or document_id is present in kwargs and delegates to index_file_chunks. If only document_id is provided, it is used as file_id.

Parameters:

Name	Type	Description	Default
`elements`	`list[dict[str, Any]]`	List of dictionaries representing elements to be indexed.	required
`**kwargs`	`Any`	Additional keyword arguments.	`{}`

Kwargs

file_id (str, optional): The ID of the file these chunks belong to.

document_id (str, optional): The document ID. If file_id is not provided, document_id is used as file_id.

Returns:

Type	Description
`dict[str, Any]`	Response with success status, error message, and total count: success (bool) is True if indexing succeeded, False otherwise; error_message (str) is the error message if indexing failed, empty string otherwise; total (int) is the total number of chunks indexed.

Raises:

Type	Description
`ValueError`	If neither file_id nor document_id is provided in kwargs.

`index_chunk(element, **kwargs)`

Index a single chunk.

This method only indexes the chunk. It does NOT update the metadata of neighboring chunks (previous_chunk/next_chunk). The caller is responsible for maintaining chunk relationships by updating adjacent chunks' metadata separately.

Parameters:

Name	Type	Description	Default
`element`	`dict[str, Any]`	The chunk to be indexed.	required
`**kwargs`	`Any`	Additional keyword arguments for customization.	`{}`

Returns:

Type	Description
`dict[str, Any]`	Response with success status, error message, and chunk_id.

Raises:

Type	Description
`NotImplementedError`	This method is not yet implemented.

`index_file_chunks(elements, file_id, **kwargs)`

Index chunks for a specific file.

This method indexes chunks for a file.

Notes:

- Currently, only Neo4jPropertyGraphStore is supported for indexing the metadata from the TextNode.
- The 'chunk_id' parameter is used to specify the chunk ID for the elements.

Parameters:

Name	Type	Description	Default
`elements`	`list[dict[str, Any]]`	The chunks to be indexed.	required
`file_id`	`str`	The ID of the file these chunks belong to.	required
`**kwargs`	`Any`	Additional keyword arguments for customization.	`{}`

Returns:

Type	Description
`dict[str, Any]`	Response with success status, error message, and total count.

`resolve_entities()`

Resolve entities in the graph.

Currently, this method does nothing.

`update_chunk(element, **kwargs)`

Update a chunk by chunk ID.

This method updates both the text content and metadata of a chunk. When text content is updated, the chunk should be re-processed through data generators and re-indexed with updated vector embeddings.

Parameters:

Name	Type	Description	Default
`element`	`dict[str, Any]`	The updated chunk data.	required
`**kwargs`	`Any`	Additional keyword arguments for customization.	`{}`

Returns:

Type	Description
`dict[str, Any]`	Response with success status, error message, and chunk_id.

Raises:

Type	Description
`NotImplementedError`	This method is not yet implemented.

`update_chunk_metadata(chunk_id, file_id, metadata, **kwargs)`

Update metadata for a specific chunk.

This method patches new metadata into the existing chunk metadata. Existing metadata fields will be overwritten, and new fields will be added. System-managed metadata fields (file_id, chunk_id, etc.) should be preserved and not overwritten.

Parameters:

Name	Type	Description	Default
`chunk_id`	`str`	The ID of the chunk to update.	required
`file_id`	`str`	The ID of the file the chunk belongs to.	required
`metadata`	`dict[str, Any]`	The metadata fields to update.	required
`**kwargs`	`Any`	Additional keyword arguments for customization.	`{}`

Returns:

Type	Description
`dict[str, Any]`	Response with success status and error message.

Raises:

Type	Description
`NotImplementedError`	This method is not yet implemented.