Graph
Package containing Graph Indexer modules.
Modules:

| Name | Description |
|---|---|
| LlamaIndexGraphRAGIndexer | A class for indexing elements using LlamaIndex. |
| LightRAGGraphRAGIndexer | A class for indexing elements using LightRAG. |

LightRAGGraphRAGIndexer(graph_store)
Bases: BaseGraphRAGIndexer
Indexer for LightRAG-based graph RAG.
To run a PostgreSQL instance for LightRAG using Docker (PostgreSQL is exposed on host port 5455):

```shell
docker run -p 5455:5432 -d --name postgres-LightRag shangor/postgres-for-rag:v1.0 sh -c "service postgresql start && sleep infinity"
```
Example:

```python
from gllm_inference.em_invoker import OpenAIEMInvoker
from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_docproc.indexer.graph.light_rag_graph_rag_indexer import LightRAGGraphRAGIndexer
from gllm_datastore.graph_data_store.light_rag_postgres_data_store import LightRAGPostgresDataStore

# Create the LightRAGPostgresDataStore instance (connection settings match the Docker container above)
graph_store = LightRAGPostgresDataStore(
    lm_invoker=OpenAILMInvoker(model_name="gpt-4o-mini"),
    em_invoker=OpenAIEMInvoker(model_name="text-embedding-3-small"),
    postgres_db_host="localhost",
    postgres_db_port=5455,
    postgres_db_user="rag",
    postgres_db_password="rag",
    postgres_db_name="rag",
    postgres_db_workspace="default",
)

# Create the indexer
indexer = LightRAGGraphRAGIndexer(graph_store=graph_store)

# Create elements to index; each element's metadata carries a chunk_id and a file_id
elements = [
    {
        "text": "This is a sample document about AI.",
        "structure": "uncategorized",
        "metadata": {
            "source": "sample.txt",
            "source_type": "TEXT",
            "loaded_datetime": "2025-07-10T12:00:00",
            "chunk_id": "chunk_001",
            "file_id": "file_001",
        },
    }
]

# Index the elements
indexer.index(elements)
```

Attributes:

| Name | Type | Description |
|---|---|---|
| `_graph_store` | BaseLightRAGDataStore | The LightRAG data store used for indexing and querying. |

Initialize the LightRAGGraphRAGIndexer.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `graph_store` | BaseLightRAGDataStore | The LightRAG data store to use for indexing. | required |

delete(file_id=None, chunk_id=None, entity_id=None, **kwargs)
Delete entities from the LightRAG system and graph.
Supports multiple deletion modes based on the provided keyword arguments. Exactly one of the supported deletion parameters must be provided.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `file_id` | str | Delete a file and all its associated chunks. Defaults to None. | None |
| `chunk_id` | str | Delete a specific chunk entity. Defaults to None. | None |
| `entity_id` | str | Delete a specific entity or node. Defaults to None. | None |
| `**kwargs` | Any | Additional keyword arguments. | {} |

Raises:

| Type | Description |
|---|---|
| ValueError | If no deletion parameter is provided or more than one is provided. |

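The "exactly one deletion parameter" rule can be sketched as a small validator. This is a minimal illustration, assuming the check works like the hypothetical `select_deletion_mode` helper below; it is not part of `gllm_docproc`, and the real method may validate differently.

```python
from typing import Optional, Tuple


def select_deletion_mode(
    file_id: Optional[str] = None,
    chunk_id: Optional[str] = None,
    entity_id: Optional[str] = None,
) -> Tuple[str, str]:
    """Return (parameter_name, value) for the single deletion parameter provided.

    Hypothetical helper mirroring the documented rule for delete().
    """
    candidates = (("file_id", file_id), ("chunk_id", chunk_id), ("entity_id", entity_id))
    provided = [(name, value) for name, value in candidates if value is not None]
    if len(provided) != 1:
        names = [name for name, _ in provided]
        raise ValueError(
            f"Exactly one of file_id, chunk_id, entity_id must be provided; got {names}"
        )
    return provided[0]
```

Under this reading, `indexer.delete(file_id="file_001")` is valid, while calling `delete()` with no arguments or with both `file_id` and `chunk_id` raises `ValueError`.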
delete_chunk(chunk_id, file_id, **kwargs)
Delete a single chunk by chunk ID and file ID.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `chunk_id` | str | The ID of the chunk to delete. | required |
| `file_id` | str | The ID of the file the chunk belongs to. | required |
| `**kwargs` | Any | Additional keyword arguments for customization. | {} |

Returns:

| Type | Description |
|---|---|
| dict[str, Any] | Response with success status and error message. |

Raises:

| Type | Description |
|---|---|
| NotImplementedError | This method is not yet implemented. |

delete_file_chunks(file_id, **kwargs)
Delete all chunks for a specific file.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `file_id` | str | The ID of the file whose chunks should be deleted. | required |
| `**kwargs` | Any | Additional keyword arguments for customization. | {} |

Returns:

| Type | Description |
|---|---|
| dict[str, Any] | Response with success status and error message. |

Raises:

| Type | Description |
|---|---|
| NotImplementedError | This method is not yet implemented. |

get_chunk(chunk_id, file_id, **kwargs)
Get a single chunk by chunk ID and file ID.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `chunk_id` | str | The ID of the chunk to retrieve. | required |
| `file_id` | str | The ID of the file the chunk belongs to. | required |
| `**kwargs` | Any | Additional keyword arguments for customization. | {} |

Returns:

| Type | Description |
|---|---|
| dict[str, Any] \| None | The chunk data, or None if not found. |

Raises:

| Type | Description |
|---|---|
| NotImplementedError | This method is not yet implemented. |

get_file_chunks(file_id, page=0, size=20, **kwargs)
Get chunks for a specific file with pagination support.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `file_id` | str | The ID of the file to get chunks from. | required |
| `page` | int | The page number (0-indexed). Defaults to 0. | 0 |
| `size` | int | The number of chunks per page. Defaults to 20. | 20 |
| `**kwargs` | Any | Additional keyword arguments for customization. | {} |

Returns:

| Type | Description |
|---|---|
| dict[str, Any] | Response with chunks list, total count, and pagination info. |

Raises:

| Type | Description |
|---|---|
| NotImplementedError | This method is not yet implemented. |

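The 0-indexed `page`/`size` contract can be illustrated with a plain-list sketch. The response keys below (`chunks`, `total`, `page`, `size`, `total_pages`) are assumptions: the docstring only promises a chunks list, a total count, and pagination info, and the method itself is not yet implemented.

```python
from typing import Any, Dict, List


def paginate_chunks(
    chunks: List[Dict[str, Any]], page: int = 0, size: int = 20
) -> Dict[str, Any]:
    """Return one 0-indexed page of a file's chunks (hypothetical response shape)."""
    if page < 0 or size <= 0:
        raise ValueError("page must be >= 0 and size must be > 0")
    start = page * size  # page 0 starts at the first chunk
    return {
        "chunks": chunks[start:start + size],
        "total": len(chunks),
        "page": page,
        "size": size,
        "total_pages": (len(chunks) + size - 1) // size,  # ceiling division
    }
```

With 45 chunks and the default `size=20`, pages 0 and 1 hold 20 chunks each and page 2 holds the remaining 5; any later page returns an empty list.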
index(elements, **kwargs)
Index elements into the LightRAG system and create graph relationships.
This method extracts text and chunk IDs from the provided elements, inserts them into the LightRAG system, and creates a graph structure connecting files to chunks.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `elements` | list[dict[str, Any]] | List of element dictionaries containing text and metadata. Each element's metadata must include a chunk_id and a file_id. | required |
| `**kwargs` | Any | Additional keyword arguments. | {} |

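The preparation step described for `index` (extract text and chunk IDs, then connect files to chunks) can be sketched in plain Python. The helper `build_file_chunk_graph` is hypothetical and only mirrors the docstring; the real indexer's internals and graph representation may differ.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def build_file_chunk_graph(
    elements: List[Dict[str, Any]],
) -> Tuple[List[str], List[str], Dict[str, List[str]]]:
    """Split elements into texts, chunk IDs, and file_id -> [chunk_id] edges."""
    texts: List[str] = []
    chunk_ids: List[str] = []
    edges: Dict[str, List[str]] = defaultdict(list)
    for element in elements:
        metadata = element.get("metadata", {})
        if "chunk_id" not in metadata or "file_id" not in metadata:
            raise ValueError("each element's metadata needs a chunk_id and a file_id")
        texts.append(element["text"])  # text inserted into LightRAG
        chunk_ids.append(metadata["chunk_id"])
        edges[metadata["file_id"]].append(metadata["chunk_id"])  # file -> chunk edge
    return texts, chunk_ids, dict(edges)
```

Keeping texts and chunk IDs in parallel lists preserves the pairing between each text and its chunk, while the edge map groups chunks under the file they came from.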
index_chunk(element, **kwargs)
Index a single chunk.
This method only indexes the chunk. It does NOT update the metadata of neighboring chunks (previous_chunk/next_chunk). The caller is responsible for maintaining chunk relationships by updating adjacent chunks' metadata separately.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `element` | dict[str, Any] | The chunk to be indexed. | required |
| `**kwargs` | Any | Additional keyword arguments for customization. | {} |

Returns:

| Type | Description |
|---|---|
| dict[str, Any] | Response with success status, error message, and chunk_id. |

Raises:

| Type | Description |
|---|---|
| NotImplementedError | This method is not yet implemented. |

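Because `index_chunk` leaves `previous_chunk`/`next_chunk` to the caller, the caller needs some way to compute those links. A minimal sketch, assuming chunks are ordered in a list and shaped like the example elements; `link_neighbors` is a hypothetical helper, and writing the updated neighbor metadata back to the store is a separate step.

```python
from typing import Any, Dict, List


def link_neighbors(chunks: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Set previous_chunk/next_chunk in each chunk's metadata, following list order.

    Returns copies; the input chunks are not mutated.
    """
    linked = [{**chunk, "metadata": dict(chunk["metadata"])} for chunk in chunks]
    for i, chunk in enumerate(linked):
        meta = chunk["metadata"]
        meta["previous_chunk"] = linked[i - 1]["metadata"]["chunk_id"] if i > 0 else None
        meta["next_chunk"] = (
            linked[i + 1]["metadata"]["chunk_id"] if i + 1 < len(linked) else None
        )
    return linked
```

After inserting a chunk in the middle of a file, re-running this over the file's ordered chunks shows which neighbors changed; only those neighbors' metadata then needs to be updated.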
index_file_chunks(elements, file_id, **kwargs)
Index chunks for a specific file.
This method indexes chunks for a file. The indexer is responsible for deleting any existing chunks for the file_id before indexing the new chunks to ensure consistency.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `elements` | list[dict[str, Any]] | The chunks to be indexed. | required |
| `file_id` | str | The ID of the file these chunks belong to. | required |
| `**kwargs` | Any | Additional keyword arguments for customization. | {} |

Returns:

| Type | Description |
|---|---|
| dict[str, Any] | Response with success status, error message, and total count. |

Raises:

| Type | Description |
|---|---|
| NotImplementedError | This method is not yet implemented. |

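The delete-then-index contract of `index_file_chunks` amounts to "replace all chunks of a file". The toy in-memory store below sketches that behavior; the class and its response keys (`success`, `error_message`, `total`) are hypothetical and stand in for the real graph store.

```python
from typing import Any, Dict, List


class InMemoryChunkStore:
    """Toy store illustrating delete-then-index semantics per file (hypothetical)."""

    def __init__(self) -> None:
        # file_id -> chunk_id -> chunk
        self._chunks: Dict[str, Dict[str, Dict[str, Any]]] = {}

    def index_file_chunks(self, elements: List[Dict[str, Any]], file_id: str) -> Dict[str, Any]:
        # Remove every existing chunk of this file first, so stale chunks cannot survive.
        self._chunks.pop(file_id, None)
        file_chunks = {element["metadata"]["chunk_id"]: element for element in elements}
        self._chunks[file_id] = file_chunks
        return {"success": True, "error_message": "", "total": len(file_chunks)}

    def chunk_ids(self, file_id: str) -> List[str]:
        return sorted(self._chunks.get(file_id, {}))
```

Re-indexing a file with a shorter chunk list therefore leaves only the new chunks, which is the consistency guarantee the docstring asks the indexer to provide.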
resolve_entities()
Resolve entities from the graph.
Currently, this method does nothing; entity resolution is handled implicitly by the LightRAG instance.
update_chunk(element, **kwargs)
Update a chunk by chunk ID.
This method updates both the text content and metadata of a chunk. When text content is updated, the chunk should be re-processed through data generators and re-indexed with updated vector embeddings.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `element` | dict[str, Any] | The updated chunk data. | required |
| `**kwargs` | Any | Additional keyword arguments for customization. | {} |

Returns:

| Type | Description |
|---|---|
| dict[str, Any] | Response with success status, error message, and chunk_id. |

Raises:

| Type | Description |
|---|---|
| NotImplementedError | This method is not yet implemented. |

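The rule that a text change should trigger re-embedding, while a metadata-only change should not, can be sketched as follows. `apply_chunk_update` and the `embed` callback are hypothetical stand-ins for the real data generators and embedding model, and the stored `embedding` key is an assumption.

```python
from typing import Any, Callable, Dict, List


def apply_chunk_update(
    stored: Dict[str, Any],
    updated: Dict[str, Any],
    embed: Callable[[str], List[float]],
) -> Dict[str, Any]:
    """Merge an updated chunk into the stored one, re-embedding only when text changed."""
    merged = {
        **stored,
        "metadata": {**stored.get("metadata", {}), **updated.get("metadata", {})},
    }
    new_text = updated.get("text")
    if new_text is not None and new_text != stored.get("text"):
        merged["text"] = new_text
        merged["embedding"] = embed(new_text)  # replace the now-stale vector
    return merged
```

Skipping the embedding call for metadata-only updates avoids unnecessary model calls while still keeping vectors in sync with the text.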
update_chunk_metadata(chunk_id, file_id, metadata, **kwargs)
Update metadata for a specific chunk.
This method patches new metadata into the existing chunk metadata. Existing metadata fields will be overwritten, and new fields will be added. System-managed metadata fields (file_id, chunk_id, etc.) should be preserved and not overwritten.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `chunk_id` | str | The ID of the chunk to update. | required |
| `file_id` | str | The ID of the file the chunk belongs to. | required |
| `metadata` | dict[str, Any] | The metadata fields to update. | required |
| `**kwargs` | Any | Additional keyword arguments for customization. | {} |

Returns:

| Type | Description |
|---|---|
| dict[str, Any] | Response with success status and error message. |

Raises:

| Type | Description |
|---|---|
| NotImplementedError | This method is not yet implemented. |

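The patch semantics of `update_chunk_metadata` (overwrite existing fields, add new ones, keep system-managed fields) can be sketched as a dictionary merge. The set of protected keys is an assumption: the docstring names only `file_id` and `chunk_id` followed by "etc.", so `SYSTEM_FIELDS` and `patch_metadata` are hypothetical.

```python
from typing import Any, Dict

# Assumed protected keys; the docstring lists file_id and chunk_id, "etc."
SYSTEM_FIELDS = frozenset({"file_id", "chunk_id"})


def patch_metadata(existing: Dict[str, Any], updates: Dict[str, Any]) -> Dict[str, Any]:
    """Overwrite or add user fields from `updates`, leaving system fields untouched."""
    patched = dict(existing)
    for key, value in updates.items():
        if key in SYSTEM_FIELDS:
            continue  # keep the stored system-managed value
        patched[key] = value
    return patched
```

Silently ignoring protected keys is one design choice; an implementation could instead raise an error when a caller tries to change them.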
LlamaIndexGraphRAGIndexer(graph_store, llama_index_llm=None, allowed_entity_types=None, allowed_relation_types=None, kg_validation_schema=None, strict_mode=False, kg_extractors=None, embed_model=None, vector_store=None, max_triplets_per_chunk=10, num_workers=4, **kwargs)
Bases: BaseGraphRAGIndexer
Indexer for graph RAG using LlamaIndex.
Attributes:

| Name | Type | Description |
|---|---|---|
| `_index` | PropertyGraphIndex | Property graph index. |
| `_graph_store` | LlamaIndexGraphRAGDataStore | Storage for the property graph. |
| `_strict_mode` | bool | Whether strict schema validation is enabled. |

Initialize the LlamaIndexGraphRAGIndexer.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `graph_store` | LlamaIndexGraphRAGDataStore | Storage for the property graph. | required |
| `llama_index_llm` | BaseLLM \| None | Language model used by LlamaIndex. Defaults to None. | None |
| `allowed_entity_types` | list[str] \| None | List of allowed entity types. When strict_mode=True, only these types are extracted; when strict_mode=False, they serve as hints. Defaults to None. | None |
| `allowed_relation_types` | list[str] \| None | List of allowed relationship types. They are enforced or treated as hints, depending on strict_mode. Defaults to None. | None |
| `kg_validation_schema` | dict[str, list[str]] \| None | Validation schema for strict mode. Maps each entity type to its allowed outgoing relationship types, in the format `{"ENTITY_TYPE": ["ALLOWED_REL1", "ALLOWED_REL2"], ...}`, for example `{"PERSON": ["WORKS_AT", "FOUNDED"], "ORGANIZATION": ["LOCATED_IN"]}`. Defaults to None. | None |
| `strict_mode` | bool | If True, uses SchemaLLMPathExtractor with strict validation. If False, uses DynamicLLMPathExtractor with optional guidance. Defaults to False. | False |
| `kg_extractors` | list[TransformComponent] \| None | Custom list of extractors. If provided, overrides the automatic extractor selection based on strict_mode. Defaults to None. | None |
| `embed_model` | BaseEmbedding \| None | Embedding model for vector representations. Defaults to None. | None |
| `vector_store` | BasePydanticVectorStore \| None | Storage for vector data. Defaults to None. | None |
| `max_triplets_per_chunk` | int | Maximum number of triplets to extract per chunk. Defaults to 10. | 10 |
| `num_workers` | int | Number of parallel workers. Defaults to 4. | 4 |
| `**kwargs` | Any | Additional keyword arguments. | {} |

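To make the `kg_validation_schema` format concrete, the sketch below checks whether a subject entity type may emit a given relation. `is_valid_triplet` is a hypothetical illustration of what the mapping means; in strict mode the actual validation is performed by LlamaIndex's SchemaLLMPathExtractor, whose exact rules may differ (for example, it may also consider the object's type).

```python
from typing import Dict, List, Optional

# Schema in the documented format: entity type -> allowed outgoing relation types.
KG_VALIDATION_SCHEMA: Dict[str, List[str]] = {
    "PERSON": ["WORKS_AT", "FOUNDED"],
    "ORGANIZATION": ["LOCATED_IN"],
}


def is_valid_triplet(
    subject_type: str,
    relation: str,
    schema: Dict[str, List[str]],
    allowed_entity_types: Optional[List[str]] = None,
) -> bool:
    """Return True if `subject_type -[relation]->` is permitted by the schema."""
    if allowed_entity_types is not None and subject_type not in allowed_entity_types:
        return False  # entity type not whitelisted at all
    return relation in schema.get(subject_type, [])
```

Under this schema, a PERSON may have WORKS_AT or FOUNDED edges, an ORGANIZATION only LOCATED_IN edges, and any entity type missing from the mapping has no allowed outgoing relations.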
delete(**kwargs)
Delete elements from the knowledge graph.

This method deletes the elements identified by the document_id keyword argument from the knowledge graph.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `**kwargs` | Any | Additional keyword arguments. Must include document_id. | {} |

Raises:

| Type | Description |
|---|---|
| ValueError | If document_id is not provided. |
| Exception | If an error occurs during deletion. |

delete_chunk(chunk_id, file_id, **kwargs)
Delete a single chunk by chunk ID and file ID.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `chunk_id` | str | The ID of the chunk to delete. | required |
| `file_id` | str | The ID of the file the chunk belongs to. | required |
| `**kwargs` | Any | Additional keyword arguments for customization. | {} |

Returns:

| Type | Description |
|---|---|
| dict[str, Any] | Response with success status and error message. |

Raises:

| Type | Description |
|---|---|
| NotImplementedError | This method is not yet implemented. |

delete_file_chunks(file_id, **kwargs)
Delete all chunks for a specific file.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `file_id` | str | The ID of the file whose chunks should be deleted. | required |
| `**kwargs` | Any | Additional keyword arguments for customization. | {} |

Returns:

| Type | Description |
|---|---|
| dict[str, Any] | Response with success status and error message. |

Raises:

| Type | Description |
|---|---|
| NotImplementedError | This method is not yet implemented. |

get_chunk(chunk_id, file_id, **kwargs)
Get a single chunk by chunk ID and file ID.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `chunk_id` | str | The ID of the chunk to retrieve. | required |
| `file_id` | str | The ID of the file the chunk belongs to. | required |
| `**kwargs` | Any | Additional keyword arguments for customization. | {} |

Returns:

| Type | Description |
|---|---|
| dict[str, Any] \| None | The chunk data, or None if not found. |

Raises:

| Type | Description |
|---|---|
| NotImplementedError | This method is not yet implemented. |

get_file_chunks(file_id, page=0, size=20, **kwargs)
Get chunks for a specific file with pagination support.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `file_id` | str | The ID of the file to get chunks from. | required |
| `page` | int | The page number (0-indexed). Defaults to 0. | 0 |
| `size` | int | The number of chunks per page. Defaults to 20. | 20 |
| `**kwargs` | Any | Additional keyword arguments for customization. | {} |

Returns:

| Type | Description |
|---|---|
| dict[str, Any] | Response with chunks list, total count, and pagination info. |

Raises:

| Type | Description |
|---|---|
| NotImplementedError | This method is not yet implemented. |

index(elements, **kwargs)
Index elements into the graph.

Notes:

- Currently, Neo4jPropertyGraphStore is the only graph store that supports indexing the metadata from the TextNode.
- The 'document_id' parameter specifies the document ID for the elements.
- The 'chunk_id' parameter specifies the chunk ID for the elements.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `elements` | list[Element] \| list[dict[str, Any]] | List of elements, or list of dictionaries representing elements, to be indexed. | required |
| `**kwargs` | Any | Additional keyword arguments. | {} |

index_chunk(element, **kwargs)
Index a single chunk.
This method only indexes the chunk. It does NOT update the metadata of neighboring chunks (previous_chunk/next_chunk). The caller is responsible for maintaining chunk relationships by updating adjacent chunks' metadata separately.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `element` | dict[str, Any] | The chunk to be indexed. | required |
| `**kwargs` | Any | Additional keyword arguments for customization. | {} |

Returns:

| Type | Description |
|---|---|
| dict[str, Any] | Response with success status, error message, and chunk_id. |

Raises:

| Type | Description |
|---|---|
| NotImplementedError | This method is not yet implemented. |

index_file_chunks(elements, file_id, **kwargs)
Index chunks for a specific file.
This method indexes chunks for a file. The indexer is responsible for deleting any existing chunks for the file_id before indexing the new chunks to ensure consistency.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `elements` | list[dict[str, Any]] | The chunks to be indexed. | required |
| `file_id` | str | The ID of the file these chunks belong to. | required |
| `**kwargs` | Any | Additional keyword arguments for customization. | {} |

Returns:

| Type | Description |
|---|---|
| dict[str, Any] | Response with success status, error message, and total count. |

Raises:

| Type | Description |
|---|---|
| NotImplementedError | This method is not yet implemented. |

resolve_entities()
Resolve entities in the graph.
Currently, this method does nothing.
update_chunk(element, **kwargs)
Update a chunk by chunk ID.
This method updates both the text content and metadata of a chunk. When text content is updated, the chunk should be re-processed through data generators and re-indexed with updated vector embeddings.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `element` | dict[str, Any] | The updated chunk data. | required |
| `**kwargs` | Any | Additional keyword arguments for customization. | {} |

Returns:

| Type | Description |
|---|---|
| dict[str, Any] | Response with success status, error message, and chunk_id. |

Raises:

| Type | Description |
|---|---|
| NotImplementedError | This method is not yet implemented. |

update_chunk_metadata(chunk_id, file_id, metadata, **kwargs)
Update metadata for a specific chunk.
This method patches new metadata into the existing chunk metadata. Existing metadata fields will be overwritten, and new fields will be added. System-managed metadata fields (file_id, chunk_id, etc.) should be preserved and not overwritten.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `chunk_id` | str | The ID of the chunk to update. | required |
| `file_id` | str | The ID of the file the chunk belongs to. | required |
| `metadata` | dict[str, Any] | The metadata fields to update. | required |
| `**kwargs` | Any | Additional keyword arguments for customization. | {} |

Returns:

| Type | Description |
|---|---|
| dict[str, Any] | Response with success status and error message. |

Raises:

| Type | Description |
|---|---|
| NotImplementedError | This method is not yet implemented. |