Vector
Elasticsearch implementation of vector search and CRUD capability.
ElasticsearchVectorCapability(index_name, client, em_invoker, query_field='text', vector_query_field='vector', retrieval_strategy=None, distance_strategy=None)
Elasticsearch implementation of VectorCapability protocol.
This class provides document CRUD operations and vector search using Elasticsearch.
Attributes:
| Name | Type | Description |
|---|---|---|
| `index_name` | `str` | The name of the Elasticsearch index. |
| `vector_store` | `AsyncElasticsearchStore` | The vector store instance. |
| `em_invoker` | `BaseEMInvoker` | The embedding model to perform vectorization. |
Initialize the Elasticsearch vector capability.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `index_name` | `str` | The name of the Elasticsearch index. | required |
| `client` | `AsyncElasticsearch` | The Elasticsearch client. | required |
| `em_invoker` | `BaseEMInvoker` | The embedding model to perform vectorization. | required |
| `query_field` | `str` | The field name for text queries. Defaults to "text". | `'text'` |
| `vector_query_field` | `str` | The field name for vector queries. Defaults to "vector". | `'vector'` |
| `retrieval_strategy` | `AsyncRetrievalStrategy \| None` | The retrieval strategy to use. Defaults to None, in which case DenseVectorStrategy() is used. | `None` |
| `distance_strategy` | `str \| None` | The distance strategy for retrieval. Defaults to None. | `None` |
em_invoker: BaseEMInvoker
property
Returns the EM Invoker instance.
Returns:
| Name | Type | Description |
|---|---|---|
| `BaseEMInvoker` | `BaseEMInvoker` | The EM Invoker instance. |
clear(**kwargs)
async
Clear all records from the datastore.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `**kwargs` | `Any` | Datastore-specific parameters. | `{}` |
create(data, **kwargs)
async
create_from_vector(chunk_vectors, **kwargs)
async
Add pre-computed embeddings directly.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `chunk_vectors` | `list[tuple[Chunk, Vector]]` | List of tuples containing chunks and their corresponding vectors. | required |
| `**kwargs` | | Datastore-specific parameters. | `{}` |
Returns:
| Type | Description |
|---|---|
| `list[str]` | List of IDs assigned to added embeddings. |
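The shape of `chunk_vectors` is easy to get wrong, so the following is a minimal in-memory sketch of pairing chunks with pre-computed vectors and returning one ID per pair. `Chunk` here is a hypothetical stand-in dataclass, `InMemoryVectorStore` is an illustrative name, and `Vector` is assumed to be a plain list of floats; none of this is the library's implementation, which writes to Elasticsearch.

```python
import asyncio
import uuid
from dataclasses import dataclass, field

Vector = list[float]  # assumption: a vector is a plain list of floats


@dataclass
class Chunk:
    """Hypothetical stand-in for the library's Chunk type."""

    content: str
    id: str = field(default_factory=lambda: str(uuid.uuid4()))


class InMemoryVectorStore:
    """Illustrative store only; the real capability persists to Elasticsearch."""

    def __init__(self) -> None:
        self.records: dict[str, tuple[Chunk, Vector]] = {}

    async def create_from_vector(
        self, chunk_vectors: list[tuple[Chunk, Vector]]
    ) -> list[str]:
        ids = []
        for chunk, vector in chunk_vectors:
            # Store each chunk together with its already-computed embedding
            self.records[chunk.id] = (chunk, vector)
            ids.append(chunk.id)
        return ids


async def demo() -> list[str]:
    store = InMemoryVectorStore()
    pairs = [
        (Chunk("Paris is the capital of France.", id="c1"), [0.1, 0.2, 0.3]),
        (Chunk("Berlin is the capital of Germany.", id="c2"), [0.3, 0.2, 0.1]),
    ]
    return await store.create_from_vector(pairs)
```

Running `asyncio.run(demo())` returns the IDs in the same order as the input pairs. No embedding model is called on this path, which is the point of supplying vectors directly.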
delete(filters=None, **kwargs)
async
Delete records from the data store based on filters.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `filters` | `FilterClause \| QueryFilter \| None` | Filters to select records for deletion. FilterClause objects are automatically converted to QueryFilter internally. Defaults to None. | `None` |
| `**kwargs` | `Any` | Datastore-specific parameters. | `{}` |
delete_by_id(id, **kwargs)
async
Delete records from the data store based on IDs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `id` | `str \| list[str]` | ID or list of IDs to delete. | required |
| `**kwargs` | `Any` | Datastore-specific parameters. | `{}` |
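Because `id` accepts either a single string or a list, an implementation has to normalize the argument before acting on it. The sketch below shows one way to do that against a plain dictionary; `normalize_ids` and this `delete_by_id` are hypothetical helpers written for illustration, not the library's code.

```python
from typing import Union


def normalize_ids(id: Union[str, list[str]]) -> list[str]:
    """Wrap a single ID in a list so both call styles share one code path."""
    # The isinstance check matters: a str is itself iterable, so iterating
    # over it directly would yield single characters rather than one ID.
    return [id] if isinstance(id, str) else list(id)


def delete_by_id(records: dict[str, str], id: Union[str, list[str]]) -> int:
    """Remove matching records from an in-memory dict; return how many were removed."""
    removed = 0
    for record_id in normalize_ids(id):
        if records.pop(record_id, None) is not None:
            removed += 1
    return removed
```

With `records = {"a": "x", "b": "y"}`, both `delete_by_id(records, "a")` and `delete_by_id(records, ["a", "b"])` are valid calls. Unknown IDs are skipped here; how the real datastore treats them is not stated in this reference.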
retrieve(query, filters=None, options=None, **kwargs)
async
Semantic search using text query converted to vector.
Usage Example
from gllm_datastore.core.filters import filter as F

# Direct FilterClause usage
await vector_capability.retrieve(
    query="What is the capital of France?",
    filters=F.eq("metadata.category", "tech"),
    options=QueryOptions(limit=10),
)

# Multiple filters
filters = F.and_(F.eq("metadata.source", "wikipedia"), F.eq("metadata.category", "tech"))
await vector_capability.retrieve(query="What is the capital of France?", filters=filters)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `query` | `str` | Text query to embed and search for. | required |
| `filters` | `FilterClause \| QueryFilter \| None` | Filters to apply to the search. FilterClause objects are automatically converted to QueryFilter internally. Defaults to None. | `None` |
| `options` | `QueryOptions \| None` | Options to apply to the search. Defaults to None. | `None` |
| `**kwargs` | `Any` | Datastore-specific parameters. | `{}` |
Returns:
| Type | Description |
|---|---|
| `list[Chunk]` | List of chunks ordered by relevance score. |
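Conceptually, a text query is first turned into a vector and the vector is then used for the search. The sketch below shows that two-step flow with stand-ins: `toy_embed` is a hypothetical embedder that counts vowels (a real `em_invoker` would call an embedding model), and `fake_search` is a placeholder for the vector search step. It illustrates the idea only and is not how the library is implemented.

```python
import asyncio
from typing import Awaitable, Callable

Vector = list[float]  # assumption: a vector is a plain list of floats


async def toy_embed(text: str) -> Vector:
    """Hypothetical embedder: one dimension per vowel count."""
    lowered = text.lower()
    return [float(lowered.count(vowel)) for vowel in "aeiou"]


async def retrieve(
    query: str,
    embed: Callable[[str], Awaitable[Vector]],
    search_by_vector: Callable[[Vector], Awaitable[list[str]]],
) -> list[str]:
    # Step 1: convert the text query into an embedding vector
    vector = await embed(query)
    # Step 2: hand the vector to the similarity search
    return await search_by_vector(vector)


async def fake_search(vector: Vector) -> list[str]:
    """Placeholder search that echoes the vector it received."""
    return [f"searched with {vector}"]
```

Calling `asyncio.run(retrieve("France", toy_embed, fake_search))` shows the vector that would reach the search step. Passing the embedder and search function as arguments keeps the sketch independent of any particular model or datastore.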
retrieve_by_vector(vector, filters=None, options=None)
async
Direct vector similarity search.
Usage Example
from gllm_datastore.core.filters import filter as F

# Direct FilterClause usage
await vector_capability.retrieve_by_vector(
    vector=[0.1, 0.2, 0.3],
    filters=F.eq("metadata.category", "tech"),
    options=QueryOptions(limit=10),
)

# Multiple filters
filters = F.and_(F.eq("metadata.source", "wikipedia"), F.eq("metadata.category", "tech"))
await vector_capability.retrieve_by_vector(vector=[0.1, 0.2, 0.3], filters=filters)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `vector` | `Vector` | Query embedding vector. | required |
| `filters` | `FilterClause \| QueryFilter \| None` | Filters to apply to the search. FilterClause objects are automatically converted to QueryFilter internally. Defaults to None. | `None` |
| `options` | `QueryOptions \| None` | Options to apply to the search. Defaults to None. | `None` |
Returns:
| Type | Description |
|---|---|
| `list[Chunk]` | List of chunks ordered by similarity score. |
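To make "ordered by similarity score" concrete, here is a small pure-Python sketch that ranks stored vectors against a query vector and keeps the top results. It assumes cosine similarity as the metric; the actual metric depends on the configured `distance_strategy`, and Elasticsearch performs the real search. `cosine_similarity` and `top_k` are hypothetical helper names.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query: list[float], corpus: dict[str, list[float]], limit: int
) -> list[tuple[str, float]]:
    """Score every stored vector against the query and return the best `limit`."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in corpus.items()]
    # Highest similarity first
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


corpus = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
results = top_k([1.0, 0.0], corpus, limit=2)
```

Here `limit` plays the role that `QueryOptions(limit=...)` plays in the usage example above. A brute-force scan like this is only practical for tiny collections.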
update(update_values, filters=None, **kwargs)
async
Update existing records in the datastore.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `update_values` | `dict` | Values to update. | required |
| `filters` | `FilterClause \| QueryFilter \| None` | Filters to select records to update. FilterClause objects are automatically converted to QueryFilter internally. Defaults to None. | `None` |
| `**kwargs` | `Any` | Datastore-specific parameters. | `{}` |
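The filter-then-update pattern can be illustrated with plain dictionaries. In this sketch, `update_records` is a hypothetical helper and a Python predicate stands in for `FilterClause` / `QueryFilter`. Treating a missing predicate as "match every record" is an assumption made here for illustration; this reference does not state what `filters=None` does in the real method.

```python
from typing import Any, Callable, Optional

Record = dict[str, Any]


def update_records(
    records: list[Record],
    update_values: Record,
    matches: Optional[Callable[[Record], bool]] = None,
) -> int:
    """Merge update_values into each matching record; return the number updated."""
    updated = 0
    for record in records:
        # Assumption: no predicate means every record is selected
        if matches is None or matches(record):
            record.update(update_values)
            updated += 1
    return updated


records = [
    {"id": "1", "category": "tech", "reviewed": False},
    {"id": "2", "category": "news", "reviewed": False},
]
count = update_records(records, {"reviewed": True}, lambda r: r["category"] == "tech")
```

Only the record whose category is `"tech"` is modified; the other record keeps its original values.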