Vector
Elasticsearch implementation of vector search and CRUD capability.
ElasticsearchVectorCapability(index_name, client, em_invoker, query_field='text', vector_query_field='vector', retrieval_strategy=None, distance_strategy=None, encryption=None)
Elasticsearch implementation of VectorCapability protocol.
This class provides document CRUD operations and vector search using Elasticsearch.
Attributes:
| Name | Type | Description |
|---|---|---|
| index_name | str | The name of the Elasticsearch index. |
| vector_store | AsyncElasticsearchStore | The vector store instance. |
| em_invoker | BaseEMInvoker | The embedding model to perform vectorization. |
Initialize the Elasticsearch vector capability.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| index_name | str | The name of the Elasticsearch index. | required |
| client | AsyncElasticsearch | The Elasticsearch client. | required |
| em_invoker | BaseEMInvoker | The embedding model to perform vectorization. | required |
| query_field | str | The field name for text queries. Defaults to "text". | 'text' |
| vector_query_field | str | The field name for vector queries. Defaults to "vector". | 'vector' |
| retrieval_strategy | AsyncRetrievalStrategy \| None | The strategy used for retrieval. Defaults to None, in which case DenseVectorStrategy() is used. | None |
| distance_strategy | str \| None | The distance strategy used for retrieval. Defaults to None. | None |
| encryption | ElasticsearchEncryptionCapability \| None | Encryption capability for field-level encryption. Defaults to None. | None |
em_invoker
property
Returns the EM Invoker instance.
Returns:
| Name | Type | Description |
|---|---|---|
| BaseEMInvoker | BaseEMInvoker | The EM Invoker instance. |
clear(**kwargs)
async
Clear all records from the datastore.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| **kwargs | Any | Datastore-specific parameters. | {} |
create(data, **kwargs)
async
Create new records in the datastore.
If encryption is enabled, this method automatically encrypts the content and metadata of the chunks according to the encryption configuration. Embeddings are generated from the plaintext first and the chunks are encrypted afterwards, so the embeddings represent the original content rather than the ciphertext.
Examples:
await vector_capability.create([
    Chunk(content="Test content 1", metadata={"source": "test"}),
    Chunk(content="Test content 2", metadata={"source": "test"}),
])
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | Chunk \| list[Chunk] | Data to create (single item or collection). | required |
| **kwargs | Any | Datastore-specific parameters. | {} |
Raises:
| Type | Description |
|---|---|
| ValueError | If data structure is invalid. |
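The ordering described above (embed the plaintext first, encrypt afterwards) can be sketched in isolation. This is a minimal illustration, not the library's implementation: `toy_embed`, `toy_encrypt`, and `prepare_record` are hypothetical stand-ins, and base64 is used only to make the stored text differ from the plaintext.

```python
import base64
import hashlib


def toy_embed(text: str, dim: int = 4) -> list[float]:
    """Hypothetical stand-in for an embedding model: hash bytes scaled to [0, 1]."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255 for byte in digest[:dim]]


def toy_encrypt(text: str) -> str:
    """Hypothetical stand-in for field encryption (base64 is NOT real encryption)."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def prepare_record(content: str) -> dict:
    """Build the stored record: vector from plaintext, text field as ciphertext."""
    # 1. Embed the plaintext so the vector reflects the original content.
    vector = toy_embed(content)
    # 2. Only then encrypt the content that will be stored.
    return {"text": toy_encrypt(content), "vector": vector}


record = prepare_record("Test content 1")
```

If the two steps were swapped, the vector would be computed from the ciphertext and semantic search over the original content would no longer work.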
create_from_vector(chunk_vectors, **kwargs)
async
Add pre-computed embeddings directly.
If encryption is enabled, this method automatically encrypts the content and metadata of the chunks according to the encryption configuration.
Examples:
await vector_capability.create_from_vector([
    (Chunk(content="Test content 1", metadata={"source": "test"}), [0.1, 0.2, 0.3]),
    (Chunk(content="Test content 2", metadata={"source": "test"}), [0.4, 0.5, 0.6]),
])
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| chunk_vectors | list[tuple[Chunk, Vector]] | List of tuples containing chunks and their corresponding vectors. | required |
| **kwargs | Any | Datastore-specific parameters. | {} |
Returns:
| Type | Description |
|---|---|
| list[str] | List of IDs assigned to the added embeddings. |
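Because the vectors are supplied by the caller, it can help to check that they all have the dimension the index expects before calling create_from_vector, since an Elasticsearch dense_vector field rejects vectors of a different length. The helper below is a hypothetical sketch, not part of the library; plain strings stand in for Chunk objects.

```python
def validate_chunk_vectors(chunk_vectors, dimension):
    """Return the pairs unchanged if every vector has exactly `dimension` components."""
    for position, (_, vector) in enumerate(chunk_vectors):
        if len(vector) != dimension:
            raise ValueError(
                f"vector at position {position} has {len(vector)} dimensions, "
                f"expected {dimension}"
            )
    return chunk_vectors


# Strings are used here in place of Chunk objects to keep the sketch self-contained.
pairs = [
    ("Test content 1", [0.1, 0.2, 0.3]),
    ("Test content 2", [0.4, 0.5, 0.6]),
]
checked = validate_chunk_vectors(pairs, dimension=3)
```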
delete(filters=None, **kwargs)
async
Delete records from the data store based on filters.
Warning
Filters cannot target encrypted fields. If you try to delete documents based on an encrypted metadata field (e.g., filters=F.eq("metadata.secret", "val")), the filter will fail to match because the filter value is not encrypted but the stored data is. Always use non-encrypted fields (like 'id') in filters when working with encrypted data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filters | FilterClause \| QueryFilter \| None | Filters to select records for deletion. FilterClause objects are automatically converted to QueryFilter internally. Cannot use encrypted fields in filters. Defaults to None. | None |
| **kwargs | Any | Datastore-specific parameters. | {} |
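The warning above about encrypted fields can be reproduced with a small in-memory model. This is a sketch under stated assumptions: `toy_encrypt` and `match_eq` are hypothetical helpers, base64 stands in for real encryption, and plain equality stands in for how an equality filter compares values.

```python
import base64


def toy_encrypt(value: str) -> str:
    """Hypothetical stand-in for field encryption (base64 is NOT real encryption)."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


# Stored documents: `id` is plaintext, `metadata.secret` is stored encrypted.
stored = [
    {"id": "doc1", "metadata": {"secret": toy_encrypt("val")}},
    {"id": "doc2", "metadata": {"secret": toy_encrypt("other")}},
]


def match_eq(docs, field, value):
    """Return the docs whose dotted `field` equals `value` (plain equality)."""
    matched = []
    for doc in docs:
        current = doc
        for key in field.split("."):
            current = current.get(key) if isinstance(current, dict) else None
        if current == value:
            matched.append(doc)
    return matched


# The plaintext filter value never equals the stored ciphertext, so nothing matches.
by_secret = match_eq(stored, "metadata.secret", "val")
# Filtering on the non-encrypted `id` field works as expected.
by_id = match_eq(stored, "id", "doc1")
```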
delete_by_id(id, **kwargs)
async
Delete records from the data store based on IDs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| id | str \| list[str] | ID or list of IDs to delete. | required |
| **kwargs | Any | Datastore-specific parameters. | {} |
ensure_index(mapping=None, index_settings=None, dimension=None, distance_strategy=None)
async
Ensure Elasticsearch index exists, creating it if necessary.
This method is idempotent: if the index already exists, it skips creation and returns early.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| mapping | dict[str, Any] \| None | Custom mapping dictionary to use for index creation. If provided, this mapping is used directly. The mapping should follow the Elasticsearch mapping format. Defaults to None, in which case the default mapping is used. | None |
| index_settings | dict[str, Any] \| None | Custom index settings. These settings are merged with any default settings. Defaults to None. | None |
| dimension | int \| None | Vector dimension. If neither dimension nor mapping is provided, it is inferred from em_invoker by generating a test embedding. | None |
| distance_strategy | str \| None | Distance strategy for vector similarity. Supported values: "cosine", "l2_norm", "dot_product", etc. Only used when building the default mapping. Defaults to "cosine" if not specified. | None |
Raises:
| Type | Description |
|---|---|
| ValueError | If mapping is invalid or required parameters are missing. |
| RuntimeError | If index creation fails. |
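The idempotent behavior and the role of dimension and distance_strategy can be sketched as follows. The mapping uses the standard Elasticsearch dense_vector field options (dims, index, similarity), but the library's actual default mapping may differ. `build_default_mapping`, `FakeIndices`, and this `ensure_index` function are hypothetical stand-ins that run without a cluster.

```python
import asyncio


def build_default_mapping(dimension, distance_strategy="cosine",
                          query_field="text", vector_query_field="vector"):
    """Build an Elasticsearch-style mapping with one text and one dense_vector field."""
    return {
        "properties": {
            query_field: {"type": "text"},
            vector_query_field: {
                "type": "dense_vector",
                "dims": dimension,
                "index": True,
                "similarity": distance_strategy,
            },
        }
    }


class FakeIndices:
    """Hypothetical in-memory stand-in for an Elasticsearch indices client."""

    def __init__(self):
        self.created = {}
        self.create_calls = 0

    async def exists(self, index):
        return index in self.created

    async def create(self, index, mappings):
        self.create_calls += 1
        self.created[index] = mappings


async def ensure_index(indices, index_name, dimension, mapping=None):
    """Create the index only if it is missing; return True when it was created."""
    if await indices.exists(index_name):
        return False  # Index already exists: skip creation and return early.
    await indices.create(index_name, mapping or build_default_mapping(dimension))
    return True
```

Calling the sketch twice with the same index name creates the index once; the second call is a no-op.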
retrieve(query, filters=None, options=None, **kwargs)
async
Semantic search using text query converted to vector.
If encryption is enabled, this method automatically decrypts the content and metadata of the returned chunks according to the encryption configuration.
Warning
Filters cannot target encrypted fields. If you try to filter by an encrypted metadata
field (e.g., filters=F.eq("metadata.secret", "val")), the filter will fail to match
because the filter value is not encrypted but the stored data is. Always use non-encrypted
fields in filters when working with encrypted data.
Examples:
from gllm_datastore.core.filters import filter as F

# Direct FilterClause usage - using non-encrypted field
await vector_capability.retrieve(
    query="What is the capital of France?",
    filters=F.eq("id", "document_id"),
    options=QueryOptions(limit=10),
)

# Multiple filters - using non-encrypted fields
filters = F.and_(F.eq("id", "doc1"), F.eq("id", "doc2"))
await vector_capability.retrieve(query="What is the capital of France?", filters=filters)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | Text query to embed and search for. | required |
| filters | FilterClause \| QueryFilter \| None | Filters to apply to the search. FilterClause objects are automatically converted to QueryFilter internally. Cannot use encrypted fields in filters. Defaults to None. | None |
| options | QueryOptions \| None | Options to apply to the search. Defaults to None. | None |
| **kwargs | Any | Datastore-specific parameters. | {} |
Returns:
| Type | Description |
|---|---|
| list[Chunk] | List of chunks ordered by relevance score. |
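Conceptually, retrieve embeds the text query and then ranks stored vectors by similarity. The sketch below shows that flow with a toy bag-of-words embedding and cosine similarity. `VOCAB`, `toy_embed`, `cosine`, and this standalone `retrieve` function are hypothetical and do not reflect how the embedding model or Elasticsearch scores results.

```python
import math

# Hypothetical fixed vocabulary for a toy bag-of-words embedding.
VOCAB = ["capital", "france", "paris", "python", "language"]


def toy_embed(text):
    """Count vocabulary terms in the text (stand-in for a real embedding model)."""
    words = text.lower().replace("?", "").replace(".", "").split()
    return [float(words.count(term)) for term in VOCAB]


def cosine(a, b):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query, documents, limit=10):
    """Embed the query, score every document, and return the best matches first."""
    query_vector = toy_embed(query)
    scored = [(cosine(query_vector, toy_embed(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:limit]]


documents = [
    "Python is a programming language.",
    "Paris is the capital of France.",
]
results = retrieve("What is the capital of France?", documents, limit=1)
```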
retrieve_by_vector(vector, filters=None, options=None)
async
Direct vector similarity search.
Warning
Filters cannot target encrypted fields. If you try to filter by an encrypted metadata
field (e.g., filters=F.eq("metadata.secret", "val")), the filter will fail to match
because the filter value is not encrypted but the stored data is. Always use non-encrypted
fields in filters when working with encrypted data.
Usage Example
from gllm_datastore.core.filters import filter as F

# Direct FilterClause usage - using non-encrypted field
await vector_capability.retrieve_by_vector(
    vector=[0.1, 0.2, 0.3],
    filters=F.eq("id", "document_id"),
    options=QueryOptions(limit=10),
)

# Multiple filters - using non-encrypted fields
filters = F.and_(F.eq("id", "doc1"), F.eq("id", "doc2"))
await vector_capability.retrieve_by_vector(vector=[0.1, 0.2, 0.3], filters=filters)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| vector | Vector | Query embedding vector. | required |
| filters | FilterClause \| QueryFilter \| None | Filters to apply to the search. FilterClause objects are automatically converted to QueryFilter internally. Cannot use encrypted fields in filters. Defaults to None. | None |
| options | QueryOptions \| None | Options to apply to the search. Defaults to None. | None |
Returns:
| Type | Description |
|---|---|
| list[Chunk] | List of chunks ordered by similarity score. |
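The ordering of results depends on the distance strategy the index was built with. The brute-force sketch below compares the three strategies named in this reference ("cosine", "dot_product", "l2_norm") on the same data. It uses the plain mathematical definitions, not Elasticsearch's internal score normalization, and `search` is a hypothetical helper.

```python
import math


def dot_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    norm = math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b))
    return dot_product(a, b) / norm if norm else 0.0


def l2_norm(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search(query_vector, stored, metric="cosine", limit=10):
    """Rank stored vectors (a dict of id -> vector) against the query, best first."""
    if metric == "l2_norm":
        # For a distance, smaller is better.
        ranked = sorted(stored, key=lambda doc_id: l2_norm(query_vector, stored[doc_id]))
    else:
        # For a similarity, larger is better.
        score = cosine if metric == "cosine" else dot_product
        ranked = sorted(
            stored, key=lambda doc_id: score(query_vector, stored[doc_id]), reverse=True
        )
    return ranked[:limit]


stored = {"a": [0.9, 0.1], "b": [5.0, 5.0], "c": [0.0, 1.0]}
query = [1.0, 0.0]
by_cosine = search(query, stored, metric="cosine")
by_dot = search(query, stored, metric="dot_product")
by_l2 = search(query, stored, metric="l2_norm")
```

With these vectors, the long vector "b" ranks first under dot product but last under L2 distance, which is why the strategy should match how the embeddings were produced.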
update(update_values, filters=None, **kwargs)
async
Update existing records in the datastore.
If encryption is enabled, this method automatically encrypts the content and metadata in update_values according to the encryption configuration.
Warning
Filters cannot target encrypted fields. While update_values are encrypted before
being written, the filters used to identify which documents to update are NOT encrypted.
If you try to update documents based on an encrypted metadata field (e.g.,
filters=F.eq("metadata.secret", "val")), the filter will fail to match because
the filter value is not encrypted but the stored data is. Always use non-encrypted
fields (like 'id') in filters when working with encrypted data.
Examples:
from gllm_datastore.core.filters import filter as F

# Update content - using non-encrypted field for filter
await vector_capability.update(
    update_values={"content": "new_content"},
    filters=F.eq("id", "unique_id"),
)

# Update metadata - using non-encrypted field for filter
await vector_capability.update(
    update_values={"metadata": {"status": "published"}},
    filters=F.eq("id", "unique_id"),
)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| update_values | dict | Values to update. | required |
| filters | FilterClause \| QueryFilter \| None | Filters to select records to update. FilterClause objects are automatically converted to QueryFilter internally. Cannot use encrypted fields in filters. Defaults to None. | None |
| **kwargs | Any | Datastore-specific parameters. | {} |