Vector

Elasticsearch implementation of vector search and CRUD capability.

Authors

Kadek Denaya (kadek.d.r.diana@gdplabs.id)

ElasticsearchVectorCapability(index_name, client, em_invoker, query_field='text', vector_query_field='vector', retrieval_strategy=None, distance_strategy=None)

Elasticsearch implementation of the VectorCapability protocol.

This class provides document CRUD operations and vector search using Elasticsearch.

Attributes:

Name Type Description
index_name str

The name of the Elasticsearch index.

vector_store AsyncElasticsearchStore

The vector store instance.

em_invoker BaseEMInvoker

The embedding model invoker used to vectorize text.

Initialize the Elasticsearch vector capability.

Parameters:

Name Type Description Default
index_name str

The name of the Elasticsearch index.

required
client AsyncElasticsearch

The Elasticsearch client.

required
em_invoker BaseEMInvoker

The embedding model invoker used to vectorize text.

required
query_field str

The field name for text queries. Defaults to "text".

'text'
vector_query_field str

The field name for vector queries. Defaults to "vector".

'vector'
retrieval_strategy AsyncRetrievalStrategy | None

The strategy used for retrieval. Defaults to None, in which case DenseVectorStrategy() is used.

None
distance_strategy str | None

The distance strategy used for similarity scoring during retrieval. Defaults to None.

None
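
The `retrieval_strategy` default described above follows the common None-sentinel pattern: the default object is created per call instead of being shared across instances. A minimal sketch of that pattern, assuming a stand-in `DenseVectorStrategy` class and a hypothetical helper name `resolve_retrieval_strategy` (neither is taken from the library's source):

```python
class DenseVectorStrategy:
    """Stand-in for the dense-vector strategy named in the parameter docs."""


def resolve_retrieval_strategy(retrieval_strategy=None):
    """Return the given strategy, or a fresh DenseVectorStrategy when None is passed."""
    if retrieval_strategy is None:
        # Build the default lazily so no default instance is shared between callers.
        return DenseVectorStrategy()
    return retrieval_strategy
```

Passing an explicit strategy returns it unchanged; omitting it yields a new default object on each call.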

em_invoker: BaseEMInvoker property

Return the embedding model (EM) invoker instance.

Returns:

Name Type Description
BaseEMInvoker BaseEMInvoker

The EM Invoker instance.

clear(**kwargs) async

Clear all records from the datastore.

Parameters:

Name Type Description Default
**kwargs Any

Datastore-specific parameters.

{}

create(data, **kwargs) async

Create new records in the datastore.

Parameters:

Name Type Description Default
data Chunk | list[Chunk]

Data to create (single item or collection).

required
**kwargs Any

Datastore-specific parameters.

{}

Raises:

Type Description
ValueError

If the data structure is invalid.
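
Because `data` may be a single `Chunk` or a list of them, an implementation would typically normalize the input before indexing. A minimal sketch of that normalization, assuming a stand-in `Chunk` dataclass with illustrative fields and a hypothetical helper name `normalize_chunks` (the library's actual validation logic is not shown in this reference):

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Chunk:
    """Stand-in for the library's Chunk type (fields are assumed for illustration)."""

    content: str
    metadata: dict[str, Any] = field(default_factory=dict)


def normalize_chunks(data: "Chunk | list[Chunk]") -> list[Chunk]:
    """Return `data` as a list of chunks, raising ValueError on invalid input."""
    if isinstance(data, Chunk):
        return [data]
    if isinstance(data, list) and all(isinstance(item, Chunk) for item in data):
        return data
    raise ValueError("data must be a Chunk or a list of Chunk objects")
```

With this shape, callers can pass either form and downstream code only has to handle a list.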

create_from_vector(chunk_vectors, **kwargs) async

Add chunks with pre-computed embeddings directly, skipping the embedding step.

Parameters:

Name Type Description Default
chunk_vectors list[tuple[Chunk, Vector]]

List of tuples containing chunks and their corresponding vectors.

required
**kwargs Any

Datastore-specific parameters.

{}

Returns:

Type Description
list[str]

List of IDs assigned to the added embeddings.
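
The `chunk_vectors` argument pairs each chunk with its embedding. A minimal sketch of building that list from two parallel sequences, assuming a stand-in `Chunk` dataclass, a `Vector` alias for `list[float]`, and a hypothetical helper name `pair_chunks_with_vectors` (none of these are taken from the library's source):

```python
from dataclasses import dataclass

Vector = list[float]  # assumed alias; the library defines its own Vector type


@dataclass
class Chunk:
    """Stand-in for the library's Chunk type (fields are assumed for illustration)."""

    content: str


def pair_chunks_with_vectors(
    chunks: list[Chunk], vectors: list[Vector]
) -> list[tuple[Chunk, Vector]]:
    """Zip chunks with their vectors after basic consistency checks."""
    if len(chunks) != len(vectors):
        raise ValueError("chunks and vectors must have the same length")
    # All embeddings stored in one vector field should share a single dimension.
    if len({len(vector) for vector in vectors}) > 1:
        raise ValueError("all vectors must have the same dimension")
    return list(zip(chunks, vectors))
```

The resulting list has the shape that `create_from_vector` expects for `chunk_vectors`.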

delete(filters=None, **kwargs) async

Delete records from the datastore that match the given filters.

Parameters:

Name Type Description Default
filters FilterClause | QueryFilter | None

Filters to select records for deletion. FilterClause objects are automatically converted to QueryFilter internally. Defaults to None.

None
**kwargs Any

Datastore-specific parameters.

{}

delete_by_id(id, **kwargs) async

Delete records from the datastore by ID.

Parameters:

Name Type Description Default
id str | list[str]

ID or list of IDs to delete.

required
**kwargs Any

Datastore-specific parameters.

{}

retrieve(query, filters=None, options=None, **kwargs) async

Perform a semantic search by converting the text query into a vector.

Usage Example
from gllm_datastore.core.filters import filter as F

# Assumes `vector_capability` is an initialized ElasticsearchVectorCapability,
# that QueryOptions is already imported, and that the calls run inside an
# async function.

# Direct FilterClause usage
results = await vector_capability.retrieve(
    query="What is the capital of France?",
    filters=F.eq("metadata.category", "tech"),
    options=QueryOptions(limit=10),
)

# Multiple filters combined with AND
filters = F.and_(F.eq("metadata.source", "wikipedia"), F.eq("metadata.category", "tech"))
results = await vector_capability.retrieve(query="What is the capital of France?", filters=filters)

Parameters:

Name Type Description Default
query str

Text query to embed and search for.

required
filters FilterClause | QueryFilter | None

Filters to apply to the search. FilterClause objects are automatically converted to QueryFilter internally. Defaults to None.

None
options QueryOptions | None

Options to apply to the search. Defaults to None.

None
**kwargs Any

Datastore-specific parameters.

{}

Returns:

Type Description
list[Chunk]

List of chunks ordered by relevance score.
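
Conceptually, `retrieve` embeds the text query and then runs a vector search. The sketch below illustrates that two-step flow with stand-in classes only: `FakeEMInvoker`, its `invoke` method, and `SketchVectorCapability` are hypothetical names, and whether the real class delegates to `retrieve_by_vector` in exactly this way is an assumption.

```python
import asyncio


class FakeEMInvoker:
    """Hypothetical embedding stand-in: maps text to a tiny 2-D vector."""

    async def invoke(self, text: str) -> list[float]:
        return [float(len(text)), float(text.count(" "))]


class SketchVectorCapability:
    """Illustrative stand-in, not the library's ElasticsearchVectorCapability."""

    def __init__(self, em_invoker):
        self.em_invoker = em_invoker
        self.seen_vectors = []  # records the vectors that reached the search step

    async def retrieve_by_vector(self, vector, filters=None, options=None):
        self.seen_vectors.append(vector)
        return []  # a real implementation would query Elasticsearch here

    async def retrieve(self, query, filters=None, options=None):
        # Step 1: embed the text query. Step 2: search with the resulting vector.
        vector = await self.em_invoker.invoke(query)
        return await self.retrieve_by_vector(vector, filters=filters, options=options)


capability = SketchVectorCapability(FakeEMInvoker())
results = asyncio.run(capability.retrieve("capital of France"))
```

Keeping the embedding step separate from the search step is also why `retrieve_by_vector` is useful on its own: callers who already hold an embedding can skip the first step.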

retrieve_by_vector(vector, filters=None, options=None) async

Search by vector similarity using a pre-computed query embedding.

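Usage Example
from gllm_datastore.core.filters import filter as F

# Assumes `vector_capability` is an initialized ElasticsearchVectorCapability,
# that QueryOptions is already imported, and that the calls run inside an
# async function. The query vector must match the index's embedding dimension.

# Direct FilterClause usage
results = await vector_capability.retrieve_by_vector(
    vector=[0.1, 0.2, 0.3],
    filters=F.eq("metadata.category", "tech"),
    options=QueryOptions(limit=10),
)

# Multiple filters combined with AND
filters = F.and_(F.eq("metadata.source", "wikipedia"), F.eq("metadata.category", "tech"))
results = await vector_capability.retrieve_by_vector(vector=[0.1, 0.2, 0.3], filters=filters)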

Parameters:

Name Type Description Default
vector Vector

Query embedding vector.

required
filters FilterClause | QueryFilter | None

Filters to apply to the search. FilterClause objects are automatically converted to QueryFilter internally. Defaults to None.

None
options QueryOptions | None

Options to apply to the search. Defaults to None.

None

Returns:

Type Description
list[Chunk]

List of chunks ordered by similarity score.
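
To make "ordered by similarity score" concrete, the sketch below ranks a few in-memory vectors against a query vector. It assumes cosine similarity as the metric; in the real class the scoring is performed by Elasticsearch and the metric depends on the configured distance strategy. The helper names `cosine_similarity` and `rank_by_similarity` are illustrative only.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: list[float], stored: dict[str, list[float]], limit: int = 10
) -> list[tuple[str, float]]:
    """Return up to `limit` (id, score) pairs, most similar first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]
```

For a query of `[1.0, 0.0]`, a stored vector `[1.0, 0.0]` ranks above `[1.0, 1.0]`, which in turn ranks above the orthogonal `[0.0, 1.0]`.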

update(update_values, filters=None, **kwargs) async

Update existing records in the datastore.

Parameters:

Name Type Description Default
update_values dict

Values to update.

required
filters FilterClause | QueryFilter | None

Filters to select records to update. FilterClause objects are automatically converted to QueryFilter internally. Defaults to None.

None
**kwargs Any

Datastore-specific parameters.

{}