Vector

In-memory implementation of vector similarity search capability.

This module provides an in-memory implementation of the VectorCapability protocol using dictionary-based storage optimized for development and testing scenarios.

`InMemoryVectorCapability(em_invoker, store=None)`

In-memory implementation of VectorCapability protocol.

This class provides vector similarity search operations using pure Python data structures optimized for development and testing.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `store` | `dict[str, Chunk]` | Dictionary storing Chunk objects with their IDs as keys. |
| `em_invoker` | `BaseEMInvoker` | The embedding model invoker used to perform vectorization. |

Initialize the in-memory vector capability.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `em_invoker` | `BaseEMInvoker` | Embedding model invoker used for text-to-vector conversion. | required |
| `store` | `dict[str, Any] \| None` | Dictionary storing Chunk objects with their IDs as keys. Defaults to None. | `None` |

`em_invoker` `property`

Returns the EM Invoker instance.

Returns:

| Name | Type | Description |
| --- | --- | --- |
| `BaseEMInvoker` | `BaseEMInvoker` | The EM Invoker instance. |

`clear()` `async`

Clear all vectors from the store.

`create(data)` `async`

Add chunks to the vector store with automatic embedding generation.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `data` | `Chunk \| list[Chunk]` | Single chunk or list of chunks to add. | required |
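
As a rough illustration of what "automatic embedding generation" means here, the sketch below embeds each chunk's content before storing it. It does not use the library: `Chunk`, `ToyEMInvoker`, its `invoke` method, and `ToyVectorStore` are hypothetical stand-ins, and the separate `vectors` dictionary is an assumption about layout, not the documented implementation.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Union


@dataclass
class Chunk:
    """Hypothetical stand-in for the library's Chunk type."""

    id: str
    content: str
    metadata: dict[str, Any] = field(default_factory=dict)


class ToyEMInvoker:
    """Hypothetical embedder; a real invoker would call an embedding model."""

    async def invoke(self, text: str) -> list[float]:
        # Two crude features: character count and space count.
        return [float(len(text)), float(text.count(" "))]


class ToyVectorStore:
    """Hypothetical store that embeds chunks on insertion."""

    def __init__(self, em_invoker: ToyEMInvoker) -> None:
        self.em_invoker = em_invoker
        self.store: dict[str, Chunk] = {}
        self.vectors: dict[str, list[float]] = {}

    async def create(self, data: Union[Chunk, list[Chunk]]) -> None:
        # Accept a single chunk or a list, mirroring the documented signature.
        chunks = [data] if isinstance(data, Chunk) else list(data)
        for chunk in chunks:
            self.vectors[chunk.id] = await self.em_invoker.invoke(chunk.content)
            self.store[chunk.id] = chunk
```

Calling `asyncio.run(store.create(Chunk(id="a", content="hello world")))` on a `ToyVectorStore` would store the chunk under key `"a"` together with its computed vector.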

`create_from_vector(chunk_vectors)` `async`

Add pre-computed vectors directly.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `chunk_vectors` | `list[tuple[Chunk, Vector]]` | List of tuples containing chunks and their corresponding vectors. | required |

`delete(filters=None)` `async`

Delete records from the datastore.

Usage Example

```python
from gllm_datastore.core.filters import filter as F

# Direct FilterClause usage
await vector_capability.delete(filters=F.eq("metadata.category", "AI"))

# Multiple filters
await vector_capability.delete(
    filters=F.and_(F.eq("metadata.category", "AI"), F.eq("metadata.status", "published")),
)
```

This will delete all chunks from the vector store that match the filters.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `filters` | `FilterClause \| QueryFilter \| None` | Filters to select records to delete. FilterClause objects are automatically converted to QueryFilter internally. Defaults to None, in which case no operation is performed (no-op). | `None` |
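
The filter semantics described above (dotted metadata paths, combined clauses, and a no-op when `filters` is None) can be sketched in plain Python. The `eq`, `and_`, and `delete` helpers below are hypothetical stand-ins written for this illustration; they are not the library's `F.eq`, `F.and_`, or `delete`, and records are modeled as plain dictionaries.

```python
from typing import Any, Callable, Optional

Record = dict[str, Any]
Predicate = Callable[[Record], bool]


def eq(path: str, value: Any) -> Predicate:
    """Hypothetical equality filter over a dotted path such as 'metadata.category'."""

    def matches(record: Record) -> bool:
        current: Any = record
        for key in path.split("."):
            if not isinstance(current, dict) or key not in current:
                return False
            current = current[key]
        return current == value

    return matches


def and_(*predicates: Predicate) -> Predicate:
    """Hypothetical conjunction: every predicate must match."""
    return lambda record: all(p(record) for p in predicates)


def delete(store: dict[str, Record], filters: Optional[Predicate] = None) -> int:
    """Delete matching records and return how many were removed."""
    if filters is None:
        return 0  # no filters: do nothing, as the parameter description states
    doomed = [rid for rid, record in store.items() if filters(record)]
    for rid in doomed:
        del store[rid]
    return len(doomed)
```

Collecting the matching IDs before deleting avoids mutating the dictionary while iterating over it.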

`ensure_index()` `async`

Ensure in-memory vector store exists, initializing it if necessary.

This method is idempotent - if the store already exists, it will skip initialization and return early.
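
A minimal sketch of this idempotent pattern, assuming the store is simply a dictionary that is created only when missing. `ToyCapability` and `demo` are hypothetical names used for illustration, not part of the library.

```python
import asyncio
from typing import Any, Optional


class ToyCapability:
    """Hypothetical stand-in used only to illustrate idempotent initialization."""

    def __init__(self, store: Optional[dict[str, Any]] = None) -> None:
        self.store = store

    async def ensure_index(self) -> None:
        # Return early when a store already exists, so existing data is kept.
        if self.store is not None:
            return
        self.store = {}


async def demo() -> dict[str, Any]:
    capability = ToyCapability()
    await capability.ensure_index()  # first call creates the empty store
    capability.store["a"] = "chunk-a"
    await capability.ensure_index()  # second call leaves the data untouched
    return capability.store
```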

`retrieve(query, filters=None, options=None)` `async`

Read records from the datastore using text-based similarity search with optional filtering.

Usage Example

```python
from gllm_datastore.core.filters import filter as F

# QueryOptions must be imported as well; this reference does not show its module path.

# Direct FilterClause usage
await vector_capability.retrieve(
    query="What is the capital of France?",
    filters=F.eq("metadata.category", "tech"),
    options=QueryOptions(limit=2),
)

# Multiple filters
filters = F.and_(F.eq("metadata.source", "wikipedia"), F.eq("metadata.category", "tech"))
await vector_capability.retrieve(
    query="What is the capital of France?",
    filters=filters,
    options=QueryOptions(limit=2),
)
```

This will retrieve the top 2 chunks by similarity score from the vector store that match the query and the filters. The chunks will be sorted by score in descending order.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `query` | `str` | Input text to embed and search with. | required |
| `filters` | `FilterClause \| QueryFilter \| None` | Query filters to apply. FilterClause objects are automatically converted to QueryFilter internally. Defaults to None. | `None` |
| `options` | `QueryOptions \| None` | Query options such as limit and sorting. Defaults to None, in which case no sorting is applied and the top 10 chunks are returned. | `None` |

Returns:

| Type | Description |
| --- | --- |
| `list[Chunk]` | Top-ranked chunks by similarity score. |

`retrieve_by_vector(vector, filters=None, options=None)` `async`

Direct vector similarity search.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `vector` | `Vector` | Query embedding vector. | required |
| `filters` | `FilterClause \| QueryFilter \| None` | Query filters to apply. FilterClause objects are automatically converted to QueryFilter internally. Defaults to None. | `None` |
| `options` | `QueryOptions \| None` | Query options such as limit and sorting. Defaults to None, in which case no sorting is applied and the top 10 chunks are returned. | `None` |

Returns:

| Type | Description |
| --- | --- |
| `list[Chunk]` | List of chunks ordered by similarity score. |
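
To make the ranking step concrete, here is a small sketch of a brute-force similarity search over an in-memory dictionary of vectors. This reference does not state which similarity metric the class uses, so cosine similarity is an assumption, and `cosine_similarity`, `top_k`, and the `keep` callback are hypothetical names. The default `limit` of 10 mirrors the documented default.

```python
import math
from typing import Callable, Optional


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query_vector: list[float],
    vectors: dict[str, list[float]],
    limit: int = 10,
    keep: Optional[Callable[[str], bool]] = None,
) -> list[tuple[str, float]]:
    """Score every stored vector, optionally filter by ID, and return the best `limit`."""
    scored = [
        (chunk_id, cosine_similarity(query_vector, vector))
        for chunk_id, vector in vectors.items()
        if keep is None or keep(chunk_id)
    ]
    # Highest score first, matching the descending order described for retrieval.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]
```

Scanning every vector is linear in the size of the store, which is usually acceptable for the development and testing scenarios this class targets.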

`update(update_values, filters=None, **kwargs)` `async`

Update existing records in the datastore.

Examples:

Update metadata on the chunks that match the given filters.

```python
from gllm_datastore.core.filters import filter as F

# Direct FilterClause usage
await vector_capability.update(
    update_values={"metadata": {"status": "published"}},
    filters=F.eq("metadata.category", "tech"),
)

# Multiple filters
await vector_capability.update(
    update_values={"metadata": {"status": "published"}},
    filters=F.and_(F.eq("metadata.status", "draft"), F.eq("metadata.category", "tech")),
)
```

Update the content of a chunk with a specific ID. This also regenerates the chunk's vector.

```python
from gllm_datastore.core.filters import filter as F

# Direct FilterClause usage
await vector_capability.update(
    update_values={"content": "new_content"},
    filters=F.eq("id", "unique_id"),
)

# Multiple filters
await vector_capability.update(
    update_values={"content": "new_content"},
    filters=F.and_(F.eq("id", "unique_id"), F.eq("metadata.category", "tech")),
)
```

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `update_values` | `dict[str, Any]` | Values to update. | required |
| `filters` | `FilterClause \| QueryFilter \| None` | Filters to select records to update. FilterClause objects are automatically converted to QueryFilter internally. Defaults to None, in which case no operation is performed (no-op). | `None` |
| `**kwargs` | `Any` | Datastore-specific parameters. | `{}` |

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If content is empty. |
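
The update behavior described above (content changes trigger re-embedding, empty content raises `ValueError`) might look roughly like the sketch below. `update_record` and the `embed` callback are hypothetical, records are plain dictionaries, and merging new metadata keys into the existing metadata (rather than replacing it wholesale) is an assumption that this reference does not confirm.

```python
from typing import Any, Callable


def update_record(
    record: dict[str, Any],
    vectors: dict[str, list[float]],
    update_values: dict[str, Any],
    embed: Callable[[str], list[float]],
) -> None:
    """Apply update_values to one record, re-embedding when content changes."""
    if "content" in update_values:
        content = update_values["content"]
        if not content:
            raise ValueError("content must not be empty")
        record["content"] = content
        # New content invalidates the old vector, so compute a fresh one.
        vectors[record["id"]] = embed(content)
    if "metadata" in update_values:
        # Assumption: merge keys into existing metadata instead of replacing it.
        record.setdefault("metadata", {}).update(update_values["metadata"])
```

A metadata-only update leaves the stored vector untouched, which avoids an unnecessary embedding call.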
