Vector
In-memory implementation of vector similarity search capability.
This module provides an in-memory implementation of the VectorCapability protocol using dictionary-based storage optimized for development and testing scenarios.
InMemoryVectorCapability(em_invoker, store=None)
In-memory implementation of VectorCapability protocol.
This class provides vector similarity search operations using pure Python data structures optimized for development and testing.
Attributes:
| Name | Type | Description |
|---|---|---|
| `store` | `dict[str, Chunk]` | Dictionary storing Chunk objects with their IDs as keys. |
| `em_invoker` | `BaseEMInvoker` | The embedding model to perform vectorization. |
Initialize the in-memory vector capability.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `em_invoker` | `BaseEMInvoker` | Embedding model invoker used for text-to-vector conversion. | required |
| `store` | `dict[str, Any] \| None` | Dictionary storing Chunk objects with their IDs as keys. Defaults to None. | `None` |
em_invoker: BaseEMInvoker
property
Returns the EM Invoker instance.
Returns:
| Name | Type | Description |
|---|---|---|
| `BaseEMInvoker` | `BaseEMInvoker` | The EM Invoker instance. |
clear()
async
Clear all vectors from the store.
create(data)
async
create_from_vector(chunk_vectors)
async
Add pre-computed vectors directly.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `chunk_vectors` | `list[tuple[Chunk, Vector]]` | List of tuples containing chunks and their corresponding vectors. | required |
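As a rough illustration of what adding pre-computed vectors to a dictionary-based store could look like, the sketch below uses hypothetical stand-ins (`ToyChunk`, `ToyVectorStore`). These are not the library's classes, and the actual implementation may organize its storage differently.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any

Vector = list[float]


@dataclass
class ToyChunk:
    """Hypothetical stand-in for the library's Chunk type."""
    id: str
    content: str
    metadata: dict[str, Any] = field(default_factory=dict)


class ToyVectorStore:
    """Hypothetical dictionary-backed store; not the library's implementation."""

    def __init__(self) -> None:
        self.chunks: dict[str, ToyChunk] = {}
        self.vectors: dict[str, Vector] = {}

    async def create_from_vector(self, chunk_vectors: list[tuple[ToyChunk, Vector]]) -> None:
        # Each tuple pairs a chunk with its already-computed embedding,
        # so no embedding model is called here.
        for chunk, vector in chunk_vectors:
            self.chunks[chunk.id] = chunk
            self.vectors[chunk.id] = list(vector)


async def demo() -> ToyVectorStore:
    store = ToyVectorStore()
    await store.create_from_vector(
        [
            (ToyChunk(id="c1", content="first"), [1.0, 0.0]),
            (ToyChunk(id="c2", content="second"), [0.0, 1.0]),
        ]
    )
    return store


toy = asyncio.run(demo())
```

Keying both dictionaries by chunk ID keeps lookups cheap and makes a repeated insert with the same ID overwrite the earlier entry, which is one plausible behavior for a development-oriented store.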
delete(filters=None)
async
Delete records from the datastore.
Usage Example
from gllm_datastore.core.filters import filter as F
# Direct FilterClause usage
await vector_capability.delete(filters=F.eq("metadata.category", "AI"))
# Multiple filters
await vector_capability.delete(
    filters=F.and_(F.eq("metadata.category", "AI"), F.eq("metadata.status", "published")),
)
This will delete all chunks from the vector store that match the filters.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `filters` | `FilterClause \| QueryFilter \| None` | Filters to select records to delete. FilterClause objects are automatically converted to QueryFilter internally. Defaults to None, in which case no operation is performed (no-op). | `None` |
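The filter-driven delete described above, including the no-op when `filters` is None, can be sketched with plain predicates. Everything here (`get_path`, `eq`, `and_`, and the `delete` helper) is a hypothetical simplification for illustration; the real `FilterClause` and `QueryFilter` objects are not modeled.

```python
from typing import Any, Callable, Optional

Record = dict[str, Any]
Predicate = Callable[[Record], bool]


def get_path(record: Record, dotted_key: str) -> Any:
    """Resolve a dotted key such as 'metadata.category'; None if a part is missing."""
    current: Any = record
    for part in dotted_key.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def eq(key: str, value: Any) -> Predicate:
    # Hypothetical analogue of an equality filter on one field.
    return lambda record: get_path(record, key) == value


def and_(*predicates: Predicate) -> Predicate:
    # Matches only when every sub-filter matches.
    return lambda record: all(p(record) for p in predicates)


def delete(store: dict[str, Record], filters: Optional[Predicate] = None) -> int:
    """Delete matching records and return how many were removed."""
    if filters is None:
        return 0  # no filters: nothing is deleted
    doomed = [rid for rid, rec in store.items() if filters(rec)]
    for rid in doomed:
        del store[rid]
    return len(doomed)


store = {
    "r1": {"metadata": {"category": "AI", "status": "published"}},
    "r2": {"metadata": {"category": "AI", "status": "draft"}},
    "r3": {"metadata": {"category": "tech", "status": "published"}},
}
noop_count = delete(store, None)
removed_count = delete(
    store, and_(eq("metadata.category", "AI"), eq("metadata.status", "published"))
)
```

Collecting the matching IDs before deleting avoids mutating the dictionary while iterating over it.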
retrieve(query, filters=None, options=None)
async
Read records from the datastore using text-based similarity search with optional filtering.
Usage Example
from gllm_datastore.core.filters import filter as F
# Direct FilterClause usage
await vector_capability.retrieve(
    query="What is the capital of France?",
    filters=F.eq("metadata.category", "tech"),
    options=QueryOptions(limit=2),
)
# Multiple filters
filters = F.and_(F.eq("metadata.source", "wikipedia"), F.eq("metadata.category", "tech"))
await vector_capability.retrieve(
    query="What is the capital of France?",
    filters=filters,
    options=QueryOptions(limit=2),
)
This will retrieve the top 2 chunks by similarity score from the vector store that match the query and the filters. The chunks will be sorted by score in descending order.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `query` | `str` | Input text to embed and search with. | required |
| `filters` | `FilterClause \| QueryFilter \| None` | Query filters to apply. FilterClause objects are automatically converted to QueryFilter internally. Defaults to None. | `None` |
| `options` | `QueryOptions \| None` | Query options such as limit and sorting. Defaults to None, in which case no sorting is applied and the top 10 chunks are returned. | `None` |
Returns:
| Type | Description |
|---|---|
| `list[Chunk]` | Top-ranked chunks by similarity score. |
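Text-based retrieval amounts to embedding the query and then ranking stored vectors against it. The sketch below illustrates that flow under loud assumptions: `toy_embed` is a hypothetical bag-of-words embedder standing in for the EM invoker, and cosine similarity is assumed as the metric, which this reference does not specify.

```python
import asyncio
import math

VOCAB = ["paris", "france", "python", "code"]  # hypothetical toy vocabulary


async def toy_embed(text: str) -> list[float]:
    """Hypothetical embedder: counts vocabulary words in the text."""
    words = text.lower().replace("?", " ").replace(".", " ").split()
    return [float(words.count(term)) for term in VOCAB]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # zero vectors have no direction; treat as dissimilar
    return dot / (norm_a * norm_b)


async def build_index(docs: dict[str, str]) -> dict[str, list[float]]:
    # Vectors are computed once at insert time, not at query time.
    return {doc_id: await toy_embed(text) for doc_id, text in docs.items()}


async def retrieve(index: dict[str, list[float]], query: str, limit: int = 10):
    query_vector = await toy_embed(query)  # only the query is embedded here
    scored = [(doc_id, cosine(vec, query_vector)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest score first
    return scored[:limit]


DOCS = {
    "d1": "Paris is the capital of France.",
    "d2": "Python code is fun.",
    "d3": "France exports wine.",
}


async def demo():
    index = await build_index(DOCS)
    return await retrieve(index, "What is the capital of France?", limit=2)


results = asyncio.run(demo())
```

The default `limit=10` mirrors the documented fallback of returning the top 10 chunks when no options are given.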
retrieve_by_vector(vector, filters=None, options=None)
async
Direct vector similarity search.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `vector` | `Vector` | Query embedding vector. | required |
| `filters` | `FilterClause \| QueryFilter \| None` | Query filters to apply. FilterClause objects are automatically converted to QueryFilter internally. Defaults to None. | `None` |
| `options` | `QueryOptions \| None` | Query options such as limit and sorting. Defaults to None, in which case no sorting is applied and the top 10 chunks are returned. | `None` |
Returns:
| Type | Description |
|---|---|
| `list[Chunk]` | List of chunks ordered by similarity score. |
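A direct vector search with optional filtering might look like the following sketch. The record layout, the `predicate` parameter, and the use of cosine similarity are all assumptions made for illustration; the source does not state the similarity metric or the internal data layout.

```python
import math
from typing import Any, Callable, Optional

Record = dict[str, Any]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_by_vector(
    records: dict[str, Record],
    vector: list[float],
    predicate: Optional[Callable[[Record], bool]] = None,
    limit: int = 10,
) -> list[tuple[str, float]]:
    """Hypothetical helper: filter first, then rank the survivors by similarity."""
    candidates = [
        (rid, rec) for rid, rec in records.items() if predicate is None or predicate(rec)
    ]
    scored = [(rid, cosine_similarity(rec["vector"], vector)) for rid, rec in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


records = {
    "a": {"vector": [1.0, 0.0], "metadata": {"category": "tech"}},
    "b": {"vector": [0.0, 1.0], "metadata": {"category": "tech"}},
    "c": {"vector": [1.0, 1.0], "metadata": {"category": "news"}},
}
hits = retrieve_by_vector(
    records,
    [1.0, 0.0],
    predicate=lambda rec: rec["metadata"]["category"] == "tech",
    limit=2,
)
```

Filtering before scoring means the limit applies to matching records only, so a filtered query can still return up to `limit` results.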
update(update_values, filters=None, **kwargs)
async
Update existing records in the datastore.
Example
- Update selected metadata fields of the chunks that match the given filters.
from gllm_datastore.core.filters import filter as F
# Direct FilterClause usage
await vector_capability.update(
    update_values={"metadata": {"status": "published"}},
    filters=F.eq("metadata.category", "tech"),
)
# Multiple filters
await vector_capability.update(
    update_values={"metadata": {"status": "published"}},
    filters=F.and_(F.eq("metadata.status", "draft"), F.eq("metadata.category", "tech")),
)
- Update the content of the chunk with a specific ID. This also regenerates the chunk's vector.
# Direct FilterClause usage
await vector_capability.update(
    update_values={"content": "new_content"},
    filters=F.eq("id", "unique_id"),
)
# Multiple filters
await vector_capability.update(
    update_values={"content": "new_content"},
    filters=F.and_(F.eq("id", "unique_id"), F.eq("metadata.category", "tech")),
)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `update_values` | `dict[str, Any]` | Values to update. | required |
| `filters` | `FilterClause \| QueryFilter \| None` | Filters to select records to update. FilterClause objects are automatically converted to QueryFilter internally. Defaults to None, in which case no operation is performed (no-op). | `None` |
| `**kwargs` | `Any` | Datastore-specific parameters. | `{}` |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If content is empty. |
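The update rules described above (content changes regenerate the vector, empty content raises `ValueError`) can be sketched for a single record as follows. `toy_embed` and `update_record` are hypothetical names, and merging new metadata keys into the existing metadata is an assumption; the source does not say whether metadata is merged or replaced.

```python
import asyncio
from typing import Any


async def toy_embed(text: str) -> list[float]:
    """Hypothetical embedder: [character count, word count]."""
    return [float(len(text)), float(len(text.split()))]


async def update_record(record: dict[str, Any], update_values: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical single-record update; not the library's implementation."""
    if "content" in update_values:
        new_content = update_values["content"]
        if not new_content:
            # Validate before mutating so a failed update leaves the record intact.
            raise ValueError("content must not be empty")
        record["content"] = new_content
        # The stored vector must track the content, so it is re-embedded.
        record["vector"] = await toy_embed(new_content)
    if "metadata" in update_values:
        # Assumption: new keys are merged into the existing metadata.
        record.setdefault("metadata", {}).update(update_values["metadata"])
    return record


record = {
    "id": "unique_id",
    "content": "old",
    "metadata": {"category": "tech"},
    "vector": [3.0, 1.0],
}
asyncio.run(
    update_record(record, {"content": "new content", "metadata": {"status": "published"}})
)

try:
    asyncio.run(update_record(record, {"content": ""}))
    empty_rejected = False
except ValueError:
    empty_rejected = True
```

Metadata-only updates skip the embedding call entirely, which is why only content changes carry the cost of re-vectorization in this sketch.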