Retriever
Defines an abstract base class for building retrievers.
This module provides the BaseRetriever class, which serves as a foundation for implementing retrieval systems in Gen AI applications using the BaseDataStore from gllm-datastore.
BaseRetriever(data_store)
Bases: Component, Generic[T]
An abstract base class for retrievers using BaseDataStore.
This class defines the interface for retriever components, which are responsible for retrieving relevant documents or information based on a given query. The retriever utilizes BaseDataStore from gllm-datastore for data access.
The type parameter T defines the return type of the retrieve method, allowing subclasses to specify their own return types (e.g., list[Chunk] for vector retrievers, pd.DataFrame for SQL retrievers).
Attributes:
| Name | Type | Description |
|---|---|---|
| `data_store` | `BaseDataStore` | The data store to be used for retrieval operations. |
Initialize the BaseRetriever.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data_store` | `BaseDataStore` | The data store to be used for retrieval operations. Must be an instance of BaseDataStore from gllm-datastore. | required |
Raises:
| Type | Description |
|---|---|
| `TypeError` | If data_store is not an instance of BaseDataStore. |
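The generic contract described above can be illustrated with a minimal, self-contained sketch. It does not import gllm-datastore or the real BaseRetriever: `FakeDataStore`, `SketchRetriever`, and `KeywordRetriever` are hypothetical stand-ins that only mirror the documented behavior (a type check in the constructor, and a subclass that binds T to a concrete return type).

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Generic, TypeVar

T = TypeVar("T")


class FakeDataStore:
    """Hypothetical stand-in for BaseDataStore; not the gllm-datastore class."""

    def __init__(self, rows: list[str]) -> None:
        self.rows = rows


class SketchRetriever(ABC, Generic[T]):
    """Stand-in that mirrors the documented BaseRetriever contract."""

    def __init__(self, data_store: FakeDataStore) -> None:
        # Mirror the documented behavior: reject anything that is not a data store.
        if not isinstance(data_store, FakeDataStore):
            raise TypeError(
                f"data_store must be a FakeDataStore, got {type(data_store).__name__}"
            )
        self.data_store = data_store

    @abstractmethod
    async def retrieve(self, query: str, **kwargs: Any) -> T:
        """Subclasses decide what T is."""


class KeywordRetriever(SketchRetriever[list[str]]):
    """Binds T to list[str]; a vector retriever would bind list[Chunk] instead."""

    async def retrieve(self, query: str, **kwargs: Any) -> list[str]:
        # Case-insensitive substring match over the stored rows.
        needle = query.lower()
        return [row for row in self.data_store.rows if needle in row.lower()]


if __name__ == "__main__":
    retriever = KeywordRetriever(FakeDataStore(["Alpha doc", "beta doc", "gamma"]))
    print(asyncio.run(retriever.retrieve("doc")))
```

Binding T in the subclass declaration lets a type checker infer the return type of `retrieve` at each call site, which is the purpose of the type parameter described above.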
retrieve(query, query_filter=None, **kwargs)
abstractmethod
async
retrieve(query: str, query_filter: FilterClause | QueryFilter | None = None, **kwargs: Any) -> T
retrieve(query: list[str], query_filter: FilterClause | QueryFilter | None = None, **kwargs: Any) -> list[T]
Retrieve data based on the query.
This method should be implemented by subclasses to define the specific retrieval logic. The return type is determined by the type parameter T.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `query` | `str \| list[str]` | The query string or list of query strings to retrieve data. If a list is provided, retrieval is performed for each query and results are returned as a list of T. | required |
| `query_filter` | `FilterClause \| QueryFilter \| None` | Filter criteria for the retrieval. Can be a single FilterClause or a composite QueryFilter. Defaults to None. | `None` |
| `**kwargs` | `Any` | Additional parameters for the retrieval process. Common parameters include: 1. top_k (int): Maximum number of documents to retrieve. 2. threshold (float): Minimum score threshold for filtering results. 3. timeout (float): Maximum time in seconds to wait for retrieval. | `{}` |
Returns:
| Type | Description |
|---|---|
| `T \| list[T]` | Retrieved data: T for a single query, list[T] for a batch of queries. The type depends on the subclass implementation: 1. list[Chunk] for vector-based retrievers. 2. pd.DataFrame for SQL-based retrievers. 3. Other types as defined by specific implementations. |
Raises:
| Type | Description |
|---|---|
| `NotImplementedError` | If the method is not implemented by a subclass. |
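The single-query and batch forms of `retrieve` can be sketched as follows. This is an illustration under assumptions, not the library's implementation: `WordOverlapRetriever` is a hypothetical class that does not inherit from BaseRetriever, the word-overlap score is invented for the example, `query_filter` is omitted, and `timeout` is accepted but ignored. Only the `top_k` and `threshold` names come from the parameter table above.

```python
import asyncio
from typing import Any, Optional, Union, overload


class WordOverlapRetriever:
    """Hypothetical retriever; T is list[str] in this sketch."""

    def __init__(self, rows: list[str]) -> None:
        self.rows = rows

    @overload
    async def retrieve(self, query: str, **kwargs: Any) -> list[str]: ...

    @overload
    async def retrieve(self, query: list[str], **kwargs: Any) -> list[list[str]]: ...

    async def retrieve(self, query: Union[str, list[str]], **kwargs: Any) -> Any:
        if isinstance(query, list):
            # Batch input: run each query and keep results in input order.
            results = await asyncio.gather(
                *(self._retrieve_one(q, **kwargs) for q in query)
            )
            return list(results)
        return await self._retrieve_one(query, **kwargs)

    async def _retrieve_one(
        self,
        query: str,
        top_k: Optional[int] = None,
        threshold: float = 0.0,
        **_: Any,  # other kwargs (e.g. timeout) are ignored in this sketch
    ) -> list[str]:
        query_words = set(query.lower().split())
        scored = []
        for row in self.rows:
            overlap = query_words & set(row.lower().split())
            # Score is the fraction of query words found in the row.
            score = len(overlap) / len(query_words) if query_words else 0.0
            if score > 0 and score >= threshold:
                scored.append((score, row))
        # Highest score first; the sort is stable, so ties keep insertion order.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        texts = [row for _, row in scored]
        return texts if top_k is None else texts[:top_k]
```

A single string yields one result of type T, while a list of strings yields one T per query in the same order, which matches the two overloads documented for `retrieve`.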