Retriever

Defines an abstract base class to create a retriever.

This module provides the BaseRetriever class, which serves as a foundation for implementing retrieval systems in Gen AI applications using the BaseDataStore from gllm-datastore.

BaseRetriever()

Bases: Component, Generic[T]

An abstract base class for retrievers.

This class defines the interface for retriever components, which are responsible for retrieving relevant documents or information based on a given query.

The type parameter T defines the return type for single-query retrieval, allowing subclasses to specify their own return types:
- list[Chunk] for vector/fulltext retrievers
- pd.DataFrame for SQL retrievers
- dict[str, Any] for graph retrievers
- Any custom type for specialized retrievers

The base class provides orchestration for:
- Single vs batch query handling
- Concurrent batch processing
- Common parameter management

Subclasses must implement the core retrieval logic in _retrieve_single.
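The relationship between the type parameter T and _retrieve_single can be sketched with a stand-in base class. This is a minimal illustration, not the gllm implementation: SketchRetriever, KeywordRetriever, and NeighborRetriever are hypothetical names, and the _retrieve_single signature shown here is an assumption.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Generic, TypeVar

T = TypeVar("T")


class SketchRetriever(ABC, Generic[T]):
    """Stand-in for BaseRetriever (hypothetical, not the gllm class)."""

    @abstractmethod
    async def _retrieve_single(self, query: str, **kwargs: Any) -> T:
        """Core retrieval logic that each subclass must supply."""


class KeywordRetriever(SketchRetriever[list[str]]):
    """Binds T to list[str], loosely analogous to list[Chunk]."""

    def __init__(self, documents: list[str]) -> None:
        self.documents = documents

    async def _retrieve_single(self, query: str, **kwargs: Any) -> list[str]:
        # Case-insensitive substring match over the in-memory documents.
        return [d for d in self.documents if query.lower() in d.lower()]


class NeighborRetriever(SketchRetriever[dict[str, Any]]):
    """Binds T to dict[str, Any], loosely analogous to a graph retriever."""

    def __init__(self, edges: dict[str, list[str]]) -> None:
        self.edges = edges

    async def _retrieve_single(self, query: str, **kwargs: Any) -> dict[str, Any]:
        # Look up the neighbors of the queried node, if any.
        return {"node": query, "neighbors": self.edges.get(query, [])}
```

Because _retrieve_single is abstract in this sketch, instantiating SketchRetriever directly raises TypeError; only subclasses that implement it can be created.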

Initialize the BaseRetriever.

Subclasses should override this to accept their specific dependencies (e.g., data_store, api_client, graph_db, etc.).
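One way a subclass might accept its own dependencies is sketched below. Everything here is an assumption made for illustration: SketchBase stands in for BaseRetriever, and TextStore and InMemoryStore are hypothetical and do not reflect the BaseDataStore interface.

```python
import asyncio
from typing import Any, Protocol


class TextStore(Protocol):
    """Hypothetical dependency interface (not BaseDataStore)."""

    async def search(self, text: str, limit: int) -> list[str]: ...


class InMemoryStore:
    """Toy store that satisfies TextStore structurally."""

    def __init__(self, rows: list[str]) -> None:
        self.rows = rows

    async def search(self, text: str, limit: int) -> list[str]:
        hits = [row for row in self.rows if text in row]
        return hits[:limit]


class SketchBase:
    """Stand-in for BaseRetriever with a no-argument initializer."""

    def __init__(self) -> None:
        self.name = type(self).__name__


class StoreRetriever(SketchBase):
    """Overrides __init__ to receive the dependencies it needs."""

    def __init__(self, data_store: TextStore, default_top_k: int = 3) -> None:
        super().__init__()  # keep the base initialization intact
        self.data_store = data_store
        self.default_top_k = default_top_k

    async def _retrieve_single(self, query: str, **kwargs: Any) -> list[str]:
        # A per-call top_k overrides the default given at construction.
        top_k = kwargs.get("top_k", self.default_top_k)
        return await self.data_store.search(query, top_k)
```

Passing the store in through the constructor keeps the retrieval logic independent of any particular backend, so a test can substitute an in-memory store like the one above.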

retrieve(query=None, query_filter=None, **kwargs) async

retrieve(query: str, query_filter: FilterClause | QueryFilter | None = None, **kwargs: Any) -> T
retrieve(query: list[str], query_filter: FilterClause | QueryFilter | None = None, **kwargs: Any) -> list[T]
retrieve(query: None = None, query_filter: FilterClause | QueryFilter | None = None, **kwargs: Any) -> T

Retrieve data based on the query.

This method dispatches to _retrieve_batch for list queries and _retrieve_single for single queries. Subclasses should implement _retrieve_single with their specific retrieval logic.
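The dispatch described above can be sketched roughly as follows. This is a simplified stand-in, not the library's code: DispatchSketch and EchoRetriever are hypothetical, the private method signatures are assumptions, and asyncio.gather is only one plausible way to run a batch concurrently.

```python
import asyncio
from typing import Any, Generic, TypeVar

T = TypeVar("T")


class DispatchSketch(Generic[T]):
    """Stand-in showing list vs single dispatch (hypothetical)."""

    async def retrieve(self, query=None, query_filter=None, **kwargs: Any):
        # A list goes to the batch path; anything else is a single query.
        if isinstance(query, list):
            return await self._retrieve_batch(query, query_filter, **kwargs)
        return await self._retrieve_single(query, query_filter, **kwargs)

    async def _retrieve_batch(self, queries, query_filter=None, **kwargs: Any) -> list[T]:
        # Run one single-query retrieval per item concurrently.
        # gather returns results in the same order as the inputs.
        tasks = [self._retrieve_single(q, query_filter, **kwargs) for q in queries]
        return list(await asyncio.gather(*tasks))

    async def _retrieve_single(self, query, query_filter=None, **kwargs: Any) -> T:
        raise NotImplementedError("Subclasses supply the retrieval logic.")


class EchoRetriever(DispatchSketch[str]):
    """Toy subclass that returns a string per query."""

    async def _retrieve_single(self, query, query_filter=None, **kwargs: Any) -> str:
        await asyncio.sleep(0)  # simulate awaiting an I/O call
        return f"result for {query!r}"
```

With this sketch, `asyncio.run(EchoRetriever().retrieve("a"))` yields a single string, while passing `["a", "b"]` yields a list with one string per query, in input order.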

Parameters:

query
    Type: str | list[str] | None
    Default: None
    The query string or list of query strings to retrieve data for. If a list is provided, retrieval is performed for each query and the results are returned as a list of T.

query_filter
    Type: FilterClause | QueryFilter | None
    Default: None
    Filter criteria for the retrieval. Can be a single FilterClause or a composite QueryFilter.

**kwargs
    Type: Any
    Default: {}
    Additional parameters for the retrieval process. Common parameters include:
    1. top_k (int): Maximum number of documents to retrieve.
    2. threshold (float): Minimum score threshold for filtering results.
    3. timeout (float): Maximum time in seconds to wait for retrieval.
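How a _retrieve_single implementation might honor the common keyword parameters is sketched below. The function names and the scored sample data are hypothetical, and the exact semantics (an inclusive threshold, the timeout applied to the whole search) are assumptions rather than documented behavior.

```python
import asyncio
from typing import Any

# Hypothetical scored results standing in for a real backend.
SCORED = [("doc-a", 0.91), ("doc-b", 0.62), ("doc-c", 0.35)]


async def search(query: str) -> list[tuple[str, float]]:
    """Toy backend call returning (id, score) pairs, best first."""
    await asyncio.sleep(0)
    return sorted(SCORED, key=lambda pair: pair[1], reverse=True)


async def retrieve_with_options(query: str, **kwargs: Any) -> list[tuple[str, float]]:
    top_k = kwargs.get("top_k")          # maximum number of results
    threshold = kwargs.get("threshold")  # minimum score to keep
    timeout = kwargs.get("timeout")      # seconds; None waits indefinitely

    # wait_for raises a timeout error if the search exceeds the limit.
    hits = await asyncio.wait_for(search(query), timeout=timeout)

    if threshold is not None:
        hits = [hit for hit in hits if hit[1] >= threshold]
    if top_k is not None:
        hits = hits[:top_k]
    return hits
```

In this sketch the threshold is applied before top_k, so top_k limits the results that already passed the score filter.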

Returns:

Type: T | list[T]

Retrieved data. Returns T for a single query and list[T] for batch queries. The concrete type depends on the subclass implementation:
1. list[Chunk] for vector-based retrievers.
2. pd.DataFrame for SQL-based retrievers.
3. Other types as defined by specific implementations.