Retriever

Defines an abstract base class for creating retrievers.

This module provides the BaseRetriever class, which serves as a foundation for implementing retrieval systems in Gen AI applications using the BaseDataStore from gllm-datastore.

Authors

Kevin Yauris (kevin.yauris@gdplabs.id)

BaseRetriever(data_store)

Bases: Component, Generic[T]

An abstract base class for retrievers using BaseDataStore.

This class defines the interface for retriever components, which are responsible for retrieving relevant documents or information based on a given query. The retriever utilizes BaseDataStore from gllm-datastore for data access.

The type parameter T defines the return type of the retrieve method, allowing subclasses to specify their own return types (e.g., list[Chunk] for vector retrievers, pd.DataFrame for SQL retrievers).
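The role of the type parameter can be illustrated with a minimal, self-contained sketch. It does not import gllm-datastore: `SketchRetriever`, `WordRetriever`, and `CountRetriever` are hypothetical stand-ins written for this example, and the real BaseRetriever additionally takes a data store and a query filter.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Generic, TypeVar

T = TypeVar("T")


class SketchRetriever(ABC, Generic[T]):
    """Hypothetical stand-in for BaseRetriever; not the library class."""

    @abstractmethod
    async def retrieve(self, query: str, **kwargs: Any) -> T:
        """Return data of type T for the query."""


class WordRetriever(SketchRetriever[list[str]]):
    """Here T is list[str]: returns the stored words found in the query."""

    def __init__(self, words: list[str]) -> None:
        self.words = words

    async def retrieve(self, query: str, **kwargs: Any) -> list[str]:
        lowered = query.lower()
        return [word for word in self.words if word in lowered]


class CountRetriever(SketchRetriever[dict[str, int]]):
    """Here T is dict[str, int]: returns per-word counts for the query."""

    async def retrieve(self, query: str, **kwargs: Any) -> dict[str, int]:
        counts: dict[str, int] = {}
        for word in query.lower().split():
            counts[word] = counts.get(word, 0) + 1
        return counts
```

Both subclasses share one interface, while a static type checker can still tell that the first yields a list of strings and the second a dictionary.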

Attributes:

data_store (BaseDataStore): The data store to be used for retrieval operations.

Initialize the BaseRetriever.

Parameters:

data_store (BaseDataStore, required): The data store to be used for retrieval operations. Must be an instance of BaseDataStore from gllm-datastore.

Raises:

TypeError: If data_store is not an instance of BaseDataStore.
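The documented constructor check can be sketched as follows. This is an illustration under stated assumptions, not the library's source: `StandInDataStore` and `StandInRetriever` are hypothetical placeholders for BaseDataStore and BaseRetriever, and the error message wording is invented for the example.

```python
class StandInDataStore:
    """Hypothetical placeholder for BaseDataStore."""


class StandInRetriever:
    """Hypothetical placeholder showing only the constructor type check."""

    def __init__(self, data_store: StandInDataStore) -> None:
        # Reject anything that is not a data store instance up front,
        # so the failure surfaces at construction rather than at query time.
        if not isinstance(data_store, StandInDataStore):
            raise TypeError(
                "data_store must be a StandInDataStore, "
                f"got {type(data_store).__name__}"
            )
        self.data_store = data_store


def try_build(candidate: object) -> str:
    """Return 'ok' if construction succeeds, else the exception type name."""
    try:
        StandInRetriever(candidate)  # type: ignore[arg-type]
    except TypeError as exc:
        return type(exc).__name__
    return "ok"
```

Calling `try_build` with a `StandInDataStore` instance succeeds, while passing any other object, such as a string, reports a TypeError.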

retrieve(query, query_filter=None, **kwargs) abstractmethod async

retrieve(query: str, query_filter: FilterClause | QueryFilter | None = None, **kwargs: Any) -> T
retrieve(query: list[str], query_filter: FilterClause | QueryFilter | None = None, **kwargs: Any) -> list[T]

Retrieve data based on the query.

This method should be implemented by subclasses to define the specific retrieval logic. The return type is determined by the type parameter T.

Parameters:

query (str | list[str], required): The query string or list of query strings to retrieve data. If a list is provided, retrieval is performed for each query and the results are returned as a list of T.

query_filter (FilterClause | QueryFilter | None, default None): Filter criteria for the retrieval. Can be a single FilterClause or a composite QueryFilter.

**kwargs (Any, default {}): Additional parameters for the retrieval process. Common parameters include:

1. top_k (int): Maximum number of documents to retrieve.
2. threshold (float): Minimum score threshold for filtering results.
3. timeout (float): Maximum time in seconds to wait for retrieval.
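One way a subclass might interpret the common keyword arguments is sketched below. Everything here is an assumption made for illustration: the `SCORED` records, the `_search` helper, the function name `retrieve_with_options`, and the default values for `top_k` and `threshold` are hypothetical, and actual implementations may handle these parameters differently.

```python
import asyncio
from typing import Any, Optional

# Hypothetical scored records standing in for a real data store's results.
SCORED = [("alpha", 0.9), ("beta", 0.6), ("gamma", 0.3)]


async def _search(query: str) -> list[tuple[str, float]]:
    """Pretend backend call; ignores the query and returns SCORED."""
    await asyncio.sleep(0)
    return list(SCORED)


async def retrieve_with_options(query: str, **kwargs: Any) -> list[tuple[str, float]]:
    """Apply timeout, threshold, and top_k to the backend results."""
    top_k: int = kwargs.get("top_k", 10)  # assumed default
    threshold: float = kwargs.get("threshold", 0.0)  # assumed default
    timeout: Optional[float] = kwargs.get("timeout")  # None means no limit

    # asyncio.wait_for raises a timeout error if the call exceeds `timeout`.
    hits = await asyncio.wait_for(_search(query), timeout=timeout)

    # Drop low-scoring hits, then keep the best top_k by descending score.
    kept = [hit for hit in hits if hit[1] >= threshold]
    kept.sort(key=lambda hit: hit[1], reverse=True)
    return kept[:top_k]
```

Reading the options with `kwargs.get` keeps them optional, so callers that pass none of them still get a sensible result.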

Returns:

T | list[T]: Retrieved data. Returns T for a single query and list[T] for a list of queries. The concrete type depends on the subclass implementation:

1. list[Chunk] for vector-based retrievers.
2. pd.DataFrame for SQL-based retrievers.
3. Other types as defined by specific implementations.

Raises:

NotImplementedError: If the method is not implemented by the subclass.
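The single-query and batch behaviour described for retrieve can be sketched with `typing.overload`. This is a minimal illustration, not the library's implementation: `EchoRetriever` and `_retrieve_one` are hypothetical names, `query_filter` is omitted for brevity, and running the batch concurrently with `asyncio.gather` is an assumption about one possible strategy.

```python
import asyncio
from typing import Any, overload


class EchoRetriever:
    """Hypothetical retriever whose T is list[str]."""

    @overload
    async def retrieve(self, query: str, **kwargs: Any) -> list[str]: ...

    @overload
    async def retrieve(self, query: list[str], **kwargs: Any) -> list[list[str]]: ...

    async def retrieve(self, query, **kwargs):
        # Single query: return one T.
        if isinstance(query, str):
            return await self._retrieve_one(query)
        # Batch: return one T per query, in the same order as the input.
        results = await asyncio.gather(*(self._retrieve_one(q) for q in query))
        return list(results)

    async def _retrieve_one(self, query: str) -> list[str]:
        """Pretend retrieval: lower-case the query and split it into tokens."""
        return query.lower().split()
```

The two overloads mirror the documented signatures, so a type checker infers list[str] for a string argument and list[list[str]] for a list of strings.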