Hierarchical retriever

Hierarchical retriever module for N-level coarse-to-fine retrieval.

This module provides HierarchicalRetriever, a general retriever that runs N retrieval levels in order from coarse to fine (e.g., corpus -> doc -> section -> chunk).

Each level can be constrained by the IDs returned by earlier levels, enabling progressive refinement of retrieval results.

ConstraintMode

Bases: StrEnum

Constraint mode for hierarchical retrieval levels.

HierarchicalRetriever(config)

Bases: BaseRetriever[list[Chunk]]

A retriever that performs N-level coarse-to-fine hierarchical retrieval.

The HierarchicalRetriever executes a sequence of retrieval levels, where each level can be constrained by the IDs returned from previous levels. This enables progressive refinement from coarse-grained to fine-grained retrieval (e.g., corpus -> document -> chunk).

Algorithm
  1. Initialize results_by_level dictionary
  2. For each level in order:
     a. Determine constraint IDs based on constrain_by mode
     b. If constrained and no IDs available, short-circuit return []
     c. Build filters with constraint IDs
     d. Retrieve chunks using level retriever
     e. Apply score threshold if set
     f. Stable sort by (-score, id)
     g. Store results in results_by_level
     h. Log level execution details
  3. Select output level (configured or last level)
  4. Return top final_top_k results with stable sort
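
The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's implementation: `Chunk`, `Level`, the `(query, filters, top_k)` retriever signature, and the toy retrievers are hypothetical stand-ins, only the `None` and `"previous"` constraint modes are modeled, the previous level's chunk IDs are assumed to become the values for `filter_key`, the threshold is assumed to be inclusive, and logging is omitted.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Optional


@dataclass
class Chunk:
    # Hypothetical stand-in for the library's Chunk type.
    id: str
    score: float
    metadata: dict = field(default_factory=dict)


# Hypothetical retriever signature: (query, filters, top_k) -> chunks.
RetrieveFn = Callable[[str, dict, int], Awaitable[list]]


@dataclass
class Level:
    # Mirrors the documented LevelConfig fields; not the real class.
    name: str
    retrieve: RetrieveFn
    top_k: int
    filter_key: str
    constrain_by: Optional[str] = None  # only None and "previous" are sketched
    score_threshold: Optional[float] = None


async def hierarchical_retrieve(query, levels, output_level=None, final_top_k=None):
    results_by_level = {}  # step 1
    for index, level in enumerate(levels):  # step 2
        filters = {}
        if level.constrain_by == "previous" and index > 0:
            # Assumption: the previous level's chunk IDs are the constraint IDs.
            parent_ids = [c.id for c in results_by_level[levels[index - 1].name]]
            if not parent_ids:
                return []  # constrained, but nothing to constrain by
            filters[level.filter_key] = parent_ids
        chunks = await level.retrieve(query, filters, level.top_k)
        if level.score_threshold is not None:
            # Assumption: scores equal to the threshold are kept.
            chunks = [c for c in chunks if c.score >= level.score_threshold]
        # Highest score first; ties broken by id so the order is deterministic.
        results_by_level[level.name] = sorted(chunks, key=lambda c: (-c.score, c.id))
    selected = results_by_level[output_level or levels[-1].name]  # step 3
    return selected if final_top_k is None else selected[:final_top_k]  # step 4


# Toy in-memory retrievers, for illustration only.
async def doc_retrieve(query, filters, top_k):
    return [Chunk("d2", 0.8), Chunk("d1", 0.8), Chunk("d3", 0.4)][:top_k]


CHUNKS = [
    Chunk("c5", 0.9, {"parent_doc_id": "d2"}),
    Chunk("c1", 0.9, {"parent_doc_id": "d1"}),
    Chunk("c2", 0.75, {"parent_doc_id": "d3"}),
    Chunk("c3", 0.6, {"parent_doc_id": "d2"}),
    Chunk("c4", 0.95, {"parent_doc_id": "d9"}),
]


async def chunk_retrieve(query, filters, top_k):
    allowed = filters.get("parent_doc_id")
    hits = [c for c in CHUNKS if allowed is None or c.metadata["parent_doc_id"] in allowed]
    return hits[:top_k]


async def demo():
    levels = [
        Level("document", doc_retrieve, top_k=2, filter_key="doc_id"),
        Level("chunk", chunk_retrieve, top_k=50, filter_key="parent_doc_id",
              constrain_by="previous", score_threshold=0.7),
    ]
    return await hierarchical_retrieve("search query", levels,
                                       output_level="chunk", final_top_k=10)
```

With this toy data, `asyncio.run(demo())` keeps only chunks whose parent is `d1` or `d2` and whose score is at least 0.7, so it returns `c1` and `c5` in that order (equal scores, ordered by id).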

Examples:

from gllm_retrieval.retriever.hierarchical_retriever import (
    HierarchicalRetriever,
    HierarchicalRetrieverConfig,
    LevelConfig,
)

# doc_retriever and chunk_retriever are assumed to be existing BaseRetriever instances.
config = HierarchicalRetrieverConfig(
    levels=[
        LevelConfig(
            name="document",
            retriever=doc_retriever,
            top_k=20,
            filter_key="doc_id",
            constrain_by=None,
        ),
        LevelConfig(
            name="chunk",
            retriever=chunk_retriever,
            top_k=50,
            filter_key="parent_doc_id",
            constrain_by="previous",
            score_threshold=0.7,
        ),
    ],
    output_level="chunk",
    final_top_k=10,
)

retriever = HierarchicalRetriever(config=config)
# retrieve() is a coroutine: await it from inside an async function.
results = await retriever.retrieve("search query")
# results: [Chunk(...), Chunk(...), ...]

Attributes:

config (HierarchicalRetrieverConfig): The configuration for hierarchical retrieval.

Initialize the HierarchicalRetriever with a configuration.

Parameters:

config (HierarchicalRetrieverConfig, required): The hierarchical retrieval configuration.

Raises:

TypeError: If config is not a HierarchicalRetrieverConfig instance.

retrieve(query, query_filter=None, **kwargs) async

Retrieve chunks using hierarchical retrieval.

Parameters:

query (str | list[str], required): The query string or list of query strings.

query_filter (FilterClause | QueryFilter | None, default None): Base filter for all levels. This filter is combined with level-specific constraint filters.

**kwargs (Any, default {}): Additional parameters passed to level retrievers.

Returns:

list[Chunk] | list[list[Chunk]]: Retrieved chunks. Returns list[Chunk] for a single query and list[list[Chunk]] for batch queries.
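
The single-versus-batch return shape can be illustrated with a small dispatch sketch. The names `retrieve_one` and `retrieve` below are hypothetical, strings stand in for Chunk objects, and running batch queries concurrently with `asyncio.gather` is an assumption; the source does not say how the library schedules them.

```python
import asyncio
from typing import Union


async def retrieve_one(query: str) -> list:
    # Hypothetical single-query pipeline; returns strings instead of Chunk objects.
    return [f"chunk for {query!r}"]


async def retrieve(query: Union[str, list]):
    if isinstance(query, str):
        # Single query: one flat list of results.
        return await retrieve_one(query)
    # Batch: one result list per query, in the same order as the input.
    return list(await asyncio.gather(*(retrieve_one(q) for q in query)))
```

A caller can therefore rely on the input type to predict the output shape: a `str` yields a flat list, while a list of strings yields one inner list per query.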

HierarchicalRetrieverConfig

Bases: BaseModel

Configuration for the HierarchicalRetriever.

Attributes:

levels (list[LevelConfig]): List of level configurations in order of execution.

output_level (str | None): Name of the level to output results from. If None, outputs from the last level.

final_top_k (int | None): Maximum number of final results to return. If None, returns all results from the output level.

validate_config()

Validate the entire configuration.

Returns:

HierarchicalRetrieverConfig: The validated configuration.

Raises:

ValueError: If level names are not unique.

ValueError: If output_level is not found in level names.
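
The two documented checks can be sketched as a standalone function. This is an illustration only: `check_levels` is a hypothetical helper operating on plain level names, and the error messages are invented; the real validation runs on the model itself.

```python
from typing import Optional


def check_levels(level_names: list, output_level: Optional[str]) -> None:
    # Check 1: every level name must be unique.
    duplicates = sorted({n for n in level_names if level_names.count(n) > 1})
    if duplicates:
        raise ValueError(f"Level names must be unique; duplicated: {duplicates}")
    # Check 2: output_level, when given, must name one of the levels.
    if output_level is not None and output_level not in level_names:
        raise ValueError(f"output_level {output_level!r} is not one of {level_names}")
```

For example, `check_levels(["document", "chunk"], "chunk")` passes, while `check_levels(["document", "document"], None)` and `check_levels(["document", "chunk"], "section")` both raise `ValueError`.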

LevelConfig

Bases: BaseModel

Configuration for a single retrieval level in the hierarchy.

Attributes:

name (str): Unique name for this level (e.g., "corpus", "document", "chunk").

retriever (BaseRetriever[list[Chunk]]): The retriever to use for this level.

top_k (int): Maximum number of results to retrieve at this level.

filter_key (str): The metadata field name used to filter by parent IDs.

constrain_by (ConstraintMode | None): How to constrain this level by prior levels. None means no constraint (typically for the first level).

score_threshold (float | None): Minimum score threshold for filtering results. If None, no threshold filtering is applied.

validate_constrain_by(value) classmethod

Validate and convert constrain_by to ConstraintMode.

Parameters:

value (None | ConstraintMode | str, required): The value to validate and convert.

Returns:

ConstraintMode | None: The validated and converted value.

Raises:

ValueError: If the string value is not a valid ConstraintMode.
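
The conversion described above can be sketched with a string-valued enum. This is a hedged illustration, not the library's code: the `ConstraintMode` below is a stand-in that defines only `"previous"` (the one value shown in the example; the real enum may have more members and the member name `PREVIOUS` is assumed), and `coerce_constrain_by` is a hypothetical plain function rather than a model validator.

```python
from enum import Enum
from typing import Optional, Union


class ConstraintMode(str, Enum):
    # Stand-in enum: only "previous" is taken from the documented example.
    PREVIOUS = "previous"


def coerce_constrain_by(value: Union[None, ConstraintMode, str]) -> Optional[ConstraintMode]:
    # None and existing enum members pass through unchanged.
    if value is None or isinstance(value, ConstraintMode):
        return value
    # Enum lookup by value raises ValueError for strings that match no member.
    return ConstraintMode(value)
```

Accepting plain strings is what lets a configuration write `constrain_by="previous"` instead of importing the enum.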