Utils

Utility modules for gllm-retrieval.

`Fuseable`

Bases: Protocol

Protocol for objects that can be fused using rank fusion algorithms.

Objects must have an `id` attribute, used for deduplication, and an optional, writable `score` attribute that fusion functions can set to the fused score.

`id` `property`

Unique identifier for deduplication.

`score` `property` `writable`

Optional score attribute that can be set.
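Because `Fuseable` is a structural protocol, any object with an `id` and a writable `score` qualifies; no inheritance is required. The sketch below assumes only what is stated above. `FuseableLike` is a hypothetical local mirror of the protocol (made `runtime_checkable` so it can be checked with `isinstance`), and `Chunk` is a hypothetical chunk type, not a gllm-retrieval class.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@runtime_checkable
class FuseableLike(Protocol):
    """Hypothetical local mirror of the Fuseable protocol described above."""

    @property
    def id(self) -> str: ...

    @property
    def score(self) -> float | None: ...

    @score.setter
    def score(self, value: float | None) -> None: ...


@dataclass
class Chunk:
    """Hypothetical chunk type: plain attributes satisfy the protocol structurally."""

    id: str
    score: float | None = None


chunk = Chunk(id="doc-1")
chunk.score = 0.42  # a fusion function can write the fused score back
print(isinstance(chunk, FuseableLike))  # True
```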

`concat_fusion(chunk_lists, **kwargs)`

Concatenates lists in order, deduplicating by chunk ID.

Preserves the order of first appearance. If a chunk with the same ID appears in multiple lists, only the first occurrence is kept.

Parameters:

Name	Type	Description	Default
`chunk_lists`	`list[list[FuseableT]]`	A list of chunk lists to concatenate.	required
`**kwargs`	`Any`	Keyword arguments (unused, accepted for signature compatibility).	`{}`

Returns:

Type	Description
`list[FuseableT]`	Deduplicated, concatenated list of chunks in order of first appearance.
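A minimal sketch of the first-occurrence deduplication described above, assuming each chunk exposes an `id`. The names `concat_fusion_sketch`, `HasId`, and `Chunk` are hypothetical and for illustration only; this is not the library's implementation.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Protocol, TypeVar


class HasId(Protocol):
    id: str


T = TypeVar("T", bound=HasId)


def concat_fusion_sketch(chunk_lists: list[list[T]], **kwargs: Any) -> list[T]:
    """Concatenate lists in order, keeping only the first chunk seen for each ID."""
    seen: set[str] = set()
    fused: list[T] = []
    for chunk_list in chunk_lists:
        for chunk in chunk_list:
            if chunk.id not in seen:  # later duplicates are dropped
                seen.add(chunk.id)
                fused.append(chunk)
    return fused  # kwargs are accepted but ignored, mirroring the documented signature


@dataclass
class Chunk:  # hypothetical chunk type that records which retriever produced it
    id: str
    source: str


bm25 = [Chunk("c1", "bm25"), Chunk("c2", "bm25")]
dense = [Chunk("c2", "dense"), Chunk("c3", "dense")]
result = concat_fusion_sketch([bm25, dense])
print([(c.id, c.source) for c in result])
# [('c1', 'bm25'), ('c2', 'bm25'), ('c3', 'dense')]
```

Note that `c2` keeps its `bm25` copy: the earlier list wins, so list order acts as a priority order.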

`format_sql_query(query)`

Format the SQL query to ensure it is correctly structured.

Removes the code block markdown from the SQL query and trims any leading or trailing whitespace.

Parameters:

Name	Type	Description	Default
`query`	`str`	The SQL query output from the language model.	required

Returns:

Type	Description
`str`	The formatted SQL query.
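A minimal sketch of the cleanup described above, assuming the language model wraps its SQL in a single Markdown code fence with an optional language tag such as `sql`. The regex and `format_sql_query_sketch` are hypothetical; the library may handle other fence variants differently.

```python
import re

# Matches an opening fence (with optional language tag) at the very start,
# or a closing fence at the very end. `{3} stands for three backticks.
_FENCE_RE = re.compile(r"^\s*`{3}[A-Za-z]*[ \t]*\n?|\n?[ \t]*`{3}\s*$")


def format_sql_query_sketch(query: str) -> str:
    """Strip a surrounding Markdown code fence, then trim leading/trailing whitespace."""
    return _FENCE_RE.sub("", query.strip()).strip()


fence = "`" * 3
raw = f"{fence}sql\nSELECT id, name FROM users;\n{fence}\n"
print(format_sql_query_sketch(raw))  # SELECT id, name FROM users;
```

Unfenced input passes through with only whitespace trimmed, so the function is safe to call on every model response.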

`passthrough_fusion(chunk_lists, **kwargs)`

Returns all retriever results as-is without any processing.

Returns the nested list structure unchanged, preserving the separation between each retriever's results.

Parameters:

Name	Type	Description	Default
`chunk_lists`	`list[list[FuseableT]]`	A list of chunk lists from each retriever.	required
`**kwargs`	`Any`	Keyword arguments (unused, accepted for signature compatibility).	`{}`

Returns:

Type	Description
`list[list[FuseableT]]`	The original nested list structure, unchanged.

`resolve_fusion_fn(fn)`

Resolves `fn` to a fusion callable.

If `fn` is a string, it is looked up in `FUSION_REGISTRY`. If `fn` is already callable, it is returned unchanged.

Custom fusion functions should accept:

`chunk_lists` (`list[list[FuseableT]]`): required first positional argument.
`**kwargs` (`Any`): optional keyword arguments (`weights`, `rank_constant`, etc.).

Parameters:

Name	Type	Description	Default
`fn`	`str | FusionCallable`	Fusion function name or callable.	required

Returns:

Type	Description
`FusionCallable`	The resolved fusion callable.

Raises:

Type	Description
`ValueError`	If fn is a string not found in FUSION_REGISTRY.
`TypeError`	If fn is neither a string nor a callable.
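A minimal sketch of the lookup-or-pass-through logic described above, together with a custom fusion function that follows the required signature. `FUSION_REGISTRY_SKETCH`, its single `passthrough` entry, `resolve_fusion_fn_sketch`, and `first_list_only` are hypothetical stand-ins; the actual keys of `FUSION_REGISTRY` are not listed in this reference.

```python
from __future__ import annotations

from collections.abc import Callable
from typing import Any

FusionCallable = Callable[..., Any]  # hypothetical alias for illustration


def passthrough(chunk_lists: list[list[Any]], **kwargs: Any) -> list[list[Any]]:
    return chunk_lists


# Hypothetical registry; the real FUSION_REGISTRY contents may differ.
FUSION_REGISTRY_SKETCH: dict[str, FusionCallable] = {"passthrough": passthrough}


def resolve_fusion_fn_sketch(fn: str | FusionCallable) -> FusionCallable:
    """Return a fusion callable from a registry name, or pass a callable through."""
    if isinstance(fn, str):
        try:
            return FUSION_REGISTRY_SKETCH[fn]
        except KeyError:
            raise ValueError(f"Unknown fusion function: {fn!r}") from None
    if callable(fn):
        return fn
    raise TypeError(f"Expected a str or callable, got {type(fn).__name__}")


def first_list_only(chunk_lists: list[list[Any]], **kwargs: Any) -> list[Any]:
    """Custom fusion function: chunk_lists positionally, options (ignored here) as kwargs."""
    return chunk_lists[0] if chunk_lists else []


fn = resolve_fusion_fn_sketch(first_list_only)
print(fn([["a", "b"], ["c"]], weights=[0.5, 0.5]))  # ['a', 'b']
```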

`rrf_fusion(chunk_lists, **kwargs)`

Reciprocal Rank Fusion with weights.

Delegates to gllm_retrieval.utils.weighted_reciprocal_rank.

Parameters:

Name	Type	Description	Default
`chunk_lists`	`list[list[FuseableT]]`	A list of ranked chunk lists.	required
`**kwargs`	`Any`	Keyword arguments, including `weights` (`list[float]`), the weight for each chunk list (defaults to equal weights), and `rank_constant` (`int`), the constant k in the RRF formula (defaults to 60).	`{}`

Returns:

Type	Description
`list[FuseableT]`	Fused, deduplicated list of chunks, sorted by RRF score.

`validate_query(query, dialect='postgres')`

Validates whether the given string is a valid SQL statement, using sqlglot.

Parameters:

Name	Type	Description	Default
`query`	`str`	The SQL query to be validated.	required
`dialect`	`str`	The SQL dialect to be used for validation. Defaults to "postgres".	`'postgres'`

Raises:

Type	Description
`ValueError`	If the query is not a valid SQL statement.

`weighted_reciprocal_rank(doc_lists, weights, rank_constant=60, set_scores=False)`

Perform weighted Reciprocal Rank Fusion on multiple rank lists.

This function implements the Weighted Reciprocal Rank Fusion (RRF) algorithm, which combines multiple ranked document lists into a single ranked list. Because RRF uses only rank positions and ignores raw retrieval scores (which are often on incomparable scales), it works well for combining results from different retrieval strategies (e.g., filtered search and semantic search, or multiple retrievers).

The RRF score of each document d is calculated as score(d) = sum(weight_i / (rank_i(d) + k)), where the sum runs over every list i that contains d, rank_i(d) is d's 1-based rank in list i, and k is the rank constant.
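The formula above can be sketched in plain Python as follows. This is a minimal illustration, not the library's code: it assumes chunks expose `id` and a writable `score`, and `weighted_rrf_sketch` and `Chunk` are hypothetical names. Ties are broken by first appearance here (Python's `sorted` is stable), which is a choice of this sketch rather than documented behavior. The usage runs the same inputs and weights as the example that follows.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Chunk:  # hypothetical Fuseable stand-in
    id: str
    score: float | None = None


def weighted_rrf_sketch(
    doc_lists: list[list[Chunk]],
    weights: list[float],
    rank_constant: int = 60,
    set_scores: bool = False,
) -> list[Chunk]:
    """score(d) = sum over lists i containing d of weights[i] / (rank_i(d) + rank_constant)."""
    if len(doc_lists) != len(weights):
        raise ValueError("Number of rank lists must equal the number of weights.")
    scores: dict[str, float] = defaultdict(float)
    unique: dict[str, Chunk] = {}  # first occurrence per id, in order of appearance
    for docs, weight in zip(doc_lists, weights):
        for rank, doc in enumerate(docs, start=1):  # ranks are 1-based
            scores[doc.id] += weight / (rank + rank_constant)
            unique.setdefault(doc.id, doc)
    fused = sorted(unique.values(), key=lambda d: scores[d.id], reverse=True)
    if set_scores:
        for doc in fused:
            doc.score = scores[doc.id]
    return fused


chunk1, chunk2, chunk3, chunk4 = (Chunk(id=f"chunk{i}") for i in range(1, 5))
fused = weighted_rrf_sketch(
    [[chunk1, chunk2, chunk3], [chunk2, chunk1, chunk4]],
    weights=[0.2, 0.8],
    set_scores=True,
)
print([d.id for d in fused])  # ['chunk2', 'chunk1', 'chunk4', 'chunk3']
```

Worked through by hand: chunk2 scores 0.2/62 + 0.8/61 ≈ 0.01634 and chunk1 scores 0.2/61 + 0.8/62 ≈ 0.01618, so the heavier semantic weight puts chunk2 first. chunk4 (0.8/63 ≈ 0.01270) outranks chunk3 (0.2/63 ≈ 0.00317) even though each appears in only one list.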

Examples:

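```python
from dataclasses import dataclass
from typing import Optional

from gllm_retrieval.utils.rank_fusion import weighted_reciprocal_rank


@dataclass
class Chunk:  # hypothetical minimal Fuseable: an id plus a writable score
    id: str
    score: Optional[float] = None


chunk1, chunk2, chunk3, chunk4 = (Chunk(id=f"chunk{i}") for i in range(1, 5))
filtered_results = [chunk1, chunk2, chunk3]  # Ranked by entity filtering
semantic_results = [chunk2, chunk1, chunk4]  # Ranked by semantic similarity
fused = weighted_reciprocal_rank(
    [filtered_results, semantic_results],
    weights=[0.2, 0.8],
    rank_constant=60,
)
# Returns chunks ordered by combined RRF scores
```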

Parameters:

Name	Type	Description	Default
`doc_lists`	`list[list[FuseableT]]`	Ranked lists of Fuseable objects to merge. Must match the length of `weights`.	required
`weights`	`list[float]`	Weights for each rank list. Higher weights give more importance to that retrieval source.	required
`rank_constant`	`int`	The rank constant (k) that controls the influence of rank position. Higher values reduce the impact of rank differences. Defaults to 60.	`60`
`set_scores`	`bool`	If True, sets the `score` attribute on each document to its RRF score. Defaults to False.	`False`

Returns:

Type	Description
`list[FuseableT]`	The aggregated list of unique documents, sorted by weighted RRF score in descending order (highest score first).

Raises:

Type	Description
`ValueError`	If the number of rank lists doesn't match the weights count.

Note:

Documents are deduplicated by their `id` field.
The `rank_constant` parameter controls the influence of rank position; higher values reduce the impact of rank differences.
The fused scores do not depend on the order of `doc_lists`, as long as `weights` is reordered to match.
Lists may overlap (the same ID in multiple lists); the contributions from each list are summed into a single score for that document.
Empty inner lists are allowed and contribute nothing to any score.
