
Relevance Filter

Modules used to filter chunks based on their relevance to a given query.

LMBasedRelevanceFilter(lm_request_processor, batch_size=DEFAULT_BATCH_SIZE, on_failure_keep_all=True, metadata=None, chunk_format=DEFAULT_CHUNK_TEMPLATE)

Bases: BaseRelevanceFilter, UsesLM

Relevance filter that uses an LM to determine chunk relevance.

This filter processes chunks in batches, sending them to an LM for relevance determination. It handles potential LM processing failures with a simple strategy controlled by the 'on_failure_keep_all' parameter.
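The batching and failure strategy described above can be sketched roughly as follows. This is an illustrative sketch, not the library's implementation: `filter_in_batches`, `judge_batch`, and `flaky_judge` are hypothetical names, chunks are plain strings, and the sketch assumes the failure policy is applied per batch.

```python
from typing import Callable, Sequence


def filter_in_batches(
    chunks: Sequence[str],
    judge_batch: Callable[[Sequence[str]], list[bool]],
    batch_size: int = 2,
    on_failure_keep_all: bool = True,
) -> list[str]:
    """Keep relevant chunks, judging them one batch at a time."""
    kept: list[str] = []
    for start in range(0, len(chunks), batch_size):
        batch = chunks[start:start + batch_size]
        try:
            flags = judge_batch(batch)
            if len(flags) != len(batch):
                raise ValueError("expected one verdict per chunk")
        except Exception:
            # Failure policy (assumed per batch): keep the whole batch or drop it.
            if on_failure_keep_all:
                kept.extend(batch)
            continue
        kept.extend(chunk for chunk, ok in zip(batch, flags) if ok)
    return kept


def flaky_judge(batch: Sequence[str]) -> list[bool]:
    # Hypothetical stand-in for an LM call: fails if the batch contains "boom".
    if "boom" in batch:
        raise RuntimeError("LM call failed")
    return ["cat" in chunk for chunk in batch]


chunks = ["cat facts", "dog facts", "boom", "cat toys"]
kept_on_failure = filter_in_batches(chunks, flaky_judge, on_failure_keep_all=True)
dropped_on_failure = filter_in_batches(chunks, flaky_judge, on_failure_keep_all=False)
```

With `on_failure_keep_all=True` the failed second batch is passed through unfiltered. With `False` it is dropped, so only chunks from successful batches can survive.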

The LM is expected to return a specific output format for each chunk, indicating its relevance to the given query.

The expected LM output format is:

    {
        "results": [
            {
                "explanation": str,
                "is_relevant": bool
            },
            ...
        ]
    }

The number of items in "results" should match the number of input chunks.
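As a rough illustration of this contract, the sketch below parses an output of the shape shown above and checks that the result count matches the chunk count. It assumes the output arrives as a JSON string and that results are in the same order as the input chunks; neither is stated by the source. `select_relevant` is a hypothetical helper, not part of the library.

```python
import json


def select_relevant(chunks: list[str], lm_output: str) -> list[str]:
    """Return the chunks the LM marked as relevant."""
    results = json.loads(lm_output)["results"]
    # Documented contract: one result per input chunk.
    if len(results) != len(chunks):
        raise ValueError(f"expected {len(chunks)} results, got {len(results)}")
    # Assumption: results are positionally aligned with the chunks.
    return [chunk for chunk, r in zip(chunks, results) if r["is_relevant"]]


chunks = ["Paris is the capital of France.", "Bananas are yellow."]
lm_output = json.dumps({
    "results": [
        {"explanation": "Directly answers the question.", "is_relevant": True},
        {"explanation": "Unrelated to the question.", "is_relevant": False},
    ]
})
relevant = select_relevant(chunks, lm_output)
```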

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| lm_request_processor | `LMRequestProcessor` | The LM request processor used for LM calls. |
| batch_size | `int` | The number of chunks to process in each LM call. |
| on_failure_keep_all | `bool` | If True, keep all chunks when LM processing fails. If False, discard all chunks from the failed batch. |
| metadata | `list[str] \| None` | List of metadata fields to include. If None, no metadata is included. |
| chunk_format | `str \| Callable[[Chunk], str]` | Either a format string or a callable for custom chunk formatting. In a format string, use {content} for the chunk content, {metadata} for the auto-formatted metadata block, or reference metadata fields directly as {field_name}. |
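To make the two `chunk_format` modes concrete, here is a minimal sketch of how a format string or callable might be applied to a chunk. `SimpleChunk` and `render_chunk` are hypothetical stand-ins, not the library's `Chunk` class or its formatting code. The "key: value" layout of the metadata block is also an assumption, since the source does not specify it.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Union


@dataclass
class SimpleChunk:
    # Hypothetical stand-in for the library's Chunk type.
    content: str
    metadata: dict = field(default_factory=dict)


def render_chunk(
    chunk: SimpleChunk,
    chunk_format: Union[str, Callable[[SimpleChunk], str]],
    metadata_fields: Optional[list[str]] = None,
) -> str:
    """Render a chunk with either a callable or a format string."""
    if callable(chunk_format):
        return chunk_format(chunk)
    # Only the requested metadata fields are exposed to the template.
    selected = {
        key: chunk.metadata[key]
        for key in (metadata_fields or [])
        if key in chunk.metadata
    }
    # Assumed layout for the auto-formatted block: one "key: value" per line.
    metadata_block = "\n".join(f"{k}: {v}" for k, v in selected.items())
    values = {**selected, "content": chunk.content, "metadata": metadata_block}
    return chunk_format.format(**values)


chunk = SimpleChunk("Paris is the capital of France.", {"source": "wiki", "page": 3})
by_field = render_chunk(chunk, "[{source}] {content}", ["source"])
by_block = render_chunk(chunk, "{content}\n{metadata}", ["source", "page"])
by_callable = render_chunk(chunk, lambda c: c.content.upper())
```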

Initialize the LMBasedRelevanceFilter.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| lm_request_processor | `LMRequestProcessor` | The LM request processor to use for LM calls. | required |
| batch_size | `int` | The number of chunks to process in each LM call. Defaults to DEFAULT_BATCH_SIZE. | `DEFAULT_BATCH_SIZE` |
| on_failure_keep_all | `bool` | If True, keep all chunks when LM processing fails. If False, discard all chunks from the failed batch. Defaults to True. | `True` |
| metadata | `list[str] \| None` | List of metadata fields to include. If None, no metadata is included. | `None` |
| chunk_format | `str \| Callable[[Chunk], str]` | Either a format string or a callable for custom chunk formatting. In a format string, use {content} for the chunk content, {metadata} for the auto-formatted metadata block, or reference metadata fields directly as {field_name}. Defaults to DEFAULT_CHUNK_TEMPLATE. | `DEFAULT_CHUNK_TEMPLATE` |

SimilarityBasedRelevanceFilter(em_invoker, threshold=0.5)

Bases: BaseRelevanceFilter

Relevance filter that uses semantic similarity to determine chunk relevance.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| em_invoker | `BaseEMInvoker` | The embedding model invoker to use for vectorization. |
| threshold | `float` | The similarity threshold for relevance (0 to 1). Defaults to 0.5. |

Initialize the SimilarityBasedRelevanceFilter.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| em_invoker | `BaseEMInvoker` | The embedding model invoker to use for vectorization. | required |
| threshold | `float` | The similarity threshold for relevance (0 to 1). Defaults to 0.5. | `0.5` |

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If the threshold is not between 0 and 1. |
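
The thresholding idea can be sketched as below. This is a sketch under stated assumptions, not the library's implementation. The source says only "semantic similarity", so cosine similarity, the inclusive `>=` comparison, and the use of precomputed vectors in place of an `em_invoker` call are all assumptions. `filter_by_similarity` is a hypothetical helper.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filter_by_similarity(
    query_vec: list[float],
    chunk_vecs: list[list[float]],
    threshold: float = 0.5,
) -> list[int]:
    """Return indices of chunks whose similarity to the query meets the threshold."""
    if not 0 <= threshold <= 1:
        raise ValueError("threshold must be between 0 and 1")
    return [
        index
        for index, vec in enumerate(chunk_vecs)
        if cosine_similarity(query_vec, vec) >= threshold
    ]


query = [1.0, 0.0]
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # similarities: 1.0, 0.0, about 0.707
default_hits = filter_by_similarity(query, vectors)
strict_hits = filter_by_similarity(query, vectors, threshold=0.8)
```

Raising the threshold from 0.5 to 0.8 drops the diagonal vector, whose similarity to the query is about 0.707.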