Aggregation strategy

Strategy objects for repeated-judge aggregation.

This module provides polymorphic aggregation strategies for repeated-judge metric evaluation while preserving compatibility with the public AggregationMethod enum and string selectors.

AverageAggregationStrategy

Bases: BaseJudgeAggregator

Aggregate repeated judge results using arithmetic mean.

strategy property

Return the aggregation identifier for arithmetic averaging.

Returns:

AggregationMethod: AggregationMethod.AVERAGE.

aggregate(all_results, total_judges)

Aggregate judge results by computing the arithmetic mean score.

The representative result is chosen as the valid judge output whose numeric score is closest to the computed average, using input order as the tie-breaker.

Parameters:

all_results (list[MetricOutput], required): Raw results produced by each judge, including successful outputs and optional error payloads.
total_judges (int, required): Total number of judges configured for the evaluation run.

Returns:

MetricOutput: Representative result annotated with average-based metadata.

Raises:

ValueError: If no valid judge results exist, or if any valid score is non-numeric.
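
The representative-selection rule described for this method (closest score to the mean, earliest input wins ties) can be sketched on bare scores. This is a minimal illustration, not the module's implementation: the helper name `pick_closest_to_mean` is hypothetical, and it works on plain floats rather than MetricOutput objects.

```python
from statistics import fmean


def pick_closest_to_mean(scores: list[float]) -> tuple[float, int]:
    """Return (mean, index of the score closest to the mean).

    Hypothetical helper: assumes the scores were already filtered
    down to valid numeric values.
    """
    if not scores:
        raise ValueError("no valid scores to average")
    mean = fmean(scores)
    # min() returns the first minimal element, so when two scores are
    # equally close to the mean, the one that came first in the input wins.
    index = min(range(len(scores)), key=lambda i: abs(scores[i] - mean))
    return mean, index


mean, index = pick_closest_to_mean([1.0, 4.0, 2.0, 3.0])
# The mean is 2.5; 2.0 and 3.0 are equally close, so index 2 is chosen.
print(mean, index)
```

Returning an index rather than the mean itself keeps the representative tied to an actual judge output, even when the mean is a value no judge produced.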

BaseJudgeAggregator

Bases: ABC

Abstract strategy for repeated-judge result aggregation.

strategy abstractmethod property

Return the canonical aggregation strategy identifier.

Returns:

AggregationMethod: Enum value that identifies the aggregation strategy.

aggregate(all_results, total_judges) abstractmethod

Aggregate repeated metric results into one representative result.

Parameters:

all_results (list[MetricOutput], required): Raw results produced by each judge, including successful outputs and optional error payloads.
total_judges (int, required): Total number of judges configured for the evaluation run.

Returns:

MetricOutput: Representative aggregated result containing the selected score, metadata, and supporting judge context.

Raises:

ValueError: If the implementation cannot produce a valid aggregate from the input.
TypeError: If the implementation rejects the provided input type or strategy.
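
The abstract contract above (an abstract `strategy` property plus an abstract `aggregate` method) can be illustrated with a minimal sketch. The class names `SketchAggregator` and `FirstResultAggregator` and the string identifier `"first"` are hypothetical stand-ins, not part of this module; the real base class reports an AggregationMethod value and operates on MetricOutput objects.

```python
from abc import ABC, abstractmethod


class SketchAggregator(ABC):
    """Hypothetical stand-in for BaseJudgeAggregator."""

    @property
    @abstractmethod
    def strategy(self) -> str:
        """Identifier for the strategy (a plain string in this sketch)."""

    @abstractmethod
    def aggregate(self, all_results: list, total_judges: int):
        """Reduce repeated judge results to one representative result."""


class FirstResultAggregator(SketchAggregator):
    """Toy subclass (hypothetical): returns the first result unchanged."""

    @property
    def strategy(self) -> str:
        return "first"

    def aggregate(self, all_results: list, total_judges: int):
        if not all_results:
            raise ValueError("no judge results to aggregate")
        return all_results[0]


aggregator = FirstResultAggregator()
print(aggregator.strategy, aggregator.aggregate([0.7, 0.9], total_judges=2))
```

Because both members are abstract, Python refuses to instantiate the base class directly; only subclasses that define both can be constructed.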

aggregate_repeated_results(all_results, total_judges)

Extract valid repeated-judge results and supporting metadata.

Parameters:

all_results (list[MetricOutput], required): Raw results produced by each judge, including successful outputs and optional error payloads.
total_judges (int, required): Total number of judges configured for the evaluation run.

Returns:

RepeatedResults: Tuple containing the valid results, their scores, their original indices in all_results, and the collected judge error messages.

Raises:

ValueError: If no judge results are provided, or if every result is invalid after filtering out missing scores and explicit errors.
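
The filtering step described here can be sketched as follows. Everything in the block is an assumption made for illustration: `FakeOutput` is a hypothetical stand-in for MetricOutput, its `score` and `error` field names are guesses, and `split_valid_results` is not the module's function. The sketch also omits `total_judges`, since the documentation does not say how it is used.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeOutput:
    """Hypothetical stand-in for MetricOutput; field names are assumptions."""
    score: Optional[float] = None
    error: Optional[str] = None


def split_valid_results(all_results: list[FakeOutput]):
    """Return (valid results, valid scores, original indices, error messages)."""
    if not all_results:
        raise ValueError("no judge results were provided")
    valid, scores, indices, errors = [], [], [], []
    for i, result in enumerate(all_results):
        if result.error is not None:
            errors.append(result.error)  # explicit judge error: record and skip
            continue
        if result.score is None:
            continue  # missing score: skip silently
        valid.append(result)
        scores.append(result.score)
        indices.append(i)  # remember where this result sat in the input
    if not valid:
        raise ValueError("every judge result was invalid")
    return valid, scores, indices, errors


results = [FakeOutput(score=0.8), FakeOutput(error="timeout"),
           FakeOutput(), FakeOutput(score=0.6)]
valid, scores, indices, errors = split_valid_results(results)
print(scores, indices, errors)
```

Keeping the original indices alongside the scores lets a strategy map its chosen score back to the judge that produced it.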

MajorityVoteAggregationStrategy

Bases: BaseJudgeAggregator

Aggregate repeated judge results using majority vote.

strategy property

Return the aggregation identifier for majority vote.

Returns:

AggregationMethod: AggregationMethod.MAJORITY_VOTE.

aggregate(all_results, total_judges)

Aggregate judge results by selecting the most frequent numeric score.

Ties are resolved by delegating to the median strategy so the result still maps to a real judge output.

Parameters:

all_results (list[MetricOutput], required): Raw results produced by each judge, including successful outputs and optional error payloads.
total_judges (int, required): Total number of judges configured for the evaluation run.

Returns:

MetricOutput: Representative result annotated with majority-vote metadata.

Raises:

ValueError: If no valid judge results exist, or if any valid score is non-numeric.
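
A rough sketch of majority voting with a median fallback, working on bare scores. The function names are hypothetical, and one detail is an assumption: on a tie, this sketch takes the upper median over all valid scores, not just the tied ones. The documentation only states that ties are delegated to the median strategy.

```python
from collections import Counter


def upper_median(scores: list[float]) -> float:
    """Upper median: the element at position len // 2 after sorting."""
    ordered = sorted(scores)
    return ordered[len(ordered) // 2]


def majority_vote(scores: list[float]) -> float:
    """Most frequent score, falling back to the upper median on a tie."""
    if not scores:
        raise ValueError("no valid scores to vote on")
    counts = Counter(scores)
    top_count = max(counts.values())
    winners = [score for score, count in counts.items() if count == top_count]
    if len(winners) == 1:
        return winners[0]
    # Tie between two or more scores: delegate to the median (assumption:
    # computed over every valid score, not only the tied candidates).
    return upper_median(scores)


print(majority_vote([1, 2, 2, 3]))  # clear winner: 2 appears twice
print(majority_vote([1, 1, 3, 3]))  # tie between 1 and 3, median decides
```

Both branches return a value that some judge actually produced, which matches the stated goal of mapping the aggregate to a real judge output.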

MedianAggregationStrategy

Bases: BaseJudgeAggregator

Aggregate repeated judge results using the observed median.

strategy property

Return the aggregation identifier for median selection.

Returns:

AggregationMethod: AggregationMethod.MEDIAN.

aggregate(all_results, total_judges)

Aggregate judge results by selecting the observed median score.

For even-sized inputs, this strategy chooses the upper median instead of the arithmetic mean so the representative output still corresponds to an actual judge result.

Parameters:

all_results (list[MetricOutput], required): Raw results produced by each judge, including successful outputs and optional error payloads.
total_judges (int, required): Total number of judges configured for the evaluation run.

Returns:

MetricOutput: Representative result annotated with median-selection metadata.

Raises:

ValueError: If no valid judge results exist, or if any valid score is non-numeric.
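
The upper-median rule can be sketched on bare scores. The helper name `upper_median_index` is hypothetical, and the handling of equal scores (a stable sort, so earlier inputs keep their relative order) is an assumption rather than documented behavior.

```python
def upper_median_index(scores: list[float]) -> int:
    """Return the input index of the upper-median score."""
    if not scores:
        raise ValueError("no valid scores to take a median of")
    # Sort indices by score; sorted() is stable, so equal scores keep
    # their input order.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    # For odd n this is the middle element; for even n it is the upper
    # of the two middle elements.
    return order[len(order) // 2]


scores = [0.2, 0.9, 0.4, 0.6]
index = upper_median_index(scores)
# The two middle scores are 0.4 and 0.6; their arithmetic mean (0.5)
# matches no judge, so the upper one is selected instead.
print(index, scores[index])
```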

build_aggregation_strategy(aggregation_method)

Build an aggregation strategy from enum, string, or strategy input.

Parameters:

aggregation_method (AggregationSelector, required): Aggregation strategy expressed as an AggregationMethod enum, a compatible string value, or an existing strategy instance.

Returns:

BaseJudgeAggregator: Aggregation strategy instance matching the requested method.

Raises:

ValueError: If aggregation_method is None or a string that does not map to a supported AggregationMethod value.
TypeError: If aggregation_method has an unsupported type.
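
The selector handling described above might look roughly like this. The sketch is self-contained and uses hypothetical stand-ins: `Method` mimics AggregationMethod (its string values are assumptions), `Strategy` mimics a strategy instance, and `build_strategy` is not the module's function.

```python
from enum import Enum


class Method(Enum):
    """Hypothetical stand-in for AggregationMethod; string values are assumed."""
    AVERAGE = "average"
    MEDIAN = "median"
    MAJORITY_VOTE = "majority_vote"


class Strategy:
    """Hypothetical stand-in for a BaseJudgeAggregator subclass."""

    def __init__(self, method: Method) -> None:
        self.method = method


def build_strategy(selector) -> Strategy:
    """Resolve an enum member, string, or strategy instance to a strategy."""
    if selector is None:
        raise ValueError("aggregation method must not be None")
    if isinstance(selector, Strategy):
        return selector  # already a strategy: pass it through unchanged
    if isinstance(selector, Method):
        return Strategy(selector)
    if isinstance(selector, str):
        try:
            return Strategy(Method(selector))  # Enum lookup by value
        except ValueError:
            raise ValueError(f"unsupported aggregation method: {selector!r}") from None
    raise TypeError(f"unsupported selector type: {type(selector).__name__}")


print(build_strategy("median").method)
print(build_strategy(Method.AVERAGE).method)
```

Accepting all three input forms in one factory lets callers keep passing enum members or strings while new code passes strategy objects directly.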