Aggregation strategy
Strategy objects for repeated-judge aggregation.
This module provides polymorphic aggregation strategies for repeated-judge
metric evaluation while preserving compatibility with the public
AggregationMethod enum and string selectors.
AverageAggregationStrategy
Bases: BaseJudgeAggregator
Aggregate repeated judge results using arithmetic mean.
strategy (property)
Return the aggregation identifier for arithmetic averaging.
Returns:
| Name | Type | Description |
|---|---|---|
| `AggregationMethod` | `AggregationMethod` | |
aggregate(all_results, total_judges)
Aggregate judge results by computing the arithmetic mean score.
The representative result is the valid judge output whose numeric score is closest to the computed average; ties are broken by input order, so the earliest such judge wins.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `all_results` | `list[MetricOutput]` | Raw results produced by each judge, including successful outputs and optional error payloads. | required |
| `total_judges` | `int` | Total number of judges configured for the evaluation run. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| `MetricOutput` | `MetricOutput` | Representative result annotated with average-based metadata. |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If no valid judge results exist, or if any valid score is non-numeric. |
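The closest-to-mean selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the module's implementation: `MetricOutput` is modeled as a hypothetical dataclass with `score`, `error`, and `metadata` fields, `average_aggregate` is a hypothetical name, and the metadata keys are invented. Whether the real strategy also overwrites the representative's score with the mean is not stated here; this sketch keeps the judge's own score and records the mean in metadata.

```python
from dataclasses import dataclass, field, replace
from numbers import Real
from typing import Any, Optional


@dataclass
class MetricOutput:
    # Hypothetical stand-in for the library's MetricOutput.
    score: Any = None
    error: Optional[str] = None
    metadata: dict = field(default_factory=dict)


def average_aggregate(all_results: list[MetricOutput], total_judges: int) -> MetricOutput:
    # Drop results that carry an explicit error or no score at all.
    valid = [r for r in all_results if r.error is None and r.score is not None]
    if not valid:
        raise ValueError("no valid judge results")
    for r in valid:
        if not isinstance(r.score, Real):
            raise ValueError(f"non-numeric score: {r.score!r}")

    scores = [float(r.score) for r in valid]
    mean = sum(scores) / len(scores)

    # min() returns the first index with the smallest distance,
    # so the earliest judge in input order wins ties.
    best = min(range(len(valid)), key=lambda i: abs(scores[i] - mean))

    # Copy the representative and annotate it (metadata keys are hypothetical).
    chosen = valid[best]
    return replace(
        chosen,
        metadata={
            **chosen.metadata,
            "aggregation": "average",
            "average_score": mean,
            "valid_judges": len(valid),
            "total_judges": total_judges,
        },
    )
```

With scores 1, 4, and 3 (plus one errored judge), the mean is 8/3 and the judge that scored 3 is chosen because it is nearest to that mean.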
BaseJudgeAggregator
Bases: ABC
Abstract strategy for repeated-judge result aggregation.
strategy (abstract property)
Return the canonical aggregation strategy identifier.
Returns:
| Name | Type | Description |
|---|---|---|
| `AggregationMethod` | `AggregationMethod` | Enum value that identifies the aggregation strategy. |
aggregate(all_results, total_judges) (abstract method)
Aggregate repeated metric results into one representative result.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `all_results` | `list[MetricOutput]` | Raw results produced by each judge, including successful outputs and optional error payloads. | required |
| `total_judges` | `int` | Total number of judges configured for the evaluation run. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| `MetricOutput` | `MetricOutput` | Representative aggregated result containing the selected score, metadata, and supporting judge context. |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If the implementation cannot produce a valid aggregate from the input. |
| `TypeError` | If the implementation rejects the provided input type or strategy. |
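The abstract contract above (an abstract `strategy` property plus an abstract `aggregate` method) can be sketched with Python's `abc` module. This is a hypothetical illustration, not the library's source: the `AggregationMethod` enum here has a single invented `MAX` member, and `MaxScoreAggregator` and `IncompleteAggregator` are toy subclasses that exist only to show how the contract is enforced.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional


class AggregationMethod(Enum):
    # Hypothetical enum for this sketch only; MAX is not a documented member.
    MAX = "max"


@dataclass
class MetricOutput:
    # Hypothetical stand-in modeling only a score and an optional error.
    score: Any = None
    error: Optional[str] = None


class BaseJudgeAggregator(ABC):
    @property
    @abstractmethod
    def strategy(self) -> AggregationMethod:
        """Return the canonical aggregation strategy identifier."""

    @abstractmethod
    def aggregate(self, all_results: list[MetricOutput], total_judges: int) -> MetricOutput:
        """Aggregate repeated metric results into one representative result."""


class MaxScoreAggregator(BaseJudgeAggregator):
    # Toy subclass: pick the valid result with the highest score.
    @property
    def strategy(self) -> AggregationMethod:
        return AggregationMethod.MAX

    def aggregate(self, all_results: list[MetricOutput], total_judges: int) -> MetricOutput:
        valid = [r for r in all_results if r.error is None and r.score is not None]
        if not valid:
            raise ValueError("no valid judge results")
        # max() keeps the first maximal element, so input order breaks ties.
        return max(valid, key=lambda r: r.score)


class IncompleteAggregator(BaseJudgeAggregator):
    # Overrides strategy but not aggregate, so the class stays abstract.
    @property
    def strategy(self) -> AggregationMethod:
        return AggregationMethod.MAX
```

Instantiating `BaseJudgeAggregator` or `IncompleteAggregator` raises `TypeError`, because Python refuses to create instances of classes that still have unimplemented abstract members; only `MaxScoreAggregator()` succeeds.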
aggregate_repeated_results(all_results, total_judges)
Extract valid repeated-judge results and supporting metadata.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `all_results` | `list[MetricOutput]` | Raw results produced by each judge, including successful outputs and optional error payloads. | required |
| `total_judges` | `int` | Total number of judges configured for the evaluation run. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| `RepeatedResults` | `RepeatedResults` | Tuple containing valid results, valid scores, valid original indices, and collected judge error messages. |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If no judge results are provided, or if every result is invalid after filtering out missing scores and explicit errors. |
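The filtering step above can be sketched as a standalone function. This is a minimal sketch under assumptions: `collect_repeated_results` is a hypothetical name (the documented helper may be a method on the base class), `MetricOutput` is a stand-in dataclass, and the `RepeatedResults` field names and error-message format are invented; the docs only say the tuple holds valid results, scores, original indices, and error messages.

```python
from dataclasses import dataclass
from typing import Any, NamedTuple, Optional


@dataclass
class MetricOutput:
    # Hypothetical stand-in: a judge output with a score and an optional error.
    score: Any = None
    error: Optional[str] = None


class RepeatedResults(NamedTuple):
    # Field names are assumptions; only the tuple's contents are documented.
    valid_results: list[MetricOutput]
    valid_scores: list[Any]
    valid_indices: list[int]
    judge_errors: list[str]


def collect_repeated_results(all_results: list[MetricOutput], total_judges: int) -> RepeatedResults:
    if not all_results:
        raise ValueError("no judge results were provided")
    valid_results, valid_scores, valid_indices, judge_errors = [], [], [], []
    for index, result in enumerate(all_results):
        if result.error is not None:
            # Record the explicit error and skip this judge.
            judge_errors.append(f"judge {index}: {result.error}")
            continue
        if result.score is None:
            # A missing score also makes the result invalid.
            continue
        valid_results.append(result)
        valid_scores.append(result.score)
        valid_indices.append(index)
    if not valid_results:
        raise ValueError(f"no valid results from {total_judges} judges: {judge_errors}")
    return RepeatedResults(valid_results, valid_scores, valid_indices, judge_errors)
```

Keeping the original indices lets a strategy report which judge produced the representative output even after invalid results have been dropped.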
MajorityVoteAggregationStrategy
Bases: BaseJudgeAggregator
Aggregate repeated judge results using majority vote.
strategy (property)
Return the aggregation identifier for majority vote.
Returns:
| Name | Type | Description |
|---|---|---|
| `AggregationMethod` | `AggregationMethod` | |
aggregate(all_results, total_judges)
Aggregate judge results by selecting the most frequent numeric score.
When several scores tie for the highest frequency, the tie is resolved by delegating to the median strategy, so the result still maps to an actual judge output.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `all_results` | `list[MetricOutput]` | Raw results produced by each judge, including successful outputs and optional error payloads. | required |
| `total_judges` | `int` | Total number of judges configured for the evaluation run. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| `MetricOutput` | `MetricOutput` | Representative result annotated with majority-vote metadata. |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If no valid judge results exist, or if any valid score is non-numeric. |
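The vote-then-fallback logic can be sketched over a plain list of valid scores, returning the index of the representative judge. This is a hypothetical sketch, not the module's code: `majority_vote_index` is an invented name, and two choices are assumptions, namely that a unique mode is represented by the first judge that produced it, and that the median fallback runs over all valid scores (using the standard library's `statistics.median_high` for the upper median).

```python
import statistics
from collections import Counter
from numbers import Real


def majority_vote_index(scores: list[float]) -> int:
    """Return the index of the representative score under majority vote."""
    if not scores:
        raise ValueError("no valid judge scores")
    for score in scores:
        if not isinstance(score, Real):
            raise ValueError(f"non-numeric score: {score!r}")

    counts = Counter(scores)
    top = max(counts.values())
    winners = [score for score, count in counts.items() if count == top]

    if len(winners) == 1:
        # Unique mode: represent it with the first judge that produced it.
        return scores.index(winners[0])

    # Tie between equally frequent scores: fall back to the upper median of
    # all valid scores, which is always one of the observed values.
    return scores.index(statistics.median_high(scores))
```

For `[3, 4, 3, 5]` the mode 3 wins outright (index 0); for `[1, 2, 2, 1]` the scores 1 and 2 tie, so the upper median 2 is used and the first judge that scored 2 (index 1) is returned.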
MedianAggregationStrategy
Bases: BaseJudgeAggregator
Aggregate repeated judge results using the observed median.
strategy (property)
Return the aggregation identifier for median selection.
Returns:
| Name | Type | Description |
|---|---|---|
| `AggregationMethod` | `AggregationMethod` | |
aggregate(all_results, total_judges)
Aggregate judge results by selecting the observed median score.
For even-sized inputs, this strategy chooses the upper median instead of averaging the two middle scores, so the representative output still corresponds to an actual judge result.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `all_results` | `list[MetricOutput]` | Raw results produced by each judge, including successful outputs and optional error payloads. | required |
| `total_judges` | `int` | Total number of judges configured for the evaluation run. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| `MetricOutput` | `MetricOutput` | Representative result annotated with median-selection metadata. |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If no valid judge results exist, or if any valid score is non-numeric. |
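Choosing the upper median by index, rather than computing a value, is what guarantees the result maps back to a real judge. A minimal sketch, assuming valid scores arrive as a plain list; `upper_median_index` is a hypothetical helper, and how the real strategy picks among judges with equal scores is not documented here.

```python
import statistics
from numbers import Real


def upper_median_index(scores: list[float]) -> int:
    """Return the index of the judge whose score is the upper median."""
    if not scores:
        raise ValueError("no valid judge scores")
    for score in scores:
        if not isinstance(score, Real):
            raise ValueError(f"non-numeric score: {score!r}")
    # Sort judge indices by score; the stable sort keeps input order
    # among judges with equal scores.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    # n // 2 is the middle position for odd n and the upper of the two
    # middle positions for even n.
    return order[len(scores) // 2]


scores = [4, 1, 3, 2]
idx = upper_median_index(scores)
# The selected score agrees with the standard library's upper median.
print(idx, scores[idx], statistics.median_high(scores))  # 2 3 3
```

For the even-sized input `[4, 1, 3, 2]` the two middle values are 2 and 3; the arithmetic median would be 2.5, which no judge produced, whereas the upper median 3 belongs to the judge at index 2.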
build_aggregation_strategy(aggregation_method)
Build an aggregation strategy from enum, string, or strategy input.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `aggregation_method` | `AggregationSelector` | Aggregation strategy expressed as an enum value, a string selector, or a strategy instance. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| `BaseJudgeAggregator` | `BaseJudgeAggregator` | Aggregation strategy instance matching the requested method. |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If |
| `TypeError` | If |
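The enum/string/instance dispatch can be sketched as below. Everything here is hypothetical: the enum member names and string values, the stub strategy classes (which omit `aggregate`), the `AggregationSelector` alias, and the `build_strategy` name are assumptions for illustration. Which inputs actually trigger `ValueError` versus `TypeError` in the real function is not recoverable from the text above; this sketch raises `ValueError` for an unknown string and `TypeError` for any other input type.

```python
from enum import Enum
from typing import Union


class AggregationMethod(Enum):
    # Hypothetical members and string values; the real enum is not shown above.
    AVERAGE = "average"
    MEDIAN = "median"
    MAJORITY_VOTE = "majority_vote"


class BaseJudgeAggregator:
    # Stub base class; aggregate() is omitted in this sketch.
    strategy: AggregationMethod


class AverageAggregationStrategy(BaseJudgeAggregator):
    strategy = AggregationMethod.AVERAGE


class MedianAggregationStrategy(BaseJudgeAggregator):
    strategy = AggregationMethod.MEDIAN


class MajorityVoteAggregationStrategy(BaseJudgeAggregator):
    strategy = AggregationMethod.MAJORITY_VOTE


AggregationSelector = Union[AggregationMethod, str, BaseJudgeAggregator]

_STRATEGIES = {
    AggregationMethod.AVERAGE: AverageAggregationStrategy,
    AggregationMethod.MEDIAN: MedianAggregationStrategy,
    AggregationMethod.MAJORITY_VOTE: MajorityVoteAggregationStrategy,
}


def build_strategy(aggregation_method: AggregationSelector) -> BaseJudgeAggregator:
    # Strategy instances pass through unchanged.
    if isinstance(aggregation_method, BaseJudgeAggregator):
        return aggregation_method
    # Strings are resolved through the enum's values.
    if isinstance(aggregation_method, str):
        try:
            aggregation_method = AggregationMethod(aggregation_method)
        except ValueError:
            raise ValueError(f"unknown aggregation method: {aggregation_method!r}") from None
    if isinstance(aggregation_method, AggregationMethod):
        return _STRATEGIES[aggregation_method]()
    raise TypeError(f"unsupported selector type: {type(aggregation_method).__name__}")
```

Normalizing every selector to an enum member first means the registry only needs one key per strategy, which is one way to keep string selectors compatible with the public enum.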