Metric

Base class for metrics.

Authors

Douglas Raevan Faisal (douglas.r.faisal@gdplabs.id)
Surya Mahadi (made.r.s.mahadi@gdplabs.id)

References

NONE

BaseMetric

Bases: ABC

Abstract class for metrics.

This class defines the interface for all metrics.

Attributes:
    name (str): The name of the metric.
    required_fields (set[str]): The required fields for this metric to evaluate data.
    input_type (type | None): The type of the input data.
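
A concrete metric subclasses BaseMetric, fills in these attributes, and implements the scoring logic. The sketch below is illustrative only; the import path, the async `_evaluate` override, and the output shape are assumptions based on the interface described on this page, not verbatim library code:

```python
from typing import Any

from gllm_evals.metric.base_metric import BaseMetric  # import path is an assumption


class ExactMatchMetric(BaseMetric):
    """Hypothetical metric: scores 1.0 on an exact answer match, else 0.0."""

    name = "exact_match"
    required_fields = {"generated_answer", "expected_answer"}

    async def _evaluate(self, data: dict[str, Any]) -> dict[str, float]:
        # Compare normalized strings; return a {metric_name: score} mapping.
        match = data["generated_answer"].strip() == data["expected_answer"].strip()
        return {self.name: float(match)}
```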

Example:
    Adding custom prompts to existing evaluator metrics:

    ```python
    import asyncio
    import os

    from gllm_evals import load_simple_qa_dataset
    from gllm_evals.evaluate import evaluate
    from gllm_evals.evaluator.geval_generation_evaluator import GEvalGenerationEvaluator
    from gllm_evals.utils.shared_functionality import inference_fn


    async def main():
        # Main function with custom prompts

        # Load your dataset
        dataset = load_simple_qa_dataset()

        # Create evaluator with default metrics
        evaluator = GEvalGenerationEvaluator(
            model_credentials=os.getenv("GOOGLE_API_KEY")
        )

        # Add custom prompts polymorphically (works for any metric)
        for metric in evaluator.metrics:
            if hasattr(metric, 'name'):  # Ensure metric has name attribute
                # Add custom prompts based on metric name
                if metric.name == "geval_completeness":
                    # Add domain-specific few-shot examples
                    metric.additional_context += "

CUSTOM MEDICAL EXAMPLES: ..." elif metric.name == "geval_groundedness": # Add grounding examples metric.additional_context += "

MEDICAL GROUNDING EXAMPLES: ..."

        # Evaluate with custom prompts applied automatically
        results = await evaluate(
            data=dataset,
            inference_fn=inference_fn,
            evaluators=[evaluator],  # ← Custom prompts applied to metrics
        )

    asyncio.run(main())
    ```

can_evaluate(data)

Check if this metric can evaluate the given data.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `data` | `MetricInput` | The input data to check. | *required* |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| `bool` | `bool` | True if the metric can evaluate the data, False otherwise. |
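
A usage sketch, continuing the hypothetical ExactMatchMetric from above and assuming the check verifies that all of the metric's required_fields are present in the data:

```python
metric = ExactMatchMetric()

complete = {"generated_answer": "Paris", "expected_answer": "Paris"}
print(metric.can_evaluate(complete))  # True — all required fields present

partial = {"generated_answer": "Paris"}
print(metric.can_evaluate(partial))   # False — "expected_answer" is missing
```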

evaluate(data) async

Evaluate the metric on the given dataset (single item or batch).

Automatically handles batch processing by default. Subclasses can override _evaluate to accept lists for optimized batch processing.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `data` | `MetricInput \| list[MetricInput]` | The data to evaluate the metric on. Can be a single item or a list for batch processing. | *required* |

Returns:

| Type | Description |
| --- | --- |
| `MetricOutput \| list[MetricOutput]` | A dictionary where the keys are the namespaces and the values are the scores. Returns a list if the input is a list. |
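
A usage sketch showing both input shapes, again using the hypothetical ExactMatchMetric; the exact output keys are assumptions:

```python
import asyncio


async def main():
    metric = ExactMatchMetric()

    # Single item in -> single MetricOutput out (a name -> score mapping).
    single = await metric.evaluate(
        {"generated_answer": "Paris", "expected_answer": "Paris"}
    )
    print(single)  # e.g. {"exact_match": 1.0}

    # List in -> list of MetricOutput out, one per item.
    batch = await metric.evaluate([
        {"generated_answer": "Paris", "expected_answer": "Paris"},
        {"generated_answer": "Lyon", "expected_answer": "Paris"},
    ])
    print(batch)   # e.g. [{"exact_match": 1.0}, {"exact_match": 0.0}]


asyncio.run(main())
```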

get_input_fields() classmethod

Return declared input field names if input_type is provided; otherwise None.

Returns:

| Type | Description |
| --- | --- |
| `list[str] \| None` | The input fields. |
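
A sketch of declaring input_type so the fields can be introspected; the Pydantic-style schema model and the exact return value are assumptions:

```python
from pydantic import BaseModel


class QAInput(BaseModel):
    question: str
    generated_answer: str


class QAMetric(ExactMatchMetric):
    input_type = QAInput  # declares the input schema for introspection


print(QAMetric.get_input_fields())          # e.g. ["question", "generated_answer"]
print(ExactMatchMetric.get_input_fields())  # None — no input_type declared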

get_input_spec() classmethod

Return structured spec for input fields if input_type is provided; otherwise None.

Returns:

| Type | Description |
| --- | --- |
| `list[dict[str, Any]] \| None` | The input spec. |
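
Continuing the sketch above; the exact spec keys are assumptions, though a structured spec would typically expose at least each field's name and type:

```python
spec = QAMetric.get_input_spec()
print(spec)
# e.g. [{"name": "question", "type": "str"},
#       {"name": "generated_answer", "type": "str"}]
```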