Metric
Base class for metrics.
BaseMetric
Bases: ABC
Abstract class for metrics.
This class defines the interface for all metrics.
Attributes:
name (str): The name of the metric.
required_fields (set[str]): The required fields for this metric to evaluate data.
input_type (type | None): The type of the input data.
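For illustration, a minimal concrete subclass might declare these attributes as follows. This is a hedged sketch: the import path, the shape of MetricInput (treated here as a mapping of field names to values), and the `_evaluate` hook signature are assumptions based on the method docs on this page, not confirmed API.
```python
from gllm_evals.metric.metric import BaseMetric  # assumed path; adjust to wherever BaseMetric lives


class ExactMatchMetric(BaseMetric):
    """Hypothetical metric: scores 1.0 when the answer exactly matches the reference."""

    name = "exact_match"
    required_fields = {"generated_answer", "expected_answer"}
    input_type = None  # no structured input type declared

    async def _evaluate(self, data):
        # `data` is assumed to be a single MetricInput: a mapping that
        # contains the fields listed in required_fields.
        score = float(data["generated_answer"] == data["expected_answer"])
        # Keyed by the metric's namespace, per the evaluate() docs below.
        return {self.name: score}
```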
Example:
Adding custom prompts to existing evaluator metrics:
```python
import asyncio
import os

from gllm_evals import load_simple_qa_dataset
from gllm_evals.evaluate import evaluate
from gllm_evals.evaluator.geval_generation_evaluator import GEvalGenerationEvaluator
from gllm_evals.utils.shared_functionality import inference_fn


async def main():
    # Load your dataset
    dataset = load_simple_qa_dataset()

    # Create evaluator with default metrics
    evaluator = GEvalGenerationEvaluator(
        model_credentials=os.getenv("GOOGLE_API_KEY")
    )

    # Add custom prompts polymorphically (works for any metric)
    for metric in evaluator.metrics:
        if hasattr(metric, "name"):  # Ensure metric has a name attribute
            # Add custom prompts based on metric name
            if metric.name == "geval_completeness":
                # Add domain-specific few-shot examples
                metric.additional_context += "\nCUSTOM MEDICAL EXAMPLES: ..."
            elif metric.name == "geval_groundedness":
                # Add grounding examples
                metric.additional_context += "\nMEDICAL GROUNDING EXAMPLES: ..."

    # Evaluate with custom prompts applied automatically
    results = await evaluate(
        data=dataset,
        inference_fn=inference_fn,
        evaluators=[evaluator],  # ← Custom prompts applied to metrics
    )


asyncio.run(main())
```
can_evaluate(data)
Check if this metric can evaluate the given data.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data | MetricInput | The input data to check. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| bool | bool | True if the metric can evaluate the data, False otherwise. |
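As a usage sketch, can_evaluate can guard a call to evaluate so items missing required fields are skipped rather than failing mid-run. This reuses the hypothetical ExactMatchMetric sketched above and assumes MetricInput is a mapping of field names to values.
```python
import asyncio


async def run_one(metric, data):
    # Skip items the metric cannot score instead of raising downstream.
    if not metric.can_evaluate(data):
        missing = metric.required_fields - data.keys()
        print(f"Skipping {metric.name}: missing fields {missing}")
        return None
    return await metric.evaluate(data)


# Hypothetical item with both required fields present.
result = asyncio.run(
    run_one(ExactMatchMetric(), {"generated_answer": "Paris", "expected_answer": "Paris"})
)
```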
evaluate(data)
async
Evaluate the metric on the given dataset (single item or batch).
Automatically handles batch processing by default. Subclasses can override
_evaluate to accept lists for optimized batch processing (see the sketch after the tables below).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data | MetricInput \| list[MetricInput] | The data to evaluate the metric on. Can be a single item or a list for batch processing. | required |
Returns:

| Type | Description |
|---|---|
| MetricOutput \| list[MetricOutput] | A dictionary where the keys are the namespaces and the values are the scores. Returns a list if the input is a list. |
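Since subclasses may override _evaluate to accept lists, a batch-optimized metric could score a whole list in one pass. The sketch below is illustrative only: the scoring body is a placeholder, and the exact contract between evaluate and _evaluate (how lists are forwarded) is an assumption based on the description above.
```python
class BatchedExactMatchMetric(BaseMetric):
    """Hypothetical metric that scores a whole batch in one pass."""

    name = "batched_exact_match"
    required_fields = {"generated_answer", "expected_answer"}
    input_type = None

    async def _evaluate(self, data):
        if isinstance(data, list):
            # One pass over the batch; a real implementation might issue a
            # single batched model call here instead of a per-item loop.
            return [{self.name: self._score(item)} for item in data]
        return {self.name: self._score(data)}

    def _score(self, item):
        # Placeholder scoring logic (hypothetical).
        return float(item["generated_answer"] == item["expected_answer"])
```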
get_input_fields()
classmethod
Return declared input field names if input_type is provided; otherwise None.
Returns:

| Type | Description |
|---|---|
| list[str] \| None | The declared input field names, or None if input_type is not provided. |
get_input_spec()
classmethod
Return structured spec for input fields if input_type is provided; otherwise None.
Returns:

| Type | Description |
|---|---|
| list[dict[str, Any]] \| None | A structured spec describing each input field, or None if input_type is not provided. |
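Together, these classmethods let callers introspect a metric's declared inputs before building data for it. Both return None when no input_type is declared, so handle that case; the example below reuses the hypothetical ExactMatchMetric from earlier, which declares input_type = None.
```python
fields = ExactMatchMetric.get_input_fields()
spec = ExactMatchMetric.get_input_spec()

if fields is None:
    # No input_type declared; fall back to the coarser required_fields set.
    print("Required fields:", ExactMatchMetric.required_fields)
else:
    for entry in spec:
        # Each entry is a dict describing one input field; the exact keys
        # are implementation-defined (see the return type above).
        print(entry)
```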