LM-based metric
This module contains the LLM-based metric class.
LMBasedMetric(name, response_schema, prompt_builder, model=DefaultValues.MODEL, model_credentials=None, model_config=None, parse_response_fn=None, batch_status_check_interval=DefaultValues.BATCH_STATUS_CHECK_INTERVAL, batch_max_iterations=DefaultValues.BATCH_MAX_ITERATIONS)
Bases: BaseMetric
A multipurpose LM-based metric class.
This class implements a general-purpose metric whose evaluation is performed by a language model (LM). It is configured by providing a response schema, a prompt builder, a model ID, and model credentials.
Attributes:
| Name | Type | Description |
|---|---|---|
| name | str | The name of the metric. |
| response_schema | ResponseSchema | The response schema to use for the metric. |
| prompt_builder | PromptBuilder | The prompt builder to use for the metric. |
| model_credentials | str | The model credentials to use for the metric. |
| model | Union[str, ModelId, BaseLMInvoker] | The model to use for the metric. |
| model_config | dict[str, Any] \| None | The model config to use for the metric. Defaults to an empty dictionary. |
Initialize the LMBasedMetric class.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | The name of the metric. | required |
| response_schema | ResponseSchema | The response schema to use for the metric. | required |
| prompt_builder | PromptBuilder | The prompt builder to use for the metric. | required |
| model | Union[str, ModelId, BaseLMInvoker] | The model to use for the metric. | MODEL |
| model_credentials | str \| None | The model credentials to use for the metric. Defaults to None. | None |
| model_config | dict[str, Any] \| None | The model config to use for the metric. Defaults to an empty dictionary. | None |
| parse_response_fn | Callable[[str \| LMOutput], MetricOutput] \| None | The function used to parse the response from the LM. When None, a default parsing function is used. | None |
| batch_status_check_interval | float | Time between batch status checks, in seconds. Defaults to 30.0. | BATCH_STATUS_CHECK_INTERVAL |
| batch_max_iterations | int | Maximum number of status check iterations before timeout. Defaults to 120 (60 minutes with the default interval). | BATCH_MAX_ITERATIONS |
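The relationship between the two batch parameters can be sketched as follows. This is a minimal illustration, not the library's implementation: `max_wait_seconds` and `poll_until_done` are hypothetical helpers, and only the two default values (30.0 seconds and 120 iterations) are taken from the parameter descriptions above.

```python
import time
from typing import Callable

# Default values quoted in the parameter descriptions above.
BATCH_STATUS_CHECK_INTERVAL = 30.0  # seconds between status checks
BATCH_MAX_ITERATIONS = 120          # status checks before timing out


def max_wait_seconds(interval: float, max_iterations: int) -> float:
    """Upper bound on the total time spent sleeping between checks."""
    return interval * max_iterations


def poll_until_done(
    is_done: Callable[[], bool],
    interval: float = BATCH_STATUS_CHECK_INTERVAL,
    max_iterations: int = BATCH_MAX_ITERATIONS,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Hypothetical polling loop; returns the number of checks performed.

    Raises TimeoutError if the batch is not done after max_iterations checks.
    """
    for iteration in range(1, max_iterations + 1):
        if is_done():
            return iteration
        sleep(interval)
    raise TimeoutError(f"batch not finished after {max_iterations} checks")


total = max_wait_seconds(BATCH_STATUS_CHECK_INTERVAL, BATCH_MAX_ITERATIONS)
print(total / 60)  # 60.0 (minutes)
```

Raising `batch_max_iterations` or `batch_status_check_interval` extends the timeout proportionally; a shorter interval detects completion sooner at the cost of more status requests.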
evaluate(data)
async
Evaluate with custom prompt lifecycle support.
Overrides BaseMetric.evaluate() to add custom prompt application and state management. This ensures custom prompts are applied before evaluation and state is restored after.
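The apply-then-restore lifecycle described above can be pictured with a `try`/`finally` block. The sketch below is an assumption about the general pattern only: `SketchMetric`, its `prompt` attribute, and the `custom_prompt` key are hypothetical stand-ins and are not part of the documented API.

```python
import asyncio
from typing import Any, Optional


class SketchMetric:
    """Hypothetical stand-in; not the real LMBasedMetric."""

    def __init__(self, prompt: str) -> None:
        self.prompt = prompt

    async def _run(self, data: dict[str, Any]) -> dict[str, Any]:
        # Placeholder for the actual LM call.
        return {"prompt_used": self.prompt, "input": data["text"]}

    async def evaluate(self, data: dict[str, Any]) -> dict[str, Any]:
        custom: Optional[str] = data.get("custom_prompt")
        original = self.prompt  # remember the current state
        try:
            if custom is not None:
                self.prompt = custom  # apply the custom prompt
            return await self._run(data)
        finally:
            self.prompt = original  # restore state, even if _run raises


metric = SketchMetric(prompt="default prompt")
result = asyncio.run(
    metric.evaluate({"text": "hi", "custom_prompt": "custom prompt"})
)
```

Restoring in `finally` means a failed evaluation cannot leave the metric stuck with one item's custom prompt.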
For batch processing, the efficient batch API is used when all items share the same custom prompts. When items have different custom prompts, evaluation falls back to per-item processing.
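The choice between the batch path and the per-item fallback could look roughly like this. It is a sketch under stated assumptions: items are modelled as plain dictionaries, and the `custom_prompts` key and both function names are hypothetical.

```python
from typing import Any


def same_custom_prompts(items: list[dict[str, Any]]) -> bool:
    """True when every item carries the same custom prompts (or none at all)."""
    prompts = [item.get("custom_prompts") for item in items]
    return all(p == prompts[0] for p in prompts)


def choose_strategy(items: list[dict[str, Any]]) -> str:
    """Pick the batch path only when one prompt set applies to all items."""
    return "batch" if same_custom_prompts(items) else "per_item"


uniform = [
    {"text": "a", "custom_prompts": {"system": "Be strict."}},
    {"text": "b", "custom_prompts": {"system": "Be strict."}},
]
mixed = [
    {"text": "a", "custom_prompts": {"system": "Be strict."}},
    {"text": "b", "custom_prompts": {"system": "Be lenient."}},
]
```

Here `choose_strategy(uniform)` selects the batch path, while `choose_strategy(mixed)` falls back to per-item processing because a single batch request cannot carry two different prompt sets.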
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | MetricInput \| list[MetricInput] | Single data item or list of data items to evaluate. | required |
Returns:
| Type | Description |
|---|---|
| MetricOutput \| list[MetricOutput] | Evaluation results with scores namespaced by metric name. |
default_parse_response_fn(response)
Default function to parse the result of the LLM into a MetricOutput.
Assumes the response contains a 'score' field and either a 'reason' or an 'explanation' field.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| response | str \| LMOutput | The response from the LLM, which can be either a string containing JSON or an LMOutput object with structured output. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| MetricOutput | MetricOutput | The parsed response as a dictionary. |
Raises:
| Type | Description |
|---|---|
| ValueError | If the response cannot be parsed or is missing required fields. |
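A parser with the behaviour described in this section might be sketched as below. This is not the library's code: `parse_response_sketch` is a hypothetical name, a plain dictionary stands in for both the structured LMOutput and the returned MetricOutput, and preferring 'reason' over 'explanation' when both are present is an assumption.

```python
import json
from typing import Any, Union


def parse_response_sketch(response: Union[str, dict[str, Any]]) -> dict[str, Any]:
    """Parse a JSON string or an already-structured dict into a score dict."""
    if isinstance(response, str):
        try:
            payload = json.loads(response)
        except json.JSONDecodeError as exc:
            raise ValueError(f"response is not valid JSON: {exc}") from exc
    else:
        payload = response

    if not isinstance(payload, dict):
        raise ValueError("response must decode to a JSON object")
    if "score" not in payload:
        raise ValueError("response is missing the 'score' field")

    # Accept either field name for the textual justification.
    explanation = payload.get("reason", payload.get("explanation"))
    if explanation is None:
        raise ValueError("response is missing 'reason' or 'explanation'")

    return {"score": payload["score"], "explanation": explanation}


parsed = parse_response_sketch('{"score": 1, "reason": "Answer matches."}')
```

A function with the signature documented for `parse_response_fn` can be supplied to the constructor when the model's output does not follow this 'score' plus 'reason' or 'explanation' layout.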