
LM-Based Metric

This module contains the LLM-based metric class.

Authors

Douglas Raevan Faisal (douglas.r.faisal@gdplabs.id)
Surya Mahadi (made.r.s.mahadi@gdplabs.id)

References

None

LMBasedMetric(name, response_schema, prompt_builder, model=DefaultValues.MODEL, model_credentials=None, model_config=None, parse_response_fn=None, batch_status_check_interval=DefaultValues.BATCH_STATUS_CHECK_INTERVAL, batch_max_iterations=DefaultValues.BATCH_MAX_ITERATIONS)

Bases: BaseMetric

A multi-purpose LM-based metric class.

This class implements a general-purpose metric that uses a language model as the evaluator. It is configured with a response schema, a prompt builder, a model, and (optionally) model credentials and a model config, and it produces scores by prompting the model and parsing its response. A construction sketch follows the parameter list below.

Attributes:

name (str): The name of the metric.
response_schema (ResponseSchema): The response schema to use for the metric.
prompt_builder (PromptBuilder): The prompt builder to use for the metric.
model_credentials (str): The model credentials to use for the metric.
model (Union[str, ModelId, BaseLMInvoker]): The model to use for the metric.
model_config (dict[str, Any] | None): The model config to use for the metric. Defaults to an empty dictionary.

Initialize the LMBasedMetric class.

Parameters:

name (str): The name of the metric. Required.
response_schema (ResponseSchema): The response schema to use for the metric. Required.
prompt_builder (PromptBuilder): The prompt builder to use for the metric. Required.
model (Union[str, ModelId, BaseLMInvoker]): The model to use for the metric. Defaults to DefaultValues.MODEL.
model_credentials (str | None): The model credentials to use for the metric. Defaults to None.
model_config (dict[str, Any] | None): The model config to use for the metric. Defaults to an empty dictionary.
parse_response_fn (Callable[[str | LMOutput], MetricOutput] | None): The function used to parse the response from the LM. Defaults to None, in which case default_parse_response_fn is used.
batch_status_check_interval (float): Time between batch status checks in seconds. Defaults to DefaultValues.BATCH_STATUS_CHECK_INTERVAL (30.0).
batch_max_iterations (int): Maximum number of status check iterations before timing out. Defaults to DefaultValues.BATCH_MAX_ITERATIONS (120, i.e. 60 minutes with the default interval).
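
As a minimal construction sketch, the snippet below wires the pieces together. The import path, the pre-built `my_response_schema` and `my_prompt_builder` objects, and the model/credential values are assumptions for illustration; only the keyword arguments come from the signature above.

```python
# Sketch only: the import path and the pre-built schema / prompt builder
# objects are assumed, not taken from this documentation.
from gllm_evals.lm_based_metric import LMBasedMetric  # hypothetical import path

relevance_metric = LMBasedMetric(
    name="relevance",
    response_schema=my_response_schema,   # a ResponseSchema describing the LM's answer
    prompt_builder=my_prompt_builder,     # a PromptBuilder that renders the judging prompt
    model="gpt-4o-mini",                  # may also be a ModelId or a BaseLMInvoker instance
    model_credentials="sk-...",           # optional; defaults to None
    model_config={"temperature": 0.0},    # optional; defaults to an empty dictionary
    batch_status_check_interval=30.0,     # seconds between batch status polls
    batch_max_iterations=120,             # stop polling after 120 checks (60 minutes here)
)
```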

evaluate(data) async

Evaluate with custom prompt lifecycle support.

Overrides BaseMetric.evaluate() to add custom prompt application and state management. This ensures custom prompts are applied before evaluation and state is restored after.

For batch processing, uses efficient batch API when all items have the same custom prompts. Falls back to per-item processing when items have different custom prompts.

Parameters:

data (MetricInput | list[MetricInput]): Single data item or list of data items to evaluate. Required.

Returns:

MetricOutput | list[MetricOutput]: Evaluation results with scores namespaced by metric name.
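
Because evaluate() is async and accepts either a single item or a list, a call might look like the sketch below. It assumes the `relevance_metric` instance from the construction sketch above and treats a MetricInput as a plain dict whose keys match the prompt builder's template; both are assumptions, not documented behavior.

```python
import asyncio

async def main():
    item = {"query": "What is RAG?", "response": "Retrieval-augmented generation ..."}

    # Single item -> single MetricOutput, with scores namespaced by metric name.
    single_result = await relevance_metric.evaluate(item)

    # List of items -> list of MetricOutput. The efficient batch API is used when
    # all items share the same custom prompts; otherwise items are processed
    # one by one.
    batch_results = await relevance_metric.evaluate([item, item, item])
    print(single_result, batch_results)

asyncio.run(main())
```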

default_parse_response_fn(response)

Default function to parse the result of the LLM into a MetricOutput.

Assumes response contains 'score' and 'reason' or 'explanation' fields.

Parameters:

response (str | LMOutput): The response from the LLM, which can be either a string containing JSON or an LMOutput object with structured output. Required.

Returns:

MetricOutput: The parsed response as a dictionary.

Raises:

ValueError: If the response cannot be parsed or is missing required fields.
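
To illustrate the contract, the sketch below shows a custom parser with the documented Callable[[str | LMOutput], MetricOutput] shape, handling the kind of JSON payload the default parser expects ('score' plus 'reason' or 'explanation'). Treating MetricOutput as a plain dict and reading structured output from an LMOutput via a `structured_output` attribute are assumptions for illustration, not documented behavior.

```python
import json

# Sketch of a custom parse function matching the documented
# Callable[[str | LMOutput], MetricOutput] contract.
def strict_parse_response_fn(response):
    if isinstance(response, str):
        payload = json.loads(response)  # the LM is expected to return a JSON string
    else:
        # Assumed attribute name for structured output on LMOutput.
        payload = getattr(response, "structured_output", None) or {}

    if "score" not in payload:
        raise ValueError("LM response is missing the required 'score' field")

    return {
        "score": float(payload["score"]),
        # Mirror the default parser's tolerance for either field name.
        "reason": payload.get("reason") or payload.get("explanation", ""),
    }

# Pass it at construction time to override the default:
# metric = LMBasedMetric(..., parse_response_fn=strict_parse_response_fn)
```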