
Langchain openevals

This module contains the LangChainOpenEvalsMetric and LangChainOpenEvalsLLMAsAJudgeMetric classes.

Authors

Surya Mahadi (made.r.s.mahadi@gdplabs.id)

References

[1] https://github.com/langchain-ai/openevals

LangChainOpenEvalsLLMAsAJudgeMetric(name, prompt, model, system=None, credentials=None, config=None, schema=None, feedback_key='score', continuous=False, choices=None, use_reasoning=True, few_shot_examples=None)

Bases: LangChainOpenEvalsMetric

A metric that uses LangChain and OpenEvals to run LLM-as-a-judge evaluation.

Attributes:

name (str): The name of the metric.
prompt (str): The prompt to use.
model (str | ModelId | BaseLMInvoker): The model to use.

Initialize the LangChainOpenEvalsLLMAsAJudgeMetric.

Parameters:

name (str, required): The name of the metric.
prompt (str, required): The evaluation prompt; it can be a string template, a LangChain prompt template, or a callable that returns a list of chat messages.
model (str | ModelId | BaseLMInvoker, required): The model to use.
system (str | None, default None): Optional system message to prepend to the prompt.
credentials (str | None, default None): The credentials to use for the model.
config (dict[str, Any] | None, default None): The config to use for the model.
schema (ResponseSchema | None, default None): The schema to use for the model.
feedback_key (str, default 'score'): Key used to store the evaluation result.
continuous (bool, default False): If True, the score is a float between 0 and 1; if False, the score is a boolean.
choices (list[float] | None, default None): Optional list of specific float values the score must be chosen from.
use_reasoning (bool, default True): If True, the output includes an explanation for the score.
few_shot_examples (list[FewShotExample] | None, default None): Optional list of example evaluations to append to the prompt.
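
A minimal construction sketch follows. The import path, the "gpt-4o-mini" model identifier, and the {inputs}/{outputs} placeholders in the prompt are assumptions made for illustration; adapt them to your installation and data shape.

```python
# Construction sketch for LangChainOpenEvalsLLMAsAJudgeMetric.
# NOTE: the import path below is hypothetical; use the actual module path in your project.
from your_package.langchain_openevals import LangChainOpenEvalsLLMAsAJudgeMetric

# Assumed OpenEvals-style prompt template with {inputs}/{outputs} placeholders.
CORRECTNESS_PROMPT = """You are an expert grader.
Given the question and the answer, judge whether the answer is factually correct.

Question: {inputs}
Answer: {outputs}
"""

metric = LangChainOpenEvalsLLMAsAJudgeMetric(
    name="correctness",
    prompt=CORRECTNESS_PROMPT,
    model="gpt-4o-mini",   # may also be a ModelId or a BaseLMInvoker instance
    continuous=True,       # score is a float between 0 and 1 instead of a boolean
    use_reasoning=True,    # include an explanation alongside the score
    feedback_key="score",
)
```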

evaluate(data) async

Evaluate with custom prompt lifecycle support.

Overrides BaseMetric.evaluate() to add custom prompt application and state management. This ensures that custom prompts are applied before evaluation and that the original state is restored afterwards.

For batch processing, it checks for custom prompts and handles the input accordingly. Items are currently processed individually; batch optimization can be added in the future.

Parameters:

data (MetricInput | list[MetricInput], required): A single data item or a list of data items to evaluate.

Returns:

MetricOutput | list[MetricOutput]: Evaluation results with scores namespaced by the metric name.
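
A usage sketch for the metric constructed above; the input fields ("inputs"/"outputs") are assumptions about the MetricInput shape, not this library's documented schema.

```python
import asyncio

async def main():
    # Single item: the dict fields here are illustrative; check MetricInput for the real schema.
    result = await metric.evaluate({"inputs": "What is the capital of France?", "outputs": "Paris"})
    print(result)  # scores are namespaced by the metric name, e.g. under "correctness"

    # Batch: a list of items yields a list of results, currently evaluated one by one.
    results = await metric.evaluate([
        {"inputs": "2 + 2", "outputs": "4"},
        {"inputs": "Largest planet?", "outputs": "Jupiter"},
    ])
    print(results)

asyncio.run(main())
```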

LangChainOpenEvalsMetric(name, evaluator)

Bases: BaseMetric

A metric that uses LangChain and OpenEvals.

Attributes:

name (str): The name of the metric.
evaluator (Union[SimpleAsyncEvaluator, Callable[..., Awaitable[Any]]]): The evaluator to use.

Initialize the LangChainOpenEvalsMetric.

Parameters:

name (str, required): The name of the metric.
evaluator (Union[SimpleAsyncEvaluator, Callable[..., Awaitable[Any]]], required): The evaluator to use.
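
A sketch of wrapping a custom async evaluator is shown below. The import path is hypothetical, and the evaluator's calling convention (keyword arguments in, a dict with "key" and "score" out) mirrors OpenEvals-style evaluators rather than anything guaranteed by this class; verify against your data.

```python
# NOTE: hypothetical import path; replace with the actual module path.
from your_package.langchain_openevals import LangChainOpenEvalsMetric


async def exact_match(*, outputs=None, reference_outputs=None, **kwargs):
    """Return 1.0 when the output matches the reference exactly, 0.0 otherwise.

    The keyword-argument signature and the {"key": ..., "score": ...} return shape
    follow the OpenEvals evaluator convention and are assumptions here.
    """
    return {"key": "exact_match", "score": 1.0 if outputs == reference_outputs else 0.0}


metric = LangChainOpenEvalsMetric(name="exact_match", evaluator=exact_match)
```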