LangChain AgentEvals
This module contains the LangChain AgentEvals Metric.
References
[1] https://github.com/langchain-ai/agentevals
LangChainAgentEvalsLLMAsAJudgeMetric(name, prompt, model, credentials=None, config=None, schema=None, feedback_key='trajectory_accuracy', continuous=False, choices=None, use_reasoning=True, few_shot_examples=None, batch_status_check_interval=DefaultValues.BATCH_STATUS_CHECK_INTERVAL, batch_max_iterations=DefaultValues.BATCH_MAX_ITERATIONS)
Bases: LangChainAgentEvalsMetric
A metric that uses LangChain AgentEvals to evaluate an agent, using an LLM as the judge.
Available Fields:

- `agent_trajectory` (`list[dict[str, Any]]`): The agent trajectory.
- `expected_agent_trajectory` (`list[dict[str, Any]]`): The expected agent trajectory.
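Both trajectory fields follow the message format used by AgentEvals [1], which represents a trajectory as a list of OpenAI-style chat message dicts, including any tool calls the agent made. A minimal sketch of one such trajectory, with illustrative content:

```python
# Illustrative trajectory: a list of OpenAI-style chat message dicts,
# the format AgentEvals expects for agent trajectories [1].
agent_trajectory = [
    {"role": "user", "content": "What is the weather in Jakarta?"},
    {
        "role": "assistant",
        "tool_calls": [
            {"function": {"name": "get_weather", "arguments": '{"city": "Jakarta"}'}}
        ],
    },
    {"role": "tool", "content": "32 degrees and sunny."},
    {"role": "assistant", "content": "It is 32 degrees and sunny in Jakarta."},
]
```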
Attributes:

| Name | Type | Description |
|---|---|---|
| `name` | `str` | The name of the metric. |
| `evaluator` | `SimpleAsyncEvaluator` | The evaluator to use. |
Initialize the LangChainAgentEvalsLLMAsAJudgeMetric.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | The name of the metric. | required |
| `prompt` | `str` | The evaluation prompt; it can be a string template, a LangChain prompt template, or a callable that returns a list of chat messages. Note that the default prompt allows a rubric in addition to the typical "inputs", "outputs", and "reference_outputs" parameters. | required |
| `model` | `str \| ModelId \| BaseLMInvoker` | The model to use. | required |
| `credentials` | `str \| None` | The credentials to use for the model. Defaults to None. | `None` |
| `config` | `dict[str, Any] \| None` | The config to use for the model. Defaults to None. | `None` |
| `schema` | `ResponseSchema \| None` | The schema to use for the model. Defaults to None. | `None` |
| `feedback_key` | `str` | Key used to store the evaluation result. Defaults to "trajectory_accuracy". | `'trajectory_accuracy'` |
| `continuous` | `bool` | If True, the score is a float between 0 and 1. If False, the score is a boolean. Defaults to False. | `False` |
| `choices` | `list[float] \| None` | Optional list of specific float values the score must be chosen from. Defaults to None. | `None` |
| `use_reasoning` | `bool` | If True, includes an explanation for the score in the output. Defaults to True. | `True` |
| `few_shot_examples` | `list[FewShotExample] \| None` | Optional list of example evaluations to append to the prompt. Defaults to None. | `None` |
| `batch_status_check_interval` | `float` | Interval in seconds between batch status checks. Defaults to 30.0. | `BATCH_STATUS_CHECK_INTERVAL` |
| `batch_max_iterations` | `int` | Maximum number of batch status check iterations. Defaults to 120. | `BATCH_MAX_ITERATIONS` |
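A minimal construction sketch. The import path is hypothetical (adjust it to wherever this class lives in your package), and the model identifier string is an assumption; the remaining arguments follow the parameters above:

```python
# Hypothetical import path -- replace with the real module for your install.
from your_package.metrics import LangChainAgentEvalsLLMAsAJudgeMetric

metric = LangChainAgentEvalsLLMAsAJudgeMetric(
    name="trajectory_judge",
    # The prompt may reference the "inputs", "outputs", and
    # "reference_outputs" parameters noted above.
    prompt=(
        "Grade this agent trajectory for accuracy.\n"
        "Inputs: {inputs}\nOutputs: {outputs}\nReference: {reference_outputs}"
    ),
    model="gpt-4o-mini",  # assumed identifier; a ModelId or BaseLMInvoker also works
    continuous=True,      # score as a float in [0, 1] instead of a boolean
    use_reasoning=True,   # include an explanation alongside the score
)
```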
evaluate(data) async
Evaluate with custom prompt lifecycle support.
Overrides BaseMetric.evaluate() to add custom prompt application and state management. This ensures custom prompts are applied before evaluation and that state is restored afterward.
For batch processing, this method uses the efficient batch API when all items share the same custom prompts, and falls back to per-item processing when their custom prompts differ.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `MetricInput \| list[MetricInput]` | Single data item or list of data items to evaluate. | required |
Returns:

| Type | Description |
|---|---|
| `MetricOutput \| list[MetricOutput]` | Evaluation results with scores namespaced by metric name. |
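A hedged usage sketch, reusing `metric` and `agent_trajectory` from the earlier sketches. The plain-dict input shape and the exact output keys are assumptions about `MetricInput`/`MetricOutput`; only the field names come from the "Available Fields" list above:

```python
import asyncio

async def main() -> None:
    result = await metric.evaluate(
        {
            "agent_trajectory": agent_trajectory,
            # Identical here purely for illustration.
            "expected_agent_trajectory": agent_trajectory,
        }
    )
    # Scores are namespaced by metric name, so expect something shaped like
    # {"trajectory_judge": {...}} (illustrative, not a guaranteed schema).
    print(result)

asyncio.run(main())
```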
LangChainAgentEvalsMetric(name, evaluator)
Bases: BaseMetric
A metric that uses LangChain AgentEvals to evaluate an agent.
Available Fields:

- `agent_trajectory` (`list[dict[str, Any]]`): The agent trajectory.
- `expected_agent_trajectory` (`list[dict[str, Any]]`): The expected agent trajectory.
Attributes:

| Name | Type | Description |
|---|---|---|
| `name` | `str` | The name of the metric. |
| `evaluator` | `SimpleAsyncEvaluator` | The evaluator to use. |
Initialize the LangChainAgentEvalsMetric.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | The name of the metric. | required |
| `evaluator` | `SimpleAsyncEvaluator` | The evaluator to use. | required |
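A sketch pairing this base class with one of AgentEvals' prebuilt trajectory evaluators [1]. The metric's import path is hypothetical; the `agentevals` import and the `trajectory_match_mode` option follow the upstream README, but verify them against your installed version:

```python
# AgentEvals documents async variants of its evaluator factories [1].
from agentevals.trajectory.match import create_async_trajectory_match_evaluator

# Hypothetical import path -- replace with the real module for your install.
from your_package.metrics import LangChainAgentEvalsMetric

# "superset" accepts trajectories whose tool calls are a superset of the
# reference trajectory's tool calls.
evaluator = create_async_trajectory_match_evaluator(
    trajectory_match_mode="superset",
)
metric = LangChainAgentEvalsMetric(name="trajectory_match", evaluator=evaluator)
```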