Agent Evaluator

An evaluator for agent tasks using the LangChain AgentEvals trajectory accuracy metric.

Authors

Apri Dwi Rachmadi (apri.d.rachmadi@gdplabs.id)

References

[1] https://github.com/langchain-ai/agentevals

AgentEvaluator(model=DefaultValues.AGENT_EVALS_MODEL, model_credentials=None, model_config=None, prompt=None, use_reference=True, continuous=False, choices=None, use_reasoning=True, few_shot_examples=None)

Bases: BaseEvaluator

Evaluator for agent tasks.

This evaluator uses the LangChain AgentEvals trajectory accuracy metric to evaluate the performance of AI agents based on their execution trajectories.

Default expected input
  • agent_trajectory (list[dict[str, Any]]): The agent trajectory containing the sequence of actions, tool calls, and responses.
  • expected_agent_trajectory (list[dict[str, Any]] | None, optional): The expected agent trajectory for reference-based evaluation.
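For concreteness, here is a minimal sketch of what these inputs might look like, using the OpenAI-style chat message format that AgentEvals operates on. The tool name (`get_weather`), arguments, and message contents are illustrative only:

```python
import json

# Illustrative trajectory in the OpenAI-style message format used by AgentEvals.
# The tool name and all contents below are made-up examples.
agent_trajectory = [
    {"role": "user", "content": "What is the weather in San Francisco?"},
    {
        "role": "assistant",
        "tool_calls": [
            {
                "function": {
                    "name": "get_weather",
                    "arguments": json.dumps({"city": "San Francisco"}),
                }
            }
        ],
    },
    {"role": "tool", "content": "It is 75 degrees and sunny in San Francisco."},
    {"role": "assistant", "content": "The weather in San Francisco is 75 degrees and sunny."},
]

# With use_reference=True, a reference trajectory of the same shape is also expected.
expected_agent_trajectory = [
    {"role": "user", "content": "What is the weather in San Francisco?"},
    {"role": "assistant", "content": "The weather in San Francisco is 75 degrees and sunny."},
]
```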

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `name` | `str` | The name of the evaluator. |
| `trajectory_accuracy_metric` | `LangChainAgentTrajectoryAccuracyMetric` | The metric used to evaluate agent trajectory accuracy. |

Initialize the AgentEvaluator.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `model` | `str \| ModelId \| BaseLMInvoker` | The model to use for the trajectory accuracy metric. | `DefaultValues.AGENT_EVALS_MODEL` |
| `model_credentials` | `str \| None` | The model credentials. Required for the metric to function properly. | `None` |
| `model_config` | `dict[str, Any] \| None` | The model configuration. | `None` |
| `prompt` | `str \| None` | Custom prompt for evaluation. If `None`, the metric's default prompt is used. | `None` |
| `use_reference` | `bool` | Whether to use `expected_agent_trajectory` for reference-based evaluation. | `True` |
| `continuous` | `bool` | If `True`, the score is a float between 0 and 1; if `False`, it is a boolean. | `False` |
| `choices` | `list[float] \| None` | Optional list of specific float values the score must be chosen from. | `None` |
| `use_reasoning` | `bool` | If `True`, the output includes an explanation for the score. | `True` |
| `few_shot_examples` | `list[Any] \| None` | Optional list of example evaluations to append to the prompt. | `None` |
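A minimal construction sketch follows. The import path and the model identifier are assumptions here (they depend on how the package is laid out and which model providers you use), not part of the documented API:

```python
# Hypothetical import path; adjust to match your installation.
from gllm_evals.evaluator.agent_evaluator import AgentEvaluator

evaluator = AgentEvaluator(
    model="openai/gpt-4o-mini",   # assumed model identifier format
    model_credentials="sk-...",   # required; omitting it raises ValueError
    continuous=True,              # score as a float in [0, 1] instead of a boolean
    use_reasoning=True,           # include an explanation alongside the score
)
```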

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If `model_credentials` is not provided. |

required_fields: set[str] property

Returns the required fields for the data.

Returns:

| Type | Description |
| --- | --- |
| `set[str]` | The required fields for the data. |
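As a usage sketch, this property can drive a pre-flight check on input data before evaluation. The exact field names in the set are an assumption (presumably `agent_trajectory`, plus `expected_agent_trajectory` when `use_reference=True`):

```python
# Hypothetical pre-flight validation, reusing the evaluator and
# trajectories from the examples above.
data = {
    "agent_trajectory": agent_trajectory,
    "expected_agent_trajectory": expected_agent_trajectory,
}

missing = evaluator.required_fields - data.keys()
if missing:
    raise ValueError(f"Input is missing required fields: {missing}")
```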