Evaluate
Evaluate Module.
This module provides a convenience function for evaluating precomputed model outputs.
The evaluation pipeline requires input data to already contain model outputs
(e.g., actual_output); it does not perform live model inference.
async evaluate(data, evaluators, experiment_tracker=None, batch_size=10, allow_batch_evaluation=False, summary_evaluators=None, **kwargs)
Evaluate precomputed model outputs.
Input data must already contain model outputs (e.g. actual_output); this function does not perform live model inference.
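To make the precomputed-output requirement concrete, the rows below already carry the model's answer. Only `actual_output` is named in the docs; the `input` and `expected_output` field names are assumptions for this sketch:

```python
# Hypothetical input rows for evaluate(); only "actual_output" is named
# above -- "input" and "expected_output" are assumed field names.
rows = [
    {"input": "What is 2 + 2?", "actual_output": "4", "expected_output": "4"},
    {"input": "Capital of France?", "actual_output": "Paris", "expected_output": "Paris"},
]

# Every row must already contain the model's output; evaluate() performs
# no live inference, so a row missing "actual_output" is invalid input.
assert all("actual_output" in row for row in rows)
```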
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `str \| BaseDataset \| list[EvalInput]` | The data to evaluate. When a list is given, `LLMTestData` rows are normalized and wrapped in a `DictDataset` before being passed to the Runner. | *required* |
| `evaluators` | `list[BaseEvaluator \| BaseMetric]` | The evaluators to use. | *required* |
| `experiment_tracker` | `BaseExperimentTracker \| None` | The experiment tracker to use. | `None` |
| `batch_size` | `int` | The batch size used for evaluation (runner-level chunking for memory management). | `10` |
| `allow_batch_evaluation` | `bool` | Enable batch-processing mode for LLM API calls. When `True`, the runner passes entire chunks to evaluators for batch processing. | `False` |
| `summary_evaluators` | `list[SummaryEvaluatorCallable] \| None` | Custom summary evaluators that compute batch-level statistics. Each callable receives `(evaluation_results, data)` and returns a dict of summary metrics. | `None` |
| `**kwargs` | `Any` | Additional configuration parameters. | `{}` |
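To make the `summary_evaluators` contract concrete, here is a minimal sketch of one such callable. Only the `(evaluation_results, data)` signature comes from the parameter description; treating each result as a dict with a boolean `passed` key is an assumption:

```python
# Hypothetical summary evaluator. The (evaluation_results, data) signature
# is documented above; the per-result dict shape with a boolean "passed"
# key is an assumption for this sketch.
def pass_rate_summary(evaluation_results, data):
    total = len(evaluation_results)
    passed = sum(1 for r in evaluation_results if r.get("passed"))
    # Return a flat dict of batch-level summary metrics.
    return {"total": total, "pass_rate": passed / total if total else 0.0}

# It would then be supplied as: summary_evaluators=[pass_rate_summary]
```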
Returns:

| Name | Type | Description |
|---|---|---|
| `EvaluationResult` | `EvaluationResult` | Structured result containing evaluation results and experiment URLs/paths. |
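Because `evaluate` is async, it must be awaited (or driven with `asyncio.run`). The snippet below uses a local stand-in for `evaluate` purely to show the call shape; the real function is imported from this module and returns an `EvaluationResult`, not a dict:

```python
import asyncio

# Local stand-in so this sketch runs on its own; the real evaluate() is the
# async function documented above with a richer signature and return type.
async def evaluate(data, evaluators, batch_size=10, **kwargs):
    return {"results": [ev(row) for row in data for ev in evaluators]}

def exact_match(row):
    # Hypothetical evaluator comparing the precomputed output to the expected one.
    return row["actual_output"] == row["expected_output"]

rows = [{"actual_output": "4", "expected_output": "4"}]
result = asyncio.run(evaluate(rows, [exact_match], batch_size=2))
assert result["results"] == [True]
```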