
Runner

Runner module for executing batch evaluation workflows.

This module provides runner classes that orchestrate the execution of evaluation workflows, handling batch processing, async operations, and result collection. Runners coordinate between datasets, evaluators, and experiment trackers to perform comprehensive evaluations.

Components:

- BaseRunner: Abstract base class for all runners
- Runner: Main runner implementation for batch evaluation execution
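For orientation, here is a minimal end-to-end sketch. The import paths, the `ExactMatchEvaluator` name, the dataset path, and the shape expected by `inference_fn` are assumptions for illustration; only `Runner` and its constructor arguments come from the reference below.

```python
import asyncio

# Hypothetical import paths; adjust to the actual package layout.
from evals import Runner
from evals.evaluators import ExactMatchEvaluator

def inference_fn(example: dict) -> str:
    # Stand-in for a real model call: return the model output for one item.
    return example["input"].upper()

runner = Runner(
    data="path/to/dataset.jsonl",        # or a BaseDataset instance
    inference_fn=inference_fn,
    evaluators=[ExactMatchEvaluator()],  # hypothetical evaluator
    batch_size=10,                       # items processed per batch
)

# evaluate() is a coroutine, so drive it with asyncio when called from sync code.
result = asyncio.run(runner.evaluate())
print(result)  # EvaluationResult with evaluation results and experiment URLs/paths
```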

BaseRunner(data, inference_fn, evaluators, experiment_tracker=None, batch_size=10, **kwargs)

Bases: ABC

Abstract base class for runners.

This class defines the interface for all runner implementations.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `data` | `str \| BaseDataset` | The data to evaluate. |
| `inference_fn` | `Callable` | The inference function to use. |
| `evaluators` | `list[BaseEvaluator]` | The evaluators to use. |
| `experiment_tracker` | `BaseExperimentTracker \| None` | The experiment tracker. |
| `batch_size` | `int` | The batch size to use for evaluation. |
| `**kwargs` | `Any` | Additional configuration parameters. |

Initialize the runner.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `data` | `str \| BaseDataset` | The data to evaluate. | *required* |
| `inference_fn` | `Callable` | The inference function to use. | *required* |
| `evaluators` | `list[BaseEvaluator]` | The evaluators to use. | *required* |
| `experiment_tracker` | `BaseExperimentTracker \| None` | The experiment tracker. | `None` |
| `batch_size` | `int` | The batch size to use for evaluation. | `10` |
| `**kwargs` | `Any` | Additional configuration parameters. | `{}` |

evaluate() abstractmethod async

Run the evaluators on the dataset.

The dataset is evaluated in batches of the given batch size.

Returns:

| Name | Type | Description |
| --- | --- | --- |
| `EvaluationResult` | `EvaluationResult` | Structured result containing evaluation results and experiment URLs/paths. |
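Concrete runners subclass `BaseRunner` and implement this coroutine. A minimal sketch of the expected shape, assuming the class is importable from a hypothetical `runners` module:

```python
# Hypothetical import path; adjust to the actual package layout.
from runners import BaseRunner

class MyRunner(BaseRunner):
    """Custom runner that evaluates the dataset in batches."""

    async def evaluate(self):
        # Walk self.data in chunks of self.batch_size, call self.inference_fn
        # on each item, score the outputs with every evaluator in
        # self.evaluators, and return an EvaluationResult.
        ...
```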

Runner(data, inference_fn, evaluators, experiment_tracker=None, batch_size=10, **kwargs)

Bases: BaseRunner

Runner class for evaluating datasets.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `data` | `str \| BaseDataset` | The data to evaluate. |
| `inference_fn` | `Callable` | The inference function to use. |
| `evaluators` | `list[BaseEvaluator]` | The evaluators to use. |
| `experiment_tracker` | `ExperimentTrackerAdapter \| type[BaseExperimentTracker] \| None` | The experiment tracker for logging evaluation results. Can be `None` (uses `SimpleExperimentTracker`, the default), a tracker class (instantiated with the provided kwargs), or a tracker instance (used directly). |
| `**kwargs` | `Any` | Additional configuration parameters. |

Initialize the Runner.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `data` | `str \| BaseDataset` | The data to evaluate. | *required* |
| `inference_fn` | `Callable` | The inference function to use. | *required* |
| `evaluators` | `list[BaseEvaluator]` | The evaluators to use. | *required* |
| `experiment_tracker` | `BaseExperimentTracker \| type[BaseExperimentTracker] \| None` | The experiment tracker for logging evaluation results. Can be `None` (uses `SimpleExperimentTracker`, the default), a tracker class (instantiated with the provided kwargs), or a tracker instance (used directly). | `None` |
| `batch_size` | `int` | The batch size to use for evaluation. | `10` |
| `**kwargs` | `Any` | Additional configuration parameters. | `{}` |
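The three accepted forms of `experiment_tracker` look like this in practice. `CustomTracker` and its `project_name` keyword are hypothetical; `data`, `inference_fn`, and `evaluators` stand for the objects from your own setup:

```python
# None (default): a SimpleExperimentTracker is used.
runner_default = Runner(data, inference_fn, evaluators)

# A tracker class: instantiated with the extra keyword arguments.
runner_from_class = Runner(
    data, inference_fn, evaluators,
    experiment_tracker=CustomTracker,
    project_name="my-eval",  # hypothetical kwarg forwarded to the tracker
)

# A tracker instance: used directly.
runner_from_instance = Runner(data, inference_fn, evaluators,
                              experiment_tracker=CustomTracker())
```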

evaluate() async

Run the evaluators on the dataset.

The dataset is evaluated using the new architecture:

1. Convert the dataset to a standard format
2. Run inference using InferenceHandler
3. Prepare the data for tracking
4. Log to the tracker

Returns:

| Name | Type | Description |
| --- | --- | --- |
| `EvaluationResult` | `EvaluationResult` | Structured result containing evaluation results and experiment URLs/paths. |
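Since `evaluate()` is a coroutine, it can also be awaited from existing async code rather than driven with `asyncio.run`; a short sketch with placeholder arguments:

```python
async def run_eval():
    runner = Runner(data, inference_fn, evaluators)  # placeholders from your setup
    result = await runner.evaluate()
    # result is the EvaluationResult described above (results plus experiment URLs/paths).
    return result
```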

get_run_results(**kwargs)

Get the results of a run.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `**kwargs` | `Any` | Additional configuration parameters. | `{}` |

Returns:

| Type | Description |
| --- | --- |
| `list[dict[str, Any]]` | The results of the run. |
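After an evaluation has run, the logged rows can be pulled back for inspection. The exact keys of each dict depend on the tracker and are not specified here:

```python
rows = runner.get_run_results()
for row in rows:
    # Each row is a dict[str, Any] describing one result of the run.
    print(row)
```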