Runner
Runner module for executing batch evaluation workflows.
This module provides runner classes that orchestrate the execution of evaluation workflows, handling batch processing, async operations, and result collection. Runners coordinate between datasets, evaluators, and experiment trackers to perform comprehensive evaluations.
Components:

- BaseRunner: Abstract base class for all runners
- Runner: Main runner implementation for batch evaluation execution
BaseRunner(data, inference_fn, evaluators, experiment_tracker=None, batch_size=10, **kwargs)
Bases: ABC
Abstract base class for runners.

This class defines the interface that all runners must implement.
Attributes:

| Name | Type | Description |
|---|---|---|
| data | str \| BaseDataset | The data to evaluate. |
| inference_fn | Callable | The inference function to use. |
| evaluators | list[BaseEvaluator] | The evaluators to use. |
| experiment_tracker | BaseExperimentTracker \| None | The experiment tracker. |
| batch_size | int | The batch size to use for evaluation. |
| **kwargs | Any | Additional configuration parameters. |
Initialize the runner.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data | str \| BaseDataset | The data to evaluate. | required |
| inference_fn | Callable | The inference function to use. | required |
| evaluators | list[BaseEvaluator] | The evaluators to use. | required |
| experiment_tracker | BaseExperimentTracker \| None | The experiment tracker. | None |
| batch_size | int | The batch size to use for evaluation. | 10 |
| **kwargs | Any | Additional configuration parameters. | {} |
evaluate()
abstractmethod
async
Run the evaluators on the dataset.
The dataset is evaluated in batches of the given batch size.
Returns:

| Name | Type | Description |
|---|---|---|
| EvaluationResult | EvaluationResult | Structured result containing evaluation results and experiment URLs/paths. |
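As a sketch of the contract, a subclass only needs to implement the async evaluate() method. The import path, and the assumption that data supports len() and slicing, are illustrative rather than part of the documented API:

```python
from typing import Any

from runner import BaseRunner  # hypothetical import path

class SequentialRunner(BaseRunner):
    """Toy runner that walks the dataset one batch at a time."""

    async def evaluate(self) -> Any:
        outputs = []
        # Slice the dataset into chunks of `batch_size`, per the documented
        # contract; assumes `self.data` supports len() and slicing.
        for start in range(0, len(self.data), self.batch_size):
            batch = self.data[start : start + self.batch_size]
            outputs.extend(self.inference_fn(item) for item in batch)
        # A real implementation would score `outputs` with each evaluator in
        # `self.evaluators` and wrap everything in an EvaluationResult.
        return outputs
```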
Runner(data, inference_fn, evaluators, experiment_tracker=None, batch_size=10, **kwargs)
Bases: BaseRunner
Runner class for evaluating datasets.
Attributes:

| Name | Type | Description |
|---|---|---|
| data | str \| BaseDataset | The data to evaluate. |
| inference_fn | Callable | The inference function to use. |
| evaluators | list[BaseEvaluator] | The evaluators to use. |
| experiment_tracker | ExperimentTrackerAdapter \| type[BaseExperimentTracker] \| None | The experiment tracker for logging evaluation results. Can be None (uses SimpleExperimentTracker, the default), a tracker class (instantiated with the provided kwargs), or a tracker instance (used directly). |
| batch_size | int | The batch size to use for evaluation. |
| **kwargs | Any | Additional configuration parameters. |
Initialize the Runner.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data | str \| BaseDataset | The data to evaluate. | required |
| inference_fn | Callable | The inference function to use. | required |
| evaluators | list[BaseEvaluator] | The evaluators to use. | required |
| experiment_tracker | BaseExperimentTracker \| type[BaseExperimentTracker] \| None | The experiment tracker for logging evaluation results. Can be None (uses SimpleExperimentTracker, the default), a tracker class (instantiated with the provided kwargs), or a tracker instance (used directly). | None |
| batch_size | int | The batch size to use for evaluation. | 10 |
| **kwargs | Any | Additional configuration parameters. | {} |
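A sketch of the three documented ways to supply experiment_tracker. The import paths, the stand-in inference function, and the empty evaluator list are illustrative assumptions; SimpleExperimentTracker is the documented default, but its constructor arguments are not specified here:

```python
from runner import Runner                      # hypothetical import path
from trackers import SimpleExperimentTracker   # hypothetical import path

def predict(example):
    # Stand-in inference function; a real one would call a model.
    return {"output": example}

# 1. None: SimpleExperimentTracker is used by default.
runner = Runner(data="eval_set.jsonl", inference_fn=predict, evaluators=[])

# 2. A tracker class: instantiated with the provided kwargs.
runner = Runner(
    data="eval_set.jsonl",
    inference_fn=predict,
    evaluators=[],
    experiment_tracker=SimpleExperimentTracker,
)

# 3. A tracker instance: used directly.
runner = Runner(
    data="eval_set.jsonl",
    inference_fn=predict,
    evaluators=[],
    experiment_tracker=SimpleExperimentTracker(),
    batch_size=32,
)
```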
evaluate()
async
Run the evaluators on the dataset.
The dataset is evaluated using the new architecture:

1. Convert the dataset to a standard format.
2. Run inference using InferenceHandler.
3. Prepare data for tracking.
4. Log to the tracker.
Returns:

| Name | Type | Description |
|---|---|---|
| EvaluationResult | EvaluationResult | Structured result containing evaluation results and experiment URLs/paths. |
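Because evaluate() is a coroutine, it has to be awaited; a minimal driver is sketched below. Only the role of EvaluationResult is documented here, so the example prints it rather than assuming attribute names:

```python
import asyncio

async def main():
    # `runner` as constructed in the example above.
    result = await runner.evaluate()
    # EvaluationResult bundles the evaluation results and the experiment
    # URLs/paths; its exact attributes are not documented in this section.
    print(result)

asyncio.run(main())
```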
get_run_results(**kwargs)
Get the results of a run.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| **kwargs | Any | Additional configuration parameters. | {} |
Returns:

| Type | Description |
|---|---|
| list[dict[str, Any]] | The results of the run. |
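A short sketch of reading the rows back after a run; the accepted kwargs are not documented here, so none are passed:

```python
rows = runner.get_run_results()
for row in rows:
    # Each row is a plain dict; the available keys depend on the
    # evaluators and tracker used for the run.
    print(row)
```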