Dataset

Base class for all datasets.

Authors

Surya Mahadi (made.r.s.mahadi@gdplabs.id)

BaseDataset(version=None, hash=None, name=None, description=None, schema=None, additional_metadata=None, attachments_config=None)

Bases: ABC, Iterable

Base class for all datasets.

Attributes:

Name                 Type                     Description
dataset              list[MetricInput]        The dataset to evaluate.
version              str | None               The version of the dataset.
hash                 str | None               The hash of the dataset.
name                 str | None               The name of the dataset.
description          str | None               The description of the dataset.
schema               type[BaseModel] | None   The schema of the dataset.
additional_metadata  dict[str, Any] | None    Additional metadata of the dataset.

Initialize the dataset.

Parameters:

Name                 Type                                      Description                             Default
version              str | None                                The version of the dataset.             None
hash                 str | None                                The hash of the dataset.                None
name                 str | None                                The name of the dataset.                None
description          str | None                                The description of the dataset.         None
schema               type[BaseModel] | None                    The schema of the dataset.              None
additional_metadata  dict[str, Any] | None                     Additional metadata of the dataset.     None
attachments_config   AttachmentConfig | dict[str, Any] | None  Configuration for loading attachments.  None
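
Because `load()` and `validate()` are abstract, the class is used through a subclass. The following is a minimal sketch of that pattern, not the library's implementation: `SketchBaseDataset` is a hypothetical stand-in that mirrors only the documented constructor parameters, `InMemoryDataset` and its `"query"` rule are illustrative, and `MetricInput` is modelled as a plain `dict`. Whether the real constructor calls `load()` itself is not stated in this reference, so the sketch calls it explicitly.

```python
from abc import ABC, abstractmethod
from typing import Any

MetricInput = dict[str, Any]  # assumption: stand-in for the library's MetricInput


class SketchBaseDataset(ABC):
    """Hypothetical stand-in mirroring the documented constructor parameters."""

    def __init__(self, version=None, hash=None, name=None, description=None,
                 schema=None, additional_metadata=None, attachments_config=None):
        self.version = version
        self.hash = hash
        self.name = name
        self.description = description
        self.schema = schema
        self.additional_metadata = additional_metadata
        self.attachments_config = attachments_config
        self.dataset: list[MetricInput] = []

    @abstractmethod
    def load(self) -> list[MetricInput]: ...

    @abstractmethod
    def validate(self) -> None: ...


class InMemoryDataset(SketchBaseDataset):
    """Illustrative subclass that serves rows passed in directly."""

    def __init__(self, rows: list[MetricInput], **metadata: Any):
        super().__init__(**metadata)
        self._rows = rows
        self.dataset = self.load()  # assumption: loading is triggered explicitly

    def load(self) -> list[MetricInput]:
        return list(self._rows)

    def validate(self) -> None:
        # Rule assumed for this sketch only: every row needs a "query" key.
        for i, row in enumerate(self.dataset):
            if "query" not in row:
                raise ValueError(f"row {i} is missing 'query'")


ds = InMemoryDataset(
    [{"query": "What is 2 + 2?", "expected_response": "4"}],
    name="arithmetic",
    version="0.1",
)
ds.validate()
print(ds.name, ds.version, len(ds.dataset))
```

Parameters that are not passed, such as `hash` here, keep their documented default of `None`.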

__getitem__(index)

Get the item at the given index.

Parameters:

Name   Type  Description                    Default
index  int   The index of the item to get.  required

Returns:

Type                             Description
MetricInput | list[MetricInput]  The item at the given index, or a list of items if the index is a list.
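
The return type suggests that indexing accepts either a single position or a list of positions. A minimal sketch of that behaviour, under that assumption, could look as follows. `IndexableDataset` is a hypothetical class and `MetricInput` is modelled as a plain `dict`; neither is taken from the library.

```python
from typing import Any, Union

MetricInput = dict[str, Any]  # assumption: stand-in for the library's MetricInput


class IndexableDataset:
    """Hypothetical container illustrating int and list indexing."""

    def __init__(self, rows: list[MetricInput]):
        self.dataset = rows

    def __getitem__(
        self, index: Union[int, list[int]]
    ) -> Union[MetricInput, list[MetricInput]]:
        if isinstance(index, list):
            # A list of positions yields a list of items, in the same order.
            return [self.dataset[i] for i in index]
        return self.dataset[index]


ds = IndexableDataset([{"query": "a"}, {"query": "b"}, {"query": "c"}])
single = ds[0]        # one item
several = ds[[0, 2]]  # list of items
```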

__iter__()

Iterate over the dataset.

Returns:

Type                   Description
Iterator[MetricInput]  An iterator over the dataset.

__len__()

Get the length of the dataset.

Returns:

Type  Description
int   The length of the dataset.

filter(filter_fn)

Filter the dataset.

Parameters:

Name       Type                           Description           Default
filter_fn  Callable[[MetricInput], bool]  The filter function.  required
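
No return value is documented for `filter`, so the sketch below assumes that it updates the dataset in place and keeps the items for which `filter_fn` returns `True`. Both points are assumptions, and `FilterableDataset` is a hypothetical stand-in with `MetricInput` modelled as a plain `dict`.

```python
from typing import Any, Callable

MetricInput = dict[str, Any]  # assumption: stand-in for the library's MetricInput


class FilterableDataset:
    """Hypothetical container illustrating predicate-based filtering."""

    def __init__(self, rows: list[MetricInput]):
        self.dataset = list(rows)

    def filter(self, filter_fn: Callable[[MetricInput], bool]) -> None:
        # Assumption: items are kept when filter_fn returns True,
        # and the dataset is replaced in place.
        self.dataset = [row for row in self.dataset if filter_fn(row)]


ds = FilterableDataset([
    {"query": "a", "score": 0.9},
    {"query": "b", "score": 0.2},
])
ds.filter(lambda row: row["score"] >= 0.5)
```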

load() abstractmethod

Load the dataset.

Returns:

Type               Description
list[MetricInput]  The loaded dataset.

Raises:

Type                 Description
NotImplementedError  If the load method is not implemented.
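
The `abstractmethod` label suggests that `load` is declared with `abc.abstractmethod`. Under that assumption, standard Python behaviour is that a subclass omitting the method cannot be instantiated at all (`TypeError`), while `NotImplementedError` would only surface if the base body itself is reached, for example through `super().load()`. The sketch below illustrates both cases with hypothetical classes; the real base class body is not shown in this reference.

```python
from abc import ABC, abstractmethod


class SketchBase(ABC):
    """Hypothetical stand-in for an abstract dataset base class."""

    @abstractmethod
    def load(self) -> list:
        raise NotImplementedError("load must be implemented by subclasses")


class Incomplete(SketchBase):
    pass  # does not implement load


class Delegating(SketchBase):
    def load(self) -> list:
        return super().load()  # reaches the base body


try:
    Incomplete()
    instantiation_error = None
except TypeError as exc:
    instantiation_error = exc  # abstract classes cannot be instantiated

try:
    Delegating().load()
    load_error = None
except NotImplementedError as exc:
    load_error = exc  # raised by the base body
```

The same reasoning would apply to `validate()`, which carries the same label.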

map(map_fn)

Map the dataset.

Parameters:

Name    Type                                  Description        Default
map_fn  Callable[[MetricInput], MetricInput]  The map function.  required
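
As with `filter`, no return value is documented for `map`, so this sketch assumes that each item is replaced in place by the result of `map_fn`. `MappableDataset` and `normalize_query` are hypothetical names, and `MetricInput` is modelled as a plain `dict`.

```python
from typing import Any, Callable

MetricInput = dict[str, Any]  # assumption: stand-in for the library's MetricInput


class MappableDataset:
    """Hypothetical container illustrating item-wise transformation."""

    def __init__(self, rows: list[MetricInput]):
        self.dataset = list(rows)

    def map(self, map_fn: Callable[[MetricInput], MetricInput]) -> None:
        # Assumption: the dataset is replaced in place by the mapped items.
        self.dataset = [map_fn(row) for row in self.dataset]


def normalize_query(row: MetricInput) -> MetricInput:
    # Return a new dict rather than mutating the input row.
    return {**row, "query": row["query"].strip().lower()}


ds = MappableDataset([{"query": "  Hello "}, {"query": "WORLD"}])
ds.map(normalize_query)
```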

prepare_row_for_inference(row, **kwargs) async

Prepare a single row for inference.

This method allows dataset-specific preprocessing before inference. For example, LangfuseDataset might create/sync dataset items here.

Parameters:

Name      Type            Description                                                 Default
row       dict[str, Any]  The row to prepare.                                         required
**kwargs  Any             Additional arguments (e.g., dataset_item_id for Langfuse).  {}

Returns:

Type            Description
dict[str, Any]  The prepared row.
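
A subclass can override this coroutine to attach dataset-specific fields before inference. The sketch below assumes that the default implementation returns the row unchanged, which this reference does not confirm. `SketchBase` and `TaggingDataset` are hypothetical classes; only the method name and the `dataset_item_id` keyword come from the description above.

```python
import asyncio
from typing import Any


class SketchBase:
    """Hypothetical stand-in for the base class."""

    async def prepare_row_for_inference(
        self, row: dict[str, Any], **kwargs: Any
    ) -> dict[str, Any]:
        # Assumption: the default returns the row unchanged.
        return row


class TaggingDataset(SketchBase):
    """Illustrative override that copies an item id into the row."""

    async def prepare_row_for_inference(
        self, row: dict[str, Any], **kwargs: Any
    ) -> dict[str, Any]:
        # Copy so the caller's row is not mutated.
        prepared = dict(await super().prepare_row_for_inference(row, **kwargs))
        item_id = kwargs.get("dataset_item_id")
        if item_id is not None:
            prepared["dataset_item_id"] = item_id
        return prepared


row = {"query": "What is 2 + 2?"}
prepared = asyncio.run(
    TaggingDataset().prepare_row_for_inference(row, dataset_item_id="item-1")
)
```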

sample(n=3)

Sample n items from the dataset.

Parameters:

Name  Type  Description                     Default
n     int   The number of items to sample.  3

Returns:

Type               Description
list[MetricInput]  The sampled items.
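
One plausible way to implement such a method is with `random.sample` from the standard library. The sketch below assumes sampling without replacement and caps `n` at the dataset size; this reference states neither detail, and `SampleableDataset` is a hypothetical class with `MetricInput` modelled as a plain `dict`.

```python
import random
from typing import Any

MetricInput = dict[str, Any]  # assumption: stand-in for the library's MetricInput


class SampleableDataset:
    """Hypothetical container illustrating random sampling."""

    def __init__(self, rows: list[MetricInput]):
        self.dataset = list(rows)

    def sample(self, n: int = 3) -> list[MetricInput]:
        # Assumption: sample without replacement, never more than is available.
        return random.sample(self.dataset, min(n, len(self.dataset)))


ds = SampleableDataset([{"query": str(i)} for i in range(10)])
picked = ds.sample()  # uses the default n=3
small = SampleableDataset([{"query": "only"}]).sample(5)  # capped at one item
```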

shuffle()

Shuffle the dataset.

to_standard_format()

Convert the dataset to the standard format for inference.

For most datasets this is a no-op, because they already return the standard format. LangfuseDataset overrides it to convert from the Langfuse format.

Returns:

Type               Description
list[MetricInput]  The dataset in standard format.
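
The no-op default and a converting override can be sketched as follows. All class names and row fields here are hypothetical: the nested `"input"` / `"expected_output"` layout is invented for illustration and is not Langfuse's actual format, and `MetricInput` is modelled as a plain `dict`.

```python
from typing import Any

MetricInput = dict[str, Any]  # assumption: stand-in for the library's MetricInput


class SketchBase:
    """Hypothetical stand-in whose rows are already in standard format."""

    def __init__(self, rows: list[dict[str, Any]]):
        self.dataset = list(rows)

    def to_standard_format(self) -> list[MetricInput]:
        # No-op default: rows are returned as they are.
        return self.dataset


class NestedFormatDataset(SketchBase):
    """Illustrative subclass that stores rows in a nested, non-standard layout."""

    def to_standard_format(self) -> list[MetricInput]:
        # Flatten each nested row into the flat shape used elsewhere in this sketch.
        return [
            {
                "query": row["input"]["question"],
                "expected_response": row["expected_output"],
            }
            for row in self.dataset
        ]


plain = SketchBase([{"query": "q", "expected_response": "a"}])
nested = NestedFormatDataset([{"input": {"question": "q"}, "expected_output": "a"}])
plain_rows = plain.to_standard_format()
nested_rows = nested.to_standard_format()
```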

validate() abstractmethod

Validate the dataset.

Raises:

Type                 Description
NotImplementedError  If the validate method is not implemented.