Dict dataset

Dict-Based Dataset.

DictDataset(dataset, dataset_name=None, attachments_config=None)

Bases: BaseDataset

Dict-Based Dataset.

A subclass of BaseDataset that stores the dataset as a list of dictionaries.

Attributes:

Name Type Description
dataset list[dict]

The dataset to evaluate.

Initialize the DictDataset class.

Parameters:

Name Type Description Default
dataset list[MetricInput]

The dataset to use for the evaluation.

required
dataset_name str | None

The name of the dataset.

None
attachments_config AttachmentConfig | dict[str, Any] | None

Configuration for loading attachments. Defaults to None.

None
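
As a rough illustration of the constructor parameters above, the following sketch uses a hypothetical stand-in class, DictDatasetSketch, so that it runs without the library installed. The row keys ("query", "expected_response") and the assumption that each MetricInput behaves like a plain dict are illustrative only and are not confirmed by this page.

```python
from __future__ import annotations

from typing import Any


class DictDatasetSketch:
    """Hypothetical stand-in that mirrors the documented constructor signature."""

    def __init__(
        self,
        dataset: list[dict[str, Any]],
        dataset_name: str | None = None,
        attachments_config: dict[str, Any] | None = None,
    ) -> None:
        # Store the arguments unchanged; the real class may do more work here.
        self.dataset = dataset
        self.dataset_name = dataset_name
        self.attachments_config = attachments_config


# Assumed row shape: one dict per evaluation example.
rows = [
    {"query": "What is 2 + 2?", "expected_response": "4"},
    {"query": "Capital of France?", "expected_response": "Paris"},
]
ds = DictDatasetSketch(rows, dataset_name="smoke-test")
```

With the real class, the equivalent call would presumably be DictDataset(rows, dataset_name="smoke-test").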

from_csv(path, dataset_name=None, attachments_config=None, **kwargs) classmethod

Load a dataset from a CSV file.

Parameters:

Name Type Description Default
path str

The path to the CSV file.

required
dataset_name str | None

The name of the dataset. If None, the file name is used.

None
attachments_config AttachmentConfig | dict[str, Any] | None

Configuration for loading attachments. Defaults to None.

None
**kwargs Any

Additional arguments to pass to pandas read_csv.

{}

Returns:

Name Type Description
DictDataset DictDataset

The loaded dataset.
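
The behaviour described for from_csv (rows read from a file, name falling back to the file name, extra keyword arguments forwarded to the reader) can be sketched roughly as follows. This is not the library's implementation: load_csv_rows is a hypothetical helper, it swaps pandas read_csv for the standard-library csv.DictReader to stay dependency-free, and it assumes the fallback name is the file name without its extension.

```python
from __future__ import annotations

import csv
import tempfile
from pathlib import Path
from typing import Any


def load_csv_rows(
    path: str, dataset_name: str | None = None, **kwargs: Any
) -> tuple[str, list[dict[str, str]]]:
    """Hypothetical helper: read a CSV file into (name, list of row dicts)."""
    # Assumption: the fallback name is the file name without its extension.
    name = dataset_name if dataset_name is not None else Path(path).stem
    with open(path, newline="", encoding="utf-8") as handle:
        # Extra keyword arguments go to the reader, much as the documented
        # **kwargs go to pandas read_csv.
        rows = [dict(row) for row in csv.DictReader(handle, **kwargs)]
    return name, rows


with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "qa_pairs.csv"
    csv_path.write_text(
        "query;expected_response\nWhat is 2 + 2?;4\n", encoding="utf-8"
    )
    name, rows = load_csv_rows(str(csv_path), delimiter=";")
```

With the real class, a comparable call would presumably be DictDataset.from_csv(path, sep=";"), since sep is the pandas read_csv spelling of the delimiter option.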

from_gsheets(sheet_id, worksheet_name, client_email, private_key, dataset_name=None, attachments_config=None) async staticmethod

Load a dataset from Google Sheets.

Parameters:

Name Type Description Default
sheet_id str

The ID of the Google Sheet.

required
worksheet_name str

The name of the worksheet within the Google Sheet.

required
client_email str

The client email for Google Sheets API.

required
private_key str

Base64-encoded private key for Google Sheets API.

required
dataset_name str | None

The name of the dataset. If None, worksheet_name is used.

None
attachments_config AttachmentConfig | dict[str, Any] | None

Configuration for loading attachments. Defaults to None.

None

Returns:

Name Type Description
DictDataset DictDataset

The loaded dataset.
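
Two details of from_gsheets are easy to miss: it is asynchronous, so it has to be awaited, and private_key is expected in Base64-encoded form. The sketch below shows both using only the standard library. fake_from_gsheets is a hypothetical stand-in that makes no network call, and encoding the entire PEM string is an assumption about what the real method expects.

```python
import asyncio
import base64

# Placeholder PEM text; a real service-account key would go here.
pem_key = "-----BEGIN PRIVATE KEY-----\nplaceholder\n-----END PRIVATE KEY-----\n"

# Assumption: the whole PEM string is Base64-encoded before being passed in.
encoded_key = base64.b64encode(pem_key.encode("utf-8")).decode("ascii")


async def fake_from_gsheets(
    sheet_id, worksheet_name, client_email, private_key, dataset_name=None
):
    """Hypothetical stand-in with the documented argument order."""
    decoded = base64.b64decode(private_key).decode("utf-8")
    return {
        # Mirrors the documented fallback to worksheet_name.
        "dataset_name": dataset_name if dataset_name is not None else worksheet_name,
        "key_round_trips": decoded == pem_key,
    }


# From synchronous code, an async method is driven with asyncio.run.
result = asyncio.run(
    fake_from_gsheets("sheet-id-placeholder", "Sheet1", "svc@example.com", encoded_key)
)
```

Inside an existing coroutine, the real call would instead be written as await DictDataset.from_gsheets(...).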

from_huggingface_hub(path_or_name, split, dataset_name=None, attachments_config=None, **kwargs) staticmethod

Load a dataset from HuggingFace Hub.

Parameters:

Name Type Description Default
path_or_name str

The path or name of the dataset on HuggingFace Hub.

required
split str

The split of the dataset (e.g. "train", "test").

required
dataset_name str | None

The name of the dataset. If None, path_or_name is used.

None
attachments_config AttachmentConfig | dict[str, Any] | None

Configuration for loading attachments. Defaults to None.

None
**kwargs Any

Additional arguments to pass to datasets.load_dataset.

{}

Returns:

Name Type Description
DictDataset DictDataset

The loaded dataset.

from_jsonl(path, dataset_name=None, attachments_config=None, **kwargs) classmethod

Load a dataset from a JSONL file.

Parameters:

Name Type Description Default
path str

The path to the JSONL file.

required
dataset_name str | None

The name of the dataset. If None, the file name is used.

None
attachments_config AttachmentConfig | dict[str, Any] | None

Configuration for loading attachments. Defaults to None.

None
**kwargs Any

Additional arguments to pass to the constructor (deprecated, use attachments_config instead).

{}

Returns:

Name Type Description
DictDataset DictDataset

The loaded dataset.
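
For reference, a JSONL file holds one JSON object per line, and each line becomes one row of the dataset. The sketch below shows that parsing step with the standard library only. load_jsonl_rows is a hypothetical helper, not the library's code, and skipping blank lines is an assumption made for robustness.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path
from typing import Any


def load_jsonl_rows(path: str) -> list[dict[str, Any]]:
    """Hypothetical helper: parse a JSONL file into a list of dicts."""
    rows: list[dict[str, Any]] = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            # Assumption: blank lines are ignored rather than treated as errors.
            if line.strip():
                rows.append(json.loads(line))
    return rows


with tempfile.TemporaryDirectory() as tmp:
    jsonl_path = Path(tmp) / "qa_pairs.jsonl"
    jsonl_path.write_text(
        '{"query": "What is 2 + 2?", "expected_response": "4"}\n'
        "\n"
        '{"query": "Capital of France?", "expected_response": "Paris"}\n',
        encoding="utf-8",
    )
    jsonl_rows = load_jsonl_rows(str(jsonl_path))
```

With the real class, the equivalent call would presumably be DictDataset.from_jsonl(path).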

from_langfuse(langfuse_client, dataset_name, attachments_config=None) staticmethod

Load a dataset from Langfuse (read-only).

Parameters:

Name Type Description Default
langfuse_client Any

The Langfuse client instance.

required
dataset_name str

The name of the dataset in Langfuse.

required
attachments_config AttachmentConfig | dict[str, Any] | None

Configuration for loading attachments. Defaults to None.

None

Returns:

Name Type Description
DictDataset DictDataset

The loaded dataset.

Raises:

Type Description
ValueError

If the dataset is not found or has no items.

load()

Load the dataset.

Returns:

Type Description
list[MetricInput]

The loaded dataset.

validate()

Validate the dataset.

Raises:

Type Description
ValueError

If the dataset is not a list of MetricInput.
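
The check described for validate can be pictured roughly as below. validate_rows is a hypothetical function, and treating MetricInput as a plain dict is an assumption, since this page does not define that type.

```python
from typing import Any


def validate_rows(dataset: Any) -> None:
    """Hypothetical check: raise ValueError unless dataset is a list of dicts."""
    if not isinstance(dataset, list):
        raise ValueError("dataset must be a list")
    for index, item in enumerate(dataset):
        # Assumption: each MetricInput is dict-like.
        if not isinstance(item, dict):
            raise ValueError(f"item {index} is not a dict")


# A malformed dataset: the second item is a string, not a dict.
try:
    validate_rows([{"query": "ok"}, "not a dict"])
except ValueError as exc:
    error_message = str(exc)
else:
    error_message = None
```

Calling validation before an evaluation run surfaces malformed rows early instead of partway through metric computation.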