Schema

Schema package for gllm_training.

This module re-exports all schema components from various modules for backward compatibility.

DPOColumnMappingConfig

Bases: BaseModel

Defines the column mapping configuration for DPO (preference) datasets.

Attributes:

Name Type Description
input_columns Dict[str, str]

Mapping of template variables to DataFrame column names.

chosen Optional[str]

Name of the column containing the chosen (preferred) response.

rejected Optional[str]

Name of the column containing the rejected (dispreferred) response.

image_columns Optional[Dict[str, str]]

Mapping of template variables to DataFrame column names containing image paths or URLs.
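
As a rough illustration of how such a mapping could be applied, the sketch below uses plain dictionaries in place of the real model and of a DataFrame row. The helper `to_preference_record`, the template string, and every column name are hypothetical; the actual processing in gllm_training may differ.

```python
# Hypothetical stand-in for a DPOColumnMappingConfig, written as a plain dict.
dpo_mapping = {
    "input_columns": {"question": "user_question"},  # template variable -> column name
    "chosen": "preferred_answer",
    "rejected": "dispreferred_answer",
    "image_columns": None,
}

# One row of an invented preference dataset, keyed by column name.
row = {
    "user_question": "What is LoRA?",
    "preferred_answer": "A low-rank adaptation method for fine-tuning.",
    "dispreferred_answer": "A type of radio.",
}


def to_preference_record(template: str, mapping: dict, row: dict) -> dict:
    """Fill the template from input_columns and pick the chosen/rejected cells."""
    variables = {var: row[col] for var, col in mapping["input_columns"].items()}
    return {
        "prompt": template.format(**variables),
        "chosen": row[mapping["chosen"]],
        "rejected": row[mapping["rejected"]],
    }


record = to_preference_record("Question: {question}", dpo_mapping, row)
```

The keys of `input_columns` are the placeholders used in the prompt template, and the values are the column names they are read from.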

ExperimentConfig

Bases: BaseModel

Defines the configuration for a single fine-tuning experiment.

Attributes:

Name Type Description
experiment_id str

The ID of the experiment.

hyperparameters Hyperparameters

The hyperparameters for the experiment.

hyperparameters_id str

The ID of the hyperparameters.

topic str

The topic of the experiment.

model_name str

The name of the model.

framework FinetuningLibraryTypes

The fine-tuning framework to use.

finetuning_technique FinetuningTechniques

The fine-tuning technique to use. Options: "sft", "grpo". Defaults to "sft".

multimodal bool

Whether the model is multimodal.

datasets_path str | None

The path to the datasets directory.

train_filename str | None

CSV filename for training data.

validation_filename str | None

CSV filename for validation data.

prompt_filename str | None

CSV filename for prompt templates.

spreadsheet_id str | None

The ID of the Google Sheets spreadsheet.

google_sheets_client_email str | None

Google Sheets client email.

google_sheets_private_key str | None

Google Sheets private key.

google_scopes list[str] | None

Google scopes for service account.

google_token_uri str | None

Google token URI for service account.

train_sheet str

The name of the training sheet.

validation_sheet str | None

The name of the validation sheet.

prompt_sheet str

The name of the prompt sheet.

prompt_name str | None

The name of the prompt.

dataset_text_field str

The text field in the dataset.

sft_column_mapping_config dict[str, Any] | None

Custom configuration for column mappings.

save_processed_dataset bool

Whether to save the processed dataset.

output_processed_dir str

The output directory for the processed dataset.

model_path str | None

The path to the fine-tuned adapter or full model.

FinetunedHyperparameters

Bases: BaseModel

Defines the hyperparameters for a fine-tuning experiment.

Attributes:

Name Type Description
hyperparameters_id str

The ID of the hyperparameters.

model_settings ModelConfig

The model configuration.

lora_config LoraConfig

The LoRA configuration.

training_config TrainingConfig

The training configuration.

storage_config StorageConfig

The storage configuration.

FrameworkType

Bases: StrEnum

Defines valid fine-tuning frameworks.

GRPOColumnMappingConfig

Bases: BaseModel

Defines the column mapping configuration for GRPO datasets.

Attributes:

Name Type Description
input_columns Dict[str, str]

Mapping of template variables to DataFrame column names.

label_columns Optional[str]

Column name for labels.

image_columns Optional[Dict[str, str]]

Mapping of template variables to DataFrame column names containing image paths or URLs.

GoogleSheetsAuthentication

Bases: BaseModel

Defines valid authentication parameters for Google Sheets API interaction.

Attributes:

Name Type Description
spreadsheet_id str

The ID of the Google Sheets spreadsheet.

client_email str

The client email for the Google Sheets API.

private_key str

The private key for the Google Sheets API.

google_token_uri str

The Google token URI for the Google Sheets API.

SFTColumnMappingConfig

Bases: BaseModel

Defines the column mapping configuration for SFT datasets.

Attributes:

Name Type Description
input_columns Dict[str, str]

Mapping of template variables to DataFrame column names.

dataset_column_names str

Column name that will be created in the final processed dataset. This column contains the formatted prompt text and maps to dataset_text_field.

label_columns Optional[str]

Column name for labels.

image_columns Optional[Dict[str, str]]

Mapping of template variables to DataFrame column names containing image paths or URLs.
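
The relationship between `input_columns` and `dataset_column_names` can be sketched as follows. This is a minimal illustration with plain dictionaries, not the library's implementation: the helper `build_sft_row`, the template, and the column names are all assumptions, and label and image handling is left out.

```python
# Hypothetical stand-in for an SFTColumnMappingConfig, written as a plain dict.
sft_mapping = {
    "input_columns": {"context": "article_body", "question": "user_question"},
    "dataset_column_names": "text",  # name of the column created in the output
    "label_columns": "answer",
    "image_columns": None,
}

# One row of an invented source dataset, keyed by column name.
row = {
    "article_body": "LoRA adds low-rank matrices to frozen weights.",
    "user_question": "What does LoRA add?",
    "answer": "Low-rank matrices.",
}


def build_sft_row(template: str, mapping: dict, row: dict) -> dict:
    """Format the prompt and store it under the configured output column."""
    variables = {var: row[col] for var, col in mapping["input_columns"].items()}
    return {mapping["dataset_column_names"]: template.format(**variables)}


processed = build_sft_row("Context: {context}\nQuestion: {question}", sft_mapping, row)
```

In this sketch the processed row has a single `text` column, which is the name an experiment would then pass as `dataset_text_field`.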

StorageConfig

Bases: BaseModel

Defines the storage configuration for model uploads.

Attributes:

Name Type Description
bucket_name str | None

The name of the bucket. None (default) means no bucket is configured and cloud uploads will be skipped.

upload_to_cloud bool

Whether to upload the model to the cloud.

object_prefix str

The object prefix in the bucket.

endpoint_url str | None

The complete URL to use for the storage client.

provider str

The cloud provider.

replace_existing_artifact bool

When True, upload proceeds even if the remote artifact already exists, relying on the provider's overwrite semantics. When False (default), uploading to an existing remote path raises FileExistsError before the upload is attempted.
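
The interaction between `bucket_name`, `upload_to_cloud`, and `replace_existing_artifact` described above can be sketched with a small stand-in. The dataclass `StorageSettings` and the function `plan_upload` are hypothetical and only mirror the documented behaviour; how the real code checks for an existing remote artifact is not shown in this reference.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StorageSettings:
    """Hypothetical stand-in holding three of the StorageConfig fields."""

    upload_to_cloud: bool
    bucket_name: Optional[str] = None  # None: no bucket configured
    replace_existing_artifact: bool = False


def plan_upload(settings: StorageSettings, remote_path: str, remote_exists: bool) -> str:
    """Return "skip" or "upload", or raise if the artifact must not be replaced."""
    # Assumption: uploads are skipped when disabled or when no bucket is set.
    if not settings.upload_to_cloud or settings.bucket_name is None:
        return "skip"
    # The existence check happens before any upload is attempted.
    if remote_exists and not settings.replace_existing_artifact:
        raise FileExistsError(f"Remote artifact already exists: {remote_path}")
    return "upload"


decision = plan_upload(
    StorageSettings(upload_to_cloud=True, bucket_name="my-bucket"),
    "models/run-1",
    remote_exists=False,
)
```

Setting `replace_existing_artifact=True` in this sketch turns the error case into a normal upload, leaving overwrite behaviour to the storage provider.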

VllmSamplingParamsConfig

Bases: BaseModel

Defines the sampling parameters for vLLM.

Attributes:

Name Type Description
min_p float

The minimum probability a token must have to be considered, relative to the probability of the most likely token.

top_p float

The cumulative probability of the top tokens to consider (nucleus sampling).

top_k int

The number of highest-probability tokens to consider.

seed int

The seed for the random number generator.

max_tokens int

The maximum number of tokens to generate.

stop list[str]

The strings that stop generation when they are produced.

include_stop_str_in_output bool

Whether to include the stop string in the output.

temperature float

The temperature for sampling.
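
Because `min_p` is easy to confuse with `top_p`, the sketch below shows the usual min-p filtering rule on a toy distribution: a token is kept only if its probability is at least `min_p` times that of the most likely token. The function `min_p_filter` and the probabilities are illustrative assumptions, not vLLM's implementation.

```python
def min_p_filter(probs: dict[str, float], min_p: float) -> dict[str, float]:
    """Drop tokens below min_p * max probability, then renormalize the rest."""
    threshold = min_p * max(probs.values())
    kept = {tok: p for tok, p in probs.items() if p >= threshold}
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}


# Toy next-token distribution (invented values).
probs = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}

# With min_p = 0.2 the threshold is 0.2 * 0.5 = 0.1, so "d" is removed.
filtered = min_p_filter(probs, min_p=0.2)
```

With `min_p=0.0` the threshold is zero and every token is kept, so the filter has no effect.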