
Cloud Storage

Cloud storage module for GLLM training.

This module provides functionality for storing model artifacts in various cloud storage providers such as AWS S3, Google Cloud Storage, and Azure Blob Storage.

CloudStorageClient(config)

Bases: ABC

Abstract base class for cloud storage clients.

This class defines the interface that all cloud storage implementations must follow. It provides methods for uploading and downloading files from cloud storage services.

Initialize the cloud storage client with provider-specific configuration.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| config | Dict[str, str] | Configuration parameters for the storage client. The exact keys will depend on the specific cloud provider implementation. | required |
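For orientation, here is a minimal sketch of how the documented interface could be declared with Python's abc module. It mirrors the method names and signatures listed on this page, but it is a stand-in rather than the module's actual source. MissingDeleteClient is a hypothetical subclass used only to show that an incomplete implementation cannot be instantiated.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict, Union


class CloudStorageClient(ABC):
    """Stand-in mirroring the documented interface (not the library source)."""

    def __init__(self, config: Dict[str, str]) -> None:
        # Provider-specific keys are interpreted by each concrete subclass.
        self.config = config

    @abstractmethod
    def upload_file(self, local_path: Union[str, Path], remote_path: str,
                    check_conflict: bool = False) -> str: ...

    @abstractmethod
    def download_file(self, remote_path: str, local_path: Union[str, Path]) -> Path: ...

    @abstractmethod
    def file_exists(self, remote_path: str) -> bool: ...

    @abstractmethod
    def delete_file(self, remote_path: str) -> bool: ...


class MissingDeleteClient(CloudStorageClient):
    """Hypothetical subclass that forgets to implement delete_file."""

    def upload_file(self, local_path, remote_path, check_conflict=False):
        return f"mem://{remote_path}"

    def download_file(self, remote_path, local_path):
        return Path(local_path)

    def file_exists(self, remote_path):
        return False


try:
    MissingDeleteClient({})
except TypeError as exc:
    # ABCMeta refuses to instantiate a class that still has abstract methods.
    print(f"TypeError: {exc}")
```

Declaring the methods abstract means a provider implementation that omits any of them fails at construction time instead of at the first call.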

delete_file(remote_path) abstractmethod

Delete a file from cloud storage.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| remote_path | str | Path to the file in cloud storage. | required |

Returns:

| Type | Description |
|---|---|
| bool | True if the file was deleted successfully, False otherwise. |

Raises:

| Type | Description |
|---|---|
| FileNotFoundError | If the remote file does not exist. |
| RuntimeError | If the deletion fails. |
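The delete contract above can be illustrated with a hypothetical dict-backed store (InMemoryStorage is not part of this module). In this sketch a missing key raises FileNotFoundError, as listed under Raises, and a successful deletion returns True.

```python
from typing import Dict


class InMemoryStorage:
    """Hypothetical dict-backed store used only to illustrate delete_file semantics."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, remote_path: str, data: bytes) -> None:
        self._objects[remote_path] = data

    def delete_file(self, remote_path: str) -> bool:
        # A missing object is reported as an exception, per the Raises table.
        if remote_path not in self._objects:
            raise FileNotFoundError(f"Remote file not found: {remote_path}")
        del self._objects[remote_path]
        return True


store = InMemoryStorage()
store.put("models/checkpoint.bin", b"weights")
print(store.delete_file("models/checkpoint.bin"))  # True
```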

download_file(remote_path, local_path) abstractmethod

Download a file from cloud storage.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| remote_path | str | Path to the file in cloud storage. | required |
| local_path | str \| Path | Path where the downloaded file should be saved. | required |

Returns:

| Type | Description |
|---|---|
| Path | Path to the downloaded file. |

Raises:

| Type | Description |
|---|---|
| FileNotFoundError | If the remote file does not exist. |
| RuntimeError | If the download fails. |
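A minimal sketch of the download_file contract, assuming a plain dict stands in for the remote store (download_from_dict is a hypothetical helper). Creating missing parent directories is a choice made in this sketch, not documented behaviour; local I/O errors are wrapped in RuntimeError to match the Raises table.

```python
import tempfile
from pathlib import Path
from typing import Dict, Union


def download_from_dict(objects: Dict[str, bytes], remote_path: str,
                       local_path: Union[str, Path]) -> Path:
    """Hypothetical helper mimicking download_file for a dict-backed store."""
    if remote_path not in objects:
        raise FileNotFoundError(f"Remote file not found: {remote_path}")
    destination = Path(local_path)
    try:
        # Create missing parent directories so nested local paths work.
        destination.parent.mkdir(parents=True, exist_ok=True)
        destination.write_bytes(objects[remote_path])
    except OSError as exc:
        # Local I/O problems surface as RuntimeError.
        raise RuntimeError(f"Download failed: {exc}") from exc
    return destination


with tempfile.TemporaryDirectory() as tmp:
    path = download_from_dict({"runs/config.json": b"{}"}, "runs/config.json",
                              Path(tmp) / "nested" / "config.json")
    print(path.read_bytes())  # b'{}'
```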

file_exists(remote_path) abstractmethod

Check whether a file exists in cloud storage.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| remote_path | str | Path to check in cloud storage. | required |

Returns:

| Type | Description |
|---|---|
| bool | True if the file exists, False if it does not. |

Raises:

| Type | Description |
|---|---|
| RuntimeError | If the existence check fails due to an auth or API error. |

get_storage_client(provider, config) staticmethod

Factory method to get the appropriate storage client.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| provider | str | The storage provider to use (e.g., "s3", "gcs", "azure"). | required |
| config | Dict[str, str] | Configuration for the storage client. | required |

Returns:

| Type | Description |
|---|---|
| CloudStorageClient | The storage client for the specified provider. |

Raises:

| Type | Description |
|---|---|
| ValueError | If the provider is not supported. |

upload_file(local_path, remote_path, check_conflict=False) abstractmethod

Upload a file to cloud storage.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| local_path | str \| Path | Path to the local file to upload. | required |
| remote_path | str | Path in the cloud storage where the file should be saved. | required |
| check_conflict | bool | When True, fail atomically if the remote object already exists, avoiding a time-of-check to time-of-use (TOCTOU) race between a separate existence check and the upload. Defaults to False. | False |

Returns:

| Type | Description |
|---|---|
| str | URL or identifier of the uploaded file in the cloud storage. |

Raises:

| Type | Description |
|---|---|
| FileNotFoundError | If the local file does not exist. |
| FileExistsError | If check_conflict is True and the remote object already exists. |
| RuntimeError | If the upload fails. |
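The check_conflict=True behaviour can be illustrated with a local-filesystem analogue. In this sketch (local_upload is a hypothetical helper, not part of the module), os.open with O_CREAT | O_EXCL makes "create only if absent" a single atomic step, so no other writer can slip in between the check and the write. The OS raises FileExistsError itself, which matches the documented exception.

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import Union


def local_upload(local_path: Union[str, Path], destination: Union[str, Path],
                 check_conflict: bool = False) -> str:
    """Hypothetical local analogue of upload_file(..., check_conflict=...)."""
    source = Path(local_path)
    if not source.is_file():
        raise FileNotFoundError(f"Local file not found: {source}")
    flags = os.O_WRONLY | os.O_CREAT | getattr(os, "O_BINARY", 0)
    # O_EXCL: fail atomically if the destination exists; otherwise truncate and overwrite.
    flags |= os.O_EXCL if check_conflict else os.O_TRUNC
    fd = os.open(destination, flags, 0o644)
    with os.fdopen(fd, "wb") as out, source.open("rb") as src:
        shutil.copyfileobj(src, out)
    # Return an identifier for the stored file, here a file:// URI.
    return Path(destination).resolve().as_uri()


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "model.bin"
    src.write_bytes(b"weights")
    dst = Path(tmp) / "uploaded.bin"
    local_upload(src, dst, check_conflict=True)
    try:
        local_upload(src, dst, check_conflict=True)
    except FileExistsError:
        print("refused to overwrite")  # prints: refused to overwrite
```

Cloud backends offer the same guarantee through conditional writes where available; a separate "exists?" call followed by an upload would reintroduce the race.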

CloudStorageRegistry

Registry for available cloud storage clients.

get_client(provider, config) classmethod

Get an instantiated storage client by provider name.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| provider | str | Name of the storage provider. | required |
| config | Dict[str, str] | Configuration parameters for the storage client. | required |

Returns:

| Type | Description |
|---|---|
| CloudStorageClient | An instance of the storage client. |

Raises:

| Type | Description |
|---|---|
| ValueError | If the provider name is not recognized. |

list_available_providers() classmethod

List all registered storage provider names.

Returns:

| Type | Description |
|---|---|
| list[str] | A list of storage provider names. |

register(provider, client_class) classmethod

Register a cloud storage client class.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| provider | str | Identifier for the storage provider (e.g., "s3", "gcs", "azure"). | required |
| client_class | Type[CloudStorageClient] | The storage client class. | required |
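A minimal sketch of a registry exposing the three documented classmethods. Registry, StorageClientBase and DummyS3Client are hypothetical stand-ins; returning provider names in sorted order is a choice made in this sketch, not documented behaviour.

```python
from typing import Dict, List, Type


class StorageClientBase:
    """Minimal stand-in for CloudStorageClient so the sketch is self-contained."""

    def __init__(self, config: Dict[str, str]) -> None:
        self.config = config


class Registry:
    """Hypothetical registry mirroring register / get_client / list_available_providers."""

    _clients: Dict[str, Type[StorageClientBase]] = {}

    @classmethod
    def register(cls, provider: str, client_class: Type[StorageClientBase]) -> None:
        cls._clients[provider] = client_class

    @classmethod
    def get_client(cls, provider: str, config: Dict[str, str]) -> StorageClientBase:
        try:
            client_class = cls._clients[provider]
        except KeyError:
            raise ValueError(
                f"Unknown storage provider {provider!r}; "
                f"available: {cls.list_available_providers()}"
            ) from None
        # Instantiate the registered class with the provider-specific config.
        return client_class(config)

    @classmethod
    def list_available_providers(cls) -> List[str]:
        return sorted(cls._clients)


class DummyS3Client(StorageClientBase):
    """Hypothetical client used only for the example."""


Registry.register("s3", DummyS3Client)
client = Registry.get_client("s3", {"bucket_name": "my-bucket"})
print(type(client).__name__, Registry.list_available_providers())  # DummyS3Client ['s3']
```

Keeping the mapping in one class lets new providers plug in with a single register call, without editing the factory.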

S3StorageClient(config)

Bases: CloudStorageClient

AWS S3 and S3-compatible storage client implementation.

This class implements the CloudStorageClient interface for AWS S3 and other S3-compatible storage providers like MinIO.

Initialize the S3 storage client.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| config | Dict[str, str] | S3 configuration parameters: bucket_name (name of the S3 bucket); prefix (optional prefix, or folder, within the bucket); endpoint_url (optional endpoint URL for S3-compatible services). | required |

Raises:

| Type | Description |
|---|---|
| ValueError | If required configuration parameters are missing. |
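A minimal sketch of how the S3 config dictionary might be validated, assuming bucket_name is the only required key (prefix and endpoint_url are documented as optional). S3Settings and parse_s3_config are hypothetical names, and stripping slashes from the prefix is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class S3Settings:
    """Hypothetical typed view of the documented S3 config keys."""

    bucket_name: str
    prefix: str = ""
    endpoint_url: Optional[str] = None


def parse_s3_config(config: Dict[str, str]) -> S3Settings:
    # Assumption: bucket_name is required; prefix and endpoint_url are optional.
    bucket = config.get("bucket_name")
    if not bucket:
        raise ValueError("S3 config is missing required key 'bucket_name'")
    return S3Settings(
        bucket_name=bucket,
        # Assumption: surrounding slashes are stripped so keys join cleanly later.
        prefix=config.get("prefix", "").strip("/"),
        endpoint_url=config.get("endpoint_url") or None,
    )


# endpoint_url points at an S3-compatible service such as a local MinIO server.
settings = parse_s3_config({"bucket_name": "artifacts", "prefix": "runs/",
                            "endpoint_url": "http://localhost:9000"})
print(settings.prefix)  # runs
```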

delete_file(remote_path)

Delete a file from S3 storage.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| remote_path | str | Path to the file in S3. | required |

Returns:

| Type | Description |
|---|---|
| bool | True if the file was deleted successfully. |

Raises:

| Type | Description |
|---|---|
| FileNotFoundError | If the remote file does not exist. |
| RuntimeError | If the deletion fails. |
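S3's DeleteObject reports success even when the key does not exist, so raising FileNotFoundError presumably requires an existence probe before deleting. The sketch below assumes that approach, using a hypothetical FakeS3 stand-in whose method names echo boto3's head_object/delete_object; real code would call boto3 and inspect botocore's ClientError instead of KeyError. Note that probe-then-delete is itself not atomic.

```python
from typing import Any, Dict, Set


class FakeS3:
    """Hypothetical stand-in exposing head_object/delete_object-like methods."""

    def __init__(self, keys: Set[str]) -> None:
        self.keys = set(keys)

    def head_object(self, Bucket: str, Key: str) -> Dict[str, Any]:
        if Key not in self.keys:
            raise KeyError(Key)  # a real client would raise a 404 ClientError
        return {"ContentLength": 0}

    def delete_object(self, Bucket: str, Key: str) -> Dict[str, Any]:
        # Like S3 DeleteObject, succeed whether or not the key exists.
        self.keys.discard(Key)
        return {}


def delete_with_probe(s3: FakeS3, bucket: str, key: str) -> bool:
    """Hypothetical helper: probe first so a missing key becomes FileNotFoundError."""
    try:
        s3.head_object(Bucket=bucket, Key=key)
    except KeyError:
        raise FileNotFoundError(f"s3://{bucket}/{key} does not exist") from None
    s3.delete_object(Bucket=bucket, Key=key)
    return True


s3 = FakeS3({"models/a.bin"})
print(delete_with_probe(s3, "artifacts", "models/a.bin"))  # True
```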

download_file(remote_path, local_path)

Download a file from S3 storage.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| remote_path | str | Path to the file in S3. | required |
| local_path | str \| Path | Path where the downloaded file should be saved. | required |

Returns:

| Type | Description |
|---|---|
| Path | Path to the downloaded file. |

Raises:

| Type | Description |
|---|---|
| FileNotFoundError | If the remote file does not exist. |
| RuntimeError | If the download fails. |

file_exists(remote_path)

Check whether a file exists in S3 storage.

Uses HeadObject to probe existence without downloading the file.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| remote_path | str | Path to check in S3. | required |

Returns:

| Type | Description |
|---|---|
| bool | True if the object exists, False if HeadObject returns a 404 (Not Found). |

Raises:

| Type | Description |
|---|---|
| RuntimeError | If the check fails due to an auth or API error (e.g. 403, ExpiredToken). |
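The 404-versus-other-error split can be sketched as a small classifier over the response dictionary that botocore attaches to ClientError (its Error.Code and ResponseMetadata.HTTPStatusCode fields). Real code would call s3.head_object(Bucket=..., Key=...), return True on success, and pass exc.response to this helper inside an `except botocore.exceptions.ClientError as exc` block. exists_from_head_error is a hypothetical name, and the set of not-found codes is an assumption.

```python
from typing import Any, Dict

# Assumption: codes treated as "object missing". For HeadObject, botocore
# typically reports the bare status "404" because HEAD responses have no body.
NOT_FOUND_CODES = {"404", "NoSuchKey", "NotFound"}


def exists_from_head_error(error_response: Dict[str, Any]) -> bool:
    """Hypothetical helper: map a ClientError.response dict to file_exists semantics."""
    code = str(error_response.get("Error", {}).get("Code", ""))
    status = error_response.get("ResponseMetadata", {}).get("HTTPStatusCode")
    if code in NOT_FOUND_CODES or status == 404:
        return False
    # Anything else (403 AccessDenied, ExpiredToken, throttling) is a real failure.
    raise RuntimeError(f"Existence check failed: {code or status}")


print(exists_from_head_error({"Error": {"Code": "404"},
                              "ResponseMetadata": {"HTTPStatusCode": 404}}))  # False
```

Treating only 404 as "absent" keeps credential or permission problems from being silently reported as a missing file.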

upload_file(local_path, remote_path, check_conflict=False)

Upload a file to S3 storage.

For files up to 5 GB, the upload uses a single PutObject request. When check_conflict=True, IfNoneMatch="*" is passed so that S3 rejects the upload atomically if the key already exists.

For files larger than 5 GB, the upload uses boto3's managed multipart transfer (upload_fileobj), which splits the file into parts and is therefore not bound by the 5 GB single-request limit. When check_conflict=True for large files, a HeadObject pre-check is performed instead (non-atomic, but the only option for multipart uploads).
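The size-based decision described above can be sketched as a pure function that returns an upload plan instead of calling S3. plan_upload and UploadPlan are hypothetical names. Interpreting "5 GB" as 5 GiB (the S3 single-PUT limit) is an assumption. On a conflicting conditional PUT, S3 responds with HTTP 412 Precondition Failed, which a client would then translate into FileExistsError.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

# Assumption: "5 GB" is taken as 5 GiB, the S3 single-PUT object size limit.
SINGLE_PUT_LIMIT = 5 * 1024 ** 3


@dataclass
class UploadPlan:
    method: str          # "put_object" or "upload_fileobj"
    head_precheck: bool  # non-atomic existence probe before a multipart upload
    extra_kwargs: Dict[str, Any] = field(default_factory=dict)


def plan_upload(size_bytes: int, check_conflict: bool) -> UploadPlan:
    """Hypothetical helper reproducing the documented strategy selection."""
    if size_bytes <= SINGLE_PUT_LIMIT:
        # Single request; IfNoneMatch="*" makes S3 reject the PUT if the key exists.
        kwargs = {"IfNoneMatch": "*"} if check_conflict else {}
        return UploadPlan("put_object", head_precheck=False, extra_kwargs=kwargs)
    # Managed multipart transfer; conflict detection falls back to a HeadObject probe.
    return UploadPlan("upload_fileobj", head_precheck=check_conflict)


print(plan_upload(10 * 1024 ** 2, check_conflict=True))
```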

Parameters:

Name Type Description Default
local_path str | Path

Path to the local file to upload.

required
remote_path str

Path in S3 where the file should be saved. If a relative path is provided, it will be appended to the prefix.

required
check_conflict bool

When True, prevent overwriting an existing remote key. Defaults to False.

False

Returns:

Name Type Description
str str

S3 URI for the uploaded file (s3://{bucket_name}/{full_remote_path}).

Raises:

Type Description
FileNotFoundError

If the local file does not exist.

FileExistsError

If check_conflict is True and the remote key already exists.

RuntimeError

If the upload fails.
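A minimal sketch of how a relative remote_path could be appended to the configured prefix to form the key in the documented return value s3://{bucket_name}/{full_remote_path}. build_s3_key and s3_uri are hypothetical helpers. Slash normalisation is an assumption, and because surrounding slashes are stripped, this sketch treats every remote_path as relative.

```python
def build_s3_key(prefix: str, remote_path: str) -> str:
    """Hypothetical helper: append remote_path to the prefix with a single '/'."""
    # Assumption: surrounding slashes on both parts are stripped, so every
    # remote_path is treated as relative to the prefix; empty parts are skipped.
    parts = [p.strip("/") for p in (prefix, remote_path) if p.strip("/")]
    return "/".join(parts)


def s3_uri(bucket_name: str, key: str) -> str:
    # Same shape as the documented return value: s3://{bucket_name}/{full_remote_path}
    return f"s3://{bucket_name}/{key}"


print(s3_uri("artifacts", build_s3_key("runs/exp1/", "checkpoints/step_100.pt")))
# s3://artifacts/runs/exp1/checkpoints/step_100.pt
```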

get_storage_client(provider, config)

Get a storage client instance by provider type.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| provider | str | The type of storage provider. | required |
| config | Dict[str, str] | Configuration parameters for the storage client. | required |

Returns:

| Type | Description |
|---|---|
| CloudStorageClient | A storage client instance. |