Cloud Storage
Cloud storage module for GLLM training.
This module provides functionality for storing model artifacts in various cloud storage providers such as AWS S3, Google Cloud Storage, and Azure Blob Storage.
CloudStorageClient(config)
Bases: ABC
Abstract base class for cloud storage clients.
This class defines the interface that all cloud storage implementations must follow. It provides methods for uploading and downloading files from cloud storage services.
Initialize the cloud storage client with provider-specific configuration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config | Dict[str, str] | Configuration parameters for the storage client. The exact keys depend on the specific cloud provider implementation. | required |
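To make the interface concrete, here is a minimal sketch of a provider that satisfies the four abstract methods documented in this section. The `CloudStorageClient` class in the block is a stand-in written for illustration (its exact shape is assumed from this page, not copied from the module), and `InMemoryStorageClient` and the `memory://` identifier are hypothetical names.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict, Union


class CloudStorageClient(ABC):
    """Stand-in for the documented base class (assumed shape, not the real source)."""

    def __init__(self, config: Dict[str, str]) -> None:
        self.config = config

    @abstractmethod
    def upload_file(self, local_path: Union[str, Path], remote_path: str,
                    check_conflict: bool = False) -> str: ...

    @abstractmethod
    def download_file(self, remote_path: str, local_path: Union[str, Path]) -> Path: ...

    @abstractmethod
    def file_exists(self, remote_path: str) -> bool: ...

    @abstractmethod
    def delete_file(self, remote_path: str) -> bool: ...


class InMemoryStorageClient(CloudStorageClient):
    """Hypothetical provider that keeps objects in a dict, e.g. for unit tests."""

    def __init__(self, config: Dict[str, str]) -> None:
        super().__init__(config)
        self._objects: Dict[str, bytes] = {}

    def upload_file(self, local_path, remote_path, check_conflict=False):
        path = Path(local_path)
        if not path.is_file():
            raise FileNotFoundError(f"Local file not found: {path}")
        if check_conflict and remote_path in self._objects:
            raise FileExistsError(f"Remote object already exists: {remote_path}")
        self._objects[remote_path] = path.read_bytes()
        return f"memory://{remote_path}"

    def download_file(self, remote_path, local_path):
        if remote_path not in self._objects:
            raise FileNotFoundError(f"Remote object not found: {remote_path}")
        target = Path(local_path)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(self._objects[remote_path])
        return target

    def file_exists(self, remote_path):
        return remote_path in self._objects

    def delete_file(self, remote_path):
        if remote_path not in self._objects:
            raise FileNotFoundError(f"Remote object not found: {remote_path}")
        del self._objects[remote_path]
        return True
```

A test double like this can help exercise calling code without network access; a real provider would replace the dictionary operations with API calls and map API failures to `RuntimeError`.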
delete_file(remote_path)
abstractmethod
Delete a file from cloud storage.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| remote_path | str | Path to the file in cloud storage. | required |

Returns:
| Name | Type | Description |
|---|---|---|
| bool | bool | True if the file was deleted successfully, False otherwise. |

Raises:
| Type | Description |
|---|---|
| FileNotFoundError | If the remote file does not exist. |
| RuntimeError | If the deletion fails. |
download_file(remote_path, local_path)
abstractmethod
Download a file from cloud storage.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| remote_path | str | Path to the file in cloud storage. | required |
| local_path | str \| Path | Path where the downloaded file should be saved. | required |

Returns:
| Name | Type | Description |
|---|---|---|
| Path | Path | Path to the downloaded file. |

Raises:
| Type | Description |
|---|---|
| FileNotFoundError | If the remote file does not exist. |
| RuntimeError | If the download fails. |
file_exists(remote_path)
abstractmethod
Check whether a file exists in cloud storage.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| remote_path | str | Path to check in cloud storage. | required |

Returns:
| Name | Type | Description |
|---|---|---|
| bool | bool | True if the file exists, False if it does not. |

Raises:
| Type | Description |
|---|---|
| RuntimeError | If the existence check fails due to an authentication or API error. |
get_storage_client(provider, config)
staticmethod
Factory method to get the appropriate storage client.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| provider | str | The storage provider to use (e.g., "s3", "gcs", "azure"). | required |
| config | Dict[str, str] | Configuration for the storage client. | required |

Returns:
| Name | Type | Description |
|---|---|---|
| CloudStorageClient | CloudStorageClient | The storage client for the specified provider. |

Raises:
| Type | Description |
|---|---|
| ValueError | If the provider is not supported. |
upload_file(local_path, remote_path, check_conflict=False)
abstractmethod
Upload a file to cloud storage.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| local_path | str \| Path | Path to the local file to upload. | required |
| remote_path | str | Path in cloud storage where the file should be saved. | required |
| check_conflict | bool | When True, fail atomically if the remote object already exists (avoids a TOCTOU race between the existence check and the upload). Defaults to False. | False |

Returns:
| Name | Type | Description |
|---|---|---|
| str | str | URL or identifier of the uploaded file in cloud storage. |

Raises:
| Type | Description |
|---|---|
| FileNotFoundError | If the local file does not exist. |
| FileExistsError | If check_conflict is True and the remote object already exists. |
| RuntimeError | If the upload fails. |
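The TOCTOU (time-of-check to time-of-use) race that `check_conflict` is meant to avoid can be illustrated with a local-filesystem analogy. This is only a sketch of the general idea, not how any cloud provider implements it; the function names are hypothetical. The first function checks and then writes in two steps, so another writer can slip in between them. The second asks the operating system to create the file exclusively, so the check and the creation happen as one operation.

```python
from pathlib import Path


def write_if_absent_racy(path: Path, data: bytes) -> None:
    """Check-then-write: another writer can create `path` between the two steps."""
    if path.exists():           # time of check
        raise FileExistsError(f"Already exists: {path}")
    path.write_bytes(data)      # time of use; may silently overwrite a concurrent write


def write_if_absent_atomic(path: Path, data: bytes) -> None:
    """Exclusive create: mode "xb" raises FileExistsError if `path` already exists."""
    with open(path, "xb") as handle:
        handle.write(data)
```

With the atomic variant, the second of two competing writers receives `FileExistsError` instead of overwriting the first writer's data, which is the behavior `check_conflict=True` is documented to provide for remote objects.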
CloudStorageRegistry
Registry for available cloud storage clients.
get_client(provider, config)
classmethod
Get an instantiated storage client by provider name.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| provider | str | Name of the storage provider. | required |
| config | Dict[str, str] | Configuration parameters for the storage client. | required |

Returns:
| Name | Type | Description |
|---|---|---|
| CloudStorageClient | CloudStorageClient | An instance of the storage client. |

Raises:
| Type | Description |
|---|---|
| ValueError | If the provider name is not recognized. |
list_available_providers()
classmethod
List all registered storage provider names.
Returns:
| Type | Description |
|---|---|
| list[str] | A list of storage provider names. |
register(provider, client_class)
classmethod
Register a cloud storage client class.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| provider | str | Identifier for the storage provider (e.g., "s3", "gcs", "azure"). | required |
| client_class | Type[CloudStorageClient] | The storage client class. | required |
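The three registry methods fit together roughly as in the sketch below. It is a guess at the pattern based on the signatures documented here, not the module's source: both classes are stand-ins, the `_clients` mapping and the sorted provider list are assumptions, and `DummyStorageClient` is a hypothetical provider.

```python
from typing import Dict, List, Type


class CloudStorageClient:
    """Stand-in base class; the documented one is abstract and has more methods."""

    def __init__(self, config: Dict[str, str]) -> None:
        self.config = config


class CloudStorageRegistry:
    """Stand-in registry mapping provider names to client classes."""

    _clients: Dict[str, Type[CloudStorageClient]] = {}

    @classmethod
    def register(cls, provider: str, client_class: Type[CloudStorageClient]) -> None:
        cls._clients[provider] = client_class

    @classmethod
    def get_client(cls, provider: str, config: Dict[str, str]) -> CloudStorageClient:
        try:
            client_class = cls._clients[provider]
        except KeyError:
            raise ValueError(f"Unknown storage provider: {provider!r}") from None
        return client_class(config)

    @classmethod
    def list_available_providers(cls) -> List[str]:
        return sorted(cls._clients)


class DummyStorageClient(CloudStorageClient):
    """Hypothetical provider used only to demonstrate registration."""


CloudStorageRegistry.register("dummy", DummyStorageClient)
client = CloudStorageRegistry.get_client("dummy", {"bucket_name": "example"})
```

Looking providers up by name keeps calling code free of provider-specific imports, and an unregistered name surfaces as the documented `ValueError`.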
S3StorageClient(config)
Bases: CloudStorageClient
AWS S3 and S3-compatible storage client implementation.
This class implements the CloudStorageClient interface for AWS S3 and other S3-compatible storage providers like MinIO.
Initialize the S3 storage client.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config | Dict[str, str] | S3 configuration parameters: bucket_name (name of the S3 bucket), prefix (optional prefix, i.e. folder, within the bucket), and endpoint_url (optional endpoint URL for S3-compatible services). | required |

Raises:
| Type | Description |
|---|---|
| ValueError | If required configuration parameters are missing. |
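As an illustration of the configuration keys listed above, the following sketch validates a config dictionary before a client is built. It assumes that `bucket_name` is the only required key (the other two are described as optional) and that surrounding slashes on `prefix` can be dropped; `validate_s3_config`, the bucket name, and the MinIO endpoint are hypothetical values chosen for the example.

```python
from typing import Dict

# Assumption: only bucket_name is required; prefix and endpoint_url are optional.
REQUIRED_KEYS = ("bucket_name",)


def validate_s3_config(config: Dict[str, str]) -> Dict[str, str]:
    """Return a normalized copy of `config`, or raise ValueError if keys are missing."""
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError("Missing required S3 configuration: " + ", ".join(missing))
    return {
        "bucket_name": config["bucket_name"],
        # Assumption: leading and trailing slashes on the prefix carry no meaning.
        "prefix": config.get("prefix", "").strip("/"),
        "endpoint_url": config.get("endpoint_url", ""),
    }


# Plain AWS S3: no endpoint override needed.
aws_config = {"bucket_name": "example-artifacts", "prefix": "runs/2024/"}

# S3-compatible service such as MinIO: point the client at a custom endpoint.
minio_config = {"bucket_name": "example-artifacts", "endpoint_url": "http://localhost:9000"}
```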
delete_file(remote_path)
Delete a file from S3 storage.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| remote_path | str | Path to the file in S3. | required |

Returns:
| Name | Type | Description |
|---|---|---|
| bool | bool | True if the file was deleted successfully. |

Raises:
| Type | Description |
|---|---|
| FileNotFoundError | If the remote file does not exist. |
| RuntimeError | If the deletion fails. |
download_file(remote_path, local_path)
Download a file from S3 storage.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| remote_path | str | Path to the file in S3. | required |
| local_path | str \| Path | Path where the downloaded file should be saved. | required |

Returns:
| Name | Type | Description |
|---|---|---|
| Path | Path | Path to the downloaded file. |

Raises:
| Type | Description |
|---|---|
| FileNotFoundError | If the remote file does not exist. |
| RuntimeError | If the download fails. |
file_exists(remote_path)
Check whether a file exists in S3 storage.
Uses HeadObject to probe existence without downloading the file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| remote_path | str | Path to check in S3. | required |

Returns:
| Name | Type | Description |
|---|---|---|
| bool | bool | True if the object exists, False if the request returns a 404. |

Raises:
| Type | Description |
|---|---|
| RuntimeError | If the check fails due to an authentication or API error (e.g. 403, ExpiredToken). |
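The HeadObject-based check described above might look like the sketch below. It is written against a duck-typed `client` so that it runs without boto3 installed; with boto3, `client` would be an S3 client and the caught exception would be `botocore.exceptions.ClientError`, whose `response["Error"]["Code"]` is `"404"` for a missing key. `object_exists`, `FakeClientError`, and `FakeS3` are hypothetical names used for illustration.

```python
def _error_code(exc: Exception) -> str:
    """Read the service error code from a ClientError-like exception, if present."""
    response = getattr(exc, "response", None) or {}
    return str(response.get("Error", {}).get("Code", ""))


def object_exists(client, bucket: str, key: str) -> bool:
    """Return True/False for present/absent keys; raise RuntimeError on other errors."""
    try:
        client.head_object(Bucket=bucket, Key=key)
    except Exception as exc:
        if _error_code(exc) == "404":
            return False
        # Auth and API failures (e.g. 403, ExpiredToken) must not look like "absent".
        raise RuntimeError(f"Existence check failed for s3://{bucket}/{key}") from exc
    return True


class FakeClientError(Exception):
    """Mimics the `.response` shape of botocore's ClientError."""

    def __init__(self, code: str) -> None:
        super().__init__(code)
        self.response = {"Error": {"Code": code}}


class FakeS3:
    """Tiny stand-in for an S3 client, used only to exercise the function."""

    def __init__(self, keys, fail_with=None) -> None:
        self.keys = set(keys)
        self.fail_with = fail_with

    def head_object(self, Bucket: str, Key: str) -> dict:
        if self.fail_with:
            raise FakeClientError(self.fail_with)
        if Key not in self.keys:
            raise FakeClientError("404")
        return {}
```

Treating only a 404 as "does not exist" matters because a 403 or an expired token would otherwise be reported as a missing file, and a caller might then overwrite an object it simply could not see.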
upload_file(local_path, remote_path, check_conflict=False)
Upload a file to S3 storage.
For files up to 5 GB the upload uses a single PutObject request. When check_conflict=True, IfNoneMatch="*" is passed so that S3 rejects the upload atomically if the key already exists.
For files larger than 5 GB the upload uses boto3's managed multipart transfer (upload_fileobj), which has no practical size limit. When check_conflict=True for large files, a HeadObject pre-check is performed (non-atomic, but the only option for multipart uploads).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| local_path | str \| Path | Path to the local file to upload. | required |
| remote_path | str | Path in S3 where the file should be saved. If a relative path is provided, it is appended to the prefix. | required |
| check_conflict | bool | When True, prevent overwriting an existing remote key. Defaults to False. | False |

Returns:
| Name | Type | Description |
|---|---|---|
| str | str | S3 URI for the uploaded file (s3://{bucket_name}/{full_remote_path}). |

Raises:
| Type | Description |
|---|---|
| FileNotFoundError | If the local file does not exist. |
| FileExistsError | If check_conflict is True and the remote key already exists. |
| RuntimeError | If the upload fails. |
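The size-based branching described for this method can be sketched as follows. This is an approximation under stated assumptions, not the class's actual code: `client` is duck-typed so the block runs without boto3, "5 GB" is taken to mean 5 GiB, and a rejected conditional write is assumed to surface with the error code `PreconditionFailed` (HTTP 412). The `put_object` and `upload_fileobj` calls mirror the boto3 S3 client methods of the same names; `upload`, `_exists`, and `FakeS3` are hypothetical.

```python
from pathlib import Path
from typing import Union

SINGLE_PUT_LIMIT = 5 * 1024**3  # assumption: "5 GB" means 5 GiB


def _error_code(exc: Exception) -> str:
    response = getattr(exc, "response", None) or {}
    return str(response.get("Error", {}).get("Code", ""))


def _exists(client, bucket: str, key: str) -> bool:
    try:
        client.head_object(Bucket=bucket, Key=key)
    except Exception as exc:
        if _error_code(exc) == "404":
            return False
        raise
    return True


def upload(client, bucket: str, key: str, local_path: Union[str, Path],
           check_conflict: bool = False, single_put_limit: int = SINGLE_PUT_LIMIT) -> str:
    path = Path(local_path)
    if not path.is_file():
        raise FileNotFoundError(f"Local file not found: {path}")
    try:
        with path.open("rb") as body:
            if path.stat().st_size <= single_put_limit:
                # Small file: one request; the conflict check is done by S3 itself.
                extra = {"IfNoneMatch": "*"} if check_conflict else {}
                client.put_object(Bucket=bucket, Key=key, Body=body, **extra)
            else:
                # Large file: non-atomic pre-check, then managed multipart transfer.
                if check_conflict and _exists(client, bucket, key):
                    raise FileExistsError(f"s3://{bucket}/{key} already exists")
                client.upload_fileobj(body, bucket, key)
    except FileExistsError:
        raise
    except Exception as exc:
        if check_conflict and _error_code(exc) in {"PreconditionFailed", "412"}:
            raise FileExistsError(f"s3://{bucket}/{key} already exists") from exc
        raise RuntimeError(f"Upload to s3://{bucket}/{key} failed") from exc
    return f"s3://{bucket}/{key}"


class _FakeError(Exception):
    def __init__(self, code: str) -> None:
        super().__init__(code)
        self.response = {"Error": {"Code": code}}


class FakeS3:
    """Stand-in client that records which upload path was taken."""

    def __init__(self) -> None:
        self.objects, self.calls = {}, []

    def head_object(self, Bucket, Key):
        if Key not in self.objects:
            raise _FakeError("404")
        return {}

    def put_object(self, Bucket, Key, Body, IfNoneMatch=None):
        if IfNoneMatch == "*" and Key in self.objects:
            raise _FakeError("PreconditionFailed")
        self.objects[Key] = Body.read()
        self.calls.append("put_object")

    def upload_fileobj(self, Fileobj, Bucket, Key):
        self.objects[Key] = Fileobj.read()
        self.calls.append("upload_fileobj")
```

Passing a small `single_put_limit` together with `FakeS3` exercises the multipart branch in a test without needing a multi-gigabyte file.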
get_storage_client(provider, config)
Get a storage client instance by provider type.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| provider | str | The type of storage provider. | required |
| config | Dict[str, str] | Configuration parameters for the storage client. | required |

Returns:
| Name | Type | Description |
|---|---|---|
| CloudStorageClient | CloudStorageClient | A storage client instance. |