Utils
Utility functions for multimodal modules.
combine_strings(texts)
Combine multiple strings into a single string with newline separators.
This function takes a list of strings and returns a single string where each string from the list is on a new line. It filters out any empty or whitespace-only strings from the list before joining them.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| texts | list[str] | A list of strings to combine. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| str | str | A single string containing all valid strings, where each string is on a new line. |
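The filtering and joining described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the name `combine_strings_sketch` is hypothetical, and it assumes retained strings are kept unchanged (whether the real function also trims them is not stated here).

```python
def combine_strings_sketch(texts: list[str]) -> str:
    """Join the non-blank entries of `texts`, one per line (illustrative only)."""
    # str.strip() yields "" for empty and whitespace-only strings, which is falsy.
    kept = [text for text in texts if text.strip()]
    return "\n".join(kept)


combined = combine_strings_sketch(["first caption", "   ", "", "second caption"])
print(combined)
```

Blank entries are dropped before joining, so the result never contains empty lines.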
convert_audio_to_mono_flac(input_audio_bytes)
Convert audio binary data to mono FLAC format.
This function standardizes audio to mono FLAC so that downstream processing sees a consistent file format and a single audio channel, while preserving the audio information.

It performs two operations:

1. Converts the input audio to mono (single channel).
2. Encodes the audio in FLAC format for speech recognition.

FLAC (Free Lossless Audio Codec) is chosen because:

1. Lossless compression preserves audio quality for accurate transcription.
2. It is more bandwidth-efficient than uncompressed formats such as LINEAR16.
3. It records the bit depth (16- or 24-bit) and sample rate in the stream header, so they do not need to be specified separately.

Mono is chosen because:

1. Speech recognition models are optimized for single-channel audio.
2. It reduces bandwidth and processing overhead.
3. It simplifies processing by avoiding multi-channel complexity.
4. It ensures consistent results across different input formats.

Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| input_audio_bytes | bytes | Input audio data in binary format. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| bytes | bytes | Audio data converted to mono FLAC format. |
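To make the mono conversion step concrete, here is a rough sketch of a stereo-to-mono downmix on raw samples. It is an assumption-laden illustration: the helper `downmix_stereo_to_mono` is hypothetical, it only handles interleaved 16-bit little-endian PCM, and it does not perform FLAC encoding (the library presumably delegates decoding and encoding to an audio toolkit).

```python
import struct


def downmix_stereo_to_mono(pcm: bytes) -> bytes:
    """Average left/right 16-bit samples into one channel (illustrative only)."""
    # Each stereo frame is two 16-bit samples, so 4 bytes per frame.
    if len(pcm) % 4 != 0:
        raise ValueError("expected whole stereo frames of 16-bit samples")
    sample_count = len(pcm) // 2
    samples = struct.unpack(f"<{sample_count}h", pcm)
    # Samples are interleaved as L, R, L, R, ...; average each pair.
    mono = [(samples[i] + samples[i + 1]) // 2 for i in range(0, sample_count, 2)]
    return struct.pack(f"<{len(mono)}h", *mono)


stereo = struct.pack("<4h", 100, 200, -100, -300)
mono = downmix_stereo_to_mono(stereo)
print(struct.unpack("<2h", mono))
```

Averaging the two channels keeps the result within the 16-bit range, which is why it is a common choice for a simple downmix.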
get_audio_duration(audio_binary_data)
Get the duration of the audio in seconds.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| audio_binary_data | bytes | The binary data of the audio. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| float | float | The duration of the audio in seconds. |
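As an illustration of how a duration can be derived from in-memory audio, the sketch below handles uncompressed WAV data with the standard library only. Both helpers (`wav_duration_seconds`, `make_silent_wav`) are hypothetical; the real function likely supports more formats through an audio library.

```python
import io
import wave


def wav_duration_seconds(audio_binary_data: bytes) -> float:
    """Duration of WAV bytes: frame count divided by frame rate (illustrative only)."""
    with wave.open(io.BytesIO(audio_binary_data), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def make_silent_wav(seconds: float, sample_rate: int = 8000) -> bytes:
    """Build a mono 16-bit WAV of silence, used here only as sample input."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    return buffer.getvalue()


print(wav_duration_seconds(make_silent_wav(2.0)))
```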
get_audio_from_base64(audio_source)
Attempt to decode a base64-encoded audio string and verify that the result is valid audio data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| audio_source | str | The potential base64 encoded audio string to decode. | required |
Returns:
| Type | Description |
|---|---|
| bytes \| None | The decoded audio data if successful and valid, None otherwise. |
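The decode-then-validate pattern can be sketched like this. The function `decode_base64_audio` and the default validator `_starts_like_wav` are hypothetical names; the validator only recognizes a WAV header and stands in for whatever audio check the library actually applies.

```python
import base64
import binascii
from typing import Callable, Optional


def _starts_like_wav(data: bytes) -> bool:
    # Hypothetical stand-in validator: RIFF container carrying WAVE data.
    return data[:4] == b"RIFF" and data[8:12] == b"WAVE"


def decode_base64_audio(
    audio_source: str,
    is_audio: Callable[[bytes], bool] = _starts_like_wav,
) -> Optional[bytes]:
    """Return decoded bytes if the input is base64 and passes `is_audio`, else None."""
    try:
        # validate=True rejects characters outside the base64 alphabet.
        decoded = base64.b64decode(audio_source, validate=True)
    except (binascii.Error, ValueError):
        return None
    return decoded if is_audio(decoded) else None


wav_header = b"RIFF" + b"\x00" * 4 + b"WAVE"
encoded = base64.b64encode(wav_header).decode("ascii")
print(decode_base64_audio(encoded) == wav_header)
print(decode_base64_audio("not base64!!"))
```

Returning `None` for both decoding failures and failed validation mirrors the documented return contract.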
get_audio_from_downloadable_url(audio_source, timeout=1 * 60)
Get the audio from a downloadable URL and return its binary data if valid.
This function attempts to download audio content from a downloadable URL (e.g. Google Drive, OneDrive) and validates that the downloaded content is audio data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| audio_source | str | The downloadable URL of the audio file. | required |
| timeout | int | The timeout for the HTTP request in seconds. Defaults to 1 minute. | 1 * 60 |
Returns:
| Type | Description |
|---|---|
| bytes \| None | Binary data of the audio file if valid audio content is downloaded, None if the request fails or content is not valid audio. |
get_audio_from_file_path(audio_source)
Read audio file and return its binary data if valid.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| audio_source | str | Path to the audio file. | required |
Returns:
| Type | Description |
|---|---|
| bytes \| None | Binary data of the audio file if valid, None otherwise. |
get_audio_from_youtube_url(audio_source, proxy=None)
Extract audio from a YouTube video URL and return it as binary data.
This function downloads a YouTube video and extracts its audio track in M4A format. The audio is stored in memory and validated before being returned.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| audio_source | str | The YouTube video URL to extract audio from. | required |
| proxy | str \| None | The proxy URL to use for the YouTube request. Defaults to None. | None |
Returns:
| Type | Description |
|---|---|
| bytes \| None | Binary audio data if successfully downloaded and valid, None if download fails or audio is invalid. |
get_file_from_file_path(source)
Read image file and return its binary data if valid.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| source | str | Path to the image file. | required |
Returns:
| Type | Description |
|---|---|
| bytes \| None | Binary data of the image file if valid, None otherwise. |
get_file_from_gdrive(gdrive_source)
Download a file from Google Drive given a file ID or URL.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| gdrive_source | str | Google Drive file ID or full URL. | required |
Returns:
| Type | Description |
|---|---|
| bytes \| None | The file's binary content if successful, None otherwise. |
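Because the argument may be either a bare file ID or a full URL, some normalization step is needed before downloading. The sketch below shows one way to do it. The helper `extract_gdrive_file_id` is hypothetical, and the URL shapes it accepts are common Google Drive share-link forms; which shapes the library itself supports is not stated here.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Characters assumed to appear in Drive file IDs (an assumption for this sketch).
_FILE_PATH_PATTERN = re.compile(r"/file/d/([A-Za-z0-9_-]+)")
_BARE_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def extract_gdrive_file_id(gdrive_source: str) -> Optional[str]:
    """Return the file ID from a bare ID or a Drive URL, or None if unrecognized."""
    source = gdrive_source.strip()
    if "://" not in source:
        # No scheme: treat the input as a bare ID if it looks like one.
        return source if _BARE_ID_PATTERN.fullmatch(source) else None
    parsed = urlparse(source)
    if parsed.hostname not in {"drive.google.com", "docs.google.com"}:
        return None
    match = _FILE_PATH_PATTERN.search(parsed.path)
    if match:
        return match.group(1)  # .../file/d/<id>/view style links
    ids = parse_qs(parsed.query).get("id")
    return ids[0] if ids else None  # ...?id=<id> style links


print(extract_gdrive_file_id("https://drive.google.com/file/d/1AbC_dEf-123/view?usp=sharing"))
print(extract_gdrive_file_id("1AbC_dEf-123"))
```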
get_file_from_s3(url)
Get file from S3 bucket.
This function attempts to get a file from an S3 bucket using: 1. Default credentials (from AWS CLI or instance profile). 2. Environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY). 3. Session token if available.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| url | str | The S3 URL to get the file from, in either `s3://bucket/key` or `https://bucket.s3.amazonaws.com/key` form. | required |
Returns:
| Type | Description |
|---|---|
| bytes \| None | The file contents if successful, None otherwise. |
Raises:
| Type | Description |
|---|---|
| ValueError | If AWS credentials are not found or invalid. |
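Both accepted URL forms have to be reduced to a bucket name and an object key before any S3 request can be made. A minimal sketch of that parsing step follows. The helper `parse_s3_url` is hypothetical, it does not percent-decode keys, and it does not contact AWS.

```python
from urllib.parse import urlparse

_VIRTUAL_HOST_SUFFIX = ".s3.amazonaws.com"


def parse_s3_url(url: str) -> tuple[str, str]:
    """Split an S3 URL into (bucket, key) for the two documented forms."""
    parsed = urlparse(url)
    key = parsed.path.lstrip("/")
    if parsed.scheme == "s3":
        # s3://bucket/key puts the bucket in the network-location part.
        bucket = parsed.netloc
    elif parsed.scheme == "https" and parsed.netloc.endswith(_VIRTUAL_HOST_SUFFIX):
        # https://bucket.s3.amazonaws.com/key puts the bucket in the hostname.
        bucket = parsed.netloc[: -len(_VIRTUAL_HOST_SUFFIX)]
    else:
        raise ValueError(f"unsupported S3 URL: {url!r}")
    if not bucket or not key:
        raise ValueError(f"missing bucket or key in: {url!r}")
    return bucket, key


print(parse_s3_url("s3://my-bucket/images/cat.png"))
print(parse_s3_url("https://my-bucket.s3.amazonaws.com/images/cat.png"))
```

The bucket and key returned here are the two values an S3 client call would typically need.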
get_file_from_url(file_source, timeout=30, session=None)
async
Asynchronously download and validate an image from a URL.
This function performs the following steps: 1. Attempts to download the content from the provided URL. 2. Validates that the downloaded content is a valid image. 3. Returns the binary data if both steps succeed.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| file_source | str | The URL of the file to download. Supports HTTP and HTTPS protocols. | required |
| timeout | int | The timeout for the HTTP request in seconds. Defaults to 30 seconds. | 30 |
| session | Optional[ClientSession] | An existing aiohttp session to use. If None, a new session will be created. Defaults to None. | None |
Returns:
| Type | Description |
|---|---|
| bytes \| None | The downloaded image binary data if successful and valid, None if the download fails or the content is not a valid image. |
get_image_metadata(image_binary)
Extract metadata from image binary data.
This function extracts metadata from the image, including: 1. GPS coordinates (latitude/longitude) if available in EXIF data
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| image_binary | bytes | The binary data of the image. | required |
Returns:
| Type | Description |
|---|---|
| dict[str, Any] | Dictionary containing image metadata. |
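EXIF stores GPS positions as degrees, minutes, and seconds plus a hemisphere reference (N/S/E/W), so producing latitude/longitude numbers usually involves a conversion like the one sketched below. The helper `dms_to_decimal` is hypothetical, and whether the library returns decimal degrees, or under which dictionary keys, is not stated here.

```python
def dms_to_decimal(dms: tuple[float, float, float], ref: str) -> float:
    """Convert (degrees, minutes, seconds) plus N/S/E/W to signed decimal degrees."""
    degrees, minutes, seconds = dms
    decimal = degrees + minutes / 60 + seconds / 3600
    # Southern and western hemispheres are negative by convention.
    return -decimal if ref.upper() in ("S", "W") else decimal


latitude = dms_to_decimal((40, 26, 46), "N")
longitude = dms_to_decimal((79, 58, 56), "W")
print(round(latitude, 6), round(longitude, 6))
```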
get_unique_non_empty_strings(texts)
Get the unique, non-empty strings from a list of strings, with whitespace removed.
This function takes a list of strings and returns only those that are not empty or whitespace-only, with duplicates removed.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| texts | list[str] | A list of strings to filter and deduplicate. | required |
Returns:
| Type | Description |
|---|---|
| list[str] | A list of strings where each string is not empty or whitespace-only. |
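One way to implement this kind of filtering is sketched below. The name `unique_non_empty_sketch` is hypothetical, and two behaviors are assumptions: surrounding whitespace is stripped before comparing, and first-seen order is preserved.

```python
def unique_non_empty_sketch(texts: list[str]) -> list[str]:
    """Strip each string, drop blanks, and remove duplicates (illustrative only)."""
    stripped = (text.strip() for text in texts)
    # dict.fromkeys keeps the first occurrence of each key in insertion order.
    return list(dict.fromkeys(text for text in stripped if text))


print(unique_non_empty_sketch([" a ", "b", "", "a", "  ", "b "]))
```

Using `dict.fromkeys` rather than `set` keeps the output order deterministic.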
is_binary_data_audio(audio_binary_data)
Check if the binary data is a valid audio file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| audio_binary_data | bytes | The binary data to check. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| bool | bool | True if the binary data is a valid audio file, False otherwise. |
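A common lightweight way to recognize audio bytes is to inspect the leading "magic" bytes of the data. The sketch below checks a few well-known signatures. The helper `looks_like_audio` is hypothetical and covers only a handful of formats; the library may use a different and more thorough detection method.

```python
# Leading byte signatures for a few containers that commonly carry audio.
_AUDIO_SIGNATURES = (
    b"fLaC",  # FLAC stream marker
    b"OggS",  # Ogg container (Vorbis, Opus)
    b"ID3",   # MP3 file that begins with an ID3v2 tag
)


def looks_like_audio(data: bytes) -> bool:
    """Return True if `data` starts with a recognized audio signature."""
    # WAV is a RIFF container whose form type at bytes 8-11 is "WAVE".
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return True
    return data.startswith(_AUDIO_SIGNATURES)


print(looks_like_audio(b"fLaC" + b"\x00" * 16))
print(looks_like_audio(b"%PDF-1.7"))
```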
is_youtube_url(source)
Check if the audio source is a YouTube URL.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| source | str | The audio source to check. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| bool | bool | True if the audio source is a YouTube URL, False otherwise. |
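A check like this is often implemented by parsing the URL and comparing its hostname against a known set. The following is a sketch under that assumption: `is_youtube_url_sketch` and the host list are hypothetical, and the real function may use a different rule, such as a regular expression.

```python
from urllib.parse import urlparse

# Hostnames treated as YouTube for this sketch (an assumed, non-exhaustive list).
_YOUTUBE_HOSTS = {
    "youtube.com",
    "www.youtube.com",
    "m.youtube.com",
    "music.youtube.com",
    "youtu.be",
}


def is_youtube_url_sketch(source: str) -> bool:
    """Return True if `source` is an HTTP(S) URL on a known YouTube host."""
    try:
        parsed = urlparse(source)
    except ValueError:
        return False
    # Matching on the parsed hostname avoids false positives from paths
    # that merely contain the text "youtube.com".
    return parsed.scheme in ("http", "https") and (parsed.hostname or "") in _YOUTUBE_HOSTS


print(is_youtube_url_sketch("https://www.youtube.com/watch?v=abc123"))
print(is_youtube_url_sketch("https://example.com/youtube.com"))
```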