
Utils

Utility functions for multimodal modules.

combine_strings(texts)

Combine multiple strings into a single string with newline separators.

This function takes a list of strings, filters out any that are empty or whitespace-only, and joins the remaining strings with newline characters so that each one is on its own line.

Parameters:

Name Type Description Default
texts list[str]

A list of strings to combine.

required

Returns:

Name Type Description
str str

A single string containing all valid strings, where each string is on a new line.
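
The behaviour described above can be approximated with a few lines of standard Python. This is a minimal sketch, not the library's implementation: the name `combine_strings_sketch` is hypothetical, and whether the real function also trims the strings it keeps is not stated here, so this version leaves them untouched.

```python
def combine_strings_sketch(texts: list[str]) -> str:
    """Join non-blank strings with newlines (illustrative stand-in)."""
    # Keep only entries that contain at least one non-whitespace character.
    kept = [text for text in texts if text.strip()]
    return "\n".join(kept)


combined = combine_strings_sketch(["first caption", "", "   ", "second caption"])
print(combined)
```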

convert_audio_to_mono_flac(input_audio_bytes)

Convert audio binary data to mono FLAC format.

This function standardizes the audio format to mono FLAC to simplify processing by having a consistent file extension and a single audio channel, while preserving the audio information.

This function performs two operations:

1. Converts the input audio to mono (single channel).
2. Encodes the audio in FLAC format, which is well suited to speech recognition.

FLAC (Free Lossless Audio Codec) is chosen because:

1. Lossless compression preserves audio quality for accurate transcription.
2. It is more bandwidth-efficient than uncompressed formats such as LINEAR16.
3. The FLAC stream header records the sample rate and bit depth (for example 16-bit or 24-bit), so these do not need to be specified separately.

Mono channel is chosen because:

1. Speech recognition models are optimized for single-channel audio.
2. It reduces bandwidth and processing overhead.
3. It simplifies processing by avoiding multi-channel complexity.
4. It ensures consistent results across different input formats.

Parameters:

Name Type Description Default
input_audio_bytes bytes

Input audio data in binary format.

required

Returns:

Name Type Description
bytes bytes

Audio data converted to mono FLAC format.
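
To illustrate the mono conversion step only, the sketch below averages the channels of a 16-bit PCM WAV file using the standard library. It is an assumption-laden illustration: the helper name `downmix_wav_to_mono` is hypothetical, the input is assumed to be WAV rather than an arbitrary audio format, and the FLAC encoding step is omitted because the Python standard library has no FLAC encoder. How the library itself performs the conversion is not documented here.

```python
import array
import io
import sys
import wave


def downmix_wav_to_mono(wav_bytes: bytes) -> bytes:
    """Average the channels of a 16-bit PCM WAV file into one channel."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as reader:
        channels = reader.getnchannels()
        rate = reader.getframerate()
        if reader.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        samples = array.array("h")
        samples.frombytes(reader.readframes(reader.getnframes()))

    # WAV data is little-endian; swap on big-endian hosts.
    if sys.byteorder == "big":
        samples.byteswap()

    # Samples are interleaved per frame (L, R, L, R, ...); average each frame.
    mono = array.array(
        "h",
        (
            sum(samples[i:i + channels]) // channels
            for i in range(0, len(samples), channels)
        ),
    )
    if sys.byteorder == "big":
        mono.byteswap()

    out = io.BytesIO()
    with wave.open(out, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(2)
        writer.setframerate(rate)
        writer.writeframes(mono.tobytes())
    return out.getvalue()
```

Averaging is only one possible downmix strategy; it keeps the result within the 16-bit range without clipping.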

get_audio_duration(audio_binary_data)

Get the duration of the audio in seconds.

Parameters:

Name Type Description Default
audio_binary_data bytes

The binary data of the audio.

required

Returns:

Name Type Description
float float

The duration of the audio in seconds.
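
For uncompressed WAV input, the duration is the frame count divided by the sample rate. The sketch below shows that calculation with the standard `wave` module. The name `wav_duration_seconds` is hypothetical, and the library function presumably handles other formats as well, which this sketch does not.

```python
import io
import wave


def wav_duration_seconds(audio_binary_data: bytes) -> float:
    """Return the duration of in-memory WAV data in seconds."""
    with wave.open(io.BytesIO(audio_binary_data), "rb") as reader:
        # duration = number of frames / frames per second
        return reader.getnframes() / reader.getframerate()
```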

get_audio_from_base64(audio_source)

Attempt to decode a base64 encoded audio string and verify if it's valid audio data.

Parameters:

Name Type Description Default
audio_source str

The potential base64 encoded audio string to decode.

required

Returns:

Type Description
bytes | None

bytes | None: The decoded audio data if successful and valid, None otherwise.
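
The decode-then-validate pattern described above might look like the following sketch. The function name `decode_base64_audio` and its `is_audio` parameter are hypothetical; the validator is passed in so that the sketch does not assume how the library checks audio content.

```python
import base64
import binascii
from typing import Callable, Optional


def decode_base64_audio(
    audio_source: str, is_audio: Callable[[bytes], bool]
) -> Optional[bytes]:
    """Decode strict base64 and return the bytes only if they validate."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        # instead of silently discarding them.
        decoded = base64.b64decode(audio_source, validate=True)
    except (binascii.Error, ValueError):
        return None
    return decoded if is_audio(decoded) else None


# Usage with a deliberately simple validator that only recognizes WAV headers.
encoded = base64.b64encode(b"RIFF\x00\x00\x00\x00WAVE").decode("ascii")
result = decode_base64_audio(encoded, lambda data: data[:4] == b"RIFF")
```

Returning `None` for both decoding failures and validation failures mirrors the return contract documented above.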

get_audio_from_downloadable_url(audio_source, timeout=1 * 60)

Get the audio from a downloadable URL and return its binary data if valid.

This function attempts to download audio content from a downloadable URL (e.g. Google Drive, OneDrive) and validates that the downloaded content is audio data.

Parameters:

Name Type Description Default
audio_source str

The downloadable URL of the audio file.

required
timeout int

The timeout for the HTTP request in seconds. Defaults to 1 minute.

1 * 60

Returns:

Type Description
bytes | None

bytes | None: Binary data of the audio file if valid audio content is downloaded, None if the request fails or content is not valid audio.

get_audio_from_file_path(audio_source)

Read audio file and return its binary data if valid.

Parameters:

Name Type Description Default
audio_source str

Path to the audio file.

required

Returns:

Type Description
bytes | None

bytes | None: Binary data of the audio file if valid, None otherwise.

get_audio_from_youtube_url(audio_source, proxy=None)

Extract audio from a YouTube video URL and return it as binary data.

This function downloads a YouTube video and extracts its audio track in M4A format. The audio is stored in memory and validated before being returned.

Parameters:

Name Type Description Default
audio_source str

The YouTube video URL to extract audio from.

required
proxy str | None

The proxy URL to use for the YouTube request. Defaults to None.

None

Returns:

Type Description
bytes | None

bytes | None: Binary audio data if successfully downloaded and valid, None if download fails or audio is invalid.

get_file_from_file_path(source)

Read image file and return its binary data if valid.

Parameters:

Name Type Description Default
source str

Path to the image file.

required

Returns:

Type Description
bytes | None

bytes | None: Binary data of the image file if valid, None otherwise.

get_file_from_gdrive(gdrive_source)

Download a file from Google Drive given a file ID or URL.

Parameters:

Name Type Description Default
gdrive_source str

Google Drive file ID or full URL.

required

Returns:

Type Description
bytes | None

bytes | None: The file's binary content if successful, None otherwise.
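
Because the argument may be either a bare file ID or a full URL, a caller-side normalization step could look like the sketch below. The helper `extract_gdrive_file_id` is hypothetical, the two URL shapes it recognizes (`/file/d/<id>/...` and `?id=<id>`) are common Google Drive link forms rather than a list taken from this library, and the ID used in the example is a made-up placeholder.

```python
import re

# Common Google Drive link shapes; an assumption, not an exhaustive list.
_GDRIVE_ID_PATTERNS = (
    re.compile(r"/file/d/([A-Za-z0-9_-]+)"),
    re.compile(r"[?&]id=([A-Za-z0-9_-]+)"),
)


def extract_gdrive_file_id(gdrive_source: str) -> str:
    """Return the file ID from a Drive URL, or the input if no pattern matches."""
    for pattern in _GDRIVE_ID_PATTERNS:
        match = pattern.search(gdrive_source)
        if match:
            return match.group(1)
    # No URL pattern matched: treat the input as a bare file ID.
    return gdrive_source


file_id = extract_gdrive_file_id(
    "https://drive.google.com/file/d/1AbC_dEf-123/view?usp=sharing"
)
```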

get_file_from_s3(url)

Get file from S3 bucket.

This function attempts to get a file from an S3 bucket using:

1. Default credentials (from AWS CLI or instance profile).
2. Environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY).
3. Session token if available.

Parameters:

Name Type Description Default
url str

The S3 URL to get the file from. Can be in the format: 1. s3://bucket/key. 2. https://bucket.s3.amazonaws.com/key.

required

Returns:

Type Description
bytes | None

bytes | None: The file contents if successful, None otherwise.

Raises:

Type Description
ValueError

If AWS credentials are not found or invalid.
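
The two accepted URL formats both reduce to a bucket name and an object key. The sketch below shows one way to split them with `urllib.parse`. The helper `split_s3_url` is hypothetical and only covers the two formats listed above; it does not contact AWS or reflect how the library parses URLs internally.

```python
from urllib.parse import urlparse


def split_s3_url(url: str) -> tuple[str, str]:
    """Split an S3 URL into (bucket, key) for the two documented formats."""
    parsed = urlparse(url)
    key = parsed.path.lstrip("/")

    # Format 1: s3://bucket/key
    if parsed.scheme == "s3":
        return parsed.netloc, key

    # Format 2: https://bucket.s3.amazonaws.com/key
    host = parsed.hostname or ""
    suffix = ".s3.amazonaws.com"
    if parsed.scheme == "https" and host.endswith(suffix):
        return host[: -len(suffix)], key

    raise ValueError(f"unrecognized S3 URL: {url!r}")


bucket, key = split_s3_url("s3://my-bucket/images/cat.png")
```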

get_file_from_url(file_source, timeout=30, session=None) async

Asynchronously download and validate an image from a URL.

This function performs the following steps:

1. Attempts to download the content from the provided URL.
2. Validates that the downloaded content is a valid image.
3. Returns the binary data if both steps succeed.

Parameters:

Name Type Description Default
file_source str

The URL of the file to download. Supports HTTP and HTTPS protocols.

required
timeout int

The timeout for the HTTP request in seconds. Defaults to 30 seconds.

30
session Optional[ClientSession]

An existing aiohttp session to use. If None, a new session will be created. Defaults to None.

None

Returns:

Type Description
bytes | None

bytes | None: The downloaded image binary data if successful and valid, None if the download fails or the content is not a valid image.

get_image_metadata(image_binary)

Extract metadata from image binary data.

This function extracts metadata from the image, including GPS coordinates (latitude/longitude) when they are available in the EXIF data.

Parameters:

Name Type Description Default
image_binary bytes

The binary data of the image.

required

Returns:

Type Description
dict[str, Any]

dict[str, Any]: Dictionary containing image metadata.
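
EXIF stores each GPS coordinate as degrees, minutes, and seconds plus a hemisphere reference (N/S for latitude, E/W for longitude). Converting that to a signed decimal value is simple arithmetic, sketched below. The helper `dms_to_decimal` is hypothetical, and the exact keys and value format of the dictionary returned by `get_image_metadata` are not specified here.

```python
def dms_to_decimal(degrees: float, minutes: float, seconds: float, ref: str) -> float:
    """Convert degrees/minutes/seconds plus a hemisphere letter to decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # Southern and western hemispheres are negative in decimal notation.
    return -value if ref in ("S", "W") else value


latitude = dms_to_decimal(40, 26, 46, "N")
longitude = dms_to_decimal(79, 58, 56, "W")
```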

get_unique_non_empty_strings(texts)

Get unique non-empty strings from a list of strings and remove whitespace.

This function takes a list of strings, drops any entries that are empty or whitespace-only, and removes duplicates from the result.

Parameters:

Name Type Description Default
texts list[str]

A list of strings to filter and deduplicate.

required

Returns:

Type Description
list[str]

list[str]: A list of strings where each string is not empty or whitespace-only.
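
A minimal sketch of this filtering is shown below. It rests on two assumptions that the description does not confirm: that surrounding whitespace is stripped before duplicates are compared, and that the first-seen order is preserved. The name `unique_non_empty_sketch` is hypothetical.

```python
def unique_non_empty_sketch(texts: list[str]) -> list[str]:
    """Strip, drop blanks, and deduplicate while keeping first-seen order."""
    stripped = (text.strip() for text in texts)
    # dict.fromkeys keeps insertion order, so it deduplicates without reordering.
    return list(dict.fromkeys(text for text in stripped if text))


unique = unique_non_empty_sketch([" cat ", "cat", "", "   ", "dog", "cat"])
```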

is_binary_data_audio(audio_binary_data)

Check if the binary data is a valid audio file.

Parameters:

Name Type Description Default
audio_binary_data bytes

The binary data to check.

required

Returns:

Name Type Description
bool bool

True if the binary data is a valid audio file, False otherwise.
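
One common way to perform this kind of check is to inspect the leading "magic bytes" of the data. The sketch below does so for a handful of container formats. It is an assumption about the general technique, not a description of how `is_binary_data_audio` works, and the function name `looks_like_audio` is hypothetical. The list of signatures is not exhaustive.

```python
def looks_like_audio(data: bytes) -> bool:
    """Heuristically detect a few audio containers from their leading bytes."""
    # WAV: "RIFF" chunk header followed by the "WAVE" form type at offset 8.
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return True
    # FLAC and Ogg streams begin with fixed four-byte markers.
    if data[:4] in (b"fLaC", b"OggS"):
        return True
    # MP3 files frequently begin with an ID3v2 tag.
    if data[:3] == b"ID3":
        return True
    # MP4-family files (including M4A) carry "ftyp" at offset 4.
    # Note that this also matches MP4 video, so it is only a rough filter.
    if data[4:8] == b"ftyp":
        return True
    return False
```

A signature check is cheap but shallow: it confirms the header, not that the rest of the file decodes correctly.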

is_youtube_url(source)

Check if the audio source is a YouTube URL.

Parameters:

Name Type Description Default
source str

The audio source to check.

required

Returns:

Name Type Description
bool bool

True if the audio source is a YouTube URL, False otherwise.
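
A host-based check is one plausible way to implement this test. The sketch below parses the source as a URL and compares its hostname against a small set of YouTube domains. Both the function name `is_youtube_url_sketch` and the host list are assumptions made for illustration; the library may recognize a different set of URL shapes.

```python
from urllib.parse import urlparse

# Hosts treated as YouTube in this sketch; an illustrative, non-exhaustive set.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def is_youtube_url_sketch(source: str) -> bool:
    """Return True if the source parses as an HTTP(S) URL on a YouTube host."""
    try:
        parsed = urlparse(source)
        host = parsed.hostname or ""
    except ValueError:
        # urlparse raises ValueError for some malformed inputs.
        return False
    return parsed.scheme in ("http", "https") and host in YOUTUBE_HOSTS
```

Comparing the parsed hostname, rather than searching the whole string, avoids false positives such as `https://example.com/youtube.com`.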