Utils

Utility functions for multimodal modules.

`combine_strings(texts)`

Combine multiple strings into a single string with newline separators.

This function takes a list of strings and returns a single string where each string from the list is on a new line. It filters out any empty or whitespace-only strings from the list before joining them.

Parameters:

Name	Type	Description	Default
`texts`	`list[str]`	A list of strings to combine.	required

Returns:

Name	Type	Description
`str`	`str`	A single string containing all valid strings, where each string is on a new line.
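A minimal sketch of the behavior described above. It assumes each string is kept as-is (not stripped) and that only empty or whitespace-only entries are dropped; `combine_strings_sketch` is a hypothetical name, not the module's implementation.

```python
def combine_strings_sketch(texts: list[str]) -> str:
    """Join non-blank strings with newlines (hypothetical re-implementation)."""
    # str.strip() returns "" for empty and whitespace-only strings, and "" is falsy.
    return "\n".join(text for text in texts if text.strip())
```

For example, `combine_strings_sketch(["first", "", "   ", "second"])` returns `"first\nsecond"`.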

`configure_gst_plugin_path()`

Configures GST_PLUGIN_PATH if running in a Conda environment or virtual environment.

This function checks if GST_PLUGIN_PATH is not already set. If it is not set, it attempts to locate the GStreamer plugins directory within the current environment (checking CONDA_PREFIX and sys.prefix) and sets the GST_PLUGIN_PATH environment variable accordingly.

This ensures that GStreamer plugins installed in the environment are correctly found.
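The lookup order described above can be sketched as follows. The `<prefix>/lib/gstreamer-1.0` directory layout is an assumption about where environment-installed GStreamer 1.x plugins live, and `configure_gst_plugin_path_sketch` is a hypothetical name; the real function may probe other paths and may not return a value.

```python
import os
import sys
from pathlib import Path
from typing import Optional


def configure_gst_plugin_path_sketch() -> Optional[str]:
    """Point GST_PLUGIN_PATH at the environment's plugin directory (hypothetical sketch)."""
    # Respect a value that is already set.
    existing = os.environ.get("GST_PLUGIN_PATH")
    if existing:
        return existing

    # Check the active Conda environment first, then the interpreter prefix (venv or Conda).
    for prefix in (os.environ.get("CONDA_PREFIX"), sys.prefix):
        if not prefix:
            continue
        # Assumed layout: <prefix>/lib/gstreamer-1.0
        candidate = Path(prefix) / "lib" / "gstreamer-1.0"
        if candidate.is_dir():
            os.environ["GST_PLUGIN_PATH"] = str(candidate)
            return str(candidate)

    # Nothing found: leave the environment untouched.
    return None
```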

`convert_audio_to_mono_flac(input_audio_bytes)`

Convert audio binary data to mono FLAC format.

This function standardizes the audio format to mono FLAC to simplify processing by having a consistent file extension and a single audio channel, while preserving the audio information.

This function performs two operations:
1. Converts the input audio to mono (single channel).
2. Encodes the audio in FLAC format for optimal speech recognition.

FLAC (Free Lossless Audio Codec) is chosen because:
1. Lossless compression preserves audio quality for accurate transcription.
2. It is more bandwidth-efficient than uncompressed formats such as LINEAR16.
3. The FLAC header records the bit depth (16/24-bit) and sample rate, so the sample rate does not need to be specified separately.

Mono channel is chosen because:
1. Speech recognition models are optimized for single-channel audio.
2. It reduces bandwidth and processing overhead.
3. It simplifies processing by avoiding multi-channel complexity.
4. It ensures consistent results across different input formats.

Parameters:

Name	Type	Description	Default
`input_audio_bytes`	`bytes`	Input audio data in binary format.	required

Returns:

Name	Type	Description
`bytes`	`bytes`	Audio data converted to mono FLAC format.
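The mono step can be illustrated with the standard library alone. This sketch assumes uncompressed 16-bit PCM WAV input and stops at a mono WAV: it averages the interleaved channel samples of each frame into one sample. FLAC encoding needs an external encoder (for example ffmpeg or libsndfile) and is not shown. `downmix_wav_to_mono` is a hypothetical helper, not part of this module.

```python
import array
import io
import sys
import wave


def downmix_wav_to_mono(wav_bytes: bytes) -> bytes:
    """Average the channels of 16-bit PCM WAV data into one channel (hypothetical helper)."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        if src.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        n_channels = src.getnchannels()
        frame_rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    # WAV samples are little-endian and interleaved: L0 R0 L1 R1 ...
    samples = array.array("h", frames)
    if sys.byteorder == "big":
        samples.byteswap()

    # Average each group of n_channels samples (one frame) into a single mono sample.
    mono = array.array(
        "h",
        (
            sum(samples[i:i + n_channels]) // n_channels
            for i in range(0, len(samples), n_channels)
        ),
    )
    if sys.byteorder == "big":
        mono.byteswap()

    # Write the mono samples back out as a WAV container.
    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(frame_rate)
        dst.writeframes(mono.tobytes())
    return out.getvalue()
```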

`extract_video_frame_at_timestamp(video_path, timestamp, output_format='PNG')`

Extract a single frame from a video at a specific timestamp.

This function extracts a frame from a video file at the specified timestamp and returns it as raw image bytes in the specified format.

Parameters:

Name	Type	Description	Default
`video_path`	`str`	Path to the video file.	required
`timestamp`	`float`	Time offset in seconds from which to extract the frame.	required
`output_format`	`str`	Image format for the output (PNG, JPEG, etc.). Defaults to "PNG".	`'PNG'`

Returns:

Name	Type	Description
`bytes`	`bytes`	Raw image bytes in the specified format.

Raises:

Type	Description
`FileNotFoundError`	If the video file doesn't exist.
`ValueError`	If timestamp is negative or frame extraction fails.
`ImportError`	If required libraries (cv2/opencv-python) are not installed.

Examples:

>>> frame_bytes = extract_video_frame_at_timestamp("video.mp4", 5.5)
>>> frame_bytes = extract_video_frame_at_timestamp("video.mp4", 10.0, "JPEG")
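One common way to do this with OpenCV is to seek by milliseconds, decode one frame, and re-encode it with `cv2.imencode`. The sketch below assumes that approach and uses the hypothetical name `extract_frame_sketch`; it follows the documented error behavior. Depending on the container and codec, time-based seeking may land on a nearby frame rather than the exact timestamp.

```python
import os


def extract_frame_sketch(video_path: str, timestamp: float, output_format: str = "PNG") -> bytes:
    """Grab the frame at `timestamp` seconds and encode it (hypothetical OpenCV-based sketch)."""
    if timestamp < 0:
        raise ValueError("timestamp must be non-negative")
    if not os.path.exists(video_path):
        raise FileNotFoundError(video_path)

    try:
        import cv2  # provided by the opencv-python package
    except ImportError as exc:
        raise ImportError("opencv-python is required to extract frames") from exc

    capture = cv2.VideoCapture(video_path)
    try:
        # CAP_PROP_POS_MSEC expects milliseconds.
        capture.set(cv2.CAP_PROP_POS_MSEC, timestamp * 1000)
        ok, frame = capture.read()
    finally:
        capture.release()
    if not ok or frame is None:
        raise ValueError(f"could not read a frame at {timestamp}s")

    # imencode chooses the codec from the extension, e.g. ".png" or ".jpeg".
    ok, buffer = cv2.imencode("." + output_format.lower(), frame)
    if not ok:
        raise ValueError(f"could not encode the frame as {output_format}")
    return buffer.tobytes()
```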

`get_audio_duration(audio_binary_data)`

Get the duration of the audio in seconds.

Parameters:

Name	Type	Description	Default
`audio_binary_data`	`bytes`	The binary data of the audio.	required

Returns:

Name	Type	Description
`float`	`float`	The duration of the audio in seconds.
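For uncompressed audio, the duration is the number of frames divided by the sample rate (frames per second). The sketch below covers only WAV input via the standard `wave` module; compressed formats need a decoder. `wav_duration_seconds` is a hypothetical helper, not this module's implementation.

```python
import io
import wave


def wav_duration_seconds(audio_binary_data: bytes) -> float:
    """Duration of WAV data = frame count / frames per second (hypothetical helper)."""
    with wave.open(io.BytesIO(audio_binary_data), "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```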

`get_audio_from_base64(audio_source)`

Attempt to decode a base64-encoded audio string and verify that it is valid audio data.

Parameters:

Name	Type	Description	Default
`audio_source`	`str`	The string to decode, which may contain base64-encoded audio.	required

Returns:

Type	Description
`bytes | None`	The decoded audio data if decoding succeeds and the result is valid audio, None otherwise.
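A minimal sketch of the decode-then-validate flow. The audio check is passed in as a callable because the module's own validator is not shown here; `decode_base64_audio` and its `is_audio` parameter are hypothetical.

```python
import base64
from typing import Callable, Optional


def decode_base64_audio(audio_source: str, is_audio: Callable[[bytes], bool]) -> Optional[bytes]:
    """Decode base64 text and keep the result only if `is_audio` accepts it (hypothetical sketch)."""
    try:
        # validate=True rejects characters outside the base64 alphabet instead of skipping them.
        decoded = base64.b64decode(audio_source, validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return None
    return decoded if is_audio(decoded) else None
```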

`get_audio_from_downloadable_url(audio_source, timeout=1 * 60)`

Get the audio from a downloadable URL and return its binary data if valid.

This function attempts to download audio content from a downloadable URL (e.g. Google Drive, OneDrive) and validates that the downloaded content is audio data.

Parameters:

Name	Type	Description	Default
`audio_source`	`str`	The downloadable URL of the audio file.	required
`timeout`	`int`	The timeout for the HTTP request in seconds. Defaults to 1 minute.	`1 * 60`

Returns:

Type	Description
`bytes | None`	Binary data of the audio file if valid audio content is downloaded, None if the request fails or the content is not valid audio.
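The download-then-validate pattern can be sketched with the standard library's `urllib`; the actual function may use a different HTTP client. Share links from services such as Google Drive or OneDrive often point to an HTML preview page, so a real implementation may first need to rewrite them into direct-download URLs, which this sketch does not do. `download_audio_sketch` and its `is_audio` parameter are hypothetical.

```python
import urllib.request
from typing import Callable, Optional


def download_audio_sketch(
    url: str, is_audio: Callable[[bytes], bool], timeout: int = 60
) -> Optional[bytes]:
    """Fetch `url` and return the body only if it looks like audio (hypothetical sketch)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read()
    except (ValueError, OSError):
        # ValueError: malformed URL. OSError covers URLError, HTTPError and timeouts.
        return None
    return body if is_audio(body) else None
```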

`get_audio_from_file_path(audio_source)`

Read an audio file and return its binary data if valid.

Parameters:

Name	Type	Description	Default
`audio_source`	`str`	Path to the audio file.	required

Returns:

Type	Description
`bytes | None`	Binary data of the audio file if valid, None otherwise.

`get_audio_from_youtube_url(audio_source, proxy=None)`

Extract audio from a YouTube video URL and return it as binary data.

This function downloads a YouTube video and extracts its audio track in MP3 format. The audio is stored in memory and validated before being returned.

Parameters:

Name	Type	Description	Default
`audio_source`	`str`	The YouTube video URL to extract audio from.	required
`proxy`	`str | None`	The proxy URL to use for the YouTube request. Defaults to None.	`None`

Returns:

Type	Description
`bytes | None`	Binary audio data if successfully downloaded and valid, None if the download fails or the audio is invalid.

`get_file_from_file_path(source)`

Read an image file and return its binary data if valid.

Parameters:

Name	Type	Description	Default
`source`	`str`	Path to the image file.	required

Returns:

Type	Description
`bytes | None`	Binary data of the image file if valid, None otherwise.

`get_file_from_gdrive(gdrive_source)`

Download a file from Google Drive given a file ID or URL.

Parameters:

Name	Type	Description	Default
`gdrive_source`	`str`	Google Drive file ID or full URL.	required

Returns:

Type	Description
`bytes | None`	The file's binary content if successful, None otherwise.
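Accepting either a file ID or a full URL implies normalizing the input to a file ID first. This sketch extracts the ID from `/file/d/<id>/...` share links or `?id=<id>` links and builds a `https://drive.google.com/uc?export=download&id=<id>` URL; that endpoint is a commonly used direct-download pattern, assumed here rather than taken from this module, and large files may require an extra confirmation step that the sketch does not handle. `gdrive_download_url` is hypothetical.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Drive file IDs are URL-safe tokens made of letters, digits, "-" and "_".
_ID_PATTERN = re.compile(r"^[A-Za-z0-9_-]+$")
_PATH_PATTERN = re.compile(r"/(?:file/)?d/([A-Za-z0-9_-]+)")


def gdrive_download_url(gdrive_source: str) -> Optional[str]:
    """Turn a Drive file ID or share URL into a direct-download URL (hypothetical helper)."""
    source = gdrive_source.strip()
    parsed = urlparse(source)
    if not parsed.scheme:
        # No scheme: treat the input as a bare file ID.
        file_id = source if _ID_PATTERN.match(source) else None
    else:
        # Share links look like .../file/d/<id>/view; older links use ?id=<id>.
        match = _PATH_PATTERN.search(parsed.path)
        if match:
            file_id = match.group(1)
        else:
            file_id = parse_qs(parsed.query).get("id", [None])[0]
    if not file_id:
        return None
    return f"https://drive.google.com/uc?export=download&id={file_id}"
```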

`get_file_from_s3(url)`

Get a file from an S3 bucket.

This function attempts to get a file from an S3 bucket using:
1. Default credentials (from the AWS CLI or an instance profile).
2. Environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY).
3. A session token, if available.

Parameters:

Name	Type	Description	Default
`url`	`str`	The S3 URL to get the file from, in either the `s3://bucket/key` or the `https://bucket.s3.amazonaws.com/key` format.	required

Returns:

Type	Description
`bytes | None`	The file contents if successful, None otherwise.

Raises:

Type	Description
`ValueError`	If AWS credentials are not found or invalid.
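Both documented URL formats reduce to a (bucket, key) pair. The sketch below parses them and, assuming boto3 is the client, fetches the object with `get_object`; boto3's default credential chain already checks environment variables (including `AWS_SESSION_TOKEN`), the shared AWS CLI configuration, and instance profiles. `parse_s3_url` and `get_s3_object_bytes` are hypothetical names.

```python
from typing import Tuple
from urllib.parse import unquote, urlparse


def parse_s3_url(url: str) -> Tuple[str, str]:
    """Split an S3 URL into (bucket, key) for the two documented formats (hypothetical helper)."""
    parsed = urlparse(url)
    host = parsed.netloc
    if parsed.scheme == "s3":
        # s3://bucket/key: the bucket is the netloc and the key is the path.
        bucket, key = host, parsed.path.lstrip("/")
    elif parsed.scheme == "https" and host.endswith(".amazonaws.com") and ".s3." in host:
        # Virtual-hosted style: bucket.s3.amazonaws.com or bucket.s3.<region>.amazonaws.com.
        bucket = host.rsplit(".s3.", 1)[0]
        # Keys in HTTPS URLs are percent-encoded (e.g. %20 for a space).
        key = unquote(parsed.path.lstrip("/"))
    else:
        raise ValueError(f"Unsupported S3 URL: {url}")
    if not bucket or not key:
        raise ValueError(f"S3 URL must include a bucket and a key: {url}")
    return bucket, key


def get_s3_object_bytes(url: str) -> bytes:
    """Fetch the object with boto3, which resolves credentials via its default chain."""
    import boto3  # imported lazily so parse_s3_url works without boto3 installed

    bucket, key = parse_s3_url(url)
    response = boto3.client("s3").get_object(Bucket=bucket, Key=key)
    return response["Body"].read()
```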

`get_file_from_url(file_source, timeout=30, session=None)` `async`

Asynchronously download and validate an image from a URL.

This function performs the following steps:
1. Attempts to download the content from the provided URL.
2. Validates that the downloaded content is a valid image.
3. Returns the binary data if both steps succeed.

Parameters:

Name	Type	Description	Default
`file_source`	`str`	The URL of the file to download. Supports HTTP and HTTPS protocols.	required
`timeout`	`int`	The timeout for the HTTP request in seconds. Defaults to 30 seconds.	`30`
`session`	`Optional[ClientSession]`	An existing aiohttp session to use. If None, a new session will be created. Defaults to None.	`None`

Returns:

Type	Description
`bytes | None`	The downloaded image binary data if successful and valid, None if the download fails or the content is not a valid image.

`get_image_metadata(image_binary)`

Extract metadata from image binary data.

This function extracts metadata from the image, including GPS coordinates (latitude/longitude) when they are available in the EXIF data.

Parameters:

Name	Type	Description	Default
`image_binary`	`bytes`	The binary data of the image.	required

Returns:

Type	Description
`dict[str, Any]`	Dictionary containing image metadata.
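EXIF stores GPS latitude and longitude as three rationals (degrees, minutes, seconds) plus separate reference tags (`N`/`S`, `E`/`W`). Converting them to signed decimal degrees uses d + m/60 + s/3600, negated for south and west. Whether this module returns decimal degrees is an assumption; `dms_to_decimal` is a hypothetical helper.

```python
from fractions import Fraction
from typing import Sequence, Union

Number = Union[int, float, Fraction]


def dms_to_decimal(dms: Sequence[Number], ref: str) -> float:
    """Convert EXIF (degrees, minutes, seconds) plus an N/S/E/W ref to decimal degrees."""
    degrees, minutes, seconds = (float(part) for part in dms)
    value = degrees + minutes / 60 + seconds / 3600
    # South latitudes and west longitudes are negative in decimal notation.
    return -value if ref.upper() in ("S", "W") else value
```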

`get_unique_non_empty_strings(texts)`

Get unique non-empty strings from a list of strings and remove whitespace.

This function takes a list of strings and returns a list of strings where each string from the list is not empty or whitespace-only. It also removes duplicates.

Parameters:

Name	Type	Description	Default
`texts`	`list[str]`	A list of strings to filter.	required

Returns:

Type	Description
`list[str]`	A list of unique strings, none of which is empty or whitespace-only.
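A minimal sketch, assuming strings are compared as-is (not stripped) and that the first occurrence order should be kept; both are assumptions, and `get_unique_non_empty_strings_sketch` is a hypothetical name.

```python
def get_unique_non_empty_strings_sketch(texts: list[str]) -> list[str]:
    """Drop blank entries and duplicates, keeping first-seen order (hypothetical sketch)."""
    # dict keys keep insertion order (Python 3.7+), so this deduplicates stably.
    return list(dict.fromkeys(text for text in texts if text.strip()))
```

Using a dict instead of a set keeps the output order deterministic, which matters if the result is later joined or displayed.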

`is_binary_data_audio(audio_binary_data)`

Check if the binary data is a valid audio file.

Parameters:

Name	Type	Description	Default
`audio_binary_data`	`bytes`	The binary data to check.	required

Returns:

Name	Type	Description
`bool`	`bool`	True if the binary data is a valid audio file, False otherwise.
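One lightweight way to check for audio is to inspect well-known container signatures ("magic numbers"). This is a heuristic, not necessarily how this module validates audio: it only looks at headers, and an Ogg stream could carry video. `looks_like_audio` is hypothetical.

```python
def looks_like_audio(data: bytes) -> bool:
    """Heuristic container check via magic numbers (hypothetical; not the module's validator)."""
    if len(data) < 12:
        return False
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return True  # WAV
    if data[:4] in (b"fLaC", b"OggS"):
        return True  # FLAC, Ogg (Vorbis/Opus)
    if data[:3] == b"ID3":
        return True  # MP3 with an ID3v2 tag
    # Bare MPEG/ADTS audio frame: the first 11 bits are the sync word (all ones).
    return data[0] == 0xFF and (data[1] & 0xE0) == 0xE0
```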

`is_youtube_url(source)`

Check if the audio source is a YouTube URL.

Parameters:

Name	Type	Description	Default
`source`	`str`	The audio source to check.	required

Returns:

Name	Type	Description
`bool`	`bool`	True if the audio source is a YouTube URL, False otherwise.
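A sketch that compares the parsed hostname against a set of YouTube hosts. Parsing the host avoids substring false positives such as `notyoutube.com`. The exact host list is an assumption, and `is_youtube_url_sketch` is a hypothetical name.

```python
from urllib.parse import urlparse

# Hosts treated as YouTube in this sketch; the real function may accept a different set.
_YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "music.youtube.com", "youtu.be"}


def is_youtube_url_sketch(source: str) -> bool:
    """Return True when `source` is an http(s) URL on a known YouTube host (hypothetical)."""
    parsed = urlparse(source.strip())
    host = (parsed.hostname or "").lower()
    return parsed.scheme in ("http", "https") and host in _YOUTUBE_HOSTS
```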
