# Compressor

Modules used to compress prompt components.
```python
LLMLinguaCompressor(
    model_name='NousResearch/Llama-2-7b-hf',
    device_map='cuda',
    rate=0.5,
    target_token=-1,
    use_sentence_level_filter=False,
    use_context_level_filter=True,
    use_token_level_filter=True,
    rank_method='longllmlingua',
)
```

Bases: `BaseCompressor`
`LLMLinguaCompressor` is a wrapper around LongLLMLingua's `PromptCompressor`. It provides a simplified interface to LongLLMLingua's compression capabilities within the GLLM series of libraries, with a focus on the `'longllmlingua'` ranking method.
Initialize the LLMLinguaCompressor.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `model_name` | `str` | The name of the language model to be used. | `'NousResearch/Llama-2-7b-hf'` |
| `device_map` | `str` | The device to load the model onto, e.g. `"cuda"` for GPU. | `'cuda'` |
| `rate` | `float` | The default compression rate to be used. | `0.5` |
| `target_token` | `int` | The default target token count; `-1` means no specific target. | `-1` |
| `use_sentence_level_filter` | `bool` | Whether to use sentence-level filtering. | `False` |
| `use_context_level_filter` | `bool` | Whether to use context-level filtering. | `True` |
| `use_token_level_filter` | `bool` | Whether to use token-level filtering. | `True` |
| `rank_method` | `str` | The ranking method to use; `"longllmlingua"` is recommended. | `'longllmlingua'` |
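To make the `rate` / `target_token` interaction concrete, here is a minimal, self-contained sketch of how a compressor could turn those two settings into a token budget. The `resolve_token_budget` helper is hypothetical (it is not part of this class's API); it only illustrates the documented convention that `target_token=-1` means "no explicit target, fall back to `rate`":

```python
def resolve_token_budget(prompt_tokens: int, rate: float = 0.5,
                         target_token: int = -1) -> int:
    """Return how many tokens to keep after compression.

    Hypothetical helper mirroring the defaults documented above:
    a target_token of -1 means no explicit target, so the budget
    falls back to rate * original length.
    """
    if target_token != -1:
        # An explicit target wins; never keep more tokens than we started with.
        return min(target_token, prompt_tokens)
    # No explicit target: apply the compression rate, keeping at least 1 token.
    return max(1, int(prompt_tokens * rate))


# With the documented defaults, a 1000-token prompt keeps 500 tokens.
print(resolve_token_budget(1000))                    # -> 500
# An explicit target overrides the rate.
print(resolve_token_budget(1000, target_token=200))  # -> 200
```

The same precedence (explicit token target over fractional rate) is how LongLLMLingua's own `compress_prompt` parameters are typically described, but the exact behavior of this wrapper should be confirmed against its source.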