Smart search retriever

This module defines a retriever that uses the SmartSearch SDK to run web searches and return relevant content from the web.

`SmartSearchWebRetriever(base_url=None, token=None)`

Bases: BaseRetriever[list[Chunk]]

A web search retriever backed by the SmartSearch SDK.

It supports multiple search modes, including web search, URL retrieval, page fetching, and content extraction.

Examples:

# Initialize the retriever
retriever = SmartSearchWebRetriever(
    base_url="https://your-smartsearch-endpoint",
    token="your-access-token"
)

# Perform a basic web search
results = await retriever.retrieve(
    "What is cloud computing?",
    top_k=5,
    result_type="snippets"
)

# Search with site filter
results = await retriever.retrieve(
    "machine learning frameworks",
    site="https://github.com",
    top_k=5
)

# Batch search
batch_results = await retriever.retrieve(
    ["query 1", "query 2"],
    top_k=5
)

Attributes:

Name	Type	Description
`client`	`WebSearchClient`	The SmartSearch web search client.
`base_url`	`str`	The base URL for the SmartSearch API.

Note

This class uses an async factory pattern. Use the create() class method to instantiate and authenticate in one step:

retriever = await SmartSearchWebRetriever.create(
    base_url="https://api.example.com",
    token="your-token"
)

Initialize the SmartSearchWebRetriever.

Note

This constructor does not authenticate. For automatic authentication, use the create() class method instead.

Parameters:

Name	Type	Description	Default
`base_url`	`str \| None`	The base URL for the SmartSearch API. If not provided, will use SMART_SEARCH_BASE_URL environment variable.	`None`
`token`	`str \| None`	The authentication token for the SmartSearch API. If not provided, will use SMART_SEARCH_TOKEN environment variable.	`None`

Raises:

Type	Description
`ValueError`	If base_url or token is not provided and environment variables are not set.
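The fallback order described in the parameter table (explicit argument first, then environment variable, otherwise `ValueError`) can be sketched as below. This is a minimal illustration and not the library's implementation: `resolve_credentials` is a hypothetical helper, and only the environment variable names `SMART_SEARCH_BASE_URL` and `SMART_SEARCH_TOKEN` are taken from this reference.

```python
from __future__ import annotations

import os


def resolve_credentials(
    base_url: str | None = None,
    token: str | None = None,
) -> tuple[str, str]:
    """Hypothetical helper: explicit arguments win, environment variables are the fallback."""
    resolved_url = base_url or os.environ.get("SMART_SEARCH_BASE_URL")
    resolved_token = token or os.environ.get("SMART_SEARCH_TOKEN")

    # Mirror the documented behaviour: fail early if either value is still missing.
    if not resolved_url or not resolved_token:
        raise ValueError(
            "base_url and token must be passed explicitly or set via "
            "SMART_SEARCH_BASE_URL and SMART_SEARCH_TOKEN"
        )
    return resolved_url, resolved_token
```

Calling `resolve_credentials(base_url="https://api.example.com")` with only `SMART_SEARCH_TOKEN` set would combine the explicit URL with the token from the environment.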

`create(base_url=None, token=None)` `async` `classmethod`

Create and authenticate a SmartSearchWebRetriever instance.

This is the recommended way to instantiate the retriever as it handles authentication during initialization.

Examples:

# Create with explicit credentials
retriever = await SmartSearchWebRetriever.create(
    base_url="https://api.example.com",
    token="your-token"
)

# Create using environment variables
retriever = await SmartSearchWebRetriever.create()

Parameters:

Name	Type	Description	Default
`base_url`	`str \| None`	The base URL for the SmartSearch API. If not provided, will use SMART_SEARCH_BASE_URL environment variable.	`None`
`token`	`str \| None`	The authentication token for the SmartSearch API. If not provided, will use SMART_SEARCH_TOKEN environment variable.	`None`

Returns:

Name	Type	Description
`SmartSearchWebRetriever`	`SmartSearchWebRetriever`	An authenticated retriever instance.

Raises:

Type	Description
`ValueError`	If base_url or token is not provided and environment variables are not set.

`fetch_page(source, return_html=False, json_schema=None)` `async`

Fetch the content of a specific web page.

Parameters:

Name	Type	Description	Default
`source`	`str`	The URL of the web page to fetch.	required
`return_html`	`bool`	If True, return the raw HTML; if False, return cleaned text. Defaults to False.	`False`
`json_schema`	`dict[str, Any] \| None`	JSON schema for custom structured data extraction.	`None`

Returns:

Type	Description
`dict[str, Any]`	The fetched page content.

`get_page_keypoints(query, source, top_k=SMARTSEARCH_DEFAULT_KEYPOINT_COUNT, json_schema=None)` `async`

Extract keypoints summarizing the content of a web page.

Parameters:

Name	Type	Description	Default
`query`	`str`	The focus topic for extracting keypoints.	required
`source`	`str`	The web page URL to analyze.	required
`top_k`	`int`	Number of keypoints to return. Defaults to SMARTSEARCH_DEFAULT_KEYPOINT_COUNT (3).	`SMARTSEARCH_DEFAULT_KEYPOINT_COUNT`
`json_schema`	`dict[str, Any] \| None`	JSON schema for custom extraction.	`None`

Returns:

Type	Description
`list[Chunk]`	List of Chunk objects containing the extracted keypoints.

`get_page_snippets(query, source, top_k=SMARTSEARCH_DEFAULT_SNIPPET_COUNT, snippet_style='paragraph', json_schema=None)` `async`

Extract relevant text snippets from a web page.

Parameters:

Name	Type	Description	Default
`query`	`str`	The text to match against the web page content.	required
`source`	`str`	The URL of the web page.	required
`top_k`	`int`	Number of snippets to extract. Defaults to SMARTSEARCH_DEFAULT_SNIPPET_COUNT (3).	`SMARTSEARCH_DEFAULT_SNIPPET_COUNT`
`snippet_style`	`SmartSearchSnippetStyle`	Snippet extraction style: "paragraph" or "sentence". Defaults to "paragraph".	`'paragraph'`
`json_schema`	`dict[str, Any] \| None`	JSON schema for custom extraction.	`None`

Returns:

Type	Description
`list[Chunk]`	List of Chunk objects containing the extracted snippets.

`map_website(base_url, top_k=SMARTSEARCH_DEFAULT_MAP_SIZE, include_subdomains=False, query=None)` `async`

Map a website and discover its URL structure.

Parameters:

Name	Type	Description	Default
`base_url`	`str`	The base URL of the website to map.	required
`top_k`	`int`	Maximum number of URLs to return. Defaults to SMARTSEARCH_DEFAULT_MAP_SIZE (20).	`SMARTSEARCH_DEFAULT_MAP_SIZE`
`include_subdomains`	`bool`	Whether to include subdomains. Defaults to False.	`False`
`query`	`str \| None`	Search query to filter URLs by keywords.	`None`

Returns:

Type	Description
`list[str]`	A list of URLs from the website map.
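One way to read `include_subdomains` and `top_k` is as a host filter followed by a cap on the number of URLs returned. The sketch below applies that reading to a hand-written URL list. `filter_mapped_urls` is a hypothetical helper, and the matching rule is an assumption rather than the SDK's documented behaviour; only the default of 20 comes from this reference.

```python
from __future__ import annotations

from urllib.parse import urlparse


def filter_mapped_urls(
    urls: list[str],
    base_url: str,
    include_subdomains: bool = False,
    top_k: int = 20,
) -> list[str]:
    """Hypothetical helper: keep URLs on the base host, optionally on its subdomains."""
    base_host = urlparse(base_url).hostname
    if not base_host:
        return []

    kept = []
    for url in urls:
        host = urlparse(url).hostname or ""
        # Assumed rule: a subdomain is any host ending in ".<base host>".
        if host == base_host or (include_subdomains and host.endswith("." + base_host)):
            kept.append(url)

    # Cap the result size, as top_k is described as a maximum.
    return kept[:top_k]
```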

`retrieve(query, query_filter=None, top_k=SMARTSEARCH_DEFAULT_TOP_K, result_type='snippets', site=None, engine=None, **kwargs)` `async`

retrieve(query: str, query_filter: FilterClause | QueryFilter | None = None, top_k: int = SMARTSEARCH_DEFAULT_TOP_K, result_type: SmartSearchResultType = 'snippets', site: str | list[str] | None = None, engine: str | None = None, **kwargs: Any) -> list[Chunk]

retrieve(query: list[str], query_filter: FilterClause | QueryFilter | None = None, top_k: int = SMARTSEARCH_DEFAULT_TOP_K, result_type: SmartSearchResultType = 'snippets', site: str | list[str] | None = None, engine: str | None = None, **kwargs: Any) -> list[list[Chunk]]

Retrieve web search results based on the query.

This method performs a web search using the SmartSearch SDK and returns the results as Chunk objects: a single list for a single query, or one list per query when a list of queries is passed.

Parameters:

Name	Type	Description	Default
`query`	`str \| list[str]`	The query string or list of query strings to search for. If a list is provided, retrieval is performed for each query concurrently.	required
`query_filter`	`FilterClause \| QueryFilter \| None`	Filter criteria for the retrieval. Note: This parameter is not used by the SmartSearch API but is kept for interface consistency. Defaults to None.	`None`
`top_k`	`int`	The maximum number of results to retrieve. Defaults to SMARTSEARCH_DEFAULT_TOP_K (5).	`SMARTSEARCH_DEFAULT_TOP_K`
`result_type`	`SmartSearchResultType`	Type of output format. Supported values: "snippets", "keypoints", "summary", "description". Defaults to "snippets".	`'snippets'`
`site`	`str \| list[str] \| None`	URL or list of URLs to limit search results to specific sites or domains. Defaults to None.	`None`
`engine`	`str \| None`	Search engine to use: "auto", "firecrawl", or "perplexity". If None, "auto" is used.	`None`
`**kwargs`	`Any`	Additional parameters for the retrieval process.	`{}`

Returns:

Type	Description
`list[Chunk] \| list[list[Chunk]]`	Retrieved web search results as Chunk objects. Returns list[list[Chunk]] if query is a list of strings.
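The single-query and batch return shapes can be illustrated with a small, self-contained sketch. It assumes that concurrent batch retrieval is done with `asyncio.gather`, which this reference does not confirm. `fake_search` is a hypothetical stand-in for one SmartSearch call, and plain strings stand in for `Chunk` objects.

```python
from __future__ import annotations

import asyncio


async def fake_search(query: str, top_k: int) -> list[str]:
    """Hypothetical stand-in for a single-query search; returns placeholder results."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return [f"{query} #{i}" for i in range(top_k)]


async def retrieve(query: str | list[str], top_k: int = 5) -> list[str] | list[list[str]]:
    """Dispatch on the query type: one list for a string, one list per query for a list."""
    if isinstance(query, str):
        return await fake_search(query, top_k)

    # gather runs the searches concurrently and preserves the input order.
    results = await asyncio.gather(*(fake_search(q, top_k) for q in query))
    return list(results)
```

With this shape, `batch_results[i]` always corresponds to the i-th query in the input list.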

`search_urls(query, top_k=SMARTSEARCH_DEFAULT_TOP_K, site=None, engine=None)` `async`

Retrieve a list of URLs that match the given query.

Parameters:

Name	Type	Description	Default
`query`	`str`	The query string to search for.	required
`top_k`	`int`	The maximum number of URLs to retrieve. Defaults to SMARTSEARCH_DEFAULT_TOP_K (5).	`SMARTSEARCH_DEFAULT_TOP_K`
`site`	`str \| list[str] \| None`	URL or list of URLs to limit search.	`None`
`engine`	`str \| None`	Search engine to use.	`None`

Returns:

Type	Description
`list[str]`	A list of URLs matching the query.
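A common pattern is to chain `search_urls` with `fetch_page`: find candidate URLs first, then fetch each page. The sketch below is written against a small `Protocol` whose method names and parameters are copied from this reference, so it should accept any object with that shape. `FakeRetriever` and the dictionary keys it returns are hypothetical and exist only to make the example runnable without the SDK.

```python
from __future__ import annotations

import asyncio
from typing import Any, Protocol


class PageSource(Protocol):
    """Subset of the retriever interface used by this sketch."""

    async def search_urls(self, query: str, top_k: int = 5) -> list[str]: ...

    async def fetch_page(self, source: str, return_html: bool = False) -> dict[str, Any]: ...


async def search_and_fetch(retriever: PageSource, query: str, top_k: int = 3) -> dict[str, dict[str, Any]]:
    """Find URLs for the query, fetch them concurrently, and map each URL to its page."""
    urls = await retriever.search_urls(query, top_k=top_k)
    pages = await asyncio.gather(*(retriever.fetch_page(url) for url in urls))
    return dict(zip(urls, pages))


class FakeRetriever:
    """Hypothetical in-memory stand-in; the real page payload may look different."""

    async def search_urls(self, query: str, top_k: int = 5) -> list[str]:
        return [f"https://example.com/{i}" for i in range(top_k)]

    async def fetch_page(self, source: str, return_html: bool = False) -> dict[str, Any]:
        return {"url": source, "content": f"text of {source}"}


pages = asyncio.run(search_and_fetch(FakeRetriever(), "cloud computing", top_k=2))
```

Swapping `FakeRetriever()` for an authenticated `SmartSearchWebRetriever` instance is the intended use, assuming the real methods accept these arguments as documented.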
