Imagine clients#

The Imagine SDK exposes two clients, each with a different programming paradigm: synchronous and asynchronous.

ImagineClient is the synchronous Imagine client. If you don’t need asynchronous programming in your Python code, or you are simply not familiar with it, this is the client to use.

Otherwise, if your codebase already uses asyncio, ImagineAsyncClient is likely the better choice.

Synchronous client#

class imagine.ImagineClient(endpoint=None, api_key=None, max_retries=3, timeout=60, verify=False, proxy=None, debug=False, ctx=None)[source]#

Synchronous Imagine client. Provides methods for communicating with the Imagine API.
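
A minimal construction sketch, assuming you already have an Imagine endpoint and API key; the URL and key below are placeholders for your own deployment:

    from imagine import ImagineClient

    # Placeholder endpoint and key: substitute the values for your deployment.
    client = ImagineClient(
        endpoint="https://imagine.example.com",
        api_key="YOUR_API_KEY",
        timeout=60,
    )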

chat(messages, model=None, frequency_penalty=None, presence_penalty=None, repetition_penalty=None, stop=None, max_seconds=None, ignore_eos=None, skip_special_tokens=None, stop_token_ids=None, max_tokens=None, temperature=None, top_k=None, top_p=None, tools=None)[source]#

Invokes the non-streaming version of the chat endpoint and returns a ChatCompletionResponse for the given conversation

Parameters:
  • messages (Sequence[ChatMessage | dict[str, str]]) – A list of chat-messages comprising the conversation so far

  • model (str | None) – the model to use for chat

  • frequency_penalty (float | None) – Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim, defaults to None

  • presence_penalty (float | None) – Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics, defaults to None

  • repetition_penalty (float | None) – Float that penalizes new tokens based on whether they appear in the prompt and the generated text so far. Values > 1 encourage the model to use new tokens, while values < 1 encourage the model to repeat tokens., defaults to None

  • stop (list[str] | None) – Sequences where the API will stop generating further tokens. The returned text will contain the stop sequence., defaults to None

  • max_seconds (int | None) – TBD, defaults to None

  • ignore_eos (bool | None) – Whether to ignore the EOS token and continue generating tokens after the EOS token is generated., defaults to None

  • skip_special_tokens (bool | None) – Whether to skip special tokens in the output., defaults to None

  • stop_token_ids (list[list[int]] | None) – List of tokens that stop the generation when they are generated. The returned output will contain the stop tokens unless the stop tokens are special tokens., defaults to None

  • max_tokens (int | None) – The maximum number of tokens that can be generated in the chat completion, defaults to None

  • temperature (float | None) – What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic., defaults to None

  • top_k (int | None) – Integer that controls the number of top tokens to consider. Set to -1 to consider all tokens., defaults to None

  • top_p (float | None) – An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both., defaults to None

  • tools (list[dict[str, str | dict[str, Any]]] | None) – A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for.

Raises:

ImagineException imagine.ImagineException

Returns:

ChatCompletionResponse

Return type:

ChatCompletionResponse
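
A hedged usage sketch for chat. Messages are passed as role/content dictionaries (the key names are the usual convention, not spelled out above), the model name is a placeholder, and the exact field layout of the returned ChatCompletionResponse is defined by that class, so the example simply prints the object:

    response = client.chat(
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Summarize the Imagine SDK in one sentence."},
        ],
        model="llama-3-8b",   # assumed model name; see get_available_models()
        temperature=0.2,
        max_tokens=128,
    )
    print(response)  # ChatCompletionResponse; the reply text lives on its fields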

chat_stream(messages, model=None, frequency_penalty=None, presence_penalty=None, repetition_penalty=None, stop=None, max_seconds=None, ignore_eos=None, skip_special_tokens=None, stop_token_ids=None, max_tokens=None, temperature=None, top_k=None, top_p=None)[source]#

Invokes the streaming version of the chat endpoint and returns an iterable of ChatCompletionStreamResponse objects for the given conversation

Parameters:
  • messages (Sequence[ChatMessage | dict[str, str]]) – A list of chat-messages comprising the conversation so far

  • model (str | None) – the model to use for chat

  • frequency_penalty (float | None) – Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim, defaults to None

  • presence_penalty (float | None) – Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics, defaults to None

  • repetition_penalty (float | None) – Float that penalizes new tokens based on whether they appear in the prompt and the generated text so far. Values > 1 encourage the model to use new tokens, while values < 1 encourage the model to repeat tokens., defaults to None

  • stop (list[str] | None) – Sequences where the API will stop generating further tokens. The returned text will contain the stop sequence., defaults to None

  • max_seconds (int | None) – TBD, defaults to None

  • ignore_eos (bool | None) – Whether to ignore the EOS token and continue generating tokens after the EOS token is generated., defaults to None

  • skip_special_tokens (bool | None) – Whether to skip special tokens in the output., defaults to None

  • stop_token_ids (list[list[int]] | None) – List of tokens that stop the generation when they are generated. The returned output will contain the stop tokens unless the stop tokens are special tokens., defaults to None

  • max_tokens (int | None) – The maximum number of tokens that can be generated in the chat completion, defaults to None

  • temperature (float | None) – What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic., defaults to None

  • top_k (int | None) – Integer that controls the number of top tokens to consider. Set to -1 to consider all tokens., defaults to None

  • top_p (float | None) – An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both., defaults to None

Raises:

ImagineException imagine.ImagineException

Returns:

ChatCompletionStreamResponse

Return type:

Iterable[ChatCompletionStreamResponse]
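
Since chat_stream returns an iterable of ChatCompletionStreamResponse chunks, it is typically consumed in a loop; the model name below is assumed:

    for chunk in client.chat_stream(
        messages=[{"role": "user", "content": "Write a haiku about GPUs."}],
        model="llama-3-8b",  # assumed model name
        max_tokens=64,
    ):
        print(chunk)  # each chunk is a ChatCompletionStreamResponse with a partial piece of the reply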

completion(prompt, model=None, frequency_penalty=None, presence_penalty=None, repetition_penalty=None, stop=None, max_seconds=None, ignore_eos=None, skip_special_tokens=None, stop_token_ids=None, max_tokens=None, temperature=None, top_k=None, top_p=None)[source]#

Invokes the non-streaming version of the completions endpoint and returns a CompletionResponse for the given prompt

Parameters:
  • prompt (str) – prompt text for which completion needs to be generated

  • model (str | None) – the model to use for completion

  • frequency_penalty (float | None) – Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim, defaults to None

  • presence_penalty (float | None) – Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics, defaults to None

  • repetition_penalty (float | None) – Float that penalizes new tokens based on whether they appear in the prompt and the generated text so far. Values > 1 encourage the model to use new tokens, while values < 1 encourage the model to repeat tokens., defaults to None

  • stop (list[str] | None) – Sequences where the API will stop generating further tokens. The returned text will contain the stop sequence., defaults to None

  • max_seconds (int | None) – TBD, defaults to None

  • ignore_eos (bool | None) – Whether to ignore the EOS token and continue generating tokens after the EOS token is generated., defaults to None

  • skip_special_tokens (bool | None) – Whether to skip special tokens in the output., defaults to None

  • stop_token_ids (list[list[int]] | None) – List of tokens that stop the generation when they are generated. The returned output will contain the stop tokens unless the stop tokens are special tokens., defaults to None

  • max_tokens (int | None) – The maximum number of tokens that can be generated in the completion, defaults to None

  • temperature (float | None) – What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic., defaults to None

  • top_k (int | None) – Integer that controls the number of top tokens to consider. Set to -1 to consider all tokens., defaults to None

  • top_p (float | None) – An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both., defaults to None

Raises:

ImagineException imagine.ImagineException

Returns:

CompletionResponse object

Return type:

CompletionResponse
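
A short completion sketch (model name assumed):

    response = client.completion(
        prompt="The capital of France is",
        model="llama-3-8b",  # assumed model name
        max_tokens=16,
        temperature=0.0,
    )
    print(response)  # CompletionResponse; the generated text lives on its fields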

completion_stream(prompt, model=None, frequency_penalty=None, presence_penalty=None, repetition_penalty=None, stop=None, max_seconds=None, ignore_eos=None, skip_special_tokens=None, stop_token_ids=None, max_tokens=None, temperature=None, top_k=None, top_p=None)[source]#

Invokes the streaming version of the completions endpoint and returns an iterable of CompletionStreamResponse objects for the given prompt

Parameters:
  • prompt (str) – prompt text for which completion needs to be generated

  • model (str | None) – the model to use for completion

  • frequency_penalty (float | None) – Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim, defaults to None

  • presence_penalty (float | None) – Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics, defaults to None

  • repetition_penalty (float | None) – Float that penalizes new tokens based on whether they appear in the prompt and the generated text so far. Values > 1 encourage the model to use new tokens, while values < 1 encourage the model to repeat tokens., defaults to None

  • stop (list[str] | None) – Sequences where the API will stop generating further tokens. The returned text will contain the stop sequence., defaults to None

  • max_seconds (int | None) – TBD, defaults to None

  • ignore_eos (bool | None) – Whether to ignore the EOS token and continue generating tokens after the EOS token is generated., defaults to None

  • skip_special_tokens (bool | None) – Whether to skip special tokens in the output., defaults to None

  • stop_token_ids (list[list[int]] | None) – List of tokens that stop the generation when they are generated. The returned output will contain the stop tokens unless the stop tokens are special tokens., defaults to None

  • max_tokens (int | None) – The maximum number of tokens that can be generated in the completion, defaults to None

  • temperature (float | None) – What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic., defaults to None

  • top_k (int | None) – Integer that controls the number of top tokens to consider. Set to -1 to consider all tokens., defaults to None

  • top_p (float | None) – An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both., defaults to None

Raises:

ImagineException imagine.ImagineException

Returns:

CompletionStreamResponse object

Return type:

Iterable[CompletionStreamResponse]
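
As with chat_stream, the streaming completion variant is consumed with a loop (model name assumed):

    for chunk in client.completion_stream(
        prompt="Once upon a time",
        model="llama-3-8b",  # assumed model name
        max_tokens=64,
    ):
        print(chunk)  # CompletionStreamResponse chunks arrive incrementally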

embeddings(texts, model=None)[source]#

Invokes the embeddings endpoint and returns embeddings for the given input text

Parameters:
  • texts – The text to embed

  • model (str | None) – The embedding model to use

Raises:

ImagineException imagine.ImagineException

Returns:

EmbeddingResponse: A response object containing the embeddings.

Return type:

EmbeddingResponse
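
A minimal embeddings sketch. Whether texts accepts a single string, a list of strings, or both is not stated above, so the single-string form and the model name are assumptions:

    embedding_response = client.embeddings(
        texts="Vector databases store embeddings.",  # single-string input assumed
        model="bge-large",                           # assumed embedding model name
    )
    print(embedding_response)  # EmbeddingResponse containing the embedding vectors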

get_available_models(model_type=None)[source]#

Returns a list of available models.

Parameters:

model_type (ModelType | None) – Filter models by model type.

Raises:

ImagineException imagine.ImagineException

Returns:

Available models.

Return type:

list[str]
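
Listing models, optionally filtered by type; the specific ModelType member used for filtering is an assumption:

    from imagine import ModelType

    all_models = client.get_available_models()
    chat_models = client.get_available_models(model_type=ModelType.CHAT)  # member name assumed
    print(all_models)
    print(chat_models)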

get_available_models_by_type(model_type=None)[source]#

Returns a mapping of available models by model type.

Parameters:

model_type (ModelType | None) – Filter models by model type.

Raises:

ImagineException imagine.ImagineException

Returns:

Available models grouped by model type.

Return type:

dict[ModelType, list[str]]

get_chat_history(max_items=1)[source]#

Returns a list of Chat (response, request) pairs made by the user.

Parameters:

max_items (int) – The number of items to retrieve

Raises:

ImagineException imagine.ImagineException

Returns:

Returns a list of Chat response, request pairs made by the user.

Return type:

list[list[ChatCompletionResponse | ChatCompletionRequest]]
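
A history sketch; each entry pairs a response with the request that produced it, per the description above:

    history = client.get_chat_history(max_items=5)
    for response, request in history:
        # each item is a (response, request)-style pair as documented above
        print(type(response).__name__, type(request).__name__)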

get_completion_history(max_items=1)[source]#

Returns a list of Completion (response, request) pairs made by the user.

Parameters:

max_items (int) – The number of items to retrieve

Raises:

ImagineException imagine.ImagineException

Returns:

Returns a list of Completion response, request pairs made by the user.

Return type:

list[list[CompletionResponse | CompletionRequest]]

get_embedding_history(max_items=1)[source]#

Returns a list of Embedding (response, request) pairs made by the user.

Parameters:

max_items (int) – The number of items to retrieve

Raises:

ImagineException imagine.ImagineException

Returns:

Returns a list of Embedding response, request pairs made by the user.

Return type:

list[list[EmbeddingResponse | EmbeddingRequest]]

get_reranker_history(max_items=1)[source]#

Returns a list of ReRanker response, request pairs made by the user.

Parameters:

max_items (int) – The number of items to retrieve

Raises:

ImagineException imagine.ImagineException

Returns:

Returns a list of ReRanker response, request pairs made by the user.

Return type:

list[list[ReRankerResponse | ReRankerRequest]]

health_check()[source]#

Check the health of the server, including databases and models.

Raises:

ImagineException imagine.ImagineException

Returns:

A HealthResponse object containing status of the server.

Return type:

HealthResponse
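
health_check can serve as a simple readiness probe:

    health = client.health_check()
    print(health)  # HealthResponse describing the server, databases and models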

images_generate(prompt, model=None, negative_prompt='blurry', seed=27, seed_increment=100, n=1, num_inference_steps=20, size='512x512', guidance_scale=6.5, cache_interval=None, response_format='b64_json')[source]#

Invokes the non-streaming version of the image generation endpoint and returns an ImageResponse object

Parameters:
  • prompt (str) – The prompt to guide the image generation

  • model (str | None) – The model to be used for generation, defaults to None

  • negative_prompt (str | None) – Characteristics to avoid in the image being generated, defaults to “blurry”

  • seed (int | None) – The initial value used to seed the random number generator. Use the same seed to reproduce image results., defaults to 27

  • seed_increment (int | None) – The amount by which the seed value increases with each iteration. Adjust this to create a series of visually consistent, yet unique images, defaults to 100

  • n (int | None) – Number of images to be generated, defaults to 1

  • num_inference_steps (int | None) – The total number of inference steps taken during image generation. More steps usually lead to a higher-quality image at the expense of slower inference., defaults to 20

  • size (str | None) – The width x height in pixels of the generated image, defaults to 512x512

  • guidance_scale (float | None) – A higher guidance scale encourages the model to generate images that are closely linked to the text prompt, usually at the expense of lower image quality., defaults to 6.5

  • cache_interval (int | None) – _description_, defaults to None

  • response_format (str | None) – “url” or “b64_json”, defaults to “b64_json”

Raises:

ImagineException imagine.ImagineException

Returns:

ImageResponse object

Return type:

ImageResponse
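
An image-generation sketch. The model name is a placeholder, and how to decode the returned b64_json payload depends on the ImageResponse fields, so the example only prints the object:

    image_response = client.images_generate(
        prompt="A watercolor painting of a lighthouse at dusk",
        model="sdxl",                          # assumed model name
        negative_prompt="blurry, low quality",
        size="512x512",
        num_inference_steps=20,
        response_format="b64_json",
    )
    print(image_response)  # ImageResponse; the image payload layout is defined by that class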

images_generate_stream(prompt, model=None, negative_prompt='blurry', seed=27, seed_increment=100, n=1, num_inference_steps=20, size='512x512', guidance_scale=6.5, cache_interval=None, response_format='b64_json')[source]#

Invokes the streaming version of the image generation endpoint and returns an iterable of ImageResponse objects

Parameters:
  • prompt (str) – The prompt to guide the image generation

  • model (str | None) – The model to be used for generation, defaults to None

  • negative_prompt (str | None) – Characteristics to avoid in the image being generated, defaults to “blurry”

  • seed (int | None) – The initial value used to seed the random number generator. Use the same seed to reproduce image results., defaults to 27

  • seed_increment (int | None) – The amount by which the seed value increases with each iteration. Adjust this to create a series of visually consistent, yet unique images, defaults to 100

  • n (int | None) – Number of images to be generated, defaults to 1

  • num_inference_steps (int | None) – The total number of inference steps taken during image generation. More steps usually lead to a higher-quality image at the expense of slower inference., defaults to 20

  • size (str | None) – The width x height in pixels of the generated image, defaults to 512x512

  • guidance_scale (float | None) – A higher guidance scale encourages the model to generate images that are closely linked to the text prompt, usually at the expense of lower image quality., defaults to 6.5

  • cache_interval (int | None) – _description_, defaults to None

  • response_format (str | None) – “url” or “b64_json”, defaults to “b64_json”

Raises:

ImagineException imagine.ImagineException

Returns:

ImageResponse object

Return type:

Iterable[ImageResponse]

ping()[source]#

Ping the API to check if the Imagine server is reachable.

Raises:

ImagineException imagine.ImagineException

Returns:

A PingResponse object containing status of the server.

Return type:

PingResponse

reranker(query, documents, model=None, top_n=None, return_documents=None)[source]#

The reranker endpoint receives a query, a list of documents, and optional arguments such as the model name, and returns a response containing the reranking results.

Parameters:
  • query (str) – The query as a string

  • documents (list[str]) – The documents to be reranked as a list of strings.

  • model (str | None) – The reranker model to use.

  • top_n (int | None) – The number of most relevant documents to return. If not specified, the reranking results of all documents will be returned.

  • return_documents (bool | None) – Whether to return the documents in the response. Defaults to false

Raises:

ImagineException imagine.ImagineException

Returns:

ReRankerResponse object containing the similarity scores.

Return type:

ReRankerResponse
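
A reranking sketch over a small document list (model left at its default):

    rerank_response = client.reranker(
        query="How do I reset my password?",
        documents=[
            "Resetting your password can be done from the account settings page.",
            "Our office is open Monday through Friday.",
            "Contact support if you cannot access your account.",
        ],
        top_n=2,
        return_documents=True,
    )
    print(rerank_response)  # ReRankerResponse with per-document relevance scores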

transcribe(input_file, model=None)[source]#

Transcribe an audio file to text.

Parameters:
  • input_file (str | BinaryIO) – File object or path to the audio file.

  • model (str | None) – Name of the model generating the text.

Returns:

Response containing the transcribed text.

Return type:

TranscribeResponse
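
Transcription accepts either a path or an open binary file object; the file name and model name below are placeholders:

    with open("meeting.wav", "rb") as audio_file:                          # placeholder file
        transcript = client.transcribe(audio_file, model="whisper-large")  # assumed model name
    print(transcript)  # TranscribeResponse with the transcribed text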

translate(prompt, model, frequency_penalty=None, presence_penalty=None, repetition_penalty=None, stop=None, max_seconds=None, ignore_eos=None, skip_special_tokens=None, stop_token_ids=None, max_tokens=None, temperature=None, top_k=None, top_p=None)[source]#

Invokes the translate endpoint and returns a TranslateResponse for the given prompt

Parameters:
  • prompt (str) – prompt text that needs to be translated

  • model (str) – the model to use for translation

  • frequency_penalty (float | None) – Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim, defaults to None

  • presence_penalty (float | None) – Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics, defaults to None

  • repetition_penalty (float | None) – Float that penalizes new tokens based on whether they appear in the prompt and the generated text so far. Values > 1 encourage the model to use new tokens, while values < 1 encourage the model to repeat tokens., defaults to None

  • stop (list[str] | None) – Sequences where the API will stop generating further tokens. The returned text will contain the stop sequence., defaults to None

  • max_seconds (int | None) – TBD, defaults to None

  • ignore_eos (bool | None) – Whether to ignore the EOS token and continue generating tokens after the EOS token is generated., defaults to None

  • skip_special_tokens (bool | None) – Whether to skip special tokens in the output., defaults to None

  • stop_token_ids (list[list[int]] | None) – List of tokens that stop the generation when they are generated. The returned output will contain the stop tokens unless the stop tokens are special tokens., defaults to None

  • max_tokens (int | None) – The maximum number of tokens that can be generated in translation, defaults to None

  • temperature (float | None) – What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic., defaults to None

  • top_k (int | None) – Integer that controls the number of top tokens to consider. Set to -1 to consider all tokens., defaults to None

  • top_p (float | None) – An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both., defaults to None

Raises:

ImagineException imagine.ImagineException

Returns:

TranslateResponse object

Return type:

TranslateResponse
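
Unlike the other text endpoints, translate requires the model argument; the model name below is a placeholder:

    translation = client.translate(
        prompt="Bonjour, comment allez-vous ?",
        model="nllb-200",  # placeholder translation model name
        max_tokens=128,
    )
    print(translation)  # TranslateResponse with the translated text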

usage(aggregation_duration=None, since=None, until=None, model=None)[source]#

Report usage statistics for the user.

Parameters:
  • aggregation_duration (str | None)

  • since (datetime | None) – Since date to report usage statistics for.

  • until (datetime | None) – Until date to report usage statistics for.

  • model (str | None) – Filter usage statistics by model type.

Raises:

ImagineException imagine.ImagineException

Returns:

The usage report as a UsageResponse object

Return type:

UsageResponse
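
A usage-report sketch over the last week:

    from datetime import datetime, timedelta

    report = client.usage(
        since=datetime.now() - timedelta(days=7),
        until=datetime.now(),
    )
    print(report)  # UsageResponse with the aggregated statistics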

Asynchronous client#

class imagine.ImagineAsyncClient(endpoint=None, api_key=None, max_retries=3, timeout=60, verify=False, max_concurrent_requests=64, proxy=None, debug=False)[source]#

Asynchronous Imagine client. Provides methods for communicating with the Imagine API using asyncio.
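
Every method on ImagineAsyncClient is awaited inside an event loop (the *_stream variants are consumed with async for, as sketched further below). A minimal sketch, with endpoint, key, and model name as placeholders:

    import asyncio

    from imagine import ImagineAsyncClient

    async def main() -> None:
        client = ImagineAsyncClient(
            endpoint="https://imagine.example.com",  # placeholder endpoint
            api_key="YOUR_API_KEY",
        )
        response = await client.chat(
            messages=[{"role": "user", "content": "Hello!"}],
            model="llama-3-8b",  # assumed model name
        )
        print(response)  # ChatCompletionResponse

    asyncio.run(main())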

async chat(messages, model=None, frequency_penalty=None, presence_penalty=None, repetition_penalty=None, stop=None, max_seconds=None, ignore_eos=None, skip_special_tokens=None, stop_token_ids=None, max_tokens=None, temperature=None, top_k=None, top_p=None, tools=None)[source]#

Invokes the non-streaming version of the chat endpoint and returns a ChatCompletionResponse for the given conversation

Parameters:
  • messages (list[Any]) – A list of chat-messages comprising the conversation so far

  • model (str | None) – the model to use for chat

  • frequency_penalty (float | None) – Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim, defaults to None

  • presence_penalty (float | None) – Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics, defaults to None

  • repetition_penalty (float | None) – Float that penalizes new tokens based on whether they appear in the prompt and the generated text so far. Values > 1 encourage the model to use new tokens, while values < 1 encourage the model to repeat tokens., defaults to None

  • stop (list[str] | None) – Sequences where the API will stop generating further tokens. The returned text will contain the stop sequence., defaults to None

  • max_seconds (int | None) – TBD, defaults to None

  • ignore_eos (bool | None) – Whether to ignore the EOS token and continue generating tokens after the EOS token is generated., defaults to None

  • skip_special_tokens (bool | None) – Whether to skip special tokens in the output., defaults to None

  • stop_token_ids (list[list[int]] | None) – List of tokens that stop the generation when they are generated. The returned output will contain the stop tokens unless the stop tokens are special tokens., defaults to None

  • max_tokens (int | None) – The maximum number of tokens that can be generated in the chat completion, defaults to None

  • temperature (float | None) – What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic., defaults to None

  • top_k (int | None) – Integer that controls the number of top tokens to consider. Set to -1 to consider all tokens., defaults to None

  • top_p (float | None) – An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both., defaults to None

  • tools (list[dict[str, str | dict[str, Any]]] | None) – A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for.

Raises:

ImagineException imagine.ImagineException

Returns:

ChatCompletionResponse object

Return type:

ChatCompletionResponse

async chat_stream(messages, model=None, frequency_penalty=None, presence_penalty=None, repetition_penalty=None, stop=None, max_seconds=None, ignore_eos=None, skip_special_tokens=None, stop_token_ids=None, max_tokens=None, temperature=None, top_k=None, top_p=None)[source]#

Invokes the streaming version of the chat endpoint and returns an async generator of ChatCompletionStreamResponse objects for the given conversation

Parameters:
  • messages (list[Any]) – A list of chat-messages comprising the conversation so far

  • model (str | None) – the model to use for chat

  • frequency_penalty (float | None) – Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim, defaults to None

  • presence_penalty (float | None) – Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics, defaults to None

  • repetition_penalty (float | None) – Float that penalizes new tokens based on whether they appear in the prompt and the generated text so far. Values > 1 encourage the model to use new tokens, while values < 1 encourage the model to repeat tokens., defaults to None

  • stop (list[str] | None) – Sequences where the API will stop generating further tokens. The returned text will contain the stop sequence., defaults to None

  • max_seconds (int | None) – TBD, defaults to None

  • ignore_eos (bool | None) – Whether to ignore the EOS token and continue generating tokens after the EOS token is generated., defaults to None

  • skip_special_tokens (bool | None) – Whether to skip special tokens in the output., defaults to None

  • stop_token_ids (list[list[int]] | None) – List of tokens that stop the generation when they are generated. The returned output will contain the stop tokens unless the stop tokens are special tokens., defaults to None

  • max_tokens (int | None) – The maximum number of tokens that can be generated in the chat completion, defaults to None

  • temperature (float | None) – What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic., defaults to None

  • top_k (int | None) – Integer that controls the number of top tokens to consider. Set to -1 to consider all tokens., defaults to None

  • top_p (float | None) – An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both., defaults to None

Raises:

ImagineException imagine.ImagineException

Returns:

ChatCompletionStreamResponse object

Return type:

AsyncGenerator[ChatCompletionStreamResponse, None]
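
Assuming chat_stream behaves as an async generator (consistent with the AsyncGenerator return type above), it is consumed directly with async for; the model name is a placeholder:

    async def stream_reply(client):
        # client is an ImagineAsyncClient instance
        async for chunk in client.chat_stream(
            messages=[{"role": "user", "content": "Stream a short poem."}],
            model="llama-3-8b",  # assumed model name
        ):
            print(chunk)  # ChatCompletionStreamResponse chunks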

async completion(prompt, model=None, frequency_penalty=None, presence_penalty=None, repetition_penalty=None, stop=None, max_seconds=None, ignore_eos=None, skip_special_tokens=None, stop_token_ids=None, max_tokens=None, temperature=None, top_k=None, top_p=None)[source]#

Invokes the non-streaming version of the completions endpoint and returns a CompletionResponse for the given prompt

Parameters:
  • prompt (str) – prompt text for which completion needs to be generated

  • model (str | None) – the model to use for completion

  • frequency_penalty (float | None) – Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim, defaults to None

  • presence_penalty (float | None) – Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics, defaults to None

  • repetition_penalty (float | None) – Float that penalizes new tokens based on whether they appear in the prompt and the generated text so far. Values > 1 encourage the model to use new tokens, while values < 1 encourage the model to repeat tokens., defaults to None

  • stop (list[str] | None) – Sequences where the API will stop generating further tokens. The returned text will contain the stop sequence., defaults to None

  • max_seconds (int | None) – TBD, defaults to None

  • ignore_eos (bool | None) – Whether to ignore the EOS token and continue generating tokens after the EOS token is generated., defaults to None

  • skip_special_tokens (bool | None) – Whether to skip special tokens in the output., defaults to None

  • stop_token_ids (list[list[int]] | None) – List of tokens that stop the generation when they are generated. The returned output will contain the stop tokens unless the stop tokens are special tokens., defaults to None

  • max_tokens (int | None) – The maximum number of tokens that can be generated in the completion, defaults to None

  • temperature (float | None) – What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic., defaults to None

  • top_k (int | None) – Integer that controls the number of top tokens to consider. Set to -1 to consider all tokens., defaults to None

  • top_p (float | None) – An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both., defaults to None

Raises:

ImagineException imagine.ImagineException

Returns:

CompletionResponse object

Return type:

CompletionResponse

async completion_stream(prompt, model=None, frequency_penalty=None, presence_penalty=None, repetition_penalty=None, stop=None, max_seconds=None, ignore_eos=None, skip_special_tokens=None, stop_token_ids=None, max_tokens=None, temperature=None, top_k=None, top_p=None)[source]#

Invokes the streaming version of the completions endpoint and returns an async generator of CompletionStreamResponse objects for the given prompt

Parameters:
  • prompt (str) – prompt text for which completion needs to be generated

  • model (str | None) – the model to use for completion

  • frequency_penalty (float | None) – Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim, defaults to None

  • presence_penalty (float | None) – Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics, defaults to None

  • repetition_penalty (float | None) – Float that penalizes new tokens based on whether they appear in the prompt and the generated text so far. Values > 1 encourage the model to use new tokens, while values < 1 encourage the model to repeat tokens., defaults to None

  • stop (list[str] | None) – Sequences where the API will stop generating further tokens. The returned text will contain the stop sequence., defaults to None

  • max_seconds (int | None) – TBD, defaults to None

  • ignore_eos (bool | None) – Whether to ignore the EOS token and continue generating tokens after the EOS token is generated., defaults to None

  • skip_special_tokens (bool | None) – Whether to skip special tokens in the output., defaults to None

  • stop_token_ids (list[list[int]] | None) – List of tokens that stop the generation when they are generated. The returned output will contain the stop tokens unless the stop tokens are special tokens., defaults to None

  • max_tokens (int | None) – The maximum number of tokens that can be generated in the completion, defaults to None

  • temperature (float | None) – What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic., defaults to None

  • top_k (int | None) – Integer that controls the number of top tokens to consider. Set to -1 to consider all tokens., defaults to None

  • top_p (float | None) – An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both., defaults to None

Raises:

ImagineException imagine.ImagineException

Returns:

CompletionStreamResponse object

Return type:

AsyncGenerator[CompletionStreamResponse, None]

async embeddings(text, model=None)[source]#

An embeddings endpoint that returns embeddings for a single text

Parameters:
  • text (str) – The text to embed

  • model (str | None) – The embedding model to use

Raises:

ImagineException imagine.ImagineException

Returns:

EmbeddingResponse: A response object containing the embeddings.

Return type:

EmbeddingResponse

async get_available_models(model_type=None)[source]#

Returns a list of available models.

Parameters:

model_type (ModelType | None) – Filter models by model type.

Raises:

ImagineException imagine.ImagineException

Returns:

Available models.

Return type:

list[str]

async get_available_models_by_type(model_type=None)[source]#

Returns a mapping of available models by model type.

Parameters:

model_type (ModelType | None) – Filter models by model type.

Raises:

ImagineException imagine.ImagineException

Returns:

Available models grouped by model type.

Return type:

dict[ModelType, list[str]]

async get_chat_history(max_items=1)[source]#

Returns a list of Chat (response, request) pairs made by the user.

Parameters:

max_items (int) – The number of items to retrieve

Raises:

ImagineException imagine.ImagineException

Returns:

Returns a list of Chat response, request pairs made by the user.

Return type:

list[list[ChatCompletionResponse | ChatCompletionRequest]]

async get_completion_history(max_items=1)[source]#

Returns a list of Completion (response, request) pairs made by the user.

Parameters:

max_items (int) – The number of items to retrieve

Raises:

ImagineException imagine.ImagineException

Returns:

Returns a list of Completion response, request pairs made by the user.

Return type:

list[list[CompletionResponse | CompletionRequest]]

async get_embedding_history(max_items=1)[source]#

Returns a list of Embedding (response, request) pairs made by the user.

Parameters:

max_items (int) – The number of items to retrieve

Raises:

ImagineException imagine.ImagineException

Returns:

Returns a list of Embedding response, request pairs made by the user.

Return type:

list[list[EmbeddingResponse | EmbeddingRequest]]

async get_reranker_history(max_items=1)[source]#

Returns a list of ReRanker response, request pairs made by the user.

Parameters:

max_items (int) – The number of items to retrieve

Raises:

ImagineException imagine.ImagineException

Returns:

Returns a list of ReRanker response, request pairs made by the user.

Return type:

list[list[ReRankerResponse | ReRankerRequest]]

async health_check()[source]#

Check the health of the server, including databases and models.

Raises:

ImagineException imagine.ImagineException

Returns:

A HealthResponse object containing status of the server.

Return type:

HealthResponse

async images_generate(prompt, model=None, negative_prompt='blurry', seed=27, seed_increment=100, n=1, num_inference_steps=20, size='512x512', guidance_scale=6.5, cache_interval=None, response_format='b64_json')[source]#

Invokes the non-streaming version of the image generation endpoint and returns an ImageResponse object

Parameters:
  • prompt (str) – The prompt to guide the image generation

  • model (str | None) – The model to be used for generation, defaults to None

  • negative_prompt (str | None) – Characteristics to avoid in the image being generated, defaults to “blurry”

  • seed (int | None) – The initial value used to seed the random number generator. Use the same seed to reproduce image results., defaults to 27

  • seed_increment (int | None) – The amount by which the seed value increases with each iteration. Adjust this to create a series of visually consistent, yet unique images, defaults to 100

  • n (int | None) – Number of images to be generated, defaults to 1

  • num_inference_steps (int | None) – The total number of inference steps taken during image generation. More steps usually lead to a higher-quality image at the expense of slower inference., defaults to 20

  • size (str | None) – The width x height in pixels of the generated image, defaults to 512x512

  • guidance_scale (float | None) – A higher guidance scale encourages the model to generate images that are closely linked to the text prompt, usually at the expense of lower image quality., defaults to 6.5

  • cache_interval (int | None) – _description_, defaults to None

  • response_format (str | None) – “url” or “b64_json”, defaults to “b64_json”

Raises:

ImagineException imagine.ImagineException

Returns:

ImageResponse object

Return type:

ImageResponse

async images_generate_stream(prompt, model=None, negative_prompt='blurry', seed=27, seed_increment=100, n=1, num_inference_steps=20, size='512x512', guidance_scale=6.5, cache_interval=None, response_format='b64_json')[source]#

Invokes the streaming version of the image generation endpoint and returns an async generator of ImageResponse objects

Parameters:
  • prompt (str) – The prompt to guide the image generation

  • model (str | None) – The model to be used for generation, defaults to None

  • negative_prompt (str | None) – Characteristics to avoid in the image being generated, defaults to “blurry”

  • seed (int | None) – The initial value used to seed the random number generator. Use the same seed to reproduce image results., defaults to 27

  • seed_increment (int | None) – The amount by which the seed value increases with each iteration. Adjust this to create a series of visually consistent, yet unique images, defaults to 100

  • n (int | None) – Number of images to be generated, defaults to 1

  • num_inference_steps (int | None) – The total number of inference steps taken during image generation. More steps usually lead to a higher-quality image at the expense of slower inference., defaults to 20

  • size (str | None) – The width x height in pixels of the generated image, defaults to 512x512

  • guidance_scale (float | None) – A higher guidance scale encourages the model to generate images that are closely linked to the text prompt, usually at the expense of lower image quality., defaults to 6.5

  • cache_interval (int | None) – _description_, defaults to None

  • response_format (str | None) – “url” or “b64_json”, defaults to “b64_json”

Raises:

ImagineException imagine.ImagineException

Returns:

ImageResponse object

Return type:

AsyncGenerator[ImageResponse, None]

async ping()[source]#

Ping the API to check if the Imagine server is reachable.

Raises:

ImagineException imagine.ImagineException

Returns:

A PingResponse object containing status of the server.

Return type:

PingResponse

async reranker(query, documents, model=None, top_n=None, return_documents=None)[source]#

A reranker endpoint that returns similarity scores for a query against a list of documents

Parameters:
  • query (str) – The query as a string

  • documents (list[str]) – The documents to be reranked as a list of strings.

  • model (str | None) – The reranker model to use.

  • top_n (int | None) – The number of most relevant documents to return. If not specified, the reranking results of all documents will be returned.

  • return_documents (bool | None) – Whether to return the documents in the response. Defaults to false

Raises:

ImagineException imagine.ImagineException

Returns:

ReRankerResponse object containing the similarity scores

Return type:

ReRankerResponse

async transcribe(input_file, model=None)[source]#

Transcribe an audio file to text.

Parameters:
  • input_file (str | BinaryIO) – File object or path to the audio file.

  • model (str | None) – Name of the model generating the text.

Returns:

Response containing the transcribed text.

Return type:

TranscribeResponse

async translate(prompt, model, frequency_penalty=None, presence_penalty=None, repetition_penalty=None, stop=None, max_seconds=None, ignore_eos=None, skip_special_tokens=None, stop_token_ids=None, max_tokens=None, temperature=None, top_k=None, top_p=None)[source]#

Invokes the translate endpoint and returns a TranslateResponse for the given prompt

Parameters:
  • prompt (str) – prompt text that needs to be translated

  • model (str) – the model to use for translation

  • frequency_penalty (float | None) – Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim, defaults to None

  • presence_penalty (float | None) – Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics, defaults to None

  • repetition_penalty (float | None) – Float that penalizes new tokens based on whether they appear in the prompt and the generated text so far. Values > 1 encourage the model to use new tokens, while values < 1 encourage the model to repeat tokens., defaults to None

  • stop (list[str] | None) – Sequences where the API will stop generating further tokens. The returned text will contain the stop sequence., defaults to None

  • max_seconds (int | None) – TBD, defaults to None

  • ignore_eos (bool | None) – Whether to ignore the EOS token and continue generating tokens after the EOS token is generated., defaults to None

  • skip_special_tokens (bool | None) – Whether to skip special tokens in the output., defaults to None

  • stop_token_ids (list[list[int]] | None) – List of tokens that stop the generation when they are generated. The returned output will contain the stop tokens unless the stop tokens are special tokens., defaults to None

  • max_tokens (int | None) – The maximum number of tokens that can be generated in translation, defaults to None

  • temperature (float | None) – What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic., defaults to None

  • top_k (int | None) – Integer that controls the number of top tokens to consider. Set to -1 to consider all tokens., defaults to None

  • top_p (float | None) – An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both., defaults to None

Raises:

ImagineException imagine.ImagineException

Returns:

TranslateResponse object

Return type:

TranslateResponse

async usage(aggregation_duration=None, since=None, until=None, model=None)[source]#

Report usage statistics for the user.

Parameters:
  • aggregation_duration (str | None)

  • since (datetime | None) – Since date to report usage statistics for.

  • until (datetime | None) – Until date to report usage statistics for.

  • model (str | None) – Filter usage statistics by model type.

Raises:

ImagineException imagine.ImagineException

Returns:

The usage report as a UsageResponse object

Return type:

UsageResponse