Imagine clients#
The Imagine SDK exposes two clients, each with a different programming paradigm: synchronous and asynchronous.
ImagineClient is the synchronous Imagine client. If you don't need asynchronous programming in your Python code, or you are simply not familiar with it, this is the client you want to use. Otherwise, if you are leveraging asyncio in your codebase, ImagineAsyncClient might be a better choice.
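A minimal construction sketch for both clients, assuming both classes are importable from the top-level imagine package (as the class paths below indicate); the endpoint URL and API key are placeholders for your own deployment values:

    from imagine import ImagineClient, ImagineAsyncClient

    # Placeholder endpoint and key; substitute the values for your deployment.
    client = ImagineClient(
        endpoint="https://imagine.example.com",
        api_key="YOUR_API_KEY",
        timeout=60,
    )

    async_client = ImagineAsyncClient(
        endpoint="https://imagine.example.com",
        api_key="YOUR_API_KEY",
    )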
Synchronous client#
- class imagine.ImagineClient(endpoint=None, api_key=None, max_retries=3, timeout=60, verify=False, proxy=None, debug=False, ctx=None)[source]#
Synchronous Imagine client. Provides methods for communicating with the Imagine API.
- chat(messages, model=None, frequency_penalty=None, presence_penalty=None, repetition_penalty=None, stop=None, max_seconds=None, ignore_eos=None, skip_special_tokens=None, stop_token_ids=None, max_tokens=None, temperature=None, top_k=None, top_p=None, tools=None)[source]#
Invokes the chat endpoint (non-streaming) and returns a ChatCompletionResponse for the given conversation; a short example follows this entry.
- Parameters:
messages (Sequence[ChatMessage | dict[str, str]]) – A list of chat-messages comprising the conversation so far
model (str | None) – the model to use for chat
frequency_penalty (float | None) – Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim, defaults to None
presence_penalty (float | None) – Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics, defaults to None
repetition_penalty (float | None) – Float that penalizes new tokens based on whether they appear in the prompt and the generated text so far. Values > 1 encourage the model to use new tokens, while values < 1 encourage the model to repeat tokens., defaults to None
stop (list[str] | None) – Sequences where the API will stop generating further tokens. The returned text will contain the stop sequence., defaults to None
max_seconds (int | None) – TBD, defaults to None
ignore_eos (bool | None) – Whether to ignore the EOS token and continue generating tokens after the EOS token is generated., defaults to None
skip_special_tokens (bool | None) – Whether to skip special tokens in the output., defaults to None
stop_token_ids (list[list[int]] | None) – List of tokens that stop the generation when they are generated. The returned output will contain the stop tokens unless the stop tokens are special tokens., defaults to None
max_tokens (int | None) – The maximum number of tokens that can be generated in translation, defaults to None
temperature (float | None) – What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic., defaults to None
top_k (int | None) – Integer that controls the number of top tokens to consider. Set to -1 to consider all tokens., defaults to None
top_p (float | None) – An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both., defaults to None
tools (list[dict[str, str | dict[str, Any]]] | None) – A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for.
- Raises:
ImagineException
- Returns:
ChatCompletionResponse object
- Return type:
ChatCompletionResponse
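A hedged sketch of a non-streaming chat call, reusing the client constructed above; the model name is a placeholder, and the messages are passed as plain dicts, which the messages parameter accepts alongside ChatMessage objects:

    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain what an SDK is in one sentence."},
    ]
    # "my-chat-model" is a placeholder; list real names with get_available_models().
    response = client.chat(messages=messages, model="my-chat-model", max_tokens=128)
    print(response)  # a ChatCompletionResponse with the generated reply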
- chat_stream(messages, model=None, frequency_penalty=None, presence_penalty=None, repetition_penalty=None, stop=None, max_seconds=None, ignore_eos=None, skip_special_tokens=None, stop_token_ids=None, max_tokens=None, temperature=None, top_k=None, top_p=None)[source]#
Invokes the chat endpoint (streaming) and returns an iterable of ChatCompletionStreamResponse chunks for the given conversation; a short example follows this entry.
- Parameters:
messages (Sequence[ChatMessage | dict[str, str]]) – A list of chat-messages comprising the conversation so far
model (str | None) – the model to use for chat
frequency_penalty (float | None) – Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim, defaults to None
presence_penalty (float | None) – Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics, defaults to None
repetition_penalty (float | None) – Float that penalizes new tokens based on whether they appear in the prompt and the generated text so far. Values > 1 encourage the model to use new tokens, while values < 1 encourage the model to repeat tokens., defaults to None
stop (list[str] | None) – Sequences where the API will stop generating further tokens. The returned text will contain the stop sequence., defaults to None
max_seconds (int | None) – TBD, defaults to None
ignore_eos (bool | None) – Whether to ignore the EOS token and continue generating tokens after the EOS token is generated., defaults to None
skip_special_tokens (bool | None) – Whether to skip special tokens in the output., defaults to None
stop_token_ids (list[list[int]] | None) – List of tokens that stop the generation when they are generated. The returned output will contain the stop tokens unless the stop tokens are special tokens., defaults to None
max_tokens (int | None) – The maximum number of tokens that can be generated in translation, defaults to None
temperature (float | None) – What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic., defaults to None
top_k (int | None) – Integer that controls the number of top tokens to consider. Set to -1 to consider all tokens., defaults to None
top_p (float | None) – An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both., defaults to None
- Raises:
ImagineException
- Returns:
ChatCompletionStreamResponse chunks
- Return type:
Iterable[ChatCompletionStreamResponse]
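The streaming variant is consumed with a plain for loop; a sketch under the same placeholder assumptions:

    for chunk in client.chat_stream(
        messages=[{"role": "user", "content": "Write a haiku about autumn."}],
        model="my-chat-model",  # placeholder model name
        temperature=0.7,
    ):
        print(chunk)  # each chunk is a ChatCompletionStreamResponse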
- completion(prompt, model=None, frequency_penalty=None, presence_penalty=None, repetition_penalty=None, stop=None, max_seconds=None, ignore_eos=None, skip_special_tokens=None, stop_token_ids=None, max_tokens=None, temperature=None, top_k=None, top_p=None)[source]#
Invokes the completions endpoint (non-streaming) and returns a CompletionResponse for the given prompt; a short example follows this entry.
- Parameters:
prompt (str) – prompt text for which completion needs to be generated
model (str | None) – the model to use for completion
frequency_penalty (float | None) – Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim, defaults to None
presence_penalty (float | None) – Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics, defaults to None
repetition_penalty (float | None) – Float that penalizes new tokens based on whether they appear in the prompt and the generated text so far. Values > 1 encourage the model to use new tokens, while values < 1 encourage the model to repeat tokens., defaults to None
stop (list[str] | None) – Sequences where the API will stop generating further tokens. The returned text will contain the stop sequence., defaults to None
max_seconds (int | None) – TBD, defaults to None
ignore_eos (bool | None) – Whether to ignore the EOS token and continue generating tokens after the EOS token is generated., defaults to None
skip_special_tokens (bool | None) – Whether to skip special tokens in the output., defaults to None
stop_token_ids (list[list[int]] | None) – List of tokens that stop the generation when they are generated. The returned output will contain the stop tokens unless the stop tokens are special tokens., defaults to None
max_tokens (int | None) – The maximum number of tokens that can be generated in translation, defaults to None
temperature (float | None) – What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic., defaults to None
top_k (int | None) – Integer that controls the number of top tokens to consider. Set to -1 to consider all tokens., defaults to None
top_p (float | None) – An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both., defaults to None
- Raises:
ImagineException
- Returns:
CompletionResponse object
- Return type:
CompletionResponse
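A sketch of a plain text completion, again with a placeholder model name:

    completion = client.completion(
        prompt="The three primary colors are",
        model="my-completion-model",  # placeholder
        max_tokens=32,
        temperature=0.2,
    )
    print(completion)  # a CompletionResponse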
- completion_stream(prompt, model=None, frequency_penalty=None, presence_penalty=None, repetition_penalty=None, stop=None, max_seconds=None, ignore_eos=None, skip_special_tokens=None, stop_token_ids=None, max_tokens=None, temperature=None, top_k=None, top_p=None)[source]#
Invokes the completions endpoint (streaming) and returns an iterable of CompletionStreamResponse chunks for the given prompt.
- Parameters:
prompt (str) – prompt text for which completion needs to be generated
model (str | None) – the model to use for completion
frequency_penalty (float | None) – Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim, defaults to None
presence_penalty (float | None) – Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics, defaults to None
repetition_penalty (float | None) – Float that penalizes new tokens based on whether they appear in the prompt and the generated text so far. Values > 1 encourage the model to use new tokens, while values < 1 encourage the model to repeat tokens., defaults to None
stop (list[str] | None) – Sequences where the API will stop generating further tokens. The returned text will contain the stop sequence., defaults to None
max_seconds (int | None) – TBD, defaults to None
ignore_eos (bool | None) – Whether to ignore the EOS token and continue generating tokens after the EOS token is generated., defaults to None
skip_special_tokens (bool | None) – Whether to skip special tokens in the output., defaults to None
stop_token_ids (list[list[int]] | None) – List of tokens that stop the generation when they are generated. The returned output will contain the stop tokens unless the stop tokens are special tokens., defaults to None
max_tokens (int | None) – The maximum number of tokens that can be generated in translation, defaults to None
temperature (float | None) – What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic., defaults to None
top_k (int | None) – Integer that controls the number of top tokens to consider. Set to -1 to consider all tokens., defaults to None
top_p (float | None) – An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both., defaults to None
- Raises:
ImagineException
- Returns:
CompletionStreamResponse chunks
- Return type:
Iterable[CompletionStreamResponse]
- embeddings(texts, model=None)[source]#
An embeddings endpoint that returns embeddings for a single text; a short example follows this entry.
- Parameters:
texts – The text to embed
model (str | None) – The embedding model to use
- Raises:
ImagineException
- Returns:
A response object containing the embeddings.
- Return type:
EmbeddingResponse
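A sketch of an embeddings request with a placeholder model name; a single string is passed, matching the single-text behaviour described above:

    emb = client.embeddings(
        texts="Vector databases store embeddings.",
        model="my-embedding-model",  # placeholder
    )
    print(emb)  # an EmbeddingResponse containing the embedding vectors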
- get_available_models(model_type=None)[source]#
Returns a list of available models.
- Parameters:
model_type (ModelType | None) – Filter models by model type.
- Raises:
ImagineException
- Returns:
Available models.
- Return type:
- get_available_models_by_type(model_type=None)[source]#
Returns a mapping of available models grouped by model type; a short example follows this entry.
- Parameters:
model_type (ModelType | None) – Filter models by model type.
- Raises:
ImagineException
- Returns:
Available models grouped by model type.
- Return type:
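Listing what the server exposes is a convenient way to find valid model names for the calls above; a short sketch:

    # Every model the server exposes.
    for m in client.get_available_models():
        print(m)

    # The same information grouped by model type.
    print(client.get_available_models_by_type())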
- get_chat_history(max_items=1)[source]#
Returns a list of Chat (response, request) pairs made by the user.
- Parameters:
max_items (int) – The number of items to retrieve
- Raises:
ImagineException
- Returns:
Returns a list of Chat response, request pairs made by the user.
- Return type:
- get_completion_history(max_items=1)[source]#
Returns a list of Completion (response, request) pairs made by the user.
- Parameters:
max_items (int) – The number of items to retrieve
- Raises:
ImagineException
- Returns:
Returns a list of Completion response, request pairs made by the user.
- Return type:
- get_embedding_history(max_items=1)[source]#
Returns a list of Embedding (response, request) pairs made by the user.
- Parameters:
max_items (int) – The number of items to retrieve
- Raises:
ImagineException
- Returns:
Returns a list of Embedding response, request pairs made by the user.
- Return type:
- get_reranker_history(max_items=1)[source]#
Returns a list of ReRanker response, request pairs made by the user.
- Parameters:
max_items (int) – The number of items to retrieve
- Raises:
ImagineException
- Returns:
Returns a list of ReRanker response, request pairs made by the user.
- Return type:
- health_check()[source]#
Check the health of the server, including databases and models.
- Raises:
ImagineException
- Returns:
A HealthResponse object containing the status of the server.
- Return type:
HealthResponse
- images_generate(prompt, model=None, negative_prompt='blurry', seed=27, seed_increment=100, n=1, num_inference_steps=20, size='512x512', guidance_scale=6.5, cache_interval=None, response_format='b64_json')[source]#
Invokes the image generation endpoint (non-streaming) and returns an ImageResponse object; a short example follows this entry.
- Parameters:
prompt (str) – The prompt to guide the image generation
model (str | None) – The model to be used for generation, defaults to None
negative_prompt (str | None) – Characteristics to avoid in the image being generated , defaults to “blurry”
seed (int | None) – The initial value used to generate random numbers. Set a unique seed for reproducible image results., defaults to 27
seed_increment (int | None) – The amount by which the seed value increases with each iteration. Adjust this to create a series of visually consistent, yet unique images, defaults to 100
n (int | None) – Number of images to be generated, defaults to 1
num_inference_steps (int | None) – The total number of inference steps taken during image generation. More steps usually lead to a higher quality image at the expense of slower inference., defaults to 20
size (str | None) – The width x height in pixels of the generated image, defaults to 512x512
guidance_scale (float | None) – A higher guidance scale encourages the model to generate images that are closely linked to the text prompt, usually at the expense of lower image quality., defaults to 6.5
cache_interval (int | None) – _description_, defaults to None
response_format (str | None) – “url” or “b64_json”, defaults to “b64_json”
- Raises:
- Returns:
ImageResponse object
- Return type:
ImageResponse
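A sketch of image generation with a placeholder model name; how you decode the result depends on the response_format requested:

    image_resp = client.images_generate(
        prompt="A watercolor sketch of a lighthouse at dawn",
        model="my-image-model",      # placeholder
        size="512x512",
        num_inference_steps=20,
        response_format="b64_json",  # or "url"
    )
    # With b64_json the payload is base64-encoded image data; with url it is a link.
    print(image_resp)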
- images_generate_stream(prompt, model=None, negative_prompt='blurry', seed=27, seed_increment=100, n=1, num_inference_steps=20, size='512x512', guidance_scale=6.5, cache_interval=None, response_format='b64_json')[source]#
Invokes images generate endpoint streaming version and returns an Iterable ImageResponse object
- Parameters:
prompt (str) – The prompt to guide the image generation
model (str | None) – The model to be used for generation, defaults to None
negative_prompt (str | None) – Characteristics to avoid in the image being generated , defaults to “blurry”
seed (int | None) – The initial value used to generate random numbers. Set a unique seed for reproducible image results., defaults to 27
seed_increment (int | None) – The amount by which the seed value increases with each iteration. Adjust this to create a series of visually consistent, yet unique images, defaults to 100
n (int | None) – Number of images to be generated, defaults to 1
num_inference_steps (int | None) – The total number of inference steps taken during image generation. More steps usually lead to a higher quality image at the expense of slower inference., defaults to 20
size (str | None) – The width x height in pixels of the generated image, defaults to 512x512
guidance_scale (float | None) – A higher guidance scale encourages the model to generate images that are closely linked to the text prompt, usually at the expense of lower image quality., defaults to 6.5
cache_interval (int | None) – _description_, defaults to None
response_format (str | None) – “url” or “b64_json”, defaults to “b64_json”
- Raises:
- Returns:
ImageResponse object
- Return type:
Iterable[ImageResponse]
- ping()[source]#
Ping the API to check if the Imagine server is reachable.
- Raises:
ImagineException
- Returns:
A PingResponse object containing the status of the server.
- Return type:
PingResponse
- reranker(query, documents, model=None, top_n=None, return_documents=None)[source]#
Sends a query, a list of documents, and other arguments such as the model name to the reranker endpoint and returns a response containing the reranking results; a short example follows this entry.
- Parameters:
query (str) – The query as a string
documents (list[str]) – The documents to be reranked as a list of strings.
model (str | None) – The reranker model to use.
top_n (int | None) – The number of most relevant documents to return. If not specified, the reranking results of all documents will be returned.
return_documents (bool | None) – Whether to return the documents in the response. Defaults to false
- Raises:
ImagineException
- Returns:
A response object containing the similarity scores.
- Return type:
ReRankerResponse
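A sketch of reranking a few documents against a query; the reranker model name is a placeholder:

    result = client.reranker(
        query="How do I reset my password?",
        documents=[
            "Passwords can be reset from the account settings page.",
            "Our office is open Monday to Friday.",
            "Contact support if you cannot log in.",
        ],
        model="my-reranker-model",  # placeholder
        top_n=2,
        return_documents=True,
    )
    print(result)  # a ReRankerResponse with a similarity score per document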
- transcribe(input_file, model=None)[source]#
Transcribe an audio file to text.
- Parameters:
- Returns:
Response with the transcribed audio.
- Return type:
- translate(prompt, model, frequency_penalty=None, presence_penalty=None, repetition_penalty=None, stop=None, max_seconds=None, ignore_eos=None, skip_special_tokens=None, stop_token_ids=None, max_tokens=None, temperature=None, top_k=None, top_p=None)[source]#
Invokes the translate endpoint and returns a TranslateResponse for the given prompt; a short example follows this entry.
- Parameters:
prompt (str) – prompt text that needs to be translated
model (str) – the model to use for translation
frequency_penalty (float | None) – Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim, defaults to None
presence_penalty (float | None) – Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics, defaults to None
repetition_penalty (float | None) – Float that penalizes new tokens based on whether they appear in the prompt and the generated text so far. Values > 1 encourage the model to use new tokens, while values < 1 encourage the model to repeat tokens., defaults to None
stop (list[str] | None) – Sequences where the API will stop generating further tokens. The returned text will contain the stop sequence., defaults to None
max_seconds (int | None) – TBD, defaults to None
ignore_eos (bool | None) – Whether to ignore the EOS token and continue generating tokens after the EOS token is generated., defaults to None
skip_special_tokens (bool | None) – Whether to skip special tokens in the output., defaults to None
stop_token_ids (list[list[int]] | None) – List of tokens that stop the generation when they are generated. The returned output will contain the stop tokens unless the stop tokens are special tokens., defaults to None
max_tokens (int | None) – The maximum number of tokens that can be generated in translation, defaults to None
temperature (float | None) – What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic., defaults to None
top_k (int | None) – Integer that controls the number of top tokens to consider. Set to -1 to consider all tokens., defaults to None
top_p (float | None) – An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both., defaults to None
- Raises:
ImagineException
- Returns:
TranslateResponse object
- Return type:
TranslateResponse
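A sketch of a translation call; note that model is required here, and the name used is a placeholder:

    translation = client.translate(
        prompt="Bonjour, comment allez-vous ?",
        model="my-translation-model",  # placeholder; translate() requires a model
        max_tokens=64,
    )
    print(translation)  # a TranslateResponse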
- usage(aggregation_duration=None, since=None, until=None, model=None)[source]#
Report usage statistics for the user.
- Parameters:
- Raises:
ImagineException
- Returns:
The usage report as a UsageResponse object
- Return type:
Asynchronous client#
- class imagine.ImagineAsyncClient(endpoint=None, api_key=None, max_retries=3, timeout=60, verify=False, max_concurrent_requests=64, proxy=None, debug=False)[source]#
Asynchronous Imagine client. Provides methods for communicating with the Imagine API using asyncio.
- async chat(messages, model=None, frequency_penalty=None, presence_penalty=None, repetition_penalty=None, stop=None, max_seconds=None, ignore_eos=None, skip_special_tokens=None, stop_token_ids=None, max_tokens=None, temperature=None, top_k=None, top_p=None, tools=None)[source]#
Invokes the chat endpoint (non-streaming) and returns a ChatCompletionResponse for the given conversation; a short example follows this entry.
- Parameters:
messages (list[Any]) – A list of chat-messages comprising the conversation so far
model (str | None) – the model to use for chat
frequency_penalty (float | None) – Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim, defaults to None
presence_penalty (float | None) – Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics, defaults to None
repetition_penalty (float | None) – Float that penalizes new tokens based on whether they appear in the prompt and the generated text so far. Values > 1 encourage the model to use new tokens, while values < 1 encourage the model to repeat tokens., defaults to None
stop (list[str] | None) – Sequences where the API will stop generating further tokens. The returned text will contain the stop sequence., defaults to None
max_seconds (int | None) – TBD, defaults to None
ignore_eos (bool | None) – Whether to ignore the EOS token and continue generating tokens after the EOS token is generated., defaults to None
skip_special_tokens (bool | None) – Whether to skip special tokens in the output., defaults to None
stop_token_ids (list[list[int]] | None) – List of tokens that stop the generation when they are generated. The returned output will contain the stop tokens unless the stop tokens are special tokens., defaults to None
max_tokens (int | None) – The maximum number of tokens that can be generated in translation, defaults to None
temperature (float | None) – What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic., defaults to None
top_k (int | None) – Integer that controls the number of top tokens to consider. Set to -1 to consider all tokens., defaults to None
top_p (float | None) – An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both., defaults to None
tools (list[dict[str, str | dict[str, Any]]] | None) – A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for.
- Raises:
ImagineException
- Returns:
ChatCompletionResponse object
- Return type:
ChatCompletionResponse
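The asynchronous client mirrors the synchronous API but its methods are awaited; a sketch reusing the async_client constructed at the top of this page, with a placeholder model name:

    import asyncio

    async def main() -> None:
        response = await async_client.chat(
            messages=[{"role": "user", "content": "Name three prime numbers."}],
            model="my-chat-model",  # placeholder
        )
        print(response)  # a ChatCompletionResponse

    asyncio.run(main())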
- async chat_stream(messages, model=None, frequency_penalty=None, presence_penalty=None, repetition_penalty=None, stop=None, max_seconds=None, ignore_eos=None, skip_special_tokens=None, stop_token_ids=None, max_tokens=None, temperature=None, top_k=None, top_p=None)[source]#
Invokes the chat endpoint (streaming) and returns ChatCompletionStreamResponse chunks for the given conversation; a short example follows this entry.
- Parameters:
messages (list[Any]) – A list of chat-messages comprising the conversation so far
model (str | None) – the model to use for chat
frequency_penalty (float | None) – Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim, defaults to None
presence_penalty (float | None) – Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics, defaults to None
repetition_penalty (float | None) – Float that penalizes new tokens based on whether they appear in the prompt and the generated text so far. Values > 1 encourage the model to use new tokens, while values < 1 encourage the model to repeat tokens., defaults to None
stop (list[str] | None) – Sequences where the API will stop generating further tokens. The returned text will contain the stop sequence., defaults to None
max_seconds (int | None) – TBD, defaults to None
ignore_eos (bool | None) – Whether to ignore the EOS token and continue generating tokens after the EOS token is generated., defaults to None
skip_special_tokens (bool | None) – Whether to skip special tokens in the output., defaults to None
stop_token_ids (list[list[int]] | None) – List of tokens that stop the generation when they are generated. The returned output will contain the stop tokens unless the stop tokens are special tokens., defaults to None
max_tokens (int | None) – The maximum number of tokens that can be generated in translation, defaults to None
temperature (float | None) – What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic., defaults to None
top_k (int | None) – Integer that controls the number of top tokens to consider. Set to -1 to consider all tokens., defaults to None
top_p (float | None) – An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both., defaults to None
- Raises:
ImagineException
- Returns:
ChatCompletionStreamResponse object
- Return type:
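A sketch of consuming the asynchronous stream, assuming chat_stream behaves as an async generator (its synchronous counterpart is iterated with a plain for loop); the model name is again a placeholder:

    import asyncio

    async def stream_demo() -> None:
        async for chunk in async_client.chat_stream(
            messages=[{"role": "user", "content": "Tell me a very short story."}],
            model="my-chat-model",  # placeholder
        ):
            print(chunk)  # each chunk is a ChatCompletionStreamResponse

    asyncio.run(stream_demo())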
- async completion(prompt, model=None, frequency_penalty=None, presence_penalty=None, repetition_penalty=None, stop=None, max_seconds=None, ignore_eos=None, skip_special_tokens=None, stop_token_ids=None, max_tokens=None, temperature=None, top_k=None, top_p=None)[source]#
Invokes completions endpoint non-streaming version that returns CompletionResponse for a given prompt
- Parameters:
prompt (str) – prompt text for which completion needs to be generated
model (str | None) – the model to use for completion
frequency_penalty (float | None) – Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim, defaults to None
presence_penalty (float | None) – Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics, defaults to None
repetition_penalty (float | None) – Float that penalizes new tokens based on whether they appear in the prompt and the generated text so far. Values > 1 encourage the model to use new tokens, while values < 1 encourage the model to repeat tokens., defaults to None
stop (list[str] | None) – Sequences where the API will stop generating further tokens. The returned text will contain the stop sequence., defaults to None
max_seconds (int | None) – TBD, defaults to None
ignore_eos (bool | None) – Whether to ignore the EOS token and continue generating tokens after the EOS token is generated., defaults to None
skip_special_tokens (bool | None) – Whether to skip special tokens in the output., defaults to None
stop_token_ids (list[list[int]] | None) – List of tokens that stop the generation when they are generated. The returned output will contain the stop tokens unless the stop tokens are special tokens., defaults to None
max_tokens (int | None) – The maximum number of tokens that can be generated in translation, defaults to None
temperature (float | None) – What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic., defaults to None
top_k (int | None) – Integer that controls the number of top tokens to consider. Set to -1 to consider all tokens., defaults to None
top_p (float | None) – An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both., defaults to None
- Raises:
ImagineException
- Returns:
CompletionResponse object
- Return type:
CompletionResponse
- async completion_stream(prompt, model=None, frequency_penalty=None, presence_penalty=None, repetition_penalty=None, stop=None, max_seconds=None, ignore_eos=None, skip_special_tokens=None, stop_token_ids=None, max_tokens=None, temperature=None, top_k=None, top_p=None)[source]#
Invokes the completions endpoint (streaming) and returns CompletionStreamResponse chunks for the given prompt.
- Parameters:
prompt (str) – prompt text for which completion needs to be generated
model (str | None) – the model to use for completion
frequency_penalty (float | None) – Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim, defaults to None
presence_penalty (float | None) – Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics, defaults to None
repetition_penalty (float | None) – Float that penalizes new tokens based on whether they appear in the prompt and the generated text so far. Values > 1 encourage the model to use new tokens, while values < 1 encourage the model to repeat tokens., defaults to None
stop (list[str] | None) – Sequences where the API will stop generating further tokens. The returned text will contain the stop sequence., defaults to None
max_seconds (int | None) – TBD, defaults to None
ignore_eos (bool | None) – Whether to ignore the EOS token and continue generating tokens after the EOS token is generated., defaults to None
skip_special_tokens (bool | None) – Whether to skip special tokens in the output., defaults to None
stop_token_ids (list[list[int]] | None) – List of tokens that stop the generation when they are generated. The returned output will contain the stop tokens unless the stop tokens are special tokens., defaults to None
max_tokens (int | None) – The maximum number of tokens that can be generated in translation, defaults to None
temperature (float | None) – What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic., defaults to None
top_k (int | None) – Integer that controls the number of top tokens to consider. Set to -1 to consider all tokens., defaults to None
top_p (float | None) – An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both., defaults to None
- Raises:
ImagineException
- Returns:
CompletionStreamResponse object
- Return type:
- async embeddings(text, model=None)[source]#
An embeddings endpoint that returns embeddings for a single text
- Parameters:
text – The text to embed
model (str | None) – The embedding model to use
- Raises:
ImagineException
- Returns:
A response object containing the embeddings.
- Return type:
EmbeddingResponse
- async get_available_models(model_type=None)[source]#
Returns a list of available models.
- Parameters:
model_type (ModelType | None) – Filter models by model type.
- Raises:
ImagineException
- Returns:
Available models.
- Return type:
- async get_available_models_by_type(model_type=None)[source]#
Returns a mapping of available models by model type.
- Parameters:
model_type (ModelType | None) – Filter models by model type.
- Raises:
ImagineException
- Returns:
Available models grouped by model type.
- Return type:
- async get_chat_history(max_items=1)[source]#
Returns a list of Chat (response, request) pairs made by the user.
- Parameters:
max_items (int) – The number of items to retrieve
- Raises:
ImagineException
- Returns:
Returns a list of Chat response, request pairs made by the user.
- Return type:
- async get_completion_history(max_items=1)[source]#
Returns a list of Completion (response, request) pairs made by the user.
- Parameters:
max_items (int) – The number of items to retrieve
- Raises:
ImagineException
- Returns:
Returns a list of Completion response, request pairs made by the user.
- Return type:
- async get_embedding_history(max_items=1)[source]#
Returns a list of Embedding (response, request) pairs made by the user.
- Parameters:
max_items (int) – The number of items to retrieve
- Raises:
ImagineException
- Returns:
Returns a list of Embedding response, request pairs made by the user.
- Return type:
- async get_reranker_history(max_items=1)[source]#
Returns a list of ReRanker response, request pairs made by the user.
- Parameters:
max_items (int) – The number of items to retrieve
- Raises:
ImagineException
- Returns:
Returns a list of ReRanker response, request pairs made by the user.
- Return type:
- async health_check()[source]#
Check the health of the server, including databases and models.
- Raises:
ImagineException
- Returns:
A HealthResponse object containing the status of the server.
- Return type:
HealthResponse
- async images_generate(prompt, model=None, negative_prompt='blurry', seed=27, seed_increment=100, n=1, num_inference_steps=20, size='512x512', guidance_scale=6.5, cache_interval=None, response_format='b64_json')[source]#
Invokes images generate endpoint non-streaming version and returns an ImageResponse object
- Parameters:
prompt (str) – The prompt to guide the image generation
model (str | None) – The model to be used for generation, defaults to None
negative_prompt (str | None) – Characteristics to avoid in the image being generated , defaults to “blurry”
seed (int | None) – The initial value used to generate random numbers. Set a unique seed for reproducible image results., defaults to 27
seed_increment (int | None) – The amount by which the seed value increases with each iteration. Adjust this to create a series of visually consistent, yet unique images, defaults to 100
n (int | None) – Number of images to be generated, defaults to 1
num_inference_steps (int | None) – The total number of inference steps taken during image generation. More steps usually lead to a higher quality image at the expense of slower inference., defaults to 20
size (str | None) – The width x height in pixels of the generated image, defaults to 512x512
guidance_scale (float | None) – A higher guidance scale encourages the model to generate images that are closely linked to the text prompt, usually at the expense of lower image quality., defaults to 6.5
cache_interval (int | None) – _description_, defaults to None
response_format (str | None) – “url” or “b64_json”, defaults to “b64_json”
- Raises:
- Returns:
ImageResponse object
- Return type:
ImageResponse
- async images_generate_stream(prompt, model=None, negative_prompt='blurry', seed=27, seed_increment=100, n=1, num_inference_steps=20, size='512x512', guidance_scale=6.5, cache_interval=None, response_format='b64_json')[source]#
Invokes images generate endpoint streaming version and returns an Iterable ImageResponse object
- Parameters:
prompt (str) – The prompt to guide the image generation
model (str | None) – The model to be used for generation, defaults to None
negative_prompt (str | None) – Characteristics to avoid in the image being generated , defaults to “blurry”
seed (int | None) – The initial value used to generate random numbers. Set a unique seed for reproducible image results., defaults to 27
seed_increment (int | None) – The amount by which the seed value increases with each iteration. Adjust this to create a series of visually consistent, yet unique images, defaults to 100
n (int | None) – Number of images to be generated, defaults to 1
num_inference_steps (int | None) – The total number of inference steps taken during image generation. More steps usually lead to a higher quality image at the expense of slower inference., defaults to 20
size (str | None) – The width x height in pixels of the generated image, defaults to 512x512
guidance_scale (float | None) – A higher guidance scale encourages the model to generate images that are closely linked to the text prompt, usually at the expense of lower image quality., defaults to 6.5
cache_interval (int | None) – _description_, defaults to None
response_format (str | None) – “url” or “b64_json”, defaults to “b64_json”
- Raises:
- Returns:
ImageResponse object
- Return type:
AsyncGenerator[ImageResponse, None]
- async ping()[source]#
Ping the API to check if the Imagine server is reachable.
- Raises:
ImagineException
- Returns:
A PingResponse object containing the status of the server.
- Return type:
PingResponse
- async reranker(query, documents, model=None, top_n=None, return_documents=None)[source]#
A reranker endpoint that returns similarity scores for a query and a list of documents.
- Parameters:
query (str) – The query as a string
documents (list[str]) – The documents to be reranked as a list of strings.
model (str | None) – The reranker model to use.
top_n (int | None) – The number of most relevant documents to return. If not specified, the reranking results of all documents will be returned.
return_documents (bool | None) – Whether to return the documents in the response. Defaults to false
- Raises:
ImagineException
- Returns:
A response object containing the similarity scores.
- Return type:
ReRankerResponse
- async transcribe(input_file, model=None)[source]#
Transcribe an audio file to text.
- Parameters:
- Returns:
Response with the transcribed audio.
- Return type:
- async translate(prompt, model, frequency_penalty=None, presence_penalty=None, repetition_penalty=None, stop=None, max_seconds=None, ignore_eos=None, skip_special_tokens=None, stop_token_ids=None, max_tokens=None, temperature=None, top_k=None, top_p=None)[source]#
Invokes translate endpoint that returns TranslateResponse for a given prompt
- Parameters:
prompt (str) – prompt text that needs to be translated
model (str) – the model to use for translation
frequency_penalty (float | None) – Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim, defaults to None
presence_penalty (float | None) – Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics, defaults to None
repetition_penalty (float | None) – Float that penalizes new tokens based on whether they appear in the prompt and the generated text so far. Values > 1 encourage the model to use new tokens, while values < 1 encourage the model to repeat tokens., defaults to None
stop (list[str] | None) – Sequences where the API will stop generating further tokens. The returned text will contain the stop sequence., defaults to None
max_seconds (int | None) – TBD, defaults to None
ignore_eos (bool | None) – Whether to ignore the EOS token and continue generating tokens after the EOS token is generated., defaults to None
skip_special_tokens (bool | None) – Whether to skip special tokens in the output., defaults to None
stop_token_ids (list[list[int]] | None) – List of tokens that stop the generation when they are generated. The returned output will contain the stop tokens unless the stop tokens are special tokens., defaults to None
max_tokens (int | None) – The maximum number of tokens that can be generated in translation, defaults to None
temperature (float | None) – What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic., defaults to None
top_k (int | None) – Integer that controls the number of top tokens to consider. Set to -1 to consider all tokens., defaults to None
top_p (float | None) – An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both., defaults to None
- Raises:
ImagineException
- Returns:
TranslateResponse object
- Return type:
TranslateResponse
- async usage(aggregation_duration=None, since=None, until=None, model=None)[source]#
Report usage statistics for the user.
- Parameters:
- Raises:
ImagineException
- Returns:
The usage report as a UsageResponse object
- Return type: