Data Transfer Objects and exceptions#

This page describes several classes that act as Data Transfer Objects (DTOs): they are used as input arguments to SDK methods and functions, or returned by them.

Input Arguments#

Instances of these classes are expected as input arguments by some methods or functions of the SDK. They contain the configuration and other details about the inference to perform.

imagine.ModelType[source]#

Supported values: ModelType.EMBEDDING, ModelType.LLM, ModelType.RERANKER, ModelType.TEXT_TO_IMAGE, ModelType.TRANSCRIBE, ModelType.TRANSLATE.
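
For illustration, a tiny sketch that assumes ModelType behaves like a standard Python Enum:

    from imagine import ModelType

    # Pick a model category; Enum members compare by identity.
    selected = ModelType.LLM
    if selected is ModelType.LLM:
        print("working with a text-generation model")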

class imagine.ReRankerRequest(*, query, documents, top_n=None, model, return_documents=None)[source]#
documents: list[str]#

A list of document IDs or text

model: str#

The name of the model to use for re-ranking

query: str#

The query string to be used for re-ranking

return_documents: bool | None#

Whether to return the documents themselves (default: False)

top_n: int | None#

The number of top results to return (default: 1)
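
As a sketch, a request is built directly from its keyword-only fields; the model name below is a placeholder, not a documented model:

    from imagine import ReRankerRequest

    rerank_request = ReRankerRequest(
        query="Which document mentions quarterly revenue?",
        documents=[
            "Q3 revenue grew 12% year over year.",
            "The office moved to a new building.",
        ],
        model="example-reranker",  # placeholder model name
        top_n=1,
        return_documents=True,
    )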

class imagine.ChatMessage(*, role, content, tool_calls=None, tool_call_id=None, name=None)[source]#
content: str#

The content of the message

name: str | None#

An optional name for the participant.

role: str#

The role of the message: user, assistant, system, or tool

tool_call_id: str | None#

The ID of the tool call.

tool_calls: list[ChatMessageToolCall] | None#

The tool calls generated by the model, such as function calls
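
A short sketch of building a conversation history out of ChatMessage objects, using the roles listed above:

    from imagine import ChatMessage

    messages = [
        ChatMessage(role="system", content="You are a concise assistant."),
        ChatMessage(role="user", content="Summarize what a DTO is in one sentence."),
    ]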

class imagine.ChatCompletionRequest(*, frequency_penalty=None, presence_penalty=None, repetition_penalty=None, stop=None, max_seconds=None, ignore_eos=None, skip_special_tokens=None, stop_token_ids=None, max_tokens=None, temperature=None, top_k=None, top_p=None, messages, model, stream, tools=None)[source]#

Bases: LLMSamplingParams

frequency_penalty: float | None#

Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim, defaults to None

ignore_eos: bool | None#

Whether to ignore the EOS token and continue generating tokens after the EOS token is generated, defaults to None

max_seconds: int | None#

TBD, defaults to None

max_tokens: int | None#

The maximum number of tokens that can be generated, defaults to None

messages: list[ChatMessage]#

A list of ChatMessage objects

model: str#

The model to be used for the chat completion.

presence_penalty: float | None#

Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics, defaults to None

repetition_penalty: float | None#

Float that penalizes new tokens based on whether they appear in the prompt and the generated text so far. Values > 1 encourage the model to use new tokens, while values < 1 encourage the model to repeat tokens, defaults to None

skip_special_tokens: bool | None#

Whether to skip special tokens in the output, defaults to None

stop: list[str] | None#

Sequences where the API will stop generating further tokens. The returned text will contain the stop sequence, defaults to None

stop_token_ids: list[list[int]] | None#

List of tokens that stop the generation when they are generated. The returned output will contain the stop tokens unless the stop tokens are special tokens, defaults to None

stream: bool#

If set, partial message deltas will be sent

temperature: float | None#

What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic, defaults to None

top_k: int | None#

Integer that controls the number of top tokens to consider. Set to -1 to consider all tokens, defaults to None

top_p: float | None#

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both, defaults to None
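
Putting the pieces together, a minimal sketch of a chat completion request; the model name is a placeholder and only a few of the optional sampling parameters are set:

    from imagine import ChatCompletionRequest, ChatMessage

    chat_request = ChatCompletionRequest(
        model="example-llm",  # placeholder model name
        messages=[ChatMessage(role="user", content="Write a haiku about rain.")],
        stream=False,      # ask for a single response rather than streamed deltas
        temperature=0.2,   # lower temperature for more deterministic output
        max_tokens=64,
    )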

class imagine.EmbeddingRequest(*, id, input, model)[source]#
id: str | None#

Unique object identifier.

input: str#

Input string for which embedding should be generated

model: str#

Model to be used for generation of embedding
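
A minimal sketch of an embedding request; the id and model values are placeholders:

    from imagine import EmbeddingRequest

    embedding_request = EmbeddingRequest(
        id="req-001",                     # placeholder identifier
        input="DTOs carry request and response data between the SDK and the API.",
        model="example-embedding-model",  # placeholder model name
    )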

class imagine.CompletionRequest(*, frequency_penalty=None, presence_penalty=None, repetition_penalty=None, stop=None, max_seconds=None, ignore_eos=None, skip_special_tokens=None, stop_token_ids=None, max_tokens=None, temperature=None, top_k=None, top_p=None, prompt, model, stream)[source]#

Bases: LLMSamplingParams

frequency_penalty: float | None#

Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim, defaults to None

ignore_eos: bool | None#

Whether to ignore the EOS token and continue generating tokens after the EOS token is generated, defaults to None

max_seconds: int | None#

TBD, defaults to None

max_tokens: int | None#

The maximum number of tokens that can be generated, defaults to None

model: str#

Model to be used for Completion Request

presence_penalty: float | None#

Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics, defaults to None

prompt: str#

Prompt for completion request

repetition_penalty: float | None#

Float that penalizes new tokens based on whether they appear in the prompt and the generated text so far. Values > 1 encourage the model to use new tokens, while values < 1 encourage the model to repeat tokens, defaults to None

skip_special_tokens: bool | None#

Whether to skip special tokens in the output, defaults to None

stop: list[str] | None#

Sequences where the API will stop generating further tokens. The returned text will contain the stop sequence, defaults to None

stop_token_ids: list[list[int]] | None#

List of tokens that stop the generation when they are generated. The returned output will contain the stop tokens unless the stop tokens are special tokens, defaults to None

stream: bool#

Whether the response should be streamed

temperature: float | None#

What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic, defaults to None

top_k: int | None#

Integer that controls the number of top tokens to consider. Set to -1 to consider all tokens, defaults to None

top_p: float | None#

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both, defaults to None
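
A sketch analogous to the chat example, but with a plain prompt instead of a message list; the model name is again a placeholder:

    from imagine import CompletionRequest

    completion_request = CompletionRequest(
        model="example-llm",   # placeholder model name
        prompt="Once upon a time",
        stream=False,
        max_tokens=32,
        stop=["\n\n"],         # stop at the first blank line
    )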

Responses#

Instances of these classes are returned by some methods or functions of the SDK. They contain the data the user is interested in.

class imagine.EmbeddingResponse(*, id, object, data, model, usage)[source]#
property first_embedding: list[float]#

Gets the first embedding from the response

Returns:

embedding content

id: str#

Unique object identifier.

model: str#

Model name used.

object: str#

The object type, which is always “list”.
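
Assuming a response object obtained from an SDK embedding call (the call itself is outside the scope of this page), a sketch of reading the first vector via the helper property:

    from imagine import EmbeddingResponse

    def embedding_dimensions(response: EmbeddingResponse) -> int:
        # first_embedding returns the first vector in the response as list[float].
        return len(response.first_embedding)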

class imagine.TranslateResponse(*, id, object, created, model, choices, usage=None, generation_time=None)[source]#
class imagine.ReRankerResponse(*, id, data, failure=None, model, object, usage)[source]#
data: list[ReRankerObject]#

A list of ReRankerObject objects

failure: str | None#

An error message if the request failed

id: str#

A unique identifier for the response

model: str#

The name of the model used to generate the response

object: str#

The type of object being returned

usage: UsageInfo#

Information about the usage of the model

class imagine.HealthResponse(*, postgres, redis, models)[source]#
models: str#

Status of Models

postgres: str#

Status of Postgres

redis: str#

Status of Redis

class imagine.PingResponse(*, message, status)[source]#
message: str#

Ping Message

status: str#

Status

class imagine.UsageResponse(*, usage, overall)[source]#
overall: list[UsageRecordAggregated]#

List of overall usage per record type

usage: list[UsageRecord]#

List of usage records

class imagine.CompletionResponse(*, id, object, created, model, choices, usage=None, generation_time=None)[source]#
choices: list[CompletionResponseChoice]#

A list of chat completion choices

created: float#

The Unix timestamp of when the completion was created.

property first_text: str | None#

Gets the first text from the response

Returns:

text

generation_time: float | None#

Generation time.

id: str#

A unique identifier for the completion.

model: str#

The model used for the completion.

object: str#

The object type, which is always completion.

usage: UsageInfo | None#

Usage Statistics
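
A sketch of reading a non-streaming completion response obtained from the SDK:

    from imagine import CompletionResponse

    def show_completion(response: CompletionResponse) -> None:
        # first_text is a shortcut to the text of the first choice.
        print(response.first_text)
        if response.usage is not None:
            print("total tokens:", response.usage.total_tokens)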

class imagine.CompletionStreamResponse(*, id, model, choices, created=None, object=None, usage=None)[source]#
choices: list[CompletionResponseStreamChoice]#

A list of completion choices

created: float | None#

The Unix timestamp of when the completion was created.

property first_content: str | None#

Gets the first content from the response

Returns:

message content

id: str#

A unique identifier for the completion

model: str#

The model used for the completion.

object: str | None#

The object type, which is always chat.completion.chunk

usage: UsageInfo | None#

Usage statistics for the completion request.
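
For streamed completions the SDK yields chunks of this type; a sketch that assumes an iterable of chunks is already available (how it is obtained is not covered here):

    from typing import Iterable
    from imagine import CompletionStreamResponse

    def print_stream(chunks: Iterable[CompletionStreamResponse]) -> None:
        # Concatenate partial deltas as they arrive; first_content may be None.
        for chunk in chunks:
            piece = chunk.first_content
            if piece:
                print(piece, end="", flush=True)
        print()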

class imagine.ChatCompletionResponse(*, id, object, created, model, choices, usage)[source]#
choices: list[ChatCompletionResponseChoice]#

A list of chat completion choices

created: float#

The Unix timestamp of when the chat completion was created.

property first_content: str | None#

Gets the first content from the response

Returns:

message content

id: str#

A unique identifier for the chat completion.

model: str#

The model used for the chat completion.

object: str#

The object type, which is always chat.completion.

usage: UsageInfo#

Usage statistics for the completion request.

class imagine.ChatCompletionResponseChoice(*, index, message, finish_reason)[source]#
finish_reason: FinishReason | None#

The reason the model stopped generating tokens. This will be stop if the model hit a natural stop point or a provided stop sequence, length if the maximum number of tokens specified in the request was reached, or error in case of an error

index: int#

The index of the choice in the list of choices.

message: ChatMessage#

A chat completion message generated by the model.
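
Taken together with ChatCompletionResponse above, a sketch of inspecting the first choice of a non-streaming chat response:

    from imagine import ChatCompletionResponse

    def show_first_choice(response: ChatCompletionResponse) -> None:
        choice = response.choices[0]
        # message is a ChatMessage; finish_reason is stop, length or error (or None).
        print(f"{choice.message.role}: {choice.message.content}")
        print("finish_reason:", choice.finish_reason)
        print("total tokens:", response.usage.total_tokens)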

class imagine.ChatCompletionStreamResponse(*, id, model, choices, created=None, object=None, usage=None)[source]#
choices: list[ChatCompletionResponseStreamChoice]#

A list of chat completion choices

created: float | None#

The Unix timestamp of when the chat completion was created.

property first_content: str | None#

Gets the first content from the response

Returns:

message content

id: str#

A unique identifier for the chat completion

model: str#

The model used for the chat completion.

object: str | None#

The object type, which is always chat.completion.chunk.

usage: UsageInfo | None#

Usage statistics for the completion request.

class imagine.TranscribeResponse(*, generation_time, id, text, ts)[source]#
generation_time: float#

Time taken to generate the response (in seconds)

id: str#

ID of the transcription request

text: str#

Audio transcription

ts: str | None#

The timestamp of when the response was generated

class imagine.ImageResponse(*, id, model, object, created, data)[source]#
created: float#

The Unix timestamp of when the completion was created.

data: list[Image]#

Data object which is a list of Image objects.

model: str#

Model used for Image Generation

object: str#

The object type, which is always text_to_image.

class imagine.ChatCompletionResponseStreamChoice(*, index, delta, finish_reason)[source]#
delta: DeltaMessage#

A chat completion delta generated by streamed model responses.

finish_reason: FinishReason | None#

The reason the model stopped generating tokens. This will be stop if the model hit a natural stop point or a provided stop sequence, length if the maximum number of tokens specified in the request was reached, or error in case of an error

index: int#

The index of the choice in the list of choices.

class imagine.DeltaMessage(*, role=None, content=None)[source]#
content: str | None#

The content of the message

role: str | None#

The role of the message: user, assistant, or system

class imagine.FinishReason(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]#
class imagine.UsageInfo(*, prompt_tokens=None, total_tokens=None, completion_tokens=None)[source]#
completion_tokens: int | None#

Number of tokens in the generated completion

prompt_tokens: int | None#

Number of tokens in the prompt

total_tokens: int | None#

Total number of tokens used in the request (prompt + completion).
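
Since all three counters are optional, a small sketch of computing a robust token total:

    from imagine import UsageInfo

    def billed_tokens(usage: UsageInfo) -> int:
        # Fall back to summing the parts when total_tokens is not populated.
        if usage.total_tokens is not None:
            return usage.total_tokens
        return (usage.prompt_tokens or 0) + (usage.completion_tokens or 0)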

Exceptions#

The following exceptions may be raised by the SDK:

class imagine.ImagineException(message=None)[source]#

Base exception class, raised when nothing more specific applies

class imagine.ImagineAPITooManyRequestsException(message=None, http_status=None, headers=None)[source]#

Raised when the API returns a 429 response, indicating that a rate limit was hit
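
A hedged sketch of handling these exceptions; the callable passed in stands for whichever SDK call you make and is not an API documented on this page:

    import time
    from imagine import ImagineAPITooManyRequestsException, ImagineException

    def call_with_retry(send_request):
        # send_request is a zero-argument placeholder wrapping an SDK call.
        try:
            return send_request()
        except ImagineAPITooManyRequestsException:
            time.sleep(5)  # back off once after a 429 rate-limit response
            return send_request()
        except ImagineException as exc:
            print("request failed:", exc)
            raise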