Basic usage#

The Imagine SDK exposes two clients, each with a different programming paradigm: synchronous and asynchronous.

ImagineClient is the synchronous Imagine client. If you don’t need asynchronous programming in your Python code, or you are simply not familiar with it, this is the client you want to use.

Otherwise, if you are leveraging asyncio in your codebase, ImagineAsyncClient might be a better choice.

The examples on this page focus mostly on the synchronous client, as the async client offers a very similar interface. Check the API documentation for more details about the differences between the two.

Before running any example from this documentation, two parameters have to be configured.

  1. You must set the environment variable IMAGINE_API_KEY to your personal Imagine API key. Alternatively, you can pass your API key directly to the client with ImagineClient(api_key="my-api-key").

  2. You must set the environment variable IMAGINE_ENDPOINT_URL pointing to the endpoint you are using. Alternatively, you can pass your endpoint directly to the client with ImagineClient(endpoint="https://my-endpoint/api/v2").
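
For example, both parameters can be passed at construction time. A minimal sketch, combining the two keyword arguments shown above (the values are placeholders):

from imagine import ImagineClient


# Placeholder values; replace them with your own API key and endpoint URL
client = ImagineClient(
    api_key="my-api-key",
    endpoint="https://my-endpoint/api/v2",
)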

Danger

You should never share your personal Imagine API keys with anyone!

Likewise, you should never commit your personal Imagine API keys to any git repository!

How to get an API key

If you don’t yet have an Imagine API key, get it here.

How to get the endpoint URL

If you don’t know your endpoint URL, you can get it here.

Available models#

When calling any of the inference methods, you can pass a model name as a string to specify which model to use (for example, see the documentation for imagine.ImagineClient.chat).

You can get a list of available models with:

from pprint import pprint

from imagine import ImagineClient, ModelType


client = ImagineClient()

all_models = client.get_available_models_by_type()
pprint(all_models)

llm_models = client.get_available_models(model_type=ModelType.LLM)
pprint(llm_models)

Alternatively, if you don’t pass a model name explicitly when invoking the method, the default model will be used. The current default models are:

| Model type    | Default model              |
|---------------|----------------------------|
| LLM           | Llama-3.1-8B               |
| Text to Image | sdxl-turbo                 |
| Translate     | Helsinki-NLP/opus-mt-en-es |
| Transcribe    | whisper-tiny               |
| Embedding     | BAAI/bge-large-en-v1.5     |
| Reranker      | BAAI/bge-reranker-base     |
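
For example, calling chat without a model argument uses the default LLM (Llama-3.1-8B) from the table above:

from imagine import ChatMessage, ImagineClient


client = ImagineClient()

# No model argument: the default LLM is used
chat_response = client.chat(
    messages=[ChatMessage(role="user", content="What is the best Spanish cheese?")],
)

print(chat_response.first_content)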

Chat#

This is the most basic example of using a Large Language Model (LLM) to generate text. It instantiates an ImagineClient and starts a new conversation by asking a question.

from imagine import ChatMessage, ImagineClient


client = ImagineClient()

chat_response = client.chat(
    messages=[ChatMessage(role="user", content="What is the best Spanish cheese?")],
    model="Llama-3.1-8B",
)

print(chat_response.first_content)

This will print something similar to:

Spain is renowned for its rich variety of cheeses, each with its unique flavor profile
and texture. The "best" Spanish cheese is subjective and often depends on personal
taste preferences. However, here are some of the most popular and highly-regarded
Spanish cheeses:

1. Manchego: A firm, crumbly cheese made from sheep's milk, Manchego is a classic
   Spanish cheese with a nutty, slightly sweet flavor.
2. Mahon: A semi-soft cheese from the island of Minorca, Mahon has a mild,
   creamy flavor and a smooth texture.
3. Idiazabal: A smoked cheese from the Basque region, Idiazabal has a strong, savory
   flavor and a firm texture.
4. Garrotxa: A soft, creamy cheese from Catalonia, Garrotxa has a mild, buttery flavor
   and a delicate aroma.
...
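
The messages argument accepts a list of turns, so you can continue a conversation by passing the previous exchange back in. A minimal sketch; the "assistant" role name follows the usual chat-message convention and is an assumption here:

from imagine import ChatMessage, ImagineClient


client = ImagineClient()

history = [ChatMessage(role="user", content="What is the best Spanish cheese?")]
first_response = client.chat(messages=history, model="Llama-3.1-8B")

# Feed the model's answer back in, then ask a follow-up question
# (the "assistant" role name is assumed; check the API documentation)
history.append(ChatMessage(role="assistant", content=first_response.first_content))
history.append(ChatMessage(role="user", content="Which of those pairs best with red wine?"))

follow_up_response = client.chat(messages=history, model="Llama-3.1-8B")
print(follow_up_response.first_content)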

Streaming response#

The example above returns the response all at once. In your application, however, you might want to receive the result in small chunks, so that you can start showing feedback to the user as soon as possible. This is particularly useful for long responses that take a while to complete.

from imagine import ChatMessage, ImagineClient


client = ImagineClient()

for chunk in client.chat_stream(
    messages=[
        ChatMessage(role="system", content="You are an expert programmer."),
        ChatMessage(
            role="user", content="Write a quick sort implementation in python."
        ),
    ],
    max_tokens=1024,
):
    if chunk.first_content is not None:
        print(chunk.first_content, end="", flush=True)

print("\n")

This will provide an output similar to the example above, but the text will be printed progressively instead of all at once.
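
If you also need the complete text after streaming finishes, one simple pattern is to accumulate the chunks as they arrive:

from imagine import ChatMessage, ImagineClient


client = ImagineClient()

chunks = []
for chunk in client.chat_stream(
    messages=[ChatMessage(role="user", content="What is the best Spanish cheese?")],
):
    if chunk.first_content is not None:
        print(chunk.first_content, end="", flush=True)
        chunks.append(chunk.first_content)

# The complete response text, assembled from the streamed chunks
full_text = "".join(chunks)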

Asynchronous client#

If you are interested in the async client, here is the equivalent non-streaming example:

import asyncio

from imagine import ChatMessage, ImagineAsyncClient


async def main():
    client = ImagineAsyncClient()

    chat_response = await client.chat(
        messages=[ChatMessage(role="user", content="What is the best Spanish cheese?")],
    )
    print(chat_response.first_content)


if __name__ == "__main__":
    asyncio.run(main())

And with streaming enabled:

import asyncio

from imagine import ChatMessage, ImagineAsyncClient


async def main():
    client = ImagineAsyncClient()

    async for chunk in client.chat_stream(
        messages=[ChatMessage(role="user", content="What is the best French cheese?")],
    ):
        if chunk.first_content is not None:
            print(chunk.first_content, end="", flush=True)


if __name__ == "__main__":
    asyncio.run(main())

Notice how in both cases the methods and the input arguments are the same, making it very easy to transition from synchronous code to asynchronous code.
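
Where the async client really pays off is issuing several requests concurrently, for example with asyncio.gather. A minimal sketch, assuming your endpoint accepts concurrent requests:

import asyncio

from imagine import ChatMessage, ImagineAsyncClient


async def ask(client: ImagineAsyncClient, question: str) -> str:
    response = await client.chat(
        messages=[ChatMessage(role="user", content=question)],
    )
    return response.first_content


async def main():
    client = ImagineAsyncClient()

    # Both requests are in flight at the same time
    answers = await asyncio.gather(
        ask(client, "What is the best Spanish cheese?"),
        ask(client, "What is the best French cheese?"),
    )
    for answer in answers:
        print(answer)


if __name__ == "__main__":
    asyncio.run(main())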

Code completion#

Code completion works very similarly to the Chat feature. The following example illustrates how to generate some Python code:

from imagine import ImagineClient


client = ImagineClient()

completion_response = client.completion(
    prompt="Write a Python function to get the fibonacci series"
)

print(completion_response.first_text)

This will print a response similar to:

Here is a Python function that generates the Fibonacci series up to a given number:

```Python
def fibonacci(n):
    fib_series = [0, 1]
    while fib_series[-1] + fib_series[-2] <= n:
        fib_series.append(fib_series[-1] + fib_series[-2])
    return fib_series

n = int(input("Enter a number: "))
print(fibonacci(n))
```

The equivalent streaming and async variants are available, following the same pattern as the Chat examples. Check the API documentation for details.
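
For instance, since the async client mirrors the synchronous methods and arguments, the async version of this example would look like the sketch below (it assumes completion is available on ImagineAsyncClient with the same signature; check the API documentation):

import asyncio

from imagine import ImagineAsyncClient


async def main():
    client = ImagineAsyncClient()

    # Same method name and arguments as in the synchronous example above
    completion_response = await client.completion(
        prompt="Write a Python function to get the fibonacci series"
    )
    print(completion_response.first_text)


if __name__ == "__main__":
    asyncio.run(main())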

Translation#

The Imagine SDK can also translate text between languages. Make sure to select the right model for the desired input and output languages. Here is a non-streaming example using the synchronous client:

from imagine import ImagineClient


client = ImagineClient()

english_to_spanish = "Helsinki-NLP/opus-mt-en-es"

translate_response = client.translate(
    prompt="San Diego is one of the most beautiful cities in America!",
    model=english_to_spanish,
)

print(translate_response.first_text)

This will print a response similar to:

San Diego es una de las ciudades más hermosas de América!
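
Translating in the opposite direction only requires a model trained for that language pair. A sketch, assuming the reverse model "Helsinki-NLP/opus-mt-es-en" is deployed on your endpoint (verify with the model listing shown earlier):

from imagine import ImagineClient


client = ImagineClient()

# Assumed model name; check that it appears in your endpoint's model list
spanish_to_english = "Helsinki-NLP/opus-mt-es-en"

translate_response = client.translate(
    prompt="¡San Diego es una de las ciudades más hermosas de América!",
    model=spanish_to_english,
)

print(translate_response.first_text)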

Images#

This is an example of how to generate images from a text prompt and save them to disk.

import base64

from imagine import ImagineClient


client = ImagineClient()

images_response = client.images_generate(
    prompt="A cat sleeping on planet Mars",
    n=2,
    negative_prompt="disfigured, ugly, bad, immature, cartoon, anime, 3d, painting, b&w",
    response_format="b64_json",
)

# Save each generated image to its own file
for i, image in enumerate(images_response.data):
    with open(f"MyImage_{i}.png", "wb") as f:
        f.write(base64.b64decode(image.b64_json))

Transcribe audio#

This is an example of how to convert an audio file to text.

from imagine import ImagineClient


client = ImagineClient()

response = client.transcribe("my_audio.mp3")

print(response.text)

This will print the transcription of the MP3 file.

Embeddings#

Imagine SDK supports creating embeddings from a given text input. Embeddings are numerical representations of text that capture semantic meaning, making them useful for various natural language processing (NLP) tasks.

Use Cases:

  • Similarity Search: Find texts that are semantically similar.

  • Clustering: Group similar texts together.

  • Classification: Improve the performance of text classification models.

  • Recommendation Systems: Enhance content recommendations based on text similarity.

See the following example to generate embeddings:

from imagine import ImagineClient


client = ImagineClient()

embedding_response = client.embeddings(["What a beautiful day", "this is amazing"])

print(len(embedding_response.data))
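
As a sketch of the similarity-search use case listed above, you can compare the two embeddings with cosine similarity. The .embedding attribute on each entry of data is an assumption (a common convention); check the API documentation for the actual field name:

import math

from imagine import ImagineClient


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))


client = ImagineClient()

embedding_response = client.embeddings(["What a beautiful day", "this is amazing"])

# .embedding is an assumed attribute name for the vector; verify in the API docs
vec_a = embedding_response.data[0].embedding
vec_b = embedding_response.data[1].embedding

print(f"Cosine similarity: {cosine_similarity(vec_a, vec_b):.3f}")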

Reranking#

If you need reranking as part of your retrieval-augmented generation (RAG) workflow, you can use it like this:

from pprint import pprint

from imagine import ImagineClient, ModelType


client = ImagineClient()

reranker_models = client.get_available_models_by_type(model_type=ModelType.RERANKER)
pprint(reranker_models)

reranker_response = client.reranker(
    query="what is a panda?",
    documents=[
        "The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear",
        "Paris is in France",
        "Kung fu panda is a movie",
        "Pandas are animals that live in cold climate",
    ],
    return_documents=True,
    top_n=3,
)

pprint(reranker_response.data)