Basic usage#
The Imagine SDK exposes two clients, each with a different programming paradigm: synchronous and asynchronous.
ImagineClient
is the synchronous Imagine client. If you don't need
asynchronous programming in your Python code, or are simply not familiar with
asynchronous programming, this is the client you want to use.
Otherwise, if you are leveraging asyncio
in your codebase,
ImagineAsyncClient
might be a better choice.
The examples on this page focus mostly on the synchronous client, since the async client offers a very similar interface. Check the API documentation for more details about their differences.
Before running any example from this documentation, two parameters have to be configured:
- You must set the environment variable IMAGINE_API_KEY to your personal Imagine API key. Alternatively, you can pass your API key directly to the client with ImagineClient(api_key="my-api-key").
- You must set the environment variable IMAGINE_ENDPOINT_URL pointing to the endpoint you are using. Alternatively, you can pass your endpoint directly to the client with ImagineClient(endpoint="https://my-endpoint/api/v2").
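For example, both values can be passed directly when constructing the client instead of relying on the environment variables (the key and URL below are placeholders; use your own):
from imagine import ImagineClient

# Placeholder values: replace with your own API key and endpoint URL,
# or set IMAGINE_API_KEY and IMAGINE_ENDPOINT_URL instead.
client = ImagineClient(
    api_key="my-api-key",
    endpoint="https://my-endpoint/api/v2",
)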
Danger
You should never share your personal Imagine API keys with anyone!
Likewise, you should never commit your personal Imagine API keys to any git repository!
How to get an API key
If you don’t yet have an Imagine API key, get it here.
How to get the endpoint URL
If you don’t know your endpoint URL, you can get it here.
Available models#
When calling any of the inference methods, you can pass a model name as a string to
specify which model to use (for example, see the documentation for imagine.ImagineClient.chat
).
You can get a list of available models with:
from pprint import pprint

from imagine import ImagineClient, ModelType

client = ImagineClient()

# All available models, grouped by type
all_models = client.get_available_models_by_type()
pprint(all_models)

# Only the LLM models
llm_models = client.get_available_models(model_type=ModelType.LLM)
pprint(llm_models)
Alternatively, if you don’t pass a model name explicitly when invoking the method, the default model will be used. The current default models are:
| Model type    | Default model              |
|---------------|----------------------------|
| LLM           | Llama-3.1-8B               |
| Text to Image | sdxl-turbo                 |
| Translate     | Helsinki-NLP/opus-mt-en-es |
| Transcribe    | whisper-tiny               |
| Embedding     | BAAI/bge-large-en-v1.5     |
| Reranker      | BAAI/bge-reranker-base     |
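For example, the following chat call does not specify a model, so the default LLM (Llama-3.1-8B) is used:
from imagine import ChatMessage, ImagineClient

client = ImagineClient()

# No model argument: the default LLM (Llama-3.1-8B) is used
chat_response = client.chat(
    messages=[ChatMessage(role="user", content="Hello, who are you?")],
)
print(chat_response.first_content)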
Chat#
This is the most basic example of using a Large Language Model (LLM) to generate text.
It instantiates the client ImagineClient
and starts a new conversation by asking a
question.
from imagine import ChatMessage, ImagineClient
client = ImagineClient()
chat_response = client.chat(
    messages=[ChatMessage(role="user", content="What is the best Spanish cheese?")],
    model="Llama-3.1-8B",
)

print(chat_response.first_content)
This will print something similar to:
Spain is renowned for its rich variety of cheeses, each with its unique flavor profile
and texture. The "best" Spanish cheese is subjective and often depends on personal
taste preferences. However, here are some of the most popular and highly-regarded
Spanish cheeses:
1. Manchego: A firm, crumbly cheese made from sheep's milk, Manchego is a classic
Spanish cheese with a nutty, slightly sweet flavor.
2. Mahon: A semi-soft cheese from the island of Minorca, Mahon has a mild,
creamy flavor and a smooth texture.
3. Idiazabal: A smoked cheese from the Basque region, Idiazabal has a strong, savory
flavor and a firm texture.
4. Garrotxa: A soft, creamy cheese from Catalonia, Garrotxa has a mild, buttery flavor
and a delicate aroma.
...
Streaming response#
The example above returns the response all at once. But in your application you might want to receive the result in small chunks, so that you can start providing feedback to the user as soon as possible. This is particularly useful for long responses that might take a long time to complete.
from imagine import ChatMessage, ImagineClient
client = ImagineClient()
for chunk in client.chat_stream(
    messages=[
        ChatMessage(role="system", content="You are an expert programmer."),
        ChatMessage(
            role="user", content="Write a quick sort implementation in python."
        ),
    ],
    max_tokens=1024,
):
    if chunk.first_content is not None:
        print(chunk.first_content, end="", flush=True)

print("\n")
This will provide an output similar to the example above, but the text will be printed progressively instead of all at once.
Asynchronous client#
If you are interested in the async client, this is the equivalent non-streaming example:
import asyncio
from imagine import ChatMessage, ImagineAsyncClient
async def main():
    client = ImagineAsyncClient()
    chat_response = await client.chat(
        messages=[ChatMessage(role="user", content="What is the best Spanish cheese?")],
    )
    print(chat_response.first_content)


if __name__ == "__main__":
    asyncio.run(main())
And with streaming enabled:
import asyncio
from imagine import ChatMessage, ImagineAsyncClient
async def main():
    client = ImagineAsyncClient()
    async for chunk in client.chat_stream(
        messages=[ChatMessage(role="user", content="What is the best French cheese?")],
    ):
        if chunk.first_content is not None:
            print(chunk.first_content, end="", flush=True)


if __name__ == "__main__":
    asyncio.run(main())
Notice how in both cases the methods and the input arguments are the same, making it very easy to transition from synchronous code to async code.
Code completion#
Code completion works very similarly to the Chat feature. The following example illustrates how to generate some Python code:
from imagine import ImagineClient
client = ImagineClient()
completion_response = client.completion(
    prompt="Write a Python function to get the fibonacci series"
)
print(completion_response.first_text)
This will print a response similar to:
Here is a Python function that generates the Fibonacci series up to a given number:
```Python
def fibonacci(n):
    fib_series = [0, 1]
    while fib_series[-1] + fib_series[-2] <= n:
        fib_series.append(fib_series[-1] + fib_series[-2])
    return fib_series

n = int(input("Enter a number: "))
print(fibonacci(n))
```
The equivalent streaming and async variants are available as shown in the Chat examples. Check the API documentation for details.
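As a minimal sketch, assuming the streaming counterpart of completion follows the same naming convention as chat_stream (a hypothetical completion_stream method; verify the exact name and chunk attributes in the API documentation), streaming a completion might look like this:
from imagine import ImagineClient

client = ImagineClient()

# Assumed method name and chunk attribute, mirroring completion()/chat_stream();
# check the API documentation before using this in real code.
for chunk in client.completion_stream(
    prompt="Write a Python function to get the fibonacci series",
):
    if chunk.first_text is not None:
        print(chunk.first_text, end="", flush=True)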
Translation#
The Imagine SDK can be used to translate text between languages. Make sure to select the right model for the desired input and output languages. Here is a non-streaming example using the synchronous client:
from imagine import ImagineClient
client = ImagineClient()
english_to_spanish = "Helsinki-NLP/opus-mt-en-es"
translate_response = client.translate(
    prompt="San Diego is one of the most beautiful cities in America!",
    model=english_to_spanish,
)
print(translate_response.first_text)
This will print a response similar to:
San Diego es una de las ciudades más hermosas de América!
Images#
This is an example of how to generate images from a text prompt and save them to disk.
import base64
from imagine import ImagineClient
client = ImagineClient()
images_response = client.images_generate(
    prompt="A cat sleeping on planet Mars",
    n=2,
    negative_prompt="disfigured, ugly, bad, immature, cartoon, anime, 3d, painting, b&w",
    response_format="b64_json",
)

# Save each generated image to a file
for i, image in enumerate(images_response.data):
    with open(f"MyImage_{i}.png", "wb") as f:
        f.write(base64.decodebytes(image.b64_json.encode()))
Transcribe audio#
This is an example of how to convert an audio file to text.
from imagine import ImagineClient
client = ImagineClient()
response = client.transcribe("my_audio.mp3")
print(response.text)
This will print the transcription of the MP3 file.
Embeddings#
Imagine SDK supports creating embeddings from a given text input. Embeddings are numerical representations of text that capture semantic meaning, making them useful for various natural language processing (NLP) tasks.
Use cases:
- Similarity search: find texts that are semantically similar.
- Clustering: group similar texts together.
- Classification: improve the performance of text classification models.
- Recommendation systems: enhance content recommendations based on text similarity.
See the following example to generate embeddings:
from imagine import ImagineClient
client = ImagineClient()
embedding_response = client.embeddings(["What a beautiful day", "this is amazing"])
print(len(embedding_response.data))
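As a minimal sketch of the similarity-search use case above, assuming each item in embedding_response.data exposes its vector as an embedding attribute (check the API documentation for the exact field name), you can compare the two embeddings with cosine similarity:
import math

from imagine import ImagineClient


def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product divided by the product of the norms
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


client = ImagineClient()

embedding_response = client.embeddings(["What a beautiful day", "this is amazing"])

# Assumes each entry exposes its vector as `.embedding`; verify in the API docs.
first, second = (item.embedding for item in embedding_response.data)
print(cosine_similarity(first, second))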
Reranking#
If you need reranking as part of your RAG workflow, you can use it like this:
from pprint import pprint
from imagine import ImagineClient, ModelType
client = ImagineClient()
reranker_models = client.get_available_models_by_type(model_type=ModelType.RERANKER)
print(reranker_models)
reranker_response = client.reranker(
    query="what is a panda?",
    documents=[
        "The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear",
        "Paris is in France",
        "Kung fu panda is a movie",
        "Pandas are animals that live in cold climate",
    ],
    return_documents=True,
    top_n=3,
)
pprint(reranker_response.data)