LangChain#
The Imagine SDK can be used with LangChain in the same way as other language models, such as the ones offered by OpenAI, Anthropic, etc. This documentation is not intended to be a LangChain tutorial, but rather to demonstrate how to use the Imagine SDK with LangChain. Please refer to the LangChain documentation for any questions about LangChain itself.
Using the language model#
If you are familiar with LangChain, you will know that it offers a standard interface for working with language models from different vendors, as seen in their list of supported language models.
The Imagine SDK can be used in exactly the same way.
The examples on this page focus mostly on the synchronous client, as the async client offers a very similar interface. Check the API documentation for more details about the differences.
Before running any example from this documentation, two parameters have to be configured:

- You must set the environment variable IMAGINE_API_KEY to your personal Imagine API key. Alternatively, you can pass your API key directly to the client with ImagineClient(api_key="my-api-key").
- You must set the environment variable IMAGINE_ENDPOINT_URL to point to the endpoint you are using. Alternatively, you can pass your endpoint directly to the client with ImagineClient(endpoint="https://my-endpoint/api/v2").
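As a minimal sketch, you can verify from Python that both variables are set before running the examples. The check below is illustrative and not part of the SDK; it only inspects the two environment variable names mentioned above:

import os

# Sanity check: make sure both variables are set before running the examples.
for var in ("IMAGINE_API_KEY", "IMAGINE_ENDPOINT_URL"):
    if not os.environ.get(var):
        raise RuntimeError(f"Please set the {var} environment variable")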
Danger
You should never share your personal Imagine API keys with anyone!
Likewise, you should never commit your personal Imagine API keys to any git repository!
How to get an API key
If you don't yet have an Imagine API key, get it here.
How to get the endpoint URL
If you don’t know your endpoint URL, you can get it here.
from langchain_core.messages import HumanMessage, SystemMessage, AIMessage
from imagine.langchain import ImagineChat

model = ImagineChat(model="Llama-3-8B")

response = model.invoke(
    [
        SystemMessage(content="Translate the following from English into Italian"),
        HumanMessage(content="hello"),
    ]
)
print(response.content)

for chunk in model.stream(
    [
        HumanMessage(content="hello!"),
        AIMessage(content="Hi there human!"),
        HumanMessage(
            content="Write a program to sort a list of numbers in python!"
        ),
    ],
    max_tokens=512,
):
    print(chunk.content, end="", flush=True)
Chat#
Chat models are a variation on language models. While chat models use language models under the hood, the interface they use is a bit different. Rather than using a “text in, text out” API, they use an interface where “chat messages” are the inputs and outputs.
This is the most basic example: it instantiates an ImagineChat client and starts a new conversation by asking a question.
from langchain_core.messages import HumanMessage, SystemMessage
from imagine.langchain import ImagineChat
model = ImagineChat(model="Llama-3.1-8B", max_tokens=200)
messages = [
    SystemMessage(content="You're a helpful assistant"),
    HumanMessage(content="What is the purpose of model regularization?"),
]
response = model.invoke(messages)
print(response.content)
This will print something similar to:
The purpose of model regularization is to prevent overfitting in machine learning
models. Overfitting occurs when a model becomes too complex and starts to fit the noise
in the training data, leading to poor generalization on unseen data. Regularization
techniques introduce additional constraints or penalties to the model's objective
function, discouraging it from becoming overly complex and promoting simpler and more
generalizable models. Regularization helps to strike a balance between fitting the
training data well and avoiding overfitting, leading to better performance on new,
unseen data.
Streaming response#
The example above returns the response all at once. But in your application you might want to receive the result in small chunks, so that you can start giving feedback to the user as soon as possible. This is particularly useful for long responses that would otherwise take a long time to complete.
from langchain_core.messages import HumanMessage, SystemMessage
from imagine.langchain import ImagineChat
model = ImagineChat(model="Llama-3-8B")
messages = [
    SystemMessage(content="You're a helpful assistant"),
    HumanMessage(content="What is the purpose of model regularization?"),
]

for chunk in model.stream(messages):
    print(chunk.content, end="", flush=True)

print("\n")
This will provide an output similar to the example above, but the text will be printed progressively instead of all at once.
Asynchronous client#
If you are interested in the async client, this is the non-streaming example for it:
import asyncio
from langchain_core.messages import HumanMessage, SystemMessage
from imagine.langchain import ImagineChat
async def main():
    model = ImagineChat(model="Llama-3-8B", max_tokens=512)

    messages = [
        SystemMessage(content="You're a helpful assistant"),
        HumanMessage(content="What is the purpose of model regularization?"),
    ]

    response = await model.ainvoke(messages)
    print(response.content)


if __name__ == "__main__":
    asyncio.run(main())
And with streaming enabled:
import asyncio
from langchain_core.messages import HumanMessage, SystemMessage
from imagine.langchain import ImagineChat
async def main():
    model = ImagineChat(model="Llama-3-8B", max_tokens=512)

    messages = [
        SystemMessage(content="You're a helpful assistant"),
        HumanMessage(content="What is the purpose of model regularization?"),
    ]

    async for chunk in model.astream(messages, max_tokens=100):
        print(chunk.content, end="", flush=True)


if __name__ == "__main__":
    asyncio.run(main())
Notice how in both cases the methods and the input arguments are very similar, making it easy to transition from synchronous code to asynchronous code.
Prompt templates#
Prompt templates can be used to make formatting a bit easier.
from langchain_core.prompts import ChatPromptTemplate
from imagine.langchain import ImagineChat
model = ImagineChat(model="Llama-3-8B")
# Example 1: Create a ChatPromptTemplate using a template string
print("-----Prompt from Template-----")
template = "Tell me a joke about {topic}."
prompt_template = ChatPromptTemplate.from_template(template)
prompt = prompt_template.invoke({"topic": "cats"})
result = model.invoke(prompt)
print(result.content)
# Example 2: Prompt with Multiple Placeholders
print("\n----- Prompt with Multiple Placeholders -----\n")
template_multiple = """You are a helpful assistant.
Human: Tell me a {adjective} short story about a {animal}.
Assistant:"""
prompt_multiple = ChatPromptTemplate.from_template(template_multiple)
prompt = prompt_multiple.invoke({"adjective": "funny", "animal": "panda"})
result = model.invoke(prompt)
print(result.content)
# Example 3: Prompt with System and Human Messages (Using Tuples)
print("\n----- Prompt with System and Human Messages (Tuple) -----\n")
messages = [
    ("system", "You are a comedian who tells jokes about {topic}."),
    ("human", "Tell me {joke_count} jokes."),
]
prompt_template = ChatPromptTemplate.from_messages(messages)
prompt = prompt_template.invoke({"topic": "lawyers", "joke_count": 3})
result = model.invoke(prompt)
print(result.content)
Chains#
Chains refer to sequences of calls, whether to an LLM, a tool, or a data preprocessing step. The primary supported way to build them is with the LangChain Expression Language (LCEL).
See the following example, which creates a chain with these steps:

1. Defines a prompt template.
2. Passes it to a model.
3. Parses the output of the model to extract the text of the most likely output.
4. Uses a custom function to uppercase the text.
5. Counts the number of words in that text.
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda
from imagine.langchain import ImagineChat
model = ImagineChat(model="Llama-3-8B")
# Define prompt templates
prompt_template = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a comedian who tells jokes about {topic}."),
        ("human", "Tell me {joke_count} jokes."),
    ]
)
# Define additional processing steps using RunnableLambda
uppercase_output = RunnableLambda(lambda x: x.upper())
count_words = RunnableLambda(lambda x: f"Word count: {len(x.split())}\n{x}")
# Create the combined chain using LangChain Expression Language (LCEL)
chain = prompt_template | model | StrOutputParser() | uppercase_output | count_words
# Run the chain
result = chain.invoke({"topic": "lawyers", "joke_count": 3})
# Output
print(result)
LLM#
The LLM interface takes a string as input and returns a string. The following example showcases both the regular case and the streaming case:
from imagine.langchain import ImagineLLM
llm = ImagineLLM(max_tokens=1024)
res_query = llm.invoke(
    "What are some theories about the relationship between unemployment and inflation?",
    max_tokens=100,
)
print(res_query)

for chunk in llm.stream(
    "What are some theories about the relationship between unemployment and inflation?"
):
    print(chunk, end="", flush=True)

print("\n")
Embeddings#
The Embeddings class is designed for interfacing with text embedding models.
Embeddings create a vector representation of a piece of text. This is useful because it means we can think about text in the vector space, and do things like semantic search where we look for pieces of text that are most similar in the vector space.
The base Embeddings class in LangChain provides two methods: one for embedding documents and one for embedding a query. The former takes as input multiple texts, while the latter takes a single text.
Again, this can be done synchronously:
from imagine.langchain import ImagineEmbeddings
embedding = ImagineEmbeddings()
# Embed list of texts
res_documents = embedding.embed_documents(
    [
        "Hi there!",
        "Oh, hello!",
        "What's your name?",
        "My friends call me World",
        "Hello World!",
    ]
)
# print(res_documents)
print(len(res_documents))
print([len(d) for d in res_documents])
# Embed a single piece of text for the purpose of comparing to other embedded pieces of texts
res_query = embedding.embed_query("What was the name mentioned in the conversation?")
# print(res_query)
print(len(res_query))
Or asynchronously:
import asyncio
from imagine.langchain import ImagineEmbeddings
async def main():
    embedding = ImagineEmbeddings()

    # Embed list of texts
    res_documents = await embedding.aembed_documents(
        [
            "Hi there!",
            "Oh, hello!",
            "What's your name?",
            "My friends call me World",
            "Hello World!",
        ]
    )
    # print(res_documents)
    print(len(res_documents))
    print([len(d) for d in res_documents])

    # Embed a single piece of text for the purpose of comparing to other embedded pieces of texts
    res_query = await embedding.aembed_query(
        "What was the name mentioned in the conversation?"
    )
    # print(res_query)
    print(len(res_query))


if __name__ == "__main__":
    asyncio.run(main())
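As a rough sketch of the semantic-search idea described above, the vectors returned by ImagineEmbeddings can be compared with cosine similarity to find the document closest to a query. The helper below is illustrative, not part of the SDK, and assumes numpy is installed:

import numpy as np

from imagine.langchain import ImagineEmbeddings

embedding = ImagineEmbeddings()

documents = [
    "Hi there!",
    "My friends call me World",
    "Hello World!",
]
doc_vectors = embedding.embed_documents(documents)
query_vector = embedding.embed_query("What was the name mentioned in the conversation?")


def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors
    a, b = np.asarray(a), np.asarray(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Rank the documents by similarity to the query
for text, vector in sorted(
    zip(documents, doc_vectors),
    key=lambda pair: cosine_similarity(query_vector, pair[1]),
    reverse=True,
):
    print(f"{cosine_similarity(query_vector, vector):.3f}  {text}")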
RAG#
For RAG functionality check out the RAG tutorials.
Agents#
For agents functionality check out the agents tutorials.