Integrating Orquesta for RAG Workflows Step-by-Step
Post date: Jan 29, 2024
Orquesta is a suite of collaboration and no-code building blocks for product teams building their custom technology solutions.
Our platform enables technical and non-technical team members to manage and run business rules, remote configurations, and AI prompts in a highly intuitive environment. This process is supported by powerful tools such as versioning, simulators, code generators, and logs.
In this guide, we show you how to implement a RAG pipeline with our Python SDK using an OpenAI LLM in combination with a Weaviate vector database and an OpenAI embedding model. LangChain is used for orchestration, generating responses from Orquesta, and logging additional data.
To follow along with this tutorial, you will need the following:
Jupyter Notebook (or any IDE of your choice)
OpenAI for the embedding model and LLM
weaviate-client for the vector database
Install SDK and Packages
Install the orquesta-sdk package
Install the langchain package
Install the openai package
Install the weaviate-client package
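In a Jupyter notebook, all four packages can be installed in a single cell (from a terminal, drop the leading % and run pip directly):

```python
# Run in a notebook cell; in a terminal use `pip install ...` instead
%pip install orquesta-sdk langchain openai weaviate-client
```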
Grab your OpenAI API keys.
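The key is typically exposed as an environment variable so that both LangChain and the OpenAI client can pick it up automatically; the value below is a placeholder:

```python
import os

# Placeholder value; replace with your own key from the OpenAI dashboard
os.environ["OPENAI_API_KEY"] = "sk-..."
```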
Enable models in the Model Garden
Orquesta allows you to pick the models of your choice, enable them, and work with them. Enabling a model is easy: navigate to the Model Garden and toggle on the model you want to use.
Collect and load data
The raw text document is available in LangChain’s GitHub repository.
This step uses the requests library for making HTTP requests and the TextLoader module from LangChain to load text data from a local file.
The loading code performs the following steps (see the sketch after this list):
Define the URL from which to fetch the text data.
Make an HTTP GET request to fetch the content from the specified URL.
Open the local file named 'state_of_the_union.txt' in write mode ('w') and write the fetched content to it.
Create a TextLoader instance, specifying the path to the local text file.
Load the text data.
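Putting those steps together, a minimal sketch could look like the following (the raw-file URL is illustrative; point it at the document's current location in LangChain's GitHub repository):

```python
import requests
from langchain.document_loaders import TextLoader

# Illustrative URL; use the file's current location in LangChain's GitHub repository
url = "https://raw.githubusercontent.com/langchain-ai/langchain/master/docs/docs/modules/state_of_the_union.txt"

# Fetch the raw text and write it to a local file
res = requests.get(url)
with open("state_of_the_union.txt", "w") as f:
    f.write(res.text)

# Load the local text file into LangChain documents
loader = TextLoader("./state_of_the_union.txt")
documents = loader.load()
```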
Chunk your documents
LangChain has many built-in text splitters for this purpose. For this example, you can use CharacterTextSplitter with a chunk_size of about 1000 and a chunk_overlap of 0 (increase the overlap if you want to preserve text continuity between chunks).
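A minimal sketch of that chunking step, assuming the documents variable from the previous step:

```python
from langchain.text_splitter import CharacterTextSplitter

# Split the loaded documents into ~1000-character chunks with no overlap
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
chunks = text_splitter.split_documents(documents)
```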
The code uses the CharacterTextSplitter class from the langchain.text_splitter module to split a given set of documents into smaller chunks. The chunk_size parameter determines the size of each chunk, and the chunk_overlap parameter specifies the overlap between adjacent chunks. Creating an instance of CharacterTextSplitter allows you to customize the chunking parameters to the specific needs of your text data. The split_documents method performs the actual splitting, and the result is stored in the chunks variable, which now holds a list of text chunks.
Embed and store the chunks
To enable semantic search across the text chunks, you need to generate a vector embedding for each chunk and then store the chunks together with their embeddings.
Initialize a Weaviate client with embedded options. This client will be used to interact with the Weaviate service.
Utilize the Weaviate module from LangChain to create a Weaviate vector store. This involves providing the Weaviate client, the documents (chunks) to be processed, specifying the OpenAI embeddings for vectorization, and setting the option that controls how non-text data is processed (by_text=False).
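A sketch of both steps, assuming the chunks variable from the previous section and an OPENAI_API_KEY in the environment; by_text=False tells LangChain to search by vector rather than by raw text:

```python
import weaviate
from weaviate.embedded import EmbeddedOptions
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Weaviate

# Initialize a Weaviate client that runs an embedded instance locally
client = weaviate.Client(embedded_options=EmbeddedOptions())

# Embed each chunk with OpenAI and store chunk + embedding in Weaviate
vectorstore = Weaviate.from_documents(
    client=client,
    documents=chunks,
    embedding=OpenAIEmbeddings(),
    by_text=False,
)
```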
Step 1: Retrieve
Populate the vector database and define it as the retriever component, which fetches the additional context based on the semantic similarity between the user query and the embedded chunks.
We convert the vector store into a retriever, enabling similarity searches.
We then perform a similarity search using the provided query ("What did the president say about Justice Breyer"). The result, stored in the variable docs, is a list of documents ranked by similarity. Finally, we extract the content of the most similar document.
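As a sketch, using the vector store from the previous section:

```python
# Convert the vector store into a retriever for use in chains
retriever = vectorstore.as_retriever()

# Rank stored chunks by semantic similarity to the query
query = "What did the president say about Justice Breyer"
docs = vectorstore.similarity_search(query)

# Content of the most similar chunk
print(docs[0].page_content)
```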
Step 2: Augment
Create a client instance for Orquesta. You can instantiate as many client instances as necessary with the `OrquestaClient` class. You can find your API key in your workspace settings.
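A minimal sketch of client creation, assuming the API key is stored in an ORQUESTA_API_KEY environment variable (the option names follow the orquesta-sdk docs at the time of writing; verify against your installed version):

```python
import os

from orquesta_sdk import OrquestaClient, OrquestaClientOptions

# environment tells Orquesta which deployment environment to target
options = OrquestaClientOptions(
    api_key=os.environ["ORQUESTA_API_KEY"],
    environment="production",
)
client = OrquestaClient(options)
```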
Prepare a Deployment in Orquesta and set up the primary model, fallback model, number of retries, and the prompt itself with variables. Whatever information the RAG process produces, you attach it as a variable when you call a Deployment in Orquesta.
Request a variant by right-clicking on the row and generating the code snippet.
Invoke the Orquesta Deployment, setting the context variable to the similarity search result; this chains the retriever together with the prompt.
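A sketch of the invocation, assuming a hypothetical Deployment key of "rag-tutorial" and a prompt variable named context (method and field names follow the orquesta-sdk Deployment API at the time of writing; check your version):

```python
# Invoke the Deployment, passing the retrieved chunk as the `context` variable
deployment = client.deployments.invoke(
    key="rag-tutorial",  # hypothetical Deployment key; use your own
    inputs={"context": docs[0].page_content},  # variable name must match your prompt
)
```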
Step 3: Generate
Your LLM response is generated by Orquesta using the model selected in the Deployment, and you can print it out.
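Assuming the deployment object from the previous step, the generated message can be printed directly (the attribute path follows the SDK's documented response shape; verify against your version):

```python
# Print the generated completion from the selected model
print(deployment.choices[0].message.content)
```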
Logging additional metrics to the request
After a successful query, Orquesta generates a log with the evaluation result. You can add metadata and a score to the Deployment log by reporting additional metrics on the returned deployment object.
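A hedged sketch of metric reporting; the add_metrics method and its fields are taken from Orquesta's SDK documentation at the time of writing, so verify them against your installed version:

```python
# Attach a feedback score and custom metadata to the log for this request
deployment.add_metrics(
    feedback={"score": 100},
    metadata={
        "custom": "custom_metadata",  # illustrative values
    },
)
```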
You can also fetch the deployment configuration if you are using Orquesta as a prompt management system.
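If you only need the prompt configuration (for example, to call the model yourself), the SDK exposes a config lookup; a sketch, assuming the same hypothetical Deployment key (method name per the SDK docs at the time of writing):

```python
# Fetch the Deployment's configuration instead of invoking the model
config = client.deployments.get_config(
    key="rag-tutorial",  # hypothetical Deployment key
)
print(config)
```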
Finally, head over to your Deployment in the Orquesta dashboard and click on Logs; there you will see your LLM response and other information about the LLM interaction.