RAG Example Implementation in the OpenAI Showcase Application

Last modified: April 10, 2024

1 Introduction

Retrieval augmented generation (RAG) is a framework for AI-based search over a private or external knowledge base that combines embeddings-based knowledge retrieval with a text generation model. The starting point is a collection of data that serves as the private knowledge base. The goal is that an end user of the app can ask questions about the data and the assistant's responses are based only on this knowledge base.

2 High-level Flow

At a high level, the complete technical flow can be split into the following three steps:

  1. Prepare the knowledge base (once per document)

    1. Data is chunked into smaller, partially overlapping, pieces of information.
    2. For each data chunk, the embedding vector is retrieved from OpenAI’s embeddings API.
    3. Data chunks (or their identifier) are stored in a vector database together with their embedding vector.
  2. Query the knowledge base (once per search)

    1. User query is sent to the embeddings API to retrieve the embedding vector of the query.
    2. A pre-defined number of most-relevant data chunks is retrieved from the vector database. This set is selected based on cosine similarity to the user query embedding vector.
  3. Invoke the text generation model (once per search)

    1. User query and the relevant data chunks are sent to the chat completions API.
    2. Through prompt engineering, the text generation model is instructed to base its answer only on the data chunks that were sent as part of the request. This reduces the risk of the model hallucinating.
    3. The assistant response is returned to the user.

In summary, in the first step you provide the private knowledge base, such as a text snippet, and prepare its content for RAG. This preparation happens only once; if the content changes, you need to prepare it again. The last two steps happen every time an end-user triggers the RAG flow, for example, by asking a question about the data. The sketches below illustrate what these steps can look like.
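
To make the preparation step (step 1) more concrete, below is a minimal sketch of what it can look like in code. The showcase application itself is a Mendix app, so this Python sketch only illustrates the idea rather than the actual implementation; the chunk sizes, the table name (rag_chunks), the embedding model, and the connection string are assumptions for illustration.

```python
# Sketch of step 1: prepare the knowledge base (assumed names, sizes, and model).
import psycopg2
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

CHUNK_SIZE = 800     # characters per chunk (assumption)
CHUNK_OVERLAP = 200  # overlap between consecutive chunks (assumption)


def chunk_text(text: str) -> list[str]:
    """Split the text into smaller, partially overlapping pieces."""
    step = CHUNK_SIZE - CHUNK_OVERLAP
    return [text[i:i + CHUNK_SIZE] for i in range(0, len(text), step)]


def prepare_knowledge_base(text: str, conn) -> None:
    chunks = chunk_text(text)
    # One embeddings call for all chunks; text-embedding-3-small returns 1536 dimensions.
    response = client.embeddings.create(model="text-embedding-3-small", input=chunks)
    with conn.cursor() as cur:
        cur.execute(
            "CREATE TABLE IF NOT EXISTS rag_chunks "
            "(id serial PRIMARY KEY, content text, embedding vector(1536))"
        )
        for chunk, item in zip(chunks, response.data):
            # pgvector accepts a bracketed list of floats as a vector literal.
            vector = "[" + ",".join(str(x) for x in item.embedding) + "]"
            cur.execute(
                "INSERT INTO rag_chunks (content, embedding) VALUES (%s, %s::vector)",
                (chunk, vector),
            )
    conn.commit()


conn = psycopg2.connect("postgresql://user:password@localhost:5432/ragdb")  # placeholder
prepare_knowledge_base(open("knowledge_base.txt", encoding="utf-8").read(), conn)
```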
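
Similarly, a minimal sketch of the per-search retrieval (step 2) could look as follows; it reuses the same assumed table and embedding model, and the <=> operator is pgvector's cosine distance. The chat completion part (step 3) is sketched further down, in the context of Step 5: User Prompt.

```python
# Sketch of step 2: query the knowledge base (same assumed table and model as above).
import psycopg2
from openai import OpenAI

client = OpenAI()


def retrieve_relevant_chunks(query: str, conn, top_k: int = 4) -> list[str]:
    """Embed the user query and return the top-k most similar chunks."""
    response = client.embeddings.create(model="text-embedding-3-small", input=query)
    query_vector = "[" + ",".join(str(x) for x in response.data[0].embedding) + "]"
    with conn.cursor() as cur:
        # <=> is pgvector's cosine distance operator; a smaller distance means a closer match.
        cur.execute(
            "SELECT content FROM rag_chunks ORDER BY embedding <=> %s::vector LIMIT %s",
            (query_vector, top_k),
        )
        return [row[0] for row in cur.fetchall()]


conn = psycopg2.connect("postgresql://user:password@localhost:5432/ragdb")  # placeholder
print(retrieve_relevant_chunks("Which plants flower in spring?", conn))
```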

3 RAG Example in the OpenAI Showcase Application

3.1 Prerequisites

Before you start experimenting with the end-to-end process, make sure that you have covered the following prerequisites:

You have access to a (remote) PostgreSQL database with the pgvector extension available.
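
If you are unsure whether the extension is available on your server, a quick check like the following sketch can help. It assumes a psycopg2 connection with a placeholder connection string, and enabling the extension may require elevated privileges.

```python
# Sketch: check that the pgvector extension is available and enable it.
import psycopg2

conn = psycopg2.connect("postgresql://user:password@localhost:5432/ragdb")  # placeholder
with conn.cursor() as cur:
    cur.execute("SELECT 1 FROM pg_available_extensions WHERE name = 'vector'")
    if cur.fetchone() is None:
        raise RuntimeError("pgvector is not installed on this PostgreSQL server")
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector")  # may require elevated privileges
conn.commit()
```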

3.2 Steps

  1. Download, run, and log in to the OpenAI Showcase App.

  2. Go to the Retrieval Augmented Generation example and read Step 1: Introduction.

  3. Set up a PostgreSQL vector database and configure the connection in Step 2: Vector Database Configuration.

  4. Go to Step 3: Knowledge Base and create embeddings from a text and store them. You can use our default text about ornamental flowering plants, or paste your own content.

  5. Go to Step 4: Embedding Vectors. Verify that the embedding vectors have been created in your new database. If you later go back and load different content, any existing records are replaced automatically.

  6. Go to Step 5: User Prompt and do as follows:

    1. Ask something about the entered text. The system prompt is automatically enriched with the chunks of text from the knowledge base that are most relevant for the user query.
    2. Review the augmented prompt.
    3. Let the model run the retrieval augmented chat completion and view the results (a sketch of what this step can look like follows this list).
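
Behind the scenes of Step 5, the augmented prompt and the retrieval augmented chat completion could look roughly like the sketch below. The prompt wording and the model name are assumptions for illustration, not the showcase application's exact prompt.

```python
# Sketch: retrieval augmented chat completion (assumed prompt wording and model).
from openai import OpenAI

client = OpenAI()


def answer_from_knowledge_base(question: str, relevant_chunks: list[str]) -> str:
    # The system prompt instructs the model to answer only from the provided context.
    context = "\n\n".join(relevant_chunks)
    system_prompt = (
        "Answer the user's question using only the context below. "
        "If the context does not contain the answer, say that you do not know.\n\n"
        f"Context:\n{context}"
    )
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
    )
    return completion.choices[0].message.content
```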

4 Building Your Own RAG Setup

This section lists some general key points that apply regardless of which architecture you choose.

If you would like to build your own RAG setup, feel free to learn from the OpenAI showcase application and start building your own app. Below you can find its key takeaways:

  • For RAG, you need a storage space for high-dimensional embedding vectors outside of your normal Mendix app database. Typically, this is a remote vector database. In order to connect to it, the OpenAI showcase application uses the Mendix database connector. See Vector Database Setup for more details.

  • The OpenAI showcase application relies on a PostgreSQL database with the pgvector extension included. In such a setup, you can similarly rely on the PgVector Knowledge Base module to take care of creating and executing the right queries. If you choose a different type of vector database, the queries and statements that you will have to implement should cover at least the following (a sketch follows this list):

    • Include the vector extension if applicable for the chosen database type (create extension).

    • Create tables to store the embedding vectors (create table).

    • Add new embedding vectors to tables (insert).

    • Find the top-k nearest neighbors (select query; typically ordering by cosine distance/similarity, as recommended by OpenAI).

    • Remove individual records (delete) or tables (drop table).
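
For reference, the sketch below runs through the listed operations for a pgvector-backed setup. The table name, vector dimension, dummy values, and connection string are assumptions, and other vector databases will use their own syntax for the same operations.

```python
# Sketch: the minimal SQL surface for a pgvector-backed knowledge base (assumed table and dimension).
import psycopg2

dummy_vector = "[" + ",".join(["0.1"] * 1536) + "]"  # placeholder 1536-dimensional vector

conn = psycopg2.connect("postgresql://user:password@localhost:5432/ragdb")  # placeholder
with conn.cursor() as cur:
    # Include the vector extension (create extension).
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
    # Create a table to store the embedding vectors (create table).
    cur.execute(
        "CREATE TABLE IF NOT EXISTS rag_chunks "
        "(id serial PRIMARY KEY, content text, embedding vector(1536))"
    )
    # Add a new embedding vector (insert).
    cur.execute(
        "INSERT INTO rag_chunks (content, embedding) VALUES (%s, %s::vector)",
        ("example chunk", dummy_vector),
    )
    # Find the top-k nearest neighbors (select); <=> is the cosine distance operator.
    cur.execute(
        "SELECT content FROM rag_chunks ORDER BY embedding <=> %s::vector LIMIT 5",
        (dummy_vector,),
    )
    print(cur.fetchall())
    # Remove individual records (delete) or the whole table (drop table).
    cur.execute("DELETE FROM rag_chunks WHERE id = %s", (1,))
    cur.execute("DROP TABLE IF EXISTS rag_chunks")
conn.commit()
```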

5 Read More