RAG in a Mendix App

Last modified: November 19, 2024

Introduction

Retrieval augmented generation (RAG) is a framework for AI-based search over a private or external knowledge base that combines embeddings-based knowledge retrieval with a text generation model. The starting point is a collection of data that is treated as the private knowledge base. The final goal is that an end-user of the app can ask questions about the data and the assistant’s responses are based only on this knowledge base.

Terminology

To understand the basics of the RAG pattern, it is important to know the common terminology. As the showcase example and the platform-supported modules depend on GenAI Commons, the relevant entities are linked for reference.

Embedding vector

Also called Embedding and sometimes shortened to Vector, this is a mathematical representation of an input string generated by the LLM of choice. It consists of an ordered list of numbers (typically written as [0.006, 0.108, …]), and the total number of elements is called the dimension. An embeddings model can convert any string into a vector of fixed dimension.

Every LLM has its own algorithm for generating vectors, but the convention is that conceptually similar strings result in similar vectors. This enables similarity search, where strings are matched to a given search string in terms of semantic meaning (i.e., content, tone, style, and so on) instead of exact character matches. A common mathematical technique to search through a collection of vectors is to find the elements that minimize the cosine distance to the vector representation of the search string.
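
To make this concrete, the snippet below is a minimal sketch of a cosine-based comparison in Python using numpy. The four-dimensional vectors are made-up values for illustration; real embeddings models return vectors with hundreds or thousands of elements.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity is the dot product of the two vectors divided
    # by the product of their magnitudes; cosine distance is 1 minus
    # this value, so minimizing distance maximizes similarity.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up 4-dimensional vectors for illustration only.
query   = np.array([0.006, 0.108, -0.021, 0.044])
chunk_a = np.array([0.004, 0.101, -0.019, 0.050])   # semantically similar
chunk_b = np.array([-0.120, 0.003, 0.087, -0.230])  # unrelated

# The chunk with the highest similarity (lowest cosine distance)
# is the best match for the search string.
print(cosine_similarity(query, chunk_a))  # close to 1.0
print(cosine_similarity(query, chunk_b))  # noticeably lower
```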

Chunk

In the context of GenAI Commons in a Mendix app, embedding vectors are generated for Chunk objects. Each object represents a discrete piece of information and contains its original string representation, as well as (after the embedding operation) the vector representation of that string according to the LLM of choice.

Knowledge base

This is the place to store discrete pieces of information. If information and its vector representation are stored together, a knowledge base can also be called a vector database. Common vector databases have built-in logic to execute similarity searches based on a search vector.

In the context of GenAI Commons in a Mendix app, we use the PgVector Knowledge Base module to store and retrieve vectors.

Knowledge base chunk

In most use cases, more information needs to be stored than just the original input string and its vector representation. A KnowledgeBaseChunk is an extension of Chunk that can hold the additional information typically required for useful insertion into, and retrieval from, a Mendix application.
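
As a mental model only, the two entities can be pictured as the Python dataclasses below. The field names are illustrative assumptions; the actual GenAI Commons entities have their own attributes and associations.

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    # Original human-readable text of this piece of information.
    input_text: str
    # Vector representation, filled in after the embedding operation.
    embedding: list[float] | None = None

@dataclass
class KnowledgeBaseChunk(Chunk):
    # Hypothetical extra fields that make insertion and retrieval
    # from a Mendix app useful, e.g., a reference to the source record.
    mx_object_id: str = ""
    chunk_type: str = ""
    metadata: dict[str, str] = field(default_factory=dict)
```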

Metadata

If additional conventional filtering is needed during similarity searches, such additional data can be stored in the knowledge base as well. Metadata objects are key-value pairs that are inserted along with the chunks and contain this additional information. Filtering is applied on an exact string-match basis per key-value pair: records are only retrieved if they match all metadata key-value pairs in the collection provided as part of the search step.
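
The sketch below spells out this exact-match rule in Python, reusing the hypothetical metadata dictionary from the dataclass sketch above; it illustrates the filtering semantics, not the module’s implementation.

```python
def matches_all_metadata(chunk_metadata: dict[str, str],
                         search_filter: dict[str, str]) -> bool:
    # A chunk qualifies only if every key-value pair of the search
    # filter appears in the chunk's metadata with exactly the same
    # string value (no partial or fuzzy matching).
    return all(chunk_metadata.get(key) == value
               for key, value in search_filter.items())

chunk_meta = {"Language": "en", "Category": "Plants"}
print(matches_all_metadata(chunk_meta, {"Language": "en"}))    # True
print(matches_all_metadata(chunk_meta, {"Language": "en",
                                        "Category": "Trees"})) # False
```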

High-level Flow

The complete technical flow can be split up into the following three steps at a high level:

  1. Prepare the knowledge base (once per document)

    1. Data is chunked into smaller, partially overlapping pieces of information.
    2. For each data chunk, the embedding vector is retrieved via the LLM’s embeddings operation.
    3. Data chunks (or their identifier) are stored in a vector database together with their embedding vector.
  2. Query the knowledge base (once per search)

    1. The user query is sent to the embeddings API to retrieve its embedding vector.
    2. A predefined number of most-relevant data chunks is retrieved from the vector database. This set is selected based on cosine similarity to the user query embedding vector.
  3. Invoke the text generation model (once per search)

    1. The user query and the relevant data chunks are sent to the LLM’s chat completions operation.
    2. Through prompt engineering, the text generation model is instructed to base the answer only on the data chunks that were sent as part of the request. This reduces the risk of the model hallucinating.
    3. The assistant response is returned to the user.

In summary, the first step is to provide the content of the private knowledge base, such as a text snippet, and to prepare it for RAG; this happens only once per document. If the content changes, you need to update the data in the knowledge base. The last two steps happen every time an end-user triggers the RAG flow, for example, by asking a question about the data.
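
To illustrate the three steps outside of a Mendix app, the Python sketch below wires them together with OpenAI-style embeddings and chat completions calls. The model names, the naive chunking function, and the in-memory list standing in for a real vector database are all assumptions made for the example.

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def embed(text: str) -> np.ndarray:
    response = client.embeddings.create(model="text-embedding-3-small",
                                        input=text)
    return np.array(response.data[0].embedding)

# Step 1: prepare the knowledge base (once per document).
def chunk_text(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    # Naive fixed-size chunking with partial overlap between chunks.
    return [text[i:i + size] for i in range(0, len(text), size - overlap)]

document = "..."  # your private knowledge base content goes here
knowledge_base = [(chunk, embed(chunk)) for chunk in chunk_text(document)]

# Step 2: query the knowledge base (once per search).
def retrieve(query: str, k: int = 3) -> list[str]:
    query_vec = embed(query)
    by_similarity = sorted(
        knowledge_base,
        key=lambda item: np.dot(item[1], query_vec)
        / (np.linalg.norm(item[1]) * np.linalg.norm(query_vec)),
        reverse=True,
    )
    return [chunk for chunk, _ in by_similarity[:k]]

# Step 3: invoke the text generation model (once per search).
def answer(query: str) -> str:
    context = "\n\n".join(retrieve(query))
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Answer only based on the provided context. "
                        "If the context does not contain the answer, "
                        "say that you do not know.\n\nContext:\n" + context},
            {"role": "user", "content": query},
        ],
    )
    return completion.choices[0].message.content
```

In a production setup, the in-memory list would be replaced by a vector database such as PostgreSQL with pgvector, as described in the following sections.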

RAG Example in the GenAI Showcase App

Prerequisites

Before you start experimenting with the end-to-end process, make sure that you have access to a (remote) PostgreSQL database with the pgvector extension available. If you do not have one yet, learn more about how a PostgreSQL vector database can be set up to explore use cases with knowledge bases.

Steps

  1. Download, run, and log in to the GenAI Showcase App.

  2. Go to the Retrieval Augmented Generation example and read Step 1: Introduction.

  3. Set up a PostgreSQL vector database and configure the connection in Step 2: Vector Database Configuration.

  4. Go to Step 3: Knowledge Base and create embeddings from a text and store them. You can use the default text about ornamental flowering plants or paste your own content.

  5. Go to Step 4: Embedding Vectors and verify that the embedding vectors have been created in your new database. If you go back later to load different content, any existing records are replaced automatically.

  6. Go to Step 5: User Prompt and do the following:

    1. Ask something about the entered text. The system prompt is automatically enriched with the chunks of text from the knowledge base that are most relevant to the user query.
    2. Review the augmented prompt.
    3. Let the model run the retrieval augmented chat completion and view the results.

Building Your Own RAG Setup

This section lists some general key points that apply regardless of which architecture you choose.

If you would like to build your own RAG setup, you can learn from the GenAI Showcase App and use it as a starting point for your own app. Below are the key takeaways:

  • For RAG, you need a storage space for high-dimensional embedding vectors outside of your normal Mendix app database. Typically, this is a remote vector database. To connect to it, the GenAI Showcase App uses the Mendix database connector. See Vector Database Setup for more details.

  • The GenAI Showcase App relies on a PostgreSQL database with the pgvector extension included. In such a setup, you can similarly rely on the PgVector Knowledge Base module to take care of creating and executing the right queries. If you choose a different type of vector database, the queries and statements you have to implement yourself should cover at least the following (see the sketch after this list):

    • Include the vector extension if applicable for the chosen database type (create extension).

    • Create tables to store the embedding vectors (create table).

    • Add new embedding vectors to tables (insert).

    • Find top-k nearest neighbors (select query; typically using cosine distance/similarity optimization as recommended by OpenAI).

    • Remove individual records (delete) or tables (drop table).

  • Similarity search is only guaranteed to work if the embeddings model chosen for the retrieval step is the same as the model used when the knowledge base was populated: different models use different algorithms to generate vectors and might even produce vectors of different dimensionalities, which makes calculating the cosine distance impossible.

  • How you construct the input string affects similarity search results. In the similarity search example for tickets in the showcase application, the input string at the time of insertion is a concatenation of multiple attributes of each ticket record in the Mendix database. However, in the search step, the user’s input—possibly just a brief description—is used to find similar tickets. While this discrepancy may lower overall similarity, the most relevant records will still appear at the top.
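
For PostgreSQL with pgvector, the statements listed above can look like the following sketch, here executed from Python with psycopg2. The table name, vector dimension, and connection details are placeholder assumptions; in a Mendix app, the PgVector Knowledge Base module issues equivalent queries for you.

```python
import psycopg2

# Placeholder connection details for your own database.
conn = psycopg2.connect("host=localhost dbname=vectors user=app password=secret")
cur = conn.cursor()

# Include the vector extension (once per database).
cur.execute("CREATE EXTENSION IF NOT EXISTS vector")

# Create a table to store the embedding vectors; the dimension
# (here 1536) must match the embeddings model you use.
cur.execute("""
    CREATE TABLE IF NOT EXISTS chunks (
        id bigserial PRIMARY KEY,
        content text,
        embedding vector(1536)
    )
""")

# Add a new embedding vector; pgvector accepts a '[...]' string literal.
embedding = "[" + ",".join(["0.0"] * 1536) + "]"  # placeholder vector
cur.execute("INSERT INTO chunks (content, embedding) VALUES (%s, %s::vector)",
            ("example chunk", embedding))

# Find the top-k nearest neighbors; <=> is pgvector's cosine distance
# operator, so ascending order returns the most similar chunks first.
cur.execute("""
    SELECT content FROM chunks
    ORDER BY embedding <=> %s::vector
    LIMIT 5
""", (embedding,))
print(cur.fetchall())

# Remove individual records (delete) or the whole table (drop table)
# when content is reloaded.
cur.execute("DELETE FROM chunks WHERE id = %s", (1,))
# cur.execute("DROP TABLE chunks")

conn.commit()
cur.close()
conn.close()
```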

Read More