Does RAG mean my data is used to train the AI model?

No. RAG retrieves your content at query time — it does not modify the underlying language model. Your data is used to find relevant passages, not to change how the model behaves globally.

How often does the RAG knowledge base get updated?

With Sitepilot, you trigger a re-crawl manually whenever you update your content. The new pages are re-chunked, re-embedded, and replace the old vectors within minutes.

What happens if the user asks something not in my content?

The RAG system will find no relevant chunks above the similarity threshold. The bot will then fall back to a polite 'I don't have that information' response rather than guessing — which is the correct behaviour.

What Is RAG and Why Your Support Bot Needs It

The problem with standard LLMs

Large language models like GPT-4 and Claude are trained on vast amounts of text from the internet and other sources, with a knowledge cutoff date — the point after which they have no awareness of events or changes. This creates an immediate problem for customer support: your product changes constantly. Pricing updates, new features ship, old features get deprecated, integrations change. None of that is in the model's training data.

But the knowledge cutoff isn't even the biggest issue. The bigger problem is that a standard LLM has no knowledge of your specific product at all. It might know your company exists if you're well-known enough to appear in training data, but it doesn't know your pricing tiers, your specific feature set, your cancellation policy, or your onboarding process. Ask a raw LLM about those things and it will do what LLMs do when they don't know something: it will guess, and it will guess confidently.

This phenomenon — where a language model produces fluent, confident-sounding text that is factually wrong — is called hallucination. For a customer support context, hallucinations aren't just annoying. They're trust-destroying. A bot that tells a customer the wrong price, invents a feature that doesn't exist, or says "yes we integrate with Salesforce" when you don't, creates real business damage.

The open-book exam analogy

Think about the difference between a closed-book exam and an open-book exam. In a closed-book exam, you can only answer based on what you memorised. If you didn't study a topic or your memory is fuzzy, you might guess wrong — and you might guess very confidently if you half-remember something incorrectly.

In an open-book exam, you look up the answer in your notes before writing it down. Your response is grounded in a source you can actually verify. If the answer isn't in your notes, you write "I don't have enough information on this" rather than making something up.

RAG is the open-book exam for AI. Instead of asking the language model to answer from memorised training data, RAG first retrieves the relevant pages from your knowledge base and hands them to the model as reference material. The model's job is then to synthesise a clear, conversational answer from those pages — not to invent one from scratch. This shifts the failure mode from "confident wrong answer" to "no relevant content found" — a much more honest and useful failure.

How RAG works, step by step

Here is the complete process from your website to an accurate chatbot answer:

Crawl your website. The system visits every public page on your site and extracts the text content — headings, paragraphs, list items, table cells. Navigation elements and footers are typically filtered out as low-signal noise.
Split content into chunks. Each page is divided into overlapping segments of roughly 200–400 tokens (about 150–300 words each). Overlapping means consecutive chunks share some content, which prevents answers from falling through the gaps at chunk boundaries.
Convert each chunk to a vector embedding. Each chunk of text is passed through an embedding model — Sitepilot uses Azure OpenAI's text-embedding-3-small — which converts the text into a list of 1,536 numbers. These numbers capture the semantic meaning of the text in a way that can be mathematically compared.
Store vectors in a database. The vectors are stored in a pgvector-enabled PostgreSQL database (Supabase) alongside the original chunk text and the URL it came from. This is your knowledge base.
At query time: convert the user's question to a vector. When a user types a question, that question goes through the same embedding model and is converted to a vector.
Find the most similar chunks. The system performs a cosine similarity search against all the stored vectors and retrieves the top 5 most semantically similar chunks. These are the sections of your website most relevant to the question.
Inject chunks into the LLM prompt and generate a response. The 5 retrieved chunks are included in the prompt sent to the language model, along with an instruction to answer only based on the provided context. The model writes a natural-language response grounded in your actual content.

The entire process from user question to displayed answer typically takes 1–3 seconds, depending on the size of the retrieved chunks and the response length.

What happens without RAG

To make the stakes concrete, here are real categories of failures that happen when a chatbot is built on a standard LLM without RAG:

A customer asks "what's the price of the Pro plan?" The LLM answers with a price it found in its training data — which might be from a competitor's pricing page, an old cached version of your site, or entirely fabricated. The customer buys expecting that price and is shocked at checkout.
A customer asks "do you support two-factor authentication?" The LLM says yes because most SaaS products support 2FA. Yours doesn't yet. The customer signs up expecting a security feature that isn't there.
A customer asks "does your API support webhooks?" The LLM says "yes, webhooks are available" because it's a common feature. You removed webhook support in v3.0. The customer builds an integration that breaks.

Each of these is a real support ticket, a real refund request, or a real churn event waiting to happen. RAG doesn't eliminate all errors, but it eliminates this entire category of errors — the ones caused by the model not knowing your product specifically.

RAG vs fine-tuning: clearing up the confusion

A common misconception is that you can "train" a chatbot on your website content by fine-tuning the underlying language model. Fine-tuning is a real technique — it adjusts the weights of a neural network using new training examples — but it's almost never the right approach for a customer support chatbot, for three reasons.

First, fine-tuning is expensive and slow. Running a fine-tuning job on a production-grade LLM can cost hundreds of dollars and take hours. You'd need to do this every time your content changes.

Second, fine-tuned models are unreliable for factual recall. They learn patterns and styles from the training data, but they don't reliably memorise specific facts. A fine-tuned model might learn the tone of your documentation, but it still might get your pricing wrong.

Third, fine-tuning doesn't solve the update problem. A model fine-tuned on your docs last quarter still doesn't know about the feature you shipped last week. RAG, by contrast, updates instantly when you re-crawl — the knowledge base is queried at runtime, not baked into model weights.

Fine-tuning has legitimate uses — changing how a model responds, teaching it a specialised domain vocabulary, adjusting its tone. But for "answer questions accurately about my specific product," RAG is the right tool, every time.

How Sitepilot implements RAG

Sitepilot's RAG pipeline is built on three components: Azure OpenAI for embeddings and generation, Supabase with the pgvector extension for vector storage and similarity search, and a retrieval layer that fetches the top 5 chunks per query using cosine similarity with a minimum threshold to filter irrelevant results.

The embedding model used is text-embedding-3-small, which produces 1,536-dimensional vectors. At query time, the similarity search runs as a single SQL query against the pgvector index — typically completing in under 50ms. The retrieved chunks are passed to the generation model with a system prompt that explicitly instructs it to answer only using the provided context and to say "I don't know" if the context doesn't contain the answer.

This design means Sitepilot bots never answer from the LLM's general training data — every response is grounded in your content. When you re-crawl your site, old vectors are replaced with new ones, and the updated knowledge base is live immediately with no deployment step required.