
RAG and Ingestion Overview

Retrieval-Augmented Generation (RAG) is the process of combining external knowledge with an AI model’s reasoning ability. Instead of relying only on what the model was trained on, RAG lets the model look up relevant data at the right time.

This section explains the key concepts and shows how Assistant Engine ingests, stores, and retrieves knowledge.


What is RAG

RAG is a framework where a model retrieves information from a knowledge base before answering.

  • Without RAG: The model relies on memory and can “hallucinate.”
  • With RAG: The model grounds its answers in actual documents, databases, or code you provide.

What is Semantic Search

Semantic search goes beyond keywords. Instead of asking “Does this text contain X?”, it asks “Does this text mean something similar to X?”.

  • Example: Searching “How do I connect to SQL?” may also return results containing “Database connection string”.
  • This is powered by embeddings (numeric representations of meaning).
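For a concrete feel, here is a minimal sketch of the similarity measure behind semantic search. The vectors are toy values with made-up numbers; real embeddings come from the configured embedding model and typically have hundreds of dimensions.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Score two embedding vectors: close to 1.0 means similar meaning."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional vectors standing in for real embeddings.
query_vec     = np.array([0.9, 0.1, 0.4, 0.0])   # "How do I connect to SQL?"
db_conn_vec   = np.array([0.8, 0.2, 0.5, 0.1])   # "Database connection string"
unrelated_vec = np.array([0.0, 0.9, 0.0, 0.8])   # "Quarterly marketing plan"

print(cosine_similarity(query_vec, db_conn_vec))    # high score: semantically close
print(cosine_similarity(query_vec, unrelated_vec))  # low score: unrelated
```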

What is Chunking

Large files can’t be embedded or searched effectively as a single block. Chunking breaks them into smaller, manageable pieces while preserving their original order.

  • Example: A 300-page PDF might become 600 smaller text segments.
  • Each chunk is linked to its position and metadata (e.g., file name, code namespace).
  • This makes retrieval faster and more precise.
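As a rough illustration, the sketch below splits a document into fixed-size, overlapping chunks and tags each one with its position and source. Real chunkers are usually token- and structure-aware; this character-based version and the file name are assumptions made only to keep the example short.

```python
def chunk_text(text: str, source: str, chunk_size: int = 1000, overlap: int = 100) -> list[dict]:
    """Split text into ordered, overlapping chunks, each carrying its metadata."""
    chunks = []
    step = chunk_size - overlap
    for position, start in enumerate(range(0, len(text), step)):
        chunks.append({
            "text": text[start : start + chunk_size],
            "metadata": {"source": source, "position": position},
        })
        if start + chunk_size >= len(text):
            break  # the final chunk already reaches the end of the document
    return chunks

# Hypothetical input file, used only to show the call shape.
chunks = chunk_text(open("manual.txt", encoding="utf-8").read(), source="manual.txt")
```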

What is Describing

Before embedding, a descriptor model can add context.

  • Example: It can auto-comment code, summarize table schemas, or highlight key insights.
  • This ensures embeddings capture not just raw text, but its intended meaning.
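The sketch below shows the general idea: a description is prepended to the raw content so the embedding captures intent as well as text. The description is hard-coded here to keep the example self-contained; in Assistant Engine it would come from the descriptor model.

```python
def with_description(chunk: dict, description: str) -> dict:
    """Prepend a descriptor summary so the embedding sees intent, not just raw text."""
    enriched = dict(chunk)
    enriched["text"] = f"{description}\n\n{chunk['text']}"
    enriched["metadata"] = {**chunk["metadata"], "described": True}
    return enriched

raw_chunk = {
    "text": "SELECT * FROM orders WHERE status = 'OPEN';",
    "metadata": {"source": "reports.sql", "position": 0},
}
described = with_description(raw_chunk, "Query that lists all open orders in the orders table.")
```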

What is Vectorizing

Vectorizing is the process of converting chunks into dense vectors (arrays of numbers).

  • Each vector encodes meaning in a mathematical space.
  • Similar meanings = vectors that are close together.
  • Stored in a vector database, these embeddings enable fast semantic search.
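Below is a minimal sketch of persisting embeddings and metadata in SQLite, in the spirit of Assistant Engine’s vector stores. The table layout, column names, and values are illustrative assumptions, not Assistant Engine’s actual schema.

```python
import json
import sqlite3

conn = sqlite3.connect("vector_store.db")
conn.execute(
    """CREATE TABLE IF NOT EXISTS text_chunks (
           id        INTEGER PRIMARY KEY,
           embedding TEXT,   -- JSON-encoded list of floats
           content   TEXT,
           metadata  TEXT    -- JSON-encoded dict: source, position, namespace, ...
       )"""
)

def store_chunk(embedding: list[float], content: str, metadata: dict) -> None:
    """Insert one vectorized chunk together with its metadata."""
    conn.execute(
        "INSERT INTO text_chunks (embedding, content, metadata) VALUES (?, ?, ?)",
        (json.dumps(embedding), content, json.dumps(metadata)),
    )
    conn.commit()

store_chunk([0.12, -0.03, 0.88], "Database connection string ...",
            {"source": "setup.md", "position": 4})
```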



How It All Comes Together in Assistant Engine

Data Storage (Ingestion Flow – Simplified)

  1. You choose what Assistant Engine should know.
     • Point to file paths, code directories, or databases.
     • These sources are added to Assistant Engine’s knowledge base.

  2. The Descriptor Model adds human-friendly context.
     • Code gets auto-commented or tagged with its namespace.
     • Tables are described with their purpose, example queries, and relationships.
     • Text documents get summaries.

  3. Assistant Engine chunks the content.
     • Large files are broken down into smaller, ordered chunks.
     • Each chunk is linked with metadata: file path, namespace, schema, etc.

  4. The Embedding Model vectorizes the chunks.
     • Every chunk becomes a semantic vector (a number array representing meaning).
     • Vectors and metadata are stored in Assistant Engine’s SQLite vector stores (text-chunks, code-chunks, sql-table-chunks, etc.).

A compact code sketch of this whole flow appears right after this list.
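Here is that sketch: the four stages strung together end to end. Every helper is a deliberately trivial stand-in rather than a real Assistant Engine API, the embedding is fake, and the file path is hypothetical.

```python
def describe(text: str) -> str:
    # Stand-in for the Descriptor Model: a real run would return a generated summary.
    return "Summary: " + text[:60]

def chunk(text: str, size: int = 500) -> list[str]:
    # Fixed-size chunking; the real pipeline also keeps order and metadata per chunk.
    return [text[i : i + size] for i in range(0, len(text), size)]

def vectorize(text: str) -> list[float]:
    # Stand-in for the Embedding Model: real embeddings have hundreds of dimensions.
    return [float(len(text)), float(text.count(" "))]

def ingest(path: str, store: list[dict]) -> None:
    """Describe, chunk, vectorize, and store one source file."""
    raw = open(path, encoding="utf-8").read()
    described = describe(raw) + "\n\n" + raw      # descriptor context travels with the text
    for position, piece in enumerate(chunk(described)):
        store.append({
            "embedding": vectorize(piece),
            "content": piece,
            "metadata": {"source": path, "position": position},
        })

knowledge_base: list[dict] = []
ingest("docs/setup.md", knowledge_base)   # hypothetical file path
```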

Data Retrieval (Question → Answer Flow in Assistant Engine)

  1. You ask Assistant Engine a question.
     • e.g. “Where is the NASDAQ orderbook logic defined?”

  2. The Assistant Model evaluates context.
     • It decides whether internal reasoning is enough or a data lookup is needed.

  3. The search function is invoked.
     • The assistant asks Assistant Engine’s vector store for the top semantic matches.
     • Metadata helps refine the search (e.g. “only look in the OrderBook namespace”).

  4. Relevant chunks are returned.
     • Example: the exact C# file containing OrderBookDataTableType.
     • Or the SQL schema with related keys and constraints.

  5. The assistant reads and reasons.
     • The retrieved chunks are injected into the assistant’s context.
     • The model produces an answer grounded in your own data.

A minimal sketch of the lookup-and-rank step appears right after this list.
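That sketch follows, assuming a simple in-memory store: filter candidates by metadata, rank them by cosine similarity against the question vector, and hand the winners to the model as context. The store contents, file names, and question vector are toy values; real ones come from the embedding model and the SQLite vector stores.

```python
import numpy as np

store = [
    {"embedding": [0.9, 0.1, 0.3],
     "content": "class OrderBookDataTableType { ... }",
     "metadata": {"namespace": "OrderBook", "source": "OrderBookDataTableType.cs"}},
    {"embedding": [0.1, 0.8, 0.2],
     "content": "CREATE TABLE trades ( ... )",
     "metadata": {"namespace": "Trades", "source": "schema.sql"}},
]

def top_matches(question_vec: list[float], store: list[dict],
                namespace: str | None = None, k: int = 3) -> list[dict]:
    """Return the k chunks most similar to the question, optionally filtered by namespace."""
    candidates = [c for c in store
                  if namespace is None or c["metadata"]["namespace"] == namespace]
    def score(chunk: dict) -> float:
        a, b = np.array(question_vec), np.array(chunk["embedding"])
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    return sorted(candidates, key=score, reverse=True)[:k]

question_vec = [0.85, 0.15, 0.25]   # stands in for the embedded question
hits = top_matches(question_vec, store, namespace="OrderBook")

# The winning chunks are injected into the assistant's context before it answers.
context = "\n\n".join(hit["content"] for hit in hits)
```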

⚡ In Assistant Engine: Sources → Describe → Chunk → Vectorize → Store → Retrieve → Answer

Your assistant becomes a local, multi-model agent that not only reasons, but also truly knows your code and data.