
Multi-Model AI

Assistant Engine is built around the idea that no single AI model does everything best. Instead, it combines multiple specialized models into one assistant, balancing speed, compute cost, and domain expertise depending on the task.

By default, Assistant Engine comes with six configurable model roles.


Model Roles at a Glance

| Role | Purpose | Typical Size | When to Go Bigger | Notes |
|------|---------|--------------|-------------------|-------|
| Assistant | The “brain”: answers questions and decides which tools to use. | Large | Complex reasoning, long answers, mixed tasks. | Usually your largest model. |
| Embedding | Converts documents into vectors for semantic search. | Small–Medium | Very large libraries or multilingual data. | Acts as your “vectorizer” during ingestion & retrieval. |
| Descriptor | Summarizes files/tables before embedding (“explain before storing”). | Small–Medium | Technical data needing richer summaries. | Think auto-commenting your data for better search. |
| Correction | Provides rephrasings or fallback ideas when the Assistant struggles. | Small | Rarely for simple tasks. | Optional; boosts diversity. |
| Text-to-SQL | Translates plain English into SQL queries. | Medium | Complex schemas, tricky joins. | Pairs well with Describe Database. |
| Mini Task | Handles lightweight background jobs (e.g., naming chats). | Tiny | Almost never. | Quiet helper in the background. |
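As a rough sketch, the role-to-model mapping above can be pictured as a simple configuration. The model names and the `model_roles` structure below are illustrative assumptions, not Assistant Engine's actual configuration format:

```python
# Hypothetical role-to-model mapping for the six roles above.
# Model names are examples only; substitute whatever you run locally.
model_roles = {
    "assistant":   "llama3.1:70b",      # largest: reasoning and tool selection
    "embedding":   "nomic-embed-text",  # small: vectorizes documents for search
    "descriptor":  "qwen2.5:7b",        # small-medium: summarizes data before embedding
    "correction":  "gemma2:2b",         # small: fallback rephrasings
    "text_to_sql": "qwen2.5:7b",        # medium: plain English -> SQL
    "mini_task":   "gemma2:2b",         # tiny: background jobs like naming chats
}
```

The point of the split is that only one role (the Assistant) needs a large model; every other role can run something small and fast.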

Try Before You Save

Adjust a role, run a quick test, and only save if you’re happy. You can always revert.


Model Settings

Every role in Assistant Engine can be tuned using the following parameters. These settings give you control over creativity, precision, and resource usage.

| Setting | What it Controls | Example Values | When to Use |
|---------|------------------|----------------|-------------|
| Model | Which local model file/variant is used. | qwen2.5:7b, llama3.1:70b, gemma2:2b | Choose a smaller model for speed (quick lookups, small tasks) and a larger one for deep reasoning. |
| Temperature | Creativity vs. precision. Low = predictable, high = imaginative. | 0.2 (precise), 0.7 (balanced), 1.0+ (creative) | Low for code, SQL, or facts; medium for general chat; higher for brainstorming/storytelling. |
| Top-p | “Nucleus sampling”: keeps only the most probable words until their probabilities sum to p. | 0.8 (conservative), 0.9 (balanced), 1.0 (max freedom) | Lower for safe, reliable answers; higher for open-ended creative writing. |
| Top-k | Considers only the top k word choices at each step. | 20 (narrow), 50 (balanced), 200 (very broad) | Low for structured tasks (math, logic); higher for diverse wording. |
| Max output tokens | Maximum length of a reply. | 200 (short), 1000 (long form), 4000+ (extended essays) | Increase if responses cut off mid-sentence; keep low for snappy answers. |
| Presence penalty | Discourages repeating topics/words already used in the conversation. | 0 (neutral), 0.6 (mildly varied), 1.0+ (forces variety) | Useful when asking for new ideas, brainstorming, or avoiding repetition. |
| Frequency penalty | Reduces repeated words/phrases within the same reply. | 0 (neutral), 0.5 (less repetition), 1.0+ (strong reduction) | Helps stop “echo loops” in lists, poetry, or summaries. |
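To make temperature, top-k, and top-p concrete, here is a minimal sketch of how a model picks its next token under these settings. This is a generic illustration of the standard sampling technique on a toy distribution, not Assistant Engine's internal code:

```python
import math
import random

def sample_next_token(logits, temperature=0.7, top_k=50, top_p=0.9, rng=None):
    """Pick a token index from raw logits using temperature, top-k, and top-p."""
    rng = rng or random.Random(0)
    # Temperature: divide logits before softmax. Low T sharpens the
    # distribution (predictable); high T flattens it (imaginative).
    scaled = [l / max(temperature, 1e-6) for l in logits]
    # Softmax to probabilities (subtract max for numerical stability).
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    ranked = sorted(((e / total, i) for i, e in enumerate(exps)), reverse=True)
    # Top-k: keep only the k most probable tokens.
    ranked = ranked[:top_k]
    # Top-p (nucleus): keep the smallest set whose probabilities sum to p.
    kept, mass = [], 0.0
    for p, i in ranked:
        kept.append((p, i))
        mass += p
        if mass >= top_p:
            break
    # Renormalize what survived and sample from it.
    norm = sum(p for p, _ in kept)
    r = rng.random() * norm
    for p, i in kept:
        r -= p
        if r <= 0:
            return i
    return kept[-1][1]
```

With a very low temperature (e.g., 0.05) the highest-scoring token wins almost every time; with top_k=1 the choice is fully deterministic. That is why the table recommends low values for code and SQL, where you want the single most likely answer, and higher values for brainstorming, where variety matters.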