Multi-Model AI¶
Assistant Engine is built on the idea that no single AI model excels at everything. Instead, it combines multiple specialized models into one assistant, balancing speed, compute cost, and domain expertise depending on the task.
By default, Assistant Engine ships with six configurable model roles.
Model Roles at a Glance¶
Role | Purpose | Typical Size | When to Go Bigger | Notes |
---|---|---|---|---|
Assistant | The “brain” — answers questions and decides which tools to use. | Large | Complex reasoning, long answers, mixed tasks. | Usually your largest model. |
Embedding | Converts documents into vectors for semantic search. | Small–Medium | Very large libraries or multilingual data. | Acts as your “vectorizer” during ingestion & retrieval. |
Descriptor | Summarizes files/tables before embedding (“explain before storing”). | Small–Medium | Technical data needing richer summaries. | Think auto-commenting your data for better search. |
Correction | Provides rephrasings or fallback ideas when the Assistant struggles. | Small | Rarely for simple tasks. | Optional; boosts diversity. |
Text-to-SQL | Translates plain English into SQL queries. | Medium | Complex schemas, tricky joins. | Pairs well with Describe Database. |
Mini Task | Handles lightweight background jobs (e.g., naming chats). | Tiny | Almost never. | Quiet helper in the background. |
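The six roles above can each point at a different local model. As an illustration, a role-to-model assignment might look like the sketch below — a hypothetical mapping for clarity only; the field names and the choice of which model backs which role are assumptions, not Assistant Engine's actual configuration schema (the model names are the examples used later in this page):

```python
# Hypothetical role-to-model assignment, for illustration only.
# The keys and model choices are assumptions, not Assistant Engine's
# actual configuration schema.
model_roles = {
    "assistant":   "llama3.1:70b",  # largest model: reasoning and tool selection
    "embedding":   "gemma2:2b",     # small vectorizer for ingestion and retrieval
    "descriptor":  "qwen2.5:7b",    # summarizes files/tables before embedding
    "correction":  "gemma2:2b",     # small fallback/rephrasing model
    "text_to_sql": "qwen2.5:7b",    # medium model for English-to-SQL
    "mini_task":   "gemma2:2b",     # tiny helper for background jobs (e.g., naming chats)
}
```

The pattern to note is the asymmetry: only the Assistant role justifies a large model, while the background roles stay small so they don't compete for resources.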
Try Before You Save
Adjust a role, run a quick test, and only save if you’re happy. You can always revert.
Model Settings¶
Every role in Assistant Engine can be tuned using the following parameters. These settings give you control over creativity, precision, and resource usage.
Setting | What it Controls | Example Values | When to Use |
---|---|---|---|
Model | Which local model file/variant is used. | `qwen2.5:7b`, `llama3.1:70b`, `gemma2:2b` | Choose a smaller model for speed (quick lookups, small tasks) and a larger one for deep reasoning. |
Temperature | Creativity vs. precision. Low = predictable; high = imaginative. | 0.2 (precise), 0.7 (balanced), 1.0+ (creative) | Use low values for code, SQL, or facts; medium for general chat; higher for brainstorming/storytelling. |
Top-p | “Nucleus sampling”: keeps only the most probable words until their probabilities sum to p. | 0.8 (conservative), 0.9 (balanced), 1.0 (max freedom) | Lower for safe, reliable answers; higher for open-ended creative writing. |
Top-k | Considers only the top k word choices at each step. | 20 (narrow), 50 (balanced), 200 (very broad) | Low for structured tasks (math, logic); higher for diverse wording. |
Max output tokens | Maximum length of a reply. | 200 (short), 1000 (long form), 4000+ (extended essays) | Increase if responses cut off mid-sentence; keep low for snappy answers. |
Presence penalty | Discourages repeating topics/words already used in the conversation. | 0 (neutral), 0.6 (mildly varied), 1.0+ (forces variety) | Useful when asking for new ideas, brainstorming, or avoiding repetition. |
Frequency penalty | Reduces repeated words/phrases within the same reply. | 0 (neutral), 0.5 (less repetition), 1.0+ (strong reduction) | Helps stop “echo loops” in lists, poetry, or summaries. |
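Temperature, top-k, and top-p interact at every generation step: logits are first scaled by temperature, then the candidate pool is pruned to the top k tokens and further cut to the smallest set whose probabilities sum to p, and finally one token is sampled from what remains. The sketch below shows that pipeline in plain Python — a minimal illustration of the standard sampling technique, not Assistant Engine's internal code:

```python
import math
import random

def sample_next_token(logits, temperature=0.7, top_k=50, top_p=0.9):
    """Pick one token from a {token: logit} dict using temperature,
    top-k, and top-p (nucleus) sampling."""
    # Temperature: lower values sharpen the distribution (more predictable).
    scaled = {tok: logit / temperature for tok, logit in logits.items()}

    # Softmax over the scaled logits (subtract max for numerical stability).
    m = max(scaled.values())
    exps = {tok: math.exp(v - m) for tok, v in scaled.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}

    # Top-k: keep only the k most probable tokens.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

    # Top-p: keep the smallest prefix whose probabilities sum to at least p.
    kept, cumulative = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalize the surviving candidates and sample one.
    remaining_mass = sum(p for _, p in kept)
    r = random.uniform(0.0, remaining_mass)
    acc = 0.0
    for tok, p in kept:
        acc += p
        if acc >= r:
            return tok
    return kept[-1][0]
```

With `top_k=1` the function is fully deterministic (greedy decoding), which is why low-temperature, narrow-k settings suit SQL and code while looser settings suit brainstorming.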