
Multi-Model AI

Assistant Engine is built around the idea that no single AI model does everything best. Instead, it combines multiple specialized models into one assistant, balancing speed, compute cost, and domain expertise depending on the task.

By default, Assistant Engine comes with six configurable model roles.


Model Roles at a Glance

| Role | Purpose | Typical Size | When to Go Bigger | Notes |
|------|---------|--------------|-------------------|-------|
| Assistant | The “brain”: answers questions and decides which tools to use. | Large | Complex reasoning, long answers, mixed tasks. | Usually your largest model. |
| Embedding | Converts documents into vectors for semantic search. | Small–Medium | Very large libraries or multilingual data. | Acts as your “vectorizer” during ingestion & retrieval. |
| Descriptor | Summarizes files/tables before embedding (“explain before storing”). | Small–Medium | Technical data needing richer summaries. | Think auto-commenting your data for better search. |
| Correction | Provides rephrasings or fallback ideas when the Assistant struggles. | Small | Rarely for simple tasks. | Optional; boosts diversity. |
| Text-to-SQL | Translates plain English into SQL queries. | Medium | Complex schemas, tricky joins. | Pairs well with Describe Database. |
| Mini Task | Handles lightweight background jobs (e.g., naming chats). | Tiny | Almost never. | Quiet helper in the background. |
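As a rough sketch, the role-to-model mapping above can be pictured as a simple configuration. The model names and the `model_roles` structure below are illustrative assumptions, not Assistant Engine's actual configuration format:

```python
# Hypothetical role-to-model mapping for the six roles above.
# Model names are examples only; substitute whatever you run locally.
model_roles = {
    "assistant":   "llama3.1:70b",      # largest: reasoning and tool selection
    "embedding":   "nomic-embed-text",  # small: vectorizes documents for search
    "descriptor":  "qwen2.5:7b",        # small-medium: summarizes data before embedding
    "correction":  "gemma2:2b",         # small: fallback rephrasings
    "text_to_sql": "qwen2.5:7b",        # medium: plain English -> SQL
    "mini_task":   "gemma2:2b",         # tiny: background jobs like naming chats
}
```

The point of the split is that only one role (the Assistant) needs a large model; every other role can run something small and fast.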

Try Before You Save

Adjust a role, run a quick test, and only save if you’re happy. You can always revert.


Model Settings

Every role in Assistant Engine can be tuned using the following parameters. These settings give you control over creativity, precision, and resource usage.

| Setting | What it Controls | Example Values | When to Use |
|---------|------------------|----------------|-------------|
| Model | Which local model file/variant is used. | qwen2.5:7b, llama3.1:70b, gemma2:2b | Choose a smaller model for speed (quick lookups, small tasks) and a larger one for deep reasoning. |
| Temperature | Creativity vs. precision. Low = predictable, high = imaginative. | 0.2 (precise), 0.7 (balanced), 1.0+ (creative) | Low for code, SQL, or facts; medium for general chat; higher for brainstorming/storytelling. |
| Top-p | “Nucleus sampling”: keeps only the most probable words until their probabilities sum to p. | 0.8 (conservative), 0.9 (balanced), 1.0 (max freedom) | Lower for safe, reliable answers; higher for open-ended creative writing. |
| Top-k | Considers only the top k word choices at each step. | 20 (narrow), 50 (balanced), 200 (very broad) | Low for structured tasks (math, logic); higher for diverse wording. |
| Max output tokens | Maximum length of a reply. | 200 (short), 1000 (long form), 4000+ (extended essays) | Increase if responses cut off mid-sentence; keep low for snappy answers. |
| Presence penalty | Discourages repeating topics/words already used in the conversation. | 0 (neutral), 0.6 (mildly varied), 1.0+ (forces variety) | Useful when asking for new ideas, brainstorming, or avoiding repetition. |
| Frequency penalty | Reduces repeated words/phrases within the same reply. | 0 (neutral), 0.5 (less repetition), 1.0+ (strong reduction) | Helps stop “echo loops” in lists, poetry, or summaries. |
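To make temperature, top-k, and top-p concrete, here is a minimal sketch of how a model picks its next token under these settings. This is a generic illustration of the standard sampling technique on a toy distribution, not Assistant Engine's internal code:

```python
import math
import random

def sample_next_token(logits, temperature=0.7, top_k=50, top_p=0.9, rng=None):
    """Pick a token index from raw logits using temperature, top-k, and top-p."""
    rng = rng or random.Random(0)
    # Temperature: divide logits before softmax. Low T sharpens the
    # distribution (predictable); high T flattens it (imaginative).
    scaled = [l / max(temperature, 1e-6) for l in logits]
    # Softmax to probabilities (subtract max for numerical stability).
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    ranked = sorted(((e / total, i) for i, e in enumerate(exps)), reverse=True)
    # Top-k: keep only the k most probable tokens.
    ranked = ranked[:top_k]
    # Top-p (nucleus): keep the smallest set whose probabilities sum to p.
    kept, mass = [], 0.0
    for p, i in ranked:
        kept.append((p, i))
        mass += p
        if mass >= top_p:
            break
    # Renormalize what survived and sample from it.
    norm = sum(p for p, _ in kept)
    r = rng.random() * norm
    for p, i in kept:
        r -= p
        if r <= 0:
            return i
    return kept[-1][1]
```

With a very low temperature (e.g., 0.05) the highest-scoring token wins almost every time; with top_k=1 the choice is fully deterministic. That is why the table recommends low values for code and SQL, where you want the single most likely answer, and higher values for brainstorming, where variety matters.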