Overview
Twin Fidelity evals let you measure how well your twin represents your knowledge. Define expected Q&A pairs, run them against your twin, and track the score over time.
Commands
Add an Eval Question
Bulk Import
List Eval Questions
Run Evals
View History
Delete
Scoring
| Score range | Status | Meaning |
|---|---|---|
| 70-100 | ✅ Pass | Twin covers the key information |
| 40-69 | ⚠️ Warn | Partially covered, needs more content |
| 0-39 | ❌ Fail | Missing or wrong — add content to your KB |
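The bands in the table can be expressed as a small lookup. This is a minimal sketch, not the tool's own implementation: the function name `status_for` and the lowercase status strings are hypothetical, and it assumes scores on a 0-100 scale, with any fractional score below 70 falling into the lower band.

```python
def status_for(score: float) -> str:
    """Map a 0-100 eval score to a status band from the Scoring table.

    Hypothetical helper for illustration; the band edges (70 and 40)
    come from the table, everything else is an assumption.
    """
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score >= 70:
        return "pass"  # twin covers the key information
    if score >= 40:
        return "warn"  # partially covered, needs more content
    return "fail"      # missing or wrong, add content to your KB


if __name__ == "__main__":
    for s in (85, 55, 12):
        print(s, status_for(s))
```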
Weight
| Weight | Multiplier | Use for |
|---|---|---|
| high | 2x | Core methodology, pricing, key differentiators |
| medium | 1x | Standard knowledge (default) |
| low | 0.5x | Nice-to-have, edge cases |
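One plausible way these multipliers could combine per-question scores into an overall score is a weighted average. The sketch below assumes that formula; the source does not state how the overall score is actually computed, and the names `WEIGHT_MULTIPLIERS` and `weighted_score` are hypothetical.

```python
# Multipliers taken from the Weight table above.
WEIGHT_MULTIPLIERS = {"high": 2.0, "medium": 1.0, "low": 0.5}


def weighted_score(results: list[tuple[float, str]]) -> float:
    """Return the weighted average of (score, weight) pairs.

    Assumption: the overall score is sum(score * multiplier) divided by
    sum(multiplier). This is an illustrative formula, not a documented one.
    """
    if not results:
        raise ValueError("need at least one eval result")
    total_weight = sum(WEIGHT_MULTIPLIERS[weight] for _, weight in results)
    weighted_sum = sum(score * WEIGHT_MULTIPLIERS[weight] for score, weight in results)
    return weighted_sum / total_weight


if __name__ == "__main__":
    results = [(70, "high"), (35, "medium"), (0, "low")]
    # (70*2 + 35*1 + 0*0.5) / (2 + 1 + 0.5) = 175 / 3.5
    print(weighted_score(results))  # 50.0
```

Under this assumption, the high-weight question pulls the overall score to 50, where an unweighted mean of the same three scores would be 35. A failing low-weight eval costs less than a failing high-weight one.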
Workflow
- Baseline — Add 10-20 eval questions covering your core knowledge
- Run — Get your baseline score
- Improve — Ingest content targeting failed evals
- Re-run — Track improvement
- Repeat — Add new evals as your twin’s scope grows