Add performance features: caching, cost tracking, retry, compaction, classification, scrubbing

Inspired by zeroclaw's lightweight patterns for slow hardware: - Response cache (SQLite + SHA-256 keyed) to skip redundant LLM calls - History compaction — LLM-summarize old messages when history exceeds 50 - Query classifier routes simple/research queries to cheaper models - Credential scrubbing removes secrets from tool output before sending to LLM - Cost tracker with daily/monthly budget enforcement (SQLite) - Resilient provider with retry + exponential backoff + fallback provider - Approval engine gains session "always allow" and audit log Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 09:20:52 +02:00
parent e24e3026b6
commit 872ed24f0c
10 changed files with 694 additions and 17 deletions
--- a/config.yaml
+++ b/config.yaml
@@ -43,3 +43,12 @@ agents:
 orchestrator:
  max_concurrent: 5
  delegation_timeout: 120
+
+performance:
+  cache_ttl: 3600
+  daily_budget_usd: 5.0
+  monthly_budget_usd: 50.0
+  fallback_model: nvidia_nim/deepseek-ai/deepseek-v3.1
+  model_routing:
+    simple: nvidia_nim/deepseek-ai/deepseek-v3.1
+    research: nvidia_nim/moonshotai/kimi-k2.5