
Introducing Smart LLM Routing v2.0

Partython Team · April 26, 2026 · 5 min read

How our query classifier sends simple questions to a fast, low-cost model and complex reasoning to a premium one — cutting AI costs by 60% with no drop in quality.

Not every customer message needs the most powerful AI model. "What time do you close?" and "Can you compare these two plans for my use case?" are very different questions — but most platforms send both to the same expensive model. Smart LLM Routing v2.0 fixes that.

When a message arrives, a lightweight classifier reads it first and decides how hard the question really is. Simple, factual questions are routed to a fast, low-cost model. Questions that need reasoning, judgement, or multi-step logic are routed to a premium model. The customer never notices the difference — they just get a good answer quickly.
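To make the routing decision concrete, here is a minimal sketch in Python. It is an illustration only: the production classifier is a small model, while this toy version uses a keyword-and-length heuristic. The names `REASONING_CUES`, `complexity_score`, `choose_tier`, and the 0.3 threshold are assumptions made up for this example, not part of the product.

```python
import re

# Hypothetical cue words hinting that a question needs reasoning.
REASONING_CUES = {"compare", "why", "recommend", "difference", "should", "versus", "best"}


def complexity_score(message: str) -> float:
    """Return a rough 0..1 complexity score (toy heuristic, not a real model)."""
    words = re.findall(r"[a-z']+", message.lower())
    if not words:
        return 0.0
    cue_hits = sum(1 for word in words if word in REASONING_CUES)
    length_part = min(len(words) / 40, 1.0) * 0.4   # longer messages score higher
    cue_part = min(cue_hits / 2, 1.0) * 0.6         # reasoning cues score higher
    return length_part + cue_part


def choose_tier(message: str, threshold: float = 0.3) -> str:
    """Route to the 'premium' tier at or above the threshold, else to 'fast'."""
    return "premium" if complexity_score(message) >= threshold else "fast"


if __name__ == "__main__":
    for text in ("What time do you close?",
                 "Can you compare these two plans for my use case?"):
        print(f"{choose_tier(text):8s} <- {text}")
```

With these made-up weights, the opening-hours question lands on the fast tier and the plan comparison lands on the premium tier. A real classifier would learn that boundary from data instead of hard-coding it.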

For our customers, the result is a 60% reduction in AI costs with no change in answer quality. The classifier adds only about 15 milliseconds before the main reply, so conversations stay fast.
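How much routing saves depends on two things: how many messages are simple, and how much cheaper the fast model is. The sketch below works through that arithmetic. The numbers in it are hypothetical and chosen only to show how a figure like 60% can arise; they are not our actual prices or traffic mix, and the small cost of the classifier itself is ignored.

```python
def blended_cost(simple_share: float, cheap_ratio: float) -> float:
    """Average cost per message, relative to sending everything to premium (= 1.0).

    simple_share: fraction of messages routed to the fast model (0..1).
    cheap_ratio:  fast-model cost as a fraction of premium cost (0..1).
    """
    return simple_share * cheap_ratio + (1.0 - simple_share) * 1.0


def savings(simple_share: float, cheap_ratio: float) -> float:
    """Fraction of spend saved compared with an all-premium setup."""
    return 1.0 - blended_cost(simple_share, cheap_ratio)


if __name__ == "__main__":
    # Assumed example: two thirds of messages are simple, and the fast model
    # costs one tenth of the premium model.
    print(f"savings: {savings(2 / 3, 0.1):.0%}")
```

Under those assumed inputs the saving works out to about 60%. The same function shows that if no messages are simple, routing saves nothing.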

Here is what happens behind the scenes on every message:

1. Classification — a small, fast model scores the message for complexity.

2. Provider selection — the router picks the best-fit AI provider for that score.

3. Failover — if the chosen provider is slow or unavailable, the router automatically retries with the next best one, so a conversation never stalls.

4. Cost tracking — every reply is logged with its token usage and cost, so you always know what you are spending.
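The four steps above can be sketched as a small router. This is a simplified illustration under stated assumptions, not our production code: `Provider`, `Reply`, `Router`, the per-1k-token prices, and the rule that a failed provider raises an exception are all hypothetical names and conventions invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Reply:
    text: str
    tokens: int


@dataclass
class Provider:
    name: str
    cost_per_1k_tokens: float        # hypothetical price
    call: Callable[[str], Reply]     # assumed to raise on timeout or outage


@dataclass
class Router:
    fast: list[Provider]             # ordered best-first
    premium: list[Provider]          # ordered best-first
    classify: Callable[[str], float]
    threshold: float = 0.5
    usage_log: list[dict] = field(default_factory=list)

    def handle(self, message: str) -> Reply:
        score = self.classify(message)                                        # 1. classification
        candidates = self.premium if score >= self.threshold else self.fast   # 2. provider selection
        last_error = None
        for provider in candidates:                                           # 3. failover
            try:
                reply = provider.call(message)
            except Exception as exc:
                last_error = exc     # remember the failure, try the next provider
                continue
            self.usage_log.append({                                           # 4. cost tracking
                "provider": provider.name,
                "tokens": reply.tokens,
                "cost": reply.tokens / 1000 * provider.cost_per_1k_tokens,
            })
            return reply
        raise RuntimeError("all providers failed") from last_error
```

In this sketch, failover only moves down the list within the chosen tier, and only successful replies are logged. A fuller version would also enforce a latency budget per provider so that a slow provider counts as a failure.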

Smart LLM Routing is on by default for Pro and Enterprise plans. Starter plans use a single provider, which you can pick in your settings.
