AI Gateway Routing: LLM Selection Becomes Operational Policy

Figure: request routing through AI Gateway to a default model, Azure DeepSeek, BYOK, ZDR, and cost logs. Model routing moves LLM selection out of scattered code and into managed policy.

In production, LLM features depend on more than model quality: cost, latency, fault tolerance, data retention, and BYOK all matter.

Vercel announced on June 11, 2026 that Azure is now a provider path for DeepSeek V4 Pro and V4 Flash in AI Gateway. The operational lesson is broader: teams need a place to control provider order, fallback, BYOK credentials, ZDR requirements, latency budgets, and cost reporting without rewriting application code.

What happened

AI Gateway can route requests through multiple providers, and callers can control routing with providerOptions.gateway.order (preferred provider order) or providerOptions.gateway.only (restriction to listed providers). The Azure DeepSeek update adds another provider path and shows why routing should be configurable per workload.

Related documentation describes provider timeouts for BYOK, per-request zeroDataRetention, and AI SDK integration. Together, these features turn model calls into a policy surface rather than a hard-coded provider choice.

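import { streamText } from 'ai';

// Assumes `prompt` is a string defined by the caller.
const result = streamText({
  model: 'deepseek/deepseek-v4-pro',
  prompt,
  providerOptions: {
    gateway: {
      order: ['azure'], // try the Azure provider path first
      zeroDataRetention: true, // request zero data retention for this call
    },
  },
});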

Why it matters

Different prompts have different requirements. A low-risk summary can optimize for price and latency, while a sensitive customer-data workflow may require ZDR and stricter provider selection. Coding agents may need stable tool behavior and stronger audit trails.

The right abstraction is therefore not just a wrapper around fetch. It is a small operations layer that records request class, tenant, selected provider, fallback status, token usage, latency, and error type.
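As a rough illustration of that operations layer, the record below captures the fields listed above. This is a minimal sketch: the type names, field names, and the buildCallLog helper are hypothetical and not part of AI Gateway or the AI SDK, and "fallback" is assumed to mean that the serving provider differs from the preferred one.

```typescript
// Hypothetical log record for one gateway call; names are illustrative only.
type RequestClass = 'summary' | 'sensitive' | 'byok' | 'realtime';

interface GatewayCallLog {
  requestClass: RequestClass;
  tenant: string;
  selectedProvider: string;
  usedFallback: boolean;
  inputTokens: number;
  outputTokens: number;
  latencyMs: number;
  errorType: string | null;
}

// What the application is assumed to know once a call has finished.
interface CallOutcome {
  provider: string; // provider that actually served the request
  preferredProvider: string; // first provider in the configured order
  inputTokens: number;
  outputTokens: number;
  startedAt: number; // epoch milliseconds
  finishedAt: number; // epoch milliseconds
  error?: Error;
}

function buildCallLog(
  requestClass: RequestClass,
  tenant: string,
  outcome: CallOutcome,
): GatewayCallLog {
  return {
    requestClass,
    tenant,
    selectedProvider: outcome.provider,
    // Assumption: a fallback happened if the serving provider is not the preferred one.
    usedFallback: outcome.provider !== outcome.preferredProvider,
    inputTokens: outcome.inputTokens,
    outputTokens: outcome.outputTokens,
    latencyMs: outcome.finishedAt - outcome.startedAt,
    errorType: outcome.error ? outcome.error.name : null,
  };
}
```

Emitting one such record per call is usually enough to answer the first operational questions: which request classes fall back most often, and what each tenant costs.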

Example routing policy by request class

| Request class | Routing policy | Rationale |
| --- | --- | --- |
| Low-risk summaries | Low-cost default route + short timeout | Cost and latency first |
| Sensitive data analysis | Force ZDR + restrict allowed providers | Retention policy per request |
| Enterprise BYOK | Customer or team provider credentials | Clear contract and billing ownership |
| Realtime UX | Provider timeout + fast fallback | Avoid waiting on a slow provider |
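The policy above could live in configuration rather than in call sites. The sketch below is one possible shape, under stated assumptions: WorkloadClass, RoutePolicy, ROUTING_POLICIES, and toGatewayOptions are hypothetical names, the timeout values are placeholders, and timeoutMs and useTenantCredentials are application-level fields that this sketch does not pass to the gateway. Only order, only, and zeroDataRetention correspond to the gateway options mentioned in this article.

```typescript
type WorkloadClass =
  | 'low-risk-summary'
  | 'sensitive-analysis'
  | 'enterprise-byok'
  | 'realtime-ux';

interface RoutePolicy {
  order?: string[]; // preferred provider order
  only?: string[]; // restrict routing to these providers
  zeroDataRetention?: boolean;
  timeoutMs?: number; // hypothetical app-level timeout, enforced by the caller
  useTenantCredentials?: boolean; // hypothetical flag: BYOK credentials resolved elsewhere
}

// Placeholder values; real numbers should come from measured latency and cost.
const ROUTING_POLICIES: Record<WorkloadClass, RoutePolicy> = {
  'low-risk-summary': { timeoutMs: 5000 },
  'sensitive-analysis': { only: ['azure'], zeroDataRetention: true },
  'enterprise-byok': { useTenantCredentials: true },
  'realtime-ux': { timeoutMs: 2000 },
};

// Builds the providerOptions fragment for a workload, copying only gateway-level fields.
function toGatewayOptions(workload: WorkloadClass): { gateway: Record<string, unknown> } {
  const policy = ROUTING_POLICIES[workload];
  const gateway: Record<string, unknown> = {};
  if (policy.order) gateway.order = policy.order;
  if (policy.only) gateway.only = policy.only;
  if (policy.zeroDataRetention) gateway.zeroDataRetention = true;
  return { gateway };
}
```

A call site would then pass toGatewayOptions('sensitive-analysis') as providerOptions, so a policy change becomes a configuration change instead of a code change.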

Community signals

Community discussion around LLM gateways often focuses on avoiding provider lock-in, starting with a reliable default, and adding routing only when real usage creates cost or reliability pressure. Treat these posts as practitioner signals, not as factual product claims.

That signal is useful because it shows the real adoption question: teams want flexibility, but they do not want routing complexity before they have observability and quality checks.

Impact on development and operations

For developers, the main change is to keep the model-call surface thin and move provider policy into configuration. For operators, failover must be paired with alerts, cost tracking, and quality regression checks.

BYOK and ZDR also need explicit ownership. BYOK can shift data-retention and billing responsibility to the direct provider agreement, while ZDR requests should fail visibly when no compliant provider is available.
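"Fail visibly" can be made concrete with a pre-flight check. The sketch below assumes the team maintains its own list of providers approved for ZDR workloads; NoCompliantProviderError and selectZdrProviders are hypothetical names, and the approved list is not fetched from any API.

```typescript
// Hypothetical error type so callers and alerts can tell this failure apart.
class NoCompliantProviderError extends Error {
  readonly model: string;

  constructor(model: string) {
    super('No ZDR-compliant provider available for ' + model);
    this.name = 'NoCompliantProviderError';
    this.model = model;
  }
}

// Filters candidate providers against a team-maintained approved list (an assumption).
// Throws instead of silently falling back to a non-compliant provider.
function selectZdrProviders(
  model: string,
  candidates: string[],
  zdrApproved: string[],
): string[] {
  const compliant = candidates.filter((p) => zdrApproved.indexOf(p) !== -1);
  if (compliant.length === 0) {
    throw new NoCompliantProviderError(model);
  }
  return compliant;
}
```

The returned list can be used as the restricted provider set for the request, while the thrown error gives the product a defined state to handle, such as showing a "temporarily unavailable" message.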

Practical checklist

• Separate the default model, low-cost fallback, and high-trust fallback by workload type.

• Treat providerOptions.gateway.order and providerOptions.gateway.only as environment or tenant policy, not as values scattered through inline code.

• For BYOK requests, document who owns provider terms, data retention, and billing responsibility.

• Force zeroDataRetention for sensitive prompts and define the product behavior when no provider is available.

• Pair provider timeouts with a total request budget, stream-cancellation expectations, and cost logging.

• Use representative prompts, golden answers, and human-review sampling to catch model-quality regressions.
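The timeout item in this checklist can be illustrated with a small budget calculation. This is a sketch under assumptions: nextAttemptTimeoutMs is a hypothetical helper, the 250 ms floor is an arbitrary placeholder, and the application is assumed to track elapsed time itself.

```typescript
// Returns the timeout for the next provider attempt, or null when the total
// request budget is effectively exhausted and no further attempt should start.
function nextAttemptTimeoutMs(
  totalBudgetMs: number,
  elapsedMs: number,
  perProviderTimeoutMs: number,
  minUsefulMs: number = 250, // placeholder floor: below this, another attempt is not worth starting
): number | null {
  const remainingMs = totalBudgetMs - elapsedMs;
  if (remainingMs < minUsefulMs) {
    return null;
  }
  // Never let a single provider attempt outlive the overall budget.
  return Math.min(perProviderTimeoutMs, remainingMs);
}
```

With a 10-second budget and a 4-second provider timeout, the first attempt gets the full 4 seconds, an attempt starting at the 8-second mark gets 2 seconds, and nothing new starts once less than 250 ms remains.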

Risks and objections

The risk is over-engineering. A small product with low usage may not need a gateway yet, and a cheap fallback without evaluation can silently reduce answer quality.
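One way to keep a cheaper fallback from degrading quality unnoticed is a simple golden-set comparison. The sketch below is deliberately crude and entirely hypothetical: GoldenCase, passRate, and hasRegressed are illustrative names, required-term matching stands in for a real evaluation, and the 5-point tolerance is a placeholder.

```typescript
interface GoldenCase {
  prompt: string;
  mustInclude: string[]; // terms a correct answer is expected to contain
}

// Fraction of golden cases whose answer contains every required term (case-insensitive).
function passRate(cases: GoldenCase[], answers: Record<string, string>): number {
  if (cases.length === 0) return 1;
  let passed = 0;
  for (const c of cases) {
    const raw = answers[c.prompt];
    const answer = (raw === undefined ? '' : raw).toLowerCase();
    const ok = c.mustInclude.every((term) => answer.indexOf(term.toLowerCase()) !== -1);
    if (ok) passed++;
  }
  return passed / cases.length;
}

// Flags a regression when the candidate route scores meaningfully below the baseline.
function hasRegressed(
  baselineRate: number,
  candidateRate: number,
  tolerance: number = 0.05, // placeholder threshold
): boolean {
  return baselineRate - candidateRate > tolerance;
}
```

Running the same golden set against the default route and the fallback route before enabling the fallback gives a first signal; human-review sampling still covers what term matching misses.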

Start with request classification, logging, and a rollback plan. Add provider ordering, timeouts, BYOK, and ZDR only where they map to a real product or compliance requirement.
