Workflow first
Choose a repeated task with clear review criteria before choosing a default model.
Use cases
This page is written as an implementation guide, not a list of invented customer wins. Use it to decide which workflow to test, which model route to start with and what quality evidence to collect before sending real production traffic.
Choose a repeated task with clear review criteria before choosing a default model.
Separate keys for bots, batch jobs, IDE tools and staging so spend is easier to audit.
Compare real examples, not only demos, and keep failed outputs in the evaluation set.
Implementation patterns
The safest first project is usually a narrow workflow with a human review loop and a clear definition of a useful answer.
Support volume is rising and the team needs a lower-cost first answer before escalation.
SaaS teams with repeated tickets, help-center articles and multilingual users.
Start with a retrieval step that sends the model only the article, product state and customer message needed for one answer. Use a budget-limited API key for the bot service, keep the key server-side and log request IDs with ticket IDs.
Measure answer helpfulness, citation accuracy, escalation rate and cases where the assistant should have refused to answer. Review failed conversations weekly before increasing automation.
Do not let the model guess account-specific billing, refunds or security status. Route those questions to human support.
The team wants model choice beyond a single vendor while keeping OpenAI-compatible client code.
Engineering teams testing Chinese coding and reasoning models inside IDEs, CLIs or internal review tools.
Give each tool a separate API key, set a daily budget, and record model, latency and token cost beside repository or issue metadata. Keep prompts short enough that diffs and logs stay reviewable.
Track accepted suggestions, reverted suggestions, test pass rate after generated patches and the number of follow-up prompts needed for a useful answer.
Never paste private credentials or production logs into prompts. Use redaction before sending stack traces or customer data.
Human graders need a first pass that is consistent enough to edit, not a final automated grade.
Edtech products that need draft feedback, rubric checks or language-learning comments at scale.
Represent the rubric as structured prompt context, ask for short reasons for each score, and keep the final grade decision in the product workflow. Use one model for feedback and another route for moderation or policy checks when needed.
Sample graded work by course, language and difficulty. Compare consistency against the rubric and audit for overly confident or overly generic feedback.
Avoid presenting model feedback as final academic judgment without review, appeal or teacher override.
Teams have many small documents and lose time asking the same internal questions.
Operations, finance or product teams searching policies, release notes and runbooks.
Use retrieval with document permissions, send source snippets with clear titles, and ask the model to return answer, source, confidence and next action. Separate read-only question answering from any workflow that changes records.
Review no-answer rate, source coverage, hallucinated citations and whether the assistant correctly says when a document is missing.
Permission boundaries matter more than model choice. Do not retrieve documents the requesting user should not see.
Decision guide
A model route should be chosen because it helps a specific job, not because it is the newest name in the catalog.
| Situation | Starting point | Validation note |
|---|---|---|
| Need lowest-cost reasoning | Start with DeepSeek | Run the same prompt set against Qwen and Kimi before production. |
| Need long document reading | Start with Kimi | Chunk very long files and measure token spend per useful answer. |
| Need broad multilingual product behavior | Start with Qwen | Check tone, structured output and language switching with real user examples. |
| Need predictable spend | Use budget-limited keys | Set separate keys for staging, batch jobs and customer-facing traffic. |
No. This page is a practical implementation guide for common SmarToken use cases. It avoids fabricated customer names or performance claims and focuses on architecture, quality checks and rollout steps.
Start with a narrow workflow that already has examples and review criteria: one support intent, one coding task class, one rubric or one internal document collection.
Use the same prompts, expected outputs and scoring rubric across models. Compare quality, latency, cost and failure modes before changing the default route.
Measure useful answer rate, cost per accepted output, latency, refusal behavior, escalation rate and incidents where the model used missing or outdated context.