# Use AllToken with SillyTavern

[AllToken](https://alltoken.ai) is an OpenAI-compatible AI gateway that gives you Claude, GPT, Gemini, DeepSeek, Qwen, GLM, Kimi, and Minimax behind a single API key — at a fraction of the per-token cost of going direct. Setup in SillyTavern takes about three minutes.

## Why AllToken for SillyTavern

SillyTavern sessions burn tokens fast — long character cards, deep context windows, frequent regenerations. AllToken's pricing on **DeepSeek V4**, **Qwen 3.6**, and **GLM 5.1** is **5–15× cheaper** than equivalent Western frontier models, with comparable roleplay quality. You can switch to **Claude Opus** or **GPT 5** on the same key for high-stakes scenes without juggling multiple subscriptions.

## Prerequisites

- A SillyTavern install (any recent version)
- An AllToken API key — sign up at [alltoken.ai](https://alltoken.ai), grab the key from **Settings → API Keys**
- Some credit on the key — $5 covers many hours of chat

## Setup (8 steps)

1. In SillyTavern, click the **API icon** (top-left) and switch the API to **Chat Completion**
2. Under **Chat Completion Source**, select **`Custom (OpenAI-compatible)`**
3. **Custom Endpoint (Base URL)** — enter:
   ```
   https://api.alltoken.ai/v1
   ```
   ⚠️ Include `/v1`. **Do NOT** add `/chat/completions` — SillyTavern appends it automatically.
4. **Custom API Key** — paste your AllToken key (starts with `sk-at-...`)
5. *(Optional)* Tick **Bypass API status check** if you see false-positive connection warnings
6. Click **Connect** — SillyTavern fetches `/v1/models` and populates the model dropdown
7. **Custom Model ID** — pick a model from the dropdown (see recommendations below)
8. Click **Test Message** to verify

If you get a reply, you're done. Start a chat.

## Recommended models for roleplay

| Model ID | Best for | Notes |
|---|---|---|
| `deepseek-v4-pro` | Cheap daily driver — Chinese + English roleplay | Strong character voice consistency |
| `qwen3.6-max-preview` | Long character cards, long sessions | High context limit, Chinese-strong |
| `claude-sonnet-4-6` | Top-tier dialogue, faster than Opus | Best balance of quality and cost |
| `claude-opus-4-7` | High-stakes scenes, maximum quality | Premium model |
| `glm-5.1` | Cheap casual roleplay | Lightweight |
| `kimi-k2.6` | Chinese, long context | Cheap |

The full live list (38+ chat models as of writing) is at <https://alltoken.ai/docs/apis/models>.

## Tips

- **Switch models mid-session.** Change the **Custom Model ID** without disconnecting — use a cheap model for ambient roleplay, swap to Opus for a key scene, swap back.
- **Streaming works.** Leave SillyTavern's **Streaming** toggle on (the default). AllToken returns OpenAI-format SSE streams.
- **Tool / function calling** is supported on most models. If you hit a prompt-formatting issue, set **Prompt Post-Processing** to `None` or `Merge`.
- **Cost tracking in real time** — check spend at <https://alltoken.ai> (Settings → Billing). Every response carries token usage you can monitor.

## Troubleshooting

| Problem | Fix |
|---|---|
| `401 invalid_api_key` | Re-paste the key without trailing whitespace; confirm it starts with `sk-at-` |
| `404` from SillyTavern | You added `/chat/completions` to the base URL — remove it, keep only `/v1` |
| Model dropdown is empty | Click **Connect** first. If still empty, network may be blocking `/v1/models`; type a model ID manually |
| `402 insufficient_balance` | Top up at AllToken Settings → Billing |
| Slow first token | Switch from `claude-opus-4-7` (thinks deeply) to `claude-sonnet-4-6` or `deepseek-v4-pro` for faster TTFT |

## Cost expectations

A typical 8-hour SillyTavern session on `deepseek-v4-pro` runs ~$0.30 – $1.50, depending on context size. The same workload on Claude Opus billed direct from Anthropic is $10 – $30. Through AllToken on `claude-opus-4-7`, expect $3 – $8 — still meaningfully cheaper than direct, while using the same model.

## See also

- AllToken API reference — <https://alltoken.ai/docs/apis/overview>
- Available models — <https://alltoken.ai/docs/apis/models>
- SillyTavern official docs — <https://docs.sillytavern.app/>
- AllToken Skills (for agent runtimes like Hermes, OpenClaw, Claude Code) — <https://github.com/alltoken-ai/alltoken-skills>
