Cost Trend
Cost by Model
| Model | Requests | Cost | Avg Latency |
|---|---|---|---|
Tool Usage
| Tool | Type | Calls |
|---|---|---|
To set up Claude Code with this key, go to Connect.
| Key Prefix | Name | Team | User | Status | Created |
|---|---|---|---|---|---|
Create a virtual key to get started.
| User | Team | Requests | Tokens | Spend | Avg $/req |
|---|---|---|---|---|---|
| Endpoint | Requests | Errors | Err % | Avg Lat | p99 Lat |
|---|---|---|---|---|---|
| Server | Tools | Calls | Users | Err Rate |
|---|---|---|---|---|
| Team | Members | Keys | Team Budget | Resets | Spend | Usage |
|---|---|---|---|---|---|---|
Create teams to organize members and control spend budgets.
| Time | Type | User / Team | Threshold | Spend | Limit | Delivered |
|---|---|---|---|---|---|---|
| User | Status | Team | Budget | Role | Created |
|---|---|---|---|---|---|
Add members to manage who can access the gateway.
Each endpoint is a Bedrock runtime client with its own account, region, and credentials. CCAG selects an endpoint per-request based on the team's routing strategy, endpoint priority, and health status. Adding multiple endpoints lets you pool quota across accounts or regions.
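The per-request selection described above can be sketched as a filter on health followed by a priority pick. This is a minimal illustration only, not CCAG's actual code: the `Endpoint` class, the `pick_endpoint` helper, the endpoint names, and the "lower number wins" priority convention are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical view of one Bedrock endpoint (account + region)."""
    name: str
    priority: int  # assumption: a lower number means more preferred
    healthy: bool


def pick_endpoint(endpoints: list[Endpoint]) -> Optional[Endpoint]:
    """Return the healthy endpoint with the best priority, or None."""
    candidates = [e for e in endpoints if e.healthy]
    if not candidates:
        return None
    return min(candidates, key=lambda e: e.priority)


pool = [
    Endpoint("account-a-us-east-1", priority=1, healthy=False),
    Endpoint("account-b-us-west-2", priority=2, healthy=True),
    Endpoint("account-c-eu-west-1", priority=3, healthy=True),
]
chosen = pick_endpoint(pool)
```

Here the preferred endpoint is unhealthy, so the request falls through to the next healthy one by priority. A team's routing strategy would then decide how to choose among endpoints of equal standing.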
Routing Target
Credentials
Required IAM permissions
- bedrock:InvokeModel and bedrock:InvokeModelWithResponseStream: required for inference
- bedrock:ListInferenceProfiles: required for model discovery and health checks
- servicequotas:ListServiceQuotas: optional; needed for quota visibility in this portal
Prompt caching & routing strategy
Bedrock caches the model's internal state (KV cache) for repeated prompt prefixes. Cache hits cost 0.1× the standard input price (90% savings), while cache writes cost 1.25× (25% premium). The cache has a 5-minute sliding TTL — each hit resets the timer. An extended 1-hour TTL is available for Claude 4.5+ models.
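To make the multipliers concrete, the following sketch compares ten requests that share one large prompt prefix, with and without caching. The price of $3.00 per million input tokens, the 100,000-token prefix, and the helper name `input_cost_usd` are hypothetical values chosen for the arithmetic; only the 0.1× and 1.25× multipliers come from the text above.

```python
CACHE_READ_MULTIPLIER = 0.1    # cache hit: 0.1x the standard input price
CACHE_WRITE_MULTIPLIER = 1.25  # cache write: 1.25x the standard input price


def input_cost_usd(price_per_mtok, uncached_tokens=0,
                   cache_read_tokens=0, cache_write_tokens=0):
    """Input-side cost in USD for one request, given a $/MTok input price."""
    weighted_tokens = (
        uncached_tokens
        + cache_read_tokens * CACHE_READ_MULTIPLIER
        + cache_write_tokens * CACHE_WRITE_MULTIPLIER
    )
    return weighted_tokens * price_per_mtok / 1_000_000


PRICE = 3.00       # hypothetical input price, $/MTok
PREFIX = 100_000   # hypothetical stable prefix size, in tokens
REQUESTS = 10      # all assumed to arrive within the cache TTL

# No caching: every request pays full price for the prefix.
without_cache = REQUESTS * input_cost_usd(PRICE, uncached_tokens=PREFIX)

# Caching: the first request writes the prefix, the other nine read it.
with_cache = (
    input_cost_usd(PRICE, cache_write_tokens=PREFIX)
    + (REQUESTS - 1) * input_cost_usd(PRICE, cache_read_tokens=PREFIX)
)

print(f"without cache: ${without_cache:.3f}, with cache: ${with_cache:.3f}")
```

Under these assumptions the write premium is recovered on the first cache hit, since one write plus one read (1.35× in total) already costs less than two uncached requests (2×).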
Why Sticky User is the default: Claude Code sessions have large, stable prompt prefixes (system prompt + tools + conversation history). Routing the same user to the same endpoint maximizes cache hits. Round Robin spreads users across endpoints, causing each endpoint to see different prefixes and miss the cache — this can significantly increase costs.
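The difference between the two strategies can be illustrated with a short sketch. It assumes stickiness is achieved by hashing the user identity onto the endpoint list, which is one common approach and not necessarily how CCAG implements it; `sticky_endpoint`, `RoundRobin`, and the endpoint names are hypothetical.

```python
import hashlib
from itertools import count


def sticky_endpoint(user: str, endpoints: list[str]) -> str:
    """Map a user to one endpoint deterministically via a stable hash."""
    digest = hashlib.sha256(user.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(endpoints)
    return endpoints[index]


class RoundRobin:
    """Cycle through endpoints regardless of who is asking."""

    def __init__(self, endpoints: list[str]) -> None:
        self._endpoints = endpoints
        self._counter = count()

    def next(self) -> str:
        return self._endpoints[next(self._counter) % len(self._endpoints)]


endpoints = ["endpoint-a", "endpoint-b", "endpoint-c"]

# Sticky: the same user always lands on the same endpoint, so that
# endpoint keeps seeing the same prompt prefix.
sticky_picks = {sticky_endpoint("alice", endpoints) for _ in range(5)}

# Round robin: consecutive requests from one user are spread out, so each
# endpoint sees the prefix less often.
rr = RoundRobin(endpoints)
rr_picks = [rr.next() for _ in range(4)]
```

Plain modulo hashing remaps many users whenever the endpoint list changes; a production router would likely also account for endpoint health and priority.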
Add a Bedrock endpoint to start routing requests.
Amazon Bedrock does not support Anthropic's web_search server tool natively.
When Claude Code asks to search the web, this gateway intercepts the request,
executes the search using your configured provider, and returns results in the format Claude expects.
This is a well-known pattern used by API gateways like LiteLLM Proxy to enable web search on providers that don't support it natively. The search happens server-side — Claude Code sees the results as if they came from Anthropic's own search infrastructure.
What's different from Anthropic Direct API
- encrypted_content won't work (CC still shows titles + URLs)
- web_search tool is transparently replaced with a regular tool definition for Bedrock
Privacy note: Search queries may contain context from your conversation. If you're working with sensitive code or proprietary information, consider which search provider you trust with those queries.
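The tool replacement mentioned above might look roughly like the sketch below. It assumes the incoming request carries Anthropic's server tool as an entry whose `type` begins with `web_search` (for example `web_search_20250305`), and that the gateway swaps it for an ordinary client tool with a `query` input. The function name, description text, and schema are illustrative, not taken from CCAG.

```python
import copy


def rewrite_tools_for_bedrock(tools: list[dict]) -> list[dict]:
    """Replace web_search server tools with a regular tool definition."""
    rewritten = []
    for tool in tools:
        if str(tool.get("type", "")).startswith("web_search"):
            # Hypothetical stand-in definition; the gateway would execute
            # calls to this tool itself using the configured provider.
            rewritten.append({
                "name": tool.get("name", "web_search"),
                "description": "Search the web and return result titles and URLs.",
                "input_schema": {
                    "type": "object",
                    "properties": {
                        "query": {"type": "string", "description": "Search query"},
                    },
                    "required": ["query"],
                },
            })
        else:
            rewritten.append(copy.deepcopy(tool))
    return rewritten


incoming = [
    {"type": "web_search_20250305", "name": "web_search", "max_uses": 5},
    {"name": "read_file", "description": "Read a file.",
     "input_schema": {"type": "object", "properties": {}}},
]
outgoing = rewrite_tools_for_bedrock(incoming)
```

Other tools pass through unchanged; only the server tool is rewritten, and the original request is left unmodified.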
Configure a single notification destination for all gateway events (budget alerts, rate limits). Events are delivered asynchronously every 30 seconds. Three destination types are supported:
- sns:Publish
- events:PutEvents
BYO resources: Create your own SNS topic or EventBridge bus, then add a resource-based policy
allowing the CCAG task role to publish. The Task Role ARN is available as a CDK output (TaskRoleArn).
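For a bring-your-own SNS topic, the resource-based policy could be generated along these lines. This is a sketch under assumptions: the topic ARN is a made-up example, the statement ID is arbitrary, and `<task-role-arn>` is a placeholder to fill in from the TaskRoleArn CDK output. An EventBridge bus would need an analogous statement granting `events:PutEvents`.

```python
import json


def sns_publish_policy(topic_arn: str, task_role_arn: str) -> dict:
    """Build a topic policy that lets one IAM role publish to the topic."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCcagTaskRolePublish",  # arbitrary label
                "Effect": "Allow",
                "Principal": {"AWS": task_role_arn},
                "Action": "sns:Publish",
                "Resource": topic_arn,
            }
        ],
    }


policy = sns_publish_policy(
    topic_arn="arn:aws:sns:us-east-1:111122223333:ccag-events",  # example ARN
    task_role_arn="<task-role-arn>",  # placeholder: use the TaskRoleArn output
)
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the topic's access policy once the placeholder has been substituted.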
Workflow: Configure a destination → Save as draft → Test delivery → Activate. You can change or deactivate the destination at any time without a redeploy.
Replace <task-role-arn> with the TaskRoleArn CDK output.
{
"source": "ccag",
"version": "1",
"category": "budget",
"event_type": "budget_warning",
"severity": "warning",
"user_identity": "user@example.com",
"team_id": "uuid",
"team_name": "frontend",
"detail": {
"threshold_percent": 80,
"spend_usd": 41.20,
"limit_usd": 50.00,
"percent": 82.4,
"period": "weekly",
"period_start": "2026-03-17T00:00:00Z"
},
"timestamp": "2026-03-19T14:30:00Z"
}
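A consumer of these events might parse the payload and turn it into a one-line summary, as in the sketch below. The field names follow the sample payload above; the `summarize` helper, the placeholder `user_identity` value, and the output format are assumptions made for illustration.

```python
import json

SAMPLE_EVENT = """
{
  "source": "ccag",
  "version": "1",
  "category": "budget",
  "event_type": "budget_warning",
  "severity": "warning",
  "user_identity": "user@example.com",
  "team_id": "uuid",
  "team_name": "frontend",
  "detail": {
    "threshold_percent": 80,
    "spend_usd": 41.20,
    "limit_usd": 50.00,
    "percent": 82.4,
    "period": "weekly",
    "period_start": "2026-03-17T00:00:00Z"
  },
  "timestamp": "2026-03-19T14:30:00Z"
}
"""


def summarize(event: dict) -> str:
    """Render a budget event as a short human-readable line."""
    detail = event["detail"]
    # Recompute the percentage rather than trusting the reported value.
    percent = detail["spend_usd"] / detail["limit_usd"] * 100
    return (
        f"[{event['severity']}] {event['team_name']}: "
        f"${detail['spend_usd']:.2f} of ${detail['limit_usd']:.2f} "
        f"({percent:.1f}%, {detail['period']})"
    )


event = json.loads(SAMPLE_EVENT)
summary = summarize(event)
crossed = event["detail"]["percent"] >= event["detail"]["threshold_percent"]
```

Checking `percent` against `threshold_percent` is one way a consumer could confirm that an alert is still relevant before forwarding it.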
| Time | Event | Status | Duration | Error |
|---|---|---|---|---|
No delivery history yet
| Method | Status | Description |
|---|---|---|
| Virtual Keys | | Database-backed virtual keys with rate limiting |
| OIDC / SSO | | JWT tokens from identity providers |
| Admin Login | | Username/password bootstrap login |
CCAG supports two complementary mechanisms for managing user access from your identity provider (Okta, Entra ID, authentik, etc.):
OIDC Single Sign-On (Authentication)
Users authenticate via your IdP using standard OpenID Connect. CCAG validates JWT tokens against the IdP's JWKS endpoint. Supports device code, authorization code, and implicit flows. Multiple IdPs can be active simultaneously.
SCIM 2.0 Provisioning (User Lifecycle)
Your IdP pushes user creates, updates, and deactivations to CCAG's SCIM endpoint (/scim/v2).
This gives you centralized control over who has access — deactivating a user in your IdP immediately blocks their CCAG access.
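For reference, a deactivation pushed by an IdP typically takes the shape of a SCIM 2.0 PatchOp that sets `active` to false. The sketch below only constructs such a request without sending it. The gateway URL, user ID, and token are placeholders, and whether CCAG accepts PATCH (as opposed to PUT) on `/scim/v2/Users/{id}` is an assumption based on the SCIM protocol (RFC 7644), not on anything stated here.

```python
import json
import urllib.request


def build_deactivate_request(gateway_url: str, user_id: str,
                             token: str) -> urllib.request.Request:
    """Build (but do not send) a SCIM PatchOp that deactivates a user."""
    body = {
        "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
        "Operations": [{"op": "replace", "path": "active", "value": False}],
    }
    return urllib.request.Request(
        url=f"{gateway_url.rstrip('/')}/scim/v2/Users/{user_id}",
        data=json.dumps(body).encode("utf-8"),
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/scim+json",
        },
    )


request = build_deactivate_request(
    gateway_url="https://gateway.example.com",  # placeholder gateway URL
    user_id="abc123",                           # placeholder SCIM user ID
    token="example-token",                      # placeholder bearer token
)
```

In practice the IdP's SCIM connector issues this call itself; the sketch is mainly useful for testing the endpoint by hand with `urllib.request.urlopen(request)`.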
Setup Steps
<your-gateway-url>/scim/v2 with the bearer token
CLI alternative: All SCIM configuration can be automated via ccag scim enable,
ccag scim create-token, and ccag scim set-admin-groups.
| Name | Issuer | Status | SCIM |
|---|---|---|---|
Add an OIDC provider to enable SSO authentication.
| Setting | Details |
|---|---|
| Session Token TTL | Hours. How long gateway session tokens remain valid (1–8760 hours). Changing this only affects new tokens. |
| Mode | |
| Global Provider | Provider; API Key (the key is not stored in plain text; re-enter to set globally); Max Results |
| Version | |
| Proxy URL | |
| Health | Healthy |
| Database | Connected |
| Authentication | |
| Identity Providers |
| Model Prefix | Input $/MTok | Output $/MTok | Cache Read $/MTok | Cache Write $/MTok | Source | Updated |
|---|---|---|---|---|---|---|
Click "Refresh from AWS" to pull the latest prices, or add entries manually.
| User | Budget | Spend | Requests |
|---|---|---|---|
| Prefix | Name | Owner | Status | Created |
|---|---|---|---|---|
bedrock:InvokeModel*, bedrock:ListInferenceProfiles, and optionally servicequotas:ListServiceQuotas.
sts:ExternalId. Prevents confused-deputy attacks.