Self-Hosted

Own Your AI Infrastructure

A self-hosted platform for deploying, managing, and scaling conversational AI across your organization. Multi-model. Enterprise-grade. Yours to control.

chat.yourdomain.com
Claude Opus 4.6
Agents
Chat Models
Image Generation

Built for Organizations that take AI seriously

Data sovereignty — conversations, attached documents, and usage data never leave your infrastructure.

You also get true multi-model flexibility (not locked to one vendor), granular cost controls per team, and the ability to customize behavior through system prompts, guardrails, and agent skills.

But we still use external LLM API providers...?

Even when using external LLM providers, API access carries a smaller risk and exposure profile than full SaaS platforms. API requests are retained for a shorter time and are typically better protected than ChatGPT-like web app platforms, which host your entire data indefinitely and present a much larger attack surface. Many reputable LLM providers also offer Zero Data Retention (ZDR) guarantees.

A polished, high-quality chat experience that users actually prefer — fast streaming, rich Markdown and code rendering, math, diagrams, file and image attachments, and a clean model picker across providers.

Multi-model from a single interface — Claude, gpt-5.x, Gemini, local open-weights — plus image generation in the same workspace. Knowledge Collections can be attached to any chat for RAG or full-context grounding, and MCP Tools let chats reach into your internal systems and APIs.

Cowork Agent mode runs Claude Code in sandboxed containers with configurable Skills — file operations, browser automation, diagram and office-document authoring, persistent memory, and more — turning any chat into an agentic session when you need it.

Personas, system prompts, and shareable chat templates let teams standardize how the AI behaves for specific workflows, without locking individuals out of free-form use.

A specialized guardrails module compatible with LiteLLM and configurable via the ControlTower admin dashboard — covering PII, secrets, and other policy-sensitive content on both inbound prompts and outbound responses.

Built to be flexible: new guardrail types and policies can be added or customized on-site to match your organization's data classifications, compliance needs, and tolerance thresholds — not a fixed black-box ruleset.

Strong focus on minimizing false positives. A two-stage pipeline (fast heuristic detection followed by a context-aware LLM classifier with caching) keeps obvious non-issues from blocking legitimate work — because guardrails that misfire ruin the user experience and break agents in production.

Pipeline (triggered per-request inside the LiteLLM guardrails callback): LiteLLM request → detection heuristic (fast, high-recall candidates) → cache (dedup lookups; only new candidates proceed) → LLM classifier (local LLM, true/false) → on a true positive: fail, redact, block, mask, or log.
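The two-stage flow can be sketched as follows. This is a minimal illustration with hypothetical patterns, function names, and cache, not the platform's actual implementation:

```python
import hashlib
import re

# Stage 1: fast, high-recall heuristic. Over-matching is fine here;
# the classifier stage filters out the false positives.
CANDIDATE_RE = re.compile(r"\b(?:sk-[A-Za-z0-9]{20,}|AKIA[0-9A-Z]{16})\b")

# Stage 2 cache: an identical candidate is classified only once.
_verdict_cache: dict[str, bool] = {}

def classify_with_llm(candidate: str, context: str) -> bool:
    """Placeholder for the context-aware local LLM classifier.
    Returns True only for genuine secrets."""
    # Illustrative stand-in: treat documentation placeholders as benign.
    return "example" not in candidate.lower()

def scan_request(text: str) -> list[str]:
    """Return candidates confirmed as true positives."""
    confirmed = []
    for match in CANDIDATE_RE.finditer(text):
        candidate = match.group(0)
        key = hashlib.sha256(candidate.encode()).hexdigest()
        if key not in _verdict_cache:  # dedup lookup
            context = text[max(0, match.start() - 80):match.end() + 80]
            _verdict_cache[key] = classify_with_llm(candidate, context)
        if _verdict_cache[key]:        # true positive -> fail/redact/block/mask/log
            confirmed.append(candidate)
    return confirmed
```

The real guardrail runs inside LiteLLM's per-request callback, where a confirmed positive is failed, redacted, masked, or logged according to policy.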
LLM classifier for Secrets (example)

Classifies whether a detected candidate string is a real credential (password, API key, token, private key) or a harmless match (code identifier, hash, UUID, public key, color code, placeholder). Examples of NOT a credential: documentation placeholders, commit hashes, package hashes, version numbers, color codes, PNG data, SVG strokes, file names, dates. Examples of a real credential: randomly-generated passwords, API keys, tokens, secrets, private keys that don't look like demo values. The classifier receives both the flagged string and surrounding context to make an informed decision.

LLM classifier for Customer PII (example)

Classifies whether a detected candidate string is sensitive customer PII that should not be sent to an external LLM API. Examples of NOT PII for this purpose: placeholder values ("John Doe", "test@test.com"), company names, product names, employee business contacts (e.g. from email headers or signatures), business email addresses, internal customer IDs, fictional characters, and general mentions of public figures (e.g. "President George W. Bush"). Depending on company policy, even an isolated one-off occurrence of a low-sensitivity PII element, such as a customer's email address in a troubleshooting context, may be allowed if not accompanied by other highly sensitive revealing data. Examples of real PII: data extracted from database records or user profiles, customer PII appearing at scale in structured data, and high-sensitivity identifiers like SSNs or credit card numbers. The classifier uses surrounding context to distinguish a customer database dump from routine business correspondence.
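Operationally, each classifier is one structured call to a local model: the prompt carries the flagged span plus a window of surrounding text, and the reply is parsed into an allow/block verdict. A minimal sketch of that assembly, where the prompt wording, window size, and function names are illustrative assumptions rather than the shipped classifier:

```python
def build_classifier_prompt(candidate: str, document: str, window: int = 200) -> str:
    """Wrap a flagged candidate with surrounding context for the LLM classifier."""
    idx = max(0, document.find(candidate))
    start = max(0, idx - window)
    context = document[start:idx + len(candidate) + window]
    return (
        "Decide whether the flagged string is sensitive customer PII that must "
        "not be sent to an external LLM API. Placeholders, business contacts, "
        "and fictional names are NOT PII. Answer exactly TRUE or FALSE.\n\n"
        f"Flagged string: {candidate!r}\n"
        f"Surrounding context:\n...{context}..."
    )

def parse_verdict(model_reply: str) -> bool:
    """Map the model's TRUE/FALSE reply to a boolean.
    Unparseable replies default to True (fail closed)."""
    return not model_reply.strip().upper().startswith("FALSE")
```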

Pay your LLM API providers for actual compute consumption, not per-user subscriptions.

In larger organizations, most users are low-volume; API consumption-based pricing is typically far more cost-effective than flat per-seat fees.

Robust cost tracking and budget controls keep that model safe at scale — per-user, per-team, and per-API-key budgets with threshold alerts, spend forecasting, and usage analytics down to the individual request. Spend stays predictable, and overages are caught before they become invoices.
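The enforcement model is simple to reason about: every scope (user, team, API key) carries a budget, each request's cost is checked against remaining headroom, and an alert fires at a configurable threshold. A toy sketch of that logic, with illustrative names rather than the platform's actual data model:

```python
from dataclasses import dataclass

ALERT_THRESHOLD = 0.7  # matches the dashboard's default per-key/per-user alert

@dataclass
class Budget:
    limit: float
    spent: float = 0.0
    alerted: bool = False

    def charge(self, cost: float) -> str:
        """Record a request's cost; return the resulting state."""
        if self.spent + cost > self.limit:
            return "blocked"   # overage caught before it becomes an invoice
        self.spent += cost
        if not self.alerted and self.spent / self.limit >= ALERT_THRESHOLD:
            self.alerted = True
            return "alert"     # threshold notification fires once
        return "ok"
```

In the real deployment these checks run in the gateway on every request, so the same logic covers chat users, API keys, and agents uniformly.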

Without a gateway, teams end up with scattered API keys, no visibility into spend, and no guardrails. SecondStack centralizes all LLM traffic — from chat users, internal apps, agents, and IDE plugins — through LiteLLM.

Every request gets virtual API key authentication, per-user and per-team budget enforcement, usage analytics, and guardrails for PII and secrets.

Your apps call standard OpenAI-compatible endpoints (/chat/completions, /responses) with virtual keys, and the gateway handles routing, failover, load balancing, observability, and policy across all supported models.
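Because the gateway speaks the standard OpenAI wire format, pointing an existing client at it is a base-URL change plus a virtual key. A sketch using only the Python standard library; the gateway URL and key are placeholders:

```python
import json
import urllib.request

GATEWAY = "https://litellm.yourdomain.com"  # placeholder: your gateway's base URL

def build_chat_request(virtual_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a standard OpenAI-style /chat/completions request against the gateway.
    The virtual key is issued by the platform, not by the upstream provider."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{GATEWAY}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {virtual_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )

# Sending is the usual urlopen call; the gateway resolves the model name to a
# provider, applies guardrails and budgets, then returns the response:
# with urllib.request.urlopen(build_chat_request("sk-...", "claude-opus-4-6", "hi")) as r:
#     print(json.load(r))
```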

The embedded LiteLLM Proxy is the free, open-source build, and our team supports it end-to-end alongside the rest of the platform — one point of contact for the whole deployment.

Naturally integrates with Databricks for organizations that already run their data and AI workloads on the Lakehouse.

Use Databricks Foundation Models as an upstream LLM provider through the LiteLLM gateway — Claude, gpt-5, Gemma, Qwen, Llama, and other served endpoints billed to your Databricks account.

Invoke Databricks Mosaic AI Agents directly from the Chat UI as first-class chat profiles: Knowledge Agents, Supervisor Agents, and Custom Agents.

Expose Genie Spaces and Databricks SQL as MCP Tools to Chat App models and the Cowork Agent for natural-language analytics over your governed data.

End-to-end OAuth on behalf of end-users. Every Databricks call is authenticated as the actual user, so Unity Catalog permissions, row/column-level security, and audit trails apply to AI traffic exactly as they do to a human analyst.

Architecture Overview

A modular, self-hosted stack built on proven foundations.

Users enter through the OpenResty ingress (front door), with Authentik SSO as the auth center. Core components:

  • Chat App UI: multi-model chat with agentic capabilities
  • Containerized Cowork Agent: agent harness + Skills
  • User App Dashboard: self-serve API keys, usage reporting
  • Control Tower admin UI: models & providers, guardrails, Skills, MCP, budgets, teams
  • LiteLLM routing gateway (/chat/completions, /responses, /messages): virtual API keys, spend tracking, budget enforcement; also serves enterprise apps, agents, and IDEs
  • Guardrails: pre-flight policy checks for secrets and PII
  • Kreuzberg · Docling: content extraction and document processing
  • Data layer: PostgreSQL database, Qdrant vector DB, Redis cache
  • Runtime: Kubernetes or Docker Compose
  • LLM providers: Anthropic Claude, OpenAI gpt-5, Google Gemini, Databricks Foundation Models, OpenRouter (100+ models), vLLM local inference

Everything your team needs

A complete conversational AI platform, not just a chat interface.

Chat App

Multi-model chat

Claude, gpt-5.x, Gemini, local models — chat, agents, and image generation all in one place.

Agents
Chat Models
Image Generation
Cost Management

Budget & spend controls

Per-user, per-team, and per-API-key budgets. Threshold alerts, spend forecasting, and usage analytics down to the request level.

Total Platform Spend $798 / $1,200
Remaining: $402
Engineering $642 / $1,000
Chat App $380
API Keys $64
Marketing $156 / $200
78% of budget used
Chat App $32
API Keys $112
d.kowalski
Chat App — Standard Spend: $312 · Budget: $450
Chat App — Agent Mode Spend: $47 · Budget: $180
API Keys
VSCode Spend: $142 · Budget: $200
local-testr-rag-agent Spend: $89 · Budget: $150
Remaining unallocated budget $460
Ranked by total spend this period
a.chen
$268
j.martinez
$251
d.kowalski
$189
r.patel
$176
s.nakamura
$162
l.dubois
$148
m.oconnor
$97
k.berg
$82
n.volkov
$64
p.santos
$51
e.wright
$38
AI Agent

Cowork agent

Cowork Agent runs in sandboxed containers with dozens of Skills — file operations, browser automation, image generation, diagram authoring, office documents, persistent memory, and more.

create a hello world program in this workspace
Tools

Done. Created and ran hello_world.py — it prints "Hello, World!" as expected.

hello_world.py
Preview · Download
Attach files
Engineering Wiki
Customer Research
Sales Playbook
+ New Knowledge…
MCP servers
Local Files
Context7
Atlassian Suite
+ Manage MCP…
SSH workspace running
ssh 44a43ef4@localhost -p 2222
Open in VS Code
Auto-stop in 1h · extends on activity
Model
Claude Opus 4.7
Claude Sonnet 4.6
Claude Haiku 4.5
Sandbox files
.plans
hello_world.py 23 B
Type your message here...
Knowledge & RAG

Knowledge collections

Named document sets, referenced in chats. Files are auto-indexed for RAG retrieval. Share with teams or specific users.

System Design

File · Comment (model-visible) · Mode · Tokens
architecture-backend.md · (none) · RAG · 3k
architecture-frontend.md · (none) · RAG · 4k
sync-jobs.md · Background Sync Jobs · RAG · 1k
Explain the ControlTower flow on user deactivation
System Design
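Under the hood, auto-indexing follows the usual RAG recipe: split each file into overlapping chunks, embed them, and store the vectors (Qdrant in this stack) keyed by collection. A toy sketch of the chunking step, with sizes and overlap chosen for illustration rather than taken from the platform's configuration:

```python
def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into overlapping character windows for embedding.
    Overlap keeps sentences that straddle a boundary retrievable."""
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    chunks = []
    step = size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + size]
        if chunk:
            chunks.append(chunk)
        if start + size >= len(text):
            break
    return chunks
```

Each chunk is then embedded (e.g. with the configured embedding model) and upserted into the vector store; at chat time the query is embedded the same way and the nearest chunks are injected into context.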
API Routing

LiteLLM-based API routing

Centralize LLM API traffic from enterprise apps and local agents with per-user virtual API keys, guardrails, spend observability, and budget enforcement.

Claude CLI · Codex CLI · Cursor IDE · VS Code IDE · OpenClaw · Obsidian · Chatbot Agents → user virtual API keys → LiteLLM (routing · guardrails · budgets · observability) → enterprise API keys → LLM providers: Anthropic, OpenAI, Google, OpenRouter, Databricks, local inference
ControlTower

Admin dashboard

Centralized model management, provider configuration, system prompts, and deployment controls. One panel for your entire AI stack.

admin.yourdomain.com/dashboard/admin?section=models
Control Tower
A

Models

Model · Display Name · Access · Health
cowork-agent (agent) · Cowork Agent · 2 Teams · 2/2 OK
llm-council (agent) · LLM Council · 2 Teams · 1/1 OK
sales-genie (agent, Databricks) · Sales Genie Space · Sales Team · 1/1 OK
claude-opus-4-6 (Anthropic) · Claude Opus 4.6 · All Teams · 2/2 OK
gpt-5-5 (OpenAI) · GPT-5.5 · All Teams · 1/1 OK
gemini-3-1-pro (Gemini) · Gemini 3.1 Pro · All Teams · 1/1 OK
gemini-3-flash (Gemini) · Gemini 3 Flash · All Teams · 1/1 OK
qwen3-next-80b (Databricks) · Qwen 3 Next 80B · All Teams · 1/1 OK
gpt-image-2 (image, OpenAI) · GPT Image 2 · All Teams · 1/1 OK
nano-banana-2 (image, Gemini) · Nano Banana 2 · All Teams · 1/1 OK
text-embedding-3-small (embedding, OpenAI) · Text Embedding 3 Small · All Teams · 1/1 OK

Users

User Email Role Teams Last Active Actions
JC
Jane Chen
jane.chen@acme.com Admin Platform 2 min ago
MR
Mike Rodriguez
mike.r@acme.com User Engineering 14 min ago
SP
Sarah Park
sarah.p@acme.com User Design 1 hr ago
AK
Alex Kim
alex.kim@acme.com Team Manager Engineering 3 hrs ago
LW
Lisa Wang
lisa.w@acme.com User Marketing 1 day ago

Providers

Anthropic (Direct API Key) · connected · Models: 4 · Latency: 142 ms · Uptime: 99.9%
OpenAI (API Key) · connected · Models: 3 · Latency: 98 ms · Uptime: 99.8%
AWS Bedrock (IAM Role) · connected · Models: 2 · Latency: 167 ms · Uptime: 99.7%
Google AI (Service Account) · degraded · Models: 1 · Latency: 312 ms · Uptime: 97.2%
Databricks (PAT Token) · connected · Models: 2 · Latency: 186 ms · Uptime: 99.6%

Teams

Engineering · 12 members · active · Budget: $2,400/mo · Used: $1,847 · Models: 6
Design · 5 members · active · Budget: $800/mo · Used: $612 · Models: 4
Marketing · 8 members · active · Budget: $600/mo · Used: $423 · Models: 3

Groups

Group · Source · Members · Teams
engineering-all · Azure AD · 24 · Engineering, Platform
design-team · Azure AD · 8 · Design
marketing-all · Azure AD · 12 · Marketing

Analytics & Logs

Total Requests: 48,291 (+12.4%)
Total Spend: $3,847 (+8.2%)
Avg Latency: 156 ms (-5.1%)
Error Rate: 0.3% (-0.1%)
Usage Over Time (daily spend chart, Feb 1 – Mar 1)
Datetime · Model · User · Cached · Input Tok · Output Tok · Duration · Status
2026-03-01 14:32:01 · claude-sonnet-4-6 · jane.chen · 12,480 · 1,204 · 847 · 1.2 s · 200
2026-03-01 14:31:58 · gpt-4.1 · mike.r · 0 · 3,820 · 1,204 · 0.8 s · 200
2026-03-01 14:31:45 · claude-opus-4-6 · alex.kim · 8,192 · 2,560 · 5,102 · 3.4 s · 200
2026-03-01 14:31:32 · gemini-2.5-flash · sarah.p · 4,096 · 512 · 892 · 0.4 s · 200

General Configuration

User Limits
Default New User Budget
$400
Budget Reset Period
30d
Max API Keys Per User
10
Chat App
Default Model
claude-sonnet-4-5-20250929
Key Budgets
Cowork Agent Key Budget
$100
Chat App Key Budget
$100
New API Key Budget
$50
Budget Alerts
Per-Key Alerts
enabled · threshold: 0.7
Per-User Alerts
enabled · threshold: 0.7
Team Settings
Default Team
Default Team

Notifications

Budget Alerts
enabled
Health Alerts
enabled
Slack Webhook
https://hooks.slack.com/...4xK
Email Recipients
admin@acme.com, ops@acme.com
Broadcast Alert to Users
All Users Engineering Research
2026-03-05 04:00 UTC

MCP Servers

Server · Type · Tools · Access · Status
Context7 · http · 2 tools · All Users · active
Web Search and Fetch · sse · 2 tools · Marketing Team · active

Guardrails

Rule · Scope · Action · Status
PII Detection · All Models · Block & Alert · active
Prompt Injection · Cowork Agent · Block · active
Secrets Detection · All Models · Block & Alert · active

Localized Content

Manage system prompts, help content, and other localized strings

General
Name · Preview
Budget Alert Key
Last updated: 3/2/2026
Your {platformName} API Key "{displayName}" has used {pct}% ...
Setup Instructions
Last updated: 3/2/2026
export ANTHROPIC_BASE_URL={{baseUrl}} export ANTHROPIC_A...
System Prompts
Name · Preview
Default
Last updated: 3/2/2026
You are Chat App, an access interface to AI Language Models ...
Cowork Agent
Last updated: 3/2/2026
COWORK AGENT SYSTEM PROMPT (AMENDS PREVIOUS INSTRUC...

Configs Deployment

LiteLLM Config
synced · Last deployed 4 min ago
Chat App Config
pending · 2 changes waiting
Guardrails Config
synced · Last deployed 1 hr ago
User Portal

User dashboard

Personal budget tracking, API key management, and per-request usage logs — full visibility into your own AI consumption.

app.yourdomain.com/dashboard
SecondStack
d.kowalski

Budget & Spend

d.kowalski
Chat App — Standard Spend: $312 · Budget: $450
Chat App — Agent Mode Spend: $47 · Budget: $180
API Keys
VSCode Spend: $142 · Budget: $200
local-testr-rag-agent Spend: $89 · Budget: $150
Remaining unallocated budget $460

API Keys

Manage your API access keys

Name Team Key Spend (current period) Budget Created Actions
RAG Pipeline Agent Default Team sk-...Mx4p $0.18 $50.00 04/17/2026 10:36
Data Export Script Default Team sk-...K9rw $0.00 $30.00 03/04/2026 08:54
Analysis Notebook Default Team sk-...Bz7n $0.00 $25.00 03/03/2026 11:15
Local Dev Testing Default Team sk-...Qh2s $0.00 $40.00 02/03/2026 13:19
Slack Bot Connector Default Team sk-...Ty6v $0.00 $35.00 02/02/2026 02:30

My Usage & Logs

Usage Over Time
Daily spend, Apr 5 – May 1
Total Requests 3,247
Total Spend $329.58
Input Tokens 328.2 mln
Output Tokens 3.1 mln
Showing 1 – 25 of 3,247 records  ·  Page 1 of 130
Time (EDT) · Status · API Endpoint · Cost · Duration · API Key · Model · Cached In · New In · Output
2026-05-03 18:49:14 · success · /v1/messages (anthropic_messages) · $0.05 · 1.60 s · Chat App · claude-sonnet-4-6 · 165,956 · 101 · 51
2026-05-03 18:44:00 · success · /v1/messages (anthropic_messages) · $0.05 · 840 ms · Chat App · claude-sonnet-4-6 · 165,509 · 448 · 34

API Playground

Pick an endpoint, an API key, and a model — the cURL preview reflects the exact request that will be sent.

Local Dev Testing
Anthropic Messages (/v1/messages)
claude-haiku-4-5-20251001
With web search tool
Why is the sky blue? Explain in 20 words
curl -s "https://litellm.yourdomain.com/v1/messages" \
  -H "Authorization: Bearer sk-...Qh2s" \
  -H "Content-Type: application/json" \
  -d '{"model":"claude-haiku-4-5-20251001","max_tokens":1024,"messages":[{"role":"user","content":[{"type":"text","text":"Why is the sky blue? Explain in 20 words","cache_control":{"type":"ephemeral"}}]}]}' | jq

Up and Running in Minutes

Three steps from zero to a production AI platform.

01

Deploy

Provision a Linux VM and run the platform setup. Docker Compose spins up everything — PostgreSQL, Redis, Meilisearch, and the rest — pre-configured.

bash ~/SecondStack
# Initialize the platform
cd SecondStack
./dev-setup.sh
just init-full
        
02

Connect providers

Add your LLM provider API keys through ControlTower. Supports Anthropic, OpenAI, Google AI Studio, OpenRouter, and Databricks, or your own inference endpoints. Manage access and pricing.

Anthropic
Anthropic
OpenAI
OpenAI
Gemini
Google AI Studio
OpenRouter
OpenRouter
Databricks
Databricks
03

Manage Users and Teams

Set up user teams, assign budgets, and synchronize with your enterprise IdP groups. Control spend and budgets at the team level.

Engineering $342 / $500
Product $89 / $200
Design $56 / $150
Support $21 / $100
Marketing $134 / $300

Self-Hosted. Deploy Anywhere.

Run SecondStack on your own infrastructure, with our help.

Enterprise

Contact Us

Tailored to your needs

For organizations deploying SecondStack on their own infrastructure.

  • All platform features
  • Docker Compose & Kubernetes deployment
  • Priority support & SLA
  • Custom integrations
  • Deployment assistance
  • Security review & hardening
Contact Us

Frequently asked questions

Can't find your answer? Get in touch.

Why self-host AI instead of using ChatGPT Enterprise or similar SaaS?
Data sovereignty — conversations, documents, and usage data never leave your infrastructure. You also get true multi-model flexibility (not locked to one vendor), granular cost controls per team, and the ability to customize behavior through system prompts, guardrails, and agent skills. No per-seat licensing — deploy for your entire organization at infrastructure cost only.
Why does an organization need an LLM routing gateway?
Without a gateway, teams end up with scattered API keys, no visibility into spend, and no guardrails. SecondStack centralizes all LLM traffic — from chat users, internal apps, agents, and IDE plugins — through LiteLLM. Every request gets virtual API key authentication, per-user and per-team budget enforcement, usage analytics, and guardrails for PII and secrets. Your apps call OpenAI-compatible endpoints (/chat/completions, /responses) with virtual keys, and the gateway handles routing, observability, and policy.
LiteLLM vs Databricks AI Gateway
Both are viable LLM gateways. Databricks AI Gateway is attractive when an organization is already heavily invested in the Databricks platform and wants LLM usage billed through DBUs. SecondStack uses LiteLLM, which gives broader provider coverage and richer team-level controls.
Capability · SecondStack (LiteLLM) · Databricks AI Gateway
  • Hosts major models in Databricks' own infrastructure (SecondStack: can use Databricks as an upstream provider)
  • Bills usage to the Databricks account in DBUs (SecondStack: can use Databricks as an upstream provider)
  • Can route LLM API requests to major external providers
  • Supports /chat/completions, /messages, and /responses APIs for coding agents
  • Load balancing and fallback model endpoints
  • Users can self-create virtual API keys (Databricks: via the "coding agents" UI)
  • Captures detailed usage data and shows usage analytics dashboards
  • Attributes usage to end-users
  • Aggregates usage to "Teams" (departments)
  • Access control to models by users / groups / teams
  • Flexible user-level and team-level budgets and alerts
  • Self-serve supervision by department managers
  • AI Guardrails (SecondStack: flexible and expandable; Databricks AI Gateway: limited, basic, not configurable)
What does deployment look like?
The simplest deployment is on a virtual machine using Docker Compose. Clone the repo, set a few environment variables in the .env.local file, and run the init script (details in the README). For production, expose the VM to the internet, allow inbound port 443, and assign a DNS domain name. For enterprise SSO, create an EntraID or Keycloak OAuth App Client.
What are the minimal deployment requirements?
  • Linux VM with 4 vCPU, 16 GB RAM, 200 GB storage
  • DNS name with subdomains for user access pointing to the VM
  • OAuth client credentials in your IdP (EntraID, Keycloak, Okta) for SSO
  • External LLM API provider credentials (API keys, URL endpoints)
See more details in README.md.
How does user authentication work?
SecondStack relies on the OAuth protocol. You can run it standalone using the built-in Authentik OAuth core to manage users, but enterprise deployments normally connect it to an upstream identity provider such as Azure AD (EntraID), Okta, or Keycloak.
Which LLM providers and models are supported?
All major providers via LiteLLM: Anthropic Claude, OpenAI, Azure Foundation Models, Google AI Studio, and thousands of open models through OpenRouter, Databricks, etc. You can also connect to locally served open-weights models if you run your own GPU infrastructure. Models are grouped by capability — chat, agentic, image generation — and can be enabled or restricted per team.
What is Cowork Agent?
Cowork Agent is an agentic AI mode in the Chat App. It runs Claude Code CLI inside sandboxed Docker containers with configurable skills — file operations, browser automation, image generation, diagram authoring, office documents, persistent memory, and more. Each session gets an isolated workspace. Think of it as a capable AI coworker, not just a chatbot.
What about data privacy and compliance?
Everything runs on your infrastructure — chat history, documents, user data, and usage logs stay in your PostgreSQL database. You control retention, backups, and access. Frontier LLM providers normally promise to limit data retention for API traffic, and many providers even offer Zero Data Retention guarantees. The platform also includes a guardrails service that can detect and block PII and secrets before they reach LLM providers.

Ready to own your AI infrastructure?

Deploy SecondStack on your infrastructure, with our help. Built for teams that demand control.