Open Source & Self-Hosted

Own Your AI Infrastructure

A self-hosted, open-source platform for deploying, managing, and scaling conversational AI across your organization. Multi-model. Enterprise-grade. Yours to control.

chat.yourdomain.com
Claude Opus 4.6
Agents
Chat Models
Image Generation

Built for Organizations that take AI seriously

Keep sensitive data (chat threads, attached documents, usage logs) inside your organization's perimeter.

But we still use external LLM API providers...?

Even when external LLM providers are in the loop, API access carries a smaller risk and exposure profile than a full SaaS platform. API requests are typically retained for a shorter period and are better isolated than a ChatGPT-like web app that hosts all of your data indefinitely and presents a much larger attack surface. Many reputable LLM providers also offer Zero Data Retention (ZDR) guarantees.

A specialized guardrails module compatible with LiteLLM, and configurable via the ControlTower admin dashboard.

Guardrails pipeline: an incoming LiteLLM request passes through a fast, high-recall detection heuristic that produces candidates; a cache dedup lookup forwards only new candidates to a local LLM classifier, which returns true or false; on a true positive the request fails or is redacted (block, mask, log). Triggered per-request inside the LiteLLM guardrails callback.
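In code, the per-request flow could look roughly like the following minimal sketch. The regex heuristic, cache shape, and redaction policy are illustrative stand-ins, not the shipped guardrails module:

import hashlib
import re

# Illustrative heuristic: long, high-entropy-looking tokens become candidates.
CANDIDATE_RE = re.compile(r"[A-Za-z0-9_\-]{24,}")

# Dedup cache of earlier verdicts, keyed by candidate hash.
verdict_cache: dict[str, bool] = {}

def classify_with_local_llm(candidate: str, context: str) -> bool:
    # Placeholder: the local LLM classifier is sketched under the next heading.
    return False

def guardrail_check(prompt: str) -> str:
    """Heuristic -> cache dedup -> LLM classifier -> mask on a true positive."""
    redactions = []
    for match in CANDIDATE_RE.finditer(prompt):
        candidate = match.group(0)
        key = hashlib.sha256(candidate.encode()).hexdigest()
        if key not in verdict_cache:          # only new candidates reach the classifier
            context = prompt[max(0, match.start() - 80): match.end() + 80]
            verdict_cache[key] = classify_with_local_llm(candidate, context)
        if verdict_cache[key]:                # true positive
            redactions.append(candidate)
    for candidate in redactions:              # mask; a stricter policy could block and log instead
        prompt = prompt.replace(candidate, "[REDACTED]")
    return prompt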
LLM classifier for Secrets (example)

Classifies whether a detected candidate string is a real credential (password, API key, token, private key) or a harmless match (code identifier, hash, UUID, public key, color code, placeholder). Examples of NOT a credential: documentation placeholders, commit hashes, package hashes, version numbers, color codes, PNG data, SVG strokes, file names, dates. Examples of a real credential: randomly-generated passwords, API keys, tokens, secrets, private keys that don't look like demo values. The classifier receives both the flagged string and surrounding context to make an informed decision.
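A minimal sketch of what such a classifier call could look like, assuming a small local model reachable through LiteLLM; the model name, prompt wording, and True/False protocol are illustrative rather than the actual shipped prompt:

import litellm

def is_real_credential(candidate: str, context: str) -> bool:
    """Ask a small local model whether a flagged string is a real credential."""
    resp = litellm.completion(
        model="ollama/llama3.1",   # assumed local model; any LiteLLM-routable model works
        messages=[{
            "role": "user",
            "content": (
                "Decide whether the flagged string is a REAL credential (password, API key, "
                "token, private key) or a harmless match (identifier, hash, UUID, public key, "
                "color code, placeholder). Answer with exactly True or False.\n\n"
                f"Flagged string: {candidate}\n"
                f"Surrounding context: {context}"
            ),
        }],
        temperature=0,
    )
    return resp.choices[0].message.content.strip().lower().startswith("true")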

LLM classifier for Customer PII (example)

Classifies whether a detected candidate string is sensitive customer PII that should not be sent to an external LLM API. Examples of NOT PII for this purpose: placeholder values ("John Doe", "test@test.com"), company names, product names, employee business contacts (e.g. from email headers or signatures), business email addresses, internal customer IDs, fictional characters, and general mentions of public figures (e.g. "President George W. Bush"). Depending on company policy, even an isolated one-off occurrence of a low-sensitivity PII element, such as a customer's email address in a troubleshooting context, may be allowed if it is not accompanied by other highly sensitive revealing data. Examples of real PII: data extracted from database records or user profiles, customer PII appearing at scale in structured data, and high-sensitivity identifiers like SSNs or credit card numbers. The classifier uses surrounding context to distinguish a customer database dump from routine business correspondence.

Pay your LLM API providers for actual compute consumption, not per-user subscriptions.

In larger organizations, most users are low-volume; API consumption-based pricing is typically far more cost-effective than flat per-seat fees.
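A back-of-the-envelope illustration (every figure below is an assumption, not a quoted price or a measurement):

# Illustrative only: per-seat price, user count, and per-user API spend are assumptions.
users = 200
per_seat_fee = 30.0          # assumed flat SaaS price per user per month
api_spend_heavy = 40.0       # assumed monthly API spend of a heavy user
api_spend_light = 2.0        # assumed monthly API spend of a typical low-volume user
heavy_share = 0.10           # assume 10% of users are heavy

saas_cost = users * per_seat_fee
api_cost = users * (heavy_share * api_spend_heavy + (1 - heavy_share) * api_spend_light)

print(f"Per-seat SaaS:     ${saas_cost:,.0f}/month")   # $6,000/month
print(f"Consumption-based: ${api_cost:,.0f}/month")    # $1,160/month

Even with generous per-user API spend assumed for heavy users, the consumption-based total in this sketch stays well below the flat per-seat bill.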

Centralize all LLM traffic from chat users, internal apps, agents, and IDE plugins through LiteLLM.

Users create self-serve virtual API keys for authentication, with budget enforcement, logging, usage analytics, and guardrails. Your apps call standard OpenAI-compatible endpoints, and the gateway handles routing, failover, load balancing, observability, and policy across all supported models.
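For example, an internal app can talk to the gateway with the stock OpenAI SDK; the base URL, virtual key, and model name below are placeholders for your own deployment:

from openai import OpenAI

# Point the standard OpenAI SDK at the self-hosted gateway instead of api.openai.com.
client = OpenAI(
    base_url="https://chat.yourdomain.com/v1",    # assumed gateway path; adjust to your deployment
    api_key="sk-virtual-key-from-the-dashboard",  # self-serve virtual key
)

# Model names are whatever the admin has exposed in ControlTower.
resp = client.chat.completions.create(
    model="claude-opus-4-6",
    messages=[{"role": "user", "content": "Summarize our Q3 incident report."}],
)
print(resp.choices[0].message.content)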

An open platform means simplified integration with your existing toolchain, no vendor lock-in, and full ability to customize.

Inspect every line, contribute back, or fork. Your deployment is never held hostage by a vendor's roadmap.

Architecture Overview

A modular, self-hosted stack built on proven open-source foundations.

Architecture diagram (summary): users and enterprise apps, agents, and IDEs enter through an OpenResty ingress with Authentik SSO. The Chat App UI provides multi-model chat with agentic capabilities; the User App Dashboard offers self-serve API keys and usage reporting; the Control Tower admin UI manages models and providers, guardrails, budgets, and teams. All LLM traffic flows through LiteLLM (/chat/completions, /responses, /messages), which acts as the routing gateway with virtual API keys and budget enforcement. Supporting services include the containerized Cowork Agent (agent harness plus Skills), Kreuzberg and Docling for content extraction and document processing, and a guardrails service for pre-flight secrets and PII checks. PostgreSQL, Qdrant, and Redis provide the database, vector store, and cache, running on Kubernetes or Docker Compose. LLM providers: Anthropic Claude, OpenAI GPT, Google Gemini, OpenRouter (100+ models), and local inference via vLLM. Layers: SecondStack software (open source), open LLM-ops software (FOSS), foundational platform stack (FOSS).

Everything your team needs

A complete conversational AI platform, not just a chat interface.

Chat App

Multi-model chat

Claude, GPT, Gemini, local models — chat, agents, and image generation all in one place.

Agents
Chat Models
Image Generation
AI Agent

Cowork Agent

Cowork Agent runs in sandboxed containers with dozens of Skills: file operations, browser automation, image generation, diagram authoring, office documents, persistent memory, and more.

Cowork Agent (sandboxed, skillful): web, .docx, .csv, .pptx, charts, PDF.
Cost Management

Budget & spend controls

Per-user, per-team, and per-API-key budgets. Threshold alerts, spend forecasting, and usage analytics down to the request level.
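Keys can also be created programmatically through the gateway. A sketch using LiteLLM's key-management endpoint; the URL and admin key are placeholders, and the field names should be checked against the LiteLLM docs for your version:

import requests

# Create a virtual key with a hard monthly budget via the gateway's /key/generate endpoint.
resp = requests.post(
    "https://chat.yourdomain.com/key/generate",
    headers={"Authorization": "Bearer <admin-or-master-key>"},
    json={
        "key_alias": "tg-support-bot",
        "max_budget": 60.0,        # hard cap in USD
        "budget_duration": "30d",  # budget resets every 30 days
    },
    timeout=30,
)
print(resp.json()["key"])  # the virtual key to hand to the bot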

Budget Overview February 2026
Total Platform Spend $1,087 / $1,700
Remaining: $613
Engineering $642 / $1,000
Chat App $380
Cowork Agent $198
API Keys $64
Product $289 / $500
Chat App $245
Cowork Agent $32
API Keys $12
Marketing $156 / $200
78% of budget used
Chat App $32
Cowork Agent $12
API Keys $112
Marketing Team is approaching budget: 78%
View Details →
d.kowalski
Chat App — Standard Spend: $312 · Budget: $450
Chat App — Agent Mode Spend: $47 · Budget: $180
API Keys
VSCode Spend: $142 · Budget: $200
local-testr-rag-agent Spend: $89 · Budget: $150
openclaw Spend: $38 · Budget: $80
tg-support-bot Spend: $22 · Budget: $60
Remaining unallocated budget $380
Ranked by total spend this period
a.chen
$268
j.martinez
$251
d.kowalski
$189
r.patel
$176
s.nakamura
$162
l.dubois
$148
m.oconnor
$97
k.berg
$82
n.volkov
$64
p.santos
$51
e.wright
$38
h.li
$24
c.johnson
$18
f.mueller
$11
b.kim
$6
API Routing

LiteLLM-based API Routing

Centralize LLM API traffic from enterprise apps and local agents, with per-user virtual API keys, guardrails, spend observability, and budget enforcement.

Routing diagram: local agents and tools (OpenClaw, Obsidian, VS Code) authenticate with user virtual API keys; LiteLLM handles routing, guardrails, budgets, and observability, and calls the LLM providers (Anthropic, OpenAI, Google, OpenRouter) with enterprise API keys.
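For instance, a local agent or IDE plugin can point an Anthropic-style client at the gateway's /messages endpoint using its own virtual key; the base URL and model name below are placeholders:

import anthropic

# Talk to the gateway's Anthropic-compatible /messages endpoint instead of api.anthropic.com.
client = anthropic.Anthropic(
    base_url="https://chat.yourdomain.com",       # assumed gateway root; adjust to your deployment
    api_key="sk-virtual-key-from-the-dashboard",  # self-serve key with its own budget
)

msg = client.messages.create(
    model="claude-opus-4-6",                      # any model the admin exposes in ControlTower
    max_tokens=512,
    messages=[{"role": "user", "content": "Draft a release note for v2.3."}],
)
print(msg.content[0].text)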
ControlTower

Admin dashboard

Centralized model management, provider configuration, system prompts, and deployment controls. One panel for your entire AI stack.

admin.yourdomain.com/dashboard/admin?section=models
Control Tower
A

Models

Model Display Name Access Status Health Order Actions
cowork-agent(agent)
Cowork Agent 2 Teams active 2/2 OK
llm-council(agent)
LLM Council 2 Teams active 1/1 OK
Anthropic
claude-opus-4-6
Claude Opus 4.6 All Teams active 2/2 OK
OpenAI
gpt-5-2
GPT-5.2 All Teams active 1/1 OK
Gemini
gemini-3-1-pro
Gemini 3.1 Pro All Teams active 1/1 OK
Gemini
gemini-3-flash
Gemini 3 Flash All Teams active 1/1 OK
OpenAI
gpt-image-1-5(image_generation)
GPT Image 1.5 All Teams active 1/1 OK
Gemini
nano-banana-2(image_generation)
Nano Banana 2 Hidden disabled

Users

User Email Role Teams Last Active Actions
JC
Jane Chen
jane.chen@acme.com Admin Platform 2 min ago
MR
Mike Rodriguez
mike.r@acme.com User Engineering 14 min ago
SP
Sarah Park
sarah.p@acme.com User Design 1 hr ago
AK
Alex Kim
alex.kim@acme.com Team Manager Engineering 3 hrs ago
LW
Lisa Wang
lisa.w@acme.com User Marketing 1 day ago

Providers

A
Anthropic Direct API Key
connected
Models: 4
Latency: 142ms
Uptime: 99.9%
O
OpenAI API Key
connected
Models: 3
Latency: 98ms
Uptime: 99.8%
AWS
AWS Bedrock IAM Role
connected
Models: 2
Latency: 167ms
Uptime: 99.7%
G
Google AI Service Account
degraded
Models: 1
Latency: 312ms
Uptime: 97.2%

Teams

E
Engineering 12 members
active
Budget: $2,400/mo
Used: $1,847
Models: 6
D
Design 5 members
active
Budget: $800/mo
Used: $612
Models: 4
M
Marketing 8 members
active
Budget: $600/mo
Used: $423
Models: 3

Groups

Group · Source · Members · Teams
engineering-all · Azure AD · 24 · Engineering, Platform
design-team · Azure AD · 8 · Design
marketing-all · Azure AD · 12 · Marketing

Analytics & Logs

Total Requests: 48,291 (+12.4%)
Total Spend: $3,847 (+8.2%)
Avg Latency: 156ms (-5.1%)
Error Rate: 0.3% (-0.1%)
Usage Over Time
(Daily spend chart, $0–$300 range, Feb 1 – Mar 1)
Datetime · Model · User · Cached · Input Tok · Output Tok · Duration · Status
2026-03-01 14:32:01 · claude-sonnet-4-6 · jane.chen · 12,480 · 1,204 · 847 · 1.2s · 200
2026-03-01 14:31:58 · gpt-4.1 · mike.r · 0 · 3,820 · 1,204 · 0.8s · 200
2026-03-01 14:31:45 · claude-opus-4-6 · alex.kim · 8,192 · 2,560 · 5,102 · 3.4s · 200
2026-03-01 14:31:32 · gemini-2.5-flash · sarah.p · 4,096 · 512 · 892 · 0.4s · 200

General Configuration

User Limits
Default New User Budget
$400
Budget Reset Period
30d
Max API Keys Per User
10
Chat App
Default Model
claude-sonnet-4-5-20250929
Key Budgets
Cowork Agent Key Budget
$100
Chat App Key Budget
$100
New API Key Budget
$50
Budget Alerts
Per-Key Alerts
enabled · threshold: 0.7
Per-User Alerts
enabled · threshold: 0.7
Team Settings
Default Team
Default Team

Notifications

Budget Alerts
enabled
Health Alerts
enabled
Slack Webhook
https://hooks.slack.com/...4xK
Email Recipients
admin@acme.com, ops@acme.com
Broadcast Alert to Users
All Users Engineering Research
2026-03-05 04:00 UTC

MCP Servers

Server · Type · Tools · Status
Context7 · stdio · 2 tools · active
Web Search and Fetch · sse · 2 tools · active

Guardrails

Rule · Scope · Action · Status
PII Detection · All Models · Block & Alert · active
Prompt Injection · Cowork Agent · Block · active
Secrets Detection · All Models · Block & Alert · active

Localized Content

Manage system prompts, help content, and other localized strings

General
Name · Preview
Budget Alert Key
Last updated: 3/2/2026
Your {platformName} API Key "{displayName}" has used {pct}% ...
Setup Instructions
Last updated: 3/2/2026
export ANTHROPIC_BASE_URL={{baseUrl}} export ANTHROPIC_A...
System Prompts
Name · Preview
Default
Last updated: 3/2/2026
You are Chat App, an access interface to AI Language Models ...
Cowork Agent
Last updated: 3/2/2026
COWORK AGENT SYSTEM PROMPT (AMENDS PREVIOUS INSTRUC...

Configs Deployment

LiteLLM Config
synced · Last deployed 4 min ago
Chat App Config
pending · 2 changes waiting
Guardrails Config
synced · Last deployed 1 hr ago

Up and Running in Minutes

Three steps from zero to a production AI platform.

01

Deploy

Clone the repo and run Docker Compose. The full platform spins up pre-configured: PostgreSQL, Redis, Meilisearch, and the rest of the stack.

bash ~/SecondStack
# Clone & initialize
git clone https://github.com/SecondStack-AI/SecondStack
cd SecondStack
./dev-setup.sh
just init-full
02

Connect providers

Add your LLM provider API keys through ControlTower. Supports Anthropic, OpenAI, Google Gemini, OpenRouter, or your own inference endpoints. Manage access and pricing. A minimal routing-config sketch follows the provider list below.

Anthropic
OpenAI
Gemini
OpenRouter
vLLM
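Under the hood, provider configuration maps onto a LiteLLM model list. A minimal sketch using LiteLLM's Router; in SecondStack you would configure this through ControlTower, and the provider routes, endpoints, and keys below are illustrative:

from litellm import Router

router = Router(model_list=[
    {
        "model_name": "claude-opus-4-6",              # name exposed to users and apps
        "litellm_params": {
            "model": "anthropic/claude-opus-4-6",     # provider route
            "api_key": "<anthropic-api-key>",         # placeholder
        },
    },
    {
        "model_name": "local-llama",                  # an open model served locally
        "litellm_params": {
            "model": "openai/llama-3",                # vLLM exposes an OpenAI-compatible API
            "api_base": "http://vllm.internal:8000/v1",
            "api_key": "none",
        },
    },
])

# Requests address models by their exposed name; the Router picks the deployment.
resp = router.completion(
    model="claude-opus-4-6",
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)

LiteLLM also lets several deployments share one model_name, which is the basis for the failover and load balancing mentioned earlier.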
03

Manage Users and Teams

Set up user teams, assign budgets, and synchronize with your enterprise IdP groups. Control spend and budgets at the team level.

Engineering $342 / $500
Product $89 / $200
Design $56 / $150
Support $21 / $100
Marketing $134 / $300

Open Source. Deploy Anywhere.

SecondStack is released under a free, open-source license. Run it on your own hardware, or let us help.

Community

Free

Open source, forever

Full platform for teams getting started with self-hosted AI.

  • All platform features
  • Docker Compose deployment
  • Community support via GitHub
  • Multi-model chat interface
  • Team & budget management
  • Identity federation
Deploy Now

Enterprise

Custom

Tailored to your needs

For organizations requiring dedicated support and advanced deployment.

  • Everything in Community
  • Kubernetes deployment support
  • Priority support & SLA
  • Custom integrations
  • Deployment assistance
  • Security review & hardening
Contact Us

Frequently asked questions

Can't find your answer? Open an issue on GitHub.

Why self-host AI instead of using ChatGPT Enterprise or similar SaaS?
Data sovereignty — conversations, documents, and usage data never leave your infrastructure. You also get true multi-model flexibility (not locked to one vendor), granular cost controls per team, and the ability to customize behavior through system prompts, guardrails, and agent skills. No per-seat licensing — deploy for your entire organization at infrastructure cost only.
Why does an organization need an LLM routing gateway?
Without a gateway, teams end up with scattered API keys, no visibility into spend, and no guardrails. SecondStack centralizes all LLM traffic — from chat users, internal apps, agents, and IDE plugins — through LiteLLM. Every request gets virtual API key authentication, per-user and per-team budget enforcement, usage analytics, and guardrails for PII and secrets. Your apps call OpenAI-compatible endpoints (/chat/completions, /responses) with virtual keys, and the gateway handles routing, observability, and policy.
What does deployment look like?
The simplest deployment is a single virtual machine running Docker Compose. Clone the repo, set a few environment variables in the .env.local file, and run the init script (details in the README). For production, expose the VM to the internet, allow inbound traffic on port 443, and assign a DNS domain name. For enterprise SSO, create an OAuth app client in EntraID or Keycloak.
What are the minimal deployment requirements?
• Linux VM with 4 vCPU, 16 GB RAM, 200 GB storage
• DNS name with subdomains for user access pointing to the VM
• OAuth Client credentials in your IdP (EntraID, Keycloak, Okta) for SSO
• External LLM API provider credentials (API keys, URL endpoints)
See more details in README.md.
How does user authentication work?
SecondStack relies on the OAuth protocol. You can run it standalone using the built-in Authentik OAuth core to manage users, but enterprise deployments normally connect it to an upstream identity provider such as Azure AD (EntraID), Okta, or Keycloak.
Which LLM providers and models are supported?
All major providers via LiteLLM: Anthropic Claude, OpenAI GPT, Google Gemini, and 100+ models through OpenRouter. You can also serve open-source models locally with vLLM. Models are grouped by capability — chat, agentic, image generation — and can be enabled or restricted per team.
What is Cowork Agent?
Cowork Agent is an agentic AI mode in the Chat App. It runs Claude Code CLI inside sandboxed Docker containers with configurable skills — file operations, browser automation, image generation, diagram authoring, office documents, persistent memory, and more. Each session gets an isolated workspace. Think of it as a capable AI coworker, not just a chatbot.
What about data privacy and compliance?
Everything runs on your infrastructure — chat history, documents, user data, and usage logs stay in your PostgreSQL database. You control retention, backups, and access. Frontier LLM providers normally promise to limit data retention for API traffic, and many providers even offer Zero Data Retention guarantees. The platform also includes a guardrails service that can detect and block PII and secrets before they reach LLM providers.

Ready to own your AI infrastructure?

Deploy SecondStack in minutes. Open source, self-hosted, and built for teams that demand control.