Self-Hosted

Own Your AI Infrastructure

A self-hosted platform for deploying, managing, and scaling conversational AI across your organization. Multi-model. Enterprise-grade. Yours to control.

chat.yourdomain.com
Claude Opus 4.6
Agents
Chat Models
Image Generation

Built for Organizations that take AI seriously

Data sovereignty — conversations, attached documents, and usage data never leave your infrastructure.

You also get true multi-model flexibility (not locked to one vendor), granular cost controls per team, and the ability to customize behavior through system prompts, guardrails, and agent skills.

But we still use external LLM API providers...?

Even when using external LLM providers, API access carries a smaller risk and exposure profile than full SaaS platforms. API requests are retained for a shorter time and are typically better protected than ChatGPT-like web app platforms, which host your entire data indefinitely and present a much larger attack surface. Many reputable LLM providers also offer Zero Data Retention (ZDR) guarantees.

A polished, high-quality chat experience that users actually prefer — fast streaming, rich Markdown and code rendering, math, diagrams, file and image attachments, and a clean model picker across providers.

Multi-model from a single interface — Claude, gpt-5.x, Gemini, local open-weights — plus image generation in the same workspace. Knowledge Collections can be attached to any chat for RAG or full-context grounding, and MCP Tools let chats reach into your internal systems and APIs.

Cowork Agent mode runs Claude Code in sandboxed containers with configurable Skills — file operations, browser automation, diagram and office-document authoring, persistent memory, and more — turning any chat into an agentic session when you need it.

Personas, system prompts, and shareable chat templates let teams standardize how the AI behaves for specific workflows, without locking individuals out of free-form use.

A specialized guardrails module compatible with LiteLLM and configurable via the ControlTower admin dashboard — covering PII, secrets, and other policy-sensitive content on both inbound prompts and outbound responses.

Built to be flexible: new guardrail types and policies can be added or customized on-site to match your organization's data classifications, compliance needs, and tolerance thresholds — not a fixed black-box ruleset.

Strong focus on minimizing false positives. A two-stage pipeline (fast heuristic detection followed by a context-aware LLM classifier with caching) keeps obvious non-issues from blocking legitimate work — because guardrails that misfire ruin the user experience and break agents in production.

Pipeline (triggered per-request inside the LiteLLM guardrails callback): LiteLLM request → detection heuristic (fast, high-recall candidates) → cache (dedup lookups; only new candidates proceed) → LLM classifier (local LLM, true/false) → on a true positive: fail, redact, block, mask, or log.
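The two-stage flow can be sketched as follows. This is a minimal illustration with hypothetical patterns, function names, and cache, not the platform's actual implementation:

```python
import hashlib
import re

# Stage 1: fast, high-recall heuristic. Over-matching is fine here;
# the classifier stage filters out the false positives.
CANDIDATE_RE = re.compile(r"\b(?:sk-[A-Za-z0-9]{20,}|AKIA[0-9A-Z]{16})\b")

# Stage 2 cache: an identical candidate is classified only once.
_verdict_cache: dict[str, bool] = {}

def classify_with_llm(candidate: str, context: str) -> bool:
    """Placeholder for the context-aware local LLM classifier.
    Returns True only for genuine secrets."""
    # Illustrative stand-in: treat documentation placeholders as benign.
    return "example" not in candidate.lower()

def scan_request(text: str) -> list[str]:
    """Return candidates confirmed as true positives."""
    confirmed = []
    for match in CANDIDATE_RE.finditer(text):
        candidate = match.group(0)
        key = hashlib.sha256(candidate.encode()).hexdigest()
        if key not in _verdict_cache:  # dedup lookup
            context = text[max(0, match.start() - 80):match.end() + 80]
            _verdict_cache[key] = classify_with_llm(candidate, context)
        if _verdict_cache[key]:        # true positive -> fail/redact/block/mask/log
            confirmed.append(candidate)
    return confirmed
```

The real guardrail runs inside LiteLLM's per-request callback, where a confirmed positive is failed, redacted, masked, or logged according to policy.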
LLM classifier for Secrets (example)

Classifies whether a detected candidate string is a real credential (password, API key, token, private key) or a harmless match (code identifier, hash, UUID, public key, color code, placeholder). Examples of NOT a credential: documentation placeholders, commit hashes, package hashes, version numbers, color codes, PNG data, SVG strokes, file names, dates. Examples of a real credential: randomly-generated passwords, API keys, tokens, secrets, private keys that don't look like demo values. The classifier receives both the flagged string and surrounding context to make an informed decision.

LLM classifier for Customer PII (example)

Classifies whether a detected candidate string is sensitive customer PII that should not be sent to an external LLM API. Examples of NOT PII for this purpose: placeholder values ("John Doe", "test@test.com"), company names, product names, employee business contacts (e.g. from email headers or signatures), business email addresses, internal customer IDs, fictional characters, and general mentions of public figures (e.g. "President George W. Bush"). Depending on company policy, even an isolated one-off occurrence of a low-sensitivity PII element, such as a customer's email address in a troubleshooting context, may be allowed if not accompanied by other highly sensitive revealing data. Examples of real PII: data extracted from database records or user profiles, customer PII appearing at scale in structured data, and high-sensitivity identifiers like SSNs or credit card numbers. The classifier uses surrounding context to distinguish a customer database dump from routine business correspondence.
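Operationally, each classifier is one structured call to a local model: the prompt carries the flagged span plus a window of surrounding text, and the reply is parsed into an allow/block verdict. A minimal sketch of that assembly, where the prompt wording, window size, and function names are illustrative assumptions rather than the shipped classifier:

```python
def build_classifier_prompt(candidate: str, document: str, window: int = 200) -> str:
    """Wrap a flagged candidate with surrounding context for the LLM classifier."""
    idx = max(0, document.find(candidate))
    start = max(0, idx - window)
    context = document[start:idx + len(candidate) + window]
    return (
        "Decide whether the flagged string is sensitive customer PII that must "
        "not be sent to an external LLM API. Placeholders, business contacts, "
        "and fictional names are NOT PII. Answer exactly TRUE or FALSE.\n\n"
        f"Flagged string: {candidate!r}\n"
        f"Surrounding context:\n...{context}..."
    )

def parse_verdict(model_reply: str) -> bool:
    """Map the model's TRUE/FALSE reply to a boolean.
    Unparseable replies default to True (fail closed)."""
    return not model_reply.strip().upper().startswith("FALSE")
```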

Pay your LLM API providers for actual compute consumption, not per-user subscriptions.

In larger organizations, most users are low-volume; API consumption-based pricing is typically far more cost-effective than flat per-seat fees.

Robust cost tracking and budget controls keep that model safe at scale — per-user, per-team, and per-API-key budgets with threshold alerts, spend forecasting, and usage analytics down to the individual request. Spend stays predictable, and overages are caught before they become invoices.
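The enforcement model is simple to reason about: every scope (user, team, API key) carries a budget, each request's cost is checked against remaining headroom, and an alert fires at a configurable threshold. A toy sketch of that logic, with illustrative names rather than the platform's actual data model:

```python
from dataclasses import dataclass

ALERT_THRESHOLD = 0.7  # matches the dashboard's default per-key/per-user alert

@dataclass
class Budget:
    limit: float
    spent: float = 0.0
    alerted: bool = False

    def charge(self, cost: float) -> str:
        """Record a request's cost; return the resulting state."""
        if self.spent + cost > self.limit:
            return "blocked"   # overage caught before it becomes an invoice
        self.spent += cost
        if not self.alerted and self.spent / self.limit >= ALERT_THRESHOLD:
            self.alerted = True
            return "alert"     # threshold notification fires once
        return "ok"
```

In the real deployment these checks run in the gateway on every request, so the same logic covers chat users, API keys, and agents uniformly.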

Without a gateway, teams end up with scattered API keys, no visibility into spend, and no guardrails. SecondStack centralizes all LLM traffic — from chat users, internal apps, agents, and IDE plugins — through LiteLLM.

Every request gets virtual API key authentication, per-user and per-team budget enforcement, usage analytics, and guardrails for PII and secrets.

Your apps call standard OpenAI-compatible endpoints (/chat/completions, /responses) with virtual keys, and the gateway handles routing, failover, load balancing, observability, and policy across all supported models.
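Because the gateway speaks the standard OpenAI wire format, pointing an existing client at it is a base-URL change plus a virtual key. A sketch using only the Python standard library; the gateway URL and key are placeholders:

```python
import json
import urllib.request

GATEWAY = "https://litellm.yourdomain.com"  # placeholder: your gateway's base URL

def build_chat_request(virtual_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a standard OpenAI-style /chat/completions request against the gateway.
    The virtual key is issued by the platform, not by the upstream provider."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{GATEWAY}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {virtual_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )

# Sending is the usual urlopen call; the gateway resolves the model name to a
# provider, applies guardrails and budgets, then returns the response:
# with urllib.request.urlopen(build_chat_request("sk-...", "claude-opus-4-6", "hi")) as r:
#     print(json.load(r))
```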

The embedded LiteLLM Proxy is the free, open-source build, and our team supports it end-to-end alongside the rest of the platform — one point of contact for the whole deployment.

Naturally integrates with Databricks for organizations that already run their data and AI workloads on the Lakehouse.

Use Databricks Foundation Models as an upstream LLM provider through the LiteLLM gateway — Claude, gpt-5, Gemma, Qwen, Llama, and other served endpoints billed to your Databricks account.

Invoke Databricks Mosaic AI Agents directly from the Chat UI as first-class chat profiles: Knowledge Agents, Supervisor Agents, and Custom Agents.

Expose Genie Spaces and Databricks SQL as MCP Tools to Chat App models and the Cowork Agent for natural-language analytics over your governed data.

End-to-end OAuth on behalf of end-users. Every Databricks call is authenticated as the actual user, so Unity Catalog permissions, row/column-level security, and audit trails apply to AI traffic exactly as they do to a human analyst.

Architecture Overview

A modular, self-hosted stack built on proven foundations.

Users enter through the OpenResty ingress (front door), with Authentik SSO as the auth center. Core components:

  • Chat App UI: multi-model chat with agentic capabilities
  • Containerized Cowork Agent: agent harness + Skills
  • User App Dashboard: self-serve API keys, usage reporting
  • Control Tower admin UI: models & providers, guardrails, Skills, MCP, budgets, teams
  • LiteLLM routing gateway (/chat/completions, /responses, /messages): virtual API keys, spend tracking, budget enforcement; also serves enterprise apps, agents, and IDEs
  • Guardrails: pre-flight policy checks for secrets and PII
  • Kreuzberg · Docling: content extraction and document processing
  • Data layer: PostgreSQL database, Qdrant vector DB, Redis cache
  • Runtime: Kubernetes or Docker Compose
  • LLM providers: Anthropic Claude, OpenAI gpt-5, Google Gemini, Databricks Foundation Models, OpenRouter (100+ models), vLLM local inference

Everything your team needs

A complete conversational AI platform, not just a chat interface.

Chat App

Multi-model chat

Claude, gpt-5.x, Gemini, local models — chat, agents, and image generation all in one place.

Agents
Chat Models
Image Generation
Cost Management

Budget & spend controls

Per-user, per-team, and per-API-key budgets. Threshold alerts, spend forecasting, and usage analytics down to the request level.

Total Platform Spend $798 / $1,200
Remaining: $402
Engineering $642 / $1,000
Chat App $380
API Keys $64
Marketing $156 / $200
78% of budget used
Chat App $32
API Keys $112
d.kowalski
Chat App — Standard Spend: $312 · Budget: $450
Chat App — Agent Mode Spend: $47 · Budget: $180
API Keys
VSCode Spend: $142 · Budget: $200
local-testr-rag-agent Spend: $89 · Budget: $150
Remaining unallocated budget $460
Ranked by total spend this period
a.chen
$268
j.martinez
$251
d.kowalski
$189
r.patel
$176
s.nakamura
$162
l.dubois
$148
m.oconnor
$97
k.berg
$82
n.volkov
$64
p.santos
$51
e.wright
$38
AI Agent

Cowork agent

Cowork Agent runs in sandboxed containers with dozens of Skills — file operations, browser automation, image generation, diagram authoring, office documents, persistent memory, and more.

create a hello world program in this workspace
Tools

Done. Created and ran hello_world.py — it prints "Hello, World!" as expected.

hello_world.py
Preview · Download
Attach files
Engineering Wiki
Customer Research
Sales Playbook
+ New Knowledge…
MCP servers
Local Files
Context7
Atlassian Suite
+ Manage MCP…
SSH workspace running
ssh 44a43ef4@localhost -p 2222
Open in VS Code
Auto-stop in 1h · extends on activity
Model
Claude Opus 4.7
Claude Sonnet 4.6
Claude Haiku 4.5
Sandbox files
.plans
hello_world.py 23 B
Type your message here...
Knowledge & RAG

Knowledge collections

Named document sets, referenced in chats. Files are auto-indexed for RAG retrieval. Share with teams or specific users.

System Design

File · Comment (model-visible) · Mode · Tokens
architecture-backend.md · (none) · RAG · 3k
architecture-frontend.md · (none) · RAG · 4k
sync-jobs.md · Background Sync Jobs · RAG · 1k
Explain the ControlTower flow on user deactivation
System Design
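Under the hood, auto-indexing follows the usual RAG recipe: split each file into overlapping chunks, embed them, and store the vectors (Qdrant in this stack) keyed by collection. A toy sketch of the chunking step, with sizes and overlap chosen for illustration rather than taken from the platform's configuration:

```python
def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into overlapping character windows for embedding.
    Overlap keeps sentences that straddle a boundary retrievable."""
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    chunks = []
    step = size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + size]
        if chunk:
            chunks.append(chunk)
        if start + size >= len(text):
            break
    return chunks
```

Each chunk is then embedded (e.g. with the configured embedding model) and upserted into the vector store; at chat time the query is embedded the same way and the nearest chunks are injected into context.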
API Routing

LiteLLM-based API routing

Centralize LLM API traffic from enterprise apps and local agents with per-user virtual API keys, guardrails, spend observability, and budget enforcement.

Claude CLI · Codex CLI · Cursor IDE · VS Code IDE · OpenClaw · Obsidian · Chatbot Agents → user virtual API keys → LiteLLM (routing · guardrails · budgets · observability) → enterprise API keys → LLM providers: Anthropic, OpenAI, Google, OpenRouter, Databricks, local inference
ControlTower

Admin dashboard

Centralized model management, provider configuration, system prompts, and deployment controls. One panel for your entire AI stack.

admin.yourdomain.com/dashboard/admin?section=models
Control Tower
A

Models

Model · Display Name · Access · Health
cowork-agent (agent) · Cowork Agent · 2 Teams · 2/2 OK
llm-council (agent) · LLM Council · 2 Teams · 1/1 OK
sales-genie (agent, Databricks) · Sales Genie Space · Sales Team · 1/1 OK
claude-opus-4-6 (Anthropic) · Claude Opus 4.6 · All Teams · 2/2 OK
gpt-5-5 (OpenAI) · GPT-5.5 · All Teams · 1/1 OK
gemini-3-1-pro (Gemini) · Gemini 3.1 Pro · All Teams · 1/1 OK
gemini-3-flash (Gemini) · Gemini 3 Flash · All Teams · 1/1 OK
qwen3-next-80b (Databricks) · Qwen 3 Next 80B · All Teams · 1/1 OK
gpt-image-2 (image, OpenAI) · GPT Image 2 · All Teams · 1/1 OK
nano-banana-2 (image, Gemini) · Nano Banana 2 · All Teams · 1/1 OK
text-embedding-3-small (embedding, OpenAI) · Text Embedding 3 Small · All Teams · 1/1 OK

Users

User Email Role Teams Last Active Actions
JC
Jane Chen
jane.chen@acme.com Admin Platform 2 min ago
MR
Mike Rodriguez
mike.r@acme.com User Engineering 14 min ago
SP
Sarah Park
sarah.p@acme.com User Design 1 hr ago
AK
Alex Kim
alex.kim@acme.com Team Manager Engineering 3 hrs ago
LW
Lisa Wang
lisa.w@acme.com User Marketing 1 day ago

Providers

Anthropic (Direct API Key) · connected · Models: 4 · Latency: 142 ms · Uptime: 99.9%
OpenAI (API Key) · connected · Models: 3 · Latency: 98 ms · Uptime: 99.8%
AWS Bedrock (IAM Role) · connected · Models: 2 · Latency: 167 ms · Uptime: 99.7%
Google AI (Service Account) · degraded · Models: 1 · Latency: 312 ms · Uptime: 97.2%
Databricks (PAT Token) · connected · Models: 2 · Latency: 186 ms · Uptime: 99.6%

Teams

Engineering · 12 members · active · Budget: $2,400/mo · Used: $1,847 · Models: 6
Design · 5 members · active · Budget: $800/mo · Used: $612 · Models: 4
Marketing · 8 members · active · Budget: $600/mo · Used: $423 · Models: 3

Groups

Group · Source · Members · Teams
engineering-all · Azure AD · 24 · Engineering, Platform
design-team · Azure AD · 8 · Design
marketing-all · Azure AD · 12 · Marketing

Analytics & Logs

Total Requests: 48,291 (+12.4%)
Total Spend: $3,847 (+8.2%)
Avg Latency: 156 ms (-5.1%)
Error Rate: 0.3% (-0.1%)
Usage Over Time (daily spend chart, Feb 1 – Mar 1)
Datetime · Model · User · Cached · Input Tok · Output Tok · Duration · Status
2026-03-01 14:32:01 · claude-sonnet-4-6 · jane.chen · 12,480 · 1,204 · 847 · 1.2 s · 200
2026-03-01 14:31:58 · gpt-4.1 · mike.r · 0 · 3,820 · 1,204 · 0.8 s · 200
2026-03-01 14:31:45 · claude-opus-4-6 · alex.kim · 8,192 · 2,560 · 5,102 · 3.4 s · 200
2026-03-01 14:31:32 · gemini-2.5-flash · sarah.p · 4,096 · 512 · 892 · 0.4 s · 200

General Configuration

User Limits
Default New User Budget
$400
Budget Reset Period
30d
Max API Keys Per User
10
Chat App
Default Model
claude-sonnet-4-5-20250929
Key Budgets
Cowork Agent Key Budget
$100
Chat App Key Budget
$100
New API Key Budget
$50
Budget Alerts
Per-Key Alerts
enabled · threshold: 0.7
Per-User Alerts
enabled · threshold: 0.7
Team Settings
Default Team
Default Team

Notifications

Budget Alerts
enabled
Health Alerts
enabled
Slack Webhook
https://hooks.slack.com/...4xK
Email Recipients
admin@acme.com, ops@acme.com
Broadcast Alert to Users
All Users Engineering Research
2026-03-05 04:00 UTC

MCP Servers

Server · Type · Tools · Access · Status
Context7 · http · 2 tools · All Users · active
Web Search and Fetch · sse · 2 tools · Marketing Team · active

Guardrails

Rule · Scope · Action · Status
PII Detection · All Models · Block & Alert · active
Prompt Injection · Cowork Agent · Block · active
Secrets Detection · All Models · Block & Alert · active

Localized Content

Manage system prompts, help content, and other localized strings

General
Name · Preview
Budget Alert Key
Last updated: 3/2/2026
Your {platformName} API Key "{displayName}" has used {pct}% ...
Setup Instructions
Last updated: 3/2/2026
export ANTHROPIC_BASE_URL={{baseUrl}} export ANTHROPIC_A...
System Prompts
Name · Preview
Default
Last updated: 3/2/2026
You are Chat App, an access interface to AI Language Models ...
Cowork Agent
Last updated: 3/2/2026
COWORK AGENT SYSTEM PROMPT (AMENDS PREVIOUS INSTRUC...

Configs Deployment

LiteLLM Config
synced · Last deployed 4 min ago
Chat App Config
pending · 2 changes waiting
Guardrails Config
synced · Last deployed 1 hr ago
User Portal

User dashboard

Personal budget tracking, API key management, and per-request usage logs — full visibility into your own AI consumption.

app.yourdomain.com/dashboard
SecondStack
d.kowalski

Budget & Spend

d.kowalski
Chat App — Standard Spend: $312 · Budget: $450
Chat App — Agent Mode Spend: $47 · Budget: $180
API Keys
VSCode Spend: $142 · Budget: $200
local-testr-rag-agent Spend: $89 · Budget: $150
Remaining unallocated budget $460

API Keys

Manage your API access keys

Name Team Key Spend (current period) Budget Created Actions
RAG Pipeline Agent Default Team sk-...Mx4p $0.18 $50.00 04/17/2026 10:36
Data Export Script Default Team sk-...K9rw $0.00 $30.00 03/04/2026 08:54
Analysis Notebook Default Team sk-...Bz7n $0.00 $25.00 03/03/2026 11:15
Local Dev Testing Default Team sk-...Qh2s $0.00 $40.00 02/03/2026 13:19
Slack Bot Connector Default Team sk-...Ty6v $0.00 $35.00 02/02/2026 02:30

My Usage & Logs

Usage Over Time
Daily spend, Apr 5 – May 1
Total Requests 3,247
Total Spend $329.58
Input Tokens 328.2 mln
Output Tokens 3.1 mln
Showing 1 – 25 of 3,247 records  ·  Page 1 of 130
Time (EDT) · Status · API Endpoint · Cost · Duration · API Key · Model · Cached In · New In · Output
2026-05-03 18:49:14 · success · /v1/messages (anthropic_messages) · $0.05 · 1.60 s · Chat App · claude-sonnet-4-6 · 165,956 · 101 · 51
2026-05-03 18:44:00 · success · /v1/messages (anthropic_messages) · $0.05 · 840 ms · Chat App · claude-sonnet-4-6 · 165,509 · 448 · 34

API Playground

Pick an endpoint, an API key, and a model — the cURL preview reflects the exact request that will be sent.

Local Dev Testing
Anthropic Messages (/v1/messages)
claude-haiku-4-5-20251001
With web search tool
Why is the sky blue? Explain in 20 words
curl -s "https://litellm.yourdomain.com/v1/messages" \
  -H "Authorization: Bearer sk-...Qh2s" \
  -H "Content-Type: application/json" \
  -d '{"model":"claude-haiku-4-5-20251001","max_tokens":1024,"messages":[{"role":"user","content":[{"type":"text","text":"Why is the sky blue? Explain in 20 words","cache_control":{"type":"ephemeral"}}]}]}' | jq

Up and Running in Minutes

Three steps from zero to a production AI platform.

01

Deploy

Provision a Linux VM and run the platform setup. Docker Compose spins up everything — PostgreSQL, Redis, Meilisearch, and the rest — pre-configured.

bash ~/SecondStack
# Initialize the platform
cd SecondStack
./dev-setup.sh
just init-full
        
02

Connect providers

Add your LLM provider API keys through ControlTower. Supports Anthropic, OpenAI, Google AI Studio, OpenRouter, and Databricks, or your own inference endpoints. Manage access and pricing.

Anthropic
Anthropic
OpenAI
OpenAI
Gemini
Google AI Studio
OpenRouter
OpenRouter
Databricks
Databricks
03

Manage Users and Teams

Set up user teams, assign budgets, and synchronize with your enterprise IdP groups. Control spend and budgets at the team level.

Engineering $342 / $500
Product $89 / $200
Design $56 / $150
Support $21 / $100
Marketing $134 / $300

Self-Hosted. Deploy Anywhere.

Run SecondStack on your own infrastructure, with our help.

Enterprise

Contact Us

Tailored to your needs

For organizations deploying SecondStack on their own infrastructure.

  • All platform features
  • Docker Compose & Kubernetes deployment
  • Priority support & SLA
  • Custom integrations
  • Deployment assistance
  • Security review & hardening
Contact Us

Frequently asked questions

Can't find your answer? Get in touch.

Why self-host AI instead of using ChatGPT Enterprise or similar SaaS?
Data sovereignty — conversations, documents, and usage data never leave your infrastructure. You also get true multi-model flexibility (not locked to one vendor), granular cost controls per team, and the ability to customize behavior through system prompts, guardrails, and agent skills. No per-seat licensing — deploy for your entire organization at infrastructure cost only.
Why does an organization need an LLM routing gateway?
Without a gateway, teams end up with scattered API keys, no visibility into spend, and no guardrails. SecondStack centralizes all LLM traffic — from chat users, internal apps, agents, and IDE plugins — through LiteLLM. Every request gets virtual API key authentication, per-user and per-team budget enforcement, usage analytics, and guardrails for PII and secrets. Your apps call OpenAI-compatible endpoints (/chat/completions, /responses) with virtual keys, and the gateway handles routing, observability, and policy.
LiteLLM vs Databricks AI Gateway
Both are viable LLM gateways. Databricks AI Gateway is attractive when an organization is already heavily invested in the Databricks platform and wants LLM usage billed through DBUs. SecondStack uses LiteLLM, which gives broader provider coverage and richer team-level controls.
Capability · SecondStack (LiteLLM) · Databricks AI Gateway
  • Hosts major models in Databricks' own infrastructure (SecondStack: can use Databricks as an upstream provider)
  • Bills usage to the Databricks account in DBUs (SecondStack: can use Databricks as an upstream provider)
  • Can route LLM API requests to major external providers
  • Supports /chat/completions, /messages, and /responses APIs for coding agents
  • Load balancing and fallback model endpoints
  • Users can self-create virtual API keys (Databricks: via the "coding agents" UI)
  • Captures detailed usage data and shows usage analytics dashboards
  • Attributes usage to end-users
  • Aggregates usage to "Teams" (departments)
  • Access control to models by users / groups / teams
  • Flexible user-level and team-level budgets and alerts
  • Self-serve supervision by department managers
  • AI Guardrails (SecondStack: flexible and expandable; Databricks AI Gateway: limited, basic, not configurable)
What does deployment look like?
The simplest deployment is on a virtual machine using Docker Compose. Clone the repo, set a few environment variables in the .env.local file, and run the init script (details in the README). For production, expose the VM to the internet, allow inbound port 443, and assign a DNS domain name. For enterprise SSO, create an EntraID or Keycloak OAuth App Client.
What are the minimal deployment requirements?
  • Linux VM with 4 vCPU, 16 GB RAM, 200 GB storage
  • DNS name with subdomains for user access pointing to the VM
  • OAuth client credentials in your IdP (EntraID, Keycloak, Okta) for SSO
  • External LLM API provider credentials (API keys, URL endpoints)
See more details in README.md.
How does user authentication work?
SecondStack relies on the OAuth protocol. You can run it standalone using the built-in Authentik OAuth core to manage users, but enterprise deployments normally connect it to an upstream identity provider such as Azure AD (EntraID), Okta, or Keycloak.
Which LLM providers and models are supported?
All major providers via LiteLLM: Anthropic Claude, OpenAI, Azure Foundation Models, Google AI Studio, and thousands of open models through OpenRouter, Databricks, etc. You can also connect to locally served open-weights models if you run your own GPU infrastructure. Models are grouped by capability — chat, agentic, image generation — and can be enabled or restricted per team.
What is Cowork Agent?
Cowork Agent is an agentic AI mode in the Chat App. It runs Claude Code CLI inside sandboxed Docker containers with configurable skills — file operations, browser automation, image generation, diagram authoring, office documents, persistent memory, and more. Each session gets an isolated workspace. Think of it as a capable AI coworker, not just a chatbot.
What about data privacy and compliance?
Everything runs on your infrastructure — chat history, documents, user data, and usage logs stay in your PostgreSQL database. You control retention, backups, and access. Frontier LLM providers normally promise to limit data retention for API traffic, and many providers even offer Zero Data Retention guarantees. The platform also includes a guardrails service that can detect and block PII and secrets before they reach LLM providers.

Ready to own your AI infrastructure?

Deploy SecondStack on your infrastructure, with our help. Built for teams that demand control.