Own Your AI Infrastructure
A self-hosted platform for deploying, managing, and scaling conversational AI across your organization. Multi-model. Enterprise-grade. Yours to control.
or email hello@secondstack.ai
Why SecondStack
Built for organizations that take AI seriously
Data sovereignty — conversations, attached documents, and usage data never leave your infrastructure.
You also get true multi-model flexibility (not locked to one vendor), granular cost controls per team, and the ability to customize behavior through system prompts, guardrails, and agent skills.
But we still use external LLM API providers...?
Even when using external LLM providers, API access carries a smaller risk and exposure profile than full SaaS platforms. API requests are retained for a shorter time and are typically better protected than ChatGPT-style web platforms, which host all of your data indefinitely and present a much larger attack surface. Many reputable LLM providers also offer Zero Data Retention (ZDR) guarantees.
A polished, high-quality chat experience that users actually prefer — fast streaming, rich Markdown and code rendering, math, diagrams, file and image attachments, and a clean model picker across providers.
Multi-model from a single interface — Claude, gpt-5.x, Gemini, local open-weights — plus image generation in the same workspace. Knowledge Collections can be attached to any chat for RAG or full-context grounding, and MCP Tools let chats reach into your internal systems and APIs.
Cowork Agent mode runs Claude Code in sandboxed containers with configurable Skills — file operations, browser automation, diagram and office-document authoring, persistent memory, and more — turning any chat into an agentic session when you need it.
Personas, system prompts, and shareable chat templates let teams standardize how the AI behaves for specific workflows, without locking individuals out of free-form use.
A specialized guardrails module compatible with LiteLLM and configurable via the ControlTower admin dashboard — covering PII, secrets, and other policy-sensitive content on both inbound prompts and outbound responses.
Built to be flexible: new guardrail types and policies can be added or customized on-site to match your organization's data classifications, compliance needs, and tolerance thresholds — not a fixed black-box ruleset.
Strong focus on minimizing false positives. A two-stage pipeline (fast heuristic detection followed by a context-aware LLM classifier with caching) keeps obvious non-issues from blocking legitimate work — because guardrails that misfire ruin the user experience and break agents in production.
LLM classifier for Secrets (example)
Classifies whether a detected candidate string is a real credential (password, API key, token, private key) or a harmless match (code identifier, hash, UUID, public key, color code, placeholder). Examples of NOT a credential: documentation placeholders, commit hashes, package hashes, version numbers, color codes, PNG data, SVG strokes, file names, dates. Examples of a real credential: randomly-generated passwords, API keys, tokens, secrets, private keys that don't look like demo values. The classifier receives both the flagged string and surrounding context to make an informed decision.
LLM classifier for Customer PII (example)
Classifies whether a detected candidate string is sensitive customer PII that should not be sent to an external LLM API. Examples of NOT PII for this purpose: placeholder values ("John Doe", "test@test.com"), company names, product names, employee business contacts (e.g. from email headers or signatures), business email addresses, internal Customer IDs, fictional characters, and general mentions of public figures (e.g. "President George W. Bush"). Depending on company policy, an isolated, one-off occurrence of a low-sensitivity PII element, such as a customer's email address in a troubleshooting context, may be allowed if it is not accompanied by other highly sensitive identifying data. Examples of real PII: data extracted from database records or user profiles, customer PII appearing at scale in structured data, and high-sensitivity identifiers like SSNs or credit card numbers. The classifier uses surrounding context to distinguish a customer database dump from routine business correspondence.
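A minimal sketch of how such a two-stage check can be wired together: a fast regex pre-filter that deliberately over-detects, followed by a cached, context-aware classifier verdict. The detector patterns, cache scheme, and function names here are illustrative assumptions, not SecondStack's actual internals, and the LLM call is stubbed so the sketch runs standalone.

```python
"""Illustrative two-stage guardrail check: regex pre-filter, then a cached
context-aware classifier. All patterns and names are assumptions for this sketch."""
import hashlib
import re
from functools import lru_cache

# Stage 1: cheap heuristics that are allowed to over-detect.
DETECTORS = {
    "secret": re.compile(r"(?:api[_-]?key|token|password)\s*[:=]\s*\S{8,}", re.IGNORECASE),
    "pii_email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
}

def heuristic_candidates(text):
    """Return (rule, matched_span) pairs; false positives are expected here."""
    return [(rule, m.group(0)) for rule, rx in DETECTORS.items() for m in rx.finditer(text)]

def llm_says_sensitive(rule, span, context_digest):
    # Placeholder for the stage-2 classifier call: a real deployment would send
    # the flagged span plus surrounding context to a fast model and parse a
    # yes/no verdict. Stubbed as "benign" so the sketch runs without an API.
    return False

@lru_cache(maxsize=4096)
def classify_cached(rule, span, context_digest):
    # Cache verdicts so a repeated span (e.g. the same placeholder in every
    # prompt) is only sent to the classifier once.
    return llm_says_sensitive(rule, span, context_digest)

def check_prompt(text):
    """Return the guardrail rules that should block this prompt."""
    violations = []
    for rule, span in heuristic_candidates(text):
        start = max(0, text.find(span) - 200)
        context = text[start:text.find(span) + len(span) + 200]
        digest = hashlib.sha256(context.encode()).hexdigest()[:16]
        if classify_cached(rule, span, digest):
            violations.append(rule)
    return violations

print(check_prompt("please review: api_key = sk-test-1234567890abcdef"))
```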
Pay your LLM API providers for actual compute consumption, not per-user subscriptions.
In larger organizations, most users are low-volume; API consumption-based pricing is typically far more cost-effective than flat per-seat fees.
Robust cost tracking and budget controls keep that model safe at scale — per-user, per-team, and per-API-key budgets with threshold alerts, spend forecasting, and usage analytics down to the individual request. Spend stays predictable, and overages are caught before they become invoices.
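As one example of provisioning a per-key budget programmatically, here is a hedged sketch against the embedded LiteLLM proxy's key-management API. The gateway URL, admin key, alias, model list, and budget values are placeholders, and the exact field names should be verified against the deployed LiteLLM version.

```python
"""Hedged sketch: issuing a budget-capped virtual key via the LiteLLM proxy's
key-management endpoint. All values below are placeholders."""
import requests

GATEWAY = "https://llm-gateway.example.internal"  # your SecondStack gateway (placeholder)
ADMIN_KEY = "sk-admin-xxxx"                       # LiteLLM admin/master key (placeholder)

resp = requests.post(
    f"{GATEWAY}/key/generate",
    headers={"Authorization": f"Bearer {ADMIN_KEY}"},
    json={
        "key_alias": "rag-pipeline-agent",
        "models": ["claude-sonnet-4-6", "text-embedding-3-small"],
        "max_budget": 50.0,         # USD cap for this key
        "budget_duration": "30d",   # budget resets every 30 days
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["key"])  # hand this virtual key to the consuming app
```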
Without a gateway, teams end up with scattered API keys, no visibility into spend, and no guardrails. SecondStack centralizes all LLM traffic — from chat users, internal apps, agents, and IDE plugins — through LiteLLM.
Every request gets virtual API key authentication, per-user and per-team budget enforcement, usage analytics, and guardrails for PII and secrets.
Your apps call standard OpenAI-compatible endpoints (/chat/completions, /responses) with virtual keys, and the gateway handles routing, failover, load balancing, observability, and policy across all supported models.
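For instance, an internal app can point the standard OpenAI SDK at the gateway and authenticate with its virtual key. The base URL, key, and model alias below are placeholders; any model configured in ControlTower can be addressed the same way.

```python
"""Hedged sketch: calling the gateway's OpenAI-compatible endpoint with a
virtual key. Base URL, key, and model alias are placeholders."""
from openai import OpenAI

client = OpenAI(
    base_url="https://llm-gateway.example.internal/v1",  # SecondStack gateway (placeholder)
    api_key="sk-...Mx4p",                                # per-app virtual key (placeholder)
)

resp = client.chat.completions.create(
    model="claude-sonnet-4-6",  # routing, failover, and policy handled by the gateway
    messages=[{"role": "user", "content": "Summarize last week's deploy notes."}],
)
print(resp.choices[0].message.content)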
The embedded LiteLLM Proxy is the free, open-source build, and our team supports it end-to-end alongside the rest of the platform — one point of contact for the whole deployment.
Naturally integrates with Databricks for organizations that already run their data and AI workloads on the Lakehouse.
Use Databricks Foundation Models as an upstream LLM provider through the LiteLLM gateway — Claude, gpt-5, Gemma, Qwen, Llama, and other served endpoints billed to your Databricks account.
Invoke Databricks Mosaic AI Agents directly from the Chat UI as first-class chat profiles: Knowledge Agents, Supervisor Agents, Custom Agents
Expose Genie Spaces and Databricks SQL as MCP Tools for the Chat App models and Cowork Agent for natural-language analytics over your governed data
End-to-end OAuth on behalf of end-users. Every Databricks call is authenticated as the actual user, so Unity Catalog permissions, row/column-level security, and audit trails apply to AI traffic exactly as they do to a human analyst.
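Because Databricks-served models sit behind the same gateway, calling one looks the same to application code; only the model alias changes. A short sketch under the assumption that an alias like `databricks-claude-sonnet` has been mapped in ControlTower to a Databricks Foundation Model serving endpoint, with usage billed to the Databricks account; the URL and key are placeholders.

```python
"""Hedged sketch: a Databricks-served model reached through the same gateway
interface. The alias, URL, and key are placeholder assumptions."""
from openai import OpenAI

client = OpenAI(base_url="https://llm-gateway.example.internal/v1",  # placeholder
                api_key="sk-...Mx4p")                                # placeholder
resp = client.chat.completions.create(
    model="databricks-claude-sonnet",  # assumed ControlTower alias for a Databricks endpoint
    messages=[{"role": "user", "content": "Which regions drove Q3 pipeline growth?"}],
)
print(resp.choices[0].message.content)
```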
Platform
Architecture Overview
A modular, self-hosted stack built on proven foundations.
Features
Everything your team needs
A complete conversational AI platform, not just a chat interface.
Multi-model chat
Claude, gpt-5.x, Gemini, local models — chat, agents, and image generation all in one place.
Budget & spend controls
Per-user, per-team, and per-API-key budgets. Threshold alerts, spend forecasting, and usage analytics down to the request level.
Cowork agent
Cowork Agent runs in sandboxed containers with dozens of Skills — file operations, browser automation, image generation, diagram authoring, office documents, persistent memory, and more.
Knowledge collections
Named document sets, referenced in chats. Files are auto-indexed for RAG retrieval. Share with teams or specific users.
LiteLLM-based API routing
Centralize LLM API traffic from enterprise apps and local agents with per-user virtual API keys, guardrails, spend observability, and budget enforcement.
Admin dashboard
Centralized model management, provider configuration, system prompts, and deployment controls. One panel for your entire AI stack.
Models
| Model | Type | Display Name | Access | Health |
|---|---|---|---|---|
| cowork-agent | agent | Cowork Agent | 2 Teams | 2/2 OK |
| llm-council | agent | LLM Council | 2 Teams | 1/1 OK |
| sales-genie | agent | Sales Genie Space | Sales Team | 1/1 OK |
| claude-opus-4-6 | | Claude Opus 4.6 | All Teams | 2/2 OK |
| gpt-5-5 | | GPT-5.5 | All Teams | 1/1 OK |
| gemini-3-1-pro | | Gemini 3.1 Pro | All Teams | 1/1 OK |
| gemini-3-flash | | Gemini 3 Flash | All Teams | 1/1 OK |
| qwen3-next-80b | | Qwen 3 Next 80B | All Teams | 1/1 OK |
| gpt-image-2 | image | GPT Image 2 | All Teams | 1/1 OK |
| nano-banana-2 | image | Nano Banana 2 | All Teams | 1/1 OK |
| text-embedding-3-small | embedding | Text Embedding 3 Small | All Teams | 1/1 OK |
Users
| User | Email | Role | Teams | Last Active |
|---|---|---|---|---|
| Jane Chen | jane.chen@acme.com | Admin | Platform | 2 min ago |
| Mike Rodriguez | mike.r@acme.com | User | Engineering | 14 min ago |
| Sarah Park | sarah.p@acme.com | User | Design | 1 hr ago |
| Alex Kim | alex.kim@acme.com | Team Manager | Engineering | 3 hrs ago |
| Lisa Wang | lisa.w@acme.com | User | Marketing | 1 day ago |
Providers
Teams
Groups
| Group | Source | Members | Teams | Actions |
|---|---|---|---|---|
| engineering-all | Azure AD | 24 | Engineering, Platform | |
| design-team | Azure AD | 8 | Design | |
| marketing-all | Azure AD | 12 | Marketing |
Analytics & Logs
| Datetime | Model | User | Cached | Input Tok | Output Tok | Duration | Status |
|---|---|---|---|---|---|---|---|
| 2026-03-01 14:32:01 | claude-sonnet-4-6 | jane.chen | 12,480 | 1,204 | 847 | 1.2s | 200 |
| 2026-03-01 14:31:58 | gpt-4.1 | mike.r | 0 | 3,820 | 1,204 | 0.8s | 200 |
| 2026-03-01 14:31:45 | claude-opus-4-6 | alex.kim | 8,192 | 2,560 | 5,102 | 3.4s | 200 |
| 2026-03-01 14:31:32 | gemini-2.5-flash | sarah.p | 4,096 | 512 | 892 | 0.4s | 200 |
General Configuration
Notifications
MCP Servers
| Server | Type | Tools | Access | Status | Actions |
|---|---|---|---|---|---|
| Context7 | http | 2 tools | All Users | active | |
| Web Search and Fetch | sse | 2 tools | Marketing Team | active |
Guardrails
| Rule | Scope | Action | Status |
|---|---|---|---|
| PII Detection | All Models | Block & Alert | active |
| Prompt Injection | Cowork Agent | Block | active |
| Secrets Detection | All Models | Block & Alert | active |
Localized Content
Manage system prompts, help content, and other localized strings
| Name | Preview |
|---|---|
| Budget Alert Key (last updated 3/2/2026) | Your {platformName} API Key "{displayName}" has used {pct}% ... |
| Setup Instructions (last updated 3/2/2026) | export ANTHROPIC_BASE_URL={{baseUrl}} export ANTHROPIC_A... |

| Name | Preview |
|---|---|
| Default (last updated 3/2/2026) | You are Chat App, an access interface to AI Language Models ... |
| Cowork Agent (last updated 3/2/2026) | COWORK AGENT SYSTEM PROMPT (AMENDS PREVIOUS INSTRUC... |
Configs Deployment
User dashboard
Personal budget tracking, API key management, and per-request usage logs — full visibility into your own AI consumption.
Budget & Spend
API Keys
Manage your API access keys
| Name | Team | Key | Spend (current period) | Budget | Created | Actions |
|---|---|---|---|---|---|---|
| RAG Pipeline Agent | Default Team | sk-...Mx4p | $0.18 | $50.00 | 04/17/2026 10:36 | |
| Data Export Script | Default Team | sk-...K9rw | $0.00 | $30.00 | 03/04/2026 08:54 | |
| Analysis Notebook | Default Team | sk-...Bz7n | $0.00 | $25.00 | 03/03/2026 11:15 | |
| Local Dev Testing | Default Team | sk-...Qh2s | $0.00 | $40.00 | 02/03/2026 13:19 | |
| Slack Bot Connector | Default Team | sk-...Ty6v | $0.00 | $35.00 | 02/02/2026 02:30 |
My Usage & Logs
| Time (EDT) | Status | API Endpoint | Cost | Duration | API Key | Model | Cached In | New In | Output |
|---|---|---|---|---|---|---|---|---|---|
| 2026-05-03 18:49:14 | success | /v1/messages (anthropic_messages) | $0.05 | 1.60 s | Chat App | claude-sonnet-4-6 | 165,956 | 101 | 51 |
| 2026-05-03 18:44:00 | success | /v1/messages (anthropic_messages) | $0.05 | 840 ms | Chat App | claude-sonnet-4-6 | 165,509 | 448 | 34 |
API Playground
Pick an endpoint, an API key, and a model — the cURL preview reflects the exact request that will be sent.
-H "Authorization: Bearer sk-...Qh2s" \
-H "Content-Type: application/json" \
-d '{"model":"claude-haiku-4-5-20251001","max_tokens":1024,"messages":[{"role":"user","content":[{"type":"text","text":"Why is the sky blue? Explain in 20 words","cache_control":{"type":"ephemeral"}}]}]}' | jq
Setup
Up and Running in Minutes
Three steps from zero to a production AI platform.
Deploy
Provision a Linux VM and run the platform setup. Docker Compose spins up everything — PostgreSQL, Redis, Meilisearch, and the rest — pre-configured.
Connect providers
Add your LLM provider API keys through ControlTower. Supports Anthropic, OpenAI, Google AI Studio, OpenRouter, and Databricks, as well as your own inference endpoints. Manage access and pricing.
Manage Users and Teams
Set up user teams, assign budgets, and synchronize with your enterprise IdP groups. Control spend and budgets at the team level.
Pricing
Self-Hosted. Deploy Anywhere.
Run SecondStack on your own infrastructure, with our help.
Enterprise
Tailored to your needs
For organizations deploying SecondStack on their own infrastructure.
- All platform features
- Docker Compose & Kubernetes deployment
- Priority support & SLA
- Custom integrations
- Deployment assistance
- Security review & hardening
or email hello@secondstack.ai
Why self-host AI instead of using ChatGPT Enterprise or Claude SaaS?
Why do we need an LLM API gateway?
What does deployment look like?
Can SecondStack host it for us?
Who supports the bundled open-source components?
What are the minimal infra requirements?
How does user authentication work?
Which LLM models are supported?
What is Cowork Agent?
What about data privacy and compliance?
Ready to own your AI infrastructure?
Deploy SecondStack on your infrastructure, with our help. Built for teams that demand control.
or email hello@secondstack.ai