SOC 2 does not contain a special chapter called “LLM infrastructure.”
But the moment your AI system touches customer data, influences decisions, or runs in a production workflow, your existing controls need to cover a wider operating surface:
- prompts and system instructions
- model routing and third-party AI providers
- retrieval pipelines and vector storage
- evaluation workflows
- deployment approval for models and prompt changes
- audit trails for outputs and operator actions
That is where many teams get caught. The company may already have a SOC 2 program, but the AI stack has grown faster than the control environment around it.
Start with Scope, Not Tooling
Before auditors care about a vector database or a model runtime, they care about scope.
Write down exactly which AI systems are in-scope for production and customer-facing use. For each system, identify:
- what data enters the system
- where prompts and retrieved content are stored
- which vendors process traffic
- who can change prompts, routes, and models
- what evidence exists when something changes
If this inventory is missing, every later control conversation becomes slower and more subjective.
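The inventory above works better as structured data than as a wiki page, because structure makes gaps visible. A minimal sketch in Python; the field names and example values are illustrative assumptions, not a standard schema:

```python
from dataclasses import dataclass, asdict

@dataclass
class AISystemRecord:
    """One in-scope AI system. Field names are illustrative, not a standard."""
    name: str
    data_classes_in: list    # e.g. ["customer_pii", "support_tickets"]
    prompt_store: str        # where prompts and retrieved content live
    vendors: list            # external providers that process traffic
    change_owners: list      # who can change prompts, routes, and models
    change_evidence: str     # where change records are kept

inventory = [
    AISystemRecord(
        name="support-assistant",
        data_classes_in=["customer_pii", "support_tickets"],
        prompt_store="git:prompts/support-assistant",
        vendors=["hosted-llm-provider", "embedding-provider"],
        change_owners=["platform-team"],
        change_evidence="pull requests + CI deploy logs",
    ),
]

# An empty field is a scoping gap you would otherwise rediscover mid-audit.
for rec in inventory:
    assert all(asdict(rec).values()), f"incomplete inventory entry: {rec.name}"
```

Even a flat file like this answers the five questions above faster than interviewing three teams during audit season.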
Control Area 1: Identity and Access
AI systems often have more privileged paths than standard web apps.
Examples:
- access to model endpoints
- access to prompt repositories
- access to vector indexes or retrieval sources
- access to evaluation datasets
- access to secret material for external providers
Expect to show:
- role-based access by function
- SSO enforcement for human operators
- service identity for workloads
- privileged access review on a regular schedule
- rapid offboarding for contractors and employees
If prompt or policy changes can be made through an untracked admin panel, that is a control gap.
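One way to make "role-based access by function" concrete is a single deny-by-default permission map that every privileged action passes through. A sketch with hypothetical role and action names:

```python
# Hypothetical role -> allowed-actions map; the names are illustrative.
ROLE_PERMISSIONS = {
    "ml-engineer": {"read_prompts", "propose_prompt_change", "run_evals"},
    "platform-admin": {"read_prompts", "approve_prompt_change", "rotate_provider_keys"},
    "support-agent": {"read_outputs"},
}

def is_allowed(role: str, action: str) -> bool:
    """Deny by default: unknown roles and unlisted actions fail closed."""
    return action in ROLE_PERMISSIONS.get(role, set())

def require(role: str, action: str) -> None:
    """Raise on denial so the attempt itself can surface as an audit event."""
    if not is_allowed(role, action):
        raise PermissionError(f"{role} may not {action}")

require("platform-admin", "approve_prompt_change")  # allowed, returns quietly
```

The untracked admin panel mentioned above is, in this framing, any code path that mutates prompts or routes without going through `require`.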
Control Area 2: Change Management
Traditional change management usually covers code and infrastructure. AI systems add more artifacts that can materially alter behavior:
- prompts
- system instructions
- routing rules
- model versions
- evaluation thresholds
- retrieval logic
These need the same governance pattern as code:
- version control
- peer review
- approval rules for production
- rollback capability
- deployment evidence
The related article /blog/prompt-versioning-and-rollback-prompts-like-infrastructure goes deeper on the prompt side of this problem.
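The governance pattern for prompts can be reduced to a small state machine: versions are content-addressed, nothing deploys without an approval on record, and rollback just repoints. A sketch of that idea; in practice this would be backed by git and your deploy pipeline rather than an in-memory class:

```python
import hashlib

class PromptRegistry:
    """Minimal versioned prompt store: deploys move a pointer, rollback repoints.
    Illustrative only; a real setup backs this with git plus CI evidence."""

    def __init__(self):
        self._versions = {}  # version_id -> {"text": ..., "approved_by": ...}
        self._live = None
        self._history = []

    def propose(self, text: str) -> str:
        vid = hashlib.sha256(text.encode()).hexdigest()[:12]
        self._versions[vid] = {"text": text, "approved_by": None}
        return vid

    def approve(self, vid: str, reviewer: str) -> None:
        self._versions[vid]["approved_by"] = reviewer

    def deploy(self, vid: str) -> None:
        # Approval rule for production: unapproved versions cannot ship.
        if self._versions[vid]["approved_by"] is None:
            raise RuntimeError(f"version {vid} has no approval on record")
        self._history.append(self._live)
        self._live = vid

    def rollback(self) -> None:
        self._live = self._history.pop()

    @property
    def live(self):
        return self._live

reg = PromptRegistry()
v1 = reg.propose("You are a support assistant. Answer from the provided context.")
reg.approve(v1, "reviewer@example.com")
reg.deploy(v1)
```

The content hash doubles as deployment evidence: the log line "version a1b2c3 deployed" is verifiable against the prompt text itself.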
Control Area 3: Logging and Auditability
For auditors, incident investigators, and enterprise customers, “we think the model did this” is not good enough.
You need logs that answer:
- which model handled the request
- which prompt or policy version was active
- which operator changed the route or configuration
- whether a fallback or manual override occurred
- which tenant or workflow was affected
Your logs do not need to capture raw sensitive prompts in every case, but they do need enough structure to reconstruct what happened. The related article /blog/ai-audit-logs-regulators-ask-for-how-to-prepare is a good companion read here.
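The list above translates into one structured record per model-serving request. A sketch of what such a record could look like; the field names are illustrative assumptions, and note that it references the prompt version rather than storing raw prompt text:

```python
import json
import datetime

def audit_record(*, request_id, model, prompt_version, route, tenant,
                 fallback_used=False, operator=None):
    """One structured log line per request: enough to reconstruct what ran
    without persisting raw prompt content. Field names are illustrative."""
    return json.dumps({
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "request_id": request_id,
        "model": model,                    # which model handled the request
        "prompt_version": prompt_version,  # which prompt/policy version was active
        "route": route,                    # which routing decision was taken
        "tenant": tenant,                  # which tenant or workflow was affected
        "fallback_used": fallback_used,    # whether a fallback or override occurred
        "operator": operator,              # set only for manual operator actions
    })

line = audit_record(request_id="req-123", model="provider/model-x",
                    prompt_version="a1b2c3", route="default", tenant="acme")
```

Because every field is a stable identifier rather than free text, the same record works for incident reconstruction and for auditor sampling.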
Control Area 4: Vendor and Subprocessor Risk
Many AI stacks depend on external model providers, embedding vendors, observability tools, and retrieval services.
For each one, maintain evidence for:
- contract and data processing review
- allowed data classes
- retention and deletion posture
- regional processing behavior
- incident notification commitments
- fallback plan if the vendor becomes unavailable
This matters even when the rest of the platform is private. A single external inference or reranking call can change your risk profile.
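A subprocessor register with a staleness check keeps this evidence current instead of audit-season fresh. A sketch, with hypothetical vendor entries:

```python
import datetime

# Hypothetical subprocessor register; entries and values are illustrative.
VENDORS = [
    {"name": "hosted-llm-provider", "allowed_data": {"support_tickets"},
     "retention_days": 30, "region": "eu", "last_review": "2025-01-15"},
    {"name": "embedding-provider", "allowed_data": {"public_docs"},
     "retention_days": 0, "region": "eu", "last_review": "2025-06-01"},
]

def reviews_older_than(vendors, days=365, today=None):
    """Return names of vendors whose risk review has gone stale."""
    today = today or datetime.date.today()
    stale = []
    for v in vendors:
        reviewed = datetime.date.fromisoformat(v["last_review"])
        if (today - reviewed).days > days:
            stale.append(v["name"])
    return stale
```

Running this in CI or a weekly job turns "vendor reviews happen regularly" from a policy statement into a checkable control.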
Control Area 5: Data Handling and Retention
AI teams often collect more transient data than they realize.
Examples include:
- prompts containing customer context
- retrieved passages from internal systems
- output logs stored for debugging
- human review datasets
- evaluation artifacts copied from production traffic
Controls should define:
- what data is allowed in prompts
- how sensitive fields are filtered or masked
- how long traces and logs are retained
- where datasets used for evaluation come from
- how deletion and legal hold requests are handled
Control Area 6: Release Safety and Reliability
A broken AI release may not look like normal downtime. The service can stay “up” while producing unsafe or incorrect results.
SOC 2 evidence for AI systems often gets stronger when you can show:
- pre-production evaluation before release
- canary or shadow deployment patterns
- incident runbooks for rollback
- observability for quality, cost, and latency signals
- post-incident review and control improvement
This is where infrastructure and reliability disciplines directly support compliance.
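The "pre-production evaluation before release" item can be enforced as a gate that returns both a decision and its reasons, so the decision itself becomes audit evidence. A sketch; the metric names and thresholds are hypothetical:

```python
# Hypothetical release thresholds, recorded by your change-management policy.
THRESHOLDS = {"accuracy_min": 0.90, "refusal_rate_max": 0.05}

def release_gate(eval_results: dict) -> tuple[bool, list]:
    """Return (ship?, reasons) so both outcome and rationale are loggable."""
    failures = []
    if eval_results.get("accuracy", 0.0) < THRESHOLDS["accuracy_min"]:
        failures.append("accuracy below threshold")
    if eval_results.get("refusal_rate", 1.0) > THRESHOLDS["refusal_rate_max"]:
        failures.append("refusal rate above threshold")
    return (not failures, failures)

ok, reasons = release_gate({"accuracy": 0.93, "refusal_rate": 0.02})
```

Missing metrics fail closed here, which matches the earlier point: an AI release can be unsafe while the service stays "up", so absence of evidence should block the ship.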
Evidence Auditors and Customers Commonly Ask For
Keep a lightweight but explicit evidence pack for each production AI system:
- architecture diagram with data flows
- access matrix for operators and services
- deployment and approval records
- logging and retention standards
- vendor list and risk assessments
- incident history and follow-up actions
- backup or disaster recovery evidence where applicable
When this evidence is assembled only during audit season, teams lose weeks.
Common AI-Specific Gaps
We see the same issues repeatedly:
- prompt changes are not versioned like code
- evaluation datasets contain copied production data without clear retention rules
- external model calls bypass the approved egress path
- audit logs show app events but not model, prompt, or retrieval state
- nobody owns control mapping across platform, ML, and security teams
A Practical 90-Day Improvement Plan
If your current AI platform is ahead of your SOC 2 readiness, start here:
- inventory in-scope AI systems and vendors
- move prompt, route, and policy changes into controlled workflows
- define required audit fields for model-serving requests
- tighten secret handling and identity boundaries
- add release approval and rollback evidence
- document the minimum evidence pack per system
That sequence usually gets teams much further than trying to write a giant AI governance document first.
Final Takeaway
SOC 2 for AI infrastructure is not about inventing a separate compliance universe. It is about extending proven control categories to the parts of the stack that now shape system behavior: models, prompts, retrieval, routing, evaluation, and vendor dependencies.
The strongest teams treat AI controls as an operating discipline, not an audit-time project. That makes customer security reviews easier, incident response faster, and enterprise sales conversations much smoother.