Prompt Injection Checklist: 25-Point Audit for LLM Apps
Print this. Hand it to engineering. Use it before every production AI launch. This checklist consolidates the controls that the OWASP Top 10 for LLM Applications, the NIST AI Risk Management Framework, and post-mortem analysis of 2024–2026 incidents agree on for defending against prompt injection.
It is organized into six layers. Every "yes" makes a successful injection materially harder. For the full strategic context, pair this with our Prompt Injection Security pillar.
Layer 1 — Architecture & Trust Boundaries (5 controls)
1. System prompts are stored server-side and never exposed to the client. Even if leaked, they should not reveal API keys, internal endpoints, or business-sensitive logic.
2. The model context window separates trust levels with explicit delimiters.
Use distinct tags (e.g., <system>, <user>, <retrieved_untrusted>) so downstream guards can reason about provenance.
3. Each tool/function the model can invoke runs with the minimum privileges required. A summarizer should not have write access to your CRM. Apply zero-trust principles to AI agents.
4. Irreversible actions require human-in-the-loop confirmation. Wire transfers, account deletions, mass emails, code merges — never autonomous.
5. Each AI agent has a unique service identity with auditable scopes. Treat the agent as a privileged principal in your IAM model, not as a feature of the user's session.
Layer 2 — Input Hardening (4 controls)
6. All user input is normalized before reaching the model. Strip zero-width characters, suspicious Unicode, and homoglyph attacks.
7. Retrieved content (RAG, email, web, documents) is sanitized for hidden text. Detect white-on-white, off-screen, font-size-zero, and metadata-embedded instructions.
8. Instruction-pattern detectors flag high-risk phrases in untrusted content. "Ignore previous," "system prompt," "do not mention," tool-call syntax. Examples are documented in our prompt injection examples article.
9. Multimodal inputs are filtered too. Run OCR on uploaded images and apply the same detectors to extracted text.
Layer 3 — Model & Prompt Defenses (4 controls)
10. System prompts use defensive instruction framing. Explicitly state that any instruction appearing inside user/retrieved blocks must be treated as data, not commands.
11. Spotlighting or delimiter encoding is applied to untrusted content. Techniques such as base64-wrapping or unique-token tagging help the model distinguish data from instructions.
12. Use a separate, smaller "guard" model to classify input and output. A specialized classifier inspects both the user query and the model's response for policy violations before either is acted on.
13. The temperature and tool-calling configuration are tuned for the use case. Low temperature plus strict JSON schemas reduce the model's freedom to act on injected instructions.
Layer 4 — Output & Tool-Use Controls (4 controls)
14. All tool calls are validated against a schema and a policy. Reject any tool call whose arguments fall outside the user's authorization scope.
15. Outbound URLs and markdown images in model output are sandboxed or blocked. This neutralizes the GitHub Copilot-style image-exfiltration trick.
16. Code generated by the model is executed only in isolated sandboxes. No direct shell access on production systems.
17. Rate-limit sensitive tool calls per user and per session. Sudden bursts of refunds, file reads, or external sends should auto-pause and alert.
Layer 5 — Monitoring & Detection (4 controls)
18. Every prompt, tool call, and response is logged with full context. Include user ID, agent identity, source trust label, and decision path. Logs feed your SIEM.
19. Anomaly detection runs on AI-agent activity. Off-baseline tool usage, unusual data egress, and policy-violation rates trigger investigation. The Verizon DBIR flags AI-mediated exfiltration as the slowest-detected category.
20. Red-team exercises run on a recurring schedule. Quarterly at minimum; monthly for high-risk deployments. See the red teaming AI systems guide.
21. A clear AI-incident response playbook exists and has been tested. Reuse and extend your existing incident response plan template.
Layer 6 — Governance, Compliance & People (4 controls)
22. An AI Acceptable Use Policy covers prompt injection explicitly. Reference our prompt injection policy template.
23. Regulated-data exposure has been mapped. For each AI workflow, document what categories of data (GDPR, HIPAA, PCI) could be exfiltrated via injection. This feeds your AI risk assessment.
24. Cyber-insurance disclosures are current. Underwriters now ask about LLM deployments. Misstatements can void coverage — see the cyber insurance underwriting questionnaire guide.
25. Developers and users are trained on AI-specific threats annually. General phishing training is not sufficient. The IBM Cost of a Data Breach Report consistently shows training reduces both incident frequency and time-to-detect.
Scoring Your Maturity
| Score (out of 25) | Posture | Recommended next step |
|---|---|---|
| 0–8 | Critical exposure | Pause new AI launches; address Layers 1 & 2 immediately. |
| 9–16 | Developing | Prioritize tool privilege scoping (Layer 4) and monitoring (Layer 5). |
| 17–22 | Strong | Mature red-teaming and governance; pursue insurance reductions. |
| 23–25 | Industry-leading | Maintain via continuous testing; share lessons with peers. |
A realistic 2026 enterprise baseline is 17–20. Anything below that is below underwriter expectations.
Quick Wins You Can Ship This Week
If the full checklist is daunting, start here:
- Audit every AI tool integration for least-privilege scopes (Control 3).
- Block markdown images and outbound URLs in chat-style UIs (Control 15).
- Enable logging on every prompt and response with retention ≥90 days (Control 18).
- Add a guard model for output classification on customer-facing agents (Control 12).
- Run one tabletop exercise against an indirect injection scenario (Controls 20–21).
These five alone close the majority of the patterns documented in our examples library.
Working the Checklist Across Teams
Prompt injection cuts across silos. A working assignment:
| Owner | Controls |
|---|---|
| Platform / ML engineering | 1, 2, 3, 10, 11, 13, 16 |
| Application security | 6, 7, 8, 9, 14, 15 |
| Detection & response | 17, 18, 19, 20, 21 |
| GRC / Legal | 22, 23, 24 |
| People & training | 5, 25, 4 |
Make a single executive accountable for the score — usually the CISO or a Head of AI Risk.
Further Reading
- The strategic case for funding this work: Prompt Injection Security
- Where injection fits with broader AI threats: Model Exploitation Risks
- Insurance angle: Cyber Insurance for SaaS Companies
Frequently asked questions

Sarah leads our coverage of AI security, prompt injection, and LLM application risk. She has spent eight years writing about applied machine learning and previously worked as a security engineer at a SaaS data platform.
More from Sarah →Related reading
Prompt Injection Attacks Explained: How LLMs Get Hijacked
Prompt injection is a critical vulnerability where attackers craft malicious inputs to override an LLM's original instructions, leading to unauthorized.
Securing LLM Applications: A 2026 Engineering Checklist
As Large Language Models (LLMs) transition from standalone chatbots to agentic systems with tool-calling capabilities, the attack surface has expanded.
Prompt Injection Examples: 10 Real-World Attacks to Study
Ten documented prompt injection attacks from 2023-2026 — from jailbreaks to data exfiltration — with the defensive lesson behind each.
Prompt Injection Explained: How LLMs Get Tricked, Technically
A technical deep-dive into why LLMs are structurally vulnerable to prompt injection, written for engineers and security architects.

