AI security, cybersecurity, and cyber insurance research for modern businesses.

Prompt Injection Checklist: 25-Point Audit for LLM Apps

By Business Indemnity EditorialUpdated May 11, 2026

Print this. Hand it to engineering. Use it before every production AI launch. This checklist consolidates the controls that the [OWASP Top 10 for LLM Applications](https://owasp.org/www-project-top-10-for-large-language-model-applications/), the [NIST AI Risk Management Framework](https://www.nist.gov/itl/ai-risk-management-framework), and post-mortem analysis of 2024–2026 incidents agree on for defending against [prompt injection](/ai-risks/what-is-prompt-injection).

It is organized into six layers. Every "yes" makes a successful injection materially harder. For the full strategic context, pair this with our Prompt Injection Security pillar.


Layer 1 — Architecture & Trust Boundaries (5 controls)

1. System prompts are stored server-side and never exposed to the client. Even if leaked, they should not reveal API keys, internal endpoints, or business-sensitive logic.

2. The model context window separates trust levels with explicit delimiters. Use distinct tags (e.g., <system>, <user>, <retrieved_untrusted>) so downstream guards can reason about provenance.

3. Each tool/function the model can invoke runs with the minimum privileges required. A summarizer should not have write access to your CRM. Apply zero-trust principles to AI agents.

4. Irreversible actions require human-in-the-loop confirmation. Wire transfers, account deletions, mass emails, code merges — never autonomous.

5. Each AI agent has a unique service identity with auditable scopes. Treat the agent as a privileged principal in your IAM model, not as a feature of the user's session.


Layer 2 — Input Hardening (4 controls)

6. All user input is normalized before reaching the model. Strip zero-width characters, suspicious Unicode, and homoglyph attacks.

7. Retrieved content (RAG, email, web, documents) is sanitized for hidden text. Detect white-on-white, off-screen, font-size-zero, and metadata-embedded instructions.

8. Instruction-pattern detectors flag high-risk phrases in untrusted content. "Ignore previous," "system prompt," "do not mention," tool-call syntax. Examples are documented in our prompt injection examples article.

9. Multimodal inputs are filtered too. Run OCR on uploaded images and apply the same detectors to extracted text.


Layer 3 — Model & Prompt Defenses (4 controls)

10. System prompts use defensive instruction framing. Explicitly state that any instruction appearing inside user/retrieved blocks must be treated as data, not commands.

11. Spotlighting or delimiter encoding is applied to untrusted content. Techniques such as base64-wrapping or unique-token tagging help the model distinguish data from instructions.

12. Use a separate, smaller "guard" model to classify input and output. A specialized classifier inspects both the user query and the model's response for policy violations before either is acted on.

13. The temperature and tool-calling configuration are tuned for the use case. Low temperature plus strict JSON schemas reduce the model's freedom to act on injected instructions.


Layer 4 — Output & Tool-Use Controls (4 controls)

14. All tool calls are validated against a schema and a policy. Reject any tool call whose arguments fall outside the user's authorization scope.

15. Outbound URLs and markdown images in model output are sandboxed or blocked. This neutralizes the GitHub Copilot-style image-exfiltration trick.

16. Code generated by the model is executed only in isolated sandboxes. No direct shell access on production systems.

17. Rate-limit sensitive tool calls per user and per session. Sudden bursts of refunds, file reads, or external sends should auto-pause and alert.


Layer 5 — Monitoring & Detection (4 controls)

18. Every prompt, tool call, and response is logged with full context. Include user ID, agent identity, source trust label, and decision path. Logs feed your SIEM.

19. Anomaly detection runs on AI-agent activity. Off-baseline tool usage, unusual data egress, and policy-violation rates trigger investigation. The Verizon DBIR flags AI-mediated exfiltration as the slowest-detected category.

20. Red-team exercises run on a recurring schedule. Quarterly at minimum; monthly for high-risk deployments. See the red teaming AI systems guide.

21. A clear AI-incident response playbook exists and has been tested. Reuse and extend your existing incident response plan template.


Layer 6 — Governance, Compliance & People (4 controls)

22. An AI Acceptable Use Policy covers prompt injection explicitly. Reference our prompt injection policy template.

23. Regulated-data exposure has been mapped. For each AI workflow, document what categories of data (GDPR, HIPAA, PCI) could be exfiltrated via injection. This feeds your AI risk assessment.

24. Cyber-insurance disclosures are current. Underwriters now ask about LLM deployments. Misstatements can void coverage — see the cyber insurance underwriting questionnaire guide.

25. Developers and users are trained on AI-specific threats annually. General phishing training is not sufficient. The IBM Cost of a Data Breach Report consistently shows training reduces both incident frequency and time-to-detect.


Scoring Your Maturity

Score (out of 25)PostureRecommended next step
0–8Critical exposurePause new AI launches; address Layers 1 & 2 immediately.
9–16DevelopingPrioritize tool privilege scoping (Layer 4) and monitoring (Layer 5).
17–22StrongMature red-teaming and governance; pursue insurance reductions.
23–25Industry-leadingMaintain via continuous testing; share lessons with peers.

A realistic 2026 enterprise baseline is 17–20. Anything below that is below underwriter expectations.


Quick Wins You Can Ship This Week

If the full checklist is daunting, start here:

  1. Audit every AI tool integration for least-privilege scopes (Control 3).
  2. Block markdown images and outbound URLs in chat-style UIs (Control 15).
  3. Enable logging on every prompt and response with retention ≥90 days (Control 18).
  4. Add a guard model for output classification on customer-facing agents (Control 12).
  5. Run one tabletop exercise against an indirect injection scenario (Controls 20–21).

These five alone close the majority of the patterns documented in our examples library.


Working the Checklist Across Teams

Prompt injection cuts across silos. A working assignment:

OwnerControls
Platform / ML engineering1, 2, 3, 10, 11, 13, 16
Application security6, 7, 8, 9, 14, 15
Detection & response17, 18, 19, 20, 21
GRC / Legal22, 23, 24
People & training5, 25, 4

Make a single executive accountable for the score — usually the CISO or a Head of AI Risk.

Further Reading

Frequently asked questions

BI
Written by
Business Indemnity Editorial
Editorial Team

The Business Indemnity editorial team covers AI security, cybersecurity, and cyber insurance for SaaS and modern businesses.

About the editorial team →

Related reading