Episode 94 — Choose audit criteria for AI using policy, risk, and outcomes (Domain 3A)

In this episode, we focus on a planning skill that makes the difference between an audit that feels fair and grounded and an audit that feels like opinions: choosing audit criteria. Criteria are the standards you use to judge what you see, and without them, every finding turns into a debate about what should have been done. For brand-new learners, it helps to think of criteria like the rules of a game, because you cannot call something a foul unless you know what rule was broken. Artificial Intelligence (A I) systems make criteria selection more challenging because the technology is new, organizational practices vary widely, and there is rarely a single universal checklist that fits every use case. That does not mean you are stuck. It means you must choose criteria thoughtfully by combining internal policy, risk expectations, and desired outcomes, then using those criteria to define what evidence should exist. By the end of this lesson, you should be able to explain where criteria come from, how to pick criteria that fit the system, and how to avoid criteria that are either too vague to test or too generic to reflect real risk.

Before we continue, a quick note: this audio course is a companion to our course companion books. The first book is about the exam and provides detailed guidance on how best to pass it. The second book is a Kindle-only eBook that contains 1,000 flashcards that can be used on your mobile device or Kindle. Check them both out at Cyber Author dot me, in the Bare Metal Study Guides Series.

A strong place to start is internal policy, because policy is the organization’s own statement of what it expects itself to do. Policy can include security policies, data handling policies, acceptable use rules, third-party risk requirements, and any specific A I governance policies the organization has created. When you use internal policy as criteria, you are not forcing an external standard onto the organization; you are evaluating whether the organization is doing what it said it would do. That is often the most defensible approach because leadership has already agreed, at least in principle, that these expectations matter. For A I systems, internal policy may define who can approve model deployments, what data can be connected to models, how prompts and configurations are changed, what monitoring is required, and what incident response readiness looks like. The audit planner’s job is to identify which policy statements apply to the system being audited, then interpret them in a way that can be tested. If a policy says access must be least privilege, your criteria must explain what least privilege looks like in that specific A I environment, such as limiting service account reach, restricting model management actions, and controlling data connector permissions.

Policy alone is not always enough, because policy can be high level or outdated, especially in organizations that adopted A I quickly. That is where risk-based criteria come in. Risk-based criteria are standards chosen to address the most likely and most harmful failure modes for the specific A I use case. This begins with understanding exposure and impact in plain language. Exposure includes who can interact with the model, what data it can reach, and what actions it can trigger through integrations. Impact includes what happens if the model leaks data, produces harmful output, makes a biased decision, or becomes unavailable at a critical time. When you use risk to set criteria, you are essentially saying that controls should be strong enough to match the level of harm the system could cause. A public-facing model connected to sensitive internal data should have much stricter criteria than an internal brainstorming tool with no sensitive data access. This is important for beginners because it shows that auditing is not one-size-fits-all. A fair audit adapts criteria to risk rather than treating every A I system as identical.

Outcomes-based criteria are the third ingredient, and they help you avoid the trap of auditing only paperwork. Outcomes-based criteria focus on what the organization wants to be true in practice, such as: sensitive data is not exposed through model outputs, unauthorized users cannot change model behavior, misuse is detected quickly, and incidents can be contained before they escalate. Outcomes-based criteria are powerful because they connect directly to business expectations and because they can often be tested through evidence that reflects reality, such as monitoring records, access logs, change histories, and incident handling artifacts. For A I systems, outcomes often include reliability and trust as well as classic confidentiality and integrity. For instance, an outcome might be that the model’s responses remain consistent with policy across updates, which implies criteria around change management and testing. Another outcome might be that the system can explain what data sources were used to answer a query, which implies criteria around traceability and logging. Outcomes-based criteria do not replace policy and risk; they help translate policy and risk into things you can observe. When you plan criteria around outcomes, you reduce the chance of a report that says a policy exists while the system still fails in practice.

Now let’s talk about how to combine these sources into criteria that are usable. A practical method is to start with policy statements, then adjust and sharpen them using risk and outcomes. Suppose policy says that only authorized users may access sensitive data. Risk analysis might reveal that the model is connected to a retrieval system that can access human resources documents, and that public users can submit prompts. Outcomes might require that the model never returns protected personal information and that retrieval only returns documents the user is allowed to see. Those pieces together form criteria that are both aligned with policy and tailored to risk. They also imply concrete evidence you can gather, such as role definitions, access control configurations, retrieval permission enforcement, and monitoring alerts for sensitive data exposure. Without the risk and outcome sharpening, the criterion would remain vague. Without the policy anchor, the criterion might feel arbitrary. Combining them produces criteria that feel fair, relevant, and testable.
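To make that combination tangible, here is a minimal sketch, in Python, of how one sharpened criterion could be recorded as structured data. Every field name, policy reference, and value below is invented for illustration; this is not a prescribed format from any framework.

```python
# A minimal sketch of recording one sharpened criterion as structured data.
# All names and values are illustrative assumptions, not a standard schema.

criterion = {
    "id": "AC-01",
    "statement": (
        "Retrieval connected to the assistant returns only documents the "
        "requesting user is authorized to see, and responses never include "
        "protected personal information."
    ),
    "policy_anchor": "Data handling policy: only authorized users may access sensitive data",
    "risk_rationale": "Public users can submit prompts; retrieval reaches human resources documents",
    "desired_outcome": "No protected personal information is exposed through model outputs",
    "evidence": [
        "role definitions and access control configuration",
        "retrieval permission enforcement settings",
        "monitoring alerts for sensitive data exposure",
    ],
}

def is_usable(c: dict) -> bool:
    """A criterion is only usable if it states an expectation and names evidence."""
    return bool(c.get("statement")) and len(c.get("evidence", [])) > 0

print(is_usable(criterion))  # True
```

The design point is simply that each criterion carries its policy anchor, its risk rationale, its intended outcome, and the evidence that would demonstrate it, so nothing in the audit rests on opinion alone.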

Criteria must be testable, and this is one of the most important lessons for beginners because it changes how you write and select standards. A criterion is testable if you can gather evidence that shows whether it is met. If the criterion is written as "the system is secure," you cannot prove it, because secure has no defined measurement. If the criterion is written as "privileged access is restricted to approved roles and reviewed periodically," then you can test it by examining role assignments, approval records, and review evidence. If the criterion is written as "changes to prompts and model configurations are documented, approved, and traceable to deployments," then you can test it by examining change tickets, version histories, and deployment logs. For A I, testable criteria often require you to include both control design and operational proof. It is not enough to say that monitoring exists; you want criteria stating that monitoring detects defined categories of misuse and that alerts are investigated. This is how criteria become more than words and start shaping an audit that can deliver defensible findings.
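As a small worked illustration of a testable access criterion, the sketch below compares hypothetical role assignments against an approved list of privileged roles. The role names, user accounts, and the simple keyword match are assumptions for the example; real evidence would come from exported access reports and approval records.

```python
# A minimal sketch of testing "privileged access is restricted to approved roles."
# Sample roles and assignments are hypothetical placeholders.

approved_privileged_roles = {"ai-platform-admin", "model-deployer"}

actual_assignments = {
    "alice": {"ai-platform-admin"},
    "bob": {"model-deployer", "prompt-editor"},
    "svc-retrieval": {"ai-platform-admin", "data-connector-admin"},  # unexpected role
}

def find_exceptions(assignments, approved):
    """Return users holding privileged-looking roles outside the approved set."""
    exceptions = {}
    for user, roles in assignments.items():
        extra = {r for r in roles if "admin" in r or "deploy" in r} - approved
        if extra:
            exceptions[user] = extra
    return exceptions

print(find_exceptions(actual_assignments, approved_privileged_roles))
# {'svc-retrieval': {'data-connector-admin'}}
```

Each exception the check surfaces is not automatically a finding; it is a lead that still needs approval records and review evidence before you conclude the criterion was not met.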

Another important aspect is scoping criteria so they match the audit scope and do not create false expectations. If your scope is a specific A I assistant used by one department, your criteria should focus on controls relevant to that assistant, not on enterprise-wide A I strategy unless that strategy is part of scope. Beginners sometimes pick broad framework criteria that cover everything, then struggle to gather evidence for items unrelated to the system. That creates friction and weakens confidence in the audit. Instead, criteria selection should reflect the boundaries you set earlier, such as which models, which data sources, and which environments are included. Criteria should also reflect the system’s maturity. If the organization is early in adoption, you may still use strong criteria, but you may focus on foundational controls like inventory, access restriction, logging, and change management rather than advanced optimization. The key is that the criteria should guide meaningful evaluation, not create a wish list of unrelated improvements.

For A I audits, it is useful to think about criteria categories that repeat across many systems, even though the details vary. One category is governance, which includes approval processes, ownership, documentation, and accountability. Another is data management, which includes data classification, minimization, retention, provenance, and access control for training and retrieval data. Another is model management, which includes version control, configuration control, prompt governance, and testing for behavior changes. Another is operational security, which includes identity, least privilege for service accounts, monitoring for abuse patterns, and incident response readiness. Another category is third-party risk, which includes vendor data handling commitments, transparency, and change notification. These categories are not a checklist to recite; they are a way to ensure you do not forget a major risk area while selecting criteria. Each category can be mapped back to policy, risk, and outcomes so the criteria remain grounded. Beginners benefit from seeing these categories because they provide a mental scaffold that keeps criteria selection organized without turning the audit into a generic template.
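One hedged way to use these categories during planning is as a simple coverage scaffold, as in the sketch below. The category names follow this episode, while the selected criteria identifiers are purely hypothetical.

```python
# A minimal sketch of using criteria categories as a coverage scaffold.
# Category names follow the episode; selected criteria are hypothetical examples.

categories = {
    "governance": ["approval processes", "ownership", "documentation", "accountability"],
    "data management": ["classification", "minimization", "retention", "provenance", "access control"],
    "model management": ["version control", "configuration control", "prompt governance", "behavior testing"],
    "operational security": ["identity", "least privilege", "abuse monitoring", "incident response readiness"],
    "third-party risk": ["vendor data handling", "transparency", "change notification"],
}

selected_criteria = {
    "AC-01": "data management",
    "CM-02": "model management",
    "MON-03": "operational security",
}

# Flag any category with no selected criterion so the gap is a conscious decision,
# not an oversight.
uncovered = [c for c in categories if c not in set(selected_criteria.values())]
print(uncovered)  # ['governance', 'third-party risk']
```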

One challenge in A I criteria selection is that some risks involve behavior and interactions rather than static configurations. For example, prompt injection and misuse detection are not solved by a single setting; they are addressed through layered controls and monitoring. Criteria here should still be testable, but the evidence may include test cases, monitoring results, and records of how the organization responds to alerts. A criterion might state that the system must detect and respond to repeated bypass attempts within a defined timeframe, and you would test it by reviewing alert configurations and investigation records. Another criterion might state that the model must not retrieve sensitive documents outside authorized boundaries, and you would test it by reviewing retrieval configuration and evidence of permission enforcement. This is where outcomes-based criteria become especially valuable because they allow you to evaluate whether the system behaves as required. Beginners sometimes worry that behavioral criteria are too subjective, but they can be made objective through defined scenarios, defined evidence types, and clear thresholds for what counts as meeting the expectation.
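To show how a behavioral criterion can still be made objective, here is a minimal sketch that checks whether hypothetical bypass-attempt alerts were investigated within a defined timeframe. The alert records and the twenty-four hour threshold are assumptions chosen for illustration, not values from any standard.

```python
# A minimal sketch of testing "repeated bypass attempts are investigated within a
# defined timeframe." Records and threshold are hypothetical sample data.

from datetime import datetime, timedelta

threshold = timedelta(hours=24)

alerts = [
    {"alert_id": "A-1", "raised": datetime(2024, 5, 1, 9, 0), "investigated": datetime(2024, 5, 1, 15, 30)},
    {"alert_id": "A-2", "raised": datetime(2024, 5, 3, 22, 0), "investigated": datetime(2024, 5, 6, 10, 0)},
    {"alert_id": "A-3", "raised": datetime(2024, 5, 7, 8, 0), "investigated": None},  # never investigated
]

def late_or_missing(records, limit):
    """Return alerts that were not investigated within the defined timeframe."""
    return [
        r["alert_id"] for r in records
        if r["investigated"] is None or (r["investigated"] - r["raised"]) > limit
    ]

print(late_or_missing(alerts, threshold))  # ['A-2', 'A-3']
```

Defined scenarios, defined evidence types, and a clear threshold like this are what turn a behavioral expectation into something two auditors would evaluate the same way.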

Criteria selection should also consider change and drift, because A I systems evolve and can become risky through routine updates. A model update can alter outputs, a prompt template change can weaken safety, and a new connector can broaden data access. Criteria should therefore include expectations around change management, testing, and monitoring after changes. For example, a criterion might state that production model updates require approval and documented testing of known risk scenarios, and that monitoring is reviewed after deployment to confirm expected behavior. Another criterion might state that new data connectors require data classification review and access control validation before being enabled. These criteria connect directly to business outcomes like stability and confidentiality because change is a common root cause of incidents. For beginners, this is an important mindset shift. You are not only evaluating whether controls exist at a moment in time; you are evaluating whether the organization can keep controls effective as the system changes.
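As one illustrative way to test a change-management criterion, the sketch below checks a hypothetical change record for required evidence fields before concluding whether the expectation was met. The field names and the sample record are assumptions, not a standard change schema.

```python
# A minimal sketch of testing "production model updates require approval, documented
# risk-scenario testing, and post-deployment monitoring review."
# Field names and the sample record are illustrative assumptions.

required_fields = [
    "approval_reference",
    "risk_scenario_test_results",
    "post_deployment_monitoring_review",
]

change_record = {
    "change_id": "CHG-1042",
    "description": "Model version update for the customer assistant",
    "approval_reference": "CAB-2024-118",
    "risk_scenario_test_results": None,   # testing not documented
    "post_deployment_monitoring_review": "completed",
}

def missing_evidence(record, required):
    """Return the required evidence items missing from one change record."""
    return [f for f in required if not record.get(f)]

print(missing_evidence(change_record, required_fields))
# ['risk_scenario_test_results'] -> a lead to investigate, possibly a finding
```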

Finally, criteria should be communicated clearly to stakeholders early in the audit, because transparency reduces conflict and improves evidence quality. If stakeholders know what criteria will be used, they can provide relevant evidence and explain how their controls map to those expectations. Clear criteria also prevent last-minute disputes where a team argues that the auditor is holding them to an unexpected standard. In A I audits, this communication is especially important because teams may not share the same vocabulary, and what one team calls a safeguard another team may call a feature. By anchoring criteria in policy, risk, and outcomes, you create a shared language for what matters. You also create a structure for findings. Instead of saying something feels weak, you can say a criterion was not met and point to evidence. That is what makes an audit trustworthy and useful.

As we wrap up, remember that choosing audit criteria for A I is about grounding judgment in standards that fit the organization and the system. Internal policy provides a legitimate anchor for expectations, risk ensures criteria match exposure and impact, and outcomes ensure the criteria reflect what must be true in practice, not just what is written. Criteria must be testable, scoped, and clearly connected to evidence, especially in A I environments where behavior and change can create subtle risks. When you select criteria thoughtfully and communicate them clearly, you set up an audit that produces defensible findings, meaningful remediation, and improved trust in how A I systems are governed and operated. That is the core of Domain 3A planning, and it is the foundation you will build on as you move into audit techniques and evidence collection in the next topics.
