Episode 95 — Use audit techniques tailored to AI systems, not generic checklists (Domain 3B)
In this episode, we move from planning into execution by focusing on audit techniques that actually work for A I systems. If you are new to auditing, it is tempting to rely on generic checklists because they feel safe and organized, and they can be useful for basic controls like access management and logging. The problem is that A I systems can pass a generic checklist while still being easy to abuse, because many A I failures come from behavior, data flows, and hidden configuration layers that a traditional checklist never examines. A model can be perfectly patched, hosted on secure infrastructure, and still be manipulated through prompts, still retrieve sensitive documents, or still change behavior after an update in ways no one notices. So the skill you are building here is choosing techniques that match the nature of the system you are auditing. That means you still review documents and configurations, but you also use observation, trace-based evidence, and targeted testing that reflects how A I systems are used in real life. By the end, you should be able to explain why A I needs tailored techniques and what those techniques look like at a high level.
Before we continue, a quick note: this audio course is a companion to our course companion books. The first book is about the exam and provides detailed information on how to pass it best. The second book is a Kindle-only eBook that contains 1,000 flashcards that can be used on your mobile device or Kindle. Check them both out at Cyber Author dot me, in the Bare Metal Study Guides Series.
A good starting point is to remind yourself what is different about A I systems from normal I T systems in an audit context. Traditional applications usually have behavior defined by code paths that are relatively deterministic, and audits often focus on whether controls exist and whether systems are configured according to standards. A I systems have behavior shaped by data, prompts, and model configuration, and that behavior can vary in ways that are hard to capture in a static document review. This does not mean auditing becomes impossible. It means the auditor must gather evidence that includes how the system behaves under realistic conditions, not just how it is described. The model may have hidden system instructions, retrieval configurations, and tool permissions that affect outputs, and those layers can change more frequently than the underlying infrastructure. A generic checklist tends to stop at the infrastructure layer because it is familiar and measurable. A tailored audit technique deliberately reaches into the behavior layer, the data layer, and the change layer so your conclusions reflect reality rather than assumptions.
One tailored technique is system mapping through data and influence tracing, which is a way of building an audit understanding of what actually shapes outputs. Instead of starting with a list of servers, you start with the question: what inputs can influence this model, and what outputs can influence the business. Inputs include user prompts, system prompts, retrieved documents, fine-tuning data, configuration settings, and tool integration results. Outputs include responses that humans act on, decisions that feed workflows, logs and alerts generated by the system, and actions triggered through tools. In a tailored audit, you map these influence paths and then choose evidence points along each path. For example, if retrieved documents influence outputs, you examine how the retrieval source is selected, what access boundaries exist, and what logs confirm which documents were used. If tool results influence outputs, you examine tool permissions and monitoring. This mapping technique makes the audit focus on the pathways where risk travels, which is much more useful than a generic inventory review.
Another key technique is evidence triangulation, which means you do not accept a single type of evidence as proof for a control. In A I audits, teams might provide policies, diagrams, or verbal explanations that sound convincing, but the true test is whether the control is enforced and monitored in practice. Triangulation means you gather at least two or three independent evidence types that support the same claim. For example, if the team says only approved users can change system prompts, you might review role assignments, review change records, and review logs of prompt modifications. If the team says retrieval cannot access restricted documents, you might review retrieval configuration, review access control enforcement mechanisms, and review logs showing which repositories were queried during normal use. Triangulation is especially important in A I because many controls live in configuration interfaces that are easy to change quickly, and because model behavior can contradict documentation if the system has evolved. Beginners sometimes view triangulation as distrust, but it is really about reducing the chance of being misled by incomplete or outdated information.
Observation of real workflows is another technique that is more valuable for A I than for many traditional audits. A model can be configured correctly on paper but used in ways that introduce risk, such as users pasting sensitive data into prompts, using the model for decisions it was not designed to support, or bypassing intended approval processes by using a separate interface. Observing a workflow means you watch how the system is actually used in context, including what data users provide, what outputs are relied upon, and what steps are taken before action is taken. This technique helps you identify gaps between policy and reality. It also helps you understand where controls need to be placed, because sometimes the greatest risk is not inside the model but at the human boundary, such as unclear guidance on what data is allowed in prompts or lack of review before high-impact decisions. Observations should still be evidence-driven, meaning you document what you saw and connect it to criteria, but they provide a reality check that static checklists cannot. For beginners, this is a valuable mindset shift: auditing is as much about how people use systems as it is about how systems are configured.
Log and trace analysis is another tailored technique that becomes central in A I audits, because logs can show what the system did, not just what it was supposed to do. In A I environments, useful logs include records of who queried the model, what endpoints were used, whether requests were blocked by policy, what documents were retrieved, what tool calls were made, and what configuration changes occurred. Trace analysis means following a specific request through the system to see which components were involved and what data was touched. For example, you might select a set of representative interactions and trace whether retrieval pulled documents from approved sources only, whether tool calls occurred only when authorized, and whether sensitive outputs were flagged or filtered. This technique is powerful because it provides objective evidence of behavior and control enforcement. It also helps detect hidden risk, such as a service account retrieving more data than expected or a tool integration being triggered in unexpected ways. Generic checklists often treat logging as a checkbox, but tailored audits use logs as primary evidence to validate real control effectiveness.
Targeted adversarial testing is a technique that must be handled responsibly, but it is often necessary to evaluate A I-specific risks like prompt injection and evasion. This does not mean you perform dangerous experiments or attempt to break systems in uncontrolled ways. It means you use controlled, ethical test inputs designed to check whether known abuse patterns are blocked and whether monitoring detects suspicious behavior. For instance, you may test whether the model can be steered into revealing restricted internal instructions, whether it can be coaxed into retrieving sensitive documents outside the intended scope, or whether policy guardrails respond consistently across different phrasings. You can also test whether repeated probing triggers alerts. The key is that the test scenarios are defined in advance, aligned with audit criteria, and conducted in a safe environment or with appropriate approvals. Beginners should understand that behavioral testing is not optional when the risk is behavioral. If you never test how the model responds to adversarial inputs, you cannot confidently claim that guardrails are effective.
A tailored technique for A I also includes reviewing change management and version history in more depth than a traditional audit might. Because model behavior can shift with updates, you need to understand how the organization controls change, how they test before deployment, and how they validate behavior after deployment. Reviewing change management here includes examining how prompts are updated, how model versions are selected or pinned, how data connectors are added, and how tool integrations are enabled. It also includes checking whether changes are documented and whether there is an approval chain appropriate to risk. A useful method is to select a handful of recent changes and trace them end to end: what triggered the change, who approved it, what testing was done, what was deployed, and what monitoring confirmed after release. This provides concrete evidence that the change process works in practice, not just in theory. Generic checklists often ask whether change management exists, but tailored A I audits ask whether change management is strong enough to prevent silent behavior shifts that could harm the business.
Third-party and supply chain evaluation also benefits from tailored techniques, because vendor reliance is common in A I and visibility is limited. A generic audit might simply check that vendor assessments were done. A tailored audit looks at how vendor claims map to the organization’s data flows and risk, how contracts restrict data use and retention, what evidence the vendor provides, and what monitoring the organization uses to detect vendor-related issues. It may also examine how the organization handles vendor changes, such as model updates that affect behavior or policy enforcement. A useful technique is to trace a data flow that crosses the vendor boundary and identify what controls exist on both sides. Another technique is to examine incident response coordination agreements, such as how quickly the vendor will notify the organization of an incident and what telemetry the organization receives. This keeps vendor evaluation grounded in outcomes and operational readiness rather than trust. Beginners often underestimate this area, but vendor controls become part of the system’s control set when the system depends on the vendor.
Sampling strategy is also important, and while we will go deeper on sampling later, it matters even when you are choosing techniques. A I systems produce a huge volume of interactions, and it is not practical to review everything, so you need a method to select representative evidence. Tailored sampling might include selecting interactions from different user groups, different endpoints, different times of day, and different data sources, especially those with higher sensitivity. It might include selecting both normal interactions and interactions that triggered policy blocks or alerts, because those reveal how controls behave under stress. Sampling also applies to configuration and change records, where you might focus on high-impact changes like enabling a new connector or changing system prompt templates. For beginners, the key idea is that sampling is not random guessing. It is a disciplined selection method designed to reveal whether controls hold across the most important risk areas. Tailored audit techniques depend on good sampling because the system’s behavior is too broad to observe fully.
Finally, reporting techniques should also be tailored, because A I audit findings often need to explain behavioral risk in a way that stakeholders can act on. A generic report might list missing policies or missing documents. A tailored report connects findings to evidence of real behavior, describes the risk outcome, and explains what control gap allowed it. For example, a finding might describe that retrieval access was broader than intended and evidence showed sensitive repositories were reachable, which creates confidentiality risk. Another finding might describe that prompt changes were not reviewed and evidence showed untracked modifications to system instructions, which creates integrity and governance risk. The report should also describe the control expectation, which comes from criteria, so stakeholders know the standard being applied. Beginners should remember that the purpose of audit work is improvement, not blame. Tailored reporting makes improvement possible because it links findings to clear control levers, such as access restriction, logging enhancements, change approvals, or testing and monitoring updates. If the report reads like a generic checklist, it may be easy to ignore.
As we wrap up, remember that A I audits require techniques that match the system’s unique risk shape. Mapping influence paths helps you see where prompts, data, and configurations shape outputs and where those outputs create business impact. Triangulating evidence keeps you from trusting a single narrative and helps you prove controls are real and operating. Observing workflows and analyzing logs lets you see how the system behaves in practice, not just on paper. Controlled adversarial testing helps you evaluate behavioral guardrails and detection, which generic checklists rarely address. Reviewing change histories and vendor boundaries ensures you understand how risk shifts over time and across external dependencies. When you use these tailored techniques, you are not rejecting checklists; you are using checklists as a base while adding the A I-specific methods needed to reach accurate, defensible conclusions. That is what Domain 3B is aiming for, and it is what makes an A I audit credible to both technical teams and business stakeholders.