Episode 98 — Collect AI audit evidence: logs, lineage, artifacts, and change records (Domain 3C)
In this episode, we focus on a very practical auditing skill: collecting the evidence that allows you to prove what is happening in an A I system. For beginners, it helps to remember that an audit is not a debate where the most confident speaker wins. An audit is a process of gathering artifacts that show what the system did, what it was allowed to do, and how it changed over time. With Artificial Intelligence (A I), that evidence must cover more than servers and network diagrams, because the most important risks often live in data flows, model configurations, prompt control layers, and vendor boundaries. Domain 3C expects you to understand which evidence types matter and how to assemble them into a coherent picture, especially when a system is evolving quickly. The goal is not to collect everything. The goal is to collect the right set of evidence so you can trace behavior, validate controls, and support findings with clear proof. By the end, you should be able to explain why logs, lineage, artifacts, and change records form a complete evidence set and what each one contributes.
Before we continue, a quick note: this audio course is a companion to our course companion books. The first book is about the exam and provides detailed guidance on how best to pass it. The second book is a Kindle-only eBook that contains 1,000 flashcards you can use on your mobile device or Kindle. Check them both out at Cyber Author dot me, in the Bare Metal Study Guides Series.
Logs are often the first evidence type people think about, but the important point is that A I audits require specific kinds of logs, not just generic authentication events. In an A I system, you want logs that capture who interacted with the model, when the interaction occurred, what model endpoint was used, and what the system decided to do in response. If the system has policy enforcement, you want logs that show when a request was blocked, modified, or flagged as risky. If the system retrieves documents, you want logs that show which sources were queried and which documents were used, because retrieval is often where sensitive data exposure happens. If the system can call tools, you want logs that show tool invocation details, such as what tool was called, what action was attempted, and whether it succeeded. These logs support multiple audit questions at once, including whether access is controlled, whether misuse is detected, and whether the system behaves consistently with policy. A beginner-friendly way to think about it is that logs are the system’s memory, and without them, you cannot reconstruct events or prove that controls were operating.
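To make that concrete, here is a minimal sketch of what a single structured interaction log entry might look like, assuming a JSON-lines logging pipeline; every field name, such as request_id, model_endpoint, policy_action, and tool_calls, is an illustrative assumption rather than a standard schema.

# Illustrative only: a hypothetical structured log entry for one model interaction.
# Field names are assumptions for this sketch, not a standard logging schema.
import json
from datetime import datetime, timezone

interaction_event = {
    "request_id": "req-7f3a2c",                     # correlation key reused across components
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "user_id": "u-1042",                            # who interacted with the model
    "model_endpoint": "chat-model-v3",              # which model endpoint served the request
    "policy_action": "allowed",                     # allowed | blocked | modified | flagged
    "retrieval": {"index": "contracts-index", "documents_used": ["doc-889", "doc-412"]},
    "tool_calls": [{"tool": "ticket_api", "action": "create_ticket", "succeeded": True}],
}

# Emitting the event as one JSON line keeps it machine-parseable for later correlation.
print(json.dumps(interaction_event))

The design point is that one event captures the user, the model endpoint, the policy decision, the retrieval sources, and the tool activity together, which is exactly the set of questions the audit needs the logs to answer.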
Collecting logs is not just about asking for a log file; it is about ensuring the logs are complete, trustworthy, and usable. Completeness means the logging covers the critical events that matter to your criteria, not only technical errors. Trustworthiness means the logs are protected from tampering, retained for an appropriate period, and accessible only through controlled, auditable channels. Usability means logs can be correlated, so you can tie a model interaction to a user identity, a time window, and related system actions like retrieval and tool calls. In practice, that often requires consistent identifiers, such as a request identifier that appears across components. An A I audit should therefore collect evidence about logging configuration, retention settings, and any centralized log management practices that preserve integrity. Beginners sometimes treat logs as raw truth, but logs are only as good as the logging design and the operational discipline around them. If logs can be altered, rotated too quickly, or lack correlation, they will not support strong conclusions even if they exist.
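As a sketch of what correlation looks like in practice, the following assumes each component writes JSON-lines events that share a request_id; the file names and fields are hypothetical, not a real log format.

# Sketch of a correlation check across component logs, assuming each component
# writes JSON lines that share a request_id field. File names are hypothetical.
import json
from collections import defaultdict

def load_events(path, component):
    events = []
    with open(path) as f:
        for line in f:
            event = json.loads(line)
            event["component"] = component
            events.append(event)
    return events

events = (load_events("gateway.log", "gateway")
          + load_events("retrieval.log", "retrieval")
          + load_events("tools.log", "tools"))

# Group by request_id so one model interaction can be reconstructed end to end.
by_request = defaultdict(list)
for event in events:
    by_request[event.get("request_id", "MISSING")].append(event)

# Events without a request_id cannot be correlated and weaken the evidence.
print("uncorrelated events:", len(by_request.get("MISSING", [])))

If a meaningful share of events cannot be tied to a request identifier, that is itself evidence about the quality of the logging design.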
Lineage is a second evidence type that is especially important in A I systems, and lineage simply means traceability of where data and models came from and how they were transformed. Data lineage answers questions like what source data was used, how it was cleaned or filtered, who approved it, and how it moved into training or retrieval systems. Model lineage answers questions like what base model was used, what fine-tuning steps occurred, what versions were produced, and what evaluations were performed before release. Lineage matters because many A I risks are rooted in data integrity, provenance, and the chain of custody of artifacts. If you cannot trace where a dataset came from, you cannot confidently assess poisoning risk, bias risk, or licensing and privacy obligations. If you cannot trace how a model version was produced and approved, you cannot confidently assess governance and change control. Collecting lineage evidence gives the audit a timeline, allowing you to connect behavior in production to upstream decisions about data and model development.
Lineage evidence can take many forms, and an auditor needs to know what to look for without turning the audit into an engineering project. Useful evidence includes dataset inventories, source descriptions, access control lists for data repositories, records of data selection and filtering decisions, and documentation of data refresh schedules. It can also include metadata that shows when a dataset was created, who modified it, and what processing steps were applied. For model lineage, useful evidence includes model registry records, version identifiers, training run summaries, evaluation results, and approvals for promotion from development to production. The key is that lineage should allow you to answer a simple question: if this output happened today, what data and model decisions made it possible? When lineage is weak, organizations often struggle to explain unexpected model behavior, and they struggle to contain incidents because they cannot quickly identify what changed. For beginners, lineage is the audit concept that connects upstream creation to downstream impact, and it is a signature skill in A I auditing.
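A minimal sketch of what dataset and model lineage records might contain is shown below; the structures and field names are illustrative assumptions, not the interface of any particular registry.

# Hypothetical lineage records for a dataset and a model version. The fields are
# illustrative of the traceability questions an auditor asks, not a real registry API.
dataset_lineage = {
    "dataset_id": "support-tickets-2024q2",
    "sources": ["crm_export", "email_archive"],
    "filters_applied": ["remove_pii", "english_only"],
    "approved_by": "data-governance-board",
    "created": "2024-07-02",
    "last_refreshed": "2024-09-15",
}

model_lineage = {
    "model_version": "assistant-v3.2",
    "base_model": "vendor-foundation-model",
    "fine_tuning_datasets": ["support-tickets-2024q2"],
    "evaluation_report": "eval-2024-09-20",
    "promotion_approved_by": "ml-change-board",
    "deployed": "2024-09-22",
}

# If today's output came from assistant-v3.2, these two records together answer
# what data and approval decisions made that output possible.
print(model_lineage["fine_tuning_datasets"], dataset_lineage["approved_by"])

The point is that the two records, read together, let you walk from a production model version back to its training data and the approvals behind both.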
Artifacts are the third evidence type, and artifacts are the concrete objects that define how the A I system behaves. Think of artifacts as the building blocks of the system, such as model files or model references, prompt templates, system instructions, policy configurations, retrieval index configurations, evaluation test suites, and deployment packages. In traditional I T audits, artifacts might be configuration files, software packages, or infrastructure templates. In A I audits, artifacts include those plus the pieces that shape model behavior. Prompt artifacts matter because they often encode rules, safety constraints, and the tone and purpose of the system. Retrieval artifacts matter because they define what documents can be accessed and how the system selects them. Evaluation artifacts matter because they show how the organization tested the model and what known failure modes were considered. Collecting artifacts allows an auditor to move from abstract statements to concrete analysis, because you can examine what the system is configured to do, not just what people say it does. Artifacts also help with repeatability, because another auditor can review the same artifact set and reach similar conclusions.
When collecting artifacts, you also need to pay attention to versioning and environment differences, because a common A I risk is that development and production do not match. The prompt template used in production might differ from the one shown in a demo. The retrieval connector configuration might be broader in production than in testing. The model version might have been updated without complete documentation. An evidence collection plan should therefore include artifacts from the specific environment in scope, especially the production environment if the audit is about real operational risk. It should also include identifiers that link artifacts to deployments, such as version numbers, commit references, or deployment timestamps. Beginners should remember that artifacts without context can be misleading, because you might collect a template that looks safe but is not actually the one in use. The collection process should therefore include proof of deployment, such as deployment logs or environment snapshots that confirm which artifacts were active during the period under review. This is how artifact evidence becomes operational evidence rather than theoretical evidence.
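One way to turn artifact evidence into operational evidence is to compare a hash of the collected prompt template against the hash recorded at deployment time, assuming the organization keeps such a deployment record; the paths and field names in this sketch are hypothetical.

# Sketch: confirm that the prompt template collected as evidence matches the one
# recorded as deployed in production. Paths and record fields are hypothetical.
import hashlib
import json

def sha256_of(path):
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

collected_hash = sha256_of("evidence/prompt_template_v14.txt")

with open("evidence/deployment_record.json") as f:
    deployment = json.load(f)   # e.g. {"environment": "production", "prompt_sha256": "..."}

if deployment.get("prompt_sha256") == collected_hash:
    print("Artifact matches the version active in", deployment.get("environment"))
else:
    print("Mismatch: the collected template is not the one deployed in scope")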
Change records are the fourth evidence type, and they often provide the most direct proof of governance maturity. Change records show who requested a change, who approved it, what testing was done, when it was deployed, and what validation occurred afterward. In A I systems, change records matter not only for software updates but also for model version updates, prompt adjustments, data connector changes, tool integration changes, and policy rule changes. A small change in a prompt can alter behavior significantly, and a small change in retrieval scope can change what data becomes reachable. So change records are not administrative noise; they are the audit trail that connects system evolution to risk. Collecting change records allows you to test whether the organization controls change according to its criteria, especially for high-impact changes. It also helps identify whether incidents or complaints correlate with specific changes, which can be a powerful way to focus remediation. Beginners sometimes think of change control as paperwork, but in A I environments it is a core safety mechanism because it reduces surprise.
A good evidence collection approach treats change records as something you can trace, not just something you read. You select a set of relevant changes, ideally including high-impact ones, and then trace each one end to end. You look for the request and rationale, the risk assessment if required, the approvals, the testing evidence, the deployment confirmation, and any post-deployment monitoring review. You also check whether emergency changes were handled differently and whether that difference was justified and documented. This trace method turns change records into operational proof that the process works. It also reveals common weaknesses, such as approvals that are missing, testing that is informal, or deployments that are not linked to artifact versions. In A I audits, this trace is especially useful for prompt and model changes because those are areas where organizations often move quickly and forget that behavior is a security-relevant property. If change records do not cover these areas, the audit will likely find governance gaps that increase risk over time.
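Here is a minimal sketch of the trace idea expressed as a completeness check over a sampled change record; the required fields are assumptions drawn from the steps just described, not a mandated template.

# Minimal completeness check for a sampled change record, assuming each record
# is a dict with these hypothetical fields; missing items become audit follow-ups.
REQUIRED_FIELDS = [
    "request_rationale", "risk_assessment", "approvals",
    "testing_evidence", "deployment_confirmation", "post_deployment_review",
]

def trace_change(record):
    # Return the evidence elements that are absent or empty for this change.
    return [field for field in REQUIRED_FIELDS if not record.get(field)]

sample_change = {
    "change_id": "CHG-2044",
    "type": "prompt_update",
    "request_rationale": "Tighten refusal rules for financial advice",
    "approvals": ["app-owner", "security"],
    "testing_evidence": "eval-run-512",
    "deployment_confirmation": "deploy-2024-10-03",
    # risk_assessment and post_deployment_review intentionally absent in this example
}

gaps = trace_change(sample_change)
print(f"{sample_change['change_id']}: missing {gaps}" if gaps else "complete trace")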
These four evidence types are strongest when you can connect them, because an A I audit becomes much more powerful when you can follow a chain from a system event to upstream causes. For example, suppose logs show that a model retrieved a sensitive document and included it in an output. Lineage evidence can show how that document entered the retrieval index, what classification it had, and whether it was approved for retrieval. Artifact evidence can show how retrieval was configured and whether access enforcement was based on user identity or a broad service account. Change records can show when the retrieval scope was expanded and who approved it. This connected evidence chain supports a strong finding because it shows what happened, why it happened, and what control weakness allowed it. It also makes remediation clearer because you can identify which artifact or process needs improvement. Beginners should see this as the heart of evidence-driven auditing: you are not collecting disconnected items, you are assembling a story that is supported by proof at each step.
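The following sketch shows how the four evidence types might be joined around that retrieval example using shared identifiers; every structure and identifier is illustrative, and a real audit would pull these from the organization's actual systems.

# Sketch of assembling an evidence chain for one incident: a retrieval event is
# linked to index lineage, the retrieval configuration artifact, and the change
# that expanded scope. All identifiers and structures are illustrative.
log_event = {"request_id": "req-7f3a2c", "document": "doc-889", "index": "contracts-index"}

index_lineage = {"contracts-index": {"doc-889": {"classification": "confidential",
                                                 "approved_for_retrieval": False}}}

retrieval_config = {"index": "contracts-index", "access_model": "shared_service_account"}

change_log = [{"change_id": "CHG-1998", "scope": "contracts-index",
               "description": "expanded connector to full contracts share",
               "approved_by": None}]

doc_meta = index_lineage[log_event["index"]][log_event["document"]]
related_changes = [c for c in change_log if c["scope"] == log_event["index"]]

print("What happened: sensitive document", log_event["document"], "was retrieved")
print("Why it was reachable:", doc_meta, "via", retrieval_config["access_model"])
print("What change allowed it:", related_changes)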
Evidence collection also includes practical considerations like confidentiality and minimization, because the evidence itself can be sensitive. Logs may contain user prompts, outputs, or identifiers. Data lineage records may reveal sources and business processes. Artifacts may include system instructions and configurations that are security-sensitive. Change records may include internal discussions and risk decisions. A strong audit plan defines how evidence will be stored, who can access it, and how long it will be retained. It also defines how to minimize sensitive content, such as collecting metadata rather than full content when possible, or redacting personal information when it is not necessary for the audit objective. This is important because audits should not create new risk by accumulating sensitive data without proper controls. For beginners, this is a good lesson in professionalism: auditors must handle evidence with the same care they expect the organization to apply to its own data. Evidence integrity and evidence confidentiality go together.
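As a simple illustration of minimization, the sketch below redacts obvious personal identifiers from a captured prompt before it is stored as evidence; the patterns are deliberately simplistic and are not a complete approach to detecting personal data.

# Sketch of evidence minimization: redact obvious personal identifiers from a
# captured prompt before storing it in the audit evidence repository. The regex
# patterns are deliberately simple and illustrative, not a complete PII solution.
import re

REDACTIONS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[REDACTED_EMAIL]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[REDACTED_SSN]"),
]

def minimize(text):
    # Apply each redaction pattern in turn and return the sanitized text.
    for pattern, placeholder in REDACTIONS:
        text = pattern.sub(placeholder, text)
    return text

captured_prompt = "Please email jane.doe@example.com about SSN 123-45-6789."
print(minimize(captured_prompt))
# -> Please email [REDACTED_EMAIL] about SSN [REDACTED_SSN].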
As we wrap up, remember that collecting A I audit evidence is about building a defensible picture of system behavior and control effectiveness over time. Logs show what happened and who did it, especially when they capture model interactions, retrieval actions, tool calls, and policy enforcement events. Lineage provides traceability for data and model origins and transformations, letting you connect outputs to upstream choices. Artifacts define how the system is configured to behave, including prompts, retrieval settings, model references, and evaluation suites, and they must be tied to the environment actually in use. Change records provide the audit trail of governance, proving whether high-impact changes were reviewed, tested, and validated. When you collect these evidence types and connect them into traceable chains, you move beyond opinions and demos into a disciplined audit that produces findings stakeholders can trust and actions teams can implement. That evidence-first approach is the core of Domain 3C and a foundational skill for any A I auditor.