Episode 60 — Audit AI development controls: code quality, reviews, and documentation (Task 9)
In this episode, we focus on a set of controls that can sound very technical at first but are actually simple in purpose: they exist to keep A I development disciplined so mistakes, shortcuts, and hidden risk do not slip into production. When a model or an A I system causes harm, people often blame the model as if it were a mysterious black box. In many cases, the real problem is much more ordinary: poor code quality, rushed changes, missing reviews, and weak documentation that made it easy for errors to enter and hard for anyone to notice. Development controls are the habits and rules that make software work trustworthy, and in A I they matter even more because the system touches data pipelines, model training, evaluation, and user-facing behavior all at once. For brand-new learners, the goal is not to learn how to write code, but to learn how to audit whether the organization builds A I software in a controlled way that supports compliance, fairness, privacy, and safety. When development controls are strong, the organization can explain what changed, why it changed, and what evidence shows the change was safe. When they are weak, the organization is essentially gambling with every update.
Before we continue, a quick note: this audio course is a companion to our two course books. The first book covers the exam itself and provides detailed guidance on how best to pass it. The second book is a Kindle-only eBook with 1,000 flashcards you can use on your mobile device or Kindle. Check them both out at Cyber Author dot me, in the Bare Metal Study Guides Series.
Code quality is the first area, and in everyday terms code quality means the code is understandable, consistent, and less likely to behave unpredictably. High-quality code is not about elegance for its own sake; it is about making risky behavior less likely and making review and investigation possible. In A I systems, code quality matters because the code controls how data is cleaned and transformed, how models are trained, how outputs are produced, and how logs are handled. A small code mistake in any of these areas can create large real-world consequences, such as leaking sensitive data, mislabeling a population, or measuring performance incorrectly. A beginner-friendly way to evaluate code quality is to ask whether the organization has coding standards, uses automated checks to catch common mistakes, and keeps code modular so changes are easier to understand. You can also ask whether the code is readable enough that someone other than the original author can maintain it, because single-person knowledge creates governance risk. If the codebase is messy and fragile, the organization will hesitate to improve it safely, and that leads to drift and accumulating risk. Code quality is a long-term risk control disguised as a technical preference.
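To make that concrete, here is a minimal sketch, not drawn from any specific codebase, of what readable, modular pipeline code can look like: one small function with a stated purpose, typed inputs, and a rule expressed in one place so a reviewer other than the author can understand and maintain it. The record fields and the consent rule are illustrative assumptions.

```python
# A minimal sketch (illustrative only) of readable, modular pipeline code:
# one small function, typed inputs and outputs, and a docstring stating intent
# so someone other than the original author can maintain it.
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One input row; the field names here are illustrative only."""
    user_id: str
    age: int
    consent_given: bool


def filter_consented_adults(records: list[Record]) -> list[Record]:
    """Keep only records for adults who gave consent.

    The rule lives in one place, so a change to it is easy to spot in review.
    """
    return [r for r in records if r.consent_given and r.age >= 18]


if __name__ == "__main__":
    sample = [
        Record("a1", 34, True),
        Record("a2", 17, True),
        Record("a3", 40, False),
    ]
    print(filter_consented_adults(sample))  # only "a1" survives the filter
```

The value for an auditor is not the specific rule but the fact that the rule is visible, testable, and easy to review.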
Another important piece of code quality is testing, and here testing means verifying that code changes do what they are supposed to do and do not break important behavior. In A I development, tests should cover data transformations, feature generation, labeling logic, and safety constraints, not just general application behavior. A system might pass basic tests and still fail ethically if a transformation change accidentally removes a subgroup’s records or if a filtering change accidentally retains personal information that should have been removed. Auditing for testing means asking what types of tests exist, what parts of the pipeline are covered, and how tests are required before changes can be merged. It also means asking how test failures are handled, because teams under pressure sometimes disable tests or treat failures as optional. A mature development environment treats tests as a gatekeeper that protects quality, not as a speed bump to be ignored. If testing is weak or inconsistent, changes become risky experiments in production, and that is where incidents are born.
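As an illustration, the sketch below shows what such tests might look like in Python under pytest. The transforms and field names are stand-ins invented for this example, not part of any real pipeline; the point is that the assertions target privacy behavior and subgroup coverage rather than only checking that the code runs.

```python
# Hedged sketch of pipeline tests; run with pytest. The transforms are defined
# inline as stand-ins so the file is self-contained.


def remove_direct_identifiers(rows):
    """Stand-in transform: drop fields flagged as direct identifiers."""
    return [{k: v for k, v in row.items() if k not in {"email", "ssn"}} for row in rows]


def filter_active_users(rows):
    """Stand-in transform: keep rows marked active."""
    return [row for row in rows if row.get("active")]


def test_identifiers_are_removed():
    rows = [{"email": "x@example.com", "ssn": "123", "age_band": "30-39"}]
    cleaned = remove_direct_identifiers(rows)
    assert all("email" not in r and "ssn" not in r for r in cleaned)


def test_filter_does_not_erase_a_subgroup():
    rows = [
        {"active": True, "region": "north"},
        {"active": False, "region": "south"},
        {"active": True, "region": "south"},
    ]
    kept_regions = {r["region"] for r in filter_active_users(rows)}
    assert {"north", "south"} <= kept_regions  # no subgroup silently vanishes
```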
Code reviews are the second major control, and they are one of the simplest ways to catch mistakes and reduce bias in decision-making. A code review is when someone other than the author examines a change and evaluates whether it is correct, safe, and aligned with standards. Reviews matter in A I because they provide a second set of eyes on sensitive transformations, feature choices, and evaluation logic that could introduce fairness or privacy risk. Under business pressure, teams sometimes treat reviews as a formality, but a real review is a conversation about intent, risk, and evidence. Auditing code reviews means checking whether reviews are required, who performs them, and whether reviewers have enough time and expertise to evaluate the change. It also means checking whether high-risk changes receive deeper review, such as changes to data handling, labeling rules, or safety filters. A good program defines what counts as high risk and requires additional scrutiny, because not all changes are equal. Reviews that are rushed or rubber-stamped are not controls; they are signatures without meaning.
A practical way to evaluate review strength is to look for signs that reviewers challenge assumptions rather than only checking syntax. For example, if a change introduces a new feature, does the review ask whether it could act as a proxy for sensitive traits? If a change modifies data retention or logging, does the review ask whether it increases privacy exposure? If a change adjusts thresholds or evaluation criteria, does the review ask whether it could increase unfair outcomes for certain groups? Reviews can also check whether changes are reproducible, such as whether random seeds and dataset versions are recorded for training runs. These questions do not require reviewers to be ethicists, but they do require a review culture that values risk thinking. Auditing this culture means looking for evidence that reviews sometimes result in redesign, extra testing, or rejection of a change. If no change is ever challenged, the review system is likely not functioning as a real safety net. Real controls sometimes slow things down, and that is a sign they are real.
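A hedged sketch of that reproducibility habit might look like the following: before a training run, the seed is fixed and a small manifest recording the seed, dataset version, and code version is written next to the run's outputs. The file name and version labels here are hypothetical.

```python
# Minimal sketch, with hypothetical field names, of recording what a reviewer
# would need to reproduce a training run: random seed, dataset version, and
# code version, written to a small JSON manifest alongside the run's outputs.
import json
import random
from datetime import datetime, timezone


def record_run_manifest(path, *, seed, dataset_version, code_version):
    """Write the facts needed to reproduce this run to a small JSON file."""
    manifest = {
        "seed": seed,
        "dataset_version": dataset_version,
        "code_version": code_version,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(manifest, f, indent=2)
    return manifest


if __name__ == "__main__":
    seed = 20240601
    random.seed(seed)  # fix the seed before any sampling or shuffling
    print(record_run_manifest(
        "run_manifest.json",
        seed=seed,
        dataset_version="customers-2024-06-01",  # illustrative version label
        code_version="git:abc1234",              # illustrative commit reference
    ))
```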
Documentation is the third major control, and it is often the most underestimated because people treat documentation as a chore. In governance, documentation is memory, and memory is what makes accountability possible. A I systems need documentation about purpose, scope limits, data sources, transformations, labeling definitions, feature meanings, evaluation results, and known limitations. They also need documentation about how the system should be used and how it should not be used, because misuse can create harm even when the model is technically correct. Auditing documentation means asking whether the organization has a consistent way to document these elements, whether the documentation is updated when changes occur, and whether it is accessible to the right audiences. If documentation is only in one engineer’s notes or only in scattered chat messages, it will not support responsible oversight. Leaders, compliance teams, and incident responders need reliable reference materials, especially when the system is under scrutiny. When documentation is strong, the organization can explain and defend its systems with evidence rather than with improvisation.
One of the most important documentation topics in A I is traceability between dataset versions, code versions, and model versions. You want to know that when a model is deployed, the organization can identify the exact training dataset version it used, the code version that produced it, and the evaluation results that justified release. Without that traceability, the organization cannot confidently investigate drift or incidents. Documentation should also include change logs that summarize what changed and why, because raw technical diffs are not enough for governance audiences. A beginner-friendly audit question is whether the organization can answer, quickly and consistently, what changed in the last model update and what evidence supports that the change was safe. If the answer requires hunting through multiple systems and relying on personal memory, the controls are weak. Traceability documentation turns complex systems into something governable, because it creates a reliable timeline of decisions and artifacts.
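One way to picture traceability is as plain data that links each release to its inputs and evidence, so the audit question becomes a lookup rather than a hunt through memory. The sketch below uses hypothetical model names, dataset labels, and report paths purely for illustration.

```python
# Hedged illustration (all names hypothetical) of traceability as data: each
# released model version points at the exact dataset version, code version,
# and evaluation evidence behind it, so "what changed?" is a lookup.
from dataclasses import dataclass


@dataclass(frozen=True)
class ReleaseRecord:
    model_version: str
    dataset_version: str
    code_version: str
    eval_report: str      # link or path to the evaluation evidence
    change_summary: str   # plain-language summary for governance audiences


REGISTRY = {
    "credit-risk-1.4": ReleaseRecord(
        model_version="credit-risk-1.4",
        dataset_version="applications-2024-05",
        code_version="git:9f2e7c1",
        eval_report="reports/credit-risk-1.4-eval.pdf",
        change_summary="Retrained on May data; fairness gap unchanged.",
    ),
}


def explain_release(model_version: str) -> str:
    """Answer the audit question: what changed and what evidence supports it?"""
    rec = REGISTRY[model_version]
    return (f"{rec.model_version}: trained on {rec.dataset_version} "
            f"with code {rec.code_version}; evidence: {rec.eval_report}. "
            f"Summary: {rec.change_summary}")


if __name__ == "__main__":
    print(explain_release("credit-risk-1.4"))
```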
Development controls also include how secrets, credentials, and sensitive access are managed, because A I development often touches sensitive datasets and model services. Even without teaching tool-specific details, you can evaluate whether access is controlled through least privilege, whether credentials are protected, and whether development environments are separated from production. If developers can access raw personal data casually, privacy risk increases and audit risk increases. If vendors are involved, development controls should ensure data sharing is restricted and logged. Auditing here means checking whether the organization can describe who has access to what, why they have it, and how that access is reviewed and revoked when roles change. It also means checking whether development environments create uncontrolled copies of data, because experimentation can multiply data exposure if it is not governed. Strong controls make it easy to develop responsibly without creating shadow datasets and untracked model variants. Weak controls create a sprawl of artifacts that no one can fully account for later.
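A small sketch of the credential half of this control, assuming secrets are injected as environment variables by a secrets manager rather than hardcoded in the repository, might look like the following; the variable name DATA_API_TOKEN is invented for the example.

```python
# Minimal sketch, assuming credentials arrive through environment variables
# injected by a secrets manager. DATA_API_TOKEN is an illustrative name, not
# a real service's setting.
import os


def get_data_api_token() -> str:
    """Fetch the token at runtime; fail loudly instead of falling back silently."""
    token = os.environ.get("DATA_API_TOKEN")
    if not token:
        raise RuntimeError(
            "DATA_API_TOKEN is not set; request access through the normal "
            "approval process instead of copying credentials into code."
        )
    return token


if __name__ == "__main__":
    try:
        get_data_api_token()
        print("Token available; access would be logged by the calling service.")
    except RuntimeError as err:
        print(err)
```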
Another important development control is separation between experimentation and production, because A I work naturally involves trying many approaches. Experimentation is good, but it must be controlled so that experimental models and experimental code do not accidentally become production dependencies. A mature program keeps a clear boundary where only reviewed, approved, and documented artifacts can be deployed. It also ensures that experiments are reproducible and recorded, so that the organization can learn from them without relying on memory. Auditing this separation means asking how models move from development to production, what approval gates exist, and what evidence is required at each gate. It also means checking whether emergency changes can bypass normal controls and, if so, whether those bypasses are documented and reviewed afterward. Emergency pathways are often necessary, but they can become a loophole if overused. Controls are real when exceptions are rare, justified, and audited.
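The following sketch illustrates the idea of an approval gate in code, with hypothetical field names: promotion requires evidence of review, evaluation, and documentation, and the emergency pathway exists but only with a recorded justification that can be audited afterward.

```python
# Hedged sketch of a promotion gate: a model artifact moves from
# experimentation to production only when required evidence is present.
# Field names are assumptions for illustration, not a deployment tool's API.
from dataclasses import dataclass, field


@dataclass
class Candidate:
    model_version: str
    review_approved: bool = False
    eval_report_attached: bool = False
    docs_updated: bool = False
    emergency_override: str = ""   # non-empty only with a documented justification
    missing: list = field(default_factory=list)


def can_promote(c: Candidate) -> bool:
    """Return True only if all gates pass, or an override is explicitly recorded."""
    checks = {
        "code review approved": c.review_approved,
        "evaluation report attached": c.eval_report_attached,
        "documentation updated": c.docs_updated,
    }
    c.missing = [name for name, ok in checks.items() if not ok]
    if not c.missing:
        return True
    # Emergency pathway: allowed, but only with a recorded reason to audit later.
    return bool(c.emergency_override)


if __name__ == "__main__":
    candidate = Candidate("fraud-2.1", review_approved=True, eval_report_attached=True)
    print(can_promote(candidate), candidate.missing)  # prints: False ['documentation updated']
```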
Fairness, privacy, and safety need explicit attention in development controls because they can otherwise be treated as someone else’s problem. A good development process includes checks that protect these values, such as reviews of features for proxy risk, checks that logs do not capture sensitive data unnecessarily, and tests that safety filters behave as intended. It also includes documentation that records ethical constraints and compliance obligations so they are not forgotten during rapid iteration. Auditing these elements means asking where in the development workflow fairness and privacy are checked, and whether those checks are required or optional. If the organization relies on one specialist to catch everything, risk rises because humans miss things when busy. Strong controls distribute responsibility through process, making safety part of normal work rather than an extra effort. This is how ethics survives business pressure: it is built into gates, reviews, and required artifacts, not left to personal heroism.
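As a minimal illustration of building such checks into the workflow, the sketch below flags features or log fields that appear on a sensitive or known-proxy list; the specific field names and lists are assumptions for the example, and a real program would maintain and review its own lists.

```python
# Minimal sketch, under assumed field names, of making fairness and privacy
# checks part of the workflow rather than one specialist's memory: a required
# check that flags features or log fields appearing on sensitive or proxy lists.
SENSITIVE_FIELDS = {"race", "religion", "ssn", "email"}
KNOWN_PROXIES = {"zip_code"}  # illustrative: fields reviewed as likely proxies


def check_feature_list(features: set[str]) -> list[str]:
    """Return feature names that need a documented justification before merge."""
    flagged = features & (SENSITIVE_FIELDS | KNOWN_PROXIES)
    return sorted(flagged)


def check_log_fields(log_fields: set[str]) -> list[str]:
    """Return any sensitive fields that would be written to logs unnecessarily."""
    return sorted(log_fields & SENSITIVE_FIELDS)


if __name__ == "__main__":
    print(check_feature_list({"income", "zip_code", "tenure"}))  # ['zip_code']
    print(check_log_fields({"user_id", "email", "latency_ms"}))  # ['email']
```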
Metrics and monitoring hooks also belong to development controls because governance usefulness depends on collecting the right signals after deployment. During development, teams should define what will be monitored, how it will be measured, and how reporting will support decision-making. If monitoring is bolted on late, it often becomes shallow and unreliable. Auditing this means checking whether the system is instrumented to detect drift, uneven outcomes, misuse patterns, and privacy exposure, and whether those signals are tied to response procedures. It also means checking whether development teams test their monitoring pathways, because a monitoring system that fails silently is as dangerous as a model that fails silently. Good development controls treat observability as a feature that must be designed and verified. This helps ensure that governance can see problems early rather than being surprised later. When monitoring is planned and tested, the organization is better prepared to protect trust over time.
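A simple sketch of a drift signal designed during development might compare the current share of each outcome category against a baseline and flag movements beyond a threshold; the categories, threshold, and synthetic shift used to exercise the alert path below are illustrative assumptions.

```python
# Hedged sketch of a monitoring hook planned during development: a simple
# drift signal comparing current category shares against a baseline, with a
# threshold that would route to an alert. Values here are illustrative.
from collections import Counter


def category_shares(values):
    """Return each category's share of the total."""
    counts = Counter(values)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def drift_alerts(baseline, current, threshold=0.10):
    """Flag categories whose share moved more than the threshold since baseline."""
    base = category_shares(baseline)
    cur = category_shares(current)
    return {
        k: round(cur.get(k, 0.0) - base.get(k, 0.0), 3)
        for k in set(base) | set(cur)
        if abs(cur.get(k, 0.0) - base.get(k, 0.0)) > threshold
    }


if __name__ == "__main__":
    baseline = ["approve"] * 80 + ["deny"] * 20
    current = ["approve"] * 60 + ["deny"] * 40  # synthetic shift to exercise the alert path
    print(drift_alerts(baseline, current))       # flags both categories, e.g. {'approve': -0.2, 'deny': 0.2}
```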
A practical audit also looks for evidence that these controls have been applied consistently, not just in one flagship project. Consistency is important because weak projects often cause the incidents that damage the reputation of the entire program. Evidence can include records of code reviews, test results, documentation updates, release approvals, and change logs. You can also look for examples where a change was delayed or redesigned because a review found risk, because that demonstrates the controls have teeth. If controls exist but are routinely bypassed, their real value is low. A mature program can show that its development workflow creates reliable artifacts and that those artifacts support audits, incident response, and continuous improvement. In everyday terms, it can show that it builds A I systems the way you would build a bridge, with inspection points and documentation, not the way you would build a temporary shed and hope it holds. Development controls are about building trust through disciplined behavior.
To close, auditing A I development controls means evaluating whether the organization’s software practices support responsible A I outcomes in the real world. Code quality matters because readable, tested, modular code reduces unpredictable failures in data handling, training, and output behavior. Reviews matter because independent scrutiny catches mistakes, challenges risky assumptions, and prevents high-risk changes from slipping through under deadline pressure. Documentation matters because it creates the memory and traceability that governance needs to prove compliance, explain behavior, and respond to incidents. Strong development controls also include secure access management, clear separation between experimentation and production, and built-in checks for fairness, privacy, and safety. They ensure monitoring and reporting hooks are planned and validated so governance can detect drift and misuse early. When these controls are present and consistently enforced, an organization can evolve its A I systems safely and defend its choices with evidence. When they are missing or treated as optional, the organization may still ship quickly, but it will eventually pay the price in confusion, incidents, and loss of trust.