Episode 81 — Evaluate AI threats and vulnerabilities that do not exist in normal IT (Domain 2F)
When people first learn cybersecurity, it is natural to imagine that every new technology simply adds more computers, more software, and more data to defend in familiar ways. That instinct is useful, because A I systems still run on servers, use networks, store data, and depend on people making decisions. At the same time, A I introduces a category of risks that feel strange if you only think in classic I T terms like patches, firewalls, and anti-malware. In this lesson, you are going to build the habit of spotting threats and vulnerabilities that are specific to how models learn, how they produce outputs, and how people interact with them through prompts, training data, and pipelines. The goal is not to turn you into a machine learning engineer, but to give you a clear mental checklist for what makes A I different, why those differences create unique attack paths, and how to evaluate them calmly without getting lost in jargon.
Before we continue, a quick note: this audio course is a companion to our course companion books. The first book is about the exam and provides detailed information on how to pass it best. The second book is a Kindle-only eBook that contains 1,000 flashcards that can be used on your mobile device or Kindle. Check them both out at Cyber Author dot me, in the Bare Metal Study Guides Series.
To understand what is unique, you first need a simple picture of what a model is doing that ordinary software does not do. Traditional software follows rules that humans wrote down in advance, like if this happens then do that, and the weaknesses usually come from mistakes in that logic, weak passwords, or missing updates. A model, by contrast, is shaped by patterns it learns from data, which means its behavior can be influenced by the examples it has seen, the way it was trained, and the way it is asked questions. That learned behavior is not a fixed set of instructions you can read line by line, and that creates a different kind of uncertainty. Even if the computers are patched and the network is locked down, the model can still be tricked into making harmful choices because the weakness is in its learned pattern, not in a missing security update. Evaluating A I threats starts by remembering that the model is a decision-making component, not just a program, and decision-making components can be manipulated in ways a normal calculator cannot.
One unique vulnerability category is that A I systems often accept natural language as a control surface, meaning text itself can act like a kind of input that changes behavior in powerful ways. In classic I T, you usually separate content from commands, and you expect the system to treat a message as data, not as instructions. With many A I systems, the input is both content and guidance, because the model uses the text to infer intent and decide what to do next. That blurs the boundary between an innocent message and a message that steers the system toward unsafe behavior. If a model is connected to tools or data sources, the risk increases because a clever prompt can cause the model to retrieve sensitive information or take actions the user should not be able to trigger. When you evaluate this, you are looking for questions like: can an attacker influence the model through normal conversations, can they hide instructions inside content, and can the model be induced to follow those hidden instructions as if they were legitimate requests.
Another A I-specific risk comes from the fact that models rely on training data, which is often collected from many sources and processed through complex pipelines. In traditional I T, data is important, but it usually does not rewrite the behavior of the software itself. In A I, data can change the behavior, because the model learns from it and may treat patterns in the data as truth. That means the integrity of training data is not just a privacy issue, it is also a safety and security issue. If the data is biased, corrupted, or manipulated, the model can become unreliable in systematic ways. This is a different kind of vulnerability because it can be subtle, slow, and difficult to detect, and the impact might show up only in specific situations. When you evaluate this risk, you want to ask how training data is sourced, validated, stored, and protected, and whether there are controls to detect tampering or unexpected changes in data distributions over time.
A closely related concept is that A I systems can be attacked through the learning process itself, which is not something you worry about with normal software. If a model learns from user feedback, user ratings, or user-provided content, then the system is potentially learning from untrusted inputs. In a normal application, untrusted user content is dangerous because it might exploit a software bug, but the software still behaves according to its code. In a learning system, untrusted content can shape future behavior, which means attackers can try to gradually push the system in a direction that benefits them. Even when the model is not retrained live, the broader pipeline can include periodic updates, fine-tuning, or refreshing data sets, which creates recurring opportunities for influence. An evaluator should look for how the organization decides what gets into the learning loop, what gets excluded, how they separate trusted from untrusted sources, and whether they can roll back to a known-good model and data set if something goes wrong.
A I models also introduce a special kind of secrecy problem: the model itself can be valuable intellectual property, and the behavior of the model can leak that value. In classic I T, stealing source code or stealing databases is a big deal, but the application itself is not usually extracted just by asking it questions. With some models, an attacker can attempt to reproduce the model’s behavior by making many queries and learning from the responses, essentially copying the model’s decision boundary. That can undermine competitive advantage, but it can also weaken security because the attacker can build a surrogate model to test attacks offline until they find the best strategy. Even if the attacker cannot recreate the entire model, they might still extract sensitive patterns, training artifacts, or memorized data if the model was trained carelessly. Evaluating this kind of risk involves looking at rate limits, monitoring for unusually structured or high-volume queries, and understanding whether the model could reveal confidential training information under certain prompts.
Another threat category that is unusual in normal I T is what you can think of as trust confusion between the model and the sources it references. Many A I systems are designed to answer questions by pulling information from documents, websites, or internal knowledge bases, and then summarizing it. The model can sound confident even when the sources are wrong, outdated, or intentionally malicious. In a classic application, you might validate input fields and sanitize output, but you do not expect the program to invent plausible-sounding statements. A model can do that, and it can also blend true and false statements in a way that is hard for listeners to detect. This becomes a security issue when decisions are made based on the model’s output, such as granting access, approving a transaction, or advising a human operator. When evaluating threats here, you look for how the system grounds its answers in trusted data, how it signals uncertainty, and whether there are guardrails that prevent it from making high-impact decisions without verification.
A I creates vulnerabilities around evaluation itself, meaning it can be hard to prove that a model is safe and robust using traditional testing mindsets. In normal I T, you test known functions, you run regression tests, and you patch vulnerabilities when you find them. With models, the space of possible inputs is enormous, and small changes in wording can lead to different outputs. That makes it harder to guarantee consistent behavior, which attackers can exploit by probing until they find a phrasing that bypasses safety rules. This is not just a technical challenge; it is a governance challenge because teams may assume their model is secure after a limited set of tests. Evaluating this risk means asking whether the organization has a structured approach to model testing, including adversarial testing, whether they track known failure modes, and whether they treat model updates like high-risk changes that require careful review rather than routine maintenance.
You also need to consider that A I systems often involve new kinds of secrets beyond passwords and tokens, such as model weights, embeddings, prompts, and instruction sets that act like hidden policies. In ordinary I T, you protect credentials, encryption keys, and configuration files because those are the levers of control. In A I, the prompt templates, system instructions, and retrieval configurations can be just as powerful, because they determine what the model is allowed to do and how it behaves. If an attacker can alter these hidden instructions, they can change the model’s behavior without touching the underlying servers. They might weaken safety, cause the model to leak data, or steer it toward decisions that benefit the attacker. Evaluating vulnerabilities here involves looking for access controls on prompt repositories and configuration management, change tracking for prompts and retrieval rules, and strong separation between those who can modify system-level instructions and those who can merely use the system.
A unique operational risk is that A I outputs can create downstream harm even when the system is not compromised in the classic sense. Think about a model that generates code, writes emails, summarizes security logs, or classifies alerts, and imagine it making a confident but incorrect statement. In traditional security, you worry about attackers causing wrong outputs by hacking systems, but here wrong outputs can also arise from model limitations, unusual inputs, or subtle distribution shifts over time. Attackers can exploit that by feeding the model edge cases or misleading information that increases the chance of an error. Even without a full breach, this can lead to missed detections, false accusations, or risky business decisions. When you evaluate this, you look for human oversight, validation steps for high-impact outputs, and careful thought about where model-generated content is allowed to flow automatically versus where it must be reviewed.
It is also important to recognize that A I systems can become new social engineering targets because people treat them as authorities. In classic phishing, the attacker tricks a person to click a link or reveal a password. With A I assistants and decision aids, the attacker may try to trick the model first, then use the model’s output to persuade the human. For example, if a model is used to explain policies, summarize contracts, or advise on security steps, an attacker might feed it misleading context and then present its confident response as if it were the organization’s official position. This is a different vulnerability because it exploits trust in the system’s apparent intelligence, not a technical flaw like a buffer overflow. Evaluating this means asking how outputs are labeled, whether the system warns users about limitations, and whether there are controls that prevent the model from producing authoritative-sounding guidance in areas where it cannot be trusted.
When you evaluate A I threats that do not exist in normal I T, it helps to use a simple lens: what can be influenced through inputs, what can be influenced through data, and what can be influenced through hidden instructions and pipelines. Inputs include prompts, documents, and user messages, which can be manipulated to steer behavior. Data includes training sets, fine-tuning examples, and knowledge bases, which can be poisoned or skewed to change outcomes. Hidden instructions include system prompts, tool policies, and retrieval settings, which can be altered to bypass intended protections. Each of these influence channels creates a different set of vulnerabilities and therefore a different set of controls to evaluate. A careful evaluator does not get distracted by the newest buzzwords; they focus on whether the organization has clear boundaries for each channel, strong controls on who can change what, and the ability to detect when influence is happening in unexpected ways.
One common misconception is that if you lock down the servers, the A I system is secure, and everything else is just quality assurance. That mindset is incomplete because many A I failures are not about someone breaking into a computer, but about someone steering a system to behave badly while staying within normal usage patterns. Another misconception is that safety rules are the same as security controls, because safety may block harmful content but still allow subtle leaks or manipulation. A third misconception is that the model is magical and therefore unpredictable, which can lead to giving up on evaluation entirely. In reality, you can evaluate A I risks using disciplined thinking, even as a beginner, by tracing where influence is possible, what the consequences are, and whether the organization has detection, response, and recovery options. The point is not perfection; the point is being able to say, with confidence, where the unique risks are and whether the program has real coverage for them.
As you pull all of this together, remember that A I-specific threats and vulnerabilities are often about behavior manipulation, not just technical compromise, and that is exactly why an audit-minded approach is valuable. You are training yourself to notice when a system’s behavior can be changed through language, data, or configuration in ways that bypass normal I T assumptions. You are also building the habit of looking for evidence of controls, not just promises that the model is safe or secure. The most practical outcome for a beginner is a mental map: models learn from data, respond to prompts, and are governed by hidden instructions and pipelines, and each of those areas can be attacked in ways that normal software does not face. When you can explain those differences clearly, you are ready to evaluate A I environments with a level head, even when the technology is new and the language around it sounds intimidating.