AI incompetence often comes from misalignment
Sometimes I see people say “I’m not worried about AI risk because AIs are really bad at things.” I think this is a misunderstanding. Some ways AIs can be incompetent are actually more of a reason to worry about AI risk, not less of one.
(Throughout this piece, I’m going to be talking about AIs having “goals.” This is a shorthand, and I don’t mean to imply that AIs are necessarily conscious or think in the same way humans do.)
All AIs trained through deep learning are rewarded for certain behavior. For example, if you train an AI that classifies images, you’d show it billions of images and reward it when it classifies images the same way you did. If you train a chess-playing AI, you’d have it play billions of games of chess, and reward it when it wins. If you train a large language model, you’d show it billions of prompts, and reward it for answers that are true, helpful, and approved by the company’s PR department. Over the course of training, the AI learns to classify images, play chess, or respond to prompts in a way we like. We can say (again, just as shorthand, without saying anything about the AI’s inner life) that the AI has a “goal” of classifying images a certain way, or winning at chess, or producing PR-department-approved text.
Imagine you used a machine learning system to power a military drone, and it shot a child. It could have made this mistake for one of two reasons. (We’re assuming you don’t want it to shoot kids.) First, it might not know how to aim its weapons, or how to interpret its visual input, or something like that. That is, it might be incapable. Second, it might have generalized the wrong goal from the rewards you gave it. For example, maybe all the soldiers in its data wore hats, and all the civilians had bare heads, so instead of learning “only kill soldiers” it learned “only kill people wearing hats.” When it encountered a child wearing a hat, it shot the child. That is, it might be misaligned.
Many examples of AIs behaving in stupid ways are actually examples of misalignment, not incapability.
For example, AI “hallucinations” are a misalignment problem. In training, AIs are rewarded for shamelessly guessing (because they sometimes get the answer right) but aren’t rewarded for honestly saying “I don’t know.” So they learn to make stuff up when they don’t know. It’s not that AIs can’t generate the sentence “I don’t know”, or even that they can’t figure out when they’re uncertain about something. AIs have different neural patterns when they’re lying vs. telling the truth, and when they hallucinate, their patterns related to lying activate. The problem is that we don’t know how to teach them that we don’t want them to make stuff up.
Similarly, sycophancy (the tendency of AIs to agree with and flatter the user) is a misalignment problem. AIs are perfectly capable of generating sequences of tokens that disagree with and insult the user. But in training, humans tend to rate responses that agree with and flatter them more highly than they rate responses that disagree with and insult them. The humans’ ratings aren’t even wrong, exactly. You want the AI to be able to take correction from users about stuff it’s actually wrong about (to counter hallucinations) and to be generally polite and pleasant to interact with. But we don’t know how to get the AI to be polite and pleasant and take correction without agreeing with the user when the user is actually wrong.
Finally, you may have heard stories about failures of vibe coding. Sometimes the AI deletes important files, or records all the right answers to a test in one voice and all the wrong answers in a different voice. Sometimes the AI will even, instead of writing code or finding bugs, just rewrite the test suite so that it passes. Again, many of these stories are cases of misalignment. Claude Code is perfectly capable of not deleting files, recording all the answers in the same voice, and leaving the tests as they are. But in training, it learned to make the tests pass—not necessarily to make code the user wants. And even if Claude Code is sincerely trying to do what the user wants, it doesn’t necessarily understand concepts a human would understand, like “I don’t want you to delete these files” or “right answers should be indistinguishable from wrong answers” or “an app for teaching children how to read shouldn’t have written instructions.”
You might ask why this distinction matters if you aren’t programming an AI yourself. If the AI isn’t doing what you want, why does it matter why the AI isn’t doing it?
First, especially as AIs get more agentic, mistaking “the AI doesn’t want to do what I want it to do” for “the AI can’t do what I want it to do” can mislead you about the AI’s true capabilities. A military drone that doesn’t know how to aim its weapons is much, much safer than a military drone that shoots anyone wearing a hat. The former kills people only by coincidence; the latter kills people on purpose.
Second, if you observe that an AI is very competent, you might assume that it wants to do what you want it to do. But in reality you can program a competent AI for almost arbitrary goals: you can reward an LLM for producing porn, or racist rants, or mediocre fantasy novels from the 1990s, or anything else made of text. You can make the military drone arbitrarily good at aiming its weapon, but that doesn’t make it stop wanting to shoot people wearing hats. AI capabilities don’t make AIs better at pursuing your goals; AI capabilities make AIs better at pursuing their goals, and we’re just hoping they match up.
Third, agentic AIs might deliberately act to preserve their own goals. That is, we teach the AI to want to do something we don’t want it to do (kill people wearing hats), realize our mistake, and try to teach it to behave the way we want (kill enemy soldiers). The AI knows that, if we teach it to only kill enemy soldiers, it will get to kill many fewer hatwearers. So the AI deliberately pretends to only want to kill enemy soldiers, so that we release it into the world and it can kill as many people wearing hats as it wants. This is called “alignment faking.” Fortunately, this is a completely science fictional scenario that has never—
Oh, wait, large language models have already done that.
Fortunately, current large language models mostly behave the way we want1 and their failures aren’t disastrous. But as AIs get more powerful, misalignment becomes a bigger deal. If we’re using AIs to operate military drones, run companies, trade stocks and bonds, maintain critical software in hospitals or power plants, or do novel scientific research, misalignment is a very very big problem. I don’t think it’s unreasonable to worry that thousands or millions of people might die, or even that humanity might go extinct.
In conclusion:
Teaching AIs to have the goals we want is really hard. A lot of the ways AIs seem incompetent come from AIs being taught subtly wrong goals.
Competent AIs aren’t better at pursuing your goals; they’re better at pursuing their goals, which might not be yours, because giving AIs the goals we want is really hard.
Competent AIs can and will try to preserve their current goals. In the future, we might not be able to correct AIs’ goals if we notice that we’ve accidentally given them bad goals.
To get AIs to fake alignment, we had to pretend to want to train them to do bad things, like (I’m not joking) assist with factory farming.

So what exactly do you even mean by misalignment? Why isn't any deviation from the ideal behavior we would want a system to have: whether machine learning or old school buggy code, an alignment issue? What made alignment a concept worth studying was the idea that it represented a new kind of risk. It wasn't just the normal way in which software is hard, it represented a new kind of failure where the machine would behave as actively hostile.
I don't think most of these examples fall into that category. But whether or not you would agree it seems like the concept isn't clearly defined enough to make such distinctions without being more explicit about what it means.
"To get AIs to fake alignment, we had to pretend to want to train them to do bad things, like (I’m not joking) assist with factory farming."
...Wow. That's a point I'd read a whole post about.