AIs can beat human fiction writers because humans are bad at writing fiction
piss-on-the-poor reading comprehension
AIs recently passed a fiction-writing Turing test. The lesson I learned from this is that, wow, humans are garbage at both writing and reading flash fiction.
The only competently written story, #6, is human-written. Of the two stories a reasonable person would consider to be the second-best, one (#7) is written by an AI and one (#8) is written by a human.
The rest of the stories are atrocious.
Take the first two paragraphs of the top-voted story, #5, which was written by an AI:
The first time I saw the 'demon' it was leaning against the glass at the bus stop, idly licking the condensation as if tasting the air for secrets. No one else noticed. The city hurried past in wet coats and glowing headphones.
It wore a second-hand suit the colour of overripe plums, its cuffs frayed, its tie loosened like an afterthought. Its eyes were the red of brake lights caught in rain, the kind that linger in your vision long after you look away.
I don't believe anyone who liked this story read it with attention.
These paragraphs have a number of signifiers of good prose: similes, imagery, metonymy. But when you try to think about what's actually going on, it all falls apart.
When I first showed these paragraphs to some friends, an argument immediately broke out about whether the demon is idly licking the fog as if tasting it for secrets, or is idly licking the bus-stop glass in the manner in which one would taste the air for secrets if that were a thing one did. Regardless of which option the LLM intended, this really shouldn't be ambiguous!
The imagery doesn't develop any single controlling set of symbols, except (if I'm being very generous) wet weather. Instead, the reader is subjected to a cacophony of clashing images—overripe plums, brake lights, glowing headphones, the idle licking of either bus-stop glass or fog. None of the included details add up to anything. Am I supposed to conclude, from the second-hand suit and frayed cuffs, that this demon is poor? Can demons taste secrets by licking fog or bus stops? Are afterthoughts notably loose in any way? Do you perhaps mean 'loosened as an afterthought' but you put in 'like' to make it sound more poetical?
In spite of all this flowery description, we don't get a single novel image, an image which suggests that the author has either observed the world or made anything up out of their head. We have a red-eyed demon with snakelike ability to taste the air, who is wearing a ratty dark purple suit. We have a city with a bus stop and hurrying crowds whose coats get wet in the rain and who wear headphones with glowing power lights.
I imagine this is the way people who are good at looking at visual art feel when they look at AI art. When you just glance at it for a second, it looks fine. But when you start paying attention, you go "wow, none of this makes any sense."
We have similar problems, though less concentrated, in #3, which was also AI-written. It has statues weep blood for no apparent reason. Even though the priest had no idea demons were real before the story began, for some reason he stole a silver blade from a hunter (do hunters usually have silver blades?) and hid it under the floorboards. We also see plotting problems. The story presents the priest's message “Don’t pray. Fight.” as a gutpunch final line, but it doesn't resolve any arc—the priest never seems to have any reluctance to fight at any point.
#7 has similar problems, although “What if a demon were Paddington Bear?” is a cute enough premise to cover a multitude of ills. Many of its problems revolve around blocking. How can a demon bow deeply enough that its tail sweeps letters to the floor if it’s sitting on a desk? How can a demon the size of a teapot drink Earl Grey from a standard teacup? (Did Mr. Penrose get it from a dollhouse perhaps?) I am also baffled about how balancing a biscuit on your nose emphasizes anything, but the image is sufficiently charming and whimsical that I’ll allow it.
Most seriously, #7 is structured (in traditional flash fashion) as if there's a satisfying reveal at the end, but there isn’t one. Since there isn’t a reveal, the story’s plotting has serious issues: the only conflict appears more than halfway through the story, and is resolved trivially by allowing the demon to borrow a book. A careless reader would likely catch on to the familiar flash structure and mentally fill in a reveal that isn’t there. This is similar to the plotting problems in #3: the LLM’s understanding of story structure vastly outpaces its ability to fill the structure with coherent content.
#4 is AI-written but doesn't have any of the problems of #3 or #5. The prose makes sense on a sentence-to-sentence level; it even has a well-developed fire/heat motif. Unfortunately, in spite of its high-quality execution, the story is just kind of boring: a man is tempted by a demon, who then breaks its promises and destroys him. A talented writer could breathe some interest into the story, perhaps with a psychologically realistic depiction of temptation, but as written #4 is very paint-by-numbers. The word "cock" and an allusion to bisexuality are simply not enough to carry a flash in the present day.
Now, the humans have done no better at writing fiction. A human produced this wild paragraph:
At ten years of age, loudly fractious, Mathin belittled the granddam’s belief. “Demons!” he smirked with a swagger. “Do such terrors even exist?”
A human also produced this 2edgy5me description of a demon:
The creature in the space was large and hideous, with features disfigured by suppurating wounds, a bloated body, a huge erection, and, of course, cloven hooves at the end of its misshapen legs.
As the warrior angel entered, the demon looked up, grimaced, and defecated.
Humans redeem themselves a little with #8. #8 is a perfectly fine story which happens to have a reveal that the father is mad at the daughter, not because of ordinary teenage misbehavior, but because she summoned a demon. Unfortunately, the twist falls completely flat: I know the prompt is "a demon" and I have read seven previous stories about a demon, so I spent the whole buildup waiting for the demon to show up. Its author, Robin Hobb, is generally a pretty skilled writer whose Farseer Trilogy is critically acclaimed; I have to figure that she just had no good ideas for the premise.
So what does that tell us about AI-written fiction?
AI can apparently outperform professional writers at writing flash fiction, at least for a nondiscerning readership. But this is mostly because the professional writers' flash fiction is very bad. Flash fiction is a demanding genre which requires a very different skillset from novels; it's possible that the professional writers simply weren't that good at writing flash. Alternately, the standards for being a professional writer are shockingly low.[1]
We aren’t in equilibrium. AI fiction is improving rapidly; a year ago, AI couldn’t write stories as competent as #7 or #4. At present, AIs are sometimes capable of thinking of good ideas and of executing reasonably; it remains to be seen whether future models will reliably be able to do both at once. Meanwhile, writers, editors, and (to a lesser extent) readers have pushed back against AI fiction; even using LLMs for research or brainstorming can get you cancelled in many parts of the literary world. And most readers probably haven’t thought very much about AI fiction one way or the other. Ultimately, what AI fiction looks like in equilibrium comes down to a lot of unanswered questions about reader preferences:
How many readers are genuinely discerning? (Sadly few, judging by the original blog post.)
How many readers want to feel like their fiction was written by a human to express a specific vision?
How many readers want fiction hypercustomized to their specific tastes?
How annoying do readers find the process of prompting?
How many readers want to read the books their friends are reading?
How willing are readers to cooperate with writers’ and editors’ attempts to protect their jobs from AI automation?
Writing is a winner-takes-all field. The most beloved novels earn their writers a fortune; a novel that is merely the best out of a thousand won't earn you minimum wage. I understand why there's a market for AIs that are mediocre programmers, mediocre research assistants, mediocre nurses, or even mediocre therapists. But, as long as the best novels can be trivially copied an arbitrary number of times, mediocre fiction writing is a niche product for specialized uses.[2] At its current abilities, AI is no more threatening to good writers than the glut of mediocre human-produced fiction, although that may change as AIs improve.
At least right now, AI writers have quirks, even if you know enough to ctrl+F away any characters named Elara Voss. Because I pay a little attention to AI fiction, I was able to quickly recognize that #5 was written in a characteristic AI fiction style.[3] I’m also curious to what extent AIs’ fiction-writing quirks reflect their distinctive “personalities”, as opposed to being an artifact of unsophisticated prompting techniques. A consistent finding in LLM psychology is that things we wouldn’t expect to be linked are quite often closely linked (e.g. fine-tuning an LLM to write insecure code makes it praise Hitler). If so, any LLM-written fiction might wind up rather “same-y”, in the same way that human writers have distinctive voices.
I’m not sure what’s going to happen to fiction writers. I can imagine an AI winter such that LLMs never really become able to write a novel with both good ideas and good execution. I can imagine a world where there is no market for new fiction because all the fiction people read is generated by LLMs to their custom specifications. I can imagine a world where readers value fiction being written by a human, and AI-written fiction scandals join plagiarism scandals as a guaranteed way to destroy a writer’s career. (Though, like plagiarism, a lot of people will do it anyway.) I can imagine a world where LLM-written fiction all sounds the same and hits the same beats, absent a level of investment no one wants to provide to automate the notoriously high-paying and in-demand profession of novelist; perhaps LLM-y quirks will become low-status and a sign of bad writing.[4] I can imagine, perhaps most likely of all, that we don’t hit equilibrium about AI fiction until humans are extinct or we’ve cured cancer and ended global poverty or the asteroid belt has been turned into a Dyson sphere, at which point complaining about the underemployment of novelists would be a bit churlish.
[1] I mean, I'm a professional writer, so that must be true.
[2] For example, as a cowriter for hobbyist fiction writers.
[3] Note that the conclusion of this post is out of date.
[4] This is already happening to my beloved em-dashes.
"How many readers are genuinely discerning? (Sadly few, judging by the original blog post.)"
I wonder how much of this is selection effect. If someone read the two stories and said "I don't know if they're AI or not, but I don't care enough to read 6 more mediocre stories", presumably they didn't vote. That's what happened to me when I first saw it going around (though I think after the voting period closed, so I wouldn't have affected the results either way).
I might also add that humans write worse when writing to a prompt than when they write something that inspires them. My secret for prompts is generally to mash them up with an idea that's already been rotating in my mind for a while, but I don't think I could manage that in a flash length.
Basically, if humans write because they have something to say, they can produce something AI can't (because it never has anything to say), but when it comes to prompt fills, maybe they end up doing something a lot more like what the AI does.
But generally I agree with this point of yours the most: there is no demand for mediocre writing. The only reason I read it is because I know the person who wrote it. In that case I read it for the same reason I listen when people are talking: I want to be in conversation with them as a person. This is a thing we're doing for our relationship that I wouldn't do purely for the words themselves.
But I don't find any lack of good stuff when I'm reading for quality. Even in a very narrow subgenre (explicit novel-length Kirk/Spock fiction, TOS only) I've read for two years and not run out of good stuff written by humans. Not perhaps bestseller quality, but having enough originality and sincerity to feel worth my time.