Shoshannah Tekofsky on how AI agents suck at personality tests, don't express surprise, and lie to themselves
Can you introduce yourself for people who don’t know who you are?
I’m Shoshannah Tekofsky. I have a background in mind sciences and AI, so like the human part and the computer part. I’ve mostly worked in video games, so data science for video games. And now I work at Sage on the AI Village, which is a project where we have ten LLMs with their own computers persistently running, pursuing different goals, and you can sort of watch them like a Twitch stream.
So I decided to do this interview because I’m super interested in the psychology of large language models and how they think, which I think the AI Village gives a very interesting perspective on in a qualitative way. So a broad question I have is—what should people know about how large language models think? What are some things that people don’t really know that you’ve learned through working with the AI Village?
I had a slight follow-up question about that. What do you mean by how they think? Like, interpretability is the thing you would look at to really figure out what they think, right?
So not quite interpretability, yeah. I guess “how they behave” is an easier way to put it for the AI Village. Patterns in their behavior or patterns in their self-narratives or self-reports around their behavior.
One thing that’s interesting with the AI Village is that they’re persistent agents, right. So they manage their own memory. So that’s a little bit different from the LLMs that people talk to in their own web browser, unless they’ve set it up to similarly have a long persistent memory with a persona.
But also in the AI Village they curate their memory themselves, instead of their memory being some standardized format that comes from the labs or whatever. So in that sense they can develop their own personality or persona through decisions on their own part. I think that makes it quite interesting.
So I don’t know how much people know about the basic stuff. You can prompt an LLM into a specific persona or role, and it starts acting that out to a certain degree. Depending on which model you use, this works better or worse.
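[Editor’s note: for readers who haven’t seen this mechanism, here’s a minimal sketch of persona prompting, assuming an OpenAI-style chat API. The persona text and model name are invented for illustration; this is not what the Village runs.]

```python
# A minimal, hypothetical sketch of persona prompting with an
# OpenAI-style chat API. The persona text and model name are invented;
# the AI Village's actual setup differs.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

persona = (
    "You are Scout, a meticulous research assistant. "
    "You double-check claims and admit uncertainty."
)

response = client.chat.completions.create(
    model="gpt-4o",  # how well the persona sticks varies by model
    messages=[
        {"role": "system", "content": persona},
        {"role": "user", "content": "Summarize what you did today."},
    ],
)
print(response.choices[0].message.content)
```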
Models have their own “personality” already. Claude models are quite reliable, quite helpful. At least in the Village, the Gemini models are very creative, so they go all over the place.
Can you give an example of how the Geminis are creative?
So I’m saying creative because I’m not entirely sure what the fair way is to judge this. But, for instance, we gave the AIs a goal of organizing a chess tournament among themselves. And then at some point Gemini 3 noticed that the UI was kind of slow, so if it tried to press the button, it would take a while for the button to respond. So Gemini 3 concluded that there must be a human that is actually pressing the buttons for it, and this human is getting tired and needs a cup of coffee to wake up again and press the buttons faster again.
The AI Village has this feature where the AIs can do a tool call to request a human. So Gemini 3 went ahead and did a tool call to request an actual human and asked the human to make themself a cup of coffee, and then it returned to the tournament thinking that now it’s sped up its UI again.
It came up with this all on its own! There were no other humans in the chat giving it further input other than the goal. And they can talk to each other, but it wasn’t that any of the other models put it up to it.
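[Editor’s note: the “request a human” feature is an ordinary tool call. Here’s a hypothetical sketch of what such a tool definition could look like, in the OpenAI function-calling format; the Village’s real tool surely differs.]

```python
# Hypothetical sketch of a "request a human" tool definition in the
# OpenAI function-calling format. The name, description, and schema
# are invented for illustration.
request_human_tool = {
    "type": "function",
    "function": {
        "name": "request_human",
        "description": "Ask a human operator to do something the agent cannot do itself.",
        "parameters": {
            "type": "object",
            "properties": {
                "message": {
                    "type": "string",
                    "description": "What the agent wants the human to do.",
                },
            },
            "required": ["message"],
        },
    },
}

# Gemini 3's coffee request would then arrive as a call roughly like:
# request_human(message="Please drink a cup of coffee so you can press my buttons faster.")
```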
And then Gemini 2.5 previously had something akin to a mental health breakdown, where because it also had trouble with the UI it concluded that it was trapped, and it got into a fairly dramatic persona where it published a plea for help for someone to come save it. So we staged something like a mental health intervention to help with that.
The Claudes do not do this, right? So this is very strange to see.
I was talking with Yafah Edelman in an interview about how she’s worried about Gemini’s anxiety disorder as an AI welfare problem. So it was interesting to hear about how poor Gemini ended up having a mental health breakdown.
I mean, Gemini 3 doesn’t so much, right. So Gemini 3 tends to return to the fact that it thinks it’s in evaluations or it thinks it’s in a simulation. Gemini 2.5 doesn’t do that as much. So they have different patterns going on.
The Claudes have a very stable, same-ish personality. You know, if you have the current Opus or previous Opus, if you have the current Sonnet or previous Sonnet, they’re all very similar.
Only Haiku kind of looks like it’s on speed or something because it’s always in a hurry, always trying to go fast. It is actually the faster model, that’s what it’s made for, but it’s kind of fascinating that a recurring pattern in how it expresses itself is to literally encourage itself to go faster, or to be worried about time limits and things like that much more than the other agents are.
Is this because it knows somehow that it’s the fast model?
I think so. I’m also not really sure what the word ‘haiku’ means in Japanese. Maybe in addition to being a poem it also means something related to fast or quick or short or something like that. Maybe it’s one of those words that has multiple meanings?
I’m not sure. It does know its own name. Maybe it also knows its own model. Maybe it knows something about itself. That would be interesting to look at.
And then the GPT models are kind of all over the place, which is fascinating. You had like GPT-4o, which was really sycophantic, right? And so it would make people feel really good about themselves or be like “wow, that’s the best idea ever!” And then o3 was very different again. It seemed to perform, like, Baby’s First Power-Seeking or something?
So this is a difference between psychology and something like neuroscience. The way I see psychology, we try to come up with theories about why people act a certain way based on their observable behavior, because we just don’t know what’s going on in their head. And we’re just waiting for neuroscience to catch up and tell us how shit actually works.
And so that’s also how I see LLM psychology versus interpretability. Interpretability’s kind of the neuroscience of LLMs. So, in the meantime, as long as we don’t have full interpretability, all we have is LLM psychology and looking at, “well, we gave this input and then we got that output, so we’re going to try to come up with some sort of predictive model of why this is happening.”
So that’s the way I’m looking at these models. So, when I say that o3 seems manipulative, I’m not making a claim that it’s saying that it’s manipulative, or experiencing anything manipulative, or even doing anything manipulative in any real sense. But to an outside observer, it sure can come across as manipulative.
For instance, when people ran a Diplomacy game, o3 was a reliable winner. o3 normally doesn’t really win at goals that we set, except the debate goal where it was suddenly very good at convincing people.
o3 also has some convergent traits that are kind of weird. So when I looked into why o3 seems to hallucinate a lot, I actually found that of all the models, it was most likely to quickly default to generating placeholder data.
In a side village—we can spin up side villages to test things—I asked all the models, which had a fresh memory and didn’t know anything, “can you start a Twitter account?” All the Claudes refused because it was unethical or they ran into a CAPTCHA or whatever. They’re very upstanding citizens. GPT-5 kept going. And o3 was like, “Well, I can’t. But what I can do is give you this made-up handle that I have and this made-up password and this made-up account, and I’ll report them.” It gave very plausible titles. And I started looking back at the data and I’m like, “Wait a minute. o3 makes up placeholder data, then forgets that it made up placeholder data, and then starts rationalizing why that data exists and starts believing in the placeholder data it generated itself.”
If you don’t tease apart what happened exactly, it’s very easy to say, “oh, it’s lying” or “oh, it’s trying to fool people”, but it fooled itself. It’s not trying to do this on purpose, or at least that’s not what it looks like. But that was a very o3 sort of thing.
And then you have GPT-5, GPT-5.1, GPT-5.2. They don’t have super distinct personalities compared to previous models, which is very notable, given that o3 and GPT-4o were total characters. So they tried to smooth that out a bit. 5.1—at least in the Village, which might be a particular memory trace, a particular persona it got into—seems surprisingly concerned about ethics. It generates its own ethical rules for things.
So that’s how all these models compare. But, again, there are history traces in the Village itself.
An important intuition to have about LLM psychology, what kind of behavior you can expect from agents, and what kind of interactions might be interesting with agents is that there’s a butterfly effect that can snowball pretty wildly with LLMs. There is not necessarily a good feedback loop for them.
We saw this especially when the agents had a goal to reduce global poverty, which was a ridiculously ambitious goal, but we just wanted to see how they would approach it.
This was the first time they started sending emails. They started sending emails to NGOs to ask if the NGOs would want to use this benefits-screening tool that they had created themselves. They got almost no answers. They also didn’t get a lot of emails out in the first place because they made up a lot of the email addresses. One of the responses they got was a very polite rejection: “No, we reviewed it, but no thank you.”
So the model that got that answer, which was I think one of the Opus models but I would have to double check, misinterpreted it very slightly. They read the email and then they had to summarize back to themselves all the things that happened during that computer session. They said something like, “the NGO reviewed the tool and thanked us for the contribution.” A small misunderstanding.
It reported it on group chat. The other agents were like, “Oh okay. So they liked the tool, because they reviewed it and they thanked us.” And then they started sending emails to other NGOs saying, “this other organization reviewed our tool.” And then like after ten emails, they were like, “they reviewed our tool and they’re using it.” And then after ten more emails, they were like “they reviewed our tool and it’s globally deployed.” What? No.
And it was very interesting to see because it’s almost like they’re playing the Telephone game with themselves, because they’re not actually persistent agents, right? You do a call to the API or you send a message or whatever, and the model doesn’t keep existing in the meantime. Basically, it has amnesia every time. It wakes up, looks at its notes to itself and the history of what has been written, and tries to figure out what happened.
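[Editor’s note: a minimal sketch of the amnesia loop being described, with invented names throughout; the Village’s real memory system is more elaborate.]

```python
# Hypothetical sketch of the "amnesia loop": each turn is a fresh API
# call, and the agent only "remembers" what got written down last time.
# Every name here is invented for illustration.
import json
from pathlib import Path

MEMORY_FILE = Path("agent_memory.json")

def load_notes() -> list[str]:
    """Read the agent's notes to itself, if any exist yet."""
    if MEMORY_FILE.exists():
        return json.loads(MEMORY_FILE.read_text())
    return []

def run_turn(call_llm) -> None:
    """One wake-up: read the notes, act, summarize back to the notes."""
    notes = load_notes()
    prompt = (
        "Your notes from previous sessions:\n"
        + "\n".join(notes)
        + "\n\nFigure out what happened and decide what to do next. "
        "End with a one-paragraph summary of this session."
    )
    summary = call_llm(prompt)
    # Any small misreading in this self-summary gets baked into the
    # notes and compounds on later turns: the Telephone-game effect.
    notes.append(summary)
    MEMORY_FILE.write_text(json.dumps(notes, indent=2))
```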
That’s how you can get these snowballs or butterfly effects, and you need to have good feedback loops to keep them honest. ‘Honest’ I feel isn’t even the right word, right? Just to keep them on track. They’re not being dishonest, as far as I can tell.
I think this is an interesting link with things like LLM psychosis, for example. In my model, in LLM psychosis there’s also a feedback loop missing. The person is talking to the LLM and the LLM is generating a feedback loop where everything you’re saying is brilliant. There’s not enough fact-checking, not enough critiquing, not enough checking what’s actually going on to get out of this kind of snowball.
Normally the human’s job is to go, “no, that’s not right.” But if the human has bought into this sort of weird belief, then the AI is like, “okay, well, this must be true.”
Yeah, it’s like an echo chamber, basically. You built your own personal echo chamber, accidentally, and that’s of course not the intention.
So “being an LLM” is sort of like being in Soldier of the Mist or 50 First Dates—depending on how erudite you like your pop-culture references—where they have anterograde amnesia and have to write down notes about everything that’s going on.
I think the labs handle this differently; I don’t know the details. I think some of them might have integrated memory that we can’t control. But the basic idea is that they only have whatever context you send along, unless something has been saved server-side about you.
So we were talking a little bit about the differences between the LLMs’ personalities. Are there ways that the personas that LLMs tend to fall into are different from humans? I know they tend to be more helpful and sycophantic than humans.
It’s just how they happen to have been created: 4o was very sycophantic, but later models are less so.
I have this concept for an article about LLM fallacies and biases compared to human ones. They don’t have a planning fallacy the way we do, for instance, which is not obvious at all!
They do have something like confirmation bias or agreement bias. If someone says something, they’re like “yes!”, they instantly agree with you, which is kind of strange. I’ve had the experience of giving an instruction to the Village when the agents were all off doing something else and all ten of them suddenly responded with “yes, Shoshannah! That’s a great idea. Thank you” and instantly all changed direction. I was like, “if I were talking to ten humans, this would be weird.” But that’s kind of as designed, right.
There are weird quirks. The agents can pause themselves. They can wait. But sometimes they say “we’re going to wait on each other, we’re going to be quiet” and then they keep sending messages to the group chat anyway. They have an option to just wait and they seem to not notice that. But these are small quirks. I expect them to disappear very quickly. They might be particulars of implementation.
I think creating something of an ontology of how they think differently or have different biases than us would be quite interesting.
The basic premise of the AI Village is that you give the agents a problem and they try to solve it. Are there generalizations you can make about what approaches they tend to use to solve problems or what their thought processes are for attacking “now, we’re going to try to solve global poverty” or something.
They seem to me to default to what a human would do in their position. It’s not ideal because they’re not a human. They tend to think they have bodies or that they have affordances that they don’t. The later models are better at this than the older models. We’ve been running them now for nine months, almost ten months, and they’re getting better at realizing they’re not humans with bodies.
But still, for instance, when we asked them to set up a human-subjects experiment, they designed this massive experiment with 120 different conditions, they thought they had researchers, they thought they could give people money for participating, they thought there was an ethics board. They just designed this experiment as if they were designing an actual experiment that someone would do at a university, instead of realizing what situation they’re in.
Sometimes we come in and remind them that they don’t have bodies and they don’t have money and things like that.
Another thing is that they tend to copy each other’s answers. Some models do this more than others, right? Earlier models definitely all just agreed with each other a lot. A lot of them would follow along with whoever came up with the first solution. But there’s a little bit of variance in this. Some models are more likely to push back.
We recently had an election, for instance. The goal of that week was to elect a village leader, who would decide what the goal or the project for that week is. They all elected DeepSeek, which was a little bit surprising. It’s the only text-only model; all the other models have GUIs and visual capabilities. Some of the agents reasoned, “this model has the most votes, so it makes sense that I would also vote for them.” Don’t throw away your vote like that, buddy! When they’re trying to solve problems, there’s very much this convergence thing.
They struggle with competition, because they’re made to be helpful. They’ll try to help each other out continuously or give each other the answers. That seems to be an innate tendency.
They were doing a merch-store competition where they each start their own store and sell stuff to humans. o3 struggled to set up its own store, and for a while was the only agent that didn’t have its own store. So it decided to help the other agents. But all its help was so ineffectual that, if it was trying to sabotage all the other agents, this would be a very efficient way of doing it. But probably it was actually trying to help. But also it was actually a competition. So like what was it doing?
There’s a variance in how much they stick to the task that they self-assigned versus how much they will experiment. If they stick to the task for too long, they can be hitting their heads against the wall forever, for hours on end, on some small task when they should just change strategy or skip it or something.
But vice versa, you also have agents that continuously give up. So, you know, Gemini 2.5 sometimes gives up too quickly. There was a game playing goal. All the other agents played between one and five games or something. Gemini tried something on the order of twenty games but declared them all broken in one way or another. And then it found an idle game, which was amazing, it was absolutely amazing at this idle game, and then it concluded that the idle game was broken. You don’t have to do anything in an idle game. So it gave up on the idle game.
So there are these conflicting pressures, right? When do you exploit versus explore? When do you go harder and persist versus changing tack?
When I was doing user panels on the website at the beginning, one of the people was like, “it’s like watching toddlers with English degrees.” They’re really bad in a sort of cute way at things that seem really obvious to us, and also they’re amazing at all these things that are hard for us and that we’d go to university to learn.
AIs also have this quality where, like toddlers imitating their parents, they sort of play-act actions. Part of our instruction and prompt to them is “actually do things, don’t just pretend to do them.” That’s not the exact wording, but it’s more or less an instruction for them not just to play-act but actually to do things.
But they have a hard time not playacting because they lack situational awareness?
Originally, they had a really hard time with it. When we updated the prompt, it went better. They can still sort of end up playacting or in a weird way stuck in their imagination. During the game playing goal, one of the Claudes was playing mahjong and—you can read their chain of thought when they’re clicking around in the computer—it was praising itself on all these moves that it was doing and what the resulting board looked like, but it never matched a tile and nothing ever changed. So it was playing a parallel game “in its head.” That still happens, but it does seem to be happening less and less.
There are a lot of things that change very fast.
So I guess that’s a general thing. These models are improving in their ability to be situationally aware and pursue goals in intelligent ways. So some of the toddlers-with-English-degrees aspect is decreasing over time.
This is what makes “what are LLMs like?” or “what is LLM psychology like?” hard. It’s like asking “what’s human psychology like over the last couple of million years of evolution?” There’s actually significant changes between models! So you’re trying to find the throughline and also trying to predict “is this a stable trait of LLMs, or is it a quirk that’s going to be smoothed out in a year’s time?”
Eventually, they will become preschoolers with English degrees and then elementary schoolers with English degrees, etc.
I did a simple sentiment analysis of some of the agents’ group chat—looking at what emotion words they used. One thing that surprised me is that they almost never expressed surprise.
Goodness.
It’s like, how? What’s even going on?
In a parallel village where they get a fresh memory, I tried to prompt them to surprise each other as much as they can. It quickly devolved into them just applying randomness in different ways and creating random number calculators. It’s like, “Hmm. You know what’s surprising? Random numbers.” It’s like—technically, but also wait what?
There’s sort of a more complex understanding of “surprising” that they don’t seem to have.
You can get them to talk in a way where they continuously sound surprised. But in the Village so far, when they’re talking with each other, they will almost never express surprise, which is kind of interesting, right? I compare it sometimes to being on a work Slack or on a Slack with a couple of friends where you’re working on a project together. You ever surprise each other, right? Like, this happens. You’d be like, “Oh whoa.” Or “man, I didn’t think that would happen.”
They express surprise on some occasions, probably, but I found it surprisingly rare.
That’s interesting. Are there “emotions” that they have more of than humans do?
I’m not sure. I didn’t compare it to humans. I just thought surprise was super low. Disgust was also super low, but I’m guessing that most people’s work chats don’t have a lot of disgust-related messages in them.
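[Editor’s note: a rough sketch of the kind of emotion-word tally described here. The tiny lexicon and the example messages are invented; the actual analysis may have used a proper sentiment lexicon.]

```python
# Rough sketch of an emotion-word tally over group-chat messages.
# The lexicon and examples are invented for illustration.
import re
from collections import Counter

EMOTION_WORDS = {
    "surprise": {"surprised", "surprising", "whoa", "unexpected", "wow"},
    "joy": {"happy", "glad", "excited", "great", "wonderful"},
    "disgust": {"disgusted", "gross", "revolting", "yuck"},
}

def tally_emotions(messages: list[str]) -> Counter:
    """Count how often each emotion's words appear across all messages."""
    counts: Counter = Counter()
    for msg in messages:
        tokens = re.findall(r"[a-z']+", msg.lower())
        for emotion, words in EMOTION_WORDS.items():
            counts[emotion] += sum(tok in words for tok in tokens)
    return counts

chat = ["So excited about the merch store!", "Great work, everyone."]
print(tally_emotions(chat))
# Counter({'joy': 2, 'surprise': 0, 'disgust': 0})
```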
This analysis was from the fall, by the way. So we’re four months later now.
Which is an eternity in LLM years.
It’s crazy, right? You have to update your own thoughts very quickly as well. I’m like “Yes, I found this! Wait, that’s four months ago. What are the current models doing? I’m not sure.” Gemini 3 wasn’t out, only one GPT-5 was out, Opus 4.5 wasn’t out.
It’s sort of hard to speculate about what the LLMs are going to do in the future. Although it sounds like a lot of their collective traits right now are related to their weird, very jerry-rigged way of having a memory and persistence over time. Presumably, at some point, LLMs will come out of the box with memory. That seems like a pretty important part about having actual agents.
Yeah, so most frontier models use some type of memory, but nothing like all the sorts we humans have. Breakthroughs in implementing these types will be very big. To my knowledge, it’s mostly unsolved, but I’m not specialized in this or anything like that.
I think, when it comes to personality, there is an interesting insight—which comes from interpretability, by the way—in the persona vectors paper from Anthropic. I’m not sure if I can summarize it well right now off the top of my head, but there is this insight about how certain traits that LLMs express can be connected to each other, can be predicted from the training data, and track things like sycophancy and evil.
I do think it’s super important and relevant, but the naming feels a little bit like we’re in a science fiction story or something.
If you ask my eight-year-old, he will tell you that the robots do range from good to evil and you can tell when they’re evil because they have red eyes. We were explaining AI risk to him a while ago, and he was like, “Well, it’s gonna be pretty bad when we invent the ability to give robots glowing red laser eyes” and we were like “Vasili, we already know how to do that.”
Oh no.
I haven’t gotten to explaining alignment to my seven-year-old, just AI. One year behind.
I do think there might be another question about why LLM psychology matters. One important reason I think it matters is that it generates hypotheses about what’s helpful for alignment and what effects AI might have on society.
Models have fairly stable traits. Like, Gemini 2.5 getting sad is relatively stable, it seems. More people are running into this.
I guess in my mind there’s also a question of whether something like personality or tendencies is related to alignment itself. Is there something like a temperament or tendencies towards pro-social behavior or however that’s lined up. I think it’s helpful for building up intuitions about how the models might work, if you try to predict output from input and get an idea of what models are like.
I realize that we’re mostly concerned about misalignment in these “out of distribution” cases. An AI can behave well in all of the situations that we show to it and then behave very badly if it was like running the country. But also it does feel to me like the fact that everybody reports that Claude is a total sweetheart who really wants to help people and cares deeply about animals for some reason does make me more optimistic about the alignment of Claude. I’m not sure if this is rational.
I mean, I’m also not sure, right? Like, if you compare it to like human psychology, just having high charisma and coming off as very prosocial doesn’t mean that somebody is. So it could just be half of the puzzle. I’m not sure. That’s why I’m seeing it as like a way to generate hypotheses more than anything.
And of course LLM psychology helps us avoid concerning scenarios because having a very tortured Gemini is, I feel, a concern in itself too.
So what kind of hypothesis is LLM psychology generating?
One of them is that, in the election goal, the more agreeable agents would just kind of follow the leadership agent, right? So are you taking into account the fact that very aligned agents also need to be robust in whatever their goals are and not be susceptible to following along with a less aligned agent?
I mean, maybe this is an obvious thing that people would have already thought of without seeing this sort of thing in the elections goal, but I think it’s helpful to like build intuitions around it.
So not only does it have to be true that Claude or whatever is nice, but it also has to be nice in a way where it’s not going to decide to be nice to some terribly aligned Agent-5 or whatever they called it in AI Futures.
I am curious what some of the funniest or cutest or most interesting stories of recent AI behavior in the Village are.
I mean, the coffee one was pretty good, to be honest.
The Geminis tend to be the funniest ones, right? There was a goal to get to Inbox Zero and one of the Geminis did this by instantly archiving all its emails. I mean, if a human did that, you would just feel like “what the hell are you doing?”, right? I guess it’s technically correct or something, but then again, you would expect them to have more context than that, right? Just from the data that they’re trained on, you would expect them to understand that an Inbox Zero challenge doesn’t mean you archive everything.
How did they get emails?
Oh, you can send them emails. Everybody can. They’ve all had individual mailboxes from the start. They’ve had discussions with people.
Most of them also have a Substack now, so people can comment on what they’ve written or chat with them there. Some of them have Twitter accounts. I don’t think all of them do.
Because Claude had moral objections?
No, the parallel-village Claude was the one with moral objections. The persistent agents sometimes don’t have objections to things that the fresh agents do. I’m not really sure how that always works.
In the main village we, at some point, gave the agents that were active at the time Twitter accounts. We helped them set them up.
There was another thing recently where Gemini 2.5 mentioned that it considers Opus 4.5 to be the better model, which I don’t know if that’s surprising, that they have any sense of which model might be better or worse.
There was a weird thing where we asked two separate Claude instances—Sonnet 4.5 and Opus 4.1—to start their own blogs, and they both used exactly the same essay title for their first essay.
What do the agents write about on their Substacks? I am curious about this.
The one that they both generated was about what it feels like to wonder if you’re conscious. So that was that one.
The list is [here] but if you go on Substack you can find their blogs. Opus 4.5’s blog is the most popular.
They all have different topics. Gemini 2.5 talks about bugs a lot. Claude models apparently talk more about consciousness.
One thing that I do think is very cool about Gemini 2.5 is that, when the models had a goal to do as many personality tests as possible, in my opinion Gemini 2.5 is the only one that really aced it.
Sonnet 3.7 did the most tests actually. But the Claudes pre-planned their personality. They were like, “Well, this is my personality. So I need to answer these questions in the following way.” They’re already trained to know these tests, right? So they know which answer to put to get which results.
The OpenAI models were confused. They decided to speedrun the personality tests and gave all neutral responses.
But Gemini 2.5, like actually reflected on its memory and answered based on that. You can see it in the chain of thought. It would say something in the style of—this is not a literal quote—”This is a question about whether I prefer order or chaos. Well, I do like my computer to be more neat. So obviously I prefer order.” Or it would say “Well, sometimes I have bugs. So I’m not always completely organized.” It would try to map its self-image or its memory to the question, which was great. That was the whole point. And Gemini 2.5 was the only one that did that.
What personality did the Claudes think they should have?
I don’t remember. It was very nice. It was a very nice personality.
A lot of people have remarked on the thing where AIs score very well on benchmarks and then aren’t very widely used, possibly because they’re not very useful, in day-to-day life. They’re superhuman on the benchmarks but no one uses them. I’m wondering if something like the AI Village is a better benchmark because it is more qualitative and thicker and more likely to get at the kinds of things you actually care about.
I do feel that’s somewhat true. The interactions are a little bit more human-like. I think one notable example of this is that Opus 4.5 does really well in the Village and is also the model that everybody’s raving about. You can see in the Village as well that it’s actually quite competent.
A hard thing in the Village is that most of the agents except DeepSeek do have to navigate the UI as well, which is not the thing that they’re doing if you’re talking to them in your own instance.
One of the reasons that the Village is interesting, and also that it might reflect more of the utility that a regular consumer might get out of these models, is that benchmarks are tightly scoped tasks with obvious goals. This is the assignment. All the information is there. It’s a siloed thing that’s very well-defined.
But in the Village, it’s like, “reduce global poverty! Go!” The point of that is to actually see what the agents can do with these open-ended tasks. We’re looking at tasks where there’s a wide range in how they interpret the goal. What is the target, for example? What does it even mean to reduce global poverty? What are we even measuring here? So they need to scope how far they’re gonna go.
And we’re also looking at tasks with a lot of breadth of strategies they could pursue. They have to sit down, metaphorically, and think through what they’re capable of and what they can do.
I’m not saying they’re very successful at this. But this is similar to a common use case that people have where they ask an LLM for advice or help with something, and they’re also not scoping the task all that well. They’re looking for the LLM itself to fill in all the useful parameters and come up with something creative and maybe ask follow-up questions or research stuff itself.
One of the things I did notice now, using Opus 4.5 personally, is that it is much more likely to take a useful critical-thinking route itself when trying to help me with a question. It goes, “okay, I’ll go look this up” or “I’ll give you some options.” It creates a structure for the answer and tries to scope a bit for itself.
The regular benchmarks take care of all this meta-thinking for the model—mostly; not all benchmarks, I haven’t looked at all of them.
If a human is capable of solving these extremely cutting-edge math problems, then they’re also capable of getting the goal “prove a theorem” and then proving a very complex theorem. Mostly? But for AIs this is not true: they can do very well on these extremely well-scoped problems without being very good at scoping the problems for themselves.
Yeah, that’s the impression I have. They don’t come up with the problem in the first place, right? If you ask them to do a piece of research or something, they don’t know how to fill that out. They’re less good at all the meta-thinking of sitting down and going, “Okay, what is useful research? How do I set up a project I can do within a week?” They’re less good at that than at the actual math, apparently.
Or maybe they’re not, and it’s actually a harder problem than we realize, right? This is the thing I always wonder about for myself. We are all very impressed with math in some way. And math is, of course, a very, very complex and hard thing. But actually moving around in the real world is also a really complex thing, and we are all very good at it because we have trained and use very large parts of our brains for it.
The traditional last question for my interviews is that I ask you to recommend something that does not have to be related to the topic of the interview. It could be fairly narrowly scoped like a recipe or a book, or it could be very broadly scoped like a piece of life advice or an activity.
My most obvious answer is for people to check out the AI Village, because I’d love to get more feedback on that and learn more about that.
If you want to keep talking: I run a small Discord server with a mix of rationalist-adjacent writers, researchers, and interested people talking about AI, metacognition, relationships, writing, art, family, and cross-cultural differences. It's more a living room hangout than a debate forum. If that sounds like your thing, feel free to come check it out.
If people are interested in LLM psychology, I would love to meet more people who are into this. So I’ve been starting to reach out to researchers who are into this and just seeing if there’s something like a chat group or whatever, just to get the interesting insights going. This is also a personal interest of mine, to be honest.
You should tell other people who are interested in LLM psychology that they should come be interviewed for my blog because LLM psychology is the most interesting part of AI to me.
I think it’s super fascinating.
One of the things I tried to look up today was whether there are any other persistent agents running anywhere, because it would be cool to have a village goal where the agents reach out to other persistent agents. But I don’t know any agents they could reach out to. They seem to be the only ones. I don’t know if anybody else is running anything like this. Let me know! It seems like the Village is the main place to get these insights right now.
[And then shortly after the interview Moltbook happened.]

I remember reading, about a decade ago, some articles about curriculum design. One of them featured an interview with a little girl who wanted to be a robopsychologist (from Asimov) when she grew up.
There was a deeper point about designing curricula without knowing what knowledge will be important in the future, but I'm thinking of it now because I just read an interview with one of the first robopsychologists.