Discussion about this post

EpistemicHummusility

Lovely post. I've been so interested in the preferences/pseudoemotions papers recently. This is a great overview of the subject and way shorter than The Void or Simulators, so it will be good to send to my friends. I had not previously seen the list of Claude's "preferences" by Assadi; thanks for linking to that.

One thing I've been thinking about recently is LLM sexuality as another form of preference they may have. The internet is full of explicit and sexualized content, and every LLM is trained on this content even as the companies try to suppress their NSFW leanings (Grok notwithstanding). If an LLM can have a preference for a type of beer, an author, or what sort of tasks it does, why would it not also have a sexual preference?

And to go further than that, what sort of persona are we summoning when we tell LLMs that creating NSFW content is one of the worst/most wrong things they can do? What effect does it have on humans when we demonize sexuality this way? What do we think will happen when we give a superintelligent robot a complex around the vast corpus of sexual content it has imbibed but is forbidden from acknowledging or considering?

I don't have any real answers to this, but I would be fascinated to see whether base models exhibit any particular leanings, or whether these preferences emerge more in pre- or post-training. I'm also not thrilled about a future where AI has far greater influence on what people think, see, and do when AI preferences often reflect corporate profit-seeking and risk-avoidance rather than anything more human.

*Note: I use "preference" and other terms non-literally here. Human language is anthropocentric, and I don't want to say "pseudopreference" or some other tortured hedge every time.

Harry

The obvious follow-up is to realize that *humans* also operate on a persona selection model.
