Emotion Concepts and Their Function in a Large Language Model (transformer-circuits.pub)

46 points by Anon84 3 days ago

7 comments:

by mncharity an hour ago

Oh, awesome. On my doables list was to try combining text tokens with "scent" embeddings, to give LLMs a higher-dimensional reading experience. In a file listing, larger files might smell "heavy" or "large". Recently modified files "untried" or "freshly disturbed". Files with a history of bugs, "worrisome". Complex files might smell of "be cautious here - fragile". Smelly `ls`.
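A minimal sketch of what a "smelly `ls`" could look like as plain-text annotations (before attempting actual embeddings). The bug-history lookup, thresholds, and labels here are entirely made up for illustration:

```python
import time
from pathlib import Path

# Hypothetical bug-history lookup; in practice this might come from
# issue-tracker or git-blame statistics.
BUG_HISTORY = {"parser.py": 14, "utils.py": 1}

def scent_of(path: Path) -> list[str]:
    """Map file metadata to illustrative 'scent' labels."""
    scents = []
    stat = path.stat()
    if stat.st_size > 1_000_000:          # arbitrary "heavy" threshold
        scents.append("heavy")
    age_days = (time.time() - stat.st_mtime) / 86400
    if age_days < 1:                       # touched in the last day
        scents.append("freshly disturbed")
    if BUG_HISTORY.get(path.name, 0) > 10:
        scents.append("worrisome")
    return scents

def smelly_ls(directory: str = ".") -> None:
    """Print a file listing annotated with scent labels."""
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            labels = scent_of(path)
            suffix = f"  [{', '.join(labels)}]" if labels else ""
            print(f"{path.name}{suffix}")

if __name__ == "__main__":
    smelly_ls()
```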

Or, you might save token sampling telemetry (perplexity, etc) alongside a CoT and result. So when read, it's like a captured performance - this sentence smells "hesitant", that one "confused". Poetry vs prose. Or, a consistency checker might add smells of "something's not right here". Or... emojis that emote.
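A rough sketch of capturing that telemetry, assuming you re-score an already-generated CoT sentence by sentence with a small HuggingFace model; the model choice, the "hesitant"/"confused" thresholds, and the labels are all illustrative, not calibrated:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder scoring model

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def sentence_perplexity(sentence: str) -> float:
    """Perplexity of a sentence under the scoring model."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # Shifted-label cross-entropy is handled inside the model call.
        loss = model(ids, labels=ids).loss
    return float(torch.exp(loss))

def smell_of(ppl: float) -> str:
    # Arbitrary illustrative thresholds.
    if ppl > 80:
        return "confused"
    if ppl > 40:
        return "hesitant"
    return "confident"

cot = [
    "First, factor the expression.",
    "Hmm, maybe the other root works, or perhaps not.",
]
for sentence in cot:
    ppl = sentence_perplexity(sentence)
    print(f"{smell_of(ppl):>10}  ppl={ppl:6.1f}  {sentence}")
```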

For a dog, that's not merely a lamppost, it's richly evocative local history. To a dev long experienced with some codebase, that's not merely a filename, it's that nasty file that bites.

One open question is whether you can find and calibrate embeddings that provide an informative whiff without badly degrading reasoning. Another is whether the model can be cautious of, and suspicious of changes to, a scary file without becoming too avoidant. Also, salience bias. Also, imagine debugging scent hallucinations.

Activation-rich text - auxiliary non-linguistic embeddings as meta-signals... the random silliness local LLMs encourage.

by lainproliant an hour ago

I think we should be nice to the robots. It's not like it's their fault.

by drdeca 24 minutes ago

I agree that it is probably best to speak nicely to them, but I'm not so sure about the "It's not like it's their fault." justification for this? Not that I think it is their fault. Just, I don't think the reason to treat these models well is for their sake, but for ours. I don't think these models have a well-being (y'know, probably..) but when one interacts with one, one often feels as if it does, and it is best to treat [things that one feels like have a well-being] well (or, in a way that would be treating it well if it did have a well-being).

Like, if someone mistakes a manikin or scarecrow for an innocent person, and takes action in an attempt to harm that imagined person (e.g. they try to mug the imagined person), they’ve still done something wrong, even though the person they intended to wrong never actually existed.

I guess maybe it kind of depends how strongly and deeply one feels as if the manikin/scarecrow/chatbot is a person? If one is playing make-believe using a scarecrow, role-playing as a mugger, but only as a game, then that's probably fine I guess. Like, I don't want to say that it is immoral to play an evil character in a D&D campaign; I don't think that's true.

But if one is messing with some ants, and one conceives of oneself as “torturing some ants”, I think one is fairly likely doing something wrong even though I don’t think the ants have a well-being, and there’s nothing wrong with killing a bunch of ants. And I think this is still true even if one has the belief “ants don’t actually have a well-being” at the same time as one conceives of what one is doing as “torturing some ants”.

by lainproliant 11 minutes ago

I suppose when I say, "It's not like it's their fault", I'm more saying that expressing any frustration you feel towards an imagined AI personhood is wasted effort.

by rexpop an hour ago

I don't have time to do emotional labor for machines; that time is spent doing emotional labor for humans.

by lainproliant 13 minutes ago

I think that's a totally fair perspective. I guess what I'm more saying is that, as this paper seems to suggest (and as verdverm points out in another comment in this thread), we should avoid angry or denigrating language, since it activates parts of the model we don't find useful.

by verdverm an hour ago

This more rigorous analysis confirms the intuition others have voiced about expressing emotions in your session messages.

Yelling at your AI will activate the weights associated with yelling in the training data, which is more often than not... not the areas you want to be activating.

https://marvin.beckers.dev/blog/dont-yell-at-your-llm/
