Claude, please stop trying to memorize random crap (12gramsofcarbon.com)

65 points by theahura 2 hours ago

30 comments:

by dofm 25 minutes ago

Blog posts like this just blow me away.

> I believed this so strongly that my company built an entire product around this concept. I used to tell folks that "session transcripts were the new oil," that they were more valuable than the code itself.

> […]

> We don't really write code by hand anymore.

Honestly, isn't this just influencer spam? What possible value is there in reading about people who used to have products, but no longer write their own code, complaining about the inscrutable prediction machine they have handed that job and their livelihoods to?

Like, if you have complaints about the thing, perhaps you should address them to your supplier directly. None of your readers can help, and nobody's magic folk solution to your problem is better than yours.

And there are so many of these sorts of posts. Are we not entirely cooked?

(I think I have concluded that if people writing about AI aren't writing about interesting things they have achieved with small, local LLMs, which for clarity I am fully interested in reading, then I'm done reading. This whole blogging-about-cloud-AI genre is just weird and irresponsible now.)

by LPisGood 15 minutes ago

I have to ask: do you still write a lot of code yourself? I and most people I know do not.

by dofm 10 minutes ago

I am a freelancer recovering from severe burnout so the answer is a sort of irrelevant no.

I'm trying to rebuild my life so I am in an experimenting and learning phase rather than a massive coding phase, and most of my code work is maintenance of things I have built. That which I do code, I am still coding by hand, though I am dealing with other people's Claude output and I am really unimpressed by it. It's often rather crass.

But I would say to you that if you personally don't write code now but you do have a dependency on one of two presumably unprofitable cloud AI providers, aren't you in trouble? How is this not a three-alarm fire for you?

by andai 5 minutes ago

Worst case scenario, you just switch to one of the free models, which are 2025-ish in quality.

by walt_grata 11 minutes ago

I write code by hand every day. I do the main part of the feature implementation myself and leave comments for the code I want the agent to write. I have some skills and a command that sets the stage to get the agent to fill in the rest.

by andai 6 minutes ago

I force myself to do it at least once a week, you know, like cardio. Keeps the doctor away.

by ungreased0675 7 minutes ago

It reminds me of the peak crypto days. Lots of resources consumed, many late nights, little to no value created.

by fortyseven 16 minutes ago

The Blog Police have spoken, folks. No more talking about what you like in your own blog without passing it through the approved discussion censor.

by dofm 8 minutes ago

I'm not the Blog Police, I'm a very naughty boy.

I have opinions people apparently don't like, for no subscriber money.

by wongarsu 25 minutes ago

I agree with the take not to bother with a sophisticated memory system. Anything worth remembering should be in docs, guides, source comments, commit messages or tickets. You don't need another layer, every conceivable granularity is already covered by existing best practices

by sdesol 7 minutes ago

> You don't need another layer

I do think we need another layer, but it should be a routing layer. I am finalizing my pi-brains extension for Pi (https://github.com/earendil-works/pi) which does this:

https://github.com/gitsense/pi-brains

Right now "humans" need to define the routing rules for how to access information, but I will support what I call "knowledge agents" that can monitor conversations to inject context when needed.

by throwup238 18 minutes ago

Especially a layer that is largely out of band in a project (i.e. ~/.claude/…). In any project where I’ve needed memory I just add a line to AGENTS.md telling it to use MEMORY.md to save memories or STATUS.md to track progress.

by andai 4 minutes ago

I've been enjoying having a little todo file the agent updates as it goes along, because then I can keep track of progress without scrolling through aeons of "Combobulating..."

Also if context runs out you can just do "cat todo.md | agent" and you're off to the races again.
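A minimal sketch of how such a todo file could be summarized at a glance, assuming the agent keeps it as a Markdown checklist (`- [ ]` / `- [x]`). The file format and the names `parse_todo` and `summarize` are hypothetical; the comment does not specify any of them.

```python
import re
from pathlib import Path

# Assumed format: one task per line, "- [ ] task" or "- [x] task".
TASK_RE = re.compile(r"^\s*[-*]\s+\[( |x|X)\]\s+(.*)$")


def parse_todo(text: str) -> list[tuple[bool, str]]:
    """Return (done, description) pairs for each checklist line."""
    tasks = []
    for line in text.splitlines():
        match = TASK_RE.match(line)
        if match:
            done = match.group(1).lower() == "x"
            tasks.append((done, match.group(2).strip()))
    return tasks


def summarize(text: str) -> str:
    """One-line progress report: how many tasks are done and what is next."""
    tasks = parse_todo(text)
    done = sum(1 for is_done, _ in tasks if is_done)
    pending = [desc for is_done, desc in tasks if not is_done]
    next_up = pending[0] if pending else "nothing left"
    return f"{done}/{len(tasks)} done; next: {next_up}"


if __name__ == "__main__":
    path = Path("todo.md")  # same file name as in the comment above
    if path.exists():
        print(summarize(path.read_text(encoding="utf-8")))
```

Lines that are not checklist items (headings, notes) are ignored, so the agent can still keep free-form context in the same file.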

by semiquaver 24 minutes ago

Strongly agree here. claude-code’s memory system is occasionally useful but much more often harmful, pulling in obsolete info that muddies the waters about current tasks. I’ve frequently seen Claude’s own memories severely mislead it.

My guess is that has something to do with the training process leaving models unable to differentiate between “what’s happening now” and “what happened before”. Perhaps if making inferences from memories was actually part of the training process things would be different but my sense is that as an inference-time-only feature this just gets the models confused.

by mastax 5 minutes ago

I found that if you allow any low value things into memory, Claude will notice that established pattern and start trying to add low value memories at an ever increasing pace.

by trjordan 13 minutes ago

It's because it mostly doesn't matter what you are trying to get the code to do. What matters is what the code does.

Session logs can absolutely be useful, but not when building further. It's just that the place they slot in is during validation. You know, that place between the markdown plan and CI passing, where there's 800 new lines of code and it all seems sort of fine when you click around?

Session logs can show you what sort of manual validation happened. CI will run the tests you had, and the code will show you what new unit tests were added, but session logs can show you that the agent drove the app with Playwright, or that the agent read and considered the prod config as well as the dev config.

Nothing bulletproof, but not every piece of validation work merits a test in the repo that lives forever. We've gotten a lot of mileage out of re-analyzing the sessions, figuring out where the agent made decisions without asking, and forcing the agent to consider validation for those decisions. That's the sort of thing that's hard to dictate up front but easy to highlight with the session logs.

by general_reveal 31 minutes ago

Isn’t this just a form of the bitter lesson? Our attempts to make engineered context and agents will simply be made obsolete by bigger and better models. Those transcripts are probably extremely useful for less capable models, and near unnecessary for frontier ones, maybe?

by charcircuit a few seconds ago

>We have found zero performance benefit on SWE tasks when agents have search access to their previous transcript sessions

I refuse to believe this is true. The ability for an agent to find information from before a compaction is incredibly useful.

by zahirbmirza 21 minutes ago

Even with memory off this occurs within a conversation.

It is like an annoying friend, who remembers something from a past conversation, that you have grown and developed from, but they still want to hold it against you.

by Fabricio20 11 minutes ago

I specifically disabled Claude memory in a project because it kept writing things to memory that didn't need to be there, including severely wrong statements that would then confuse it later. At some point it got re-enabled automatically, which had me ask Claude itself to "turn it the fuck off", at which point it promptly figured out that both settings ("autoMemoryEnabled": false, "autoDreamEnabled": false) are necessary and need to be in the user home settings, not in a project override (which is what I had in the original setup that eventually got ignored by a CC update).

I agree with other commenters here: if anything is worth being remembered, it will be in code comments, git commit messages, CLAUDE.md or other formal documentation. The auto memory system just causes confusion and leaves stale and outdated information written down.

It's an interesting thought experiment as well. I originally thought that having the model write down memory files by itself would be a nice addition, but after playing around with it, it became clear to me that what sounds good as an idea turns out bad in practice, because the model can't correctly gauge what deserves being stored as a memory.
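For anyone who wants to script the change described above, here is a rough sketch that merges the two reported keys into a JSON settings file without touching other entries. The key names are copied from this comment and are not verified against any documentation; the function name `disable_auto_memory` and the settings path are hypothetical, so pass in wherever your user-level settings file actually lives.

```python
import json
from pathlib import Path

# Key names as reported in the comment above; treat them as unverified.
REPORTED_KEYS = {"autoMemoryEnabled": False, "autoDreamEnabled": False}


def disable_auto_memory(settings_path: Path) -> dict:
    """Merge the reported keys into a JSON settings file, keeping other entries."""
    settings = {}
    if settings_path.exists():
        settings = json.loads(settings_path.read_text(encoding="utf-8"))
    settings.update(REPORTED_KEYS)
    # Create the parent directory if the settings file does not exist yet.
    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings_path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings
```

Merging instead of overwriting matters here, since the same file may hold unrelated settings that should survive the change.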

by grimcompanion 9 minutes ago

> I believed this so strongly that my company built an entire product around this concept. I used to tell folks that "session transcripts were the new oil," that they were more valuable than the code itself.

This is infuriatingly common wrt talking/writing about how to use AI effectively. All of the "this is how you write an AGENTS.md" and "you need to talk to it like X to optimize it". Like sure, you can believe that as much as you want but unless you provide some evidence you can keep your shitty CLAUDE.md to yourself and don't pollute the whole company's git repo, thanks.

by oefrha 24 minutes ago

I have this in my global CLAUDE.md after being annoyed by all the random crap memories.

> Don't start generating an auto-memory entry before asking me. Ask first, write only if I confirm — no speculative drafting.

No more crap after this.

Incidentally I don’t recall Opus 4.8 asking me once in the past few weeks. Older models did ask semi-frequently.

by aranw 21 minutes ago

I once had to tell Claude 3-4 times to stop assuming the state of a system was the way it kept insisting it was, just because that was in its memory. I repeatedly told it otherwise, and it just never updated its memory and instead kept referencing its memory about the state of that particular system.

by syntheticcdo 2 minutes ago

Did you try to delete the memory yourself?

by bigyabai 41 minutes ago

Settings > Capabilities > "Generate memory from chat history"

Toggle it off and never think about it again.

by beepbooptheory 32 minutes ago

There has been this slow transition inside me, as someone who likes to touch the AI as little as possible, where I've gone from skeptical and argumentative about it all to starting to just feel sad for all the Claude et al heads. Like, this is such a ridiculous house of cards you have to deal with all the time, which isn't even directly concerning the task at hand, presumably. Like you're cooking yourself a meal but it's just nuking a burrito and then still somehow needing to wash the dishes for an hour.

Not that this isolated article is super damning or anything, but the accumulated set of all these reports has left me only empathetic, I think, of these other devs. Like, I just want to tell them, "it can be ok, it doesn't need to be like this.."

by chopete3 33 minutes ago

>> We don't really write code by hand anymore.

The software world is very close to building a super intelligent senior software developer. Companies like this will ask all the best things a software engineer does automatically. Now claude will add it into the coding agents itself.

Damn, I didn't see this coming.

It's: first build the intelligent builder. We will figure out what we want to build later.

by jmalicki 30 minutes ago

> We will figure out what we want to build later.

Once the automator automates itself fast enough, we won't have the ability to opine what gets built. The LLM will decide. Just like right now sometimes LLMs delete tests so they pass, they could just delete humanity if humans get in their way.

by otabdeveloper4 27 minutes ago

> The software world is very close to building a super intelligent senior software developer.

Yeah. Two more weeks, as they say. Just need to iron out some kinks.

by rvz 14 minutes ago

> The software world is very close to building a super intelligent senior software developer. Companies like this will ask all the best things a software engineer does automatically. Now claude will add it into the coding agents itself.

Except Claude is more expensive than an actual senior software developer. Otherwise, why are many companies terrified of the usage bill that gets printed on the invoice?

The nonsense in "tokenmaxxing" was a complete marketing scam and illusion of cheap tokens which in reality were heavily subsidized.

The entire point is detecting bad code before it reaches production. [0] AI generated or not.

[0] https://sketch.dev/blog/our-first-outage-from-llm-written-co...

Data from: Hacker News, provided by Hacker News (unofficial) API