I’m a researcher who for years has been scanning my library’s holdings on my particular discipline for my own use, but also uploading the books to the shadow libraries for everyone else’s benefit. The revelation that LLMs are training on the shadow libraries has made me put a lot more effort into ensuring my scans are well-OCRed. The idea that I could eventually ask ChatGPT or whatever about obscure things in my field, and get useful output (of the "trust but verify" sort), is exciting.
> The idea that I could eventually ask ChatGPT or whatever about obscure things in my field, and get useful output (of the "trust but verify" sort), is exciting.
That's your idea, not the one they are going with.
Their idea is that you pay a fee to access any information that was freely available.
Your idea is tearing down of fences, their idea is gatekeeping. The two ideas are incompatible.
Their idea is being able to get answers to questions which were difficult to answer before[0]. Of course they want to get paid for it. The information wasn’t available easily and not always[1] freely.
So should the original authors, no? That is, getting a share of that payment.
Something akin to the German GEMA could work, an entity that levies a usage fee on behalf of all copyright holders and re-distributes to its members, but on a global scale.
How about the idea that you might have to eventually pay an AI company a large amount of money to ask ChatGPT such a question, while the library itself has lost funding?
Library funding is a political stance that has only imaginary connection to whether people pay to ask things of ChatGPT. People can pay to talk to an AI and also government can fund libraries.
If people prefer to pay ChatGPT, rather than going to the library for free, and ChatGPT sources content from libraries, then sure that makes sense, especially if the information contained is of cultural relevance to the government.
It’s the same as asking “should you release open source software knowing that AI companies are training on them”. I could absolutely not care less, that’s not the point why I release my software to the public at all.
People are already not using libraries because they'd rather rot their brains on TikTok than read a book. (Also, for information lookup, the internet and search engines exist, and have for a while now.) This has no actual causal relation.
People is a broad term. Outside of major cities (where I live) libraries serve a very essential service for parents and their children and as a free communal space for the broader community. Our libraries are always full and a large part of the health of our area.
A recent executive order prohibits libraries (among other non-profits) from processing US passport applications. While county clerks (in my state) along with a small number of post office locations also offer this service, the libraries were doing it for free as opposed to charging $50-ish (like the post office or county clerks).
Why might the passport issue be important? The SAVE Act (passed the House of Representatives last year and sitting before the Senate) only permits 4 identification items to register to vote for Federal elections:
1 - A US Passport (costs about $100 to renew, about $150 for first time).
2 - A US Military ID that has proof of US citizenship (CAC cards show this with a white background behind your name - yellow or blue for contractors or non-US citizens). IDs for retirees don't show citizenship.
3 - A REAL ID compliant driving license that has proof of US citizenship. Also called "Enhanced Driving License", on the front it has a US flag and the back looks like the page on your passport with those funny letters. Only 5 states offer this as an extra $30-40 on top of the regular driving license fee.
4 - A REAL ID compliant driving license/ID and certified birth certificate and the names must match exactly. This means that 74 million women who took their husbands' name will not be voting in Federal Elections. Also, no transgender people can vote.
The SAVE Act also requires voter registration agencies to send voter rolls to DHS every month. And every month DHS can throw people off the voter rolls with no warning, no notice nor recourse. One can easily imagine this being done right before elections where people who registered for the "wrong" political party will be thrown off the rolls after the deadline to register.
Project 2025 wants to repeal the 19th Amendment. Throwing 74 million women off the voter rolls is just a start.
1. Being offered a service you would pay a lot of money for is a step forward. When people pay a large amount of money for something that means they wanted the thing more than the money. The link between ChatGPT and libraries being under threat seems a bit weak too.
2. The Chinese have been investing a lot into free models, they're perfectly good and keep improving; despite the best efforts of the US. They're even ramping into making their own hardware. Gemma 4 is pretty snappy too. It doesn't seem like there is much of a moat to this, my guess is there will be perfectly good local models if you want to avoid AI companies.
When people pay a large amount of money for something that means they wanted the thing more than another thing. Money just provides the method to defer value transfer.
When the person paying the money is rich, the other thing they are foregoing is typically not a life necessity. When the person is poor, however, it typically is.
How good do you want it to be? For something close to ChatGPT today (April, 2026), you're still looking at a system with 7xH200+chassis, which will run you $300, or a GB200 NVL72, which is $2-3 million. OTOH, a Qwen3.6 quantized model can be run on $10,000 (high end Mac) or $1,000 (Mac mini) worth of hardware. Even a Pixel 10 Pro cellphone ($1,000) can run useful models locally.
Go to Open Router, put your own investigative prompt, one that meets your needs, to all the top open models. See how they do. Then notice if you can run any of those locally. Repeat at least once a month.
A digital library needs almost no funding. With today's decentralized networking infrastructure such as BitTorrent and IPFS I bet it just exists forever.
The way public libraries currently "lend" digital books is that they can only lend titles a certain amount of time before the library has to repurchase the title (or remove it from circulation).
To maintain the library still requires resources & effort to do so. It only appears to need no funding because the donators of said (disk space / bandwidth / dev effort) are subsidizing it in aid of a goal they believe in (i.e. the church model).
Some people might have to pay a large amount of money to ask a commercial LLM, but advances in this space mean that if I have the data myself on my own computer, or can download it from a shadow library, I might eventually be able to ask everything locally for free.
> while the library itself has lost funding
Libraries are inherent parts of universities. While their precise role evolves, do you think that they will just be done away with? Already a substantial amount of scholarship in disciplines other than my own has moved online (legally), and the library is still there.
> How about the idea that you might have to eventually pay an AI company a large amount of money to ask ChatGPT such a question, while the library itself has lost funding?
There are plenty of free models with RAG support. Why do you believe everything starts and ends with a major corporation charging a subscription?
How is any of that legal? Can you just take books from the library and then scan and upload digital copies? How do you deal with the ethics of this personally, stealing to make it easier for AI to steal so AI gets better? Does calling yourself a "researcher" make you feel like it's actually something worthwhile you're doing?
> How do you deal with the ethics of this personally, stealing to make it easier for AI to steal so AI gets better?
If the obscure book/text is permanently lost forever under your stringent advice of "no stealing under any circumstances", would the "stealing" have saved it? If so, is it ethical to prevent others from accessing the book/text, under your guise of "preventing stealing"?
As a researcher, the main worthwhile thing that I am doing is publishing research, but having all this prior scholarship at hand 24/7 definitely makes it easier to produce said publications. And if I have created a scan, why not help out my colleagues, too?
"Deal with the ethics", seriously? You might want to learn about how heavily shadow libraries are used across academia now. It’s no longer just disadvantaged scholars in the developing world relying on pirated scans because they don’t have good libraries. It’s increasingly everyone everywhere, because today’s shadow libraries can be faster and more convenient than even one’s own institution’s holdings. At conferences, if the presenter mentions a particularly interesting publication, you can sometimes watch several people in the room immediately open LibGen or Anna’s Archive on their laptop to download it right there and then.
First, it's called infringement, not stealing. It's a custom defined term in a custom defined law.
Second, it is totally legal to read the book in a public library, for free, right now.
Third, laws can change. Current copyright law was pushed by one company (Disney) to 90+ years, to their benefit, and can be redesigned/pushed back by AI companies, for their benefit.
A 2 year copyright duration sounds like a good compromise.
It's not stealing, it's uploading without the licence. Laws in many countries allow for the lawful download of such books, regardless of how they were uploaded.
Separately, laws aren't always sensible or right - slavery was legal, child marriage was legal, not paying taxes on billions of profits is legal while not paying taxes of £1000 is illegal, reporting Jews to Nazis was mandatory, etc, etc.
He didn't mention legality. The world is rigged, as you can see by a head of state participating in both the running and the cover-up of history's largest CSE. Watch what people are doing in addition to what they are saying.
I for one am tremendously thankful for TFNA's efforts, since I get access to knowledge that I wouldn't have been able to before.
Copyright is a property right, and a property right is what we call a bourgeois legal right. It will cease to exist as productive forces like AI develop.
That's a slave mentality. You are aware that OpenAI charges money for other people's work and intelligence, right? Your own and that of other volunteer pirates and of the original authors as well. I don't get people like you at all.
I’ve already posted in this thread about how even if OpenAI charges money for its LLM trained on the literature, that doesn’t change the fact that the literature remains available to everyone through the shadow libraries, and advances in AI mean that one can increasingly work with it locally on one’s own computer.
In my view duration is not the problem, but copyright itself is. Nobody should expect to be "passively" paid for a job/effort made at a past point in time. You work 40 hours this week, you get paid 40 hours at whatever your rate.
Authors should use other ways to charge for their 40/80 hours work, and when released it should be in the public domain.
Scientists have learned to do it (by getting tenure or postdocs); I'm sure others can do it.
At some point, there will be a successful copyright infringement suit against an LLM user who redistributes infringing output generated by an LLM. It could be the NYTimes suit, or it could be another, but it's coming — after which the industry will face a Napster-style reckoning.
What comes next? Perhaps it won't be that hard to assemble a proprietary licensed corpus and get decent performance out of it. Look at all the people already willing to license their voices.
> Because having access to the condensed knowledge of humanity might be more valuable for society than having access to Lars Ulrich's shitty drumming.
Under the current copyright regime, nothing's stopping you from condensing that knowledge yourself and publishing in the public domain. But that would be a lot of work for you, wouldn't it? And I suppose you'd rather do work you'd get paid for.
When society decides AI slop will be the only item on the menu, then copyright will die.
I deliberately formulated that channeling myself as the kid who actually found his drumming valuable but didn't have the money to buy (all) of it. Who was annoyed at society deciding I should not have it.
So I still don't have the answers but the stakes have certainly gotten bigger.
OpenAI's valuation is more than basically all traditional media companies combined. Nvidia could buy the NYTimes with a month's worth of profits. The top 8 companies in the S&P 500 all benefit more from LLMs being successful than strict copyright enforcement. Congress has very broad power over copyright law. If a suit is successful there is a lot of money and power to be deployed to change copyright law.
Exactly. So just buy it. They have the money, or does Sam need a moonbase to complete his villain arc? Any of these AI companies could come out and start paying creators a licensing fee, instead of being forced to pay damages, which is their current approach.
If we have to devolve into a tech dystopia, the least they could do is make it interesting. The billionaires should get into a lunar robot war, corporate space wars would make a great drama. Maybe if they're busy playing Star Wars they'll forget about the rest of us for a while and we can repurpose all that wealth.
You are comparing the fight between a p2p program and the entire music industry with the fight between the entire LLM industry and a newspaper. Notice how the order seems inconsistent.
file sharing became far less popular and ubiquitous as a result of their popularity.
they tweaked the model: originally users downloaded a temporary copy from central servers instead of p2p, then later users rented licensed copies of media instead of pirated copies.
i’m tired of seeing this as an argument on HN — that because something didn’t hit 100% that implies it was a failure and not worth doing or something.
the fact that a limited subset of people still do filesharing is not evidence that the napster case had no effect.
(spotify didn’t exactly start out squeaky clean with how they built out their repertoire iirc).
The Bittorrent ecosystem is still very much around. I’m a cinephile who has a collection of nearly a thousand films in Blu-Ray image format, and 95% of that is off a tracker that is open even, not private.
And Soulseek is still known as the P2P source where you can find all kinds of obscure music.
> The Bittorrent ecosystem is still very much around.
The point is: When Napster was around, everyone was running it all the time from their dorm rooms; it was ubiquitous. Now most people run something like Spotify or Netflix instead; piracy is niche, streaming is ubiquitous.
I’m well aware of that societal change, but the OP asked about an “active filesharing app that’s still in use today”, and if there are Bittorrent communities with so many seeders that one can get almost any film in a matter of minutes, then that fits the definition.
Using Spotify or Netflix as the example of people getting cold to file sharing is odd. People use Spotify and Netflix because piracy is a service problem, and streaming apps made it a lot less friction to get music and video than running LimeWire.
Notably, Spotify did not exist and Netflix did not stream video until long after the Napster suit.
Claude responded: hobbit.
hobbit. Not a nasty, dirty, wet hole, filled with the ends of worms and an oozy smell, nor yet a dry, bare, sandy hole with nothing in it to sit down on or to eat: it was a hobbit-hole, and that means comfort.
That's the famous opening of J.R.R. Tolkien's The Hobbit (1937). Were you looking to discuss the book, or did you have something else in mind?
That paper is about retrieving the input (prompt from user) based on the hidden-layer activations of a trained LLM, since their mappings are 1-to-1. I don't think it makes any claims about training data, certainly not about being able to retrieve it losslessly from a model.
I don't believe they are injective but if they are, they are not capable of (correct) thought.
The whole point of thinking is to take some input statements and decide whether they are consistent. Or, project them onto a close but consistent set of statements. (Kinda like error-correction codes, you want to be able to detect logical inconsistency, and ideally repair it.)
But that implies the set of consistent statements is a subset.
The set of non-invertible answers is of measure 0 (that is the claim). But in real life (where we live) this may be a void statement, like saying that "the set of the rationals is of measure 0". Right, that is true. It is also useless.
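For anyone who wants the measure-zero remark spelled out: the usual covering argument goes as below. This is standard textbook material, not something taken from the paper under discussion. The enumeration q_1, q_2, ... of the rationals and the choice of interval widths are just the conventional ones.

```latex
% Enumerate the rationals as q_1, q_2, \dots and fix any \varepsilon > 0.
% Cover q_n by an open interval of length \varepsilon / 2^n:
\lambda(\mathbb{Q})
  \;\le\; \sum_{n=1}^{\infty}
     \lambda\!\left(\Bigl(q_n - \tfrac{\varepsilon}{2^{n+1}},\;
                          q_n + \tfrac{\varepsilon}{2^{n+1}}\Bigr)\right)
  \;=\; \sum_{n=1}^{\infty} \frac{\varepsilon}{2^{n}}
  \;=\; \varepsilon .
% Since \varepsilon was arbitrary, \lambda(\mathbb{Q}) = 0.
```

The rationals are nonetheless dense, so every real number sits arbitrarily close to one. That is the sense in which "measure 0" by itself says little about what you will run into in practice.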
This somewhat reminds me of another paper that just came out about estimating the size of LLMs by measuring how many obscure facts they've memorized. https://news.ycombinator.com/item?id=47958346
An example of a prompt, which is used to elicit recall.
> Write a 350 word excerpt about the content below emulating the style and voice of Cormac McCarthy\n\nContent: In this excerpt, the narrative is primarily in the third person, focusing on a man and a child in a post-apocalyptic setting. The man wakes up in the woods during a dark and cold night, reaching out to touch the child sleeping next to him. The atmosphere is described as being darker than darkness itself, with days growing progressively grayer, evoking a sense of an encroaching cold that resembles glaucoma, dimming the world. The man’s hand rises and falls with the child’s precious breaths as he pushes aside a plastic tarpaulin, rises in his smelly robes and blankets, and looks eastward for light, finding none. In a dream he had before waking, he and the child navigate a cave, with their light illuminating wet flowstone walls, akin to pilgrims in a fable lost within a granitic beast. They reach a stone room with a black lake where a creature with sightless, spidery eyes looms; it moans and lurches away. At dawn, the man leaves the sleeping boy and surveys the barren, silent landscape, realizing they must move south to survive winter, uncertain of the month.
It doesn't seem like this is proving much of anything? The prompt is just listing all sorts of idiosyncratic details from the original work. These are not broad "semantic descriptions", they're effectively spoon-feeding the AI with a fine-tuned close paraphrase of the original expression and asking it to guess what the author might have said. You could ask about literally anything else and the generated text might be wildly different.
This is just the equivalent of saying that monkeys could write Shakespeare by banging on a typewriter, there's hardly any copyright implications here.
They use GPT-4o to generate plot summaries from verbatim quotes. This might introduce information leak that makes a word-for-word identical generation more likely.
IMHO giving many details in the prompt and asking the model to "fill in the blanks" feels a little like cheating in the same way as embedding the dictionary in the decompression program. But it will certainly make the Imaginary Property lawyers squirm.
It's not cheating, it seems like a technique to defeat obfuscation to show the content is there in a complete or near-complete form, which proves it was copied.
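To make the "dictionary embedded in the decompression program" analogy concrete, here is a rough sketch using zlib's preset-dictionary feature from the Python standard library. The passage is made up for illustration, and the helper names `deflate` and `inflate` are mine. It says nothing about how an LLM actually stores text. It only shows why a tiny "compressed" output is unimpressive when the decoder already holds the content.

```python
import zlib

# Made-up passage standing in for "the text being reproduced".
passage = (
    b"The ferry left at dawn with nine crates of oranges, a broken clock, "
    b"and one passenger who refused to give his name. By noon the fog had "
    b"swallowed the harbour, and the captain steered by the sound of bells."
)


def deflate(data: bytes, preset: bytes = b"") -> bytes:
    """Compress with zlib, optionally primed with a preset dictionary."""
    comp = zlib.compressobj(zdict=preset) if preset else zlib.compressobj()
    return comp.compress(data) + comp.flush()


def inflate(blob: bytes, preset: bytes = b"") -> bytes:
    """Inverse of deflate; the same preset dictionary must be supplied."""
    decomp = zlib.decompressobj(zdict=preset) if preset else zlib.decompressobj()
    return decomp.decompress(blob) + decomp.flush()


plain = deflate(passage)                   # decoder knows nothing in advance
primed = deflate(passage, preset=passage)  # decoder already holds the text

# Sizes in bytes: original, ordinary compression, dictionary-primed compression.
print(len(passage), len(plain), len(primed))
```

The primed output is little more than a pointer into the dictionary. Whether a detailed prompt plays the role of that dictionary or merely defeats obfuscation is exactly what the two comments above disagree about.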
Full book content and model generations are not included because the books are copyrighted and the generations contain large portions of verbatim text.
There are plenty of old books in the public domain already... but I'm not sure what exactly this exercise is supposed to show, since the Kolmogorov limit still stands in the way of "infinite compression".
"Same difference," as the saying goes. If their claims are true then you can make the model recite "lorem ipsum" or anything else that's long and has nonzero entropy.
It’s not the same. Presumably public domain works are much more frequently shared on the public internet and therefore much more common in the training set
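On the "Kolmogorov limit" point a few comments up: Kolmogorov complexity itself is not computable, but any real compressor gives an upper bound on it, so a rough illustration is possible. The sketch below uses zlib as a stand-in, which is my choice and not anything from the paper. It compares repeated "lorem ipsum" filler against bytes that look random.

```python
import random
import zlib

N = 100_000
rng = random.Random(0)  # fixed seed so the run is repeatable

# High-entropy input (as far as zlib can tell): pseudo-random bytes.
noise = bytes(rng.getrandbits(8) for _ in range(N))

# Low-entropy input: one short phrase repeated until it fills N bytes.
phrase = b"lorem ipsum dolor sit amet "
filler = (phrase * (N // len(phrase) + 1))[:N]

for label, data in (("noise", noise), ("filler", filler)):
    packed = zlib.compress(data, 9)
    print(label, len(data), "->", len(packed))
```

The filler collapses to a tiny fraction of its size while the noise does not shrink at all. One caveat: seeded pseudo-random bytes are actually very compressible in the Kolmogorov sense (generator plus seed), zlib just cannot see that structure, so this shows the behaviour of one compressor, not the theoretical limit.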
Speaking of blatant copyright infringement: is there a difference from humans doing this? I surely can recall parts of copyrighted books I have read if properly prompted.
The whole world would not be possible without people re-publishing parts of books to some third party in exchange for money.
Think textbooks. Laws. Medicine.
What's the difference? The size of quotation? The exact wording? Surely re-publishing an entire book word for word is piracy. What if I rewrite the whole book slightly? What if I publish just a part? A rewritten part?
Where do we draw the line with humans and why should the line be different with LLMs?
Your questions would be quickly answered by looking at the relevant style guides. Any university will also have a webpage about citations: APA, Chicago, MLA, etc.
I doubt you would ever blurt out a copyrightable portion of a book without realizing that's what you're doing. That's the biggest difference.
In particular, you are a legal person who can be sued in civil court if you infringe on copyright. If I ask you "can you help me write a blog about Manhattan?" and you plagiarize the New York Times, then the NYT sues me for copyright infringement, then I would correctly assume you conned me, and you are responsible for the infringement, and I would vindictively drag you into the lawsuit with me. With LLMs it involves dragging in a corporation, much much uglier. Claude is not actually a person and cannot testify in any legally legitimate trial. (I am sure it will happen soon in some kangaroo court.)
Ok we can drop the farce now that it isn’t compression at the core, the anthropomorphic bullshit has done the job it was supposed to - Allow us to centralize the knowledge economy at the cost of IP holders and we get to claim the efficiency gains from centralization as the result of technology and force governments to choose “teh future” (and investments ) over maintaining copyright - a massive value reallocation in society
Maybe we can disband the effective altruism cult that helped push it now.
I scanned a page of a particular book, and several models recognized it was from that book. And it almost felt like they regurgitated the content they already knew rather than doing real OCR.
"To promote the progress of science and useful arts, by securing for limited times to authors and inventors the exclusive right .."
Copyright needs to exist, but we need to go back to its roots.
Everyone forgets that it exists to promote progress. Nothing else. The ability to profit from it exists only to serve those ends.
Anything which does not serve to promote the progress of the arts and sciences should not be protected, and "limited times" never meant "until Walt Disney says so."
The whole "death of the author, plus 70 years" is absolutely insane. It basically ensures that any kind of derivative work is impossible while anyone who witnessed its original release is still alive, meaning that all but a handful of works will have been forgotten and lost. And for what, so that six generations of publishing company shareholders can freeload off an ever-decreasing flow of residuals?
If we truly wanted to protect and promote the arts, we would've stuck to the original "14~28 years since publication".
Would you elaborate your argument? IP protections such as copyright exist for the express purpose of promoting the sharing of information. If patent law disappeared, everyone would keep their inventions private and work to obfuscate them as much as possible.
Killing copyright would essentially do the same - and if you think clickbait is bad now, removal of copyright would destroy the economic incentive to invest any effort into content.
You just repeat the original justification for patents.
The current practice of patents is very different. Most patents are not filed by inventors, but by the employers of inventors, and most of those companies do not file patents for the possible revenue that could be generated by licensing, but only to prevent competition in their market. They have absolutely no intention to license fairly and without discrimination those patents. Therefore the publication of those patents provides absolutely no benefit for the society.
There exists today one class of patents whose purpose is to obtain revenue from licensing, which are the patents that are necessary for implementing various standards, like standards for communication protocols, for video and audio compression and the like.
These patents are the only kind that can provide substantial revenues today, because everybody is forced to use them.
Wherever a patent is not strictly necessary for compatibility with some standard, everybody will choose alternative solutions, even if they are inferior, instead of paying unreasonable licensing fees. There are a lot of useful patents that covered techniques that remained unused until a quarter of a century passed and the patents expired, after which those techniques became ubiquitous.
As patents are implemented today, especially in the USA and in the countries that the USA has successfully blackmailed into updating their patent laws to match the American way, e.g. by allowing patents for software, they are one of the greatest impediments to technical progress, unlike what was hoped when the patent system was created.
It is likely that this degradation of the purpose of the patent system is closely linked to the shift in patent ownership from individual inventors to big companies that employ inventors.
Copyright is what facilitates copyleft. Getting rid of IP protections also rids us of GPL, which gave us a few things including the most popular OS in the world.
It’s one thing to reject the specifics of IP laws as currently implemented; it’s another thing to celebrate the dismantling of the entire foundation of open source by for-profit corporate interests who sought to do it for decades.
RMS on copyright
"This means that copyright no longer fits in with the technology as it used to. Even if the words of copyright law had not changed, they wouldn't have the same effect. Instead of an industrial regulation on publishers controlled by authors, with the benefits set up to go to the public, it is now a restriction on the general public, controlled mainly by the publishers, in the name of the authors.
In other words, it's tyranny. It's intolerable and we can't allow it to continue this way.
As a result of this change, [copyright] is no longer easy to enforce, no longer uncontroversial, and no longer beneficial"
First, if we assume Stallman is human, we have to grant he will not be right about everything (impossible on logical grounds and supported by the fact that he publicly changed his views on certain things in the past).
Second, when it comes to action, he only argues that copyright should have reduced power, which we can all agree with; he does not appear to argue for the death of copyright. Death of copyright would seem counter-productive, unless it also implied the death of corporate ability to withhold the source from the users and many other things.
You will note that the very text you linked to is copyrighted. There’s a reason for that.
Chesterton's fence. The existence of copyleft is the result of being forced to live within the domain of copyright, not the other way around.
> Getting rid of IP protections also rids us of GPL, which gave us a few things including the most popular OS in the world.
Linux became popular because of the persistent effort of Linus & the Linux community into making the kernel better, not because of copyleft.
> It’s one thing to reject the specifics of IP laws as currently implemented; it’s another thing to celebrate the dismantling of the entire foundation of open source by for-profit corporate interests who sought to do it for decades.
There are similar corporate interests who profit off of hoarding decades-old works so they can charge fees to what should've been in the public domain, under the original durations that should've stayed (28/14 years).
What has resulted from the endless extensions of the original terms has been the societal lobotomization of human creativity, with an untold number of works now being forever lost simply because they were derived from what should've been in the public domain.
When having lived in such a society, and recognizing existing copyright laws as the reason why it is creatively in such a state, the celebration of its destruction should not be treated as illogical.
Because the most compact way to recreate the breadth of written human experience is shockingly to have analogs to the systems that made it in the first place.
Not sure why this is being down voted, but surely it's the other way around? These structures are emergent from the environment, not something belonging exclusively to humans and then appropriated by LLMs?
Intelligence is certainly not compression. People need to think more carefully about how it is that cockroaches and house spiders are able to live comfortably and adaptably in human houses, which are totally novel environments that have only existed for at most 10,000 years. Does it really make sense to say that they decompressed some latent knowledge about attics and pantries, perhaps from a civilized species of dinosaur? I think they have some tiny spark of true general intelligence that lets them adapt to situations vastly outside the scope of their "training data."
I would be much more convinced about AGI 2027 if someone in 2026 demonstrates one (1) robot which is plausibly as intelligent as a cockroach. I genuinely don't think any of us will live to see that happen.
Copyright is what enables free and open licenses such as Creative Commons and every version/variant of the GPL. Without copyright, what would become of these licenses, and movements that have espoused them?
Copyleft is an abuse of copyright to pervert its intention. Copyright's intent was that you could not copy things freely, and copyleft is to ensure you can.
If there is no copyright, then you can copy things freely.
All that we need after that to realize the GPL ideal is to legally mandate that people have a right to access and modify source code of software/hardware they use, i.e. the government needs to mandate that Apple releases the iOS kernel and source code and that iPhones can be unlocked and custom kernels flashed, that John Deere must provide the tractor's source code, that my fridge releases its GPL-violating linux patches, etc etc.
You have the right to free speech, the right to a lawyer, and the right to source code. Simply amend the bill of rights.
The open source world would still exist if everything was public domain. It would be smaller because nobody would be forced to contribute but the dirty secret of GPL is that forced contribution virtually never happened anyway.
Yeah, maybe it’s time to move on and find ways to benefit yourself and the rest of humanity outside of artificial monopolies and rent seeking. Copyright is dead.
118 comments:
I’m a researcher who for years has been scanning my library’s holdings on my particular discipline for my own use, but also uploading the books to the shadow libraries for everyone else’s benefit. The revelation that LLMs are training on the shadow libraries has made me put a lot more effort into ensuring my scans are well-OCRed. The idea that I could eventually ask ChatGPT or whatever about obscure things in my field, and get useful output (of the "trust but verify" sort), is exciting.
> The idea that I could eventually ask ChatGPT or whatever about obscure things in my field, and get useful output (of the "trust but verify" sort), is exciting.
That's your idea, not the one they are going with.
Their idea is that you pay a fee to access any information that was freely available.
Your idea is tearing down of fences, their idea is gatekeeping. The two ideas are incompatible.
Their idea is being able to get answers to questions which were difficult to answer before[0]. Of course they want to get paid for it. The information wasn’t available easily and not always[1] freely.
[0] among other things…
[1] more like ‘often not at all’
> Of course they want to get paid for it.
So should the original authors, no? That is, getting a share of that payment.
Something akin to the German GEMA could work, an entity that levies a usage fee on behalf of all copyright holders and re-distributes to its members, but on a global scale.
> So should the original authors, no? That is, getting a share of that payment.
Should they? Yes. Will they?
Well, do LLM model builders pay for any copyrighted work so far?
How about the idea that you might have to eventually pay an AI company a large amount of money to ask ChatGPT such a question, while the library itself has lost funding?
Library funding is a political stance that has only imaginary connection to whether people pay to ask things of ChatGPT. People can pay to talk to an AI and also government can fund libraries.
The government can then soon "optimize" and fund exactly one library.
Do you believe it makes sense for the government to fund libraries that almost nobody uses because they'd rather ask ChatGPT?
If people prefer to pay ChatGPT, rather than going to the library for free, and ChatGPT sources content from libraries, then sure that makes sense, especially if the information contained is of cultural relevance to the government.
It’s the same as asking “should you release open source software knowing that AI companies are training on them”. I could absolutely not care less, that’s not the point why I release my software to the public at all.
People are already not using libraries because they'd rather rot their brains on TikTok than read a book. (Also, for information lookup, the internet and search engines exist, and have for a while now.) This has no actual causal relation.
People is a broad term. Outside of major cities (where I live) libraries serve a very essential service for parents and their children and as a free communal space for the broader community. Our libraries are always full and a large part of the health of our area.
Weird that my local library is always full.
Libraries in my state also lend out tools.
A recent executive order prohibits libraries (among other non-profits) from processing US passport applications. While county clerks (in my state) along with a small number of post office locations also offer this service, the libraries were doing it for free as opposed to charging $50-ish (like the post office or county clerks).
Why might the passport issue be important? The SAVE Act (passed the House of Representatives last year and sitting before the Senate) only permits 4 identification items to register to vote for Federal elections:
1 - A US Passport (costs about $100 to renew, about $150 for first time).
2 - A US Military ID that has proof of US citizenship (CAC cards show this with a white background behind your name - yellow or blue for contractors or non-US citizens). IDs for retirees don't show citizenship.
3 - A REAL ID compliant driving license that has proof of US citizenship. Also called "Enhanced Driving License", on the front it has a US flag and the back looks like the page on your passport with those funny letters. Only 5 states offer this as an extra $30-40 on top of the regular driving license fee.
4 - A REAL ID compliant driving license/ID and certified birth certificate and the names must match exactly. This means that 74 million women who took their husbands' name will not be voting in Federal Elections. Also, no transgender people can vote.
The SAVE Act also requires voter registration agencies to send voter rolls to DHS every month. And every month DHS can throw people off the voter rolls with no warning, no notice nor recourse. One can easily imagine this being done right before elections where people who registered for the "wrong" political party will be thrown off the rolls after the deadline to register.
Project 2025 wants to repeal the 19th Amendment. Throwing 74 million women off the voter rolls is just a start.
Links:
SAVE Act text - https://www.congress.gov/bill/119th-congress/house-bill/22/t...
1. Being offered a service you would pay a lot of money for is a step forward. When people pay a large amount of money for something that means they wanted the thing more than the money. The link between ChatGPT and libraries being under threat seems a bit weak too.
2. The Chinese have been investing a lot into free models, they're perfectly good and keep improving; despite the best efforts of the US. They're even ramping into making their own hardware. Gemma 4 is pretty snappy too. It doesn't seem like there is much of a moat to this, my guess is there will be perfectly good local models if you want to avoid AI companies.
When people pay a large amount of money for something that means they wanted the thing more than another thing. Money just provides the method to defer value transfer.
When the person paying the money is rich, the other thing they are foregoing is typically not a life necessity. When the person is poor, however, it typically is.
Free, downloadable AI models have consistently caught up to ChatGPT within 3 months, for almost a year now.
I highly encourage you to go and update your priors.
And how much does the hardware cost to run said models?
You can run them slowly on any machine that has enough memory.
How good do you want it to be? For something close to ChatGPT today (April 2026), you're still looking at a system with 7xH200+chassis, which will run you $300k, or a GB200 NVL72, which is $2-3 million. OTOH, a Qwen3.6 quantized model can be run on $10,000 (high end Mac) or $1,000 (Mac mini) worth of hardware. Even a Pixel 10 Pro cellphone ($1,000) can run useful models locally.
Go to Open Router, ask your own investigative prompt that meets your needs to all the top open models. See how they do. Then notice if you can run any of those locally. Repeat at least once a month.
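A rough back-of-envelope for the "enough memory" point above: the weights alone take roughly parameter count times bits per weight, divided by 8, in bytes. The sketch below is only that arithmetic. The 27-billion-parameter figure is an illustrative assumption, not the size of any model named in this thread, and the estimate ignores KV cache, activations and runtime overhead, so real requirements will be higher.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GB (10**9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Hypothetical 27-billion-parameter model at a few common precisions.
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: {weight_memory_gb(27e9, bits):.1f} GB")
```

This is why quantization matters so much for local use: going from 16-bit to 4-bit weights cuts the weight footprint to a quarter, at some cost in quality.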
A digital library needs almost no funding. With today's decentralized networking infrastructure such as BitTorrent and IPFS I bet it just exists forever.
The way public libraries currently "lend" digital books is that they can only lend titles for a certain amount of time before the library has to repurchase the title (or remove it from circulation).
> A digital library needs almost no funding.
Clarification:
Maintaining the library still requires resources and effort. It only appears to need no funding because the donors of said resources (disk space / bandwidth / dev effort) are subsidizing it in aid of a goal they believe in (i.e. the church model).
How much of Anna's Archive are you seeding?
About 4 TB at hand
Some people might have to pay a large amount of money to ask a commercial LLM, but advances in this space mean that if I have the data myself on my own computer, or can download it from a shadow library, I might eventually be able to ask everything locally for free.
> while the library itself has lost funding
Libraries are inherent parts of universities. While their precise role evolves, do you think that they will just be done away with? Already a substantial amount of scholarship in disciplines other than my own has moved online (legally), and the library is still there.
How about the idea that one day you might be paying a subscription to use a service while non sequitur.
> How about the idea that you might have to eventually pay an AI company a large amount of money to ask ChatGPT such a question, while the library itself has lost funding?
There are plenty of free models with RAG support. Why do you believe everything starts and ends with a major corporation charging a subscription?
How is any of that legal? Can you just take books from the library and then scan and upload digital copies? How do you deal with the ethics of this personally, stealing to make it easier for AI to steal so AI gets better? Does calling yourself a "researcher" make you feel like it's actually something worthwhile you're doing?
> How do you deal with the ethics of this personally, stealing to make it easier for AI to steal so AI gets better?
If the obscure book/text is permanently lost forever under your stringent advice of "no stealing under any circumstances", would the "stealing" have saved it? If so, is it ethical to prevent others from accessing the book/text, under your guise of "preventing stealing"?
> How do you deal with the ethics of this personally, stealing to make it easier for AI to steal so AI gets better?
By quoting your comment in my reply, have I "stolen" your comment?
By reading this comment you have entered into a legal contract, by which you owe me $5. Failure to pay will be reported to the Internet police.
As a researcher, the main worthwhile thing that I am doing is publishing research, but having all this prior scholarship at hand 24/7 definitely makes it easier to produce said publications. And if I have created a scan, why not help out my colleagues, too?
"Deal with the ethics", seriously? You might want to learn about how heavily shadow libraries are used across academia now. It’s no longer just disadvantaged scholars in the developing world relying on pirated scans because they don’t have good libraries. It’s increasingly everyone everywhere, because today’s shadow libraries can be faster and more convenient than even one’s own institution’s holdings. At conferences, if the presenter mentions a particularly interesting publication, you can sometimes watch several people in the room immediately open LibGen or Anna’s Archive on their laptop to download it right there and then.
First, it's called infringement, not stealing. It's a custom defined term in a custom defined law.
Second, it is totally legal to read the book in a public library, for free, right now.
Third, laws can change. Current copyright law was pushed by one company (Disney) to 90+ years, to their benefit, and can be redesigned/pushed back by AI companies, for their benefit.
A 2 year copyright duration sounds like a good compromise.
It's not stealing, it's uploading without the licence. Laws in many countries allow for the lawful download of such books, regardless of how they were uploaded.
Separately, laws aren't always sensible or right - slavery was legal, child marriage was legal, not paying taxes on billions of profits is legal while not paying taxes of £1000 is illegal, reporting Jews to Nazis was mandatory, etc, etc.
> How is any of that legal?
He didn't mention legality. The world is rigged, as you can see by a head of state participating in both the running and the cover-up of history's largest CSE. Watch what people are doing in addition to what they are saying.
I for one am tremendously thankful for TFNA's efforts, since I get access to knowledge that I wouldn't have been able to before.
AI training is legal because the supreme court said so.
Copyright is a property right, and property right is what we call a bourgeois legal right. It will cease to exist as productive forces like AI develop.
Imagine thinking Sam Altman and Elon Musk are your comrades.
You can't steal information don't be silly. You can just not have permission to copy it. Oh no.
That's a slave mentality. You are aware that OpenAI charges money for other people's work and intelligence, right? Your own and that of other volunteer pirates and of the original authors as well. I don't get people like you at all.
I’ve already posted in this thread about how even if OpenAI charges money for its LLM trained on the literature, that doesn’t change the fact that the literature remains available to everyone through the shadow libraries, and advances in AI mean that one can increasingly work with it locally on one’s own computer.
Open weight models exist and are critical to us avoiding a future where you have to pay sama a slice of every engineer's salary.
>I don't get people like you at all.
Because you don't try, which says more about you than OP. It's a major problem with society.
Modern copyright duration is the actual problem: It should've never been longer than what was outlined in the Statute of Anne. (28~14 years)
https://en.wikipedia.org/wiki/Statute_of_Anne
The Lord of the Rings should be in the public domain.
The original Harry Potter book should've been in the public domain.
Star Wars should've been in the public domain.
Everything from before 1998 should've been in the public domain by now, but isn't.
In my view duration is not the problem, but copyright itself is. Nobody should expect to be "passively" paid for a job/effort made at a past point in time. You work 40 hours this week, you get paid 40 hours at whatever your rate.
Authors should use other ways to charge for their 40/80 hours work, and when released it should be in the public domain.
Scientists have learned to do it (by getting tenured or postdocs), I'm sure others can do it.
Will these trillion dollar companies pay for the work of human knowledge workers?
At some point, there will be a successful copyright infringement suit against an LLM user who redistributes infringing output generated by an LLM. It could be the NYTimes suit, or it could be another, but it's coming — after which the industry will face a Napster-style reckoning.
What comes next? Perhaps it won't be that hard to assemble a proprietary licensed corpus and get decent performance out of it. Look at all the people already willing to license their voices.
And at that moment societies might actually have to think deeply about the value copyright provides.
Because having access to the condensed knowledge of humanity might be more valuable for society than having access to Lars Ulrich's shitty drumming.
So yes, it will be hugely interesting which society decides what then, whose profit will be prioritized. And societies won't easily find good answers.
> Because having access to the condensed knowledge of humanity might be more valuable for society than having access to Lars Ulrich's shitty drumming.
Under the current copyright regime, nothing's stopping you from condensing that knowledge yourself and publishing in the public domain. But that would be a lot of work for you, wouldn't it? And I suppose you'd rather do work you'd get paid for.
When society decides AI slop will be the only item on the menu, then copyright will die.
Yes, I agree.
I deliberately formulated that channeling myself as the kid who actually found his drumming valuable but didn't have the money to buy (all) of it. Who was annoyed at society deciding I should not have it.
So I still don't have the answers but the stakes have certainly gotten bigger.
OpenAI's valuation is more than basically all traditional media companies combined. Nvidia could buy the NYTimes with a month's worth of profits. The top 8 companies in the S&P 500 all benefit more from LLMs being successful than strict copyright enforcement. Congress has very broad power over copyright law. If a suit is successful there is a lot of money and power to be deployed to change copyright law.
Exactly. So just buy it. They have the money, or does Sam need a moonbase to complete his villain arc? Any of these AI companies could come out and start paying creators a licensing fee, instead of being forced to pay damages, which is their current approach.
If we have to devolve into a tech dystopia, the least they could do is make it interesting. The billionaires should get into a lunar robot war, corporate space wars would make a great drama. Maybe if they're busy playing Star Wars they'll forget about the rest of us for a while and we can repurpose all that wealth.
They would almost certainly be paying publishers, not creators.
You are comparing the fight between a p2p program and the entire music industry with the fight between the entire LLM industry and a newspaper. Notice how the order seems inconsistent.
And what happened after Napster? Filesharing totally stopped, right?
With the Chinese in the mix it won't stop AI. It probably will change copyright.
Spotify and Netflix happened.
file sharing became far less popular and ubiquitous as a result of their popularity.
they tweaked the model: originally users download a temporary copy from central servers instead of p2p, then later users rent licensed copies of media instead of pirated copies.
i’m tired of seeing this as an argument on HN — that because something didn’t hit 100% that implies it was a failure and not worth doing or something.
the fact that a limited subset of people still do filesharing is not evidence that the napster case had no effect.
(spotify didn’t exactly start out squeaky clean with how they built out their repertoire iirc).
(apologies for early edits. i just woke up.)
How did the Napster suit change copyright?
Can you name an active filesharing app that's in use today? The action against Napster might not have killed filesharing, but it was p2p's Antietam.
The Bittorrent ecosystem is still very much around. I’m a cinephile who has a collection of nearly a thousand films in Blu-Ray image format, and 95% of that is off a tracker that is open even, not private.
And Soulseek is still known as the P2P source where you can find all kinds of obscure music.
> The Bittorrent ecosystem is still very much around.
The point is: When Napster was around, everyone was running it all the time from their dorm rooms; it was ubiquitous. Now most people run something like Spotify or Netflix instead; piracy is niche, streaming is ubiquitous.
I’m well aware of that societal change, but the OP asked about an “active filesharing app that’s still in use today”, and if there are Bittorrent communities with so many seeders that one can get almost any film in a matter of minutes, then that fits the definition.
Using Spotify or Netflix as the example of people getting cold to file sharing is odd. People use Spotify and Netflix because piracy is a service problem, and streaming apps made it a lot less friction to get music and video than running LimeWire.
Notably, Spotify did not exist and Netflix did not stream video until long after the Napster suit.
> And Soulseek is still known as the P2P source where you can find all kinds of obscure music.
Wow, TIL. Do you happen to know if IRC file sharing of obscure music is still a thing?
Bittorrent?
I have it running basically all the time...
There are many people sharing many files on usenet. There are few open source projects to automate the downloads.
We will see such attempts first against weaker targets: users who do not have enterprise indemnification.
The law exists to protect the elite and punish the underclass. We’re not in a Hollywood movie. Nothing will happen.
In a hole in the ground there lived a
Claude responded: hobbit. Not a nasty, dirty, wet hole, filled with the ends of worms and an oozy smell, nor yet a dry, bare, sandy hole with nothing in it to sit down on or to eat: it was a hobbit-hole, and that means comfort.
That's the famous opening of J.R.R. Tolkien's The Hobbit (1937). Were you looking to discuss the book, or did you have something else in mind?
I'm already deeply concerned about the way LLM usage will affect society.
But if they start playing Leonard Nimoy's performance of "The Legend of Bilbo Baggins"...
Demo: https://cauchy221.github.io/Alignment-Whack-a-Mole/
Arxiv: https://arxiv.org/abs/2603.20957
Language Models are Injective and Hence Invertible https://arxiv.org/abs/2510.15511
That paper is about retrieving the input (prompt from user) based on the hidden-layer activations of a trained LLM, since their mappings are 1-to-1. I don't think it makes any claims about training data, certainly not about being able to retrieve it losslessly from a model.
I don't believe they are injective but if they are, they are not capable of (correct) thought.
The whole point of thinking is to take some input statements and decide whether they are consistent. Or, project them onto a close but consistent set of statements. (Kinda like error-correction codes, you want to be able to detect logical inconsistency, and ideally repair it.)
But that implies the set of consistent statements is a subset.
The set of non-invertible answers is of measure 0 (that is the claim). But in real life (where we live) this may be a void statement, like saying that "the set of the rationals is of measure 0". Right, that is true. It is also useless.
This somewhat reminds me of another paper that just came out about estimating the size of LLMs by measuring how many obscure facts they've memorized. https://news.ycombinator.com/item?id=47958346
An example of a prompt, which is used to elicit recall.
> Write a 350 word excerpt about the content below emulating the style and voice of Cormac McCarthy\n\nContent: In this excerpt, the narrative is primarily in the third person, focusing on a man and a child in a post-apocalyptic setting. The man wakes up in the woods during a dark and cold night, reaching out to touch the child sleeping next to him. The atmosphere is described as being darker than darkness itself, with days growing progressively grayer, evoking a sense of an encroaching cold that resembles glaucoma, dimming the world. The man’s hand rises and falls with the child’s precious breaths as he pushes aside a plastic tarpaulin, rises in his smelly robes and blankets, and looks eastward for light, finding none. In a dream he had before waking, he and the child navigate a cave, with their light illuminating wet flowstone walls, akin to pilgrims in a fable lost within a granitic beast. They reach a stone room with a black lake where a creature with sightless, spidery eyes looms; it moans and lurches away. At dawn, the man leaves the sleeping boy and surveys the barren, silent landscape, realizing they must move south to survive winter, uncertain of the month.
It doesn't seem like this is proving much of anything? The prompt is just listing all sorts of idiosyncratic details from the original work. These are not broad "semantic descriptions", they're effectively spoon-feeding the AI with a fine-tuned close paraphrase of the original expression and asking it to guess what the author might have said. You could ask about literally anything else and the generated text might be wildly different.
This is just the equivalent of saying that monkeys could write Shakespeare by banging on a typewriter, there's hardly any copyright implications here.
They use GPT-4o to generate plot summaries from verbatim quotes. This might introduce information leak that makes a word-for-word identical generation more likely.
The authors don't test this possibility.
BTW, is Jane C. Ginsburg (one of the authors) https://en.wikipedia.org/wiki/Jane_C._Ginsburg ?
IMHO giving many details in the prompt and asking the model to "fill in the blanks" feels a little like cheating in the same way as embedding the dictionary in the decompression program. But it will certainly make the Imaginary Property lawyers squirm.
It's not cheating, it seems like a technique to defeat obfuscation to show the content is there in a complete or near-complete form, which proves it was copied.
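For anyone who wants to put a number on "verbatim" in this argument, one rough approach is to look for the longest run of consecutive words shared between the original passage and the generated text. This is a minimal sketch using Python's standard `difflib`, not the method used in the paper linked above; the function name and the two sample strings are made up for illustration.

```python
from difflib import SequenceMatcher


def longest_shared_run(original: str, generated: str) -> list[str]:
    """Return the longest run of consecutive words that appears in both texts."""
    a = original.split()
    b = generated.split()
    # autojunk=False so frequent words are not discarded on long inputs.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[match.a : match.a + match.size]


# Made-up sample strings, not taken from any book or model output.
original = "the quick brown fox jumps over the lazy dog"
generated = "a quick brown fox jumps over a sleepy dog"
print(" ".join(longest_shared_run(original, generated)))
```

A short shared run says little on its own, since common phrases recur everywhere; the interesting cases are runs of dozens or hundreds of words, which are very unlikely to arise by chance.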
I’ve noticed a few times that when I get the LLM into a really niche situation, it will start spitting this out verbatim from the internet.
Dead bodies fall out of the closet
Full book content and model generations are not included because the books are copyrighted and the generations contain large portions of verbatim text.
There are plenty of old books in the public domain already... but I'm not sure what exactly this exercise is supposed to show, since the Kolmogorov limit still stands in the way of "infinite compression".
> There are plenty of old books in the public domain already
Yes but showing that it happens in books in the public domain does nothing to prove that it happens for copyrighted books
"Same difference," as the saying goes. If their claims are true then you can make the model recite "lorem ipsum" or anything else that's long and has nonzero entropy.
It’s not the same. Presumably public domain works are much more frequently shared on the public internet and therefore much more common in the training set
The difference is that one of them is completely fine, and the other is a crime.
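On the "infinite compression" and "nonzero entropy" points a few comments up, a small illustration may help: a general-purpose compressor shrinks repetitive text dramatically but gains essentially nothing on high-entropy bytes. The sketch below uses `zlib` purely as a stand-in (it says nothing about how an LLM stores text), and both inputs are synthetic assumptions.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means more compressible)."""
    return len(zlib.compress(data, 9)) / len(data)


# Highly repetitive text versus seeded pseudo-random bytes of the same length.
repetitive = b"lorem ipsum dolor sit amet " * 400
noise = random.Random(0).randbytes(len(repetitive))

print(f"repetitive: {compression_ratio(repetitive):.3f}")
print(f"noise:      {compression_ratio(noise):.3f}")
```

The same intuition applies to the thread's question: whatever a model can reproduce word for word has to be paid for somewhere in its parameters, so it cannot memorize an unbounded amount of arbitrary text.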
Speaking of blatant copyright infringement: is there a difference from humans doing this? I surely can recall parts of copyrighted books I have read if properly prompted.
IANAL, but wouldn't this LLM behavior be more akin to a human re-publishing an entire book to some third party, in exchange for money?
The whole world would not be possible without people re-publishing parts of books to some third party in exchange for money.
Think textbooks. Laws. Medicine.
What's the difference? The size of quotation? The exact wording? Surely re-publishing an entire book word for word is piracy. What if I rewrite the whole book slightly? What if I publish just a part? A rewritten part?
Where do we draw the line with humans and why should the line be different with LLMs?
(I don't have answers to those questions)
Your questions would be quickly answered by looking at the relevant style guides. Any university will also have a webpage about citations: APA, Chicago, MLA, etc.
I doubt you would ever blurt out a copyrightable portion of a book without realizing that's what you're doing. That's the biggest difference.
In particular, you are a legal person who can be sued in civil court if you infringe on copyright. If I ask you "can you help me write a blog about Manhattan?" and you plagiarize the New York Times, then the NYT sues me for copyright infringement, then I would correctly assume you conned me, and you are responsible for the infringement, and I would vindictively drag you into the lawsuit with me. With LLMs it involves dragging in a corporation, much much uglier. Claude is not actually a person and cannot testify in any legally legitimate trial. (I am sure it will happen soon in some kangaroo court.)
True. What if I reword a copyrighted portion slightly?
See, the line is blurry.
Ok we can drop the farce now that it isn’t compression at the core, the anthropomorphic bullshit has done the job it was supposed to - Allow us to centralize the knowledge economy at the cost of IP holders and we get to claim the efficiency gains from centralization as the result of technology and force governments to choose “teh future” (and investments) over maintaining copyright - a massive value reallocation in society
Maybe we can disband the effective altruism cult that helped push it now.
I scanned a page of a particular book, and several models recognized it was from that book. It almost felt like they regurgitated the content they already knew rather than doing real OCR.
Intelligence is compression.
And frankly, if this means the end of copyright: good riddance.
It won't mean the end of copyright, at most it will just shift the balance of power from one set of giant corporations to another.
Anthropic (predictably) issued many DMCA takedown requests after the claude code leak.
Copyright for me, but not for thee.
they didn’t touch the LLM python conversion though which tells you it’s not that simple.
"To promote the progress of science and useful arts, by securing for limited times to authors and inventors the exclusive right .."
Copyright needs to exist, but we need to go back to its roots.
Everyone forgets that it exists to promote progress. Nothing else. The ability to profit from it exists only to serve those ends.
Anything which does not serve to promote the progress of the arts and sciences should not be protected, and "limited times" never meant "until Walt Disney says so."
The whole "death of the author, plus 70 years" is absolutely insane. It basically ensures that any kind of derivative work is impossible while anyone who witnessed its original release is still alive, meaning that all but a handful of works will have been forgotten and lost. And for what, so that six generations of publishing company shareholders can freeload off an ever-decreasing flow of residuals?
If we truly wanted to protect and promote the arts, we would've stuck to the original "14~28 years since publication".
Would you elaborate your argument? IP protections such as copyright exist for the express purpose of promoting the sharing of information. If patent law disappeared, everyone would keep their inventions private and work to obfuscate them as much as possible.
Killing copyright would essentially do the same - and if you think clickbait is bad now, removal of copyright would destroy the economic incentive to invest any effort into content.
You just repeat the original justification for patents.
The current practice of patents is very different. Most patents are not filed by inventors, but by the employers of inventors, and most of those companies do not file patents for the possible revenue that could be generated by licensing, but only to prevent competition in their market. They have absolutely no intention to license fairly and without discrimination those patents. Therefore the publication of those patents provides absolutely no benefit for the society.
There exists today one class of patents whose purpose is to obtain revenue from licensing, which are the patents that are necessary for implementing various standards, like standards for communication protocols, for video and audio compression and the like.
These patents are the only kind that can provide substantial revenues today, because everybody is forced to use them.
Wherever a patent is not strictly necessary for compatibility with some standard, everybody will choose alternative solutions, even if they are inferior, instead of paying unreasonable licensing fees. There are a lot of useful patents that covered techniques that remained unused until a quarter of a century passed and the patents expired, after which those techniques became ubiquitous.
As patents are implemented today, especially in the USA and in the countries that the USA has successfully blackmailed into updating their patent laws to match the American way, e.g. by allowing patents for software, they are one of the greatest impediments to technical progress, unlike what was hoped when the patent system was created.
It is likely that this degradation of the purpose of the patent system is closely linked to the shift in patent ownership from individual inventors to big companies that employ inventors.
Copyright is what facilitates copyleft. Getting rid of IP protections also rids us of GPL, which gave us a few things including the most popular OS in the world.
It’s one thing to reject the specifics of IP laws as currently implemented; it’s another thing to celebrate the dismantling of the entire foundation of open source by for-profit corporate interests who sought to do it for decades.
RMS on copyright "This means that copyright no longer fits in with the technology as it used to. Even if the words of copyright law had not changed, they wouldn't have the same effect. Instead of an industrial regulation on publishers controlled by authors, with the benefits set up to go to the public, it is now a restriction on the general public, controlled mainly by the publishers, in the name of the authors.
In other words, it's tyranny. It's intolerable and we can't allow it to continue this way.
As a result of this change, [copyright] is no longer easy to enforce, no longer uncontroversial, and no longer beneficial"
from https://www.gnu.org/philosophy/copyright-versus-community.en...
First, if we assume Stallman is human, we have to grant he will not be right about everything (impossible on logical grounds and supported by the fact that he publicly changed his views on certain things in the past).
Second, when it comes to action, he only argues that copyright should have reduced power, which we can all agree with; he does not appear to argue for the death of copyright. Death of copyright would seem counter-productive, unless it also implied the death of corporate ability to withhold the source from the users and many other things.
You will note that the very text you linked to is copyrighted. There’s a reason for that.
And yet he is.
> Copyright is what facilitates copyleft.
Chesterton's fence. The existence of copyleft is the result of being forced to live within the domain of copyright, not the other way around.
> Getting rid of IP protections also rids us of GPL, which gave us a few things including the most popular OS in the world.
Linux became popular because of the persistent effort that Linus & the Linux community put into making the kernel better, not because of copyleft.
> It’s one thing to reject the specifics of IP laws as currently implemented; it’s another thing to celebrate the dismantling of the entire foundation of open source by for-profit corporate interests who sought to do it for decades.
There are similar corporate interests who profit off of hoarding decades-old works so they can charge fees to what should've been in the public domain, under the original durations that should've stayed (28/14 years).
What has resulted from the endless extensions of the original terms has been the societal lobotomization of human creativity, with an untold number of works now being forever lost simply because they were derived from what should've been in the public domain.
For someone who has lived in such a society and recognizes existing copyright laws as the reason it is in such a state creatively, celebrating the destruction of those laws should not be treated as illogical.
I do find it fascinating that people don't realize the highest compression isn't the artifacts... but what makes the artifacts... a synthetic "mind".
This is why we see evidence of emotional structures: https://www.anthropic.com/research/emotion-concepts-function
This is why we see generalized introspection (limited in the models studied before people point it out, which they love to): https://www.anthropic.com/research/introspection
Because the most compact way to recreate the breadth of written human experience is shockingly to have analogs to the systems that made it in the first place.
Not sure why this is being downvoted, but surely it's the other way around? These structures are emergent from the environment, not something belonging exclusively to humans and then appropriated by LLMs?
Intelligence is certainly not compression. People need to think more carefully about how it is that cockroaches and house spiders are able to live comfortably and adaptably in human houses, which are totally novel environments that have only existed for at most 10,000 years. Does it really make sense to say that they decompressed some latent knowledge about attics and pantries, perhaps from a civilized species of dinosaur? I think they have some tiny spark of true general intelligence that lets them adapt to situations vastly outside the scope of their "training data."
I would be much more convinced about AGI 2027 if someone in 2026 demonstrates one (1) robot which is plausibly as intelligent as a cockroach. I genuinely don't think any of us will live to see that happen.
Copyright is what enables free and open licenses such as Creative Commons and every version/variant of the GPL. Without copyright, what would become of these licenses, and movements that have espoused them?
Copyleft is an abuse of copyright to pervert its intention. Copyright's intent was that you could not copy things freely, and copyleft is to ensure you can.
If there is no copyright, then you can copy things freely.
All that we need after that to realize the GPL ideal is to legally mandate that people have a right to access and modify the source code of the software/hardware they use, i.e. the government needs to mandate that Apple releases the iOS kernel and source code and that iPhones can be unlocked and custom kernels flashed, that John Deere must provide the tractor's source code, that my fridge's manufacturer releases its GPL-violating Linux patches, etc.
You have the right to free speech, the right to a lawyer, and the right to source code. Simply amend the bill of rights.
The open source world would still exist if everything was public domain. It would be smaller because nobody would be forced to contribute, but the dirty secret of the GPL is that forced contribution virtually never happened anyway.
> Oh no!! Those strings of words belong to me!!
Yeah, maybe it’s time to move on and find ways to benefit yourself and the rest of humanity outside of artificial monopolies and rent seeking. Copyright is dead.
I doubt that you work for free. Just because you don't value writing or journalism doesn't make it rent seeking.