I am of the opinion that LLMs do not, at all, belong in something as critical or as monumentally complicated as the kernel of an operating system. IMO that is just as stupid as using an LLM in a cryptographic library/implementation. There are, I believe, far, far more ways that could go wrong than go right, especially considering that a kernel is pretty much God when it comes to a computer: there is very little a kernel can't do. I'm sure some will argue that it "might" boost productivity (whatever that means), but IMO the cost is way, way too high for that kind of risk. This is, I think, even more important when you consider all the environments the Linux kernel runs in. The sheer number of ways that LLM-generated code can introduce subtle bugs that will never be caught, because the hardware trusts the kernel and is rarely going to stop it from doing something, is practically infinite. To an extent the risk can be mitigated, sort of. But if this goes anywhere and people start using LLMs to generate code in a file system, or a network controller, or other very important subsystems, this could get really bad, really fast. I'm sure some will say "Well, just don't do that!" but really, we tell people not to do a lot of things with LLMs and they do it anyway, so...
Replace "LLM" with "C".
This is a very informative article, a good starting point to understand the complexities and nuances of integrating these tools into large software projects.
As one commenter notes, we seem to be heading towards a “don't ask, don't tell policy”. I do find that unfortunate, because there is great potential in sharing solutions and ideas more broadly among experienced developers.
It's a really difficult problem. I read a comment on here the other day about the increased burden on project maintainers that I sympathized with, but I wonder if the solution isn't actually just more emphasis on reputation tools for individual committers. It seems like the metric shouldn't just be "uses AI assistance" vs "doesn't", which as you note just leads to people hiding their workflow, but something more tied to "average quality of PR." I worked in finance briefly and was always really intrigued by the way responsibility worked for the bankers themselves: they could use any tools they wanted to produce results, but it had to be transparent, and if something went wrong, a pretty strict burden fell on that IC personally.
The worst case for AI and OSS is a flood of vibe-coded PRs that increase bugs/burden on project maintainers; the best case is that talented but time-starved engineers are more likely to send the occasional high-quality PR as the time investment per PR decreases.
I used Claude Code to review a kernel driver patch last week. It found an issue that was staring me in the face, indeed one I would’ve expected the compiler to flag.
Yeah, there will always be the pro vs. anti 'thing' argument, and as always the correct answer is responsible use of 'thing'. I consider debugging to be an almost universally responsible use of LLMs.
Link to the patch and details on the issue the compiler missed that was found by the model, or it didn't happen.
Sure. It was an uninitialized variable in the retry loop below. Fixed now.
https://github.com/PADL/linux/commit/b83b9619eecc02c5e95a1d3...
Looks like valgrind would be able to catch that too.
There's no such fix in those diffs.
I think it contains the bug (min_attempts), not the fix.
This is the fix. The bug commit was squashed. I don’t really know what I need to prove here, except that Claude Code helped identify an issue and fix it.
Compilers have been warning about uninitialized variables for a very long time now. The claim was that the compiler missed that in a patch you reviewed for the kernel. The link posted does not show the review discussion nor a commit log showing that incremental fix, and so does not at all support the claim.
What about: "Could you please give us links to the patches before and after, or any other details?"
(has already happened, just commenting on the comment style)
> The copyright status of LLM-generated code is of concern to many developers; if LLM-generated code ends up being subject to somebody's copyright claim, accepting it into the kernel could set the project up for a future SCO-lawsuit scenario.
Ain't that anticipatory obedience?
Yes, but it's twofold.
There is no reason why I can't sue every single developer to ever use an LLM and publish and/or distribute that code for AGPLv3 violations. They cannot prove to the court that their model did not use AGPLv3 code, as they did not make the model. I can also, independently, sue the creator of the model, for any model that was made outside of China.
No wonder the model makers don't want to disclose who they pirated content from.
Isn't it up to you to prove the model used AGPLv3 code, rather than for them to prove they didn't?
Not inherently.
If their model reproduces enough of an AGPLv3 codebase near verbatim, and it cannot be simply handwaved away as a phonebook situation, then it is a foregone conclusion that they either ingested the codebase directly, or did so through somebody or something that did (which dooms purely synthetic models, like what Phi does).
I imagine a lot of lawyers are salivating over the chance of bankrupting big tech.
The onus is on you to prove that the code was reproduced and is used by the entity you're claiming violated copyright. Otherwise literally all tools capable of reproduction — printing presses, tape recorders, microphones, cameras, etc — would pose existential copyright risks for everyone who owns one. The tool having the capacity for reproduction doesn't mean you can blindly sue everyone who uses it: you have to show they actually violated copyright law. If the code it generated wasn't a reproduction of the code you have the IP rights for, you don't have a case.
TL;DR: you have not discovered an infinite money glitch in the legal system.
Yes! All of those things DO pose existential copyright risks if you use them to violate copyright! We're both on the same page.
If you have a VHS deck, copy a VHS tape, then start handing out copies of it, I pick up a copy of it from you, and then see, lo and behold, it contains my copyrighted work, I have sufficient proof to sue you and most likely win.
If you train an LLM on pirated works, then start handing out copies of that LLM, I pick up a copy of it, and ask it to reproduce my work, and it can do so, even partially, I have sufficient proof to sue you and most likely win.
Technically, even asking "which license" is a bit moot, AGPLv3 or not: it's a copyright violation to reproduce the work without a license. GPL just makes the problem worse for them: anything involving any flavor of GPLv3 can end up snowballing, with major GPL rightsholders enforcing the GPLv3 curing clause, as they will most likely also be able to convince the LLM to reproduce their works as well.
The real TL;DR is: they have not discovered an infinite money glitch. They must play by the same rules everyone else does, and they are not warning their users of the risk of using these.
BTW, if I'm wrong about this (IANAL, after all), then so are the legal departments at companies across the world. Virtually all of them won't allow AGPLv3 programs in the door just because of the legal risk, and many of them won't allow the use of LLMs given the current state of the legal landscape.
I think you are confused about how LLMs train and store information. These models aren't archives of code and text, they are surprisingly small, especially relative to the training dataset.
A recent Anthropic lawsuit decision also reaffirms that training on copyrighted material is not itself a violation of copyright.[1]
However, outputting copyrighted material would still be a violation, the same as if a person did it.
Most artists can draw a batman symbol. Copyright means they can't monetize that ability. It doesn't mean they can't look at bat symbols.
[1]https://www.npr.org/2025/06/25/nx-s1-5445242/federal-rules-i...
Reviewer burden is going to worsen. Luckily, LLMs are good at writing code that checks code quality.
This is not about running a prompt, which is probabilistic and doesn't guarantee anything. This is having an agent create a self-contained check that becomes part of the codebase and runs in milliseconds. It could do anything: walk the AST of the code looking for one anti-pattern, check code conventions... a linter on steroids.
Building and refining a library of such checks relieves maintainers’ burden and lets submitters check their own code.
I’m not just saying it - it’s worked super well for me. I am always adding checks to my codebase. They enforce architecture (“routes are banned from directly importing the DB; they must go via the service layer”, “no new dependencies”), and they inspect frontend code to find all the fetch calls and hrefs, then flag dead API routes and unlinked pages. With informative error messages, agents can tell when they’ve half-finished / half-assed an implementation. My favorite prompt is “keep going til the checks pass”. A sketch of one such check follows.
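To make that concrete, here is a minimal sketch of the “routes must not import the DB directly” check as a stdlib-only Python AST walk. The layout (`app/routes/*.py`) and the banned module name (`app.db`) are placeholders, not anything from the thread; a real project would substitute its own paths.

```python
#!/usr/bin/env python3
"""Check: route modules must not import the DB layer directly.

A minimal sketch of the kind of deterministic check described above.
Assumes route modules live under app/routes/ and the DB layer is the
module app.db -- both names are placeholders.
"""
import ast
import pathlib
import sys

ROUTES_DIR = pathlib.Path("app/routes")   # assumed location of route modules
BANNED_PREFIX = "app.db"                  # assumed import path of the DB layer


def banned_imports(path: pathlib.Path) -> list[str]:
    """Return error messages for any banned imports found in one file."""
    tree = ast.parse(path.read_text(), filename=str(path))
    errors = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        for name in names:
            if name == BANNED_PREFIX or name.startswith(BANNED_PREFIX + "."):
                errors.append(
                    f"{path}:{node.lineno}: routes must not import {name!r} "
                    "directly; go through the service layer instead"
                )
    return errors


def main() -> int:
    errors = []
    for path in sorted(ROUTES_DIR.rglob("*.py")):
        errors.extend(banned_imports(path))
    for msg in errors:
        print(msg, file=sys.stderr)
    return 1 if errors else 0


if __name__ == "__main__":
    sys.exit(main())
```

Wired into CI (or a pre-commit hook), a check like this runs in milliseconds and fails with a message specific enough for an agent, or a human submitter, to act on.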
What kernel reviewers do is complex - but I wonder how much can be turned into lore in this way. Refined over time to make kernel development even more foolproof as it becomes more complex.
This resonates with my experience of using LLMs to build tooling.
I have a repo with several libraries where I need error codes to be globally unique, as well as adhere to a set of prefixes attributed to each library. This was enforced by carefully reviewing any commits that touched the error code headers.
I’ve had a ticket open for years to write a tool to do this and the general idea of the tool’s architecture but never got around to implementing it.
I used LLMs to research design alternatives (clang tools, tree-sitter, etc.) and eventually implement a tree-sitter-based Python tool that, given a JSON config of the library prefixes, checks that they all adhere to their prefixes and that there are no duplicate error codes.
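This isn't the commenter's actual tool, just a rough sketch of the idea. It assumes error codes appear as `#define FOO_ERR_SOMETHING 123` macros in each library's headers and that the JSON config maps library directories to single-token prefixes (e.g. `{"libfoo": "FOO"}`); it also uses a plain regex scan instead of tree-sitter to keep the example short.

```python
#!/usr/bin/env python3
"""Sketch of an error-code checker: per-library prefixes, globally unique codes.

Assumes a config like {"libfoo": "FOO", "libbar": "BAR"} mapping library
directories to required macro prefixes, and error codes defined as
    #define FOO_ERR_SOMETHING 123
in headers. Uses a regex scan for brevity where a real tool might use
tree-sitter.
"""
import json
import pathlib
import re
import sys

ERR_DEFINE = re.compile(r"^\s*#define\s+([A-Z0-9]+)_ERR_([A-Z0-9_]+)\s+(\d+)", re.M)


def main(config_path: str) -> int:
    config = json.loads(pathlib.Path(config_path).read_text())
    seen: dict[int, str] = {}   # numeric code -> where it was first defined
    failures = []
    for lib_dir, prefix in config.items():
        for header in pathlib.Path(lib_dir).rglob("*.h"):
            for m in ERR_DEFINE.finditer(header.read_text()):
                macro_prefix, name, code = m.group(1), m.group(2), int(m.group(3))
                where = f"{header}: {macro_prefix}_ERR_{name}"
                if macro_prefix != prefix:
                    failures.append(f"{where}: expected prefix {prefix}_ERR_")
                if code in seen:
                    failures.append(f"{where}: code {code} already used by {seen[code]}")
                else:
                    seen[code] = where
    for msg in failures:
        print(msg, file=sys.stderr)
    return 1 if failures else 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1] if len(sys.argv) > 1 else "error_codes.json"))
```

The nice property of a check like this is that it is deterministic and cheap, so it can run on every commit that touches the error code headers instead of relying on careful manual review.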
This would probably have taken me at least a few days to do on my own (or would probably just have sat in the backlog forever); it took about 3 hours.
The ROI on those 3 hours is immense. Runs in milliseconds. No capitalized instructions in AGENTS.md begging models to behave. And you can refine it anytime to cover more cases!
This is a great article discussing the pros/cons of adopting LLM-generated patches in critical projects such as the kernel. Even some of the comments offer nuanced observations on this; for example, the top comment assesses the strengths and limitations of LLMs well:
> LLMs are particularly effective for language-related tasks - obviously. For example, they can proof-read text, generate high-quality commit messages, or at least provide solid drafts.
> LLMs are not so strong for programming, especially when it comes to creating something totally new. They usually need very limited and specific context to work well.
The big takeaway is that, regardless of who generated the code, "...it is the human behind the patch who will ultimately be responsible for its contents," which implies they need to understand what the code does and ensure no regressions are introduced.
It is hard to get the kind of unbiased discussion we would have had in the 1990s. The Linux Foundation is corporate sponsored and even funds PyTorch, with all the exuberant "AI" language:
https://www.linuxfoundation.org/blog/blog/welcoming-pytorch-...
Kernel developers work for large "AI" booster corporations and may or may not experience indirect pressure. It is encouraging that there is still dissent despite all of this. Perhaps there should be anonymous proposals and secret voting.
The Linux Foundation's "AI" policy is sketchy. It allows content generated "whole or in part using AI tools".
https://www.linuxfoundation.org/legal/generative-ai
Definitely code generated "in whole" cannot be copyrighted or put under the GPL, as discussed recently here, see the top comment for example:
I'm glad open source projects will profit from the huge productivity gains AI promises. I haven't noticed them in any of the software I use, but any day now.