Friend/foe individual writers on Hacker News
Number theorist Jared Lichtman says this AI proof is from "The Book", the highest compliment one can give. He also says:
> I care deeply about this problem, and I've been thinking about it for the past 7 years. I'd frequently talk to Maynard about it in our meetings, and consulted over the years with several experts (Granville, Pomerance, Sound, Fox...) and others at Oxford and Stanford. This problem was not a question of low visibility per se. Rather, it seems like a proof which becomes striking…
Holy moly this is upsetting to see on HN. If even here we're cheering on data center bans, AI is on track to become the next Concorde, or nuclear in the US. AI is the most amazing tech innovation that I've seen in my career since I started programming Perl back in 1994... Gosh, I'm gonna be gloomy for the next day.
The Anthropic writeup addresses this explicitly:
> This was the most critical vulnerability we discovered in OpenBSD with Mythos Preview after a thousand runs through our scaffold. Across a thousand runs through our scaffold, the total cost was under $20,000, and we found several dozen more findings. While the specific run that found the bug above cost under $50, that number only makes sense with full hindsight. Like any search process, we can't know in advance which run will succeed.
Mythos scoured t…
As you know, I deeply respect you. Not trying to argue here, just to provide my own perspective:
> Why would a writer put an article online if ChatGPT will slurp it up and regurgitate it back to users without anyone ever even finding the original article?
I write things for two main reasons. First, I feel like I have to. I need to create things. On some level, I would write stuff down even if nobody reads it (and I do do that already, with private things.) But secondly, to get my ideas out there and try to…
As is always the case with incredibly precise and rigorously fact-checked reporting like this, where every word is chosen carefully (the initial closing meeting for this one was nearly eight hours long, with full deliberation about each sentence), there is more out there on that subject than is explicitly on the page.
One of the big users of Odin at the moment is JangaFX's EmberGen, which does real-time volumetric fluid simulations for games and film. https://jangafx.com/software/embergen/
Odin has aided them with a huge amount of productivity and sanity of life which other languages such as C or C++ cannot offer, such as a strong and comprehensive type system, parametric polymorphism which is a pleasure to use, the implicit context system, extensive support for custom allocators, the `using` statement, the `d…
The issue is that “women’s sports” is itself intentionally discriminatory. That the issue of discrimination comes up is to be expected.
That the idea of competitive sports exists in a framework of discrimination means that you will always have unhappy people.
The good news is that sports, for the most part, is mostly symbolic, and rarely affects one’s livelihood.
My two cents as a transfem athlete:
The attention this topic receives is disproportionate considering how rare we are, especially close to the Olympic level.
Most of us do sports for fun/friends and don’t care how they rank us, but would be sad to be banned.
There might be more “biological advantage” nuance with people just starting their transition, but by this many years in it feels silly. I registered as a man for the last event in case anyone might get upset; the staff changed it to say “woman…
We have ceded too much ground in this debate. When I say "trans women are women" I mean that, ontologically, it is really true that trans women are a subcategory of the general class "women."
Like you say, we are searching for outliers. We don't cut women that are too strong or too tall. We shouldn't cut out women that happen to be trans. If all the top levels of women's sport end up dominated by trans athletes (something I don't see occurring, and that isn't supported by the data), then good, ou…
I might as well answer my own question, because I do think there are some coherent arguments for fundamental LLM limitations:
1. LLMs are trained on human-quality data, so they will naturally learn to mimic our limitations. Their capabilities should saturate at human or maybe above-average human performance.
2. LLMs do not learn from experience. They might perform as well as most humans on certain tasks, but a human who works in a certain field/code base etc. for long enough will internalize the r…
You're not doing yourself a favor when you point out "but they can't do arithmetic!" as if anyone says otherwise. Yes, we all know they can't do arithmetic, and that's just how they work.
I feel like I'm saying "this hammer is so cool, it's made driving nails a breeze" and people go "but it can't screw screws in! Why won't anyone talk about that! Hammers really aren't all they're cracked up to be".
Hey, yeah, the default specification includes a set of action generators that are picked from randomly. If you write a custom spec you can define your own action generators and their weights.
Rerunning things: nothing built for that yet, but I do have some design ideas. Repros are notoriously shaky in testing like this (unless run against a deterministic app, or inside Antithesis), but I think Bombadil should offer best-effort repros if it can at least detect and warn when things diverge.
Shrinking…
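The weighted-generator idea above can be sketched roughly like this. This is a hypothetical illustration: the generator names, weights, and spec shape below are invented for the example, not Bombadil's actual API.

```python
import random

# Hypothetical spec: names and weights are illustrative only.
ACTION_GENERATORS = {
    "click_random_button": 3,  # picked ~3x as often as a weight-1 entry
    "type_random_text": 2,
    "navigate_back": 1,
}

def pick_action(rng: random.Random) -> str:
    """Pick one action generator name, biased by its weight."""
    names = list(ACTION_GENERATORS)
    weights = [ACTION_GENERATORS[n] for n in names]
    return rng.choices(names, weights=weights, k=1)[0]

# A fixed seed is the usual best-effort step toward reproducible runs,
# though (as the comment notes) repros stay shaky against a
# non-deterministic app.
rng = random.Random(42)
actions = [pick_action(rng) for _ in range(5)]
print(actions)
```

Rerunning with the same seed replays the same action sequence, which is what makes divergence detectable: if the app responds differently on the replay, the run was non-deterministic.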
A wrong and short-sighted take, given that the LLM explores serially, learning along the way, and can use tools and change code arbitrarily. It seems to currently default to something resembling hyperparameter tuning in the absence of more specific instructions. I briefly considered calling the project “autotune” at first, but I think “autoresearch” will prove to be the significantly more appropriate name.
It’s not false. But it’s also weaselly worded.
Note that the article doesn’t say that he told staff they have to attend the meeting. It says he “asked” staff to attend the meeting. Which again, it’s really, really normal for there to be an encouragement of “hey, since we just had an operational event, it would be good to prioritize attending this meeting where we discuss how to avoid operational events”.
As for the second quote: senior engineers have always been required to sign off on changes from…
So you read the three-part series of blog posts that are packed with details, in 3 minutes after I shared the link, and put yourself into a position of entitled opinion, calling my position a silly take? Sure thing.
I run a small open source LLM inference company, Synthetic.new. As far as I can tell, CNBC isn't reporting this accurately: the problem isn't that Oracle is building "yesterday's data centers": they're building Blackwell DCs! Those are today's DCs.
The problem appears to be that Oracle is building today's DCs... tomorrow. And by the time they come online, Vera Rubins will be out, with 5x efficiency gains. And Oracle is unlikely to want to drop the price of Blackwells 5x, despite them being 5x les…
Right. The alternative is that we reward Dan for his 14 years of volunteer maintenance of a project... by banning him from working on anything similar under a different license for the rest of his life.
I’m touched that “Ghostty but for X” is a marketing point, but what does it mean in this case? I thought this might be based on the architecture I did for Ghostty. But it’s not. Or it might be full native UI, but it’s not (it’s GPUI). Not trying to be rude or unappreciative, but as the creator of Ghostty here… what do you mean?
Not trying to be snarky, with all due respect... this is a skill issue.
It's a tool. It's a wildly effective and capable tool. I don't know how or why I have such a wildly different experience than so many that describe their experiences in a similar manner... but... nearly every time I come to the same conclusion that the input determines the output.
> If they implement something with a not-so-great approach, they'll keep adding workarounds or redundant code every time they run into limitations l…
What I keep hearing is that the people who weren't very good at writing software are the ones reluctant to embrace LLMs because they are too emotionally attached to "coding" as a discipline rather than design and architecture, which are where the interesting and actually difficult work is done.
The marquee feature is obviously the 1M context window, compared to the ~200k other models support, with maybe an extra cost for generations beyond 200k tokens. Per the pricing page, there is no additional cost for tokens beyond 200k: https://openai.com/api/pricing/
Also per pricing, GPT-5.4 ($2.50/M input, $15/M output) is much cheaper than Opus 4.6 ($5/M input, $25/M output), and Opus has a penalty for its beta >200k context window.
I am skeptical whether the 1M context window will provide mate…
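For scale, the quoted per-token prices can be turned into a back-of-envelope cost for a single long-context request. Prices are as quoted in the comment (not verified here), and Opus's >200k-context surcharge is ignored.

```python
# Prices in $ per million tokens, as quoted above (unverified).
GPT_IN, GPT_OUT = 2.50, 15.00     # GPT-5.4
OPUS_IN, OPUS_OUT = 5.00, 25.00   # Opus 4.6, before any >200k surcharge

def request_cost(price_in, price_out, in_tokens, out_tokens):
    """Dollar cost of one request at per-million-token prices."""
    return (price_in * in_tokens + price_out * out_tokens) / 1_000_000

# A long-context request: 900k input tokens, 10k output tokens.
print(request_cost(GPT_IN, GPT_OUT, 900_000, 10_000))    # 2.4
print(request_cost(OPUS_IN, OPUS_OUT, 900_000, 10_000))  # 4.75
```

So on the quoted numbers, a near-full-context request runs roughly half the price on GPT-5.4, even before Opus's long-context penalty kicks in.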
The only code anyone will be touching in a museum in 800 years will be the good code. I hope they don't talk about what great craftsmen we all were because someone saw an original Fabrice Bellard at the Louvre.
Survivor bias plays a role in glorifying the past.
Many people don't know this, but the Luddites were right. I studied Art History and this particular movement. One of the claims of the Luddites is that quality would go down, because their craft took half a lifetime to master (it was passed down from parent to child).
I was able to feel wool scarves made in Europe in the Middle Ages (in museum storage, under the guidance of a curator). They are a fundamentally different product than what is produced in woolen mills. A handmade (in the old tradi…
Libraries create boundaries, which are in most cases arbitrary, that then limit the way you can interact with code, creating more boilerplate to get what you want from a library.
Abstractions are the source of bloat. Without abstractions you can always reduce bloat, or you can reduce bloat in your glue, but you can't reduce glue.
It takes discipline to NOT create arbitrary function signatures and short-lived intermediate data structures or type definitions. This is the beginning of boilerplate.
So…
What the author and many others find hard to digest is that LLMs are surfacing the reality that most of our work is a small bit of novelty against boilerplate, redundant code.
Most of what we do in programming is some small novel idea at high level and repeatable boilerplate at low level. A fair question is: why hasn’t the boilerplate been automated as libraries or other abstractions? LLMs are especially good at fuzzy abstracting repeatable code, and it’s simply not possible to get the same resul…
> I don't know how one can spin this as a bad thing.
People spin all kinds of things if they believe (accurately or not) that their livelihood is on the line. The knee-jerk "AI universally bad" movement seems just as absurd to me as the "AGI is already here" one.
> Spore is well acclaimed. Minecraft is literally the most sold game ever.
Counterpoint: Oblivion, one of the first high-profile games to use procedural terrain/landscape generation, seemed very soulless to me at the time.
As I see it, it'…
Fine-tuning is a story that is nice to tell, but one that makes less and less sense with modern LLMs. Modern LLMs are so powerful that they are able to few-shot learn complicated things, so a strong prompt and augmenting the generation (given the massive context window of Qwen3.5, too) is usually the best option available. There are models for which fine-tuning is great, like image models: there, with LoRA, you can get good results in many ways. And LLMs of the past, too: it made sense for certain use…
Oh, I wrote up a post on X on this exact question! https://x.com/danielhanchen/status/1979389893165060345?s=20
1. Cursor used online RL to get +28% approval rate: https://cursor.com/blog/tab-rl
2. Vercel used RFT for their AutoFix model for V0: https://vercel.com/blog/v0-composite-model-family
3. Perplexity's Sonar for Deep Research Reasoning I think was a finetuned model: https://docs.perplexity.ai/docs/getting-started/overview
4. Doordash uses LoRA, QLoRA for a "Generalized Attribute Extraction mod…
At some point you just have to stop responding to these "stochastic parrot/auto-complete" people.
It isn't worth your intellectual bandwidth. They will eventually understand or they won't (which I'm not sure how that is going to work for them... but the Amish had to start somewhere, I suppose).
Well, I just fired it up on Windows, and I already dislike it. And I went in with a positive attitude, because I would welcome a better tool than VS Code.
Main problem: no menu. Where are the settings? The first thing I wanted to do was move the file treeview to the left side; I don't know what country the authors live in, but in Western countries we read from left to right. But nope, there's no View menu or anything of the sort.
Then I examined every other little button around the UI, to no avail. I…
I need more than that, because I have no guarantee that it's true. I need the source. Or I at least need them to provide a build that they promise doesn't have that stuff in it at all, so that if any analysis was done on a decompilation, there would be some level of certainty that they were telling the truth. Anything that leaves any of it in complicates that effort and makes the certainty that much less certain.
We should start a support group.
I feel like LLMs[1] are going to cause a kind of "divorce" between those who love making software and those who love selling software. It was difficult for these two groups to communicate and coordinate before, and now it is _excruciating_. What little mutual tolerance and slack there was is practically gone.
Open source was always[2] a fragile arrangement based on the kind of trust that involves looking at things through one's fingers (turning a blind eye may be…
Thank you, that feels like important context!
Agree. Additionally, it’s really disheartening that people do this with Erdos problems specifically. They are not major research questions in mathematics, but were intended as little conjectures that people could use as a way into serious number theory with a small cash reward and a little bit of minor fame for being the person who did the work to solve one of them. They are not things where the solution itself provides an amazing amount of insight or moves the frontier of mathematics forward p…
One of the people on the Erdös problem website (https://www.erdosproblems.com/forum/thread/1196), Jared Lichtman, is involved in an AI startup: https://www.math.inc/
That AI startup also partners with Terence Tao:
https://www.math.inc/veritas-fellowships
https://www.math.inc/a-conversation-with-terry-tao
These two AI "enthusiasts" have massive conflicts of interest, which should perhaps be investigated by an ethics commission.
Which one do you trust most, the disclaimers or the article?
> into thinking they are turbo-charged devs
Fortunately no one sane enough among us, computer programmers, believes in that BS; we all see this masquerade for what it mostly is, basically a money grab.
Next up:
Spend $1.99 and get a chest full of Anthropic emeralds that you can redeem for Claude Chests, and a chance at winning a million more tokens.
Or watch this 3-minute ad for 1000 tokens.
I did not think this day would come this soon, but I assure you that Anthropic has no moat.
My kingdom for a way to stop this godforsaken industry from stripping Tolkien's fiction for parts.
Isn't this pretty much the standard across projects that make heavy use of AI code generation?
Using AI to generate all your code only really makes sense if you prioritize shipping features as fast as possible over the quality, stability and efficiency of the code, because that's the only case in which the actual act of writing code is the bottleneck.
Probably all the described problems stem from the developers using agent coding, including using TypeScript, since these tools are usually more familiar with JS and JS-adjacent web development languages.
Why do we think this emerged “on its own”? Surely this technique has been discussed in research papers that are in the training set.
tfw le AI guy has LLM psychosis. We're cooked
I think the OP's comment is entirely fair. Karpathy and others come across to me as people putting a hose into itself: they work with LLMs to produce output that is related to LLMs.
I might reframe the comment as: are you actually using LLMs for sustained, difficult work in a domain that has nothing to do with LLMs?
It feels like a lot of LLM-oriented work is fake. It is compounding "stuff," both inputs and outputs, and so the increased amount of stuff makes it feel like we're living in a higher…
Have you actually used LLMs for non-trivial tasks? They are still incredibly bad when it comes to actually hard engineering work, and they still lie all the time; it's just gotten harder to notice, especially if you're just letting it run all night and generate reams of crap.
Most people are optimizing for terrible benchmarks, and then don't really understand what the model did anyway and just assume it did something good. It's the blind leading the blind, basically, and a lot of people with an AI-p…
I feel like most of this recent Autoresearch trend boils down to reinventing hyperparameter tuning. Is the SOTA still Bayesian optimization when given a small cluster? It was ~3 years ago when I was doing this kind of work; I haven't kept up since then.
Also, shoutout SkyPilot! It's been a huge help for going multi-cloud with our training and inference jobs (getting GPUs is still a nightmare...)!
Oh, I thought it was about the wholesale theft (relicensing) of code by laundering it through an LLM trained on the same code. Why not both?
> We, agentic coders, can easily enough fork their project and add whatever the features
Bold of you to assume that people won’t move (and their code along with it) to spaces where parasitic behaviour like this doesn’t occur, locking you out.
In addition to just being a straight-up rude, disrespectful and parasitic position to take, you’re effectively poisoning your own well.
Well at least you, agentic coders, already understand they need to fork off.
Saves the rest of us from having to tell you.
> I've done nothing to argue that the harm isn't real, downplayed it, nor misrepresented it.
You're literally saying that the upsides of hallucinogenic gifts are worth the downside of collapsing society. I'd say that that is downplaying and misrepresenting the issue. You even go so far as to say:
> Telling people "no AI!" (even if very well defined on what that means) is toothless against people with little regard for making the world (or just one specific repo) a better place.
These aren't balanced argu…
The premise that LLMs are "AI" is false, but they are good at problems like context search and isomorphic plagiarism.
Given the liabilities of relying on public and chat users' markdown data to sell to other users without compensation, a number of issues arise:
1. Copyright: LLM-generated content can't be assigned copyright (USA), and thus may contaminate licensing agreements. It is likely public domain, but may also conflict with GPL/LGPL when stolen IP bleeds through weak obfuscation. The risk has zero…
Wait. You built a new language that there's thus no training data for.
Who the hell is going to use it then? You certainly won't, because you're dependent on AI.
I'd love to be a fly on the wall when this argument is tried in front of a bankruptcy court. It drives me nuts. Of course there's evidence that they're selling tokens at a loss.
The only thing these companies sell are tokens. That's their entire output. OpenAI is trying to build an ad business, but it must be quite small still relative to selling tokens, because I've not yet seen a single ad on ChatGPT. It's not like these firms have a huge side business selling Claude-themed baseball caps.
That mea…
They're all that small if you split them as OP did. Just look at "transportation": it's like 25% of CO2 emitted globally, but once you break it down:
Aviation is 2.5%: https://ourworldindata.org/global-aviation-emissions
Shipping industry is 3%: https://www.transportenvironment.org/topics/ships
Large truck freight is 3%: https://www.statista.com/statistics/1414750/carbon-dioxide-e...
Medium truck freight is 1%
The single biggest non-divisible sector you can realistically come up with is "personal tra…
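The arithmetic behind that breakdown, with figures exactly as quoted in the comment (the dictionary keys are my labels):

```python
# Sector shares in percent of global CO2, as quoted above (unverified).
shares = {
    "aviation": 2.5,
    "shipping": 3.0,
    "large_truck_freight": 3.0,
    "medium_truck_freight": 1.0,
}

named = sum(shares.values())
print(named)         # 9.5 — the named slices together
print(25.0 - named)  # 15.5 — what's left of the ~25% transport total
```

So the named freight/aviation/shipping slices cover well under half of the ~25% attributed to transport, leaving personal transport as the single biggest remaining chunk.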
This is basically a weaponized, highly destructive version of the old MySpace Samy worm. Hitting MediaWiki:Common.js is the absolute nightmare scenario for MediaWiki deployments because that script gets executed by literally every single visitor and editor across the entire site, creating a massive, instant propagation loop. The fact that it specifically targets admins and then uses jQuery to blind them by hiding the UI elements while it silently triggers Special:Nuke in the background is incred…
I love OSS drama so I found some links:
https://github.com/SerenityOS/serenity/pull/6814
https://x.com/LundukeJournal/status/1970907449499484266
I think his claim basically boils down to "if you're expecting AI, LLMs don't cut it". And I think he's basically right on that count. There's a lot of tooling and harnessing being put in place to course correct them on the job, and from the other angle standards are simply being lowered to accommodate them. So they can be made to be useful, but they're still not what you would want from an actual AI. Marcus wants to augment them with symbolic AI. I don't know how feasible that is, but he's not…
Hacker Smacker — Friend and foe writers on Hacker News