  • I could definitely write it, but probably not as fast, even with fighting it. The report I got in 25-30 minutes would normally take me closer to 45-60, what with having to research what to analyze, figure out how to parse the different formats of logs, break them up and collate them, and produce a pretty output. A rough sketch of that kind of collation is below.
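
    For a concrete sense of the kind of work meant here, a minimal sketch of multi-format log parsing and collation in Python. The formats, field names, and sample lines are hypothetical stand-ins, not the actual logs or report:

    ```python
    import re
    from datetime import datetime

    # Two hypothetical log formats standing in for the mixed inputs described above.
    FORMAT_A = re.compile(r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(?P<level>\w+)\] (?P<msg>.*)")
    FORMAT_B = re.compile(r"(?P<ts>\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2}) (?P<msg>.*)")

    def parse(line):
        # Try each known format in turn; return (timestamp, message) or None.
        m = FORMAT_A.match(line)
        if m:
            return datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S"), m["msg"]
        m = FORMAT_B.match(line)
        if m:
            return datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S"), m["msg"]
        return None

    def collate(lines):
        # Drop unparseable lines and merge the rest into one sorted timeline.
        events = [e for e in (parse(l) for l in lines) if e]
        return sorted(events, key=lambda e: e[0])

    if __name__ == "__main__":
        sample = [
            "2025-08-12 14:03:05 [ERROR] disk full",
            "12/Aug/2025:14:03:07 GET /health 200",
        ]
        for ts, msg in collate(sample):
            print(ts.isoformat(), msg)
    ```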

  • Software developer, here. (No, not a "vibe coder." I actually know how to read and write my own code and what it does.)

    Just had the opportunity to test GPT 5 as a coding assistant in Copilot for VS Code, which in my opinion is the only legitimately useful purpose for LLMs. (No, not to write everything for me, just to do some of the more tedious tasks faster.) The IDE itself can help keep them in line, because it detects when they screw up. Which is all the time, due to their nature. Even recent and relatively "good" models like Sonnet need constant babysitting.

    GPT 5 failed spectacularly. So badly, in fact, that I'm glad I only set it to analysis tasks and not to any write tasks. I will not be using it for anything else any time soon.

    Even when it gets it right, you have to then check it carefully. It feels like a net loss of speed most of the time. Reading and checking someone else's code is harder than writing your own.

  • Even when it gets it right, you have to then check it carefully. It feels like a net loss of speed most of the time. Reading and checking someone else's code is harder than writing your own.

    Have to agree on that. There's the variation: it's faster if you take its code verbatim, run it, and debug where there are obvious problems... but then you are vulnerable to unobvious problems, where a hacky way of doing it is weak to certain edge cases... and there's no real way to catch those.

    Reading its code, understanding it, and finding the problems at their core sounds as time-consuming as writing the code yourself.

  • Even when it gets it right, you have to then check it carefully. It feels like a net loss of speed most of the time. Reading and checking someone else's code is harder than writing your own.

    On the code completion side, I think it can do like 2 or 3 lines in particular scenarios. You have to have an instinct for "are the next three lines so blatantly obvious that it's actually worth reading the suggestion, or do I just ignore it because I know it's going to screw up without even looking?"

    Very, very, very rarely do I find prompt-driven coding to be useful: only for things that are very boilerplate but also very tedious. Like "let the user specify these three parameters in this CLI utility", and poof, you get reasonable argv handling pretty reliably. A minimal sketch of that kind of boilerplate follows below.
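
    For illustration, a minimal sketch of that three-parameter argv handling in Python. The utility and its parameter names are made up, not from the original post:

    ```python
    import argparse

    def main():
        # Hypothetical CLI with three parameters, the kind of argv
        # boilerplate an LLM tends to produce reliably.
        parser = argparse.ArgumentParser(description="Example utility")
        parser.add_argument("--input", required=True, help="path to the input file")
        parser.add_argument("--format", choices=["json", "csv"], default="json",
                            help="output format")
        parser.add_argument("--verbose", action="store_true",
                            help="enable verbose output")
        args = parser.parse_args()
        print(args.input, args.format, args.verbose)

    if __name__ == "__main__":
        main()
    ```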

    Rule of thumb: if a viable answer could be expected during an interview from a random junior applicant, it's worth giving the LLM a shot. If it's something that a junior developer could only get right after learning on the job a bit, then forget it; the LLM will be useless.

  • Despite the “official” coding score for GPT 5 being higher, Claude Sonnet still seems to blow it out of the water. That seems to suggest they are training to the test, and that the test must not be a very good test. Or they are lying.

    Problem with the "benchmarks" is Goodhart's Law: once a measure becomes a target, it ceases to be a good measure.

    The AI companies' obsession with these tests causes them to maniacally train on them, making them better at those tests, but that doesn't necessarily map to actual real-world usefulness. Occasionally you'll see a guy who interviews well but is pretty useless on the job. LLMs are basically that guy all the time, but at least they're useful because they're cheap and fast enough to be worth it for the super easy bits.