
Apple just proved AI "reasoning" models like Claude, DeepSeek-R1, and o3-mini don't actually reason at all. They just memorize patterns really well.

Technology
  • They aren't bullshitting because the training data is based on reality. Reality bleeds through the training data into the model. The model is a reflection of reality.

    An approximation of a very small, limited subset of reality, with more than a 1-in-20 error rate, that produces massive amounts of tokens in quick succession, is a shit representation of reality, inferior in every way to human accounts, to the point of being unusable in the industries where it's promoted.

    And that error rate can only spike when the training data itself contains errors, which it increasingly will as models sample their own output.

  • I did, and it was because it didn't have the previous context. But it did find the fallacies that were present. Logic is literally what a chat AI is doing. A human still needs to review the output, but it did what it was asked. I don't know AI programming well, but I can say that logic is algorithmic. An AI has no problem parsing an argument and finding the fallacies. It's a tool like any other.

    That was a roundabout way of admitting you have no idea what logic is or how LLMs work. Logic works with propositions regardless of their literal meaning, LLMs operate with textual tokens irrespective of their formal logical relations. The chatbot doesn't actually do the logical operations behind the scenes, it only produces the text output that looks like the operations were done (because it was trained on a lot of existing text that reflects logical operations in its content).
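The formal point here can be illustrated with a toy example. A minimal sketch (my own illustrative code, not any library's API): a rule like modus ponens operates on propositions as opaque symbols, so its validity never depends on what the symbols mean, which is exactly the property next-token prediction over text does not give you.

```python
def modus_ponens(accepted, implication):
    # Given a set of accepted propositions and an implication (p, q)
    # meaning "p implies q", derive q when p is accepted.
    # The propositions are opaque symbols: the inference is valid
    # no matter what "p" and "q" actually say about the world.
    p, q = implication
    return q if p in accepted else None

# Valid whether the atoms are about Socrates or about sockets:
assert modus_ponens({"it_rains"}, ("it_rains", "ground_wet")) == "ground_wet"
assert modus_ponens(set(), ("it_rains", "ground_wet")) is None
```

The checker never inspects the content of the strings; a chatbot, by contrast, conditions on exactly that content and nothing else.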

  • Yeah, I've always said the flaw in Turing's Imitation Game concept is that if an AI were indistinguishable from a human, it wouldn't prove it's intelligent, because humans are dumb as shit. Dumb enough to force one of the smartest people in the world to take a ton of drugs that eventually killed him, simply because he was gay.

    Yeah, we're so stupid we've figured out advanced maths and physics and built incredible skyscrapers and the LHC. As individuals we may be more or less intelligent, but humans as a whole are incredibly intelligent.

  • That entire paragraph is much better at supporting the precise opposite argument. Computers can beat Kasparov at chess, but they're clearly not thinking when making a move - even if we use the most open biological definitions for thinking.

    By that metric, you could argue Kasparov isn't thinking during chess either. A lot of human chess "thinking" is recalling memorized openings, evaluating positions many moves deep, and other tasks that map to what a chess engine does. Of course Kasparov is thinking, but then you have to conclude that the AI is thinking too. Thinking isn't a magic process, nor is it as tightly coupled to human-like brain processes as we like to think.

  • Apple is significantly behind and arrived late to the whole AI hype, so of course it's in their absolute best interest to keep showing how LLMs aren't special or amazingly revolutionary.

    They're not wrong, but the motivation is also pretty clear.

    Apple always arrives late to any new tech; that doesn't mean they haven't been working on it behind the scenes for just as long, though...

  • Yeah, I've always said the flaw in Turing's Imitation Game concept is that if an AI were indistinguishable from a human, it wouldn't prove it's intelligent, because humans are dumb as shit. Dumb enough to force one of the smartest people in the world to take a ton of drugs that eventually killed him, simply because he was gay.

    I think that person had to choose between the drugs and the hardcore prisons of 1950s England, where being a bit odd was enough to guarantee "an incredibly difficult time", as they say in England. I would've chosen the drugs as well, hoping they would fix me. Too bad that without testosterone you're going to be suicidal and depressed; I'd rather choose to keep my hair than be horny all the time.

  • LOOK MAA I AM ON FRONT PAGE

    Fucking obviously. Until Data's positronic brain becomes reality, AI is not actual intelligence.

    AI is not A I. I should make that a tshirt.

  • That was a roundabout way of admitting you have no idea what logic is or how LLMs work. Logic works with propositions regardless of their literal meaning, LLMs operate with textual tokens irrespective of their formal logical relations. The chatbot doesn't actually do the logical operations behind the scenes, it only produces the text output that looks like the operations were done (because it was trained on a lot of existing text that reflects logical operations in its content).

    This is why I said I wasn't sure how AI works behind the scenes. But I do know that logic isn't difficult. Just so we're not fucking around with each other: I have a CS background. I'm only saying this because I think you may have one as well, and we can save some time.

    It makes sense to me that logic is something AI can parse easily. Logic, in my mind, is very easy if it can tokenize some text. Wouldn't the difficulty be whether the AI has the right context?

  • TBH idk how people can convince themselves otherwise.

    They don't convince themselves. They're convinced by the multi-billion-dollar corporations pouring unholy amounts of money into not only the development of AI, but its marketing: marketing designed to convince them not only that AI is something it's not, but also that anyone who says otherwise (like you) is just a luddite who is going to be "left behind".

    LLMs are also very good at convincing their users that they know what they are saying.

    It's what they're really selected for: looking accurate sells better than being accurate.

    I wouldn't be surprised if many of the people selling LLMs as AI have drunk their own kool-aid (of course most just care about the line going up, but still).

  • Right now the hype from most is finding issues with chatgpt

    hype noun (1)

    publicity

    especially : promotional publicity of an extravagant or contrived kind

    You're abusing the meaning of "hype" in order to make the two sides appear the same, because you do understand that "hype" really describes the pro-AI discourse much better.

    It did find the fallacies based on what it was asked to do.

    It didn't. Put the text of your comment back into GPT and tell it to argue why the fallacies are misidentified.

    You act like this is fire and forget.

    But you did fire and forget it. I don't even think you read the output yourself.

    First I wanted to be honest with the output and not modify it.

    Or maybe you were just lazy?

    Personally I'm starting to find these copy-pasted AI responses to be insulting. It has the "let me Google that for you" sort of smugness around it. I can put in the text in ChatGPT myself and get the same shitty output, you know. If you can't be bothered to improve it, then there's absolutely no point in pasting it.

    Given what this output gave me, I can easily keep working this to get better and better arguments.

    That doesn't sound terribly efficient. Polishing a turd, as they say. These great successes of AI are never actually visible or demonstrated, they're always put off - the tech isn't quite there yet, but it's just around the corner, just you wait, just one more round of asking the AI to elaborate, just one more round of polishing the turd, just a bit more faith on the unbelievers' part...

    I just feel like you can’t honestly tell me that within 10 seconds having that summary is not beneficial.

    Oh sure I can tell you that, assuming that your argumentative goals are remotely honest and you're not just posting stupid AI-generated criticism to waste my time. You didn't even notice one banal way in which AI misinterpreted my comment (I didn't say SMBC is bad), and you'd probably just accept that misreading in your own supposed rewrite of the text. Misleading summaries that you have to spend additional time and effort double checking for these subtle or not so subtle failures are NOT beneficial.

    Ok, let's run a test here. Let's start with understanding logic. Give me a paragraph and let's see if it can find any logical fallacies. You can provide the paragraph. The only constraint is that the context has to exist within the paragraph.

  • I think because it's language.

    There's a famous anecdote about Charles Babbage presenting his difference engine (a gear-based calculator): someone asked, "If you put in the wrong figures, will the correct ones be output?", and Babbage couldn't understand how anyone could so thoroughly misunderstand that the machine is just a machine.

    People are people; the main things that have changed since the cuneiform copper customer complaint are our materials science and our networking ability. Most people just assume the things they interact with every day work the way they appear to on the surface.

    And nothing other than a person can do math problems or talk back to you, so people assume that means intelligence.

    "if you put in the wrong figures, will the correct ones be output"

    To be fair, an 1840 “computer” might be able to tell there was something wrong with the figures and ask about it or even correct them herself.

    Babbage was being a bit obtuse there; people weren't familiar with computing machines yet. Computer was a job, and computers were expected to be fairly intelligent.

    In fact I'd say that if anything this question shows that the questioner understood enough about the new machine to realise it was not the same as they understood a computer to be, and lacked many of their abilities, and was just looking for Babbage to confirm their suspicions.

  • While both Markov models and LLMs forget information outside their window, that’s where the similarity ends. A Markov model relies on fixed transition probabilities and treats the past as a chain of discrete states. An LLM evaluates every token in relation to every other using learned, high-dimensional attention patterns that shift dynamically based on meaning, position, and structure.

    Changing one word in the input can shift the model’s output dramatically by altering how attention layers interpret relationships across the entire sequence. It’s a fundamentally richer computation that captures syntax, semantics, and even task intent, which a Markov chain cannot model regardless of how much context it sees.

    An LLM also works on fixed transition probabilities. All the training is done during the generation of the weights, which are the compressed state-transition table. After that, it's just a regular old Markov chain. I don't know why you're so fixated on getting different output when you provide different input (as I said, each token generated is a separate, independent invocation of the LLM with a different input). That's true of most computer programs.

    It's just an implementation detail. The Markov chains we're used to have a very short context, due to the combinatorial explosion involved in generating the state-transition table. With LLMs, we can use a much, much longer context. Put that context in, run it through the completely immutable model, and out comes a probability distribution. Any calculations done while producing that distribution are then discarded, the chosen token is added to the context, and the program is run again with zero prior knowledge of any reasoning about the token it just generated. It's a separate execution with absolutely nothing shared between the two, so there can't be any "adapting" going on.
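The combinatorial-explosion point can be made concrete. A minimal sketch in Python (function names are my own, purely illustrative): a classic n-gram Markov chain is a fixed table built once during training, and each sampling step is an independent, stateless lookup on the current context. The comment's claim is that an LLM forward pass has the same shape, only with a learned function in place of the table and a far longer context.

```python
from collections import Counter, defaultdict

def train_markov(tokens, order=1):
    # Fixed transition table: context tuple -> counts of the next token.
    # This table never changes after training, like frozen weights.
    table = defaultdict(Counter)
    for i in range(len(tokens) - order):
        table[tuple(tokens[i:i + order])][tokens[i + order]] += 1
    return table

def next_distribution(table, context):
    # Pure function: context in, probability distribution out.
    # Nothing is remembered between calls; the only "state" is the context.
    counts = table[tuple(context)]
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}

tokens = "the cat sat on the mat the cat ran".split()
table = train_markov(tokens, order=1)
dist = next_distribution(table, ["the"])
# "the" is followed by "cat" twice and "mat" once in the training text,
# so dist == {"cat": 2/3, "mat": 1/3}
```

The table blows up combinatorially as `order` grows, which is why classic chains keep the context tiny; an LLM replaces the table with a learned function so the context can be thousands of tokens, but each call is still stateless in exactly this way.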

  • Most humans don't reason. They just parrot shit too. The design is very human.

    That's why CEOs love them. When your job is 90% spewing BS, a machine that does exactly that is impressive.

  • You're either an LLM, or you don't know how your brain works.

    LLMs don't know how they work either.

  • Yeah, well, there are a ton of people literally falling into psychosis, led along by LLMs. So unfortunately it's not that many people who already knew it.

    Dude, they made ChatGPT a little more bootlicky and now many people are convinced they're literal messiahs. All it took was a chatbot and a few hours of talk.

  • LLMs (at least in their current form) are proper neural networks.

    Well, technically, yes. You're right. But they're a specific, narrow type of neural network, while I was thinking of the broader class and more traditional applications, like data analysis. I should have been more specific.

  • Fucking obviously. Until Data's positronic brain becomes reality, AI is not actual intelligence.

    AI is not A I. I should make that a tshirt.

    It’s an expensive carbon spewing parrot.

  • "if you put in the wrong figures, will the correct ones be output"

    To be fair, an 1840 “computer” might be able to tell there was something wrong with the figures and ask about it or even correct them herself.

    Babbage was being a bit obtuse there; people weren't familiar with computing machines yet. Computer was a job, and computers were expected to be fairly intelligent.

    In fact I'd say that if anything this question shows that the questioner understood enough about the new machine to realise it was not the same as they understood a computer to be, and lacked many of their abilities, and was just looking for Babbage to confirm their suspicions.

    "Computer" meaning a mechanical/electro-mechanical/electrical machine wasn't a usage until around the end of WWII.

    Babbage's difference/analytical engines weren't confusing because people called them computers; people didn't call them that.

    "On two occasions I have been asked, 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question."

    • Charles Babbage

    If you give any computer, human or machine, random numbers, it will not give you "correct answers".

    It's possible Babbage lacked the social skills to detect sarcasm. We also have several high profile cases of people just trusting LLMs to file legal briefs and official government 'studies' because the LLM "said it was real".

  • LOOK MAA I AM ON FRONT PAGE

    I think it's important to note (I'm not an LLM; I know that phrase triggers you to assume I am) that they haven't proven this is an inherent architectural issue, which I think would be the next step for the assertion.

    Do we know that they don't and cannot reason, or do we just know that for X problems they jump to memorized solutions? Is it possible to create an arrangement of weights that can genuinely reason, even if the current models don't? That's the big question that needs answering. It's still possible that we just haven't properly incentivized reasoning over memorization during training.

    If someone can objectively answer "no" to that, the bubble collapses.

  • LOOK MAA I AM ON FRONT PAGE

    What's hilarious/sad is the response to this article over on reddit's "singularity" sub, in which all the top comments are people who've obviously never got all the way through a research paper in their lives all trashing Apple and claiming their researchers don't understand AI or "reasoning". It's a weird cult.
