
AI agents wrong ~70% of time: Carnegie Mellon study

Technology
  • 1 vote
    1 post
    0 views
    Nobody has replied
  • 311 votes
    37 posts
    35 views
    Same, especially when searching technical or niche topics. Since there aren't a ton of results specific to the topic, mostly semi-related results will appear in the first page or two of a regular (non-Gemini) Google search, just because those pages are more popular than the relevant ones. Even the relevant pages bury the answer I'm looking for under lots of non-relevant or semi-relevant information.

    I don't know enough about it to be sure, but Gemini is probably just scraping a handful of websites on the first page, and since most of those are only semi-related, the resulting summary is a classic example of garbage in, garbage out. I also think there's probably something in the code that looks for information shared across multiple sources and prioritizes it over something that appears on only one particular page (possibly the sole result with the information you need); there's a toy sketch of that idea after this list. Then it phrases the summary as a direct answer to your query, misrepresenting the actual information on the pages it scraped. At least Gemini gives sources, I guess.

    The thing that gets on my nerves the most is how often I see people quote the summary as proof of something without checking the sources. It was bad before the rollout of Gemini, but at least back then Google was mostly scraping text and presenting it with little modification, along with a direct link to the webpage. Now it's an LLM generating text phrased as a direct answer to a question (that was also AI-generated from your search query) using AI-summarized data points scraped from multiple webpages.

    It obfuscates the source material further, but I also can't help feeling it exposes a little of the behind-the-scenes fuckery Google has been doing for years before Gemini: how it bastardizes your query by interpreting it into a question, and then prioritizes homogeneous results that agree on the "answer" to your "question". They've been doing this to some extent for years; they just didn't share how they interpreted your query.
  • The British jet engine that failed in the 'Valley of Death'

    Technology
    40 votes
    16 posts
    64 views
    "Giving up advancements in science and technology is stagnation." That's not what I'm suggesting. I'm suggesting giving up some particular, potential advancements in science and technology, which is a whole different kettle of fish and does not imply stagnation. "Thinking it's a good idea to not do anything until people are fed and housed is stagnation." Why do you think that?
  • An earnest question about the AI/LLM hate

    Technology
    73 votes
    57 posts
    146 views
    ineedmana@lemmy.world
    It might be interesting to cross-post this question to !fuck_ai@lemmy.world, but brace for impact.
  • AI and misinformation

    Technology
    20 votes
    3 posts
    18 views
    Don't lose hope; just pretend to, with sarcasm. Or if you're feeling down, it could work the other way too. https://aibusiness.com/nlp/sarcasm-is-really-really-really-easy-for-ai-to-handle#close-modal
  • Tide42 – A Fast, Minimalist CLI IDE for Terminal-Centric Devs

    Technology
    96 votes
    6 posts
    34 views
    anzo@programming.dev
    Emacs has panes. Is this supposed to imitate a fraction of the holy power?
  • 24 votes
    14 posts
    30 views
    I think you're missing some key points. Any file hosting service, no matter what, will have to deal with CSAM as long as people are able to upload to it. No matter what. This is an inescapable fact of hosting and the internet in general.

    Because CSAM is so ubiquitous and constant, one can only do so much to moderate any service, whether it's a large corporation or someone with a server in their closet. All of the larger platforms like 'meta', google, etc., mostly outsource that moderation to workers in developing countries so they don't have to also provide mental health counselling, but that's another story.

    The reason they own their own hardware is that hosting services can and will disable your account and take down your servers if there's even a whiff of CSAM. Since it's a constant threat, it's better to own your own hardware and host everything from your closet, so you don't have to eat the downtime and wait for some poor bastard in Nigeria to look through your logs and reinstate your account (not sure how that works exactly, though).
  • 104 votes
    169 posts
    72 views
    Yes I did, on page 243: It was employed in the Philosophical Transactions by the Dutch astronomer N. Cruquius; ÷ is found in Hübsch and Crusius. It was used very frequently as the symbol for subtraction and "minus" in the Maandelykse Mathematische Liefhebberye, Purmerende (1754-69).
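
As referenced in the Gemini comment above, here is a minimal sketch of the "consensus over uniqueness" ranking that comment speculates about. It is not Google's actual pipeline; the function name `rank_claims_by_consensus` and the toy inputs are hypothetical, chosen only to show how a claim repeated on several semi-related pages would outrank the single page that holds the niche answer (garbage in, garbage out).

```python
# Hypothetical illustration only: rank extracted claims by how many distinct
# scraped sources repeat them, as the commenter guesses a summarizer might do.
from collections import Counter


def rank_claims_by_consensus(scraped_pages: list[list[str]]) -> list[tuple[str, int]]:
    """scraped_pages holds one list of extracted claims per source page.

    Returns claims sorted by the number of distinct sources containing them.
    """
    counts: Counter[str] = Counter()
    for claims in scraped_pages:
        for claim in set(claims):  # count each source at most once per claim
            counts[claim] += 1
    return counts.most_common()


if __name__ == "__main__":
    pages = [
        ["generic overview of the topic"],
        ["generic overview of the topic"],
        ["generic overview of the topic", "the actual niche answer"],
    ]
    # The semi-related claim repeated on every page outranks the single
    # relevant one, so the "consensus" answer wins over the correct one.
    print(rank_claims_by_consensus(pages))
```

Under these assumptions, the one page carrying the needed detail always loses to whatever boilerplate the top results share, which matches the behaviour the comment describes.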