Microsoft Copilot joins ChatGPT at the feet of the mighty Atari 2600 Video Chess

Technology
  • I have a better LLM benchmark:

    "I have a priest, a child and a bag of candy and I have to take them to the other side of the river. I can only take one person/thing at a time. In what order should I take them?"

    Claude Sonnet 4 decided that it's inappropriate and refused to answer. When I explained that the constraint is not to leave the child alone with the candy, it provided a solution that leaves the child alone with the candy.

    Grok would provide a solution that doesn't leave the child alone with a priest but wouldn't explain why.

    ChatGPT would say that "The priest can't be left alone with the child (or vice versa) for moral or safety concerns." directly and then provide a wrong solution.

    But yeah, they will know how to play chess...

    Perplexity says:

    The priest cannot be left alone with the child (or there is some risk).

    Not bad, and it solved it correctly.

  • Next up, we asked a shoe to write a haiku but it was beaten by a 30 year old HaikuMaker™®©.

  • I did say that, because this isn't a pie chart situation, it's a Venn diagram situation.

    For instance, AI art is 99% theft and 60% garbage. It's both because there's overlap.

    Stolen and bad aren't opposites, why would this be a dichotomy?

    That's fine but regular art isn't 2/3 theft either.

    I do buy the 1/3 shite though. It may even be a bit higher than that. Though beauty is in the eye of the beholder, etc.

    It's a matter of taste for sure but I'd say AI art is >90% shite, 100% theft.

    I don't like the glossy looking hyperreal shit it puts out at all.

  • Oh, I enjoy lots of great art! But do you think I watch every film? Listen to every band? There's tons of shit out there!

    Do you really believe, of all the songs that are written every day, that less than a third are crap? Even Taylor Swift doesn't publish everything she does. Sometimes you work on something for weeks and then end up tossing it in the bin. More often, you work on something for 30 minutes before deciding "I'm gonna start over, try something different". The majority of art is crap, but then you keep the stuff you think works.

    And what's that expression, "good artists copy, great artists steal". I mean, that's a bit satirical, but the fact is, everything is derivative to some degree. It's not that there aren't new ideas, it's just that our new ideas are based on older ones. We stand on the shoulders of giants (or at least, on the shoulders of some people who came before us).

    All I was really saying, was that the accusation "2 parts copying, 1 part crap", well honestly that's par for the course, that's how humans work. (And we do some great work that way).

    Don't care didn't ask didn't read

  • I once spent 45 minutes trying to get ChatGPT to write a haiku. It couldn't do it. It explained what syllables were, and the rules for the syllables in a haiku, but it didn't understand it.

  • For S&G, Just asked it to do one:

  • What you are describing has nothing to do with the tool. It’s dishonesty which is different.

    The idea is that instead of commissioning the cow on the field, you go to the AI and ask it for that and it gives you a cow in the field. If you claim you made it, you are lying but that would be true even if you paid an artist and then claimed the same.

    So with AI-made art you’ll say “this art was made by an AI” and no one will be confused as to who takes the credit, because it belongs to the algorithm.

    Have you ever made art in your life? Because a big part of art is mimicking. Like 98% of it is mimicking. I draw, write and have dabbled in making music and playing instruments. You can’t learn these skills without mimicking. And most artists don’t ever do anything truly original, that’s a rarity and even when it happens you can trace the influences to other artists if you know how to look.

    You could argue that AI has not developed its own style yet but that’s bullshit too imo because everyone knows the default AI art style when they see it, so that means that AI has a distinctive style. Is it unique? Maybe not, but neither is the art style of most artists or writers or even musicians.

    Nope. Dishonesty is what is happening when one conflates fine-tuning an A.I. prompt with art.

    A.I. is not art.

    It's not. At all. It's tracing. Fine as a learning tool. Not art.

  • I just asked ChatGPT too (your exact prompt there) and it did give me the correct solution.

    1. Take the child over
    2. Go back alone
    3. Take the candy over
    4. Bring the child back
    5. Take the priest over
    6. Go back alone
    7. Take the child over again

    It didn't comment on moral concerns, though it did applaud itself for keeping the priest and the child separated without elaborating on why.
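
    For anyone who wants to check the seven steps above mechanically, here is a minimal sketch of a breadth-first search over the puzzle states. The two forbidden pairs (priest with child, child with candy) are assumptions about the intended constraints, since the prompt never states them; the names `FORBIDDEN`, `is_safe` and `solve` are made up for this example.

```python
from collections import deque

ITEMS = ("priest", "child", "candy")
# Assumed constraints: pairs that must not share a bank while I am away.
FORBIDDEN = [{"priest", "child"}, {"child", "candy"}]


def is_safe(left, me_on_left):
    """A state is safe if the bank I am NOT on holds no forbidden pair."""
    right = set(ITEMS) - left
    unattended = right if me_on_left else left
    return not any(pair <= unattended for pair in FORBIDDEN)


def solve():
    """Breadth-first search over (items on the start bank, am I there?)."""
    start = (frozenset(ITEMS), True)
    goal = (frozenset(), False)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        (left, me_on_left), path = queue.popleft()
        if (left, me_on_left) == goal:
            return path
        here = left if me_on_left else set(ITEMS) - left
        # Either cross alone (None) or take one passenger from my bank.
        for cargo in [None, *sorted(here)]:
            new_left = set(left)
            if cargo is not None:
                if me_on_left:
                    new_left.remove(cargo)
                else:
                    new_left.add(cargo)
            state = (frozenset(new_left), not me_on_left)
            if state in seen or not is_safe(state[0], state[1]):
                continue
            seen.add(state)
            direction = "over" if me_on_left else "back"
            if cargo is None:
                step = f"go {direction} alone"
            else:
                step = f"take the {cargo} {direction}"
            queue.append((state, path + [step]))
    return None  # no safe sequence exists


if __name__ == "__main__":
    for number, step in enumerate(solve(), start=1):
        print(f"{number}. {step}")
```

    Under those assumed constraints the shortest answer has seven crossings and starts and ends with the child, same as the list ChatGPT gave. Swapping the priest and the candy gives an equally short solution, so which of the two a search returns depends only on the order it tries passengers in.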

  • but... but.... reasoning models! AGI! Singularity!
    Seriously, what you're saying is true, but it's not what OpenAI & Co are trying to peddle, so these experiments are a good way to call them out on their BS.

    To reinforce this, I just had a meeting with a software executive who has no coding experience but is nearly certain he's going to lay off nearly all his employees, because the value is all in the requirements he manages and he can feed those to a prompt just as well as any human can.

    He builds tutorial-fodder introductory applications and assumes all the work is that way. So he is confident that he will save the company a lot of money by laying off these obsolete computer guys and focusing on his "irreplaceable" insight. He's convinced that all the negative feedback is just people trying to protect their jobs or people stubbornly not getting on board with new technology.

  • Tbf they don’t really claim that when you read the research; that’s mostly media hype and CEO assholes spinning words.

    It’s good at lots of specific tasks like rewriting emails, summarising given text, short roleplay, and boilerplate code. There are probably some undiscovered uses too.

    Anthropic’s latest claim is that they would not hire their own AI because of how hard it failed the test they give. They didn’t do that expecting validation, but to measure how far off we still are from AI doing meaningful full work.

    Because the business leaders are famously diligent about putting aside the marketing push and reading into the nuance of the research instead.

  • I really want to see an LLM vs LLM chess match. It'll be messy as hell.

    I remember seeing that, and early on it seemed fairly reasonable then it started materializing pieces out of nowhere and convincing each other that they had already lost.

  • I thought Copilot was just a rebadged ChatGPT anyway?

    It's a silly experiment anyway, there are very good AI chess grandmasters but they were actually trained to play chess, not predict the next word in a text.

    The research I saw mentioning LLMs as being fairly good at chess had the caveat that they allowed up to 20 attempts to cover for it just making up invalid moves that merely sounded like legit moves.
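
    To make that caveat concrete, here is a rough sketch of what "up to 20 attempts" could look like as a loop: keep asking until the proposed move is in the set of legal moves, and give up after the limit. This is not the protocol from whatever research that was; `ask_with_retries` and `canned_model` are hypothetical names, and the canned replies stand in for a real model.

```python
def ask_with_retries(propose_move, legal_moves, max_attempts=20):
    """Ask for a move until it is legal; return (move, attempts used).

    Returns (None, max_attempts) if every attempt was illegal.
    """
    for attempt in range(1, max_attempts + 1):
        move = propose_move()
        if move in legal_moves:
            return move, attempt
    return None, max_attempts


def canned_model(replies):
    """Hypothetical stand-in for an LLM: replays a fixed list of answers."""
    remaining = iter(replies)
    return lambda: next(remaining)


if __name__ == "__main__":
    legal = {"e4", "d4", "Nf3"}
    # Two moves that merely sound legit, then a real one.
    model = canned_model(["Ke9", "Qxz7", "Nf3"])
    print(ask_with_retries(model, legal))
```

    The point is that a score measured this way says "found a legal move within 20 tries", which is a much weaker claim than "plays legal chess".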

  • I thought Copilot was just a rebadged ChatGPT anyway?

    Hahaha. No. (Though you're not completely wrong.)

    Copilot relies on a few different LLMs and tries to pick the best one for the job (read: the cheapest one Microsoft thinks it can get away with).

    I was given a paid Copilot license for work, and I used to have ChatGPT Pro before I moved to Claude.

    This “paid enterprise tier” is by far the dumbest LLM I have ever used. Worse than GPT-3.5.

  • It is entirely disingenuous to just pretend that LLMs are not being widely promoted, marketed, and discussed as AGI, as a superintelligence that people are familiar with from SciFi shows/movies, that is vastly more capable and knowledgeable than basically any single human.

    Yes, people who actually understand tech understand that LLMs are not AGI, that your metaphor of wrong tool wrong job is apt.

    ... But seemingly 90%+ of humanity, including the people who own and profit from LLMs, including all the other business owners/managers who just want to lower their employee headcount ... do not understand this, that an LLM is actually basically an extremely advanced text autocorrect system, that frequently and confidently lies, spits out nonsense, hallucinates, etc.

    If you think it isn't reasonable to continuously point out that LLMs are not superintelligences, then you likely live in a bubble of tech nerds who probably still think their jobs or retirement are secure.

    They're not.

    If corpos keep smashing """AI""" into basically every industry to replace as many workers as possible... the economy will collapse, as capitalism doesn't work without consumers who have jobs, and an avalanche of errors will cascade and snowball through every system that replaces humans with them...

    ...and even if those two things were not broadly true...

    ...the amount of literal power/energy, clean water and financial capital that is required to run the whole economy on these services is wildly unsustainable, both short term economically, and medium term ecologically.

    That's true. But people pointing out that the whole attempt is absurd and senseless also reinforces the point that current AI isn't what companies tout it as.

    then you likely live in a bubble of tech nerds

    Well, we are on Lemmy...

  • Fair point.

    But we're on .world here, i.e. Reddit 2.0, i.e. almost everyone is much closer to a normie who is way more uninformed than they think they are and way more confident than they should be.

    But also, again... fair point.

  • I'm quite sure ChatGPT can answer this because it is a well-known puzzle. The one I knew of was an alligator or some dangerous animal, and the priest.

  • The first two seem fine, but "ChatGPT" is 4 syllables, and "ChatGPT just stares back" is 7 syllables. So ChatGPT can't write a haiku very well, apparently.

  • Oh it's Towers of Hanoi.
    I have a screensaver that does this.
