OpenAI just launched its new ChatGPT Agent that can make as many as 1 complicated cupcake order per hour, but even Sam Altman says you probably shouldn't trust it for 'high-stakes uses'

  • I use agents a lot and have written several MCP servers now, the tasks I automate aren't things like order cupcakes, it's mainly the glue between complex things.

    I still can't get Claude to nicely open a JIRA ticket for me, but I can get it to read through a sequence of connected documents and filter that into.

    I don't think agents are ready for the main event and these are some poor examples of their power.

    I'm not saying they won't improve, but using the right tool for the right job is critical. An hour to order cupcakes is silly even for an llm.

    These are examples for the average person on the street who doesn't know what an MCP server is.

  • OpenAI launched ChatGPT Agent on Thursday, its latest effort in the industry-wide pursuit to turn AI into a profitable enterprise—not just one that eats investors' billions. In its announcement blog, OpenAI says its Agent "can now do work for you using its own computer," but CEO Sam Altman warns that the rollout presents unpredictable risks.

    [...]

    OpenAI research lead Lisa Fulford told Wired that she used Agent to order "a lot of cupcakes," which took the tool about an hour, because she was very specific about the cupcakes.

    That's quite a bold statement to make since he now has US military contracts. What, is he making cupcakes for the Pentagon?

  • No you can't if you don't know the libraries. Python is entirely dependent on what libraries you include. If you don't know what you need you can't do shit.

    No you can't if you don't know the libraries

    IDE.

    Python is entirely dependent on what libraries you include

    ??

    If you don't know what you need you can't do shit.

    IDE.

    The problems you propose in your comment are not only greatly exaggerated but have already been solved for decades using conventional tools AND apply to literally all languages, having nothing at all to do with Python. Good try! My statement holds true.

    Maybe your assumption is that you're in a cave writing code in pencil on paper, but that's not a typical working condition. If you have access to Claude to use as a crutch, then you have access to search for an available python library and read some "Getting Started" paragraphs.

    Seriously, if the only real value that AI provides is "you don't need to know the libraries you're using" 💀 that's not quite as strong of an argument as you think it is lmaooo "knowing the libraries" isn't exactly an existing challenge or software engineering problem that people struggle with...

  • So much for the internet. We somehow managed to turn one of humanity’s greatest achievements into a hateful echo chamber we use for warfare first and then into a blackbox where inefficient AI agents communicate with each other in the most inefficient way so the planet can cook us alive even faster. God forbid just calling up a bakery to order some cupcakes.

    Companies will dump billions into AI to fuck everyone over but the transition to clean energy is always too expensive.

  • Kagi is all in on AI. It's the AI slop version of a search ranking algorithm

    Kagi has AI tools but they don't shove it down your throat. I don't understand what "all in on AI" means in this context. The company has said that they want to use AI like they use JavaScript, ie they want to use it as a tool but their product should work well without it.

  • for coding you want to use claude

    if you don’t want to pay for claude after so many messages what you can do is use mistral to code it up then use claude to proof check the code

    tx, will try it some time.

  • OpenAI launched ChatGPT Agent on Thursday, its latest effort in the industry-wide pursuit to turn AI into a profitable enterprise—not just one that eats investors' billions. In its announcement blog, OpenAI says its Agent "can now do work for you using its own computer," but CEO Sam Altman warns that the rollout presents unpredictable risks.

    [...]

    OpenAI research lead Lisa Fulford told Wired that she used Agent to order "a lot of cupcakes," which took the tool about an hour, because she was very specific about the cupcakes.

    So now they're raiding We Bare Bears for ideas?

  • No you can't if you don't know the libraries

    IDE.

    Python is entirely dependent on what libraries you include

    ??

    If you don't know what you need you can't do shit.

    IDE.

    The problems you propose in your comment are not only greatly exaggerated but have already been solved for decades using conventional tools AND apply to literally all languages, having nothing at all to do with Python. Good try! My statement holds true.

    Maybe your assumption is that you're in a cave writing code in pencil on paper, but that's not a typical working condition. If you have access to Claude to use as a crutch, then you have access to search for an available python library and read some "Getting Started" paragraphs.

    Seriously, if the only real value that AI provides is "you don't need to know the libraries you're using" 💀 that's not quite as strong of an argument as you think it is lmaooo "knowing the libraries" isn't exactly an existing challenge or software engineering problem that people struggle with...

    It sounds like you are a much better developer than me, but to be fair I've had to teach myself everything using nothing but books and Google for thirty years. I've rarely had the luxury of working with someone who had the knowledge to mentor me, and never got a degree outside an AAS in electronics, so I've probably missed some critical skills along the way.

    In a lot of ways, the AI fills that role because it's better at answering questions than it is writing code. Earlier today it was explaining to me how a DOM selector could return a stale element in some cases in a failing end to end test. It took a few back and forths with some code examples before I really understood why the selectors might not be working.

    It also suggested some code changes that I had to push back on because, even though the code had errors, the errors weren't causing the problem. While building an array of validators I had awaited them, causing them to run serially instead of in parallel during Promise.all(). So you definitely have to know what you're doing to avoid having the AI waste your time (or at least more time than it takes to push back).

    I'm still trying to debug it, but without the AI, I'd be googling the fuck out of typescript syntax, JavaScript idiosyncrasies, and a whole testing framework I've never seen before.

    So...

    if the only real value that AI provides is "you don't need to know the libraries you're using"

    ...returns false.
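The awaited-validators mistake described above can be sketched in TypeScript as follows. This is only an illustration under stated assumptions: the `Validator` type, the `makeValidator` helper, and the timings are hypothetical, not the commenter's actual code. It shows that awaiting each validator while building the array leaves nothing for `Promise.all()` to run concurrently, whereas collecting the un-awaited promises first lets them overlap.

```typescript
// Hypothetical validator shape: takes an input, resolves to pass/fail.
type Validator = (input: string) => Promise<boolean>;

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: builds a validator that records when it starts and ends.
function makeValidator(name: string, delayMs: number, log: string[]): Validator {
  return async (_input: string): Promise<boolean> => {
    log.push(`start:${name}`);
    await sleep(delayMs);
    log.push(`end:${name}`);
    return true;
  };
}

// The mistake: awaiting while building the array. Each validator finishes
// before the next one starts, so Promise.all() only receives settled values.
async function runSerial(validators: Validator[], input: string): Promise<boolean[]> {
  const results: boolean[] = [];
  for (const validate of validators) {
    results.push(await validate(input));
  }
  return Promise.all(results);
}

// The fix: start every validator first, then await them together.
async function runParallel(validators: Validator[], input: string): Promise<boolean[]> {
  const pending = validators.map((validate) => validate(input));
  return Promise.all(pending);
}
```

As the comment notes, the serial version is slower but still returns correct results, which is why it can be a real defect without being the cause of a failing test.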

  • Did the AI give you a starting point that would be very different from a bit of code someone submitted 10 years ago on Stack Exchange? Because in my experience, everything has already been asked and answered. This includes the most basic and naive stuff, and often I am very grateful for it, because, yeah, sometimes I need someone to guide me through the most basic stuff.

    In fact, the AI needed that exact knowledge base and a bunch more to exist in the first place. It's just vaguely competent at retrieving it.

    Anyway, I didn't say I had no experience, just the most minimal Python experience. There are definitely a few quirks I had to learn (the data structures mostly), but the rest is mostly finding the right method in the reference library, like you would in Java.

    Logically, you would be right. My practical experience is that I waste a lot less time googling multiple explanations of something because one by itself isn't helping me figure it out, writing bugged PoC test code and thinking something is broken, sorting through a bunch of things that haven't been relevant for 3 versions, etc.

    Of course the AI is trained on the same material we can all find and read, but it does it orders of magnitude more quickly. The trade-off is that it's not always right, but neither am I, and neither are most sources on the internet right in all circumstances. But it's so fast and easy that I can iterate and evolve designs and understanding much more quickly than I could on my own.

  • So now they're raiding We Bare Bears for ideas?

    Explain.

  • That's quite a bold statement to make since he now has US military contracts. What, is he making cupcakes for the Pentagon?

    Grok has the Pentagon contract. Does OpenAI also have one?

  • Grok has the Pentagon contract. Does OpenAI also have one?

    Microsoft's AI, which is OpenAI, is approved for Defense Contracts. https://www.cnbc.com/2025/06/16/openai-wins-200-million-us-defense-contract.html It even has an ominous project name which was posted to a public site which I cannot seem to recall at the moment.

  • No you can't if you don't know the libraries

    IDE.

    Python is entirely dependent on what libraries you include

    ??

    If you don't know what you need you can't do shit.

    IDE.

    The problems you propose in your comment are not only greatly exaggerated but have already been solved for decades using conventional tools AND apply to literally all languages, having nothing at all to do with Python. Good try! My statement holds true.

    Maybe your assumption is that you're in a cave writing code in pencil on paper, but that's not a typical working condition. If you have access to Claude to use as a crutch, then you have access to search for an available python library and read some "Getting Started" paragraphs.

    Seriously, if the only real value that AI provides is "you don't need to know the libraries you're using" 💀 that's not quite as strong of an argument as you think it is lmaooo "knowing the libraries" isn't exactly an existing challenge or software engineering problem that people struggle with...

    In a cave with pen and paper is nearly how I learned. I learned with the runtime, MSDN, Notepad and the command line. And yes, you do end up in many situations where you simply don't have or can't use a full-on IDE every time. Sounds like you've never really left your comfort zone and stuck your neck out in some tech you don't quite understand yet. Or worked in areas under strict software controls.

  • In a cave with pen and paper is nearly how I learned. I learned with the runtime, MSDN, Notepad and the command line. And yes, you do end up in many situations where you simply don't have or can't use a full-on IDE every time. Sounds like you've never really left your comfort zone and stuck your neck out in some tech you don't quite understand yet. Or worked in areas under strict software controls.

    It's telling that you're focused on personal assumptions instead of addressing the argument

  • CEO Sam Altman warns that the rollout presents unpredictable risks.

    But that doesn't prevent his profit motive from consuming untold amounts of electricity to shove this into your face. They know what they're doing. They know their product is used primarily to generate spam, and secondarily is designed to form addictive faux-relationships with their users.

    Burn in hell. Actually, given the direction this is all going, we will all be burning in hell within generations.

    And produced with a shit ton of copyright violations, etc. Just about everything is immoral about it.

  • OpenAI launched ChatGPT Agent on Thursday, its latest effort in the industry-wide pursuit to turn AI into a profitable enterprise—not just one that eats investors' billions. In its announcement blog, OpenAI says its Agent "can now do work for you using its own computer," but CEO Sam Altman warns that the rollout presents unpredictable risks.

    [...]

    OpenAI research lead Lisa Fulford told Wired that she used Agent to order "a lot of cupcakes," which took the tool about an hour, because she was very specific about the cupcakes.

    What’s more high stakes than a complicated cupcake order?

  • What’s more high stakes than a complicated cupcake order?

    An order for a weed smoking cow.

  • I use agents a lot and have written several MCP servers now, the tasks I automate aren't things like order cupcakes, it's mainly the glue between complex things.

    I still can't get Claude to nicely open a JIRA ticket for me, but I can get it to read through a sequence of connected documents and filter that into.

    I don't think agents are ready for the main event and these are some poor examples of their power.

    I'm not saying they won't improve, but using the right tool for the right job is critical. An hour to order cupcakes is silly even for an llm.

    Yes, in the Wired article one of them says they would like to use an agent replay feature to find out where it got stuck for an hour.

  • Companies will dump billions into AI to fuck everyone over but the transition to clean energy is always too expensive.

    It's easier to rule a world that is in ruins than a thriving one. They know they have to live on the same planet as us, yet they still don't seem to care if it's going to shit. While so many rich people are dumb as bricks and don't deserve their wealth at all, there are also many who actually know what they are doing, yet they still don't want to seriously work towards stopping climate change, even though it wouldn't even reduce their wealth by that much in comparison.

    So the only reasoning I can think of is that they want more complete control over everything, but they can't have it because the world is too complicated and healthy. When civilizations start to fall, the rich will still have everything, and with that they can start enforcing themselves on everyone.

    I don't have anything to base this on, it's just my thought on the matter. It just feels like something a billionaire would do; they demonstrate every day that they will not be content with anything and will not care about other people's suffering to get it.
