OpenAI just launched its new ChatGPT Agent that can make as many as 1 complicated cupcake order per hour, but even Sam Altman says you probably shouldn't trust it for 'high-stakes uses'

  • Okay but that’s not what easier means.

    Easier would be calling the bakery or spending 10 minutes browsing their website, adding to cart, and checking out.

    I don’t want to spend an hour on tasks that would normally take 10 minutes. My executive dysfunctions already make me good at doing that.

    This might be a revolutionary idea, but what if they helped me do tasks that take an hour in 10 minutes?

    I’m just putting that idea out there totally for free in case any AI companies want to jump on that opportunity.

    It’s a starting point

  • I needed about 30 minutes to do a python application from scratch that took linear JSON data files, merged them and presented them as a tree in a GUI.

    Before that I had barely done anything in python; I could basically do a function declaration with a simple operation and nothing else. I didn't even have a lot of experience with UI at all.

    But like you I had experience with java and such, and those skills transfer. All it took was searching basic syntax/related code examples and required library imports. And I mean basic, search engine search, not AI answers.

    All I'm saying is, I really don't think AI is providing anything a lot more efficient than doing a good old crawl through API docs and stack overflow. So the fact it's using tremendous amounts of resources to maybe achieve a 10% efficiency boost is bothering me a lot.

    If that was a 10% boost for you and you could've done it in 33 minutes without AI or experience, then my imposter syndrome has been right all along!

    I'd bet that would've taken me a few days and maybe buying a reference book and starting with hello world.
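For readers wondering what the quoted "merge linear JSON files and present them as a tree" task involves, here is a minimal sketch of the merge and tree-building steps only. It is not the commenter's program: the record fields `id`, `parent`, and `name` and the sample data are assumptions made up for illustration, the records are assumed to contain no cycles, and the tree is rendered as indented text rather than in a GUI.

```python
import json
from collections import defaultdict


def merge_records(*sources):
    """Merge several flat record lists; later sources override earlier ones by id."""
    merged = {}
    for records in sources:
        for rec in records:
            merged.setdefault(rec["id"], {}).update(rec)
    return merged


def build_tree(merged):
    """Group record ids by their parent id; roots end up under the key None."""
    children = defaultdict(list)
    for rec_id, rec in merged.items():
        children[rec.get("parent")].append(rec_id)
    return children


def render(merged, children, parent=None, depth=0):
    """Return the tree as indented text lines, depth-first (assumes no cycles)."""
    lines = []
    for rec_id in sorted(children.get(parent, [])):
        lines.append("  " * depth + merged[rec_id].get("name", str(rec_id)))
        lines.extend(render(merged, children, rec_id, depth + 1))
    return lines


# Two hypothetical "files" of flat records, inlined as JSON strings.
file_a = json.loads('[{"id": 1, "name": "root"}, {"id": 2, "parent": 1, "name": "docs"}]')
file_b = json.loads('[{"id": 3, "parent": 2, "name": "readme"}, {"id": 2, "parent": 1, "name": "documents"}]')

merged = merge_records(file_a, file_b)
tree_lines = render(merged, build_tree(merged))
print("\n".join(tree_lines))
```

The same parent-to-children map could feed a GUI tree widget such as tkinter's `ttk.Treeview` instead of `print`; that part is left out here to keep the sketch runnable without a display.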

  • So much for the internet. We somehow managed to turn one of humanity’s greatest achievements into a hateful echo chamber we use for warfare first and then into a blackbox where inefficient AI agents communicate with each other in the most inefficient way so the planet can cook us alive even faster. God forbid just calling up a bakery to order some cupcakes.

    Or just sending an email.

  • If that was a 10% boost for you and you could've done it in 33 minutes without AI or experience, then my imposter syndrome has been right all along!

    I'd bet that would've taken me a few days and maybe buying a reference book and starting with hello world.

    Did the AI give you a starting point that would be very different from a bit of code someone submitted 10 years ago on stack exchange? Because in my experience, everything has already been asked and answered. This includes the most basic and naive stuff, and often I am very grateful for it, because, yeah, sometimes I need someone to guide me through the most basic stuff.

    In fact, the AI needed that exact knowledge base and a bunch more to exist in the first place. It's just vaguely competent at retrieving it.

    Anyway, I didn't say I had no experience, just the most minimal python experience. There are definitely a few quirks I had to learn (the data structures mostly), but the rest is mostly finding the right method in the reference library, like you would in java.

  • OpenAI launched ChatGPT Agent on Thursday, its latest effort in the industry-wide pursuit to turn AI into a profitable enterprise—not just one that eats investors' billions. In its announcement blog, OpenAI says its Agent "can now do work for you using its own computer," but CEO Sam Altman warns that the rollout presents unpredictable risks.

    [...]

    OpenAI research lead Lisa Fulford told Wired that she used Agent to order "a lot of cupcakes," which took the tool about an hour, because she was very specific about the cupcakes.

    I need an agent who would set up DevOps for me. Then robots would definitely be the ones working hard, not humans.

  • Okay but that’s not what easier means.

    Easier would be calling the bakery or spending 10 minutes browsing their website, adding to cart, and checking out.

    I don’t want to spend an hour on tasks that would normally take 10 minutes. My executive dysfunctions already make me good at doing that.

    This might be a revolutionary idea, but what if they helped me do tasks that take an hour in 10 minutes?

    I’m just putting that idea out there totally for free in case any AI companies want to jump on that opportunity.

    I don’t want to spend an hour on tasks that would normally take 10 minutes.

    I don't get it, do you think she spent an hour talking to ChatGPT to try and get it to order doughnuts?

  • It really is a nightmare brewing. And they will hide behind excuses and keep it all opaque unless they are strongly regulated.

    regulated by who ?
    our senate and congress are filled with pimps who work for pedophiles like epstein and cheer genocider scum murdering children on a daily basis. this includes the “lesser evil” party. they had 4 years to release the pedo list or even try to slow down the genocide.
    they are not gonna give a fck about us working 3 jobs just to pay rent and live on prison food.

    sad reality is that after a certain threshold in a parasite-host dynamic, there is no ending other than the host dying because the parasites have grown too big for it to feed. so unless another deadly parasite like cia or kgb luigi the 1%, the rest 99% are dead.

  • It’s a starting point

    I use agents a lot and have written several MCP servers now. The tasks I automate aren't things like ordering cupcakes; it's mainly the glue between complex things.

    I still can't get Claude to nicely open a JIRA ticket for me, but I can get it to read through a sequence of connected documents and filter that into.

    I don't think agents are ready for the main event and these are some poor examples of their power.

    I'm not saying they won't improve, but using the right tool for the right job is critical. An hour to order cupcakes is silly even for an llm.

  • I spent maybe 90 minutes trying to get ChatGPT to write me a fucking AppleScript or bash to copy all calendar events from a source calendar to a destination. That shit does not work.

    for coding you want to use claude

    if you don’t want to pay for claude after so many messages what you can do is use mistral to code it up then use claude to proof check the code

  • unfortunately any ai service is going to make things worse.
    right now we can discover and choose. with search and browsing dead, ai provider will shove the product giving them the highest cut aka most garbage or snake oil products.

    even today targeted advertising for poor people is filled with betting, lottery & poker games. similarly, elderly people are primarily shown ads for miracle cures for chronic illness and scammy religious crap.

    edit: switch to kagi. it's paid but well worth it.
    SearXNG is also a good alternative if you have got time for hosting it urself.

    Kagi is all in on AI. It's the AI slop version of a search ranking algorithm

  • It won't do that well. What you have to do is ask it to help you leverage your existing development skills in an unfamiliar domain. I used it to help me write a python program to authenticate, pull and filter data from a GCP firestore database and create an XLSX with summary and detail sheets.

    I've never used Python before in my life. It took me about 4 hours. Of course I've been doing that sort of thing in Java for many years. Turned out I wrote that faster in Python than I could in Java. Configuring the connection to that database in Python was so simple compared to Java.

    The stuff it wrote was sometimes incomplete or wrong in subtle ways, but I could see the bits that didn't make sense which helped me focus on those things and ask better questions to help me figure it out. I think the last hour was just me tweaking stuff by myself because I didn't need help with it by that point.

    Anyone who already knows another programming language but has never used python in their life can write a simple python app quickly, regardless

  • I needed about 30 minutes to do a python application from scratch that took linear JSON data files, merged them and presented them as a tree in a GUI.

    Before that I had barely done anything in python; I could basically do a function declaration with a simple operation and nothing else. I didn't even have a lot of experience with UI at all.

    But like you I had experience with java and such, and those skills transfer. All it took was searching basic syntax/related code examples and required library imports. And I mean basic, search engine search, not AI answers.

    All I'm saying is, I really don't think AI is providing anything a lot more efficient than doing a good old crawl through API docs and stack overflow. So the fact it's using tremendous amounts of resources to maybe achieve a 10% efficiency boost is bothering me a lot.

    There’s also the fact that

    1. It’s only really good at this if you want it to generate Python, PowerShell, bash, or C++ code. Try any other language and it quickly assumes you’re using outdated and often incompatible libraries or doesn’t really understand how the language functions.
    2. At the end of it all, neither you nor the AI has learned anything new; you’ll have to put in the exact same amount of work the next time. If you do it yourself, then over time that 10% advantage goes away.

    Now, these things could both change over time, but humans are much more efficient to train than current state of the art probability sieves we call GenAI.

  • Anyone who already knows another programming language but has never used python in their life can write a simple python app quickly, regardless

    No you can't if you don't know the libraries. Python is entirely dependent on what libraries you include. If you don't know what you need you can't do shit.

  • There’s also the fact that

    1. It’s only really good at this if you want it to generate Python, PowerShell, bash, or C++ code. Try any other language and it quickly assumes you’re using outdated and often incompatible libraries or doesn’t really understand how the language functions.
    2. At the end of it all, neither you nor the AI has learned anything new; you’ll have to put in the exact same amount of work the next time. If you do it yourself, then over time that 10% advantage goes away.

    Now, these things could both change over time, but humans are much more efficient to train than current state of the art probability sieves we call GenAI.

    It's only assuming if you aren't specific enough. And you do know their training data is usually a year or two or three old, so they don't know about whatever new shit you're trying to work with.

  • I use agents a lot and have written several MCP servers now. The tasks I automate aren't things like ordering cupcakes; it's mainly the glue between complex things.

    I still can't get Claude to nicely open a JIRA ticket for me, but I can get it to read through a sequence of connected documents and filter that into.

    I don't think agents are ready for the main event and these are some poor examples of their power.

    I'm not saying they won't improve, but using the right tool for the right job is critical. An hour to order cupcakes is silly even for an llm.

    These are examples for the common guy on the street who doesn’t know what an MCP server is.

  • OpenAI launched ChatGPT Agent on Thursday, its latest effort in the industry-wide pursuit to turn AI into a profitable enterprise—not just one that eats investors' billions. In its announcement blog, OpenAI says its Agent "can now do work for you using its own computer," but CEO Sam Altman warns that the rollout presents unpredictable risks.

    [...]

    OpenAI research lead Lisa Fulford told Wired that she used Agent to order "a lot of cupcakes," which took the tool about an hour, because she was very specific about the cupcakes.

    That's quite a bold statement to make since he now has US military contracts. What, is he making cupcakes for the Pentagon?

  • No you can't if you don't know the libraries. Python is entirely dependent on what libraries you include. If you don't know what you need you can't do shit.

    No you can't if you don't know the libraries

    IDE.

    Python is entirely dependent on what libraries you include

    ??

    If you don't know what you need you can't do shit.

    IDE.

    The problems you propose in your comment are not only greatly exaggerated but have already been solved for decades using conventional tools AND apply to literally all languages, having nothing at all to do with python. Good try! My statement holds true.

    Maybe your assumption is that you're in a cave writing code in pencil on paper, but that's not a typical working condition. If you have access to Claude to use as a crutch, then you have access to search for an available python library and read some "Getting Started" paragraphs.

    Seriously, if the only real value that AI provides is "you don't need to know the libraries you're using" 💀 that's not quite as strong of an argument as you think it is lmaooo "knowing the libraries" isn't exactly an existing challenge or software engineering problem that people struggle with...

  • So much for the internet. We somehow managed to turn one of humanity’s greatest achievements into a hateful echo chamber we use for warfare first and then into a blackbox where inefficient AI agents communicate with each other in the most inefficient way so the planet can cook us alive even faster. God forbid just calling up a bakery to order some cupcakes.

    Companies will dump billions into AI to fuck everyone over but the transition to clean energy is always too expensive.

  • Kagi is all in on AI. It's the AI slop version of a search ranking algorithm

    Kagi has AI tools but they don't shove them down your throat. I don't understand what "all in on AI" means in this context. The company has said that they want to use AI like they use JavaScript, i.e. they want to use it as a tool but their product should work well without it.

  • for coding you want to use claude

    if you don’t want to pay for claude after so many messages what you can do is use mistral to code it up then use claude to proof check the code

    tx, will try it some time.
