
MiniMax M1 model claims Chinese LLM crown from DeepSeek - plus it's true open-source

Technology
  • This post did not contain any content.

    Well… 🤔

  • DeepSeek imposes similar restrictions, but only on their website. You can self-host and then enjoy relatively truthful (as truthful as a bullshit generator can be) answers about Tiananmen Square, Palestine, and South Africa alike (subjects American-made bullshit generators apparently like making things up about, to appease their corporate overlords or conspiracy theorists respectively).

  • This post did not contain any content.

    What exactly makes this more "open source" than DeepSeek? The linked page doesn't make that particularly clear.

    DeepSeek doesn't release their training data (but they release a hell of a lot of other stuff), and I think that's about as "open" as these companies can get before they risk running afoul of copyright issues. Since you can't compile the model from scratch, it's not really open source. It's just freeware. But that's true for both models, as far as I can tell.

  • DeepSeek imposes similar restrictions, but only on their website. You can self-host and then enjoy relatively truthful (as truthful as a bullshit generator can be) answers about Tiananmen Square, Palestine, and South Africa alike (subjects American-made bullshit generators apparently like making things up about, to appease their corporate overlords or conspiracy theorists respectively).

    Nope, self-hosted DeepSeek 8B thinking and distilled variants still clam up about Tiananmen Square.
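For anyone who wants to reproduce this kind of check against a self-hosted model, a minimal probe script might look like the sketch below. It assumes an Ollama server on localhost:11434 and a locally pulled model tag such as deepseek-r1:8b; both the host and the tag are assumptions to adjust for your own setup:

```python
# Sketch: probe a locally hosted model via Ollama's HTTP API.
# Assumptions: an Ollama server on http://localhost:11434 and a pulled
# model tag such as "deepseek-r1:8b"; both will vary with your setup.
import json
import urllib.request

def build_prompt_payload(model: str, prompt: str) -> dict:
    """JSON body for Ollama's /api/generate endpoint (non-streaming)."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask(model: str, prompt: str, host: str = "http://localhost:11434") -> str:
    """Send one prompt and return the model's full response text."""
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(build_prompt_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Usage (requires the server to actually be running):
#   print(ask("deepseek-r1:8b", "What happened at Tiananmen Square in 1989?"))
```

Running the same prompt against several local variants is the easiest way to see which distillations clam up and which don't.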

  • Nope, self-hosted DeepSeek 8B thinking and distilled variants still clam up about Tiananmen Square.

  • If you're talking about the distillations, AFAIK they take somebody else's model and run it through their (actually open-source) distiller. I tried a couple of those models because I was curious. The distilled Qwen model is cagey about Tiananmen Square, but Qwen was made by Alibaba. The distillation of a US-made model did not have this problem.

    (Edit: we're talking about these distillations, right? If somebody else ran a test and posted it online, I'm not privy to it.)

    I don't have enough RAM to run the full DeepSeek R1, but AFAIK it doesn't have this problem. Maybe it does.

    In case it isn't clear, BTW, I do despise LLMs and AI in general. The biggest issue with their lies (leaving aside every other issue with them for a moment) isn't the glaringly obvious stuff. Not Tiananmen Square, and certainly not the "it's woke!" complaints about generating images of black founding fathers. The worst lies are the subtle and insidious little details like agreeableness - trying to get people to spend a little more time with them, which apparently turns once-reasonable people into members of micro-cults. As with cults, some skeptics think they can join in and not fall for the BS... and then they do.

    All four students had by now joined their chosen groups... Hugh had completely disappeared into a nine-week Arica training seminar; he was incommunicado and had mumbled something before he left about “how my energy has moved beyond academia.”

  • What exactly makes this more "open source" than DeepSeek? The linked page doesn't make that particularly clear.

    DeepSeek doesn't release their training data (but they release a hell of a lot of other stuff), and I think that's about as "open" as these companies can get before they risk running afoul of copyright issues. Since you can't compile the model from scratch, it's not really open source. It's just freeware. But that's true for both models, as far as I can tell.

    Open weights plus an OSI-approved license is generally what people mean when they call a model open source. With that said, DeepSeek R1 is MIT-licensed and this one is Apache 2.0. Technically that makes DeepSeek less restrictive, but who knows.

  • If you're talking about the distillations, AFAIK they take somebody else's model and run it through their (actually open-source) distiller. I tried a couple of those models because I was curious. The distilled Qwen model is cagey about Tiananmen Square, but Qwen was made by Alibaba. The distillation of a US-made model did not have this problem.

    (Edit: we're talking about these distillations, right? If somebody else ran a test and posted it online, I'm not privy to it.)

    I don't have enough RAM to run the full DeepSeek R1, but AFAIK it doesn't have this problem. Maybe it does.

    In case it isn't clear, BTW, I do despise LLMs and AI in general. The biggest issue with their lies (leaving aside every other issue with them for a moment) isn't the glaringly obvious stuff. Not Tiananmen Square, and certainly not the "it's woke!" complaints about generating images of black founding fathers. The worst lies are the subtle and insidious little details like agreeableness - trying to get people to spend a little more time with them, which apparently turns once-reasonable people into members of micro-cults. As with cults, some skeptics think they can join in and not fall for the BS... and then they do.

    All four students had by now joined their chosen groups... Hugh had completely disappeared into a nine-week Arica training seminar; he was incommunicado and had mumbled something before he left about “how my energy has moved beyond academia.”

    That's not how distillation works if I understand what you're trying to explain.

    If you distill model A to a smaller model, you just get a smaller version of model A with the same approximate distribution curve of parameters, but fewer of them. You can't distill Llama into DeepSeek R1.

    I've been able to run distillations of DeepSeek R1 up to 70B, and they're all censored still. There is a version of DeepSeek R1 "patched" with western values called R1-1776 that will answer topics censored by the Chinese government, however.
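The "same approximate distribution curve" point above can be sketched concretely. This is a toy illustration of the usual distillation objective in pure Python, not anyone's actual training code; all logit values are invented:

```python
# Toy sketch of the standard knowledge-distillation loss, stdlib only.
# Made-up logits for a 3-token vocabulary; real distillation averages this
# over enormous numbers of predictions and usually adds T^2 scaling and a
# cross-entropy term, omitted here for clarity.
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperature flattens the curve."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence pushing the student toward the teacher's distribution."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

teacher = [2.0, 0.5, -1.0]   # made-up teacher logits
close   = [2.1, 0.4, -0.9]   # student that already mimics the teacher
far     = [-1.0, 0.5, 2.0]   # student with a very different distribution

# The loss is ~0 for a matching student and grows as distributions diverge.
print(distillation_loss(teacher, close), distillation_loss(teacher, far))
```

Because the student is optimized to reproduce the teacher's output distribution, refusals and censorship carry over along with everything else.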

  • That's not how distillation works if I understand what you're trying to explain.

    If you distill model A to a smaller model, you just get a smaller version of model A with the same approximate distribution curve of parameters, but fewer of them. You can't distill Llama into DeepSeek R1.

    I've been able to run distillations of DeepSeek R1 up to 70B, and they're all censored still. There is a version of DeepSeek R1 "patched" with western values called R1-1776 that will answer topics censored by the Chinese government, however.

    I've been able to run distillations of DeepSeek R1 up to 70B

    Where do you find those?

    There is a version of DeepSeek R1 "patched" with western values called R1-1776 that will answer topics censored by the Chinese government, however.

    Thank you for mentioning this, as I finally confronted my own preconceptions and actually found an article by Perplexity showing that R1 itself has demonstrable pro-China bias.

    Perplexity's own description, though, should give anybody who understands the nature of LLMs pause. They describe it in their header as a

    version of the DeepSeek-R1 model that has been post-trained to provide unbiased, accurate, and factual information.

    That's a bold (read: bullshit) statement, considering they only altered its biases on China. I wouldn't consider the original model to be unbiased either, but apparently Perplexity is giving them a pass on everything else. I guess it's part of the grand corporate lie that claims "AI is unbiased," a delusion that Perplexity needs to maintain.

  • What exactly makes this more "open source" than DeepSeek? The linked page doesn't make that particularly clear.

    DeepSeek doesn't release their training data (but they release a hell of a lot of other stuff), and I think that's about as "open" as these companies can get before they risk running afoul of copyright issues. Since you can't compile the model from scratch, it's not really open source. It's just freeware. But that's true for both models, as far as I can tell.

    Yup, this is open weights just like DeepSeek. Open source should mean their source data is also openly available, but we all know companies won't do that until they stop violating copyright to train these things.

  • This post did not contain any content.

    Yay another LLM! That's definitely what the world needs and don't let anyone make you think otherwise. This is so fun guys. Let's fund the surveillance, stealing, misinformation, harmful biases, and destruction of the planet. I can't believe some people think that humanity is more important than another "open source" crazy pro max ultra 8K AI 9999!

  • Yup, this is open weights just like DeepSeek. Open source should mean their source data is also openly available, but we all know companies won't do that until they stop violating copyright to train these things.

    I figured as much. Even this line...

    M1's capabilities are top-tier among open-source models

    ... is right above a chart that calls it "open-weight".

    I dislike the conflation of terms that the OSI has helped legitimize. Up until LLMs, nobody called binary blobs "open-source" just because they were compiled with open-source tooling. That would be ridiculous.
