[JS Required] MiniMax M1 model claims Chinese LLM crown from DeepSeek - plus it's true open-source

  • This post did not contain any content.

    Well… 🤔

  • DeepSeek imposes similar restrictions, but only on their website. You can self-host and then enjoy relatively truthful (as truthful as a bullshit generator can be) answers about Tiananmen Square, Palestine, and South Africa (the latter two being something American-made bullshit generators apparently like making things up about, to appease their corporate overlords or conspiracy theorists respectively).

  • This post did not contain any content.

    What exactly makes this more "open source" than DeepSeek? The linked page doesn't make that particularly clear.

    DeepSeek doesn't release their training data (but they release a hell of a lot of other stuff), and I think that's about as "open" as these companies can get before they risk running afoul of copyright issues. Since you can't compile the model from scratch, it's not really open source. It's just freeware. But that's true for both models, as far as I can tell.

  • DeepSeek imposes similar restrictions, but only on their website. You can self-host and then enjoy relatively truthful (as truthful as a bullshit generator can be) answers about Tiananmen Square, Palestine, and South Africa (the latter two being something American-made bullshit generators apparently like making things up about, to appease their corporate overlords or conspiracy theorists respectively).

    Nope, self-hosted DeepSeek 8B thinking and distilled variants still clam up about Tiananmen Square.

  • Nope, self-hosted DeepSeek 8B thinking and distilled variants still clam up about Tiananmen Square.

    If you're talking about the distillations, AFAIK they take somebody else's model and run it through their (actually open-source) distiller. I tried a couple of those models because I was curious. The distilled Qwen model is cagey about Tiananmen Square, but Qwen was made by Alibaba. The distillation of a US-made model did not have this problem.

    (Edit: we're talking about these distillations, right? If somebody else ran a test and posted it online, I'm not privy to it.)

    I don't have enough RAM to run the full DeepSeek R1, but AFAIK it doesn't have this problem. Maybe it does.

    In case it isn't clear, BTW, I do despise LLMs and AI in general. The biggest issue with their lies (leaving aside every other issue with them for a moment) isn't the glaringly obvious stuff. Not Tiananmen Square, and certainly not the "it's woke!" complaints about generating images of black founding fathers. The worst lies are the subtle and insidious little details like agreeableness: trying to get people to spend a little more time with them, which apparently turns once-reasonable people into members of micro-cults. As with cults, perhaps, some skeptics think they can join in and not fall for the BS... And then they do.

    All four students had by now joined their chosen groups... Hugh had completely disappeared into a nine-week Arica training seminar; he was incommunicado and had mumbled something before he left about “how my energy has moved beyond academia.”

  • What exactly makes this more "open source" than DeepSeek? The linked page doesn't make that particularly clear.

    DeepSeek doesn't release their training data (but they release a hell of a lot of other stuff), and I think that's about as "open" as these companies can get before they risk running afoul of copyright issues. Since you can't compile the model from scratch, it's not really open source. It's just freeware. But that's true for both models, as far as I can tell.

    Open weights plus an OSI-approved license is generally what people mean when they call a model open source. With that said, DeepSeek R1 is under the MIT license, and this one is Apache 2.0. Technically that makes DeepSeek less restrictive, but who knows.

  • If you're talking about the distillations, AFAIK they take somebody else's model and run it through their (actually open-source) distiller. I tried a couple of those models because I was curious. The distilled Qwen model is cagey about Tiananmen Square, but Qwen was made by Alibaba. The distillation of a US-made model did not have this problem.

    (Edit: we're talking about these distillations, right? If somebody else ran a test and posted it online, I'm not privy to it.)

    I don't have enough RAM to run the full DeepSeek R1, but AFAIK it doesn't have this problem. Maybe it does.

    In case it isn't clear, BTW, I do despise LLMs and AI in general. The biggest issue with their lies (leaving aside every other issue with them for a moment) isn't the glaringly obvious stuff. Not Tiananmen Square, and certainly not the "it's woke!" complaints about generating images of black founding fathers. The worst lies are the subtle and insidious little details like agreeableness: trying to get people to spend a little more time with them, which apparently turns once-reasonable people into members of micro-cults. As with cults, perhaps, some skeptics think they can join in and not fall for the BS... And then they do.

    All four students had by now joined their chosen groups... Hugh had completely disappeared into a nine-week Arica training seminar; he was incommunicado and had mumbled something before he left about “how my energy has moved beyond academia.”

    That's not how distillation works if I understand what you're trying to explain.

    If you distill model A to a smaller model, you just get a smaller version of model A with the same approximate distribution curve of parameters, but fewer of them. You can't distill Llama into Deepseek R1.

    I've been able to run distillations of Deepseek R1 up to 70B, and they're all censored still. There is a version of Deepseek R1 "patched" with western values called R1-1776 that will answer topics censored by the Chinese government, however.
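
    For anyone following along, here is a minimal sketch of what "distilling" usually means in the textbook sense: a smaller student model is trained so that its output distribution matches the teacher's. This is the classic soft-label formulation, not necessarily how any of the models discussed in this thread were produced. The function names, the temperature value, and the logits are all made up for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened distributions.

    Training the student to minimise this pushes its next-token
    probabilities toward the teacher's.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical logits over a three-token vocabulary (illustrative values only).
teacher = [4.0, 1.0, 0.5]
close_student = [3.9, 1.1, 0.4]  # roughly agrees with the teacher
far_student = [0.5, 1.0, 4.0]    # prefers a different token entirely

if __name__ == "__main__":
    print("close student loss:", distillation_loss(teacher, close_student))
    print("far student loss:  ", distillation_loss(teacher, far_student))
```

    The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two diverge, which is why a student trained this way tends to inherit the teacher's habits, refusals included.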

  • That's not how distillation works if I understand what you're trying to explain.

    If you distill model A to a smaller model, you just get a smaller version of model A with the same approximate distribution curve of parameters, but fewer of them. You can't distill Llama into Deepseek R1.

    I've been able to run distillations of Deepseek R1 up to 70B, and they're all censored still. There is a version of Deepseek R1 "patched" with western values called R1-1776 that will answer topics censored by the Chinese government, however.

    I've been able to run distillations of Deepseek R1 up to 70B

    Where do you find those?

    There is a version of Deepseek R1 "patched" with western values called R1-1776 that will answer topics censored by the Chinese government, however.

    Thank you for mentioning this, as I finally confronted my own preconceptions and actually found an article by Perplexity showing that R1 itself has a demonstrable pro-China bias.

    Although Perplexity's own description should cause anybody who understands the nature of LLMs to pause. They describe it in their header as a

    version of the DeepSeek-R1 model that has been post-trained to provide unbiased, accurate, and factual information.

    That's a bold (read: bullshit) statement, considering they only altered its biases on China. I wouldn't consider the original model to be unbiased either, but apparently Perplexity is giving them a pass on everything else. I guess it's part of the grand corporate lie that claims "AI is unbiased," a delusion that Perplexity needs to maintain.

  • What exactly makes this more "open source" than DeepSeek? The linked page doesn't make that particularly clear.

    DeepSeek doesn't release their training data (but they release a hell of a lot of other stuff), and I think that's about as "open" as these companies can get before they risk running afoul of copyright issues. Since you can't compile the model from scratch, it's not really open source. It's just freeware. But that's true for both models, as far as I can tell.

    Yup, this is open weights just like DeepSeek. Open source should mean their source data is also openly available, but we all know companies won't do that until they stop violating copyright to train these things.

  • This post did not contain any content.

    Yay another LLM! That's definitely what the world needs and don't let anyone make you think otherwise. This is so fun guys. Let's fund the surveillance, stealing, misinformation, harmful biases, and destruction of the planet. I can't believe some people think that humanity is more important than another "open source" crazy pro max ultra 8K AI 9999!

  • Yup, this is open weights just like DeepSeek. Open source should mean their source data is also openly available, but we all know companies won't do that until they stop violating copyright to train these things.

    I figured as much. Even this line...

    M1's capabilities are top-tier among open-source models

    ... is right above a chart that calls it "open-weight".

    I dislike the conflation of terms that the OSI has helped legitimize. Up until LLMs, nobody called binary blobs "open-source" just because they were compiled using open-source tooling. That would be ridiculous.
