Skip to content

[JS Required] MiniMax M1 model claims Chinese LLM crown from DeepSeek - plus it's true open-source

Technology
9 6 0
  • This post did not contain any content.
  • This post did not contain any content.

    Well… 🤔

  • DeepSeek imposes similar restrictions, but only on their website. You can self-host and then enjoy relatively truthful (as truthful as a bullshit generator can be) answers about both Tianmen Square, Palestine, and South Africa (something American-made bullshit generators apparently like making up, to appease their corporate overlords or conspiracy theorists respectively).

  • This post did not contain any content.

    What exactly makes this more "open source" than DeepSeek? The linked page doesn't make that particularly clear.

    DeepSeek doesn't release their training data (but they release a hell of a lot of other stuff), and I think that's about as "open" as these companies can get before they risk running afoul of copyright issues. Since you can't compile the model from scratch, it's not really open source. It's just freeware. But that's true for both models, as far as I can tell.

  • DeepSeek imposes similar restrictions, but only on their website. You can self-host and then enjoy relatively truthful (as truthful as a bullshit generator can be) answers about both Tianmen Square, Palestine, and South Africa (something American-made bullshit generators apparently like making up, to appease their corporate overlords or conspiracy theorists respectively).

    Nope, Self hosted deepseek 8b thinking and distilled variants still clam up about Tianmen Square

  • Nope, Self hosted deepseek 8b thinking and distilled variants still clam up about Tianmen Square

    If you're talking about the distillations, AFAIK they take somebody else's model and run it through their (actually open-source) distiller. I tried a couple of those models because I was curious. The distilled Qwen model is cagey about Tianmen Square, but Qwen was made by Alibaba. The distillation of a US-made model did not have this problem.

    (Edit: we're talking about these distillations, right? If somebody else ran a test and posted it online, I'm not privy to it.)

    I don't have enough RAM to run the full DeepSeek R1, but AFAIK it doesn't have this problem. Maybe it does.

    In case it isn't clear, BTW, I do despise LLMs and AI in general. The biggest issue with their lies (leaving aside every other issue with them for a moment) isn't the glaringly obvious stuff. Not Tianmen Square, and certainly not the "it's woke!" complaints about generating images of black founding fathers. The worst lies are the subtle and insidious little details like agreeableness - trying to get people to spend a little more time with them, which apparently turns once-reasonable people into members of micro-cults. Like cults, perhaps, spme skeptics think they can join in and not fall for the BS... And then they do.

    All four students had by now joined their chosen groups... Hugh had completely disappeared into a nine-week Arica training seminar; he was incommunicado and had mumbled something before he left about “how my energy has moved beyond academia.”

  • What exactly makes this more "open source" than DeepSeek? The linked page doesn't make that particularly clear.

    DeepSeek doesn't release their training data (but they release a hell of a lot of other stuff), and I think that's about as "open" as these companies can get before they risk running afoul of copyright issues. Since you can't compile the model from scratch, it's not really open source. It's just freeware. But that's true for both models, as far as I can tell.

    Open weights + an OSI approved license is generally what is used to refer to models as open source. the with that said, Deepseek R1 is am MIT license, and this one is Apache 2. Technically that makes Deepseek less restrictive, but who knows.

  • If you're talking about the distillations, AFAIK they take somebody else's model and run it through their (actually open-source) distiller. I tried a couple of those models because I was curious. The distilled Qwen model is cagey about Tianmen Square, but Qwen was made by Alibaba. The distillation of a US-made model did not have this problem.

    (Edit: we're talking about these distillations, right? If somebody else ran a test and posted it online, I'm not privy to it.)

    I don't have enough RAM to run the full DeepSeek R1, but AFAIK it doesn't have this problem. Maybe it does.

    In case it isn't clear, BTW, I do despise LLMs and AI in general. The biggest issue with their lies (leaving aside every other issue with them for a moment) isn't the glaringly obvious stuff. Not Tianmen Square, and certainly not the "it's woke!" complaints about generating images of black founding fathers. The worst lies are the subtle and insidious little details like agreeableness - trying to get people to spend a little more time with them, which apparently turns once-reasonable people into members of micro-cults. Like cults, perhaps, spme skeptics think they can join in and not fall for the BS... And then they do.

    All four students had by now joined their chosen groups... Hugh had completely disappeared into a nine-week Arica training seminar; he was incommunicado and had mumbled something before he left about “how my energy has moved beyond academia.”

    That's not how distillation works if I understand what you're trying to explain.

    If you distill model A to a smaller model, you just get a smaller version of model A with the same approximate distribution curve of parameters, but fewer of them. You can't distill Llama into Deepseek R1.

    I've been able to run distillations of Deepseek R1 up to 70B, and they're all censored still. There is a version of Deepseek R1 "patched" with western values called R1-1776 that will answer topics censored by the Chinese government, however.

  • That's not how distillation works if I understand what you're trying to explain.

    If you distill model A to a smaller model, you just get a smaller version of model A with the same approximate distribution curve of parameters, but fewer of them. You can't distill Llama into Deepseek R1.

    I've been able to run distillations of Deepseek R1 up to 70B, and they're all censored still. There is a version of Deepseek R1 "patched" with western values called R1-1776 that will answer topics censored by the Chinese government, however.

    I've been able to run distillations of Deepseek R1 up to 70B

    Where do you find those?

    There is a version of Deepseek R1 "patched" with western values called R1-1776 that will answer topics censored by the Chinese government, however.

    Thank you for mentioning this, as I finally confronted my own preconceptions and actually found an article by Perplexity that demonstrated R1 itself has demonstrable pro-China bias.

    Although Perplexity's own description should cause anybody who understands the nature of LLMs to pause. They describe it in their header as a

    version of the DeepSeek-R1 model that has been post-trained to provide unbiased, accurate, and factual information.

    That's a bold (read: bullshit) statement, considering the only altered its biases on China. I wouldn't consider the original model to be unbiased either, but apparently perplexity is giving them a pass on everything else. I guess it's part of the grand corporate lie that claims "AI is unbiased," a delusion that perplexity needs to maintain.

  • Science and Technology News and Commentary: Aardvark Daily

    Technology technology
    2
    7 Stimmen
    2 Beiträge
    1 Aufrufe
    I
    What are you on about with this? Last news post 2013?
  • 137 Stimmen
    41 Beiträge
    4 Aufrufe
    R
    And I think you swallowed one too many Apple ads.
  • 7 Stimmen
    6 Beiträge
    3 Aufrufe
    db0@lemmy.dbzer0.comD
    VC-backed OpenAI is the most valuable company in the world and is engaging in massive environmental destruction. The US state just went into cahoots with them to the tune of billions VC-backed Uber and AirBnb disrupted multiple estabilished industries for the worst by undercutting them through loss-leading. VC-backed Facebook killed or purchased all its rivals and consolidated almost all social media to the detriment of the whole world.
  • The AI girlfriend guy - The Paranoia Of The AI Era

    Technology technology
    1
    1
    6 Stimmen
    1 Beiträge
    2 Aufrufe
    Niemand hat geantwortet
  • Britain’s Companies Are Being Hacked

    Technology technology
    9
    1
    21 Stimmen
    9 Beiträge
    4 Aufrufe
    D
    Is that "goodbye" in Russian? Why?
  • 0 Stimmen
    6 Beiträge
    2 Aufrufe
    L
    Divide and conquer. Non state-actors and special interest have a far easier time attacking a hundred small entities than one big one. Because people have much less bandwidth to track all this shit than it is to spread it around. See ALEC and the strategy behind state rights. In the end this is about economic power. The only way to curb it is through a democratic government. Lemmy servers too can be bought and sold and the communities captured that grew on them.
  • The Enshitification of Youtube’s Full Album Playlists

    Technology technology
    3
    1
    108 Stimmen
    3 Beiträge
    5 Aufrufe
    dual_sport_dork@lemmy.worldD
    Especially when the poster does not disclose that it's AI. The perpetual Youtube rabbit hole occasionally lands on one of these for me when I leave it unsupervised, and usually you can tell from the "cover" art. But only if you're looking at it. Because if you just leave it going in the background eventually you start to realize, "Wow, this guy really tripped over the fine line between a groove and rut." Then you click on it and look: Curses! Foiled again. And golly gee, I'm sure glad Youtube took away the option to oughtright block channels. I'm sure that's a total coincidence. W/e. I'm a have-it-on-my-hard-drive kind of bird. Yt-dlp is your friend. Just use it to nab whatever it is you actually want and let your own media player decide how to shuffle and present it. This works great for big name commercial music as well, whereupon the record labels are inevitably dumb enough to post songs and albums in their entirety right there you Youtube. Who even needs piracy sites at that rate? Yoink!
  • 33 Stimmen
    8 Beiträge
    4 Aufrufe
    J
    Apparently, it was required to be allowed in that state: Reading a bit more, during the sentencing phase in that state people making victim impact statements can choose their format for expression, and it's entirely allowed to make statements about what other people would say. So the judge didn't actually have grounds to deny it. No jury during that phase, so it's just the judge listening to free form requests in both directions. It's gross, but the rules very much allow the sister to make a statement about what she believes her brother would have wanted to say, in whatever format she wanted. From: https://sh.itjust.works/comment/18471175 influence the sentence From what I've seen, to be fair, judges' decisions have varied wildly regardless, sadly, and sentences should be more standardized. I wonder what it would've been otherwise.