linux-nerds.org

[JS Required] MiniMax M1 model claims Chinese LLM crown from DeepSeek - plus it's true open-source

Technology

13 posts, 9 commenters, 24 views

pro@programming.dev wrote (#1):

This post did not contain any content.

MiniMax Official Website - Intelligence with everyone

MiniMax is a leading global technology company and one of the pioneers of large language models (LLMs) in Asia. Our mission is to build a world where intelligence thrives with everyone.

(www.minimax.io)

camilobotero@feddit.dk wrote (#2):

Well…

lwd@lemm.ee wrote (#3):

DeepSeek imposes similar restrictions, but only on their website. You can self-host and then enjoy relatively truthful (as truthful as a bullshit generator can be) answers about Tiananmen Square, Palestine, and South Africa (something American-made bullshit generators apparently like making up, to appease their corporate overlords or conspiracy theorists respectively).

lwd@lemm.ee wrote (#4):

What exactly makes this more "open source" than DeepSeek? The linked page doesn't make that particularly clear.

DeepSeek doesn't release their training data (but they release a hell of a lot of other stuff), and I think that's about as "open" as these companies can get before they risk running afoul of copyright issues. Since you can't compile the model from scratch, it's not really open source. It's just freeware. But that's true for both models, as far as I can tell.

trimatrix@lemmy.world wrote (#5):

Nope, self-hosted DeepSeek 8B thinking and distilled variants still clam up about Tiananmen Square.

lwd@lemm.ee wrote (#6), last edited by lwd@lemm.ee:

If you're talking about the distillations, AFAIK they take somebody else's model and run it through their (actually open-source) distiller. I tried a couple of those models because I was curious. The distilled Qwen model is cagey about Tiananmen Square, but Qwen was made by Alibaba. The distillation of a US-made model did not have this problem.

(Edit: we're talking about these distillations, right? If somebody else ran a test and posted it online, I'm not privy to it.)

I don't have enough RAM to run the full DeepSeek R1, but AFAIK it doesn't have this problem. Maybe it does.

In case it isn't clear, BTW, I do despise LLMs and AI in general. The biggest issue with their lies (leaving aside every other issue with them for a moment) isn't the glaringly obvious stuff. Not Tiananmen Square, and certainly not the "it's woke!" complaints about generating images of black founding fathers. The worst lies are the subtle and insidious little details like agreeableness - trying to get people to spend a little more time with them, which apparently turns once-reasonable people into members of micro-cults. As with cults, perhaps, some skeptics think they can join in and not fall for the BS... And then they do.

All four students had by now joined their chosen groups... Hugh had completely disappeared into a nine-week Arica training seminar; he was incommunicado and had mumbled something before he left about “how my energy has moved beyond academia.”

fmstrat@lemmy.nowsci.com wrote (#7):

Open weights plus an OSI-approved license is generally what people mean when they call a model open source. With that said, DeepSeek R1 is under an MIT license, and this one is Apache 2.0. Technically that makes DeepSeek less restrictive, but who knows.

xcjs@programming.dev wrote (#8):

That's not how distillation works if I understand what you're trying to explain.

If you distill model A to a smaller model, you just get a smaller version of model A with the same approximate distribution curve of parameters, but fewer of them. You can't distill Llama into Deepseek R1.

I've been able to run distillations of Deepseek R1 up to 70B, and they're all censored still. There is a version of Deepseek R1 "patched" with western values called R1-1776 that will answer topics censored by the Chinese government, however.
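For readers who want something concrete, here is a minimal sketch of the classic "match the teacher's output distribution" form of distillation. It is an illustration under stated assumptions, not a description of how the DeepSeek R1 distillations were actually produced (nothing in this thread confirms their training recipe). The function names and the toy three-token logits are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened distributions.

    Training a student to minimise this pushes its next-token
    probabilities toward the teacher's.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical logits over a three-token vocabulary (made up for illustration).
teacher = [4.0, 1.0, 0.5]
close_student = [3.8, 1.1, 0.4]   # nearly agrees with the teacher
far_student = [0.5, 1.0, 4.0]     # prefers a different token

print(distillation_loss(teacher, close_student))
print(distillation_loss(teacher, far_student))
```

The first printed loss is smaller than the second, because the first student's distribution is closer to the teacher's. The point relevant to this thread: whatever the teacher prefers to say or refuses to say is part of the distribution the student is trained to copy.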

lwd@lemm.ee wrote (#9):

I've been able to run distillations of Deepseek R1 up to 70B

Where do you find those?

There is a version of Deepseek R1 "patched" with western values called R1-1776 that will answer topics censored by the Chinese government, however.

Thank you for mentioning this, as I finally confronted my own preconceptions and actually found an article by Perplexity showing that R1 itself has a demonstrable pro-China bias.

Although Perplexity's own description should cause anybody who understands the nature of LLMs to pause. They describe it in their header as a

version of the DeepSeek-R1 model that has been post-trained to provide unbiased, accurate, and factual information.

That's a bold (read: bullshit) statement, considering they only altered its biases on China. I wouldn't consider the original model to be unbiased either, but apparently Perplexity is giving them a pass on everything else. I guess it's part of the grand corporate lie that claims "AI is unbiased," a delusion that Perplexity needs to maintain.

ngnius@lemmy.ca wrote (#10):

Yup, this is open weights just like DeepSeek. Open source should mean their source data is also openly available, but we all know companies won't do that until they stop violating copyright to train these things.

freewilliam@lemmy.ml wrote (#11):

Yay another LLM! That's definitely what the world needs and don't let anyone make you think otherwise. This is so fun guys. Let's fund the surveillance, stealing, misinformation, harmful biases, and destruction of the planet. I can't believe some people think that humanity is more important than another "open source" crazy pro max ultra 8K AI 9999!

lwd@lemm.ee wrote (#12):

I figured as much. Even this line...

M1's capabilities are top-tier among open-source models

... is right above a chart that calls it "open-weight".

I dislike the conflation of terms that the OSI has helped legitimize. Up until LLMs, nobody called binary blobs "open-source" just because they were compiled using open-source tooling. That would be ridiculous.

semperverus@lemmy.world wrote (#13):

You want abliterated models, not distilled.