linux-nerds.org

ChatGPT Mostly Source Wikipedia; Google AI Overviews Mostly Source Reddit

Technology

21 posts, 15 commenters, 50 views

chulk@lemmy.ml

Throughout most of my years of higher education, as well as K-12, I was told that sourcing Wikipedia was forbidden. In fact, many professors and teachers would automatically fail an assignment if they felt you were using Wikipedia. The claim was that the information was often inaccurate or changed too frequently to be reliable. This reasoning, while irritating at times, always made sense to me.

Fast forward to my professional life today. I've been told on a number of occasions that I should trust LLMs to give me an accurate answer. I'm told that I will "be left behind" if I don't use ChatGPT to accomplish things faster. I'm told that my concerns about accuracy and ethics surrounding generative AI are simply "negativity."

These tools are (abstractly) referencing random users on the internet as well as Wikipedia and treating them both as legitimate sources of information. That seems crazy to me. How can we trust a technology that just references flawed sources from our past? I know there are ways to improve accuracy with things like RAG, but most people are hitting the LLM directly.

The culture around Generative AI should be scientific and cautious, but instead it feels like a cult with a good marketing team.

crawancon@lemm.ee
#6

All good points.

I think the tech is not being governed by the technically inclined, and/or the technically inclined are not involved enough in the business side. Either way, there's a huge lack of governance over tools that are growing into the place search requests go. You're right: it feels like marketing won. That really happened a long time ago, but carrying it forward with each new technical advance just leads to awful shit.
See: microtransactions.

pro@programming.dev

A study from Profound of OpenAI's ChatGPT, Google AI Overviews and Perplexity shows that while ChatGPT mostly sources its information from Wikipedia, Google AI Overviews and Perplexity mostly source their information from Reddit.

ChatGPT Sources Mostly From Wikipedia While Google AI Overviews Sources Mostly From Reddit

Search Engine Roundtable (www.seroundtable.com)

nightlily@leminal.space
#7

Anyone who has any domain knowledge and experience knows how much of reddit is just repeated debunked falsehoods and armchair takes. Please continue to poison your LLMs with it.

tabular@lemmy.world
#8

Wikipedia content is usually copyleft, isn't it? BigAI doing the BigEvil: redistribution without attribution and without reaffirming the rights that copyleft gives back from copyright.

Guest
#9

Original source instead of blogspam: https://www.tryprofound.com/blog/ai-platform-citation-patterns

magicshel@lemmy.zip
#10

I used ChatGPT on something and got a response sourced from Reddit. I told it I'd be more likely to believe the answer if it told me it had simply made up the answer. It then provided better references.

I don't remember what it was but it was definitely something that would be answered by an expert on Reddit, but would also be answered by idiots on Reddit and I didn't want to take chances.

mniot@programming.dev
#11

I think the academic advice about Wikipedia was sadly mistaken. It's true that Wikipedia contains errors, but so do other sources. The problem was that it was a new thing and the idea that someone could vandalize a page startled people. It turns out, though, that Wikipedia has pretty good controls for this over a reasonable time-window. And there's a history of edits. And most pages are accurate and free from vandalism.

Just as you should not uncritically read any of your other sources, you shouldn't uncritically read Wikipedia as a source. But if you are going to uncritically read, Wikipedia's far from the worst thing to blindly trust.

edryd@lemmy.world
#12

The reasons commonly given for not citing Wikipedia often miss the main one. You shouldn't cite Wikipedia because it is not a source of information; it is a summary of other sources, which it references.

You shouldn't cite Wikipedia for the same reason you shouldn't cite a library's book report: you should read and cite the book itself. Libraries are a great resource, and their reading lists and summaries of books can be a great starting point for research, just like Wikipedia. But citing the library instead of the book is just intellectual laziness and shows any researcher that you are not serious.

Wikipedia itself also says the same thing:
https://en.m.wikipedia.org/wiki/Wikipedia:Citing_Wikipedia

chulk@lemmy.ml
#13

> You shouldn’t cite Wikipedia because it is not a source of information, it is a summary of other sources which are referenced.

Right, and if an LLM is citing Wikipedia 47.9% of the time, that means that it's summarizing Wikipedia's summary.

> You shouldn’t cite Wikipedia for the same reason you shouldn’t cite a library’s book report, you should read and cite the book itself.

Exactly my point.

chulk@lemmy.ml
#14

> I think the academic advice about Wikipedia was sadly mistaken.

Yeah, a lot of people had your perspective about Wikipedia while I was in college, but they are wrong, according to Wikipedia.

From the link:

> We advise special caution when using Wikipedia as a source for research projects. Normal academic usage of Wikipedia is for getting the general facts of a problem and to gather keywords, references and bibliographical pointers, but not as a source in itself. Remember that Wikipedia is a wiki. Anyone in the world can edit an article, deleting accurate information or adding false information, which the reader may not recognize. Thus, you probably shouldn't be citing Wikipedia. This is good advice for all tertiary sources such as encyclopedias, which are designed to introduce readers to a topic, not to be the final point of reference. Wikipedia, like other encyclopedias, provides overviews of a topic and indicates sources of more extensive information.

I personally use ChatGPT like I would Wikipedia. It's a great introduction to a subject, especially in my line of work, which is software development. I can get summarized information about new languages and frameworks really quickly, and then I can dive into the official documentation once I have a high-level understanding of the topic at hand. Unfortunately, most people do not use LLMs this way.

antonim@lemmy.dbzer0.com
#15

> I think the academic advice about Wikipedia was sadly mistaken.

It wasn't mistaken 10 or especially 15 years ago, however. Check how some articles looked back then: you'll see vastly fewer sources and, overall, less professional-looking text. These days I think most professors will agree that it's fine as a starting point (depending on the subject, at least; I still come across unsourced nonsensical crap here and there, slowly correcting it myself).

mniot@programming.dev
#16

I think it was. When I think of Wikipedia, I'm thinking about how it was in ~2005 (20 years ago) and it was a pretty solid encyclopedia then.

There were (and still are) some articles that are very thin. And some that have errors. Both of these things are true of non-wiki encyclopedias. When I've seen a poorly-written article, it's usually on a subject that a standard encyclopedia wouldn't even cover. So I feel like that was still a giant win for Wikipedia.

mniot@programming.dev
#17

> This is good advice for all tertiary sources such as encyclopedias, which are designed to introduce readers to a topic, not to be the final point of reference. Wikipedia, like other encyclopedias, provides overviews of a topic and indicates sources of more extensive information.

The whole paragraph is kinda FUD except for this. Normal research practice is to (get ready for a shock) do research and not just copy a high-level summary of what other people have done. If your professors were saying, "don't cite encyclopedias, which includes Wikipedia" then that's fine. But my experience was that Wikipedia was specifically called out as being especially unreliable and that's just nonsense.

> I personally use ChatGPT like I would Wikipedia

Eesh. The value of a tertiary source is that it cites the secondary sources (which cite the primary). If you strip that out, how's it different from "some guy told me..."? I think your professors did a bad job of teaching you about how to read sources. Maybe because they didn't know themselves.

kingthrillgore@lemmy.ml
#18

I would be hesitant to use either as a primary source...

antonim@lemmy.dbzer0.com
#19

In 2005 the article on William Shakespeare contained references to a total of 7 different sources, including a page describing how his name is pronounced, Plutarch, and "Catholic Encyclopedia on CD-ROM". It contained more text discussing Shakespeare's supposed Catholicism than his actual plays, which were described only in the most generic terms possible. I'm not noticing any grave mistakes while skimming the text, but it really couldn't pass for a reliable source or a traditionally solid encyclopedia. And that's the page on the best-known English writer; slightly less popular topics were obviously much shoddier.

It had its significant upsides already back then, sure, no doubt about that. But the teachers' skepticism wasn't all that unwarranted.

chulk@lemmy.ml
#20

> my experience was that Wikipedia was specifically called out as being especially unreliable and that's just nonsense.

Let me clarify, then. It's unreliable as a cited source in academia. I'm drawing parallels and criticizing the way people use ChatGPT, i.e. taking it at face value with zero caution and using it as if it's a primary source of information.

> Eesh. The value of a tertiary source is that it cites the secondary sources (which cite the primary). If you strip that out, how's it different from "some guy told me..."? I think your professors did a bad job of teaching you about how to read sources. Maybe because they didn't know themselves.

Did you read beyond the sentence that you quoted?

Here:

> I can get summarized information about new languages and frameworks really quickly, and then I can dive into the official documentation when I have a high level understanding of the topic at hand.

Example: you're a junior developer trying to figure out what this JavaScript syntax is: `const {x} = response?.data`. It's difficult to figure out what destructuring and optional chaining are without knowing what they're called.

With ChatGPT, you can copy and paste that code and ask, "Tell me what every piece of syntax is in this line of JavaScript." Then you can check the official docs to learn more.
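For readers who have not met the two features named above, here is a minimal sketch of what that one line does. The `response` object and the names `missing`, `fallbackX`, and `unused` are made up for illustration and do not come from any real API. The sketch also shows one behavior worth checking against the official docs: optional chaining only protects the property access, so destructuring can still throw when the result is `undefined`.

```javascript
// Hypothetical response object, invented for this illustration.
const response = { data: { x: 42, y: "ignored" } };

// Optional chaining: response?.data evaluates to undefined instead of
// throwing when response is null or undefined.
// Destructuring: { x } copies the property named x into a local constant.
const { x } = response?.data;

// Same pattern with a missing response. Optional chaining yields undefined,
// so a fallback object is supplied with the nullish coalescing operator (??)
// before destructuring.
const missing = undefined;
const { x: fallbackX } = missing?.data ?? {};

// Without the fallback, the destructuring step itself throws a TypeError,
// because properties cannot be read from undefined.
let threw = false;
try {
  const { x: unused } = missing?.data;
} catch (err) {
  threw = err instanceof TypeError;
}

console.log(x, fallbackX, threw); // 42 undefined true
```

Searching the official documentation for "destructuring assignment", "optional chaining", and "nullish coalescing" is then straightforward, which is the point of asking for the names first.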

sqgl@sh.itjust.works
#21

Sure: for professionals.

However, when casually commenting in a forum, it is fine because the reader can go check the citations (and perhaps come back and add to the thread).