The AI company Perplexity is complaining their bots can't bypass Cloudflare's firewall

  • Can't believe I've lived to see Cloudflare be the good guys

    Lesser of two bad guys maybe?

  • Cloudflare are notorious for shielding cybercrime sites. You can't even complain to them about abuse of Cloudflare; they'll just forward your abuse complaint to the likely dodgy host of the cybercrime site. They don't even have a channel for complaining to them about network abuse of their DNS services.

    So they certainly are an enabler of the cybercriminals they purport to protect people from.

    If they acted differently, they'd probably be liable for illegal activity that they proxy for (this is for example relevant for the DMCA safe harbor).

    Anyhow, when on their abuse page, I have an option for "Registrar", which is used for "DNS abuse", among others.

  • Cloudflare are notorious for shielding cybercrime sites. You can't even complain to them about abuse of Cloudflare; they'll just forward your abuse complaint to the likely dodgy host of the cybercrime site. They don't even have a channel for complaining to them about network abuse of their DNS services.

    So they certainly are an enabler of the cybercriminals they purport to protect people from.

    Any internet service provider needs to be completely neutral. Not only in their actions, but also in their liability.
    Same goes for other services like payment processors.
    If companies that provide content-agnostic services are allowed to police the content, that opens the door to really nasty stuff.

    You can't chop everyone's arms to stop a few people from stealing.

    If they think their services are being used in a reprehensible manner, what they need to do is alert the authorities, not act like vigilantes.

  • That all sounds very vague to me, and I don't expect it to be captured properly by law any time soon. Being accessed for LLM? What does it mean for you and how is it different from being accessed by a user? Imagine you host a weather forecast. If that information is public, what kind of compensation do you expect from anyone or anything who accesses that data?

    Is it okay for a person to access your site? Is it okay for a script written by that person to fetch data every day automatically? Would it be okay for a user to dump a page of your site with a headless browser? Would it be okay to let an LLM take a look at it to extract info required by a user? Have you heard about changedetection.io project? If some of these sound unfair to you, you might want to put a DRM on your data or something.

    Would you expect a compensation from me after reading your comment?

    That all sounds very vague to me, and I don’t expect it to be captured properly by law any time soon.

    It already has been captured, properly in law, in most places. We can use the US as an example: Both intellectual property and real property have laws already that cover these very items.

    What does it mean for you and how is it different from being accessed by a user?

    Well, does a user burn up gigawatts of power, to access my site every time? That's a huge difference.

    Imagine you host a weather forecast. If that information is public, what kind of compensation do you expect from anyone or anything who accesses that data?

    Depends on the terms of service I set for that service.

    Is it okay for a person to access your site?

    Sure!

    Is it okay for a script written by that person to fetch data every day automatically?

    Sure! As long as it doesn't cause problems for me, the creator and hoster of said content.

    Would it be okay for a user to dump a page of your site with a headless browser?

    See above. Both power usage and causing problems for me.

    Would it be okay to let an LLM take a look at it to extract info required by a user?

    No. I said, I do not want my content and services to be used by and for LLMs.

    Have you heard about changedetection.io project?

    I have now. And should a user want to use that service, then that service, which charges 8.99/month for it, needs to pay me a portion of that or risk being blocked.

    There's no need to use it, as I already provide RSS feeds for my content. Use the RSS feed, if you want updates.

    If some of these sound unfair to you, you might want to put a DRM on your data or something.

    Or, I can just block them, via a service like Cloudflare. Which I do.

    Would you expect a compensation from me after reading your comment?

    None. Unless you're wanting to access it via an LLM. Then I want compensation for the profit-driven access to my content.

  • I don't see a problem here. Maybe Perplexity should consider the reasons WHY Cloudflare have a firewall...?

  • Recaptcha v2 does way more than check if the box was checked.

    You're not wrong, but it also allows more than 99.8% of the bot traffic through on text challenges. It's like the TSA of website security. It's mostly there to keep the user busy while Cloudflare places itself as a man in the middle of your encrypted connection to a third party. The only difference between Cloudflare and a malicious attacker is Cloudflare's stated intention not to be evil. With that and 3 dollars I can buy myself a single hard-shell taco from Taco Bell.

  • Site owners currently do and should have the freedom to decide who is and is not allowed to access the data, and to decide for what purpose it gets used for. Idgaf if you think scraping is malicious or not, it is and should be illegal to violate clear and obvious barriers against them at the cost of the owners and unsanctioned profit of the scrapers off of the work of the site owners.

    to decide for what purpose it gets used for

    Yeah, fuck everything about that. If I'm a site visitor I should be able to do what I want with the data you send me. If I bypass your ads, or use your words to write a newspaper article that you don't like, tough shit. Publishing information is choosing not to control what happens to the information after it leaves your control.

    Don't like it? Make me sign an NDA. And even then, violating an NDA isn't a crime, much less a felony punishable by years of prison time.

    Interpreting the CFAA to cover scraping is absurd and draconian.

  • Both intellectual property and real property have laws already that cover these very items.

    And it causes a lot of trouble to many people and pains me specifically. Information should not be gated or owned in a way that would make it illegal for anyone to access it under proper conditions. License expiration causing digital work to die out, DRM causing software to break, idiotic license owners not providing appropriate service, etc.

    Well, does a user burn up gigawatts of power, to access my site every time?

    Doing a GET request doesn't do that.

    As long as it doesn't cause problems for me, the creator and hoster of said content.

    What kind of problems would that be?

    Both power usage and causing problems for me.

    ?? How? And what?

    do not want my content and services to be used by and for LLMs.

    You have to agree that at some point "be used by an LLM" would be no different from "be used by a user".

    which charges 8.99/month

    It's self-hosted and free.

    Use the RSS feed, if you want updates.

    How does that prohibit usage and processing of your info? That sounds like "I won't be providing any comments on Lemmy website, if you want my opinion you can mail me at a@b.com"

    I can just block them, via a service like Cloudflare. Which I do.

    That will never block all of them. Your info will be used without your consent and you will not feel troubled by it. So you might not feel troubled if more things do the same.

    None. Unless you're wanting to access it via an LLM. Then I want compensation for the profit-driven access to my content.

    What if I use my locally hosted LLM? Anyway, the point is, selling text can't work well, and you're going to spend far more resources on collecting and summarizing data about how your text was used and how others benefited from it, in order to get compensation, than it's worth.

    Also, it might be the case that some information is actually worthless when compared to a service provided by things like LLMs, even though they use that worthless information in the process.

    I'm all for killing off LLMs, btw. Concerns of site makers who think they are being damaged by things like Perplexity are nothing compared to what LLMs do to the world. Maybe laws should instead make it illegal to waste energy. Before energy becomes the main currency.

  • to decide for what purpose it gets used for

    Yeah, fuck everything about that. If I'm a site visitor I should be able to do what I want with the data you send me. If I bypass your ads, or use your words to write a newspaper article that you don't like, tough shit. Publishing information is choosing not to control what happens to the information after it leaves your control.

    Don't like it? Make me sign an NDA. And even then, violating an NDA isn't a crime, much less a felony punishable by years of prison time.

    Interpreting the CFAA to cover scraping is absurd and draconian.

    If you want anybody and everyone to be able to use everything you post for any purpose, right on, good for you, but don't try to force your morality on others who rely on their writing, programming, and artworks to make a living to survive.

  • If you want anybody and everyone to be able to use everything you post for any purpose, right on, good for you, but don't try to force your morality on others who rely on their writing, programming, and artworks to make a living to survive.

    I'm gonna continue to use ad blockers and yt-dlp, and if you think I'm a criminal for doing so, I'm gonna say you don't understand either technology or criminal law.

  • I'm gonna continue to use ad blockers and yt-dlp, and if you think I'm a criminal for doing so, I'm gonna say you don't understand either technology or criminal law.

    That's a crime, yeah, and if Alphabet wants to sue you for $1.34 in damages, then they have that right, just as we should have the right to sue them if their AI crawlers make our site unusable and plagiarize our work to the tune of thousands of dollars, or even press charges for the criminal act of intentional disruption of services.
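
For what it's worth, the line several replies draw between a well-behaved daily script and a crawler that ignores a site's wishes can be sketched in code. Below is a minimal sketch, assuming a site publishes its preferences in robots.txt, using Python's standard `urllib.robotparser`. The robots.txt content, the bot name `ExampleLLMBot`, the script name `DailyWeatherScript`, and the example.com URLs are all made up for illustration and are not taken from any real site or crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: an illustrative LLM crawler is shut out entirely,
# everyone else may fetch public pages but is asked to wait between requests.
ROBOTS_TXT = """\
User-agent: ExampleLLMBot
Disallow: /

User-agent: *
Disallow: /private/
Crawl-delay: 10
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the site's stated rules allow this agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)

# A polite daily script checks before it fetches and honours the requested delay.
if may_fetch(parser, "DailyWeatherScript", "https://example.com/forecast"):
    delay = parser.crawl_delay("DailyWeatherScript")
    print(f"allowed; wait {delay} seconds between requests")

# The illustrative LLM crawler is told not to fetch anything at all.
print(may_fetch(parser, "ExampleLLMBot", "https://example.com/feed.xml"))
```

Note that robots.txt is purely advisory: nothing in this check stops a client that chooses to skip it, which is why some site owners in the thread rely on blocking at a firewall instead.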
