Non-Google search engines blocked from showing recent Reddit results

Updated robots.txt file hits Bing and others without a Reddit deal.

By: arstechnica.com

Jul 25 2024

Recent discussions on Reddit are no longer showing up in non-Google search engine results. The absence is the result of updates to Reddit’s Content Policy that ban crawling its site without agreeing to Reddit’s rules, which bar using Reddit content for AI training without Reddit’s explicit consent.

As reported by 404 Media, using "site:reddit.com" on non-Google search engines, including Bing, DuckDuckGo, and Mojeek, brings up minimal or no Reddit results from the past week. Ars Technica made searches on these and other search engines and can confirm the findings. Brave, for example, sometimes brings up a few Reddit results, but not nearly as many as Google returns for identical queries. A standout is Kagi, a paid-for engine that pays Google for some of its search index and still shows recent Reddit results.

As 404 Media noted, Reddit's Robots Exclusion Protocol (robots.txt file) blocks bots from scraping the site. The protocol also states, "Reddit believes in an open Internet, but not the misuse of public content." Reddit has approved scrapers from the Internet Archive and some research-focused entities.
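To illustrate how a compliant crawler reads such a file, here is a minimal sketch using Python's standard-library robots.txt parser. The rules shown are a hypothetical blanket disallow, not a copy of Reddit's actual file, and `ExampleBot` and the `may_fetch` helper are made-up names used only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: a blanket disallow for every crawler.
# This is an assumption for illustration, not Reddit's actual robots.txt.
BLOCK_ALL_RULES = """\
User-agent: *
Disallow: /
"""


def may_fetch(user_agent: str, url: str, rules: str = BLOCK_ALL_RULES) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A crawler that honors the file checks before requesting a page.
    print(may_fetch("ExampleBot", "https://www.reddit.com/r/technology/"))
```

Note that robots.txt is advisory: it only stops crawlers that choose to honor it, which is why sites often pair it with terms of use or other technical measures.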

Reddit announced changes to its robots.txt file on June 25. Ahead of the changes, it said it had "seen an uptick in obviously commercial entities who scrape Reddit and argue that they are not bound by our terms or policies. Worse, they hide behind robots.txt and say that they can use Reddit content for any use case they want."

Last month, Reddit said that any "good-faith actor" could reach out to Reddit to try to work with the company, linking to an online form. However, Colin Hayhurst, Mojeek's CEO, told me via email that he reached out to Reddit after he was blocked but that Reddit "did not respond to many messages and emails." He noted that since 404 Media's report, Reddit CEO Steve Huffman has reached out.

Google's search stranglehold tightens

With Google being virtually the only search engine that can show recent Reddit results—at least for now—Reddit has inadvertently helped tighten Google's stranglehold on the search industry. The change comes amid recent quality concerns about Google results, which have ranked SEO and AI spam farms, ads, and e-commerce links higher than more relevant results. There are also worries about Google's AI Overview.

When reached for comment, Reddit spokesperson Tim Rathschmidt said via email that Reddit has been in talks "with multiple search engines." He added:

"We have been unable to reach agreements with all of them, since some are unable or unwilling to make enforceable promises regarding their use of Reddit content, including their use for AI."

After Reddit declared war on free use of its content for AI training (which also resulted in an API access price hike that shuttered many third-party Reddit apps), Reddit signed a deal at a reported $60 million per year that lets Google use Reddit data to train its AI. It was expected that Reddit would try to strike a similar deal with Microsoft, but it seems the parties could not reach an agreement in line with Reddit's content policy, which also includes rules about user privacy and deleted content, for example.

A spokesperson for Microsoft told me, "We respect the robots.txt standard."

A statement shared with Search Engine Land went further, adding, "Bing stopped crawling Reddit after they implemented their updated robots.txt file on July 1, which prohibits all crawling of their site." In October, The Washington Post, citing an anonymous source, reported that Reddit was considering blocking Bing search crawlers if it couldn't reach a deal with Microsoft.

As 404 Media pointed out, Reddit's guide for accessing its data names "search or website ads" as a commercial use warranting fees. It's unclear how much money other search engines would need to spend to be permitted to scrape the platform. Rathschmidt said Reddit is "open to working with partners big and small."

"It’s bad for the health of the Internet for for-profit companies to scrape our content without constraint and use it for, among other things, [training] AI models," he said.

For now, Google can continue leaning on Reddit to help make search results more relevant. Google didn't respond to Ars' request for comment.

Meanwhile, alternative search engines may find it harder to compete.

"With our own ranking algorithms, previously users would often find different pages on Reddit than they might find with Google and others," Mojeek's Hayhurst told me.

The CEO added that while being blocked by Reddit alone "is not a huge deal," he is concerned about the precedent it could set. "Search engines are the main traffic source for most websites, and a spreading of this behavior will further choke off traffic. And smaller sites will be impacted even more than large sites," he said.

Advance Publications, which owns Ars Technica parent Condé Nast, is the largest shareholder of Reddit.
