Wikipedia offers AI developers a training data set to get scraper bots off its back

By Admin, April 17, 2025

Wikipedia has been struggling with the impact of AI crawlers, bots that scrape text and multimedia from the encyclopedia to train generative artificial intelligence models. These bots put a heavy load on its servers, which leads to increased costs and, in some cases, slower load times for human users. In an attempt to stop bots from bogging down the public Wikipedia website and soaking up too much bandwidth, the Wikimedia Foundation (which manages Wikipedia's data) is offering AI developers a data set they can freely use.

The organization has partnered with Kaggle, a data science platform, to offer a beta version of a structured data set in both English and French. According to Google, which owns Kaggle, the data set is formatted for machine learning to make it more useful for training, development and data science.

Wikimedia Enterprise says that the data set includes summaries, short descriptions, infobox key-value data, image links and clearly segmented article sections. It leaves out references and other non-prose elements, such as videos. The lack of references could make the question of attribution for information in the data set somewhat murky. However, Wikimedia Enterprise (the part of the Wikimedia Foundation that makes Wikipedia data available via API) says the content of the data set is freely licensed under Creative Commons, public domain and so on, as is everything on Wikipedia.

This article originally appeared on Engadget at https://www.engadget.com/Ai/wikipedia-offers-ai-evelopers-a-training-to-Maybe-sCraper-Bots-Nits-Back-Back-Back-Back
