Discover the most powerful AI tools in this category, with pricing, features, demos, and use cases
ArXiv Dataset is not a singular AI tool or model, but rather a vast repository of scientific preprin...
MNIST is a foundational dataset of handwritten digits, widely used for training and evaluating machi...
CIFAR-10 is a widely used benchmark dataset for image classification tasks, consisting of 60,000 32x...
ImageNet is a foundational large-scale visual database designed for use in visual object recognition...
LAION-400M is a massive, open-source dataset containing about 400 million image-text pairs, primarily used...
CIFAR-100 is a widely used benchmark dataset for image classification tasks, containing 100 fine-gra...
COCO (Common Objects in Context) is a large-scale object detection, segmentation, and captioning dat...
Fashion-MNIST is a benchmark dataset of 70,000 28x28 grayscale images of 10 fashion categories, wide...
Wikipedia Dump is a massive, publicly available dataset of Wikipedia articles, offering an unparalle...
LAION-5B is a massive, open-source dataset of 5.85 billion image-text pairs, designed to facilitate ...
Amazon Personalize is a fully managed machine learning service that makes it easy for developers to ...
The Stack Exchange Data Dump provides a comprehensive, publicly accessible collection of anonymized ...
The Open Images Dataset is a large-scale, open dataset of ~9 million images annotated with image-lev...
SVHN (Street View House Numbers) is a computer vision dataset used for training and evaluating digit...
Flickr30k is a large-scale dataset for image captioning and visual-linguistic research, comprising o...
Common Crawl is a non-profit organization that provides a massive, open repository of web crawl data...
The Pile is a massive, diverse dataset curated for training large language models, encompassing a wi...
OpenWebText is an open-source dataset designed to replicate the quality and diversity of OpenAI's We...
WikiText-103 is a large text dataset primarily used for language modeling tasks, serving as a benc...
Project Gutenberg is a massive, free online library of over 60,000 eBooks, focusing on public d...
CelebA is a large-scale dataset of celebrity images with annotations for various facial attributes, ...
LibriSpeech is a large-scale, open-source dataset of read English speech used for training and evalu...
VoxCeleb is a large-scale dataset for speaker recognition and speaker diarization, comprising a vast...
Common Voice is an open-source initiative by Mozilla to collect diverse voice data, enabling the tra...
AudioSet is a large-scale dataset containing diverse audio events annotated with semantic labels, pr...
VQAv2 is a benchmark dataset and evaluation metric for Visual Question Answering (VQA) systems, desi...
CLIP Benchmark Dataset is a curated collection of image-text pairs designed to evaluate the zero-sho...
MIMIC-III is a benchmark dataset that enables research in critical care medicine. It contai...
KITTI is a specialized computer vision benchmark dataset and associated software development kit, pr...
The Waymo Open Dataset is a comprehensive, large-scale dataset for autonomous driving research, prov...