Discover the most powerful AI tools in this category, with pricing, features, demos, and use cases

ArXiv Dataset is not a single AI tool or model, but rather a vast repository of scientific preprin...

MNIST is a foundational dataset of handwritten digits, widely used for training and evaluating machi...

CIFAR-10 is a widely used benchmark dataset for image classification tasks, consisting of 60,000 32x...

ImageNet is a foundational large-scale visual database designed for use in visual object recognition...

LAION-400M is a large, open-source dataset containing about 400 million image-text pairs, primarily used...

CIFAR-100 is a widely used benchmark dataset for image classification tasks, containing 100 fine-gra...

COCO (Common Objects in Context) is a large-scale object detection, segmentation, and captioning dat...

Fashion-MNIST is a benchmark dataset of 70,000 28x28 grayscale images of 10 fashion categories, wide...

Wikipedia Dump is a massive, publicly available dataset of Wikipedia articles, offering an unparalle...

LAION-5B is a massive, open-source dataset of 5.85 billion image-text pairs, designed to facilitate ...

Amazon Personalize is a fully managed machine learning service that makes it easy for developers to ...

The Stack Exchange Data Dump provides a comprehensive, publicly accessible collection of anonymized ...

The Open Images Dataset is a large-scale, open dataset of ~9 million images annotated with image-lev...

SVHN (Street View House Numbers) is a computer vision dataset used for training and evaluating digit...

Flickr30k is a large-scale dataset for image captioning and visual-linguistic research, comprising o...

Common Crawl is a non-profit organization that provides a massive, open repository of web crawl data...

The Pile is a massive, diverse dataset curated for training large language models, encompassing a wi...

OpenWebText is an open-source dataset designed to replicate the quality and diversity of OpenAI's We...

WikiText-103 is a large dataset primarily used for language modeling tasks, serving as a benc...

Project Gutenberg is a free online library of over 60,000 eBooks, focusing on public d...

CelebA is a large-scale dataset of celebrity images with annotations for various facial attributes, ...

LibriSpeech is a large-scale, open-source dataset of read English speech used for training and evalu...

VoxCeleb is a large-scale dataset for speaker recognition and speaker diarization, comprising a vast...

Common Voice is an open-source initiative by Mozilla to collect diverse voice data, enabling the tra...

AudioSet is a large-scale dataset containing diverse audio events annotated with semantic labels, pr...

VQAv2 is a benchmark dataset and evaluation metric for Visual Question Answering (VQA) systems, desi...

CLIP Benchmark Dataset is a curated collection of image-text pairs designed to evaluate the zero-sho...

MIMIC-III is a widely used benchmark dataset that enables research in critical care medicine. It contai...

KITTI is a specialized computer vision benchmark dataset and associated software development kit, pr...

The Waymo Open Dataset is a comprehensive, large-scale dataset for autonomous driving research, prov...