*=Equal Contributors
Multimodal datasets are a critical component in recent breakthroughs such as Stable Diffusion and GPT-4, yet their design does not receive the same research attention as model architectures or training algorithms. To address this shortcoming in the ML ecosystem, we introduce DataComp, a testbed for dataset experiments centered around a new candidate pool of 12.8 billion image-text pairs from Common Crawl. Participants in our benchmark design new filtering techniques or curate new data sources and then evaluate their new dataset by running our standardized CLIP training code and testing the resulting model on 38 downstream test sets. Our benchmark consists of multiple compute scales spanning four orders of magnitude, which enables the study of scaling trends and makes the benchmark accessible to researchers with varying resources. Our baseline experiments show that the DataComp workflow leads to better training sets. In particular, our best baseline, DataComp-1B, enables training a CLIP ViT-L/14 from scratch to 79.2% zero-shot accuracy on ImageNet, outperforming OpenAI’s CLIP ViT-L/14 by 3.7 percentage points while using the same training procedure and compute.
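To make the filtering step concrete, below is a minimal sketch (not the official DataComp tooling) of one possible filtering strategy a participant might try: scoring each image-text pair with a pretrained CLIP model and keeping only pairs above a similarity threshold. The model name, threshold, and data-loading details are illustrative assumptions, not values prescribed by the benchmark.

```python
# Hypothetical CLIP-score filtering sketch; threshold and model choice are illustrative.
import torch
import open_clip
from PIL import Image

# Load a pretrained CLIP model and its preprocessing transform.
model, _, preprocess = open_clip.create_model_and_transforms("ViT-B-32", pretrained="openai")
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()

@torch.no_grad()
def clip_score(image_path: str, caption: str) -> float:
    """Cosine similarity between an image and its caption under CLIP."""
    image = preprocess(Image.open(image_path)).unsqueeze(0)
    text = tokenizer([caption])
    img_feat = model.encode_image(image)
    txt_feat = model.encode_text(text)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    return (img_feat @ txt_feat.T).item()

def filter_pool(pairs, threshold=0.3):
    """Keep (image_path, caption) pairs whose CLIP score clears the threshold."""
    return [(img, cap) for img, cap in pairs if clip_score(img, cap) >= threshold]
```

In the actual benchmark, the curated subset produced by a strategy like this would then be fed to the standardized CLIP training code and evaluated on the 38 downstream test sets.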