This paper introduces AIM, a collection of vision models pre-trained with an autoregressive objective. These models are inspired by their textual counterparts, i.e., Large Language Models (LLMs), and exhibit similar scaling properties. Specifically, we highlight two key findings: (1) the performance of the visual features scales with both the model capacity and the quantity of data, (2) the value of the objective function correlates with the performance of the model on downstream tasks. We illustrate the practical implication of these findings by pre-training a 7 billion parameter AIM on 2 billion images, which achieves 84.0% on ImageNet-1k with a frozen trunk. Interestingly, even at this scale, we observe no sign of saturation in performance, suggesting that AIM potentially represents a new frontier for training large-scale vision models. The pre-training of AIM is similar to the pre-training of LLMs, and does not require any image-specific strategy to stabilize the training at scale.
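To make the autoregressive objective concrete, here is a minimal sketch of next-patch prediction over image patches with a causally masked Transformer trunk and a regression loss. This is illustrative only, not the paper's implementation; the class name, dimensions, and loss choice are assumptions for the example.

```python
# Illustrative sketch (not the AIM codebase): predict each image patch from the
# patches that precede it in raster order, using a causal attention mask.
import torch
import torch.nn as nn

class CausalPatchAR(nn.Module):
    def __init__(self, patch_dim=768, embed_dim=1024, depth=8, num_heads=16, num_patches=196):
        super().__init__()
        self.embed = nn.Linear(patch_dim, embed_dim)            # patch -> token embedding
        self.pos = nn.Parameter(torch.zeros(1, num_patches, embed_dim))
        layer = nn.TransformerEncoderLayer(embed_dim, num_heads, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, depth)        # shared trunk (frozen for probing later)
        self.head = nn.Linear(embed_dim, patch_dim)             # regress the next patch's pixels

    def forward(self, patches):                                 # patches: (B, N, patch_dim)
        B, N, _ = patches.shape
        x = self.embed(patches) + self.pos[:, :N]
        # Additive causal mask: token i may only attend to tokens <= i.
        mask = torch.triu(torch.full((N, N), float("-inf"), device=patches.device), diagonal=1)
        h = self.trunk(x, mask=mask)
        pred = self.head(h[:, :-1])                             # predictions for patches 1..N-1
        target = patches[:, 1:]                                 # next-patch regression targets
        return nn.functional.mse_loss(pred, target)
```

As in LLM pre-training, the loss is a simple sequence-prediction objective, which is why no image-specific stabilization tricks are needed; downstream evaluation then trains only a lightweight head on top of the frozen trunk.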