We introduce EELBERT, an approach for compressing transformer-based models (e.g., BERT) with minimal impact on the accuracy of downstream tasks. This is achieved by replacing the input embedding layer of the model with dynamic, i.e., on-the-fly, embedding computations. Since the input embedding layer accounts for a significant fraction of the model size, particularly for the smaller BERT variants, replacing this layer with an embedding computation function allows us to reduce the model size considerably. Empirical evaluation on the GLUE benchmark shows that our BERT variants (EELBERT) suffer minimal regression compared to the traditional BERT models. Through this approach, we are able to develop our smallest model, UNO-EELBERT, which achieves a GLUE score within 4% of a fully trained BERT-tiny while being 15x smaller (1.2 MB) in size.
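To make the core idea concrete, here is a minimal sketch of what "replacing the embedding table with an on-the-fly computation" can look like. This is not the paper's exact method: the hash-based bucket lookup, the bucket count, and the hidden size below are all illustrative assumptions, chosen only to show how the parameter count can be decoupled from the vocabulary size.

```python
# Minimal sketch (not EELBERT's exact computation): swap a full nn.Embedding
# lookup for an on-the-fly, hash-based embedding. All sizes are assumptions.
import torch
import torch.nn as nn


class HashedOnTheFlyEmbedding(nn.Module):
    """Computes token embeddings on the fly from a small bucket table.

    Instead of storing one row per vocabulary entry (vocab_size x hidden),
    we store only `num_buckets` rows and sum several hashed lookups, so the
    parameter count no longer scales with the vocabulary size.
    """

    def __init__(self, hidden_size: int = 128, num_buckets: int = 1024, num_hashes: int = 2):
        super().__init__()
        self.bucket_table = nn.Parameter(torch.randn(num_buckets, hidden_size) * 0.02)
        self.num_buckets = num_buckets
        # Fixed odd multipliers act as cheap, deterministic hash functions.
        self.register_buffer("hash_mults", torch.tensor([1000003, 999983][:num_hashes]))

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        # input_ids: (batch, seq_len) integer token ids
        emb = 0
        for mult in self.hash_mults:
            buckets = (input_ids.long() * mult) % self.num_buckets
            emb = emb + self.bucket_table[buckets]
        return emb  # (batch, seq_len, hidden_size)


if __name__ == "__main__":
    layer = HashedOnTheFlyEmbedding()
    ids = torch.randint(0, 30522, (2, 8))  # BERT-sized vocab, tiny batch
    print(layer(ids).shape)  # torch.Size([2, 8, 128])
```

In this sketch the embedding "table" holds only 1,024 rows regardless of the 30K-entry vocabulary, which is the kind of saving that matters most for the smaller BERT variants, where the embedding layer dominates the total parameter count.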