This paper was accepted at the WMT conference at EMNLP.
The Transformer architecture has two main non-embedding components: Attention and the Feed Forward Network (FFN). Attention captures interdependencies between words regardless of their position, while the FFN non-linearly transforms each input token independently. In this work, we explore the role of the FFN and find that, despite taking up a significant fraction of the model's parameters, it is highly redundant. Concretely, we are able to substantially reduce the number of parameters with only a modest drop in accuracy by…
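To make the position-wise nature of the FFN concrete, here is a minimal sketch of a standard Transformer FFN block in PyTorch. The module name and the dimensions (`d_model=512`, `d_ff=2048`, the base sizes from the original Transformer paper) are illustrative assumptions, not details taken from this excerpt.

```python
import torch
import torch.nn as nn

class FeedForward(nn.Module):
    """Position-wise FFN: the same two-layer MLP is applied to every
    token independently, with no mixing across the sequence axis."""

    def __init__(self, d_model: int = 512, d_ff: int = 2048):
        super().__init__()
        self.w1 = nn.Linear(d_model, d_ff)   # expand to the hidden width
        self.w2 = nn.Linear(d_ff, d_model)   # project back to model width
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); each position is transformed
        # independently, unlike attention, which mixes positions.
        return self.w2(self.act(self.w1(x)))

ffn = FeedForward()
print(sum(p.numel() for p in ffn.parameters()))  # ~2.1M parameters
```

With these assumed sizes, the FFN alone holds roughly 2 × 512 × 2048 ≈ 2.1M parameters per layer, versus about 1M for the attention projections, which illustrates why redundancy in the FFN translates into large potential parameter savings.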