The reproducibility and transparency of large language models are crucial for advancing open research, ensuring the trustworthiness of results, and enabling investigations into data and model biases, as well as potential risks. To this end, we release OpenELM, a state-of-the-art open language model. OpenELM uses a layer-wise scaling strategy to efficiently allocate parameters within each layer of the transformer model, leading to enhanced accuracy. For example, with a parameter budget of approximately one billion parameters, OpenELM exhibits a 2.36% improvement in accuracy compared to OLMo while requiring 2× fewer pre-training tokens.
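To make the layer-wise scaling idea concrete, the sketch below interpolates the number of attention heads and the FFN expansion ratio across transformer depth, so that deeper layers receive a larger share of the parameter budget. The parameter names (alpha_min, alpha_max, beta_min, beta_max) and the linear schedule are illustrative assumptions, not the exact OpenELM configuration.

```python
# Minimal sketch of layer-wise scaling: instead of giving every transformer
# layer the same width, the attention-head count and FFN expansion ratio are
# interpolated across depth. Names and the linear rule are assumptions.

def layerwise_config(num_layers: int,
                     model_dim: int = 1280,
                     head_dim: int = 64,
                     alpha_min: float = 0.5, alpha_max: float = 1.0,
                     beta_min: float = 2.0, beta_max: float = 4.0):
    """Return per-layer (num_heads, ffn_dim) under a linear depth schedule."""
    configs = []
    for i in range(num_layers):
        t = i / max(num_layers - 1, 1)                    # 0 at first layer, 1 at last
        alpha = alpha_min + t * (alpha_max - alpha_min)   # attention-width scale
        beta = beta_min + t * (beta_max - beta_min)       # FFN expansion ratio
        num_heads = max(1, int(alpha * model_dim / head_dim))
        ffn_dim = int(beta * model_dim)
        configs.append((num_heads, ffn_dim))
    return configs

# Shallow layers get fewer heads and a smaller FFN than deep layers, so the
# same overall parameter budget is spent where it contributes more to accuracy.
for layer, (heads, ffn) in enumerate(layerwise_config(num_layers=4)):
    print(f"layer {layer}: heads={heads}, ffn_dim={ffn}")
```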
Diverging from prior practices that only provide model weights and inference code, and that pre-train on private datasets, our release includes the complete framework for training and evaluation of the language model on publicly available datasets, together with training logs, multiple checkpoints, and pre-training configurations. We also release code to convert models to the MLX library for inference and fine-tuning on Apple devices. This comprehensive release aims to empower and strengthen the open research community, paving the way for future open research endeavors.
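For readers who want to experiment with the released checkpoints, a minimal loading sketch follows. It assumes the weights are published on the Hugging Face Hub under an "apple/OpenELM-*" repository (the exact model id here is an assumption) and that the `transformers` package is installed; consult the release itself for the authoritative instructions.

```python
# Minimal sketch, assuming a Hugging Face checkpoint id such as
# "apple/OpenELM-1_1B" (an assumption) and the `transformers` library.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "apple/OpenELM-1_1B",     # assumed model id; pick the size you need
    trust_remote_code=True,   # allow the repository's custom modeling code
)
print(model.config)
```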