[ad_1]
Following Hugging Face’s Zephyr recipe


Discovering good coaching hyperparameters for brand new LLMs is at all times troublesome and time-consuming. With Zephyr Gemma 7B, Hugging Face appears to have discovered a great recipe for fine-tuning Gemma. They used a mix of distilled supervised fine-tuning and DPO just like what they did for his or her unique Zephyr based mostly on Mistral 7B. Nonetheless, coaching Gemma with DPO on shopper {hardware} is difficult attributable to its reminiscence consumption.
On this article, I first overview the recipe utilized by Hugging Face to coach Zephyr Gemma 7B. Then, I present the best way to use this recipe with Unsloth, a framework implementing varied optimizations for quick and memory-efficient coaching. The strategy introduced on this article has a peak reminiscence consumption of 19 GB of VRAM and a complete coaching time of solely 8 hours. In different phrases, DPO coaching for Gemma is feasible on shopper {hardware}.
Supervised Superb-tuning (SFT)
DPO should use for reference a mannequin educated with supervised fine-tuning (SFT) on an instruction dataset. Hugging Face additionally launched this SFT mannequin:
For SFT, they used deita-10k which is a small instruction dataset of 9.5k examples:
All kinds of LLMs have generated all of the examples on this dataset (GPT-4, GPT-3.5, Claude, Vicuna, Llama 2, Mistral 7B, Zephyr, and many others.). For SFT coaching, they used a particular information format that we are going to additionally use.
Hugging Face used the hyperparameters referenced on this configuration file from their alignment handbook. They didn’t use LoRA or quantization. It signifies that they most likely used many A100/H100 GPUs for coaching Zephyr Gemma. Notice: Within the mannequin card, they wrote “16 gadgets” however they don’t say what are these gadgets.
To run this recipe on shopper {hardware}, we’ll use LoRA and quantization, i.e., QLoRA. I’ll element the LoRA configuration within the subsequent part.
[ad_2]
Source link