Training Agents 2: Live tutorial on model distillation for training custom agents.

Sponsored:

Your customers are already searching—if you’re not online, they’re choosing someone else. A polished website builds trust, works 24/7, and turns interest into action. With RoseHosting, you don’t need code, a big budget, or weeks of waiting: pick a design, add your content, and launch fast.

Stop being invisible. Start owning your digital storefront today—build your WordPress site now.

Video by Hugging Face via YouTube
Training Agents 2: Live tutorial on model distillation for training custom agents.

In this live session, we’ll cover how to transfer capability from a teacher model to a smaller student through distillation. We’ll work through supervised fine-tuning on teacher-generated data, then on-policy and online methods where the teacher scores the student live, then self-distillation where the model teaches itself. Each one runs in TRL.
What we’ll cover:

– What distillation is, and the four axes that organize it: signal, data source, timing, and teacher identity
– White-box vs black-box: distilling from open weights vs strings
– Off-policy distillation: generate from the teacher, then SFT on the outputs
– On-policy distillation: sample from the student, score with the teacher in the loop
– Distillation as reinforcement learning: the KD distance as a dense, token-level reward
– Self-distillation: the model as its own teacher, and when that beats a stronger one

Repo: https://github.com/burtenshaw/training-agents

This is part of the Training Agents series: using coding agents to design, run, monitor, and review post-training experiments, while training models to become better agents.

#TRL #HuggingFace #PostTraining #AIAgents #Distillation

Source