Knowledge Distillation based on Monoclass Teachers for Edge Infrastructure


Cédric Maron1,2, Virginie Fresse1, Karynn Morand2 and Freddy Havart2, 1Laboratoire Hubert Curien, France, 2SEGULA Technologie, France


With the growing interest in neural network compression, several methods aiming to improve network accuracy have emerged. Data augmentation enhances model robustness and generalization by increasing the diversity of the training dataset. Knowledge distillation transfers knowledge from a teacher network to a student network. Knowledge distillation is generally carried out on high-end GPUs because teacher network architectures are often too heavy to be implemented on the limited resources available at the edge. This paper proposes a new distillation method adapted to an edge computing infrastructure. By employing multiple small monoclass teachers, the proposed distillation method becomes applicable even within the constrained computing resources of the edge. The proposed method is evaluated against classical knowledge distillation based on a larger teacher network, using different data augmentation methods and varying amounts of training data.
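As background for the classical distillation baseline the paper compares against, the sketch below shows the standard soft-target distillation loss (temperature-scaled cross-entropy between teacher and student outputs, blended with the hard-label loss). This is a minimal NumPy illustration of generic knowledge distillation, not the paper's monoclass-teacher method; the function names, temperature `T`, and weight `alpha` are illustrative choices.

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; higher T yields softer distributions.
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, true_label, T=4.0, alpha=0.5):
    # Soft-target term: cross-entropy between the teacher's and student's
    # temperature-softened distributions, rescaled by T^2 so gradient
    # magnitudes stay comparable across temperatures.
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    soft = -np.sum(p_teacher * np.log(p_student + 1e-12)) * T * T
    # Hard-target term: ordinary cross-entropy with the ground-truth label.
    hard = -np.log(softmax(student_logits)[true_label] + 1e-12)
    return alpha * soft + (1 - alpha) * hard
```

With `alpha=0` the loss reduces to plain cross-entropy training; increasing `alpha` shifts weight toward imitating the teacher's soft predictions.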


Neural network compression, knowledge distillation, edge infrastructure, data augmentation

Volume 14, Number 1