State-of-the-art semi-supervised learning frameworks have shown great potential for making deep, complex language models such as BERT highly effective for text classification when labeled data is limited. However, the large size and slow inference of such models may hinder their use in resource-limited or real-time settings. In this paper, we propose a new approach within the semi-supervised learning framework that distills a large, complex teacher model into a fairly lightweight student model capable of acquiring...