Head pose estimation has a wide range of applications, such as driver monitoring, attention recognition, and multi-view facial analysis. Most previous works estimate head pose from detected face regions and train with hard labels, which limits the exploration of more discriminative texture information and tends to over-fit. In this paper, we present a novel framework to alleviate this problem, which takes entire images as input, constructs soft labels using a Gaussian distribution function as supervision information, and then introduces...
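
To illustrate the soft-label idea mentioned above, the following is a minimal sketch of turning one ground-truth angle into a Gaussian-shaped distribution over discretized angle bins. The bin layout (3-degree bins over [-90, 90]), the standard deviation `sigma`, and the function name `gaussian_soft_label` are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def gaussian_soft_label(angle_deg, bin_centers, sigma=3.0):
    """Return a soft label: a normalized Gaussian over angle bins.

    angle_deg   -- ground-truth angle in degrees (yaw, pitch, or roll)
    bin_centers -- 1-D array of bin centers in degrees (assumed layout)
    sigma       -- Gaussian standard deviation in degrees (assumed value)
    """
    # Unnormalized Gaussian weight of each bin around the true angle.
    weights = np.exp(-0.5 * ((bin_centers - angle_deg) / sigma) ** 2)
    # Normalize so the label is a valid probability distribution.
    return weights / weights.sum()


# Hypothetical discretization: 61 bins centered at -90, -87, ..., 90 degrees.
bin_centers = np.arange(-90.0, 91.0, 3.0)

# Soft label for a ground-truth angle of 10 degrees.
soft = gaussian_soft_label(10.0, bin_centers)

# The corresponding hard (one-hot) label, for comparison.
hard = np.zeros_like(bin_centers)
hard[np.argmin(np.abs(bin_centers - 10.0))] = 1.0
```

Unlike the one-hot vector `hard`, the soft label assigns non-zero probability to bins adjacent to the true angle, so a prediction that is off by a few degrees is penalized less than one that is far away. Such a label would typically be paired with a distribution-matching loss such as cross-entropy or KL divergence, which is likewise an assumption here.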