To demonstrate the performance of the proposed model, 100 short infrared (IR) videos are selected as the dataset in this paper. The IR videos come from two sources: some were downloaded from the internet, and the others were captured by our research group. In the videos, teachers stand in front of an electronic display and teach new courses in the classroom. With the permission of the students, teachers, and schools, our research group captured the infrared videos over the past three semesters. The key frames are