Estimating and improving student engagement in collaborative learning environments is an important problem in learning research. Collaborative learning is a strategy in which students work in small groups, and their learning behaviors are closely tied to the other members and objects in the group. Research has shown that students who are actively involved in class learn more. Gaze behavior and facial expression are therefore important nonverbal indicators of engagement in cooperative learning environments. In this paper, we propose a multimodal deep neural network (MDNN)...