Recognizing classroom behavior is crucial for assessing and improving teaching quality. However, existing behavior recognition methods have limited accuracy due to issues such as occlusions, pose variations, and inconsistent target scales. To address these challenges, we propose a single-stage object detector called the ConvNeXt Block Prediction Head Network (CBPH-Net). Specifically, we design an efficient feature extraction module (FEM) to capture more channel information and relevant features from the images in the backbone net...