Video-audio synchronization has attracted considerable research attention in recent years. As online education has grown in popularity, asynchrony between video and audio has come to degrade the quality of teaching and learning. This paper introduces an algorithm for correcting video-audio asynchrony in online education. First, the video data were preprocessed with the S3FD face detector and the Librosa package to extract lip images and Mel-frequency cepstral coefficients (MFCCs) as the visual and auditory features, respectively; then, SyncNet, a two-stream neural network, was retrained on the preprocessed dataset to obtain the semantic similarity of v...
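
To illustrate how a similarity measure between the two streams can be turned into an offset estimate, the following is a minimal sketch, not the implementation used in this paper. It assumes that per-frame embeddings for the visual and auditory streams are already available as NumPy arrays of equal dimensionality, and it picks the temporal shift that minimizes the mean Euclidean distance between them. The function name `estimate_av_offset`, the parameter `max_shift`, and the synthetic embeddings are hypothetical and chosen for illustration only.

```python
import numpy as np


def estimate_av_offset(video_emb, audio_emb, max_shift=15):
    """Estimate the audio-video offset in frames.

    Assumption (not from the paper): video_emb and audio_emb are arrays of
    shape (num_frames, dim) produced by the two streams of a network.
    Returns the shift s minimizing the mean L2 distance between
    video_emb[t] and audio_emb[t + s], together with that distance.
    """
    n = min(len(video_emb), len(audio_emb))
    best_shift, best_dist = 0, np.inf
    for shift in range(-max_shift, max_shift + 1):
        # Range of video indices t for which t + shift is a valid audio index.
        start = max(0, -shift)
        stop = min(n, n - shift)
        if stop - start < 1:
            continue  # no overlap at this shift
        v = video_emb[start:stop]
        a = audio_emb[start + shift:stop + shift]
        dist = np.linalg.norm(v - a, axis=1).mean()
        if dist < best_dist:
            best_shift, best_dist = shift, dist
    return best_shift, best_dist


if __name__ == "__main__":
    # Synthetic check: the audio embeddings lag the video embeddings by 3 frames.
    rng = np.random.default_rng(0)
    video = rng.normal(size=(50, 8))
    audio = np.concatenate([rng.normal(size=(3, 8)), video[:-3]])
    shift, dist = estimate_av_offset(video, audio, max_shift=10)
    print(shift, dist)
```

In this sketch, a positive estimated shift means the audio stream lags the video stream by that many frames, so a correction step would advance the audio by the same amount. Searching a bounded range of shifts keeps the cost linear in the number of frames for a fixed `max_shift`.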