Author affiliations:
[Xinguo Yu] National Engineering Research Center for E-Learning, Central China Normal University, China;[Jing Xia] College of International Cultural Exchange, Central China Normal University, China
Abstract:
GPT has had a noticeable impact on research into solving algebra problems. To combine the strengths of GPT with traditional methods and design better algorithms, this paper analyzes approaches to solving algebra problems in order to understand their abstractive features. To this end, it classifies the approaches by means of state-transit analysis and then derives their abstractive features, such as assumptions and scopes, from the algorithm descriptions. It further analyzes application-related features such as readability. Tables are built to compare the features of the various approaches, the findings are listed, and two future research directions are pointed out. The outcomes of this paper provide an overview of the research area of solving algebra problems and a scaffold for thinking about it.
Author affiliations:
[Ullah, Anwar; Yu, Xinguo; Yu, XG] Cent China Normal Univ, Natl Engn Res Ctr E Learning, Wuhan 430079, Peoples R China.;[Numan, Muhammad] Cent China Normal Univ, Wollongong Joint Inst, Wuhan 430079, Peoples R China.
Corresponding institution:
[Yu, XG] Cent China Normal Univ, Natl Engn Res Ctr E Learning, Wuhan 430079, Peoples R China.
Abstract:
Generating realistic synthetic video from text is highly challenging because of the many issues involved, including digit deformation, noise interference between frames, blurred output, and the need for temporal coherence across frames. In this paper, we propose a novel approach for generating coherent videos of moving digits from textual input using a Deep Deconvolutional Generative Adversarial Network (DD-GAN). The DD-GAN comprises a Deep Deconvolutional Neural Network (DDNN) as the Generator (G) and a modified Deep Convolutional Neural Network (DCNN) as the Discriminator (D) to ensure temporal coherence between adjacent frames. The proposed approach involves several steps. First, the input text is fed into a Long Short-Term Memory (LSTM) based text encoder and then smoothed using Conditioning Augmentation (CA) to make the Generator (G) more effective. Next, the DDNN generates video frames from the enhanced text and random noise, and the modified DCNN acts as the Discriminator (D), distinguishing between generated and real videos. The quality of the generated videos is evaluated using standard metrics, namely Inception Score (IS), Fréchet Inception Distance (FID), Fréchet Inception Distance for video (FID2vid), and Generative Adversarial Metric (GAM), along with a human study based on realism, coherence, and relevance. Experiments on Single-Digit Bouncing MNIST GIFs (SBMG), Two-Digit Bouncing MNIST GIFs (TBMG), and a custom dataset of essential mathematics videos with related text demonstrate significant improvements in both the metrics and the human study results, confirming the effectiveness of DD-GAN. This research also took on the challenge of generating preschool math videos from text, handling complex structures, digits, and symbols, and achieved successful results. Overall, the proposed approach shows promising results for generating coherent videos from textual input.
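The Conditioning Augmentation step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the exact formulation, so the code below assumes the Gaussian reparameterisation commonly associated with Conditioning Augmentation: a text embedding is mapped to a mean and a log-variance, and a smoothed conditioning vector is sampled around the mean. The function names and the toy vectors are hypothetical and are not taken from the paper.

```python
import math
import random


def conditioning_augmentation(mu, log_var, rng):
    """Sample c = mu + sigma * eps with eps ~ N(0, I).

    mu and log_var stand in for the outputs of a (hypothetical) layer
    applied to the LSTM text embedding; here they are plain lists.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, I)), a regulariser often paired with CA."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, log_var))


rng = random.Random(0)            # fixed seed so the sketch is repeatable
mu = [0.2, -0.1, 0.5]             # toy mean vector (assumed values)
log_var = [0.0, -1.0, -2.0]       # toy log-variances (assumed values)
c = conditioning_augmentation(mu, log_var, rng)
penalty = kl_to_standard_normal(mu, log_var)
```

Sampling around the mean, rather than feeding the embedding directly, gives the generator slightly different conditioning vectors for the same sentence, which is the smoothing effect the abstract refers to.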
Author affiliations:
[Xinguo Yu; Rao Peng] National Engineering Research Center for E-Learning, Central China Normal University, Wuhan, China;[Chuanzhi Yang; Runze Huang] School of Computer Science and Technology, Wuhan University of Technology, Wuhan, China
Conference name:
2021 IEEE International Conference on Engineering, Technology & Education (TALE)
Conference date:
05 December 2021
Conference location:
Wuhan, Hubei Province, China
Proceedings title:
2021 IEEE International Conference on Engineering, Technology & Education (TALE)
Keywords:
math word problem;problem solver;text-to-text transformer model;deep learning
Abstract:
In recent years, automatic solving of mathematical word problems has attracted increasing attention. Algorithms that solve mathematical word problems the way humans do are a key technology for the development of digital education. In this paper, a deep learning model based on a text-to-text conversion model is proposed to solve mathematical word problems. The model treats each mathematical word problem as a "text-to-text" problem, i.e., it takes a text as input and produces a new text as output, where the output text takes the form of a mathematical expression. In the experiments, the model is evaluated on Ape210K, which consists of 210K Chinese primary-school-level math problems. The experimental results show that the model solves 78.61% of the mathematical word problems overall. In addition, the problems are classified into six categories based on the knowledge points involved, and the model's performance on the different types of problems is explored.
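Because the model's output is a mathematical expression in text form, one plausible way to score it is to evaluate the expression and compare the value with the reference answer. The sketch below is an assumption about how such scoring could be done; the abstract does not describe the paper's evaluation procedure, and the function names and tolerance are hypothetical.

```python
import ast
import operator

# Arithmetic operators allowed in a predicted expression.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate_expression(text):
    """Evaluate an arithmetic expression string without using eval()."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("unsupported element in expression")
    return walk(ast.parse(text, mode="eval"))


def answer_accuracy(predictions, gold_answers, tol=1e-4):
    """Fraction of predicted expressions whose value matches the gold answer."""
    correct = 0
    for expr, gold in zip(predictions, gold_answers):
        try:
            if abs(evaluate_expression(expr) - gold) <= tol:
                correct += 1
        except (ValueError, SyntaxError, ZeroDivisionError):
            pass  # an unparsable or undefined expression counts as wrong
    return correct / len(gold_answers)


# Toy predictions and answers, invented for illustration only.
accuracy = answer_accuracy(["(3+5)*2", "10/4", "1/0"], [16, 2.5, 3])
```

Comparing values rather than strings means that two different but equivalent expressions, such as `2*(3+5)` and `(3+5)*2`, are both counted as correct.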
Author affiliations:
[Xinguo Yu; Rao Peng] National Engineering Research Center for E-Learning, Central China Normal University, Wuhan, China;[Runze Huang; Chuanzhi Yang] School of Computer Science and Technology, Wuhan University of Technology, Wuhan, China
Conference name:
2021 IEEE International Conference on Engineering, Technology & Education (TALE)
Conference date:
05 December 2021
Conference location:
Wuhan, Hubei Province, China
Proceedings title:
2021 IEEE International Conference on Engineering, Technology & Education (TALE)
Abstract:
Research on video-audio synchronization has attracted much attention in recent years. With the popularity of online education, asynchrony between video and audio affects the quality of teaching and learning. This paper introduces a correction algorithm for video-audio asynchrony in online education. First, the video data are preprocessed using S3FD and the Librosa package to extract lip images and MFCCs as the visual and auditory features. Then SyncNet, a two-stream neural network, is retrained on the preprocessed dataset to obtain the semantic similarity between video and audio. Next, the synchronization error is calculated from this similarity using a sliding window method. Finally, a correction program built on the FFmpeg framework fixes the asynchrony. For the experiments, a 300-hour Chinese video-audio synchronization dataset and a 20-hour dataset of Chinese live classroom recordings were collected to test the algorithm. The experimental results show that the proposed algorithm achieves 94% correction accuracy.
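The sliding-window step can be sketched as a search over candidate offsets: for each shift, the visual and auditory feature sequences are aligned and their average distance is computed, and the shift with the smallest distance is taken as the synchronization error. This is a generic sketch under that assumption, not the paper's exact procedure; the feature vectors below are synthetic stand-ins for network embeddings, and all names are hypothetical.

```python
import math


def mean_distance(video, audio, offset):
    """Average Euclidean distance between video[i] and audio[i + offset]."""
    pairs = [(video[i], audio[i + offset])
             for i in range(len(video))
             if 0 <= i + offset < len(audio)]
    if not pairs:
        return math.inf
    return sum(math.dist(v, a) for v, a in pairs) / len(pairs)


def estimate_offset(video, audio, max_shift):
    """Return the shift in [-max_shift, max_shift] with the smallest distance."""
    return min(range(-max_shift, max_shift + 1),
               key=lambda k: mean_distance(video, audio, k))


# Synthetic per-frame features; the audio track lags the video by 3 frames.
video_feats = [(float(i % 7), float(i * i % 5)) for i in range(40)]
audio_feats = [(0.0, 0.0)] * 3 + video_feats[:-3]

offset = estimate_offset(video_feats, audio_feats, max_shift=5)
```

In this toy setup the estimated offset is 3 frames, meaning each video frame matches the audio feature three positions later; a correction tool would then shift one stream by that amount.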
Journal:
Journal of Systems Science and Systems Engineering, 2021, 30(4): 417-432, ISSN: 1004-3756
Corresponding author:
Niu, Lei (lniu@ccnu.edu.cn)
Author affiliations:
[Huang, Litian; Yu, Xinguo; Niu, Lei] Cent China Normal Univ, Cent China Normal Univ Wollongong Joint Inst, Wuhan 430000, Peoples R China.;[Zhao, Jinhua] Wuhan Univ, Sch Econ & Management, Wuhan 430000, Peoples R China.;[Yu, Xinguo] Cent China Normal Univ, Natl Engn Res Ctr E Learning, Wuhan 430000, Peoples R China.
Corresponding institution:
[Lei Niu] Central China Normal University Wollongong Joint Institute, Central China Normal University, Wuhan, China
Abstract:
Multiple negotiations with issue interdependence across negotiations are regarded as a complex research topic in agent negotiation. In the multiple-negotiation scenario, an agent conducts several negotiations with opponents for different negotiation goals, and issues in one negotiation may be interdependent with issues in other negotiations. Moreover, the utility functions involved may be nonlinear, e.g., when the issues involved are discrete. Existing work may not handle multiple interdependent negotiations well when the utility functions are complex and the issues they involve are discrete. With utility functions over discrete issues, an agent may not find an offer that exactly satisfies its expected utility during the negotiation process. Furthermore, because sub-offers on issues in each negotiation may be restricted by interdependence relationships with issues in other negotiations, it is even harder for the agent to find an offer that satisfies the expected utility and all the issue interdependencies at the same time, which leads to a high failure rate across the multiple negotiations. To resolve this challenge, this paper presents a negotiation model for multiple negotiations in which interdependence exists between discrete issues across negotiations. After formally defining "interdependence between discrete issues across negotiations", the proposed model applies the multiple alternating offers protocol, the clustered negotiation procedure, and the proposed negotiation strategy to handle multiple interdependent negotiations with discrete issues. In the proposed strategy, a "tolerance value" is introduced so that an agent can balance the overall negotiation goal against the negotiation outcomes.
The experimental results show that 1) the proposed model handles multiple negotiations with interdependence between discrete issues well, 2) the proposed approach helps agents propose acceptable offers during decision-making, and 3) an agent can choose a suitable "tolerance value" to balance the success rate of the multiple negotiations against its expected utility.
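The role of a tolerance value with discrete issues can be illustrated with a small sketch. It is not the paper's strategy: the issues, the point-based utility, and the interdependence constraint below are all invented for illustration. The sketch only shows why an exact match to the expected utility may not exist and how a tolerance enlarges the set of acceptable offers.

```python
from itertools import product


def acceptable_offers(issues, utility, expected, tolerance,
                      constraint=lambda offer: True):
    """List offers whose utility lies within `tolerance` of `expected`.

    issues maps each issue name to its discrete options; constraint is a
    hypothetical predicate standing in for cross-negotiation interdependence.
    """
    names = list(issues)
    found = []
    for values in product(*(issues[name] for name in names)):
        offer = dict(zip(names, values))
        if constraint(offer) and abs(utility(offer) - expected) <= tolerance:
            found.append(offer)
    # Offers closest to the expected utility come first.
    return sorted(found, key=lambda o: abs(utility(o) - expected))


# Invented example: two discrete issues scored in integer utility points.
issues = {"price": [100, 200, 300], "delivery_days": [1, 3, 7]}
price_points = {100: 70, 200: 35, 300: 0}
delivery_points = {1: 30, 3: 20, 7: 0}


def utility(offer):
    return price_points[offer["price"]] + delivery_points[offer["delivery_days"]]


exact = acceptable_offers(issues, utility, expected=60, tolerance=0)
relaxed = acceptable_offers(issues, utility, expected=60, tolerance=5)
# Suppose another negotiation has already fixed delivery within one day.
restricted = acceptable_offers(issues, utility, expected=60, tolerance=5,
                               constraint=lambda o: o["delivery_days"] <= 1)
```

Here no offer yields exactly 60 points, a tolerance of 5 admits two offers, and the interdependence constraint narrows them to one. A larger tolerance raises the chance of finding some offer at the cost of moving further from the expected utility, which is the trade-off the abstract describes.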
Abstract:
Synthesizing high-resolution realistic images from a text description with a single-iteration Generative Adversarial Network (GAN) is difficult without additional techniques, mainly because blurry artifacts and mode collapse occur. To reduce these problems, this paper proposes an Iterative Generative Adversarial Network (iGAN) that takes three iterations to synthesize a high-resolution realistic image from its text description. In the \(1^{st}\) iteration, the GAN synthesizes a low-resolution \(64 \times 64\) pixel image with basic shape and basic color from the text description, with less mode collapse and fewer blurry artifacts. In the \(2^{nd}\) iteration, the GAN takes the result of the \(1^{st}\) iteration together with the text description and synthesizes a higher-resolution \(128 \times 128\) pixel image with better shape and color, with much less mode collapse and far fewer blurry artifacts. In the last iteration, the GAN takes the result of the \(2^{nd}\) iteration together with the text description and synthesizes a high-resolution \(256 \times 256\) image that is well shaped and clear, with almost no mode collapse or blurry artifacts. The proposed iGAN performs well on the CUB birds and Oxford-102 flowers datasets. Moreover, iGAN improves the inception score and human rank compared with other state-of-the-art methods.
Journal:
Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, 2019: 577-582, ISSN: 1520-5363
Corresponding author:
Zhang, Ting (ting.zhang@mail.ccnu.edu.cn)
Author affiliations:
[Liu, Xiaoxue] Central China Normal University Wollongong Joint Institute, Central China Normal University, China;[Zhang, Ting; Yu, Xinguo] National Engineering Research Center for E-learning, Central China Normal University, China
Author affiliations:
[Yuan, Shuo; Yu, Xinguo] Cent China Normal Univ, Natl Engn Res Ctr E Learning, Wuhan, Hubei, Peoples R China.;[Majid, Abdul] Wollongong Joint Inst Cent China Normal Univ, Wuhan, Hubei, Peoples R China.
Conference name:
2019 4th International Conference on Control and Robotics Engineering (ICCRE)
Abstract:
Real-time face tracking has become a significant research problem for analyzing facial expressions in human-computer interaction. Traditional face tracking methods achieve good results in some constrained environments (such as good illumination and no background interference). However, these methods require hand-designed facial features that depend on the researcher's experience. In addition, their limited ability to generalize is worth studying. Robust face tracking in complex scenes is challenging because of fast motion, multi-scale changes, rotation, occlusion, illumination changes, and so on. In view of these considerations, this paper proposes an improved method based on Siamese-Net, optimized for face tracking tasks. Our work mainly includes four aspects. First, the first two convolutional layers of the deeper VGG-16 network are used to extract features, so we call our method Siamese-VGG. Second, we report face tracking experiments that use a pre-trained VGG-Face model, trained on 2.6M images for face recognition, followed by fine-tuning to accelerate convergence. Third, crops of the same size are input to the two branches of the framework, and the smaller inner template feature maps are then extracted during training; in this way the proposed method reduces offset losses. Finally, L2 regularization is added to the loss function to improve the generalization ability of the model. The experimental results show better robustness and generalization performance than the original algorithm. In complex scenes, the improved method achieves an improvement of almost 11% in average overlap. However, the frame rate of the improved method is still 18.5 fps on an Nvidia GTX 1070 Ti GPU. The improved method proposed in this paper is more practical in terms of speed and accuracy.
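The average overlap figure quoted above is commonly computed as the mean intersection-over-union (IoU) between predicted and ground-truth bounding boxes across frames. The sketch below assumes that definition and an (x, y, width, height) box format; the abstract states neither, and the function names and boxes are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x, y, width, height)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def average_overlap(predicted, ground_truth):
    """Mean IoU over a sequence of frames."""
    scores = [iou(p, g) for p, g in zip(predicted, ground_truth)]
    return sum(scores) / len(scores)


# Two toy frames: a perfect prediction and a complete miss.
predicted = [(0, 0, 2, 2), (0, 0, 2, 2)]
ground_truth = [(0, 0, 2, 2), (5, 5, 1, 1)]
score = average_overlap(predicted, ground_truth)
```

With one perfect frame and one missed frame the score is 0.5, which shows how a tracker that drifts in part of a sequence is penalized in proportion to the frames affected.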
Journal:
ACM International Conference Proceeding Series, 2014: 177-180
Author affiliations:
[Wan Ding; Xinguo Yu] National Engineering Research Center for E-Learning, Central China Normal University, Wuhan, China;[Nan Ye] School of Computing, National University of Singapore, Singapore
Proceedings title:
ICIMCS '14: Proceedings of International Conference on Internet Multimedia Computing and Service
Keywords:
Artificial intelligence;Character recognition;Hough transforms;Internet;Video signal processing;Automatic analysis;Complex background;Conditional Random Fields(CRFs);Event detection;Multimedia;Recognition algorithm;Transition patterns;Vision and scene understanding;Sports
Abstract:
Goal events are important in the automatic analysis of broadcast sports videos, but previous approaches rely on visual or audio information that is hard to obtain. In this paper, we use superimposed texts to detect goals (both the occurrences of goal events and their types) in broadcast basketball video, and we propose a transition-pattern-based approach for both text extraction and goal detection. Our approach is lightweight and effectively handles the main challenges in extracting superimposed texts: complex backgrounds, low resolution, and blurred text, which make standard localization and character recognition algorithms inaccurate. We focus on extracting the superimposed game clock and game score texts in broadcast basketball video. We exploit transition patterns to develop a Hough transform for localization, and conditional random fields (CRFs) for both score digit recognition and goal detection. The experiments show that our transition-pattern-based approach achieves high accuracy for both superimposed text extraction and goal detection.
Categories and Subject Descriptors: I.2.10 [Artificial Intelligence]: Vision and Scene Understanding - Video Analysis. General Terms: Algorithms.
Journal:
ACM International Conference Proceeding Series, 2013: 364-367
Author affiliations:
[Xinguo Yu; Jun Cheng; Wu Song; Bin He] National Engineering Research Center for E-Learning, Central China Normal University, Wuhan, 430079, China
Proceedings title:
ICIMCS '13: Proceedings of the Fifth International Conference on Internet Multimedia Computing and Service
Keywords:
digit localization;Localization procedure;Pixel recovery;Removal algorithms;secondly-periodicity;Surveillance video;Time-stamp;Video surveillance;Algorithms;Applications;Internet;Monitoring;Network security;Pixels;Security systems