Author affiliations:
[Zhong, Duo; Jiang, Xingpeng; Li, Bojing] Cent China Normal Univ, Hubei Key Lab Artificial Intelligence & Smart Lear, Wuhan, Peoples R China.;[Zhong, Duo; Jiang, Xingpeng; Li, Bojing] Cent China Normal Univ, Sch Comp, Wuhan, Peoples R China.;[Qiao, Jimei] Shanghai Normal Univ, Math & Sci Coll, Shanghai, Peoples R China.;[Jiang, Xingpeng] Cent China Normal Univ, Natl Language Resources Monitoring & Res Ctr Netwo, Wuhan, Peoples R China.
Corresponding institution:
[Xingpeng Jiang] H;Hubei Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan, China; School of Computer, Central China Normal University, Wuhan, China; National Language Resources Monitoring & Research Center for Network Media, Central China Normal University, Wuhan, China
Abstract:
Microorganisms play important roles in our lives, especially in metabolism and disease. Determining the probability that a person will suffer from a specific disease, and the severity of that disease, based on microbial genes is crucial for understanding the relationship between microbes and diseases. Previous research could extract the topological information of phylogenetic trees and integrate it into metagenomic datasets, enabling classifiers to learn more information from limited datasets and thus improving model performance. In this paper, we propose a GNPI model to better learn the structure of phylogenetic trees. GNPI maintains the original vector format of metagenomic datasets, whereas previous research had to change the input form to matrices. The vector-like form of the input data can be easily adopted in baseline machine learning models and is also available for deep learning models. Datasets processed with GNPI help enhance the accuracy of machine learning and deep learning models on three different datasets. GNPI is an interpretable data processing method for host phenotype prediction and other bioinformatics tasks.
Abstract:
In cross-language question retrieval (CLQR), users employ a new question in one language to search the community question answering (CQA) archives for similar questions in another language. In addition to the ranking problem in monolingual question retrieval, one needs to bridge the language gap in CLQR. The existing adversarial models for cross-language learning normally rely on a single adversarial component. Since natural languages consist of units of different abstract levels, we argue that crossing the language gap adaptively on different levels with multiple adversarial components should lead to smoother text representation and better CLQR performance. To this end, we first encode questions into multi-layer representations of different abstract levels with a CNN-based model that enhances conventional models with diverse kernel shapes and a corresponding pooling strategy, so as to capture different aspects of a text segment. We then impose a set of adversarial components on different layers of the question representation to decide the appropriate abstract levels and their role in performing cross-language mapping. Experimental results on two real-world datasets demonstrate that our model outperforms state-of-the-art models for CLQR and is on par with the strong machine translation baselines and most monolingual baselines. (C) 2020 Elsevier Inc. All rights reserved.
Keywords:
Bug severity;code change complexity;commit record
Abstract:
Both the complexity of code change for bug fixing and bug severity play an important role in release planning when considering which bugs should be fixed in a specific release under certain constraints. This work investigates whether there are significant differences between bugs of different severity levels regarding the complexity of code change for fixing the bugs. Code change complexity is measured by the number of modified lines of code, source files, and packages, as well as the entropy of code change. We performed a case study on 20 Apache open source software (OSS) projects using commit records and bug reports. The study results show that (1) for bugs of high severity levels (i.e., Blocker, Critical and Major in JIRA), there is no significant difference in the complexity of code change for fixing bugs of different severity levels for most projects, while (2) for bugs of low severity levels (i.e., Major, Minor and Trivial in JIRA), fixing bugs of a higher severity level needs significantly more complex code change than fixing bugs of a lower severity level for most projects. These findings provide useful and practical insights for effort estimation and release planning of OSS development.
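The abstract above names the entropy of code change as one complexity metric but does not define it. A minimal sketch, assuming it is the Shannon entropy of how modified lines are distributed across the files touched by a fix (the function name and the sample line counts are hypothetical, not taken from the study):

```python
import math


def change_entropy(lines_changed_per_file):
    """Shannon entropy (in bits) of modified lines spread across files.

    Assumption: each entry is the number of modified lines in one file
    touched by a single bug-fixing change.
    """
    total = sum(lines_changed_per_file)
    if total == 0:
        return 0.0
    entropy = 0.0
    for n in lines_changed_per_file:
        if n > 0:
            p = n / total  # share of the change that landed in this file
            entropy -= p * math.log2(p)
    return entropy


# Hypothetical fixes: one confined to a single file, one spread evenly.
concentrated = change_entropy([40, 0, 0, 0])
spread = change_entropy([10, 10, 10, 10])
print(concentrated, spread)
```

Under this assumption, a fix confined to one file scores zero, and the score grows as the same amount of change is scattered over more files.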
Abstract:
Dense motion estimations obtained from optical flow techniques play a significant role in many image processing and computer vision tasks. Remarkable progress has been made in both theory and its application in practice. In this paper, we provide a systematic review of recent optical flow techniques with a focus on the variational method and approaches based on Convolutional Neural Networks (CNNs). These two categories have led to state-of-the-art performance. We discuss recent modifications and extensions of the original model, and highlight remaining challenges. For the first time, we provide an overview of recent CNN-based optical flow methods and discuss their potential and current limitations.
Abstract:
This paper addresses the issue of video-based action recognition by exploiting an advanced multi-stream Convolutional Neural Network (CNN) to fully use semantics-derived multiple modalities in both spatial (appearance) and temporal (motion) domains, since the performance of CNN-based action recognition methods relates heavily to two factors: semantic visual cues and the network architecture. Our work consists of two major parts. First, to extract useful human-related semantics accurately, we propose a novel spatiotemporal saliency based video object segmentation (STS-VOS) model. By fusing different distinctive saliency maps, which are computed according to object signatures of complementary object detection approaches, refined spatiotemporal saliency maps can be obtained. In this way, various challenges in realistic video can be handled jointly. Based on the estimated saliency maps, an energy function is constructed to segment two semantic cues: the actor and one distinctive acting part of the actor. Second, we modify the architecture of the two-stream network (TS-Net) to design a multi-stream network (MS-Net) that consists of three TS-Nets with respect to the extracted semantics, which is able to use deeper abstract visual features of multiple modalities at multiple spatiotemporal scales. Importantly, the performance of action recognition is significantly boosted when integrating the captured human-related semantics into our framework. Experiments on four public benchmarks, JHMDB, HMDB51, UCF-Sports and UCF101, demonstrate that the proposed method outperforms the state-of-the-art algorithms.
Journal:
Natural Language Engineering, 2018, 24(4): 523-549, ISSN: 1351-3249
Corresponding author:
Li, Bo
Author affiliations:
[Li, Bo] Cent China Normal Univ, Dept Comp Sci, Wuhan, Hubei, Peoples R China.;[Gaussier, Eric] Univ Grenoble Alpes, CNRS, LIG, AMA, Grenoble, France.;[Yang, Dan] China Elect Power Res Inst, Wuhan, Hubei, Peoples R China.
Corresponding institution:
[Li, Bo] C;Cent China Normal Univ, Dept Comp Sci, Wuhan, Hubei, Peoples R China.
Abstract:
Comparable corpora serve as an important substitute for parallel resources in cases of under-resourced language pairs. Previous work mostly aims to find a better strategy to exploit existing comparable corpora, while ignoring the variety in corpus quality. The quality of comparable corpora strongly affects their usability in practice, a fact that has been confirmed by several studies. However, researchers have not been able to establish a widely accepted and fully validated framework to measure corpus quality. In this paper, we thus investigate a comprehensive methodology to deal with the quality of comparable corpora. Specifically, we propose several comparability measures and a quantitative strategy to test those measures. Our experiments show that the proposed comparability measure can capture gold-standard comparability levels very well and is robust to the bilingual dictionary used. Moreover, we show in the task of bilingual lexicon extraction that the proposed measure correlates well with the performance of the real-world application.
Abstract:
The most successful video-based human action recognition methods rely on feature representations extracted using Convolutional Neural Networks (CNNs). Inspired by the two-stream network (TS-Net), we propose a multi-stream Convolutional Neural Network (CNN) architecture to recognize human actions. We additionally consider human-related regions that contain the most informative features. First, by improving foreground detection, the region of interest corresponding to the appearance and the motion of an actor can be detected robustly under realistic circumstances. Based on the entire detected human body, we construct one appearance and one motion stream. In addition, we select a secondary region that contains the major moving part of an actor based on motion saliency. By combining the traditional streams with the novel human-related streams, we introduce a human-related multi-stream CNN (HR-MSCNN) architecture that encodes appearance, motion, and the captured tubes of the human-related regions. Comparative evaluation on the JHMDB, HMDB51, UCF Sports and UCF101 datasets demonstrates that the streams contain features that complement each other. The proposed multi-stream architecture achieves state-of-the-art results on these four datasets. (C) 2018 Elsevier Ltd. All rights reserved.
Journal:
Information Processing & Management, 2018, 54(2): 291-302, ISSN: 0306-4573
Corresponding author:
Li, Bo
Author affiliations:
[Li, Bo] Cent China Normal Univ, Sch Comp Sci, Wuhan, Hubei, Peoples R China.;[Gaussier, Eric] Univ Grenoble Alpes, CNRS, LIG AMA, Grenoble, France.;[Yang, Dan] China Elect Power Res Inst, Wuhan, Hubei, Peoples R China.
Corresponding institution:
[Li, Bo] C;Cent China Normal Univ, Sch Comp Sci, Wuhan, Hubei, Peoples R China.
Keywords:
Cross-language information retrieval;D/C condition;Information retrieval heuristic
Abstract:
Experimental results of cross-language information retrieval (CLIR) do not indicate why a model fails or how a model could be improved. One basic research question is thus whether it is possible to provide conditions by which one can evaluate any existing or new CLIR strategy analytically and improve the design of CLIR models. Inspired by the heuristics in monolingual IR, we introduce in this paper Dilution/Concentration (D/C) conditions to characterize good CLIR models based on direct intuitions under artificial settings. The conditions, derived from first principles in CLIR, generalize the idea of the query structuring approach. Empirical results with state-of-the-art CLIR models show that when a condition is not satisfied, it often indicates non-optimality of the method. In general, we find that the empirical performance of a retrieval formula is tightly related to how well it satisfies the conditions. Lastly, following the D/C conditions, we propose several novel CLIR models based on the information-based models, which again shows that the D/C conditions are effective in characterizing good CLIR models.
Abstract:
Centralized group key establishment protocols are the most commonly used type owing to their efficiency in computation and communication. A key generation center (KGC) in this type of protocol acts as a server to register users initially. Since the KGC selects a group key for group communication, all users must trust the KGC. Needing a mutually trusted KGC can cause problems in some applications. For example, users in a social network cannot trust the network server to select a group key for secure group communication. In this paper, we remove the need for a mutually trusted KGC by assuming that each user only trusts himself. During registration, each user acts as a KGC to register other users and issue sub-shares to other users. By the secret sharing homomorphism, all sub-shares of each user can be combined into a master share. The master share enables a pairwise shared key between any pair of users. A verification of master shares enables all users to verify that their master shares are generated consistently without revealing the master shares. In a group communication, the initiator can become the server to select a group key and distribute it to every other user over a pairwise shared channel. Our design is unique since the storage of each user is minimal, the verification of master shares is efficient and the group key distribution is centralized. There are public-key based group key establishment protocols without a trusted third party. However, these protocols can only establish a single group key. Our protocol is a non-public-key solution that can establish multiple group keys and is computationally efficient.
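The abstract above relies on the secret sharing homomorphism to combine sub-shares into a master share. A minimal sketch of that one property, assuming Shamir secret sharing over a prime field: the sum of shares of several secrets is a valid share of the sum of those secrets. The field size, function names, and parameters are illustrative assumptions, and the sketch does not cover the paper's pairwise key derivation or master-share verification. It requires Python 3.8+ for the modular inverse via `pow`.

```python
import random

P = 2**61 - 1  # a Mersenne prime used as the field modulus (assumption)


def make_shares(secret, threshold, n_users, rng):
    """Shamir shares of `secret`: evaluate a random polynomial at x = 1..n."""
    coeffs = [secret] + [rng.randrange(P) for _ in range(threshold - 1)]
    shares = {}
    for x in range(1, n_users + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation mod P
            y = (y * x + c) % P
        shares[x] = y
    return shares


def reconstruct(shares):
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for xi, yi in shares.items():
        num, den = 1, 1
        for xj in shares:
            if xj != xi:
                num = (num * (-xj)) % P
                den = (den * (xi - xj)) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret


rng = random.Random(0)  # seeded only to make the demo repeatable
n_users, threshold = 4, 3

# Every user acts as a dealer and issues one sub-share to each user.
dealer_secrets = [rng.randrange(P) for _ in range(n_users)]
sub_shares = [make_shares(s, threshold, n_users, rng) for s in dealer_secrets]

# Each user adds up the sub-shares it received to form its master share.
master_shares = {
    x: sum(dealt[x] for dealt in sub_shares) % P for x in range(1, n_users + 1)
}

# Any `threshold` master shares interpolate to the sum of all dealer secrets.
subset = {x: master_shares[x] for x in (1, 2, 4)}
combined = reconstruct(subset)
print(combined == sum(dealer_secrets) % P)
```

No single dealer knows the combined secret in this sketch, which is the sense in which no mutually trusted center is needed.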
Abstract:
Buzzwords are the main embodiment of Internet culture and play an important role in public opinion analysis, social focus tracking and the study of language evolution. At present, questionnaires are widely used as the standard method for obtaining network buzzwords, which is subjective and costly. In this paper, we propose a novel algorithm relying on the time-distribution feature of words and a KL-divergence measure to estimate words' popularity so as to identify buzzwords in a specific period. The time-distribution feature simply states the fact that buzzwords' usage increases sharply during a very short period, which is then modeled formally with the KL-divergence measure. Compared with the traditional method, which requires considerable manual effort, the automatic algorithm presented here is clearly more efficient. Moreover, buzzwords identified in this manner are not affected by individuals' subjective opinions, so they better reflect language usage in practice. When the algorithm is applied to a large social media data set, our experimental results show that the proposed approach can accurately identify buzzwords in a certain period, and its output agrees closely with manually tagged results.
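The abstract above does not state which distributions the KL-divergence compares. A minimal sketch, assuming each word's usage over equal time bins is compared against a uniform distribution, so that a word whose usage spikes in a short period scores high (the function name and the weekly counts are hypothetical, not the paper's data):

```python
import math


def kl_from_uniform(counts):
    """KL divergence (in nats) of a word's time distribution from uniform.

    Assumption: `counts` holds the word's usage count in each time bin.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    q = 1.0 / len(counts)  # uniform reference probability per bin
    score = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            score += p * math.log(p / q)
    return score


# Hypothetical weekly usage counts for two words.
weekly_counts = {
    "steady_word": [50, 52, 49, 51, 50, 48],
    "bursty_word": [2, 1, 3, 120, 15, 4],
}
scores = {word: kl_from_uniform(c) for word, c in weekly_counts.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

A real pipeline would also need a frequency floor so that rare words with a single occurrence do not dominate the ranking; that step is omitted here.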
Journal:
Lecture Notes in Computer Science, 2014, 8801: 223-233, ISSN: 0302-9743
Corresponding author:
Li, Bo
Author affiliations:
[He, Tingting; Li, Bo; Chen, Qianjun; Zhu, Qunyan] Cent China Normal Univ, Hubei Univ, Sch Comp,Ctr Natl Language Tracing & Res Network, Natl Engn Res Ctr E Learning,Network Ctr, Wuhan 430079, Peoples R China.
Corresponding institution:
[Li, Bo] C;Cent China Normal Univ, Hubei Univ, Sch Comp,Ctr Natl Language Tracing & Res Network, Natl Engn Res Ctr E Learning,Network Ctr, Wuhan 430079, Peoples R China.
Conference name:
13th China National Conference on Chinese Computational Linguistics (CCL) / 2nd International Symposium on Natural Language Processing Based on Naturally Annotated Big Data (NLP-NABD)
Conference dates:
OCT 18-19, 2014
Conference location:
Cent China Normal Univ, Wuhan, PEOPLES R CHINA