Author affiliations:
[Jiang, Xingpeng; He, Tingting; Hu, Xiaohua; Li, Xusheng; Wang, Xiaoyan; Zhong, Duo] Cent China Normal Univ, Sch Comp, Wuhan, Hubei, Peoples R China.;[Hu, Xiaohua] Drexel Univ, Sch Comp & Informat, Philadelphia, PA 19104 USA.;[Zhong, Ran] Cent China Normal Univ, Collaborat & Innovat Ctr, Wuhan, Hubei, Peoples R China.
Conference name:
IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
Conference dates:
DEC 03-06, 2018
Conference location:
Madrid, SPAIN
Conference sponsor:
[Li, Xusheng;Wang, Xiaoyan;Zhong, Duo;He, Tingting;Hu, Xiaohua;Jiang, Xingpeng] Cent China Normal Univ, Sch Comp, Wuhan, Hubei, Peoples R China.;[Hu, Xiaohua] Drexel Univ, Sch Comp & Informat, Philadelphia, PA 19104 USA.;[Zhong, Ran] Cent China Normal Univ, Collaborat & Innovat Ctr, Wuhan, Hubei, Peoples R China.
Proceedings title:
IEEE International Conference on Bioinformatics and Biomedicine-BIBM
Keywords:
biomedical text mining;bacterial named entity recognition;conditional random field;deep learning;microbial interaction
Abstract:
Microorganisms have been confirmed to be essential for the fundamental functions of various ecosystems. Interactions among microorganisms affect human health and environmental ecosystems. A large number of microbial interactions with experimental confidence have been reported in the biomedical literature, and extracting and collating these interactions into a database would create a valuable data resource. Named Entity Recognition (NER) is the premise of, and the key to, extracting interactions from the literature. Bacterial named entity recognition in particular remains a challenging task because of the peculiarities of bacterial names. In this paper, we propose a bacterial named entity recognition system based on a hybrid deep learning framework (HDL-CRF), which integrates two deep learning models, a bidirectional long short-term memory network and a convolutional neural network, with a conditional random field so that features are extracted automatically. Finally, we demonstrate that this model outperforms previous methods.
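The CRF layer of a BiLSTM/CNN-CRF tagger selects the jointly best tag sequence rather than the best tag for each token independently, typically with Viterbi decoding. The sketch below shows standard linear-chain Viterbi decoding only. It assumes that per-token emission scores (which a network such as HDL-CRF would produce) and a tag-transition table are already given. The BIO tag names (`B-Bac`, `I-Bac`, `O`), the tokens and every score are hypothetical toy values, not taken from the paper.

```python
from typing import Dict, List, Tuple

Scores = Dict[str, float]

def viterbi(emissions: List[Scores],
            transitions: Dict[Tuple[str, str], float],
            tags: List[str]) -> List[str]:
    """Return the tag sequence with the highest total emission + transition score."""
    # best[t][tag]: best score of any tag path ending in `tag` at token t
    best: List[Scores] = [dict(emissions[0])]
    # back[t][tag]: previous tag on that best path
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(emissions)):
        best.append({})
        back.append({})
        for cur in tags:
            prev, score = max(
                ((p, best[t - 1][p] + transitions[(p, cur)]) for p in tags),
                key=lambda item: item[1],
            )
            best[t][cur] = score + emissions[t][cur]
            back[t][cur] = prev
    # Trace the best path backwards from the highest-scoring final tag
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for t in range(len(emissions) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

# Hypothetical BIO tag set for bacterial names and toy scores
TAGS = ["B-Bac", "I-Bac", "O"]
transitions = {(p, c): 0.0 for p in TAGS for c in TAGS}
transitions[("B-Bac", "I-Bac")] = 0.5   # an entity tends to continue
transitions[("O", "I-Bac")] = -10.0     # BIO scheme: I- cannot follow O
emissions = [
    {"B-Bac": 2.0, "I-Bac": 0.5, "O": 1.0},  # "Escherichia"
    {"B-Bac": 0.5, "I-Bac": 1.5, "O": 1.6},  # "coli"
    {"B-Bac": 0.1, "I-Bac": 0.2, "O": 2.0},  # "grows"
]
print(viterbi(emissions, transitions, TAGS))  # ['B-Bac', 'I-Bac', 'O']
```

A greedy per-token choice would tag "coli" as `O` (1.6 > 1.5), but the transition bonus from `B-Bac` to `I-Bac` makes `B-Bac I-Bac O` the best sequence overall.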
Journal:
Lecture Notes in Computer Science, 2018, 10955: 93-99, ISSN: 0302-9743
Corresponding author:
Zhang, Yue
Author affiliations:
[Pan, Min] Cent China Normal Univ, Natl Engn Res Ctr E Learning, Wuhan 430079, Hubei, Peoples R China.;[Jiang, Xingpeng; He, Tingting; Zhang, Yue] Cent China Normal Univ, Sch Comp Sci, Wuhan 430079, Hubei, Peoples R China.
Corresponding author address:
[Zhang, Yue] Cent China Normal Univ, Sch Comp Sci, Wuhan 430079, Hubei, Peoples R China.
Conference name:
14th International Conference on Intelligent Computing (ICIC)
Conference dates:
AUG 15-18, 2018
Conference location:
Wuhan, PEOPLES R CHINA
Conference sponsor:
[Pan, Min] Cent China Normal Univ, Natl Engn Res Ctr E Learning, Wuhan 430079, Hubei, Peoples R China.;[Zhang, Yue;He, Tingting;Jiang, Xingpeng] Cent China Normal Univ, Sch Comp Sci, Wuhan 430079, Hubei, Peoples R China.
Abstract:
In actual electronic health records (EHRs), patient notes are written in terse language full of clinical jargon. However, most Pseudo Relevance Feedback (PRF) methods do not simultaneously take into account the significance of a candidate term in the feedback documents and the co-occurrence relationship between a candidate term and a query term. In this paper, we study how to incorporate proximity information into Rocchio's model and propose a HAL-based Rocchio's model, called HRoc. A new concept, the term proximity feedback weight, is introduced to model proximity in query expansion. We then propose three normalization methods to incorporate the proximity information. Experimental results on the 2016 TREC Clinical Support Medicine collections show that our proposed models are effective and generally superior to state-of-the-art relevance feedback models.
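HRoc extends Rocchio's pseudo-relevance feedback model. For orientation, the sketch below implements only the classic Rocchio expansion step, q' = alpha*q + beta*centroid(feedback documents), followed by keeping the highest-weighted new terms. It does not reproduce HRoc's HAL-based term proximity feedback weight or its normalization methods. The term weights, the parameter values `alpha=1.0`, `beta=0.75`, `n_expansion=2` and the example clinical terms are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List

def rocchio_expand(query: Dict[str, float],
                   feedback_docs: List[Dict[str, float]],
                   alpha: float = 1.0, beta: float = 0.75,
                   n_expansion: int = 2) -> Dict[str, float]:
    """Classic Rocchio PRF: q' = alpha*q + beta*centroid(pseudo-relevant docs)."""
    # Centroid of the (pseudo-)relevant feedback documents
    centroid: Dict[str, float] = defaultdict(float)
    for doc in feedback_docs:
        for term, weight in doc.items():
            centroid[term] += weight / len(feedback_docs)
    # Combine the original query vector with the centroid
    new_query: Dict[str, float] = defaultdict(float)
    for term, weight in query.items():
        new_query[term] += alpha * weight
    for term, weight in centroid.items():
        new_query[term] += beta * weight
    # Keep all original query terms plus the top-weighted new terms
    candidates = sorted((t for t in new_query if t not in query),
                        key=lambda t: (-new_query[t], t))
    kept = list(query) + candidates[:n_expansion]
    return {t: new_query[t] for t in kept}

# Hypothetical query and feedback-document term weights
query = {"chest": 1.0, "pain": 1.0}
docs = [
    {"chest": 0.5, "angina": 0.8, "ecg": 0.2},
    {"pain": 0.4, "angina": 0.6, "troponin": 0.5},
]
expanded = rocchio_expand(query, docs)
print(list(expanded))  # ['chest', 'pain', 'angina', 'troponin']
```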
Authors:
Jian, Fanghong;Huang, Jimmy Xiangji*;Zhao, Jiashu;He, Tingting(何婷婷)
Author affiliations:
[Huang, Jimmy Xiangji] Cent China Normal Univ, Informat Retrieval & Knowledge Management Res Lab, Wuhan, Hubei, Peoples R China.;Cent China Normal Univ, Natl Engn Res Ctr E Learning, Wuhan, Hubei, Peoples R China.;Cent China Normal Univ, Sch Comp, Wuhan, Hubei, Peoples R China.;York Univ, Sch Informat Technol, Toronto, ON, Canada.
Conference name:
41st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR)
Conference dates:
JUL 08-12, 2018
Conference location:
Univ Michigan, Ann Arbor, MI
Conference sponsor:
Univ Michigan
Keywords:
Term Frequency Normalization;BM25;Probabilistic Model
Abstract:
In probabilistic BM25, term frequency normalization is one of the key components. It is usually controlled by the parameters k1 and b, which need to be optimized for each data set. In this paper, we hypothesize, and show empirically, that term frequency normalization should depend on query length in order to optimize retrieval performance. Following this intuition, we first propose a new query-length-dependent term frequency normalization for probabilistic information retrieval, namely BM25(QL). BM25(QL) is then incorporated into the state-of-the-art models CRTER2 and LDA-BM25, denoted CRTER2(QL) and LDA-BM25(QL) respectively. A series of experiments shows that our proposed approaches BM25(QL), CRTER2(QL) and LDA-BM25(QL) are comparable in MAP to BM25, CRTER2 and LDA-BM25 with the optimal b setting on all the data sets.
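The roles of k1 and b are visible in the standard BM25 term-frequency component, tf*(k1 + 1) / (tf + k1*(1 - b + b*dl/avgdl)), where dl is the document length and avgdl the average document length. The sketch below implements plain BM25 only; the query-length-dependent normalization of BM25(QL) is not specified in the abstract and is not reproduced. The IDF variant and the defaults k1 = 1.2, b = 0.75 are common choices assumed here, not values from the paper.

```python
import math
from typing import Dict, List

def bm25_tf(tf: float, doc_len: float, avg_doc_len: float,
            k1: float = 1.2, b: float = 0.75) -> float:
    """Saturating TF component; b controls document-length normalization."""
    length_norm = 1.0 - b + b * doc_len / avg_doc_len
    return tf * (k1 + 1.0) / (tf + k1 * length_norm)

def bm25_score(query: List[str], doc_tf: Dict[str, int], doc_len: float,
               avg_doc_len: float, df: Dict[str, int], n_docs: int,
               k1: float = 1.2, b: float = 0.75) -> float:
    """Sum of IDF-weighted TF components over the query terms."""
    score = 0.0
    for term in query:
        tf = doc_tf.get(term, 0)
        if tf == 0 or term not in df:
            continue
        # One common non-negative IDF variant (an assumption; others exist)
        idf = math.log(1.0 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
        score += idf * bm25_tf(tf, doc_len, avg_doc_len, k1, b)
    return score

# A document three times longer than average, with the term occurring twice
for b in (0.0, 0.75, 1.0):
    print(b, round(bm25_tf(2, doc_len=300, avg_doc_len=100, b=b), 4))
```

Raising b from 0 to 1 shrinks the TF contribution of the long document; this length normalization is the component that BM25(QL) proposes to make query-length specific.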
Abstract:
Virus-host association studies are important for understanding the complex functions and dynamics of microbial communities in human health and disease. Several virus-host association prediction methods have been developed, each based separately on sequence information, virus networks, host networks or virus-host networks. In this study, we develop a heterogeneous network approach based on neighborhood regularized logistic matrix factorization (LMFH-VH), which integrates the virus similarity network and the host similarity network using known virus-host associations. The virus similarity network and the host similarity network were constructed from oligonucleotide frequency measures and Gaussian interaction profile kernel similarity, respectively. LMFH-VH achieves the best performance on several validation datasets compared with four other network-based methods. The host prediction accuracy of LMFH-VH is 24.17% and 12.8% higher, respectively, than that of two recently proposed virus-host prediction methods. The code and datasets are available at https://github.com/liudan111/LMFH-VH.git.
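The Gaussian interaction profile (GIP) kernel mentioned above is commonly defined as K(i, j) = exp(-gamma * ||y_i - y_j||^2), where y_i is row i of a binary association matrix and gamma = gamma' / mean_i(||y_i||^2). A minimal sketch, assuming that common definition with gamma' = 1 (the paper's exact bandwidth setting is not stated in the abstract); the toy host-by-virus association matrix is hypothetical.

```python
import math
from typing import List

def gip_kernel(profiles: List[List[int]], gamma_prime: float = 1.0) -> List[List[float]]:
    """K[i][j] = exp(-gamma * ||y_i - y_j||^2), gamma = gamma_prime / mean ||y_i||^2."""
    n = len(profiles)
    mean_sq_norm = sum(sum(x * x for x in p) for p in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = gamma_prime / mean_sq_norm

    def sq_dist(a: List[int], b: List[int]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [[math.exp(-gamma * sq_dist(profiles[i], profiles[j])) for j in range(n)]
            for i in range(n)]

# Hypothetical associations: rows are hosts, columns are viruses (1 = known link)
associations = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
]
K = gip_kernel(associations)
for row in K:
    print([round(x, 3) for x in row])
```

Hosts 0 and 1 share an identical interaction profile and therefore get similarity 1.0, while host 2 shares no virus with them and receives a much lower score.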
Author affiliations:
[Zhu, Qiang] Cent China Normal Univ, Sch Informat Management, Wuhan 430079, Hubei, Peoples R China.;[Jiang, Xingpeng; He, Tingting; Hu, Xiaohua; Pan, Min; Zhu, Qing] Cent China Normal Univ, Sch Comp, Wuhan 430079, Hubei, Peoples R China.;[Hu, Xiaohua] Drexel Univ, Coll Comp & Informat, Philadelphia, PA 19104 USA.
Corresponding author address:
[He, Tingting] Cent China Normal Univ, Sch Comp, Wuhan 430079, Hubei, Peoples R China.
Conference name:
IEEE International Conference on Bioinformatics and Biomedicine (BIBM) - Human Genomics
Conference dates:
DEC 03-06, 2018
Conference location:
Madrid, SPAIN
Conference sponsor:
[Zhu, Qiang] Cent China Normal Univ, Sch Informat Management, Wuhan 430079, Hubei, Peoples R China.;[Zhu, Qing;Pan, Min;Jiang, Xingpeng;Hu, Xiaohua;He, Tingting] Cent China Normal Univ, Sch Comp, Wuhan 430079, Hubei, Peoples R China.;[Hu, Xiaohua] Drexel Univ, Coll Comp & Informat, Philadelphia, PA 19104 USA.
Proceedings title:
IEEE International Conference on Bioinformatics and Biomedicine-BIBM
Abstract:
Microorganisms are closely related to human health and have an impact on the development of various diseases. Identifying the relationships between microorganisms and phenotypes (such as healthy or disease status) by analyzing microbial abundance is extremely important in personalized medicine. Deep learning allows computational models composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have improved the state of the art in speech recognition, visual object recognition and object detection. However, current deep models are typically neural networks, which consist of multiple layers of parameterized, differentiable nonlinear models trained by backpropagation. It is therefore interesting to explore other deep learning models for tasks with small sample sizes and high-dimensional data. A unique feature of microbial data is its phylogenetic tree structure, which can be embedded to improve classification performance. In this work, to further improve metagenomic classification, we propose a deep model named Cascade Deep Forest, which preserves the spatial structure between nodes by embedding phylogenetic tree information. Our results demonstrate that: 1) the modified cascade structure can enhance the classification performance of Deep Forest; 2) embedding phylogenetic tree information can also improve the classification performance of the models; and 3) Deep Forest achieves performance highly competitive with deep neural networks.
Author affiliations:
[Jiang, Xingpeng; Yang, Jincai; He, Tingting; Shen, Xianjun; Hu, Xiaohua; Gong, Xue] Cent China Normal Univ, Sch Comp, Wuhan, Hubei, Peoples R China.;[Hu, Xiaohua] Drexel Univ, Coll Comp & Informat, Philadelphia, PA 19104 USA.
Conference name:
IEEE International Conference on Bioinformatics and Biomedicine (BIBM) - Human Genomics
Conference dates:
DEC 03-06, 2018
Conference location:
Madrid, SPAIN
Conference sponsor:
[Shen, Xianjun;Gong, Xue;Jiang, Xingpeng;Yang, Jincai;He, Tingting;Hu, Xiaohua] Cent China Normal Univ, Sch Comp, Wuhan, Hubei, Peoples R China.;[Hu, Xiaohua] Drexel Univ, Coll Comp & Informat, Philadelphia, PA 19104 USA.
Proceedings title:
IEEE International Conference on Bioinformatics and Biomedicine-BIBM
Keywords:
weighted directed motifs;microbial network;high order structures;motif-based clustering
Abstract:
High-order connectivity patterns are essential to understanding the basic structure of complex networks. Network motifs are considered the basic building blocks of complex networks. Identifying network motifs, and then using them to discover higher-order modular organization, helps to study the organizational principles and functional modules of biological networks in a divide-and-conquer manner. However, current research based on network motifs often neglects the influence of weights in network motifs. In this paper, the concept of weighted motifs is presented and applied to microbial networks. A method is proposed to find the optimal weighted motif in a microbial network and to analyze the high-order structure of weighted networks based on such motifs. It is also proved theoretically that partially weighted motifs can obtain better clusters than unweighted ones.
Abstract:
Dynamic networks are drawing more and more attention because of their potential for capturing time-dependent phenomena such as online public opinion and biological systems. Microbial interaction networks that model microbial systems are often dynamic, and static analysis methods struggle to obtain reliable knowledge about evolving communities. To fill this gap, a dynamic clustering approach based on evolutionary symmetric nonnegative matrix factorization (ESNMF) is used to analyze microbiome time-series data. To our knowledge, this is the first attempt to extract dynamic modules across a time-series microbial interaction network. ESNMF systematically integrates a temporal smoothness cost into the objective function, simultaneously refining the clustering structure in the current network and minimizing the clustering deviation between successive timestamps. We apply the proposed framework to a human microbiome dataset from infants delivered vaginally and infants born via C-section. The proposed method can not only identify evolving modules related to certain functions of microbial communities, but also discriminate between the two kinds of networks obtained from infants delivered vaginally and via C-section.
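The abstract describes the ESNMF objective as a trade-off between fitting the current network and staying close to the clustering at the previous timestamp. The sketch below evaluates such an evolutionary-clustering cost under a hypothetical form, cost = alpha * ||A_t - H_t H_t^T||_F^2 + (1 - alpha) * ||H_t H_t^T - H_{t-1} H_{t-1}^T||_F^2. The exact ESNMF objective, the weight `alpha` and the toy matrices are assumptions, not taken from the paper.

```python
from typing import List

Matrix = List[List[float]]

def gram(h: Matrix) -> Matrix:
    """Return H @ H^T for a row-major n x k matrix H."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in h] for ri in h]

def sq_frobenius(x: Matrix, y: Matrix) -> float:
    """Squared Frobenius norm of X - Y."""
    return sum((a - b) ** 2 for rx, ry in zip(x, y) for a, b in zip(rx, ry))

def evolutionary_cost(a_t: Matrix, h_t: Matrix, h_prev: Matrix,
                      alpha: float = 0.8) -> float:
    """Hypothetical snapshot cost + temporal smoothness cost."""
    hh_t = gram(h_t)
    snapshot = sq_frobenius(a_t, hh_t)          # fit to the current network
    temporal = sq_frobenius(hh_t, gram(h_prev))  # deviation from previous clustering
    return alpha * snapshot + (1.0 - alpha) * temporal

# Toy network at time t: nodes {0, 1} form one module, node 2 another
A_t = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
H_t = [[1, 0], [1, 0], [0, 1]]
H_prev_other = [[1, 0], [0, 1], [0, 1]]  # node 1 was grouped with node 2 before

print(evolutionary_cost(A_t, H_t, H_t))  # 0.0
print(evolutionary_cost(A_t, H_t, H_prev_other))
```

A membership that fits the current network perfectly still pays the temporal term when it departs from the previous timestamp's clustering, which is the smoothing effect the abstract describes.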
Author affiliations:
[Zhu, Qiang] Cent China Normal Univ, Sch Informat Management, Wuhan 430079, Hubei, Peoples R China.;[Jiang, Xingpeng; He, Tingting; Hu, Xiaohua; Pan, Min; Liu, Lei; Li, Bojing] Cent China Normal Univ, Sch Comp, Wuhan 430079, Hubei, Peoples R China.;[Hu, Xiaohua] Drexel Univ, Coll Comp & Informat, Philadelphia, PA 19104 USA.
Conference name:
IEEE International Conference on Bioinformatics and Biomedicine (BIBM) - Human Genomics
Conference dates:
DEC 03-06, 2018
Conference location:
Madrid, SPAIN
Conference sponsor:
[Zhu, Qiang] Cent China Normal Univ, Sch Informat Management, Wuhan 430079, Hubei, Peoples R China.;[Pan, Min;Liu, Lei;Li, Bojing;He, Tingting;Jiang, Xingpeng;Hu, Xiaohua] Cent China Normal Univ, Sch Comp, Wuhan 430079, Hubei, Peoples R China.;[Hu, Xiaohua] Drexel Univ, Coll Comp & Informat, Philadelphia, PA 19104 USA.
Proceedings title:
IEEE International Conference on Bioinformatics and Biomedicine-BIBM
Abstract:
With the rapid advancement of DNA sequencing, metagenomics and metatranscriptomics have made great progress, deepening our understanding of the human microbiome and its impact on human health and disease. The microbiome, which is characterized by small sample sizes, high dimensionality and complicated relationships with hosts, refers to the species, genes and genomes of the microbiota, as well as the products of the microbiota and the host environment. Many machine learning methods have been used to conduct Microbiome-Wide Association Studies, which link the microbiome with phenotypes such as the status of human health and disease. However, existing methods such as Support Vector Machines (SVMs) are limited in deep representation learning, whose deep architectures can promote the reuse of features and potentially lead to progressively more abstract features at higher layers of representation. Recently, Deep Neural Networks (DNNs), a kind of deep learning model, have been widely used for metagenomic data analysis and perform well in representation learning. However, they are considered black boxes and have been criticized for their lack of interpretability. It is therefore interesting to explore other deep learning models for metagenomic data analysis. In this work, we introduce a deep learning model called Deep Forest to study microbiome associations, and we also present an ensemble method for feature selection. Experimental results show that Deep Forest outperforms traditional machine learning methods. In addition, compared with DNNs, Deep Forest has better interpretability and fewer hyperparameters.
Journal:
PROCEEDINGS 2018 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2018: 197-200, ISSN: 2156-1125
Corresponding author:
Hu, XH
Author affiliations:
[Ma, Yingjun] Cent China Normal Univ, Sch Math & Stat, Wuhan, Hubei, Peoples R China.;[Ge, Leixin] Cent China Normal Univ, Sch Life Sci, Wuhan, Hubei, Peoples R China.;[Ma, Yuanyuan] Anyang Normal Univ, Sch Comp & Informat Engn, Anyang, Peoples R China.;[Jiang, Xingpeng; He, Tingting; Hu, Xiaohua] Cent China Normal Univ, Sch Comp, Wuhan, Hubei, Peoples R China.
Corresponding author address:
[Hu, XH] Cent China Normal Univ, Sch Comp, Wuhan, Hubei, Peoples R China.
Conference name:
IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
Conference dates:
DEC 03-06, 2018
Conference location:
Madrid, SPAIN
Conference sponsor:
[Ma, Yingjun] Cent China Normal Univ, Sch Math & Stat, Wuhan, Hubei, Peoples R China.;[Ge, Leixin] Cent China Normal Univ, Sch Life Sci, Wuhan, Hubei, Peoples R China.;[Ma, Yuanyuan] Anyang Normal Univ, Sch Comp & Informat Engn, Anyang, Peoples R China.;[Jiang, Xingpeng;He, Tingting;Hu, Xiaohua] Cent China Normal Univ, Sch Comp, Wuhan, Hubei, Peoples R China.
Proceedings title:
IEEE International Conference on Bioinformatics and Biomedicine-BIBM
Abstract:
Studies have shown that microRNAs (miRNAs) are functionally related to human diseases. However, experimental methods for detecting miRNA-disease associations are time-consuming and laborious. Therefore, a large number of computational models for predicting potential miRNA-disease interactions have been proposed. However, few methods take into account the nonlinear structural similarity of miRNAs (diseases) and effectively integrate multiple similarity metrics into one network. In this paper, we propose a kernel-based soft-neighborhood network propagation algorithm (LKSNF) to predict potential miRNA-disease interactions, which not only exploits the potential nonlinear relationships but also effectively integrates different similarity measures of miRNAs (diseases). The results of 5-fold cross-validation show that LKSNF has significantly better predictive performance than other state-of-the-art methods. A case study further illustrates the effectiveness of LKSNF in predicting new miRNA-disease interactions.
Abstract:
Many real-world datasets comprise different representations, or views, which provide complementary information to each other. To integrate information from multiple views, data integration approaches such as nonnegative matrix factorization (NMF) have been developed to combine multiple heterogeneous data sources simultaneously and obtain a comprehensive representation. In this paper, we propose a novel variant of symmetric nonnegative matrix factorization (SNMF), called Laplacian regularization based joint symmetric nonnegative matrix factorization (LJ-SNMF), for clustering multi-view data. We conduct extensive experiments on several real-world datasets, including Human Microbiome Project data. The experimental results show that the proposed method outperforms other variants of NMF, which suggests the potential of LJ-SNMF for clustering multi-view datasets. Additionally, we demonstrate the capability of LJ-SNMF in community finding.
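Symmetric NMF approximates a nonnegative similarity matrix A by H*H^T with H >= 0, and the row-wise argmax of H gives cluster labels. The sketch below uses one widely used multiplicative update, H <- H * (1 - beta + beta * (A H) / (H H^T H)) element-wise with beta = 0.5. It is plain single-view SNMF, not LJ-SNMF: the joint multi-view factorization and Laplacian regularization are not reproduced. The toy block matrix, seed and iteration count are assumptions; multiplicative updates may converge to a local minimum, so several random restarts are common in practice.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]

def matmul(x: Matrix, y: Matrix) -> Matrix:
    cols = list(zip(*y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in x]

def transpose(x: Matrix) -> Matrix:
    return [list(col) for col in zip(*x)]

def sq_error(a: Matrix, h: Matrix) -> float:
    """||A - H H^T||_F^2"""
    hht = matmul(h, transpose(h))
    return sum((p - q) ** 2 for ra, rb in zip(a, hht) for p, q in zip(ra, rb))

def snmf(a: Matrix, k: int, n_iter: int = 300, beta: float = 0.5,
         seed: int = 0, eps: float = 1e-12) -> Tuple[Matrix, List[float]]:
    """Multiplicative-update SNMF; returns H and the error after each iteration."""
    rng = random.Random(seed)
    h = [[rng.random() + 0.1 for _ in range(k)] for _ in a]  # strictly positive init
    history = [sq_error(a, h)]
    for _ in range(n_iter):
        ah = matmul(a, h)
        hhth = matmul(h, matmul(transpose(h), h))  # H (H^T H) = H H^T H
        h = [[h[i][j] * (1.0 - beta + beta * ah[i][j] / (hhth[i][j] + eps))
              for j in range(k)] for i in range(len(a))]
        history.append(sq_error(a, h))
    return h, history

# Hypothetical similarity matrix with two blocks: {0, 1, 2} and {3, 4}
A = [
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
h, history = snmf(A, k=2)
labels = [max(range(2), key=lambda j: row[j]) for row in h]
print("labels:", labels)
print("error: %.4f -> %.4f" % (history[0], history[-1]))
```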
Journal:
2017 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2017: 3882-3887, ISSN: 2639-1589
Corresponding author:
Hu, Xiaohua
Author affiliations:
[Jiang, Xingpeng; He, Tingting; Shen, Xianjun; Hu, Xiaohua; Gao, Li; Zhu, Xianchao] Cent China Normal Univ, Sch Comp, Wuhan, Hubei, Peoples R China.;[Shen, Xianjun; Hu, Xiaohua] Drexel Univ, Coll Comp & Informat, Philadelphia, PA 19104 USA.
Corresponding author address:
[Hu, Xiaohua] Cent China Normal Univ, Sch Comp, Wuhan, Hubei, Peoples R China.;[Hu, Xiaohua] Drexel Univ, Coll Comp & Informat, Philadelphia, PA 19104 USA.
Conference name:
IEEE International Conference on Big Data (IEEE Big Data)
Conference dates:
DEC 11-14, 2017
Conference location:
Boston, MA
Conference sponsor:
[Shen, Xianjun;Zhu, Xianchao;Jiang, Xingpeng;Gao, Li;He, Tingting;Hu, Xiaohua] Cent China Normal Univ, Sch Comp, Wuhan, Hubei, Peoples R China.;[Shen, Xianjun;Hu, Xiaohua] Drexel Univ, Coll Comp & Informat, Philadelphia, PA 19104 USA.
Proceedings title:
IEEE International Conference on Big Data
Abstract:
Known as phenotypic overlap, some disease-related symptoms share a common pathological and physiological mechanism. Researchers have attempted to visualize the phenotypic relationships between different human diseases from the perspective of machine learning, but traditional visualization methods may be subject to the fundamental limitations of metric spaces. The multiple maps t-SNE (mm-tSNE) regularization method, a probabilistic method for visualizing data points in multiple low-dimensional spaces, has been proposed to address this limitation. However, it converges slowly when applied to large-scale datasets. We use RMSProp with Nesterov momentum to optimize the objective loss function. This method normalizes the gradient of each parameter at every iteration by an exponential moving average of the gradient magnitude, and uses Nesterov momentum to counteract excessively high velocities by "peeking ahead" at the actual objective values in the candidate search direction. It converges faster than the original method. Experimental results on several datasets show that the proposed method outperforms several versions of mm-tSNE, with or without regularization, as measured by the neighborhood preservation ratio and error rate. This suggests that the modified mm-tSNE regularization can be applied directly in other domains, including social, biological and microbiome datasets.
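The update described above matches the usual formulation of RMSProp with Nesterov momentum: take the gradient g at the look-ahead point theta + mu*v, update the running average r <- rho*r + (1 - rho)*g^2, then set v <- mu*v - lr*g/sqrt(r + eps) and theta <- theta + v. The sketch below applies it to a toy quadratic rather than the mm-tSNE loss; the objective and the hyperparameter values (`lr`, `momentum`, `decay`, `eps`, `steps`) are assumptions chosen for illustration.

```python
import math
from typing import Callable, List

Vector = List[float]

def rmsprop_nesterov(grad: Callable[[Vector], Vector], theta: Vector,
                     lr: float = 0.005, momentum: float = 0.9,
                     decay: float = 0.9, eps: float = 1e-8,
                     steps: int = 2000) -> Vector:
    """RMSProp with Nesterov momentum, applied element-wise."""
    theta = list(theta)
    v = [0.0] * len(theta)  # velocity
    r = [0.0] * len(theta)  # exponential moving average of squared gradients
    for _ in range(steps):
        # Nesterov "peek ahead": evaluate the gradient at theta + momentum * v
        g = grad([t + momentum * vi for t, vi in zip(theta, v)])
        for i, gi in enumerate(g):
            r[i] = decay * r[i] + (1.0 - decay) * gi * gi
            # Per-parameter step normalized by the root of the moving average
            v[i] = momentum * v[i] - lr * gi / math.sqrt(r[i] + eps)
            theta[i] += v[i]
    return theta

# Toy, badly scaled objective (an assumption, not the mm-tSNE loss)
def f(p: Vector) -> float:
    return p[0] ** 2 + 10.0 * p[1] ** 2

def grad_f(p: Vector) -> Vector:
    return [2.0 * p[0], 20.0 * p[1]]

start = [3.0, 2.0]
final = rmsprop_nesterov(grad_f, start)
print(f(start), f(final))
```

Because each coordinate's step is divided by its own gradient scale, the steep y direction and the shallow x direction advance at similar speeds, which is the per-parameter normalization the abstract credits for faster convergence.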
Author affiliations:
[Jiang, Xingpeng; Xie, Wei; Yang, Jincai; He, Tingting; Shen, Xianjun; Hu, Po; Hu, Xiaohua; Yi, Li] Cent China Normal Univ, Sch Comp, Wuhan, Hubei, Peoples R China.;[Yi, Li] Letv Cloud Comp Co Ltd, Beijing, Peoples R China.;[Hu, Xiaohua] Drexel Univ, Coll Comp & Informat, Philadelphia, PA 19104 USA.
Corresponding author address:
[Shen, Xianjun] Cent China Normal Univ, Sch Comp, Wuhan, Hubei, Peoples R China.
Keywords:
Protein complexes;Algorithms;Protein interaction networks;Gene expression;Protein interactions;Forecasting;Genetic networks;Yeast
Abstract:
Identifying protein complexes is an important and challenging task in proteomics that contributes greatly to our knowledge of the molecular mechanisms of cellular activities. However, because of the limitations of protein-protein interaction (PPI) data produced by high-throughput techniques, the inherent organization and dynamic characteristics of the cell system have rarely been incorporated into existing algorithms for detecting protein complexes. The availability of time-course gene expression profiles enables us to uncover the dynamics of molecular networks and improve the detection of protein complexes. To achieve this goal, this paper proposes a novel algorithm, DCA (Dynamic Core-Attachment). It detects protein-complex cores comprising continually expressed and highly connected proteins in a dynamic PPI network, and then forms each protein complex by adding attachments with high adhesion to its core. The integration of the core-attachment feature into the dynamic PPI network is responsible for the superiority of our algorithm. DCA has been applied to two different yeast dynamic PPI networks, and the experimental results show that it performs significantly better than state-of-the-art techniques in terms of prediction accuracy, hF-measure and statistical significance in biology. In addition, the identified complexes, which have strong biological significance, provide potential candidate complexes for biologists to validate.