Journal:
INTERNATIONAL JOURNAL OF DATA MINING AND BIOINFORMATICS, 2017, 17(2):97-114 ISSN:1748-5673
Corresponding author:
Wang, Yan
Author affiliations:
[Deng, Wenping; Wang, Yan] Hubei Univ Chinese Med, Informat Engn Coll, Wuhan 430065, Peoples R China.;[Mao, Mingzhi] China Univ Geosci, Sch Math & Phys, Wuhan 430074, Peoples R China.;[Li, Fang] Jing De Zhen Ceram Inst, Informat Engn Coll, Jiujiang 333403, Peoples R China.;[Shen, Shaowu] Hubei Univ Chinese Med, Informat Engn Coll, Inst Standardizat & Informat Technol, Wuhan 430065, Peoples R China.;[Jiang, Xingpeng] Cent China Normal Univ, Sch Comp Sci, Wuhan 430079, Peoples R China.
Corresponding institution:
[Wang, Yan] Hubei Univ Chinese Med, Informat Engn Coll, Wuhan 430065, Peoples R China.
Keywords:
microbiome;microbial interactions;vector autoregression model;Laplacian regularisation;coordinate descent;penalty function
Abstract:
The evolution of biotechnological knowledge poses new challenges for the study of microbial interactions. The vector autoregressive (VAR) model has proved to be an efficient approach for inferring dynamic interactions in biological systems. However, high-throughput metagenomics or 16S-rRNA sequencing data are high-dimensional, meaning that the number of covariates is much larger than the number of observations. Reducing the dimension of the data or selecting suitable covariates has therefore become a critical component of VAR modelling. In this paper, we develop a graph-regularised vector autoregressive model that incorporates network information to infer causal relationships among microbial entities. The method not only considers the signs of the network connections between any two covariates, but also constructs a network weight matrix from microbial topology information. The coordinate descent algorithm used to estimate the model parameters improves the accuracy of prediction. Experimental results on a time series data set of human gut microbiomes indicate that the proposed approach performs better than other VAR-based models with penalty functions.
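The coordinate descent step mentioned in this abstract can be illustrated with a minimal sketch. It assumes a first-order VAR, x_t = A x_{t-1} + noise, and fits each row of A with a plain lasso penalty only; the graph (Laplacian) penalty and the network weights described in the abstract are not reproduced. The function names (`soft_threshold`, `lasso_coordinate_descent`, `fit_var1`) and the parameter `lam` are hypothetical, chosen for illustration.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma (the lasso proximal step)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # correlation of column j with the partial residual
            norm = 0.0  # squared norm of column j
            for i in range(n):
                # Prediction from every predictor except j.
                pred_without_j = sum(X[i][k] * b[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                norm += X[i][j] ** 2
            if norm > 0:
                b[j] = soft_threshold(rho / n, lam) / (norm / n)
            else:
                b[j] = 0.0
    return b


def fit_var1(series, lam):
    """Estimate A in x_t = A x_{t-1} + noise, one lasso regression per row.

    `series` is a list of time points, each a list of p abundances.
    Entry A[i][j] is the estimated effect of variable j at t-1 on variable i at t.
    """
    X = series[:-1]    # lagged observations
    Y = series[1:]     # observations one step later
    p = len(series[0])
    return [lasso_coordinate_descent(X, [row[i] for row in Y], lam)
            for i in range(p)]
```

With `lam = 0` the sketch reduces to ordinary least squares; increasing `lam` drives more entries of the estimated interaction matrix to exactly zero, which is the covariate-selection effect that penalised VAR models rely on.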
Author affiliations:
[Ma, Yuanyuan] Cent China Normal Univ, Sch Informat Management, Wuhan, Peoples R China.;[Jiang, Xingpeng; He, Tingting; Hu, Xiaohua] Cent China Normal Univ, Sch Comp, Wuhan, Peoples R China.;[Ma, Yuanyuan] Anyang Normal Univ, Anyang, Peoples R China.
Corresponding institution:
[Jiang, Xingpeng] Cent China Normal Univ, Sch Comp, Wuhan, Peoples R China.
Conference name:
IEEE International Conference on Bioinformatics and Biomedicine (IEEE BIBM)
Conference date:
DEC 15-18, 2016
Conference location:
Shenzhen, PEOPLES R CHINA
Conference organizer:
[Ma, Yuanyuan] Cent China Normal Univ, Sch Informat Management, Wuhan, Peoples R China.^[Hu, Xiaohua;He, Tingting;Jiang, Xingpeng] Cent China Normal Univ, Sch Comp, Wuhan, Peoples R China.^[Ma, Yuanyuan] Anyang Normal Univ, Anyang, Peoples R China.
Proceedings title:
IEEE International Conference on Bioinformatics and Biomedicine-BIBM
Keywords:
Human Microbiome;Laplacian Regularization;Multi-view Clustering;Symmetric Nonnegative Matrix Factorization
Journal:
Advances in Experimental Medicine and Biology, 2017, 1028:79-87 ISSN:0065-2598
Corresponding author:
Jiang, X.
Author affiliations:
[Jiang, Xingpeng] School of Computer, Central China Normal University, Wuhan, Hubei, 430079, China. xpjiang@mail.ccnu.edu.cn;[Hu, Xiaohua] School of Computer, Central China Normal University, Wuhan, Hubei, 430079, China;[Hu, Xiaohua] College of Computing & Informatics, Drexel University, Philadelphia, PA, 19104, USA
Corresponding institution:
[Jiang, X.] School of Computer, Central China Normal University, Wuhan, Hubei, China
Abstract:
Microbiome datasets often comprise different representations or views that provide complementary information, such as genes, functions, and taxonomic assignments. Integrating multi-view information for clustering microbiome samples could create a comprehensive view of a given microbiome study. Similarity network fusion (SNF) can efficiently integrate similarities built from each view of data into a unique network that represents the full spectrum of the underlying data. Based on this method, we develop a Robust Similarity Network Fusion (RSNF) approach which combines the strength of random forest with the advantage of SNF at data aggregation. The experimental results indicate the strength of the proposed strategy: the method substantially improves clustering performance compared with several state-of-the-art methods on several datasets.
Author affiliations:
[Jiang, Xingpeng; He, Tingting; Hu, Xiaohua] Cent China Normal Univ, Sch Comp Sci, Wuhan 430079, Peoples R China.;[Hu, Xiaohua] Drexel Univ, Coll Comp & Informat, Philadelphia, PA 19104 USA.
Corresponding institution:
[He, Tingting] Cent China Normal Univ, Sch Comp Sci, Wuhan 430079, Peoples R China.
Keywords:
microbiome;information distance;data visualization;density clustering;microbial community
Abstract:
Clustering groups data points into clusters of similar points. In a real dataset such as microbiome data, the data points are presented as profiles or probability distributions. These data points form the periphery of a cluster, making it difficult to identify the real clustering structure. In this study, we used density clustering with several distance measures to overcome this difficulty. Experiments on a real dataset indicated that the Manhattan distance is an appropriate distance measure for clustering analysis of microbiome data.
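As a small illustration of the distance measure named in this abstract, the sketch below normalises raw counts into relative-abundance profiles and computes the Manhattan (L1) distance between them. The helper names (`normalise`, `manhattan`, `pairwise_distances`) and the toy counts are assumptions for illustration; the density clustering step itself is not shown.

```python
def normalise(counts):
    """Turn raw counts into a relative-abundance profile that sums to 1."""
    total = sum(counts)
    if total == 0:
        raise ValueError("profile has no counts")
    return [c / total for c in counts]


def manhattan(p, q):
    """Manhattan (L1) distance: sum of absolute coordinate differences."""
    if len(p) != len(q):
        raise ValueError("profiles must have the same length")
    return sum(abs(a - b) for a, b in zip(p, q))


def pairwise_distances(profiles):
    """Symmetric matrix of Manhattan distances between all profiles."""
    n = len(profiles)
    return [[manhattan(profiles[i], profiles[j]) for j in range(n)]
            for i in range(n)]


# Toy example with three samples over three taxa (made-up counts).
samples = [normalise(c) for c in ([1, 1, 2], [2, 1, 1], [0, 0, 4])]
distances = pairwise_distances(samples)
```

For two profiles that each sum to 1, this distance lies between 0 (identical profiles) and 2 (no shared taxa), which makes the resulting matrix easy to pass to any distance-based clustering routine.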
Abstract:
Nonnegative matrix factorization (NMF) has received considerable attention because it interprets observed samples as combinations of different components, and it has been used successfully as a clustering method. As an extension of NMF, Symmetric NMF (SNMF) inherits the advantages of NMF. Unlike NMF, however, SNMF takes a nonnegative similarity matrix as input and computes two lower-rank nonnegative matrices (H, H^T) as output to approximate the original similarity matrix. Laplacian regularization has improved the clustering performance of NMF and SNMF. However, Laplacian regularization (LR), as a classic manifold regularization method, suffers from weak extrapolating ability. To address this weakness, we propose a novel variant of SNMF, called Hessian regularization based symmetric nonnegative matrix factorization (HSNMF). In contrast to Laplacian regularization, Hessian regularization fits the data perfectly and extrapolates nicely to unseen data. We conduct extensive experiments on several datasets including text data, gene expression data and HMP (Human Microbiome Project) data. The results show that the proposed method outperforms other methods, which suggests the potential application of HSNMF in biological data clustering. (C) 2016 Published by Elsevier Inc.
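For orientation, the factorization described in this abstract is commonly written as follows. The symbols are assumptions, not taken from the abstract: A is the n x n nonnegative similarity matrix, H is an n x k nonnegative factor, L = D - W is a graph Laplacian built from a neighbourhood graph, and lambda is a regularization weight. The first line is plain SNMF and the second adds the Laplacian term; a Hessian-regularized variant would presumably replace L with a Hessian energy matrix, whose construction the abstract does not give.

```latex
\min_{H \ge 0} \; \lVert A - H H^{T} \rVert_F^{2}
\\[4pt]
\min_{H \ge 0} \; \lVert A - H H^{T} \rVert_F^{2}
  + \lambda \, \operatorname{Tr}\!\left( H^{T} L H \right)
```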
Abstract:
The discovery of the community structure of real-world networks is still an open problem. Many methods have been proposed to shed light on this problem, and most of these have focused on discovering node communities. However, link community is also a powerful framework for discovering overlapping communities. Here we present a novel edge label propagation algorithm (ELPA), which combines the natural advantage of link communities with the efficiency of the label propagation algorithm (LPA). ELPA can discover both link communities and node communities. We evaluated ELPA on both synthetic and real-world networks, and compared it with five state-of-the-art methods. The results demonstrate that ELPA performs competitively with other algorithms.
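The label propagation idea that ELPA builds on can be sketched on nodes as below. This is the basic node-level LPA, not ELPA itself, which propagates labels over edges. Two simplifications are assumptions made so that the result is reproducible: nodes are updated in sorted order, and ties are broken by taking the largest label rather than a random one. The function name `label_propagation` and the toy graph are hypothetical.

```python
from collections import Counter


def label_propagation(adjacency, max_iter=100):
    """Basic node-level label propagation on an undirected graph.

    `adjacency` maps each node to the set of its neighbours.
    Returns a dict mapping each node to its final community label.
    """
    labels = {node: node for node in adjacency}  # every node starts alone
    for _ in range(max_iter):
        changed = False
        for node in sorted(adjacency):
            neighbours = adjacency[node]
            if not neighbours:
                continue
            counts = Counter(labels[n] for n in neighbours)
            best = max(counts.values())
            # Deterministic tie-break (assumption): take the largest label.
            new_label = max(l for l, c in counts.items() if c == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:  # every node already holds a majority label
            break
    return labels


# Two triangles joined by the single bridge 2-3.
graph = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
communities = label_propagation(graph)
```

On this toy graph the two triangles end up with different labels. A link-community method would instead assign labels to edges, so that a node such as 2 or 3 could belong to more than one community through its edges.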
Abstract:
Visualization is an important method of data analysis in microbiome studies, with dimensionality reduction as a prerequisite for high-dimensional data. Multidimensional scaling (MDS), a popular method for data visualization, can provide a low-dimensional representation of the original data from its distance matrix. Meanwhile, the unique fraction metric (UniFrac) is a reasonable and biologically meaningful metric for calculating distance matrices through a phylogenetic tree constructed from microbiome data. However, due to the complexity of the phylogenetic tree and the high dimensionality of microbiome data, applying MDS with UniFrac requires costly calculations. In this paper, we propose a novel dimensionality reduction algorithm based on the Laplace matrix (DRLM) for microbiome data analysis. Experimental results on both synthesized and real microbiome data demonstrate that the proposed DRLM produces more distinct clusters while significantly reducing the computational cost of dimensionality reduction and visualization in microbiome data analysis.
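The abstract does not define its Laplace matrix. Assuming it refers to the standard unnormalised graph Laplacian L = D - W of a sample similarity graph, a minimal sketch of that construction follows; the function names and the toy weight matrix are hypothetical, and the DRLM algorithm itself is not reproduced.

```python
def graph_laplacian(W):
    """Unnormalised graph Laplacian L = D - W for a symmetric weight matrix W."""
    n = len(W)
    degrees = [sum(row) for row in W]  # D is diagonal with the row sums of W
    return [[(degrees[i] if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]


def smoothness(L, x):
    """Quadratic form x^T L x; it equals the sum over edges of w_ij * (x_i - x_j)^2."""
    n = len(x)
    Lx = [sum(L[i][j] * x[j] for j in range(n)) for i in range(n)]
    return sum(x[i] * Lx[i] for i in range(n))


# Toy similarity graph on three samples: edge 0-1 has weight 1, edge 1-2 has weight 2.
W = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 2.0],
     [0.0, 2.0, 0.0]]
L = graph_laplacian(W)
```

The quadratic form is small when strongly connected samples receive similar coordinates, which is why Laplacian-based methods are used both for low-dimensional embedding and as a regularizer.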
Author affiliations:
[Jiang, Xingpeng; Yang, Jincai; He, Tingting; Shen, Xianjun; Hu, Xiaohua; Yi, Li; Zhao, Yanli] Cent China Normal Univ, Sch Comp, Wuhan, Peoples R China.;[Hu, Xiaohua] Drexel Univ, Coll Comp & Informat, Philadelphia, PA 19104 USA.
Corresponding institution:
[Yang, Jincai] Cent China Normal Univ, Sch Comp, Wuhan, Peoples R China.
Keywords:
Clustering coefficient;Neighbor affinity;Temporal protein complex;Time course protein interaction networks
Abstract:
Detecting temporal protein complexes would greatly further our knowledge of the dynamic features and molecular mechanisms of cellular activities. Most existing clustering algorithms for discovering protein complexes are based on static protein interaction networks, in which the inherent dynamics are often overlooked. We propose a novel algorithm, DPC-NADPIN (Discovering Protein Complexes based on Neighbor Affinity and Dynamic Protein Interaction Network), to identify temporal protein complexes from time course protein interaction networks. Inspired by the idea that the more tightly a protein's neighbors inside a module are connected, the more likely the protein belongs to that module, DPC-NADPIN first consolidates each protein with a high clustering coefficient and its neighbors into an initial cluster. The initial cluster then grows into a protein complex by appending neighbor proteins according to the relationship between the affinity among neighbors inside the cluster and that outside the cluster. In our experiments, DPC-NADPIN proves to be reasonable and performs better at discovering protein complexes than the following state-of-the-art algorithms: Hunter, MCODE, CFinder, SPICI, and ClusterONE. It also obtains many protein complexes with strong biological significance, which provide helpful biological knowledge to researchers in the field. Moreover, we find that proteins are assembled coordinately to form protein complexes with temporal and spatial characteristics, thereby performing specific biological functions.
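The seeding step described in this abstract (a protein with a high clustering coefficient plus its neighbors forms an initial cluster) can be sketched as follows. The local clustering coefficient is the standard definition; the cut-off `threshold`, the function names and the toy network are assumptions, and the neighbor-affinity expansion step is not shown because the abstract does not give its formula.

```python
from itertools import combinations


def clustering_coefficient(adjacency, node):
    """Fraction of pairs of a node's neighbours that are themselves connected."""
    neighbours = adjacency[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # undefined for degree < 2; treated as 0 here
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adjacency[u])
    return 2.0 * links / (k * (k - 1))


def initial_clusters(adjacency, threshold):
    """Seed clusters: each high-coefficient node together with its neighbours."""
    return {node: {node} | adjacency[node]
            for node in adjacency
            if clustering_coefficient(adjacency, node) >= threshold}


# Toy interaction network: a triangle 0-1-2 with a pendant protein 3 attached to 2.
network = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
seeds = initial_clusters(network, threshold=0.5)
```

In this toy network, proteins 0 and 1 qualify as seeds and both yield the initial cluster {0, 1, 2}; a full implementation would merge duplicate seeds before expanding them.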
Author affiliations:
[Jiang, Xingpeng; Yang, Jincai; He, Tingting; Shen, Xianjun; Hu, Xiaohua; Yi, Li] Cent China Normal Univ, Sch Comp, Wuhan, Peoples R China.;[Hu, Xiaohua] Drexel Univ, Coll Comp & Informat, Philadelphia, PA 19104 USA.
Corresponding institution:
[Yang, Jincai] Cent China Normal Univ, Sch Comp, Wuhan, Peoples R China.
Keywords:
Protein complexes;Protein interactions;Protein interaction networks;Gene expression;Algorithms;Molecular evolution;Protein expression;Protein metabolism
Abstract:
Identifying temporal protein complexes would contribute greatly to our knowledge of the dynamic organization of protein interaction networks (PINs). Recent studies have focused on integrating gene expression data into a static PIN to construct a dynamic PIN that reveals how protein interactions evolve over time, but in practice they fail to recognize the active time points of proteins with low or high expression levels. We construct a Time-Evolving PIN (TEPIN) with a novel method called Deviation Degree, which identifies the active time points of proteins based on the deviation degree of their own expression values. Moreover, because protein interactions differ from one another, we weight TEPIN with connected affinity and gene co-expression to quantify the strength of these interactions. To validate the effectiveness of our methods, the ClusterONE, CAMSE and MCL algorithms are applied to TEPIN, DPIN (a dynamic PIN constructed with the state-of-the-art three-sigma method) and SPIN (the original static PIN) to detect temporal protein complexes. Each algorithm performs better on our TEPIN than on the other networks in terms of match degree, sensitivity, specificity, F-measure and function enrichment. In conclusion, our Deviation Degree method successfully eliminates the disadvantages of previous state-of-the-art dynamic PIN construction methods. Moreover, the biological nature of protein interactions is well described by our weighted network. Weighted TEPIN is a useful approach for detecting temporal protein complexes and revealing the dynamic protein assembly process for cellular organization.
Journal:
PROCEEDINGS 2015 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE, 2015:1177-1183 ISSN:2156-1125
Corresponding author:
Yuan, Jie
Author affiliations:
[Jiang, Xingpeng; He, Tingting; Yuan, Jie] Cent China Normal Univ, Sch Comp, Wuhan, Peoples R China.;[Guo, Xiyue; Wang, Yan] Cent China Normal Univ, Natl Engn Res Ctr E Learning, Wuhan, Peoples R China.
Corresponding institution:
[Yuan, Jie] Cent China Normal Univ, Sch Comp, Wuhan, Peoples R China.
Conference name:
IEEE International Conference on Bioinformatics and Biomedicine - Medical Informatics and Decision Making
Conference date:
NOV 09-12, 2015
Conference location:
Washington, DC
Conference organizer:
[Yuan, Jie;Jiang, Xingpeng;He, Tingting] Cent China Normal Univ, Sch Comp, Wuhan, Peoples R China.^[Wang, Yan;Guo, Xiyue] Cent China Normal Univ, Natl Engn Res Ctr E Learning, Wuhan, Peoples R China.
Proceedings title:
IEEE International Conference on Bioinformatics and Biomedicine-BIBM
Keywords:
PPI network;Phenotype ontology;protein complexes;resolution-limit-free clustering algorithm
Journal:
INTERNATIONAL JOURNAL OF DATA MINING AND BIOINFORMATICS, 2015, 13(4):378-394 ISSN:1748-5673
Corresponding author:
He, Tingting
Author affiliations:
[Wang, Yan] Cent China Normal Univ, Natl Engn Res Ctr E Learning, Wuhan 430079, Peoples R China.;[Jiang, Xingpeng; He, Tingting; Shen, Xianjun; Yuan, Jie] Cent China Normal Univ, Sch Comp Sci, Wuhan 430079, Peoples R China.
Corresponding institution:
[He, Tingting] Cent China Normal Univ, Sch Comp Sci, Wuhan 430079, Peoples R China.
Abstract:
In this paper, we develop a novel regularisation method for MVAR via weighted fusion, which takes the correlation among variables into account. In theory, we discuss the grouping effect of weighted fusion regularisation for linear models. Using a probabilistic argument, we show that coefficients corresponding to highly correlated predictors have small differences. A quantitative estimate of these differences is given regardless of the coefficients' signs. The estimate improves further when the empirical approximation error is taken into account, provided the model fits the data well. We then apply the proposed model to several time series data sets, in particular a time series data set of human gut microbiomes. The experimental results indicate that the new approach performs better than several other VAR-based models, and we also demonstrate its capability of extracting relevant microbial interactions.
Authors:
Wang, Yan*;Hu, Xiaohua;Jiang, Xingpeng(蒋兴鹏);He, Tingting(何婷婷);Yuan, Jie
Journal:
PROCEEDINGS 2015 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE, 2015:635-638 ISSN:2156-1125
Corresponding author:
Wang, Yan
Author affiliations:
[Wang, Yan] Cent China Normal Univ, Natl Engn Res Ctr E Learning, Wuhan 430079, Peoples R China.;[Jiang, Xingpeng; He, Tingting; Hu, Xiaohua; Yuan, Jie] Cent China Normal Univ, Sch Comp Sci, Wuhan 430079, Peoples R China.;[Hu, Xiaohua] Drexel Univ, Coll Comp & Informat, Philadelphia, PA 19104 USA.
Corresponding institution:
[Wang, Yan] Cent China Normal Univ, Natl Engn Res Ctr E Learning, Wuhan 430079, Peoples R China.
Conference name:
IEEE International Conference on Bioinformatics and Biomedicine - Medical Informatics and Decision Making
Conference date:
NOV 09-12, 2015
Conference location:
Washington, DC
Conference organizer:
[Wang, Yan] Cent China Normal Univ, Natl Engn Res Ctr E Learning, Wuhan 430079, Peoples R China.^[Hu, Xiaohua;Jiang, Xingpeng;He, Tingting;Yuan, Jie] Cent China Normal Univ, Sch Comp Sci, Wuhan 430079, Peoples R China.^[Hu, Xiaohua] Drexel Univ, Coll Comp & Informat, Philadelphia, PA 19104 USA.
Proceedings title:
IEEE International Conference on Bioinformatics and Biomedicine-BIBM
Keywords:
Coordinate descent;Microbial interactions;Microbiome;Time series analysis;Vector autoregression model