Journal:
ICBDE '20: Proceedings of the 2020 3rd International Conference on Big Data and Education, 2020, 2: Pages 37–42
Author affiliations:
[Yong Zhang; Fen Chen; Wufeng Zhang; Haoyang Zuo; Fangyuan Yu] Computer School, Central China Normal University, Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Wuhan, China
Proceedings title:
ICBDE '20: Proceedings of the 2020 3rd International Conference on Big Data and Education
Abstract:
To improve keyword extraction by enhancing the semantic representation of documents, we propose a keyword extraction method that exploits both the internal semantic information of a document and word representations pre-trained on a large external corpus. First, we use the deep learning tool Word2Vec to represent the external document information and measure the similarity between words by cosine similarity, which captures the semantic relations between words in the external documents. Then, these word-to-word similarities replace the transition probability matrix of TextRank on the word graph of the target document. In addition, the title and abstract of the target document are used to construct a semantic word graph for keyword extraction. The experiments use academic paper data from AMiner as the data set. The results show that our method outperforms TextRank: when five keywords are extracted, precision, recall, and F-score increase by 28.60%, 10.70%, and 12.90%, respectively, compared with TextRank alone.
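The core substitution described above (similarity-weighted transitions in place of co-occurrence transitions) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It uses toy 2-dimensional vectors in place of real Word2Vec embeddings. The function name `similarity_textrank`, the damping factor 0.85, the fully connected candidate graph, and the clipping of negative similarities to zero are all assumptions.

```python
import numpy as np

def similarity_textrank(words, vectors, damping=0.85, iters=100, tol=1e-10):
    """TextRank over a fully connected word graph whose edge weights are
    cosine similarities of (hypothetical) pre-trained word vectors."""
    mat = np.array([vectors[w] for w in words], dtype=float)
    unit = mat / np.linalg.norm(mat, axis=1, keepdims=True)
    sim = np.clip(unit @ unit.T, 0.0, None)  # assumption: ignore negative similarity
    np.fill_diagonal(sim, 0.0)               # no self-loops
    n = len(words)
    row_sums = sim.sum(axis=1, keepdims=True)
    safe = np.where(row_sums > 0, row_sums, 1.0)
    # Row-normalised similarity replaces the co-occurrence transition matrix;
    # a word with no positive similarity jumps uniformly to all words.
    trans = np.where(row_sums > 0, sim / safe, 1.0 / n)
    scores = np.full(n, 1.0 / n)
    for _ in range(iters):
        new_scores = (1 - damping) / n + damping * trans.T @ scores
        converged = np.abs(new_scores - scores).sum() < tol
        scores = new_scores
        if converged:
            break
    return sorted(zip(words, scores), key=lambda pair: -pair[1])

# Toy vectors standing in for Word2Vec output (hypothetical values).
vectors = {"graph": [1.0, 0.1], "network": [0.9, 0.2],
           "node": [1.0, 0.0], "banana": [0.0, 1.0]}
ranked = similarity_textrank(list(vectors), vectors)
```

A word that is semantically distant from every other candidate, such as "banana" here, receives little rank mass and ends up at the bottom of the list.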
Journal:
ICBDE '20: Proceedings of the 2020 3rd International Conference on Big Data and Education, 2020: Pages 30–36
Author affiliations:
[Ying Su] College of Information Science and Engineering, Wuchang Shouyi University, Wuhan, China;[Yong Zhang] Computer School, Central China Normal University, Wuhan, Hubei Province, China
Proceedings title:
ICBDE '20: Proceedings of the 2020 3rd International Conference on Big Data and Education
Keywords:
Big data;Graphic methods;Natural language processing systems;Semantics;Automatic construction;Automatic construction methods;Educational Applications;Evaluation models;Manual annotation;Normalized Google distances;Semantic similarity;Teaching resources;Knowledge representation
Abstract:
In this paper, we propose a method for automatically constructing subject knowledge graphs for educational applications. The subject knowledge graph is built from educational big data with a bootstrapping strategy that gradually expands the knowledge points and the connections between them. Two datasets are used. The first consists of subject teaching resources such as syllabi, teaching plans, and textbooks. It is used to construct the core of the subject knowledge graph automatically, which reduces dependence on manual annotation, and the high quality of these resources ensures the accuracy of that core. The second is a large collection of Internet encyclopedia texts, which is used to expand and complete the subject knowledge graph. Algorithmically, we use a BERT-BiLSTM-CRF model to identify subject knowledge points automatically. We then evaluate the relationships between knowledge points by computing their semantic similarity, pointwise mutual information (PMI), and Normalized Google Distance (NGD). The experimental results show that BERT-BiLSTM-CRF significantly outperforms the baselines and that all three relationship evaluation models achieve good results. Finally, subject knowledge graphs for computer science and physics are successfully constructed as examples, demonstrating the effectiveness of our method.
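Of the three relationship measures named above, PMI is the simplest to state. The sketch below computes it from occurrence counts. It is illustrative only and assumes that counts are taken over documents (or text windows). The function name `pmi` and the choice of base-2 logarithm are assumptions; the abstract does not specify how the paper collects counts or thresholds scores.

```python
import math

def pmi(count_x, count_y, count_xy, total):
    """Pointwise mutual information log2(p(x, y) / (p(x) p(y))).

    count_x, count_y: number of units (e.g. documents) containing each term;
    count_xy: number of units containing both; total: number of units.
    Returns -inf when the two terms never co-occur.
    """
    if count_xy == 0:
        return float("-inf")
    p_x = count_x / total
    p_y = count_y / total
    p_xy = count_xy / total
    return math.log2(p_xy / (p_x * p_y))

# Terms that always appear together are strongly associated;
# independent terms score 0.
strong = pmi(10, 10, 10, 100)
independent = pmi(50, 50, 25, 100)
```

Positive values indicate that two knowledge points co-occur more often than chance would predict, which is the signal a relation-evaluation step of this kind would look for.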
Author affiliations:
[Sun, Bo; Pan, Min] Cent China Normal Univ, Natl Engn Res Ctr E Learning, Wuhan 430079, Hubei, Peoples R China.;[Pan, Min] Hubei Normal Univ, Sch Comp & Informat Engn, Huangshi 435002, Hubei, Peoples R China.;[Jiang, Xingpeng; He, Tingting; Zhang, Yue; Zhu, Qiang] Cent China Normal Univ, Sch Comp, Wuhan 430079, Hubei, Peoples R China.
Corresponding author affiliation:
[He, Tingting] Cent China Normal Univ, Sch Comp, Wuhan 430079, Hubei, Peoples R China.
Conference name:
International Conference on Intelligent Computing (ICIC) / Intelligent Computing and Biomedical Informatics (ICBI) Conference - Medical Informatics and Decision Making
Conference dates:
AUG 15-18, 2018
Conference location:
PEOPLES R CHINA
Conference organizer:
[Pan, Min; Sun, Bo] Cent China Normal Univ, Natl Engn Res Ctr E Learning, Wuhan 430079, Hubei, Peoples R China.;[Pan, Min] Hubei Normal Univ, Sch Comp & Informat Engn, Huangshi 435002, Hubei, Peoples R China.;[Zhang, Yue; Zhu, Qiang; He, Tingting; Jiang, Xingpeng] Cent China Normal Univ, Sch Comp, Wuhan 430079, Hubei, Peoples R China.
Abstract:
BACKGROUND: To better support clinical decision-making, research is needed that connects electronic health records (EHRs) with the biomedical literature. Pseudo Relevance Feedback (PRF) is a classical query modification technique that has been shown to be effective in many retrieval models, which makes it suitable for handling the terse language and clinical jargon of EHRs. Previous work has introduced a set of constraints (axioms) for traditional PRF models. However, two factors in a feedback document matter: the importance of a candidate term, and the co-occurrence relationship between the candidate term and the query terms. Most methods do not consider both. Intuitively, terms that co-occur more strongly with a query term are more likely to be related to the query topic. METHODS: We incorporate the original Hyperspace Analogue to Language (HAL) model into Rocchio's model and propose a new concept, the term proximity feedback weight. The result is a HAL-based Rocchio model for query expansion, called HRoc. We also design three normalization methods to better incorporate proximity information into query expansion. Finally, we introduce an adaptive parameter that replaces the sliding-window length of the HAL model and selects the window size according to document length. RESULTS: On the 2016 TREC Clinical Decision Support dataset, experimental results show that the proposed HRoc and HRoc_AP models outperform other advanced models, such as PRoc2 and TF-PRF, on various evaluation metrics. Compared with PRoc2 and TF-PRF, the MAP of our model increases by 8.5% and 12.24%, respectively, and its F1 score increases by 7.86% and 9.88%, respectively. CONCLUSIONS: The proposed HRoc model effectively improves the precision and recall of information retrieval and produces more precise results than the other models.
Furthermore, with the self-adaptive parameter, the improved HRoc_AP model uses fewer hyperparameters than other models while achieving comparable performance. This greatly improves the model's efficiency and applicability and helps clinicians retrieve clinical support documents effectively.
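The two ingredients named in METHODS, HAL co-occurrence and Rocchio expansion, can be combined in a simple way. The sketch below is a hypothetical stand-in, not HRoc itself. It assumes a symmetric HAL window in which a pair at distance d gets weight `window - d + 1`. Each feedback term's Rocchio contribution is taken to be its HAL proximity to the query terms, normalized within the document, which is one of many possible normalizations. The function names and the parameters `alpha`, `beta`, `window`, and `k` are illustrative.

```python
from collections import Counter, defaultdict

def hal_weights(tokens, window=5):
    """Symmetric HAL-style co-occurrence strengths: two tokens at distance d
    (1 <= d <= window) add window - d + 1 to each other's weight."""
    weights = defaultdict(float)
    for i, w in enumerate(tokens):
        for d in range(1, window + 1):
            j = i + d
            if j >= len(tokens):
                break
            v = tokens[j]
            if v != w:
                weights[(w, v)] += window - d + 1
                weights[(v, w)] += window - d + 1
    return weights

def rocchio_expand(query, feedback_docs, alpha=1.0, beta=0.5, window=5, k=3):
    """Rocchio-style expansion q' = alpha*q + beta*mean(d), where d(t) is the
    (hypothetical) normalised HAL proximity of term t to the query terms."""
    new_query = Counter({t: alpha for t in query})
    for doc in feedback_docs:
        hal = hal_weights(doc, window)
        prox = {t: sum(hal.get((q, t), 0.0) for q in query)
                for t in set(doc) - set(query)}
        total = sum(prox.values()) or 1.0
        for t, v in prox.items():
            new_query[t] += beta * (v / total) / len(feedback_docs)
    candidates = {t: v for t, v in new_query.items() if t not in query and v > 0}
    expansion = sorted(candidates, key=candidates.get, reverse=True)[:k]
    return new_query, expansion

query = ["heart", "attack"]
docs = ["heart attack symptoms include chest pain".split()]
weights, expansion = rocchio_expand(query, docs)
```

Terms closer to the query terms inside the window ("symptoms", then "include", then "chest") receive larger proximity weights and are selected first.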
Journal:
ACM International Conference Proceeding Series, 2019: 43–47
Corresponding author:
Zhang, Yong
Author affiliations:
[Li, Yu; Zhao, Jingjing; Yang, Liping; Zhang, Yong] Cent China Normal Univ, Comp Sch, Wuhan, Hubei, Peoples R China.
Corresponding author affiliation:
[Zhang, Yong] Cent China Normal Univ, Comp Sch, Wuhan, Hubei, Peoples R China.
Proceedings title:
ICBDE '19: Proceedings of the 2019 International Conference on Big Data and Education
Keywords:
Big data;Knowledge representation;Visualization;Websites;Entity relation extractions;Graph visualization;Intelligent educations;Knowledge graphs;Normalized Google distances;Engineering education
Abstract:
To make full use of the specialized vocabulary of computer science and to discover relationships among these terms, we construct a Chinese knowledge graph of the computer science major from Internet web pages. We then develop a visualization of the knowledge graph and a learning-guidance application based on it. To construct the graph, a small number of important specialized terms in computer science are collected manually and then extended using Baidu Baike (baike.baidu.com), yielding about 3,000 specialized terms (called entries). The similarity between two entries is calculated from the Normalized Google Distance (NGD); if the similarity exceeds a preset threshold, a link between the two entries is created. The knowledge graph consists of these entries and the links between them. For simplicity, the relation type of a link is ignored. The graph visualization is implemented with sigma.js, and the learning-guidance application is developed with J2EE. Through the application, students can obtain a visual overview of the computer science major and make learning plans efficiently. Moreover, the application and the knowledge graph construction method can easily be applied to other majors.
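The NGD-plus-threshold linking step can be sketched as follows, using the standard NGD formula computed from hit counts. This is a minimal illustration, not the authors' code. The abstract thresholds a similarity derived from NGD without saying how it is derived, so this sketch makes an assumption: it links two entries directly when their distance is at most a hypothetical `max_distance` of 0.5, so that a small distance corresponds to high similarity. The hit counts below are invented for illustration.

```python
import math
from itertools import combinations

def ngd(f_x, f_y, f_xy, n):
    """Normalized Google Distance from hit counts f(x), f(y), f(x, y)
    and the total number of indexed pages n (assumes f_x, f_y < n)."""
    if f_xy == 0:
        return float("inf")
    lx, ly, lxy, ln = math.log(f_x), math.log(f_y), math.log(f_xy), math.log(n)
    return (max(lx, ly) - lxy) / (ln - min(lx, ly))

def build_links(counts, pair_counts, n, max_distance=0.5):
    """Create an undirected link between two entries whose NGD is at most
    max_distance (hypothetical threshold; relation types are ignored)."""
    links = []
    for a, b in combinations(sorted(counts), 2):
        f_ab = pair_counts.get((a, b), pair_counts.get((b, a), 0))
        if ngd(counts[a], counts[b], f_ab, n) <= max_distance:
            links.append((a, b))
    return links

# Invented hit counts for three entries.
counts = {"algorithm": 1000, "data structure": 800, "painting": 900}
pair_counts = {("algorithm", "data structure"): 500, ("algorithm", "painting"): 5}
links = build_links(counts, pair_counts, n=10**6)
```

The pair "algorithm"/"data structure" (NGD ≈ 0.10) is linked, while "algorithm"/"painting" (NGD ≈ 0.76) is not.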
Author affiliations:
[Tan, Liansheng; Zhang, Yongchang] Cent China Normal Univ, Dept Comp Sci, Wuhan 43007, Peoples R China.
Corresponding author affiliation:
[Tan, Liansheng] Cent China Normal Univ, Dept Comp Sci, Wuhan 43007, Peoples R China.
Keywords:
Network utility maximization (NUM);Resource allocation;Wireless network;Fairness index;Principle of equality and diminishing marginal utility (PEDMU)
Abstract:
In this paper, we study the optimal resource allocation problem in a wireless network, where all types of traffic, including best effort and quality of service (QoS; soft QoS and hard QoS), are described by a unified utility function. The problem is formulated as a network utility maximization (NUM) model. We formulate the fairness index in terms of users' utility and traffic type parameters, and then study their relationships. The law of diminishing marginal utility is widely accepted in economics. We establish the principle of equality and diminishing marginal utility (PEDMU), which enables us to find the optimal solution to the NUM model both when the total resource is sufficient and when it is insufficient. We present several key theorems and algorithms for finding the optimal solution in these two cases, and we evaluate the algorithms through simulations. The theoretical analysis and simulation results validate the efficacy and efficiency of the proposed algorithms. They also reveal how the optimal resource allocation depends on traffic types, the total available resource, and users' channel quality, and how fairness relates to the total resource under a given allocation scheme.
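The two cases can be illustrated with the textbook equal-marginal-utility (water-filling) allocation. This sketch does not use the paper's unified utility function. Instead it assumes hypothetical concave utilities U_i(x) = w_i·log(1 + x), each capped at a demand d_i. When the resource is sufficient, every user receives its demand. When it is insufficient, a bisection search finds the common marginal utility `lam` at which every partially served user satisfies w_i / (1 + x_i) = lam. The function name and all parameters are illustrative.

```python
def allocate(weights, demands, total, iters=200):
    """Split `total` among users with hypothetical utilities
    U_i(x) = w_i * log(1 + x), each capped at demand d_i.

    Sufficient case: every user receives its demand.
    Insufficient case: bisection on the common marginal utility lam so that
    users with 0 < x_i < d_i all satisfy w_i / (1 + x_i) == lam.
    """
    if sum(demands) <= total:
        return list(demands)

    def share(lam):
        # Amount each user takes when its marginal utility is pushed down to lam.
        return [min(max(w / lam - 1.0, 0.0), d) for w, d in zip(weights, demands)]

    lo, hi = 1e-12, max(weights)  # at lam = max(weights) every share is 0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if sum(share(mid)) > total:
            lo = mid  # over-allocated: raise the marginal-utility level
        else:
            hi = mid
    return share(hi)

sufficient = allocate([1.0, 1.0], [5.0, 5.0], 20.0)    # demands fully met
insufficient = allocate([2.0, 1.0], [10.0, 10.0], 3.0)  # approximately [7/3, 2/3]
```

In the insufficient case both users end with marginal utility 0.6, which illustrates the "equality" half of the principle.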
Abstract:
Protein-protein interaction (PPI) plays an important role in understanding biological processes. To resolve parsing errors caused by modal verb phrases and the noise introduced by appositive dependencies, this paper proposes an improved tree kernel-based PPI extraction method. The new method uses modal verb and appositive dependency features to define processing rules that effectively optimize and expand the shortest dependency path between two proteins. Based on these rules, the optimized and expanded path guides the pruning of the constituent parse tree, which makes the constituent parse tree used for PPI extraction more precise and concise. The experimental results show that the new method achieves better results on five commonly used corpora.
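The method starts from the shortest dependency path between the two protein mentions. That path can be found with a breadth-first search over the dependency parse treated as an undirected graph, as sketched below. The edge list is a hand-written, hypothetical parse with tokens used as node names (assumed unique). The sketch covers only the path search, not the paper's modal-verb and appositive rules.

```python
from collections import deque

def shortest_dependency_path(edges, source, target):
    """Breadth-first search over a dependency parse treated as an undirected
    graph. edges: (head, dependent) token pairs. Returns the token path from
    source to target, or None if they are not connected."""
    graph = {}
    for head, dep in edges:
        graph.setdefault(head, []).append(dep)
        graph.setdefault(dep, []).append(head)
    prev = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:  # walk back through predecessors
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None

# Hypothetical parse of "PROT1 may interact with PROT2".
edges = [("interact", "PROT1"), ("interact", "may"),
         ("interact", "PROT2"), ("PROT2", "with")]
path = shortest_dependency_path(edges, "PROT1", "PROT2")
```

Here the modal "may" falls off the path. Rules such as the paper's would decide when tokens like this should be added back.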
Abstract:
Manifold ranking is one of the most competitive approaches to query-focused multi-document summarization. Despite its success on this task, it usually first constructs a sentence affinity graph based on inter-sentence content similarity. It then performs manifold ranking on the graph to score each sentence, assuming that all sentences lie on a single manifold. In fact, the sentences of a document set to be summarized may be distributed over different but related manifolds. This paper generalizes the basic manifold-ranking approach to this more general setting by introducing a novel affinity graph for estimating inter-sentence similarity, which jointly leverages the local geometric structure and the content of sentences. Preliminary experimental results on the DUC datasets demonstrate the effectiveness of the proposed approach.
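For reference, the basic manifold-ranking step that the paper generalizes has the standard closed form f = (I − αS)⁻¹y, where S = D^(−1/2) W D^(−1/2). The sketch below applies it to a small hand-made affinity matrix in which node 0 is the query and nodes 1 to 3 are sentences. The affinity values and α = 0.85 are assumptions. The paper's novel affinity graph, which combines local geometry and content, is not reproduced here.

```python
import numpy as np

def manifold_rank(W, query_index, alpha=0.85):
    """Basic manifold ranking: f = (I - alpha * S)^(-1) y with
    S = D^(-1/2) W D^(-1/2); y marks the query node."""
    W = np.array(W, dtype=float)  # copy so the caller's matrix is untouched
    np.fill_diagonal(W, 0.0)
    deg = W.sum(axis=1)
    inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.where(deg > 0, deg, 1.0)), 0.0)
    S = inv_sqrt[:, None] * W * inv_sqrt[None, :]
    y = np.zeros(len(W))
    y[query_index] = 1.0
    return np.linalg.solve(np.eye(len(W)) - alpha * S, y)

# Hypothetical affinities: query 0 is close to sentence 1, weakly tied to 2;
# sentence 3 is reachable only through sentence 2.
W = [[0.0, 0.9, 0.1, 0.0],
     [0.9, 0.0, 0.1, 0.0],
     [0.1, 0.1, 0.0, 0.8],
     [0.0, 0.0, 0.8, 0.0]]
f = manifold_rank(W, query_index=0)
```

Scores decay along the graph away from the query, so sentence 1 ranks above sentence 2, which ranks above sentence 3.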
Abstract:
The evaluation of node importance in complex networks has attracted increasing attention in recent years. Identifying and protecting vital nodes is important for ensuring the security and stability of the whole network. Existing clustering algorithms for complex networks all have drawbacks: they cannot achieve both high computational accuracy and low time complexity, and they require external supervision. Designing a fast clustering method for complex networks is therefore an urgent problem. Based on physical data field theory, this paper proposes a clustering algorithm for complex networks. It identifies key nodes by evaluating node importance with a mutual information algorithm and then uses these nodes to divide the network into clusters. A simulation experiment was conducted to verify the validity of the algorithm. The results indicate that the algorithm identifies clusters accurately and computes quickly, and that it can determine the granularity of the partition according to actual demand.
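In networks, data field theory is commonly applied through a "topological potential": each node emits a field that decays with hop distance, and nodes with high potential are candidate key nodes. The sketch below uses the common Gaussian form φ(v) = Σ exp(−(hops(v, u)/σ)²) with unit mass per node and impact factor σ = 1. These are assumptions. The sketch is a generic data-field score, not the paper's mutual-information importance measure.

```python
import math
from collections import deque

def topological_potential(adj, sigma=1.0):
    """Data-field (topological potential) score of each node:
    phi(v) = sum over reachable u != v of exp(-(hops(v, u) / sigma) ** 2),
    with unit mass per node. adj: dict node -> list of neighbours."""
    scores = {}
    for v in adj:
        dist = {v: 0}
        queue = deque([v])
        while queue:  # BFS gives hop distances from v
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        scores[v] = sum(math.exp(-(d / sigma) ** 2) for u, d in dist.items() if u != v)
    return scores

# Star graph: node 0 is the hub.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
potential = topological_potential(star)
```

The hub receives potential 4·e⁻¹ from its four neighbours and is ranked as the key node. Clusters could then be grown around such local maxima.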
Journal:
Lecture Notes in Computer Science, 2013, 7995 LNCS: 112–119, ISSN: 0302-9743
Corresponding author:
Lin, Pengxiang
Author affiliations:
[He, Tingting; Lin, Pengxiang; Zhang, Yong] Cent China Normal Univ, Sch Comp, Wuhan, Peoples R China.
Corresponding author affiliation:
[Lin, Pengxiang] Cent China Normal Univ, Sch Comp, Wuhan, Peoples R China.
Conference name:
Proceedings of the 9th international conference on Intelligent Computing Theories
Keywords:
Community detection;Label-Influence-Algorithm;Micro-blog;Module-based;Query expansion;Query words;Sina-Microblog;Word matches;Algorithms;Intelligent computing;Social networking (online);Population dynamics
Abstract:
In this paper, we investigate the current software architecture of Twitter search and propose a new Microblog Searching Module (MSM) for retrieving microblog messages. MSM consists of three main parts. The first is community detection with the Label-Influence-Algorithm (LIA). We conducted a series of experiments on two data sets downloaded from Sina-Microblog, and the results show that LIA improves the modularity Q of the detected communities. The second part extracts microblog tags for a microblog user and for the user's community. The third part expands query words using HowNet instead of exact word matching. Applying MSM shows that it can conveniently search for topics and people of interest.
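The modularity Q used above to judge the detected communities can be computed directly from an edge list and a community assignment, using Newman's formula Q = Σ_c [L_c/m − (D_c/2m)²]. The sketch below assumes an undirected, unweighted graph. It evaluates a given partition and does not implement LIA itself.

```python
def modularity(edges, community):
    """Newman modularity of an undirected, unweighted partition:
    Q = sum_c [ L_c / m - (D_c / (2m)) ** 2 ], where L_c counts edges inside
    community c and D_c is the total degree of its nodes."""
    m = len(edges)
    inside, degree = {}, {}
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            inside[cu] = inside.get(cu, 0) + 1
    return sum(inside.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())

# Two disjoint triangles, each placed in its own community.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
q_split = modularity(edges, split)                         # 0.5
q_single = modularity(edges, {n: "a" for n in range(6)})   # 0.0
```

Putting every node in one community always gives Q = 0. A higher Q indicates denser-than-expected intra-community links.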
Journal:
Huazhong Keji Daxue Xuebao (Ziran Kexue Ban)/Journal of Huazhong University of Science and Technology (Natural Science Edition), 2013, 41(SUPPL.2): 232–236, ISSN: 1671-4512
Journal:
Lecture Notes in Computer Science, 2013, 7717 LNAI: 84–93, ISSN: 0302-9743
Corresponding author:
Su, Y.(suying929@163.com)
Author affiliations:
[Ying Su] Department of Computer and Electronic, Huazhong University of Science and Technology Wuchang Branch, Wuhan, China;[Hongmiao Wu] School of Foreign Languages and Literature, Wuhan University, Wuhan 430072, China;[Yibing Wang] Third Faculty, Second Artillery Command College, China;[Yong Zhang] Department of Computer Science, Huazhong Normal University, Wuhan, China;[Yong Zhang; Donghong Ji] Computer School, Wuhan University, Wuhan, 430072, China
Conference name:
Proceedings of the 13th Chinese conference on Chinese Lexical Semantics
Journal:
Lecture Notes in Computer Science, 2013, 7818 LNAI (PART 1): 402–413, ISSN: 0302-9743
Author affiliations:
[Zhang, Yong; Ji, Dong-Hong] Computer School, Wuhan University, Wuhan, China;[Wu, Hongmiao] School of Foreign Languages and Literature, Wuhan University, China;[Su, Ying] Department of Computer Science, Wuchang Branch, Huazhong University of Science and Technology, Wuhan, China;[Zhang, Yong] Department of Computer Science, Huazhong Normal University, Wuhan, China
Conference name:
17th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2013
Abstract:
In this paper, we investigate user features and propose a method for identifying influential users on Sina-Weibo, one of the most popular micro-blogging services in China. We first examine features such as the distribution of users' follower counts, the relation between the number of Weibo posts and the number of followers, and user interactions. Because existing methods do not measure user influence comprehensively, we propose a new model that takes three basic actions into account: following, retweeting, and commenting. Based on the weights and networks of these actions, we construct a weighted network and then employ Weighted PageRank and the Hyperlink-Induced Topic Search (HITS) algorithm to calculate user influence. Compared with two other methods, the experimental results suggest that our model offers a new way to identify influential users and is more comprehensive and stable than the other two.
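The weighted-network step can be sketched as a generic edge-weighted PageRank. The three actions are merged into one weighted directed graph, and rank flows in proportion to edge weight. This is an illustrative sketch, not necessarily the specific Weighted PageRank variant or the HITS step used in the paper. The action weights (`follow` = 1.0, `retweet` = 2.0, `comment` = 1.5), the users, and the interactions are all hypothetical.

```python
import numpy as np

def weighted_pagerank(users, interactions, action_weights, damping=0.85, iters=100):
    """Edge-weighted PageRank. interactions: (source, target, action) triples
    meaning `source` follows / retweets / comments on `target`; each triple adds
    action_weights[action] to the edge source -> target."""
    index = {u: i for i, u in enumerate(users)}
    n = len(users)
    W = np.zeros((n, n))
    for src, dst, action in interactions:
        W[index[src], index[dst]] += action_weights[action]
    out = W.sum(axis=1, keepdims=True)
    # Users with no outgoing edges spread their rank uniformly.
    P = np.where(out > 0, W / np.where(out > 0, out, 1.0), 1.0 / n)
    rank = np.full(n, 1.0 / n)
    for _ in range(iters):
        rank = (1 - damping) / n + damping * P.T @ rank
    return dict(zip(users, rank))

action_weights = {"follow": 1.0, "retweet": 2.0, "comment": 1.5}
interactions = [("b", "a", "follow"), ("c", "a", "follow"), ("d", "a", "follow"),
                ("b", "a", "retweet"), ("c", "a", "comment"), ("a", "b", "follow")]
influence = weighted_pagerank(["a", "b", "c", "d"], interactions, action_weights)
```

User "a", who is followed, retweeted, and commented on by the others, receives the highest influence score.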
Keywords:
Sentiment Analysis;Topic Model;Author-Review-Object Model
Abstract:
In this paper, we propose a probabilistic generative model for online review sentiment analysis, called the joint Author-Review-Object (ARO) model. In online reviews, users, objects, and reviews form a heterogeneous graph. The ARO model uses this user-review-object graph to improve traditional sentiment analysis: it detects sentiment based not only on the review content but also on author and object information. Preliminary experimental results on three datasets show that the proposed model is an effective strategy for jointly considering the various factors in sentiment analysis.