Authors:
Ye, Guanghui;Wang, Cancan;Wu, Chuan;Peng, Ze;Wei, Jinyu;...
Journal:
Journal of Informetrics, 2023, 17(3):101421 ISSN:1751-1577
Corresponding author:
Wu, C
Author affiliations:
[Wu, Chuan; Peng, Ze; Wu, C; Tan, Qitao; Wu, Lanqi; Ye, Guanghui; Wei, Jinyu; Song, Xiaoying; Wang, Cancan] Cent China Normal Univ, Sch Informat Management, Wuhan, Peoples R China.
Corresponding institution:
[Wu, C] Cent China Normal Univ, Sch Informat Management, Wuhan, Peoples R China.
Keywords:
Research front detection;Research grant information;Evolution analysis;Health informatics
Abstract:
Identifying research fronts is an essential aspect of promoting scientific development. Many researchers choose their research directions and topics by analyzing their field's current research fronts. Previous studies have mostly used academic papers or patents to identify research fronts; however, these sources are potentially outdated, which reduces the prospective value of research front detection. Considering this, this work proposes adapted indicators for detecting research front topics from research grant data, with the aim of identifying research front topics and forecasting trends using path analysis. First, research topics were identified using topic modeling, and the mapping relations from topics to both fund projects and cross-domain categories were built. Then, research front topics were detected by multi-dimensional measurements, and the evolution of research topics was analyzed using topic evolution visualization to predict development trends. Finally, the Brillouin index was used to measure the cross-domain degree. The method was evaluated on a dataset from the field of health informatics and was shown to be effective in research front identification. The proposed adapted indicators were informative in identifying evolutionary trends in the health informatics field. In addition, research grants with higher cross-domain degrees are more likely to receive a high amount of funding.
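The abstract names the Brillouin index as the measure of cross-domain degree but does not spell it out. A minimal sketch of the standard Brillouin diversity index, H = (ln N! - sum of ln n_i!) / N, is given below. The input format (a list of per-category counts for one grant) and the function name `brillouin_index` are assumptions for illustration, not the authors' implementation; the paper may also use a different logarithm base.

```python
import math

def brillouin_index(category_counts):
    """Standard Brillouin index with natural logarithms.

    category_counts: hypothetical input, the number of items a grant
    contributes to each domain category (an assumption, not from the paper).
    """
    counts = [c for c in category_counts if c > 0]
    total = sum(counts)
    if total == 0:
        return 0.0
    # math.lgamma(n + 1) equals ln(n!), which avoids huge factorials.
    log_total_factorial = math.lgamma(total + 1)
    log_part_factorials = sum(math.lgamma(c + 1) for c in counts)
    return (log_total_factorial - log_part_factorials) / total

# A grant spread evenly over two categories scores higher than one
# concentrated mostly in a single category.
even = brillouin_index([2, 2])
skewed = brillouin_index([3, 1])
```

Under this definition a grant that falls entirely in one category has an index of zero, and the index grows as the grant's items spread more evenly across categories.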
Abstract:
Unsupervised sentence embedding learning is a fundamental task in natural language processing. Recently, unsupervised contrastive learning based on pre-trained language models has shown impressive performance in sentence embedding learning. This method aims to align positive sentence pairs while pushing apart negative sentence pairs to achieve semantic uniformity in the representation space. However, most previous literature leverages a random strategy to sample negative pairs, which suffers from the risk of selecting uninformative negative examples (e.g., easily distinguishable examples, anisotropic representations), thus greatly affecting the quality of learned representations. To address this issue, we propose nmCSE, a negative mining contrastive learning method for sentence embedding. Specifically, we introduce distance moderation and spatial uniformity as two properties of informative negative examples, and devise distance-based weighting and grid sampling as two strategies to preserve these properties, respectively. Our proposal outperforms the random strategy across seven semantic textual similarity datasets. Furthermore, our method can easily be adapted to other contrastive learning scenarios (e.g., vision), and does not introduce significant computational overhead.
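For orientation, the objective of aligning positive pairs while pushing apart negative pairs is commonly written as an InfoNCE-style loss. The sketch below shows that generic loss with uniformly treated negatives; it is not nmCSE itself, and it does not implement the distance-based weighting or grid sampling, whose exact forms the abstract does not give. The function names and the temperature value of 0.05 are assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce_loss(anchor, positive, negatives, temperature=0.05):
    """Generic InfoNCE loss for one anchor (all negatives weighted equally).

    temperature=0.05 is an assumed value, not taken from the abstract.
    """
    pos_term = math.exp(cosine(anchor, positive) / temperature)
    neg_term = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos_term / (pos_term + neg_term))

# The loss is small when the positive is close to the anchor and the
# negative is far away, and large when the roles are reversed.
easy = info_nce_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
hard = info_nce_loss([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

A negative-mining method in this setting would change which negatives enter `neg_term` or how much each one counts, rather than the overall shape of the loss.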
Abstract:
Social media data are used to enhance crisis management, as people widely adopt social media to share and acquire information to cope with uncertainties in crises. Identifying and extracting informative communications from large volumes of data is critical for accurate situational awareness and timely response. Existing studies use conditions of geolocations, keywords, and topics, separately or jointly, to retrieve data that may be crisis related, but these conditions are not enough to filter subsets of data for different crisis management tasks. We propose that the crisis communication purposes of users can be detected to enhance data selection and prioritization for different crisis management tasks. A classification framework was built to identify three facets of a message: content type, audience type, and information source. The definitions of these categories do not depend on a specific type of crisis, so the classification framework can potentially be applied to different crisis scenarios. Machine learning models were created for the automatic classification of messages. Results showed that the CNN-based model achieved the best accuracy (88.5%) for the classification of content type. The proposed Naive Bayes and logistic regression models with predetermined features can best differentiate audience types and information sources, with accuracies of 72.7% and 72.2%, respectively.