Main content extraction from web pages based on node characteristics


Publication type:

Journal article

Authors:

Qingtang Liu;Mingbo Shao;Linjing Wu;Gang Zhao;Guilin Fan;...

Corresponding author:

Liu, Qingtang(liuqtang@mail.ccnu.edu.cn)

Author affiliations:

[Qingtang Liu; Mingbo Shao; Linjing Wu; Gang Zhao; Guilin Fan] School of Educational Information Technology, Central China Normal University, Wuhan, China

[Jun Li] School of Information Engineering, Hubei University for Nationalities, Enshi, China

Corresponding author's institution:

School of Educational Information Technology, Central China Normal University, Wuhan, China

Language:

English

Keywords:

Extraction;Hypertext systems;Information retrieval systems;Search engines;Content extraction;Continuous distribution;Estimation algorithm;Hyperlinks;Mobile Internet;Neighboring nodes;News websites;Web content aggregations;Websites

Journal:

Journal of Computing Science and Engineering

ISSN:

1976-4677

Year:

2017

Volume:

Issue:

Pages:

39-48

DOI:

10.5626/JCSE.2017.11.2.39

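Institutional attribution:

This university is both the first-author and corresponding-author institution

Department:

School of Educational Information Technology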

Abstract:

Main content extraction of web pages is widely used in search engines, web content aggregation and mobile Internet browsing. However, a mass of irrelevant information such as advertisement, irrelevant navigation and trash information is included in web pages. Such irrelevant information reduces the efficiency of web content processing in content-based applications. The purpose of this paper is to propose an automatic main content extraction method of web pages. In this method, we use two indicators to describe characteristics of web pages: Text...
