Main content extraction from web pages based on node characteristics

Publication type: Journal article
Authors: Liu, Qingtang; Shao, Mingbo; Wu, Linjing; Zhao, Gang; Fan, Guilin; ...
Corresponding author: Liu, Qingtang (liuqtang@mail.ccnu.edu.cn)
Author affiliations:
[Fan, Guilin; Shao, Mingbo; Zhao, Gang; Liu, Qingtang; Wu, Linjing] School of Educational Information Technology, Central China Normal University, Wuhan, China
[Li, Jun] School of Information Engineering, Hubei University for Nationalities, Enshi, China
Corresponding affiliation: School of Educational Information Technology, Central China Normal University, Wuhan, China
Language: English
Keywords: Content extraction; Hyperlink density; Text density; Web page
Journal: Journal of Computing Science and Engineering
ISSN: 1976-4677
Year: 2017
Volume: 11
Issue: 2
Pages: 39-48
Institutional attribution: This university is both the first and the corresponding institution.
Department: School of Educational Information Technology
Abstract:
Main content extraction from web pages is widely used in search engines, web content aggregation, and mobile Internet browsing. However, web pages also contain a large amount of irrelevant information, such as advertisements, unrelated navigation links, and junk content, which reduces the efficiency of web content processing in content-based applications. The purpose of this paper is to propose an automatic method for extracting the main content of web pages. In this method, we use two indicators to describe the characteristics of web pages: text density and hyperlink density. According to conti...
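The abstract names two per-node indicators, text density and hyperlink density, but this excerpt is truncated before the paper defines them or explains how they are combined. The following is a minimal Python sketch under stated assumptions, not the authors' method. It assumes that text density is the number of text characters per tag in a node's subtree, that hyperlink density is the fraction of those characters that sit inside `<a>` elements, and that the main-content node is the candidate container maximizing text density × (1 − hyperlink density). The candidate tag set, the scoring rule, and all names (`score_nodes`, `extract_main_node`, `SAMPLE_PAGE`) are hypothetical illustrations.

```python
from html.parser import HTMLParser

# Tags that never get a closing tag, and tags whose text is not page content.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
SKIP_TAGS = {"script", "style"}
# Hypothetical choice: only these block containers are main-content candidates.
CANDIDATE_TAGS = {"article", "body", "div", "main", "section", "td"}


class Node:
    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []
        self.text = ""  # text directly inside this node (not inside children)


class TreeBuilder(HTMLParser):
    """Builds a minimal DOM-like tree with the standard-library HTML parser."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
            return
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.current.text += data.strip()


def collect(node, in_link, records):
    """Post-order walk returning (chars, link_chars, tags) for node's subtree."""
    in_link = in_link or node.tag == "a"
    chars = len(node.text)
    link_chars = chars if in_link else 0
    tags = 1
    for child in node.children:
        c, lc, t = collect(child, in_link, records)
        chars, link_chars, tags = chars + c, link_chars + lc, tags + t
    records.append((node, chars, link_chars, tags))
    return chars, link_chars, tags


def score_nodes(html):
    """Return (node, text_density, hyperlink_density) for each candidate node."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    records = []
    collect(builder.root, False, records)
    scored = []
    for node, chars, link_chars, tags in records:
        if node.tag not in CANDIDATE_TAGS or chars == 0:
            continue
        text_density = chars / tags             # assumed: characters per tag
        hyperlink_density = link_chars / chars  # assumed: share of linked characters
        scored.append((node, text_density, hyperlink_density))
    return scored


def extract_main_node(html):
    """Hypothetical rule: maximize text density * (1 - hyperlink density)."""
    scored = score_nodes(html)
    if not scored:
        return None
    return max(scored, key=lambda s: s[1] * (1.0 - s[2]))[0]


SAMPLE_PAGE = """<html><body>
<div id="nav"><ul><li><a href="/">Home</a></li><li><a href="/news">News</a></li>
<li><a href="/about">About</a></li></ul></div>
<div id="content">
<p>Main content extraction removes advertisements and navigation bars.</p>
<p>Text density and hyperlink density describe each node of the page.</p>
<p>Nodes with dense plain text and few links are likely main content.</p>
</div>
<script>var ads = "lots of tracking code here";</script>
</body></html>"""

if __name__ == "__main__":
    main = extract_main_node(SAMPLE_PAGE)
    print(main.tag, main.attrs.get("id"))  # prints "div content"
```

On the sample page the navigation block scores zero because all of its text is link text, while the content `div` carries the most non-link text per tag, so the script prints `div content`. Restricting candidates to container tags is a deliberate simplification: without it, a single long paragraph can outscore the container that holds several paragraphs, since characters per tag favors leaf nodes.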
