
Semantic Cues Enhanced Multimodality Multistream CNN for Action Recognition

Publication type:
Journal article
Authors:
Tu, Zhigang*;Xie, Wei(谢伟);Dauwels, Justin;Li, Baoxin;Yuan, Junsong
Corresponding author:
Tu, Zhigang
Author affiliations:
[Tu, Zhigang] Wuhan Univ, State Key Lab Informat Engn Surveying Mapping & R, Wuhan 430079, Hubei, Peoples R China.
[Xie, Wei] Cent China Normal Univ, Sch Comp, Wuhan 430079, Hubei, Peoples R China.
[Dauwels, Justin] Nanyang Technol Univ, Sch Elect & Elect Engn, Singapore 637553, Singapore.
[Li, Baoxin] Arizona State Univ, Sch Comp, Decis Syst Engn, Informat, Tempe, AZ 85287 USA.
[Yuan, Junsong] SUNY Buffalo, Dept Comp Sci & Engn, Buffalo, NY 14260 USA.
Corresponding institution:
[Tu, Zhigang] Wuhan Univ, State Key Lab Informat Engn Surveying Mapping & R, Wuhan 430079, Hubei, Peoples R China.
Language:
English
Keywords:
Action recognition;multi-modalities;multi-stream CNN;semantic cues;spatiotemporal saliency estimation;video object detection
Journal:
IEEE Transactions on Circuits and Systems for Video Technology
ISSN:
1051-8215
Year:
2019
Volume:
29
Issue:
5
Pages:
1423-1437
Funding:
Manuscript received January 17, 2018; revised April 1, 2018; accepted April 21, 2018. Date of publication April 25, 2018; date of current version May 3, 2019. This work is supported in part by the Singapore Ministry of Education Academic Research Fund Tier 2 under Grant MOE2015-T2-2-114, in part by the National Natural Science Foundation of China under Grant 61501198, in part by the Natural Science Foundation of Hubei Province under Grant 2014CFB461, and in part by the University at Buffalo. This paper was recommended by Associate Editor G.-J. Qi. (Corresponding author: Zhigang Tu.) Z. Tu is with the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079, China (e-mail: tuzhigang1986@gmail.com).
Institutional attribution:
This institution is a non-primary affiliation
Department:
School of Computer Science
Abstract:
This paper addresses video-based action recognition by exploiting an advanced multi-stream Convolutional Neural Network (CNN) to fully use semantics-derived multiple modalities in both the spatial (appearance) and temporal (motion) domains, since the performance of CNN-based action recognition methods depends heavily on two factors: semantic visual cues and the network architecture. Our work consists of two major parts. First, to extract useful human-related semantics accurately, we propose a novel spatiotemporal saliency based video object segmentation (STS-VOS) model. By fusing d...
