MSRANet: Learning discriminative embeddings for speaker verification via channel and spatial attention mechanism in alterable scenarios

Output type:
Journal article
Authors:
Zheng, Qiuyu;Chen, Zengzhao;Liu, Hai;Lu, Yuanyuan;Li, Jiawen;...
Corresponding author:
Zengzhao Chen
Author affiliations:
[Lu, Yuanyuan; Chen, Zengzhao; Li, Jiawen; Zheng, Qiuyu; Liu, Hai] Cent China Normal Univ, Fac Artificial Intelligence Educ, Wuhan 430079, Peoples R China.
[Lu, Yuanyuan; Li, Jiawen; Zheng, Qiuyu] Cent China Normal Univ, Natl Engn Res Ctr Educ Big Data, Wuhan 430079, Peoples R China.
[Chen, Zengzhao; Liu, Hai] Cent China Normal Univ, Natl Engn Res Ctr Elearning, Wuhan 430079, Peoples R China.
[Liu, Tingting] Hubei Univ, Sch Educ, Wuhan 430062, Peoples R China.
Corresponding author affiliations:
[Zengzhao Chen] Faculty of Artificial Intelligence in Education, Central China Normal University, Wuhan 430079, China
[Zengzhao Chen] National Engineering Research Center for E-Learning, Central China Normal University, Wuhan 430079, China
Language:
English
Keywords:
Alterable scenarios;Attention mechanism;Embedding extraction;Frame-level features;Speaker verification
Journal:
Expert Systems with Applications
ISSN:
0957-4174
Year:
2023
Volume:
217
Pages:
119511
Funding:
The authors thank the editor and anonymous reviewers for their valuable suggestions. This work has been supported by the National Natural Science Foundation of China (Grant No. 62077022, 61875068, 62211530433, 62177018, 62011530436, 62277041, 62005092, 62077020), the National Teacher Development Collaborative Innovation Experimental Base Construction Research Project of Central China Normal University (No. CCNUTEIII 2021-21), and the National Key R&D Program of China (2021YFC3340802), and was supported in part by the National Natural Science Foundation of Hubei Province, China under Grant (No. 2022CFB971), the China Unicom Hubei Branch Bilateral Cooperation Research Funds under Grant 2021111002002004 and “Universities Helping Counties” Research Funds of Hubei Province, China under Grant BXLBX0192.
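Institutional attribution:
This university is the first-listed institution
Department affiliation:
National Engineering Research Center for E-Learning
Abstract: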
Speaker embeddings have become the most popular feature representation in speaker verification. Improving the robustness of speaker embedding extraction systems is a crucial problem. A multi-scale residual aggregation network (MSRANet), which is a simple but efficient network with triplet input and triplet loss, is proposed in this paper. Two different aggregation strategies are utilized in frame-level feature extractors to capture long-term variations in speaker characteristics. An attention mechanism is employed to filter a large number of parameters in temporal and frequency dimensions, which ...