Abstract:
Pseudo-label (PL) learning-based methods usually treat unlabeled samples whose class confidence exceeds a certain threshold as PLs, so the resulting PLs may still contain wrong labels. In this letter, we propose a prototype-based PL refinement (PPLR) method for semi-supervised hyperspectral image (HSI) classification. PPLR filters wrong labels out of the PLs using class prototypes, which improves the discrimination of the network. First, PPLR uses multi-head attentions (MHAs) to extract spectral-spatial features and designs an adaptive threshold that is dynamically adjusted to generate high-confidence PLs. Then, PPLR constructs class prototypes for the different categories from labeled sample features and unlabeled sample features with refined PLs, improving the quality of the PLs by filtering out wrong labels. Finally, PPLR assigns reliable weights (RWs) to these PLs when computing their supervised loss, and introduces a center loss (CL) to improve the discrimination of features. When ten labeled samples per category are used for training, PPLR achieves overall accuracies of 82.11%, 86.70%, and 92.50% on the Indian Pines (IP), Houston2013, and Salinas datasets, respectively.
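To make the prototype-based refinement concrete, below is a minimal NumPy sketch of the core idea (not the authors' implementation; the cosine-similarity criterion, function names, and shapes are assumptions): class prototypes are the mean features per category, and a pseudo-label survives only if the sample's nearest prototype agrees with it.

```python
import numpy as np

def build_prototypes(features, labels, num_classes):
    """Class prototype = mean feature of the samples assigned to that class."""
    return np.stack([features[labels == c].mean(axis=0) for c in range(num_classes)])

def filter_pseudo_labels(unlabeled_feats, pseudo_labels, prototypes):
    """Keep a pseudo-label only if the nearest prototype agrees with it."""
    f = unlabeled_feats / np.linalg.norm(unlabeled_feats, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    nearest = (f @ p.T).argmax(axis=1)   # index of the most similar prototype
    return nearest == pseudo_labels      # boolean keep-mask over the PLs

# Toy usage: 4 classes, 16-dim features.
rng = np.random.default_rng(0)
feats = rng.normal(size=(40, 16))
labels = np.arange(40) % 4               # every class represented
protos = build_prototypes(feats, labels, num_classes=4)
u_feats = rng.normal(size=(10, 16))
pls = rng.integers(0, 4, size=10)
print(filter_pseudo_labels(u_feats, pls, protos))
```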
Journal:
IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2024, Vol. 21. ISSN: 1545-598X
Corresponding authors:
Sun, Hao;Xie, W
Author affiliations:
[Xu, Hao; Tan, Cheng; Xie, W; Xie, Wei; Sun, Hao; Sun, H] Cent China Normal Univ, Sch Comp Sci, Hubei Prov Key Lab Artificial Intelligence & Smart, Wuhan 430079, Peoples R China.;[Xu, Hao; Tan, Cheng; Xie, W; Xie, Wei; Sun, Hao; Sun, H] Cent China Normal Univ, Natl Language Resources Monitoring & Res Ctr Netwo, Wuhan 430079, Peoples R China.;[Chen, Wenjing] Hubei Univ Technol, Sch Comp Sci, Wuhan 430068, Peoples R China.;[Ning, Hailong] Xian Univ Posts & Telecommun, Sch Comp Sci & Technol, Xian 710121, Peoples R China.
Corresponding institutions:
[Xie, W ; Sun, H] C;Cent China Normal Univ, Sch Comp Sci, Hubei Prov Key Lab Artificial Intelligence & Smart, Wuhan 430079, Peoples R China.;Cent China Normal Univ, Natl Language Resources Monitoring & Res Ctr Netwo, Wuhan 430079, Peoples R China.
Abstract:
Recently, some literature has begun to pay attention to the open-set problem in remote sensing application scenarios, and various open-set hyperspectral image classification (OSHIC) methods have been studied. These OSHIC methods are usually based on deep neural networks and use nondirectional Euclidean distance losses to constrain latent sample representations of known classes to be compact. Nonetheless, the potential effect of the spatial distribution of sample representations is ignored, which degrades OSHIC classification performance. In this letter, we propose an orientational clustering learning (OCL) method for OSHIC. First, in the feature space generated by a convolutional neural network, a class anchor strategy is employed to bring features of the same class closer while keeping features of different classes distant. Then, orientational learning is used to further tighten the intraclass feature space. OCL directionally optimizes the spatial distribution of hyperspectral sample representations to improve the ability to identify known classes and distinguish unknown classes. Experiments show that OCL achieves overall accuracies of 94.43%, 92.27%, and 76.94% on the Pavia University, Salinas, and Indian Pines datasets, respectively.
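As a rough illustration of combining a class-anchor pull with an orientational (cosine) term, consider the following PyTorch sketch; the fixed anchors, loss form, and equal weighting are assumptions for demonstration, not the OCL formulation from the letter.

```python
import torch
import torch.nn.functional as F

def ocl_style_loss(feats, labels, anchors):
    # Euclidean attraction: pull each feature toward its class anchor.
    pull = ((feats - anchors[labels]) ** 2).sum(dim=1)
    # Orientational term: also tighten the angle to the class anchor.
    orient = 1.0 - F.cosine_similarity(feats, anchors[labels], dim=1)
    return (pull + orient).mean()

num_classes, dim = 5, 32
# Fixed, well-separated anchors (a simple scaled one-hot choice, an assumption).
anchors = torch.eye(num_classes, dim) * 10.0
feats = torch.randn(8, dim, requires_grad=True)
labels = torch.randint(0, num_classes, (8,))
loss = ocl_style_loss(feats, labels, anchors)
loss.backward()
print(loss.item())
```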
Abstract:
Due to various factors such as differential light absorption across wavelengths, light scattering and refraction by suspended particles, and the potential constraints of underwater imaging equipment, image acquisition in underwater settings is inherently limited. These factors degrade image contrast and visibility and introduce color distortion. Considering these problems, we propose an image enhancement method that combines physical and non-physical models through fusion. In our method, we estimate the medium transmittance of the underwater degraded image using the dark channel prior, and then correct the gain with respect to the Y channel in the CIE-XYZ color space. We then apply a contrast stretching technique with improved automatic parameter calculation to enhance contrast, while also sharpening the image. The two processed images are fused at multiple scales, where guided filtering helps retain global structural information and integrate visual detail, and an enhanced image of higher quality is eventually obtained. The method effectively enhances underwater images, maintains color fidelity and stability, improves contrast and sharpness, and exhibits strong robustness and adaptability. We tested our method on a no-reference image dataset using four no-reference image quality evaluation metrics and achieved better results than eight other state-of-the-art image enhancement methods. It also generalizes to remote sensing images and night images without parameter adjustment.
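For the physical-model branch, the dark channel prior gives a standard transmission estimate; a minimal sketch follows (the patch size and omega are conventional defaults from the dehazing literature, not values reported here, and the atmospheric-light estimate is a crude stand-in).

```python
import numpy as np
from scipy.ndimage import minimum_filter

def dark_channel(img, patch=15):
    """Per-pixel minimum over color channels, then a local minimum filter."""
    return minimum_filter(img.min(axis=2), size=patch)

def estimate_transmission(img, atmospheric_light, omega=0.95, patch=15):
    """t(x) = 1 - omega * dark_channel(I / A), the classic DCP estimate."""
    normalized = img / np.maximum(atmospheric_light, 1e-6)
    return 1.0 - omega * dark_channel(normalized, patch)

img = np.random.rand(64, 64, 3)       # stand-in for an underwater image
A = img.reshape(-1, 3).max(axis=0)    # crude per-channel atmospheric light
t = estimate_transmission(img, A)
print(t.shape, float(t.min()), float(t.max()))
```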
Journal:
THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 5, 2024, 38(5): 4998-5007. ISSN: 2159-5399
Corresponding authors:
Xie, W;Chen, WJ
Author affiliations:
[Zhou, Mingyao; Xie, W; Xie, Wei; Sun, Hao] Cent China Normal Univ, Hubei Prov Key Lab Artificial Intelligence & Sma, Wuhan, Peoples R China.;[Zhou, Mingyao; Xie, W; Xie, Wei; Sun, Hao] Cent China Normal Univ, Sch Comp Sci, Wuhan, Peoples R China.;[Zhou, Mingyao; Xie, W; Xie, Wei; Sun, Hao] Cent China Normal Univ, Natl Language Resources Monitoring & Res Ctr Netw, Wuhan, Peoples R China.;[Chen, Wenjing] Hubei Univ Technol, Sch Comp Sci, Wuhan, Peoples R China.
Corresponding institutions:
[Chen, WJ ] H;[Xie, W ] C;Cent China Normal Univ, Hubei Prov Key Lab Artificial Intelligence & Sma, Wuhan, Peoples R China.;Cent China Normal Univ, Sch Comp Sci, Wuhan, Peoples R China.;Cent China Normal Univ, Natl Language Resources Monitoring & Res Ctr Netw, Wuhan, Peoples R China.
Conference name:
38th AAAI Conference on Artificial Intelligence (AAAI) / 36th Conference on Innovative Applications of Artificial Intelligence / 14th Symposium on Educational Advances in Artificial Intelligence
Conference dates:
FEB 20-27, 2024
Conference location:
Vancouver, CANADA
Conference proceedings:
AAAI Conference on Artificial Intelligence
Abstract:
Video moment retrieval (MR) and highlight detection (HD) based on natural language queries are two highly related tasks, which aim to retrieve relevant moments within videos and predict highlight scores for each video clip. Recently, several methods have been devoted to building DETR-based networks to solve MR and HD jointly. These methods simply add two separate task heads after multi-modal feature extraction and feature interaction, achieving good performance. Nevertheless, these approaches underutilize the reciprocal relationship between the two tasks. In this paper, we propose a task-reciprocal transformer based on DETR (TR-DETR) that focuses on exploring the inherent reciprocity between MR and HD. Specifically, a local-global multi-modal alignment module is first built to align features from diverse modalities into a shared latent space. Subsequently, a visual feature refinement is designed to eliminate query-irrelevant information from visual features for modal interaction. Finally, a task cooperation module is constructed to refine the retrieval pipeline and the highlight score prediction process by utilizing the reciprocity between MR and HD. Comprehensive experiments on the QVHighlights, Charades-STA, and TVSum datasets demonstrate that TR-DETR outperforms existing state-of-the-art methods. Code is available at https://github.com/mingyao1120/TR-DETR.
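For intuition only, here is a hedged sketch of what a local-global alignment objective of this flavor could look like: clip and query features normalized into a shared space, with query-relevant clips pulled toward the query. The contrastive form, names, and shapes are assumptions rather than the TR-DETR code (see the repository linked above for the real implementation).

```python
import torch
import torch.nn.functional as F

def alignment_loss(clip_feats, query_feat, relevant_mask, tau=0.07):
    """clip_feats: (T, D); query_feat: (D,); relevant_mask: (T,) bool."""
    clip_feats = F.normalize(clip_feats, dim=1)
    query_feat = F.normalize(query_feat, dim=0)
    sim = clip_feats @ query_feat / tau          # similarity of every clip to the query
    log_prob = F.log_softmax(sim, dim=0)
    # Pull query-relevant clips toward the query; others are pushed away implicitly.
    return -log_prob[relevant_mask].mean()

T, D = 20, 64
clips, query = torch.randn(T, D), torch.randn(D)
mask = torch.zeros(T, dtype=torch.bool)
mask[5:9] = True                                 # hypothetical relevant moment
print(alignment_loss(clips, query, mask).item())
```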
Author affiliations:
[Zhou, Mingyao; Dong, Ming; Sun, Hao; Xie, Wei] Cent China Normal Univ, Hubei Prov Key Lab Artificial Intelligence & Smart, 152 Luoyu Rd, Wuhan 430079, Hubei, Peoples R China.;[Zhou, Mingyao; Dong, Ming; Sun, Hao; Xie, Wei] Cent China Normal Univ, Sch Comp Sci, 152 Luoyu Rd, Wuhan 430079, Hubei, Peoples R China.;[Zhou, Mingyao; Dong, Ming; Sun, Hao; Xie, Wei] Cent China Normal Univ, Natl Language Resources Monitoring & Res Ctr Netwo, 152 Luoyu Rd, Wuhan 430079, Hubei, Peoples R China.;[Chen, Wenjing] Hubei Univ Technol, Sch Comp Sci, 28 Nanli Rd, Wuhan 430068, Hubei, Peoples R China.;[Lu, Xiaoqiang] Fuzhou Univ, Coll Phys & Informat Engn, 2 Wulong Jiangbei Ave, Fuzhou 350002, Fujian, Peoples R China.
Corresponding institutions:
[Sun, H ] C;Cent China Normal Univ, Hubei Prov Key Lab Artificial Intelligence & Smart, 152 Luoyu Rd, Wuhan 430079, Hubei, Peoples R China.;Cent China Normal Univ, Sch Comp Sci, 152 Luoyu Rd, Wuhan 430079, Hubei, Peoples R China.;Cent China Normal Univ, Natl Language Resources Monitoring & Res Ctr Netwo, 152 Luoyu Rd, Wuhan 430079, Hubei, Peoples R China.
Abstract:
Recently, weakly supervised temporal sentence grounding in videos (TSGV) has attracted extensive attention because it does not require precise start-end time annotations during training and can quickly retrieve segments of interest according to user needs. In weakly supervised TSGV, query reconstruction (QR)-based methods are the current mainstream, and the quality of proposals determines their performance. QR-based methods have two problems with proposal quality. First, a multi-modal global token is usually mapped to proposals with limited duration diversity, making it difficult to capture relevant segments of varying durations in real scenarios. Second, Gaussian functions are typically used to generate relatively fixed weights for frames within proposals, which weight the original video features to generate proposal-specific features; as a result, query-irrelevant frames impair the discrimination of the proposal features. In this study, we propose a query-aware multi-scale proposal network (QMN). Initially, pre-trained encoders are used to extract video and query features. Subsequently, a multi-scale proposal generation module is designed to refine video features under query guidance and diversify proposal durations; this module performs multi-modal interaction and multi-scale modeling to obtain proposals of different durations. Furthermore, to extract discriminative proposal features and better model proposal-frame correlation, a query-aware weight generator is constructed that learns frame weights to suppress query-irrelevant frame representations through contrastive learning. Finally, the masked query is reconstructed using the proposal features to select the best proposal. The effectiveness of the proposed QMN is verified through experiments on the Charades-STA and ActivityNet-Captions datasets.
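The contrast between the fixed Gaussian weighting criticized above and a query-aware weight generator can be sketched as follows; module and layer names are illustrative, not taken from QMN.

```python
import torch
import torch.nn as nn

def gaussian_weights(center, width, num_frames):
    """Fixed weights: frames near the proposal center dominate, query-agnostic."""
    t = torch.arange(num_frames, dtype=torch.float32)
    return torch.exp(-0.5 * ((t - center) / width) ** 2)

class QueryAwareWeightGenerator(nn.Module):
    """Learn per-frame weights conditioned on the query instead (a sketch)."""
    def __init__(self, dim):
        super().__init__()
        self.scorer = nn.Linear(dim, 1)

    def forward(self, frame_feats, query_feat):
        # Fuse the query into each frame, then score each frame's relevance.
        fused = frame_feats * query_feat.unsqueeze(0)
        return torch.sigmoid(self.scorer(fused)).squeeze(-1)

frames, query = torch.randn(30, 128), torch.randn(128)
print(gaussian_weights(center=12.0, width=4.0, num_frames=30).shape)
print(QueryAwareWeightGenerator(128)(frames, query).shape)
```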
Author affiliations:
[Hua, Shiqi; Sun, Hao; Jin, Lianghao; Xie, Wei] Cent China Normal Univ, Hubei Prov Key Lab Artificial Intelligence & Smart, Wuhan, Peoples R China.;[Hua, Shiqi; Jin, Lianghao; Xie, Wei; Sun, Hao] Cent China Normal Univ, Sch Comp Sci, Wuhan, Peoples R China.;[Hua, Shiqi; Jin, Lianghao; Xie, Wei; Sun, Hao] Cent China Normal Univ, Natl Language Resources Monitoring & Res Ctr Netwo, Wuhan, Peoples R China.;[Sun, B; Sun, Bo] Dalian Med Univ, Affiliated Hosp 1, Dalian, Peoples R China.;[Tu, Zhigang] Wuhan Univ, State Key Lab Informat Engn Surveying Mapping & Re, Wuhan, Peoples R China.
Corresponding institutions:
[Sun, B ] D;[Sun, H ] C;Cent China Normal Univ, Hubei Prov Key Lab Artificial Intelligence & Smart, Wuhan, Peoples R China.;Dalian Med Univ, Affiliated Hosp 1, Dalian, Peoples R China.
Keywords:
image segmentation;medical image processing
Abstract:
Subarachnoid haemorrhage (SAH), mostly caused by the rupture of an intracranial aneurysm, is a common disease with a high fatality rate. SAH lesions are generally diffusely distributed, appearing at a variety of scales with irregular edges. These complex lesion characteristics make SAH segmentation a challenging task. To cope with these difficulties, a u-shaped deformable transformer (UDT) is proposed for SAH segmentation. Specifically, first, a multi-scale deformable attention (MSDA) module is exploited to model the diffuseness and scale-variant characteristics of SAH lesions; the MSDA module fuses features at different scales and dynamically adjusts the attention field of each element to generate discriminative multi-scale features. Second, a cross deformable attention-based skip connection (CDASC) module is designed to model the irregular edge characteristic of SAH lesions; the CDASC module utilises spatial details from encoder features to refine the spatial information of decoder features. Third, the MSDA and CDASC modules are embedded into the backbone Res-UNet to construct the proposed UDT. Extensive experiments are conducted on the self-built SAH-CT dataset and two public medical datasets (GlaS and MoNuSeg). Experimental results show that the presented UDT achieves state-of-the-art performance.
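A greatly simplified, single-scale sketch of deformable attention illustrates the "dynamically adjusted attention field" idea: each query predicts sampling offsets around its reference point and attends only to those sampled locations. The actual MSDA module is multi-scale and more elaborate; everything below is an assumption-laden toy.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyDeformableAttn(nn.Module):
    def __init__(self, dim, num_points=4):
        super().__init__()
        self.offsets = nn.Linear(dim, num_points * 2)  # predicted sampling offsets
        self.weights = nn.Linear(dim, num_points)      # attention weight per point
        self.num_points = num_points

    def forward(self, query, feat_map, ref_xy):
        """query: (N, D); feat_map: (1, D, H, W); ref_xy: (N, 2) in [-1, 1]."""
        N = query.size(0)
        off = self.offsets(query).view(N, self.num_points, 2) * 0.1
        w = self.weights(query).softmax(dim=1)                 # (N, P)
        loc = (ref_xy.unsqueeze(1) + off).clamp(-1, 1)         # (N, P, 2)
        sampled = F.grid_sample(feat_map, loc.unsqueeze(0),    # (1, D, N, P)
                                align_corners=False)
        sampled = sampled.squeeze(0).permute(1, 2, 0)          # (N, P, D)
        return (w.unsqueeze(-1) * sampled).sum(dim=1)          # (N, D)

attn = TinyDeformableAttn(dim=16)
out = attn(torch.randn(5, 16), torch.randn(1, 16, 32, 32), torch.rand(5, 2) * 2 - 1)
print(out.shape)  # torch.Size([5, 16])
```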
Journal:
IEEE Signal Processing Letters, 2024, 31: 2230-2234. ISSN: 1070-9908
Corresponding authors:
Sun, Hao;Xie, W
Author affiliations:
[Xie, W; Xie, Wei; Wang, Chengji; Sun, Hao; Sun, H; You, Kaiyang] Cent China Normal Univ, Sch Comp Sci, Hubei Prov Key Lab Artificial Intelligence & Smart, Wuhan 430079, Peoples R China.;[Xie, W; Xie, Wei; Wang, Chengji; Sun, Hao; Sun, H; You, Kaiyang] Cent China Normal Univ, Natl Language Resources Monitoring & Res Ctr Netwo, Wuhan 430079, Peoples R China.;[Chen, Wenjing] Hubei Univ Technol, Sch Comp Sci, Wuhan 430068, Peoples R China.
Corresponding institutions:
[Xie, W ; Sun, H] C;Cent China Normal Univ, Sch Comp Sci, Hubei Prov Key Lab Artificial Intelligence & Smart, Wuhan 430079, Peoples R China.;Cent China Normal Univ, Natl Language Resources Monitoring & Res Ctr Netwo, Wuhan 430079, Peoples R China.
Abstract:
Text-based person search aims to retrieve images of a person from a large gallery based on text descriptions. Existing methods strive to bridge the modality gap between images and texts and have made promising progress. However, these approaches disregard the knowledge imbalance between images and texts caused by reporting bias. To resolve this issue, we present a cross-modal feature fusion-based knowledge transfer network to balance identity information between images and texts. First, we design an identity information emphasis module to enhance person-relevant information and suppress person-irrelevant information. Second, we design an intermediate modal-guided knowledge transfer module to balance the knowledge between images and texts. Experimental results on the CUHK-PEDES, ICFG-PEDES, and RSTPReid datasets demonstrate that our method achieves state-of-the-art performance.
Abstract:
Brain storm optimization (BSO) is a population-based intelligence algorithm for optimization problems that has attracted growing attention from researchers due to its simplicity and efficiency. An improved BSO, called CIBSO, is presented in this article. First, a new grouping method is developed, in which the population is partitioned into chunks according to fitness and recombined into groups, so that each group has the same quality level. Afterwards, a new mutation strategy is designed in CIBSO, and a learning mechanism is used to adaptively select the appropriate strategy. Experiments on the CEC2014 test suite indicate that CIBSO achieves better, or at least competitive, performance against the compared BSO variants.
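A toy sketch of the chunk-and-recombine grouping, under the assumption that fitness-sorted chunks are dealt round-robin into groups so every group receives the same quality mix; the exact recombination rule in CIBSO may differ.

```python
import numpy as np

def balanced_grouping(fitness, num_groups):
    """Sort by fitness, split into quality chunks, deal chunks across groups."""
    order = np.argsort(fitness)                       # best to worst (minimization)
    chunks = np.array_split(order, len(order) // num_groups)
    groups = [[] for _ in range(num_groups)]
    for chunk in chunks:
        for g, idx in enumerate(chunk):
            groups[g % num_groups].append(idx)        # one member per quality level
    return groups

fitness = np.random.rand(20)
for members in balanced_grouping(fitness, num_groups=4):
    print(sorted(fitness[members]))   # each group spans the full quality range
```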
Author affiliations:
[Wang, Chao; Zhang, Jiaxu; Tu, Zhigang] Wuhan Univ, State Key Lab Informat Engn Surveying, Wuhan 430072, Hubei, Peoples R China.;[Xie, Wei] Cent China Normal Univ, Sch Comp, Wuhan 430079, Hubei, Peoples R China.;[Tu, Ruide] Cent China Normal Univ, Sch Informat Management, Wuhan 430079, Hubei, Peoples R China.
Corresponding institutions:
[Chao Wang; Ruide Tu] S;State Key Laboratory of Information Engineering in Surveying, Wuhan University, Wuhan, China;School of Information Management, Central China Normal University, Wuhan, China
Keywords:
Skeleton action recognition;Visual transformer;Graph-aware transformer;Velocity information of human body joints;Graph neural network
Abstract:
Recently, graph convolutional networks (GCNs) have played a critical role in skeleton-based human action recognition. However, most GCN-based methods still have two main limitations: (1) The semantic-level adjacency matrix of the skeleton graph is difficult to define manually, which restricts the receptive field of the GCN and limits its ability to extract spatial-temporal features. (2) The velocity information of human body joints cannot be efficiently used and fully exploited by the GCN, because the GCN does not explicitly represent the correlation between velocity vectors. To address these issues, we propose a graph-aware transformer (GAT), which can make full use of the velocity information and learn discriminative spatial-temporal motion features from the sequence of skeleton graphs in a data-driven way. Besides, similar to GCN-based models, our GAT also considers prior structures of the human body, including the link-aware structure and the part-aware structure. Extensive experiments on three large-scale datasets, i.e., NTU-RGB+D 60, NTU-RGB+D 120, and Kinetics-Skeleton, demonstrate that the proposed GAT obtains significant improvement over the GCN-based baseline for skeleton action recognition.
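The joint velocity information referred to above is commonly computed as first-order temporal differences of joint coordinates; a minimal sketch follows (the (frames, joints, coordinates) layout is an assumed convention, not necessarily the paper's).

```python
import torch

def joint_velocity(skeleton):
    """skeleton: (T, J, C) sequence of T frames, J joints, C coordinates."""
    vel = skeleton[1:] - skeleton[:-1]   # frame-to-frame displacement
    # Pad with zeros at t=0 so the velocity stream keeps length T.
    return torch.cat([torch.zeros_like(vel[:1]), vel], dim=0)

seq = torch.randn(64, 25, 3)             # e.g., NTU-style 25-joint skeletons
print(joint_velocity(seq).shape)          # torch.Size([64, 25, 3])
```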
Author affiliations:
[Pi, Chenchen; Xie, W; Xie, Wei; Sun, Hao] Cent China Normal Univ, Hubei Prov Key Lab Artificial Intelligence & Smar, Wuhan, Peoples R China.;[Pi, Chenchen; Xie, W; Xie, Wei; Sun, Hao] Cent China Normal Univ, Sch Comp, Wuhan, Peoples R China.;[Pi, Chenchen; Xie, W; Xie, Wei; Sun, Hao] Cent China Normal Univ, Natl Language Resources Monitoring & Res Ctr Netw, Wuhan, Peoples R China.
Conference name:
IEEE International Conference on Multimedia and Expo (ICME)
Conference dates:
JUL 10-14, 2023
Conference location:
Brisbane, AUSTRALIA
Conference proceedings:
IEEE International Conference on Multimedia and Expo
Abstract:
Pseudo-labels are popular in semi-supervised facial expression recognition. Recent methods usually use confidence as the criterion for pseudo-label generation and utilize high-confidence pseudo-labels as ground truth for training. However, high confidence cannot guarantee the correctness of pseudo-labels, and false pseudo-labels can weaken feature discrimination and degrade recognition performance. In this paper, we propose a Critical Feature Refinement Network (CFRN) to alleviate the interference of false pseudo-labels with model performance. Specifically, a feature dropout module and a feature emphasis module are proposed to improve the feature discrimination of CFRN. Then, a mean-absolute error loss is further exploited to improve the robustness against false pseudo-labels. Experimental results on three challenging datasets, RAF-DB, SFEW, and AffectNet, demonstrate that the proposed CFRN outperforms state-of-the-art methods.
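The robustness argument for the mean-absolute error loss can be seen from its bounded gradient: a confidently wrong pseudo-label cannot dominate the update the way it can with cross-entropy. An illustrative sketch (not the CFRN code):

```python
import torch
import torch.nn.functional as F

def mae_loss(logits, targets):
    """MAE between predicted probabilities and one-hot targets."""
    probs = logits.softmax(dim=1)
    one_hot = F.one_hot(targets, num_classes=logits.size(1)).float()
    return (probs - one_hot).abs().sum(dim=1).mean()

logits = torch.randn(16, 7, requires_grad=True)   # 7 expression classes
pseudo = torch.randint(0, 7, (16,))               # possibly noisy pseudo-labels
mae_loss(logits, pseudo).backward()
print(logits.grad.abs().max())  # bounded, unlike CE on a confident wrong label
```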
Author affiliations:
[Chang, Yunpeng; Luo, Bin; Tu, Zhigang; Sui, Haigang] Wuhan Univ, State Key Lab Informat Engn Surveying Mapping & R, Wuhan 430079, Hubei, Peoples R China.;[Xie, Wei] Cent China Normal Univ, Sch Comp, LuoyuRd 152, Wuhan, Hubei, Peoples R China.;[Zhang, Shifu] Shenzhen Infinova Co Ltd, Shenzhen 518100, Guangdong, Peoples R China.;[Yuan, Junsong] SUNY Buffalo, Comp Sci & Engn Dept, Buffalo, NY 14260 USA.
Corresponding institutions:
[Tu, Zhigang] W;Wuhan Univ, State Key Lab Informat Engn Surveying Mapping & R, Wuhan 430079, Hubei, Peoples R China.
Keywords:
Deep K-means cluster;Simulate motion of optical flow;Spatio-temporal dissociation;Video anomaly detection
Abstract:
Anomaly detection in videos remains a challenging task due to the ambiguous definition of anomaly and the complexity of visual scenes in real video data. Different from previous work that utilizes reconstruction or prediction as an auxiliary task to learn temporal regularity, in this work we explore a novel convolutional autoencoder architecture that dissociates the spatio-temporal representation to capture spatial and temporal information separately, since abnormal events usually differ from normality in appearance and/or motion behavior. Specifically, the spatial autoencoder models normality in the appearance feature space by learning to reconstruct the input of the first individual frame (FIF), while the temporal part takes the first four consecutive frames as input and the RGB difference as output to simulate the motion of optical flow in an efficient way. Abnormal events, which are irregular in appearance or motion behavior, lead to a large reconstruction error. To improve detection performance on fast-moving outliers, we exploit a variance-based attention module and insert it into the motion autoencoder to highlight large movement areas. In addition, we propose a deep K-means clustering strategy to force the spatial and motion encoders to extract a compact representation. Extensive experiments on publicly available datasets demonstrate the effectiveness of our method, which achieves state-of-the-art performance. The code is publicly released.
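The RGB-difference target that stands in for optical flow is simple to state; a minimal sketch under an assumed (batch, time, channel, height, width) layout:

```python
import torch

def rgb_difference_target(frames):
    """frames: (B, T, C, H, W) with T = 4 consecutive frames; the temporal
    autoencoder reconstructs the frame-to-frame differences as a motion proxy."""
    return frames[:, 1:] - frames[:, :-1]   # (B, T-1, C, H, W)

clip = torch.rand(2, 4, 3, 128, 128)
print(rgb_difference_target(clip).shape)     # torch.Size([2, 3, 3, 128, 128])
```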
Author affiliations:
[Li, Wanxin; Wang, Wei; Jin, Lianghao; Xie, Wei] Cent China Normal Univ, Hubei Prov Key Lab Artificial Intelligence & Smar, Wuhan, Peoples R China.;[Li, Wanxin; Wang, Wei; Jin, Lianghao; Xie, Wei] Cent China Normal Univ, Sch Comp, Wuhan, Peoples R China.;[Tu, Zhigang] Wuhan Univ, State Key Lab Informat Engn Surveying Mapping & R, Wuhan, Peoples R China.
Conference name:
IEEE International Conference on Fuzzy Systems (FUZZ-IEEE) / IEEE World Congress on Computational Intelligence (IEEE WCCI) / International Joint Conference on Neural Networks (IJCNN) / IEEE Congress on Evolutionary Computation (IEEE CEC)
Conference dates:
JUL 18-23, 2022
Conference location:
Padua, ITALY
Conference proceedings:
IEEE International Joint Conference on Neural Networks (IJCNN)
Abstract:
Group activity recognition aims to identify group activities in videos. Most previous methods focus on modeling pairwise relations between individuals (one-to-one), ignoring the fact that a single individual's behavior may be jointly determined by multiple individuals' behaviors (many-to-one). For this reason, we propose a Multi-Hyperedge Hypergraph (MHH) to capture high-order relationships among multiple people. Specifically, we build three different types of hyperedges on the hypergraph structure. Each hyperedge can accommodate the characteristics of multiple nodes to capture different types of high-order relationships between nodes. Then, we use late fusion to combine the three features and further enhance the overall behavioral representation. Finally, we conduct a series of experiments on two of the most widely used benchmarks in group activity recognition, which prove the effectiveness of MHH. More importantly, as far as we know, this is the first use of a hypergraph structure for group activity recognition.
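A toy sketch of hyperedge aggregation shows the many-to-one mechanism: one hyperedge pools features from several nodes and scatters the pooled result back, so each person's representation can depend on several others jointly. The incidence matrix and normalization below are hand-made assumptions.

```python
import torch

def hypergraph_aggregate(node_feats, incidence):
    """node_feats: (N, D); incidence: (N, E), 1 where node i belongs to hyperedge e."""
    deg = incidence.sum(dim=0).clamp(min=1)                      # nodes per hyperedge
    edge_feats = (incidence.T @ node_feats) / deg.unsqueeze(1)   # pool many-to-one, (E, D)
    return incidence @ edge_feats                                # scatter back, (N, D)

N, D = 6, 8
feats = torch.randn(N, D)
# Two hand-made hyperedges over 6 people; node 2 sits in both.
H = torch.tensor([[1, 0], [1, 0], [1, 1], [0, 1], [0, 1], [0, 1]], dtype=torch.float)
print(hypergraph_aggregate(feats, H).shape)   # torch.Size([6, 8])
```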
Abstract:
In the skeleton-based action recognition task, graph convolutional networks have attracted widespread attention and achieved remarkable results. However, most current methods perform graph convolution on the entire skeleton graph, ignoring the fact that people are composed of different body parts. In addition, previous work ignores the temporal and spatial independence and relevance of different parts. To solve these issues, we optimize the representation of the skeleton graph, the graph convolution, and the temporal convolution, respectively. In this work, we propose multi-part adaptive graph convolution (MPA-GC) to adaptively learn the topology of each body part and dynamically aggregate the relevance between parts. Meanwhile, we add a multi-scale temporal convolution module to better capture temporal-dimension features. Ultimately, we develop a powerful graph convolutional network named MPA-GCN, and extensive experiments on two public large-scale datasets, NTU-RGB+D and NTU-RGB+D 120, demonstrate the effectiveness of our module, which outperforms state-of-the-art methods.
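A multi-scale temporal convolution block of the kind mentioned can be sketched as parallel dilated convolutions along the time axis whose outputs are concatenated; the kernel size, dilations, and channel split below are illustrative choices, not the paper's configuration.

```python
import torch
import torch.nn as nn

class MultiScaleTemporalConv(nn.Module):
    def __init__(self, channels, dilations=(1, 2, 3, 4)):
        super().__init__()
        branch = channels // len(dilations)
        # Each branch convolves only over time (kernel (5, 1)) at a different dilation.
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, branch, kernel_size=(5, 1),
                      padding=(2 * d, 0), dilation=(d, 1))
            for d in dilations
        ])

    def forward(self, x):
        """x: (B, C, T, V) -- batch, channels, frames, joints."""
        return torch.cat([b(x) for b in self.branches], dim=1)

x = torch.randn(2, 64, 100, 25)
print(MultiScaleTemporalConv(64)(x).shape)   # torch.Size([2, 64, 100, 25])
```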