Compared with surveillance video, user-created videos contain more frequent shot changes, which lead to diversified backgrounds and a wide variety of content. High redundancy among keyframes is a critical issue for existing summarisation methods when dealing with user-created videos. To address this issue, we designed a salient-area-size-based spatial attention model (SAM), based on the observation that humans tend to focus on sizable and moving objects in videos. Moreover, the SAM is taken as guidance to refine the frame-wise soft selected probability for the bi-directional long short-term ...