Predicting Visual Context for Unsupervised Event Segmentation in Continuous Photo-streams

Segmenting video content into events provides semantic structures for indexing, retrieval, and summarization.

Scalable disCRete mATrix faCtorization Hashing (SCRATCH)

In recent years, many hashing methods have been proposed for the cross-modal retrieval task. However, there are still some issues that need to be further explored. For example, some methods relax the binary constraints to generate the hash codes, which can introduce large quantization error. Although some discrete schemes have been proposed, most of them are time-consuming. In addition, most existing supervised hashing methods use an n x n similarity matrix during optimization, which makes them unscalable. To address these issues, in this paper we present a novel supervised cross-modal hashing method, Scalable disCRete mATrix faCtorization Hashing (SCRATCH for short). It leverages collective matrix factorization on the kernelized features, together with semantic embedding of the labels, to find a latent semantic space that preserves the intra- and inter-modality similarities. In addition, it incorporates the label matrix, instead of the similarity matrix, into the loss function. Based on the proposed loss function and an iterative optimization algorithm, it learns the hash functions and the binary codes simultaneously. Moreover, the binary codes are generated discretely, reducing the quantization error introduced by relaxation schemes. Its time complexity is linear in the size of the dataset, making it scalable to large-scale datasets. Extensive experiments on three benchmark datasets, namely Wiki, MIRFlickr-25K, and NUS-WIDE, verify that the proposed SCRATCH model outperforms several state-of-the-art unsupervised and supervised hashing methods for cross-modal retrieval.
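The core idea behind collective matrix factorization for cross-modal hashing can be illustrated with a minimal sketch: two modality-specific feature matrices are factorized against a single shared latent matrix, which is then binarized into hash codes. This is only an illustrative toy, not the SCRATCH algorithm itself (it omits the kernelized features, label embedding, and discrete optimization described above); all dimensions and variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy cross-modal data: n samples described in two modalities
# (e.g. image features and text features). Illustrative only.
n, d_img, d_txt, r = 100, 32, 16, 8   # r = hash code length

X_img = rng.standard_normal((n, d_img))
X_txt = rng.standard_normal((n, d_txt))

# Factorize both modalities against one shared latent matrix V:
#   X_img ~ V @ U_img.T,   X_txt ~ V @ U_txt.T
U_img = rng.standard_normal((d_img, r))
U_txt = rng.standard_normal((d_txt, r))
V = rng.standard_normal((n, r))

lam = 1e-2  # ridge term keeps the least-squares updates well-posed
for _ in range(50):
    # Alternating least squares: update each factor with the others fixed.
    A = U_img.T @ U_img + U_txt.T @ U_txt + lam * np.eye(r)
    V = np.linalg.solve(A, (X_img @ U_img + X_txt @ U_txt).T).T
    G = V.T @ V + lam * np.eye(r)
    U_img = np.linalg.solve(G, V.T @ X_img).T
    U_txt = np.linalg.solve(G, V.T @ X_txt).T

# Binarize the shared latent representation into hash codes.
B = np.sign(V)
B[B == 0] = 1
print(B.shape)  # (100, 8): one r-bit code per sample, shared by both modalities
```

Because both modalities are tied to the same latent matrix V, the resulting codes B live in a common Hamming space, which is what makes image-to-text and text-to-image retrieval possible with a single index.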