Similarity-Preserving Hash for Content-Based Audio Retrieval Using Unsupervised Deep Neural Network

Petcharat Panyapanuwat, Suwatchai Kamonsantiroj, Luepol Pipanmaekaporn

Abstract


Binary hashing has become an attractive approach for searching large-scale audio collections because of its storage and search efficiency. However, most existing hashing methods rely on data-independent schemes, in which the hash functions are built from modular arithmetic, dedicated hash functions, or random linear projections. Codes produced this way do not necessarily preserve the semantic similarity between audio samples, which can degrade retrieval performance. In this paper, we propose an unsupervised deep learning approach to similarity-preserving hashing for content-based audio retrieval. Unlike data-independent hashing methods, our approach uses a deep neural network with multiple hierarchical layers of nonlinear and linear transformations to learn binary hash codes that preserve the semantic similarity between samples. The independence and balance properties of the codes are also included in the objective function and optimized to improve code quality. Experimental results on the Extended Ballroom dataset show that the proposed method outperforms a state-of-the-art data-independent hashing method in both effectiveness and efficiency.
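The abstract contrasts data-independent hashing (for example, random linear projections) with learned codes, and names balance and independence as code properties optimized in the objective. The sketch below is an illustration under our own assumptions, not the authors' implementation. It builds data-independent codes from random hyperplanes, ranks a database by Hamming distance, and scores a set of codes with two commonly used penalties: squared bit means for balance, and the deviation of B^T B / n from the identity for independence. All function names, the feature dimension and the code length are hypothetical. The paper's network and exact objective are not reproduced here.

```python
import random

def random_projection_hash(x, planes):
    """Data-independent hash: one bit per random hyperplane, +1 if x lies on its positive side."""
    return [1 if sum(w * v for w, v in zip(plane, x)) >= 0 else -1 for plane in planes]

def hamming(a, b):
    """Number of positions where two {-1, +1} codes disagree."""
    return sum(1 for u, v in zip(a, b) if u != v)

def rank_by_hamming(query_code, db_codes):
    """Indices of database codes sorted by Hamming distance to the query (ties keep index order)."""
    return sorted(range(len(db_codes)), key=lambda i: hamming(query_code, db_codes[i]))

def balance_penalty(codes):
    """Sum over bits of the squared bit mean; 0 when each bit is +1 for exactly half the samples."""
    n, k = len(codes), len(codes[0])
    return sum((sum(c[j] for c in codes) / n) ** 2 for j in range(k))

def independence_penalty(codes):
    """Squared Frobenius distance between B^T B / n and the identity; 0 when bits are uncorrelated."""
    n, k = len(codes), len(codes[0])
    total = 0.0
    for i in range(k):
        for j in range(k):
            corr = sum(c[i] * c[j] for c in codes) / n
            total += (corr - (1.0 if i == j else 0.0)) ** 2
    return total

if __name__ == "__main__":
    rng = random.Random(0)
    dim, bits = 8, 16  # hypothetical feature dimension and code length
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]
    db = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(5)]
    db_codes = [random_projection_hash(x, planes) for x in db]
    query = [v + 0.05 * rng.gauss(0.0, 1.0) for v in db[3]]  # slightly perturbed copy of item 3
    print("ranking:", rank_by_hamming(random_projection_hash(query, planes), db_codes))
    print("balance:", balance_penalty(db_codes), "independence:", independence_penalty(db_codes))
```

The random-hyperplane codes depend only on the direction of the input, and the hyperplanes are chosen without looking at the data. In a learned scheme, penalties like the two above would typically be added to a similarity-preserving loss and minimized over the network parameters. Here they only score a fixed set of codes.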

Keywords


Similarity-preserving hash; Content-based audio retrieval; Unsupervised learning; Deep neural network

DOI: http://doi.org/10.11591/ijece.v11i1.pp%25p


This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

ISSN 2088-8708, e-ISSN 2722-2578