Provably secure and efficient audio compression based on compressive sensing

The development of systems that can compress audio signals and simultaneously secure them is a highly attractive research subject. This is because of the need to improve storage usage and speed up data transmission, as well as to secure the transmission of sensitive signals over limited and insecure communication channels. Many researchers have therefore studied and produced different systems that either compress or encrypt audio data using different algorithms and methods, all of which suffer from certain issues, including high time consumption or complex calculations. This paper proposes a compressive sensing-based system that compresses audio signals and simultaneously encrypts them. The audio signal is segmented into small matrices of samples, which are then multiplied by a non-square sensing matrix generated by a Gaussian random generator. The reconstruction process is carried out by solving a linear system using the Moore-Penrose pseudoinverse. The statistical analysis results obtained from testing audio signals of different types and sizes show that the proposed system succeeds in compressing the audio signals to as little as 28% of their original size and in reconstructing the signal with a correlation metric between 0.98 and 0.99. It also scores very good results in the normalized mean square error (MSE), peak signal-to-noise ratio (PSNR), and structural similarity index (SSIM) metrics, and gives the signal a high level of security.

ISSN: 2088-8708. Provably secure and efficient audio compression based on compressive sensing (Enas Wahab Abood)

... of the mathematical aspects of CS and proposed a model for noise reduction based on CS for speech signals; they formulated a CS system using a random partial Fourier as an optimization problem and used the gradient descent line search (GDLS) to solve it. Bala and Arif [7] proposed applying a compressed sensing technique to provide reliable reconstruction algorithms based on the discrete Fourier transform (DFT) and the discrete cosine transform (DCT) for a speech signal using a small number of samples. They compared the performance of both algorithms and found that DCT runs relatively faster than DFT. A video snapshot compressive imaging (SCI) system was proposed by building a digital micro-mirror device, developing an end-to-end convolutional neural network (E2E-CNN) with a plug-and-play (PnP) framework, and adding deep denoising priors to solve the inverse problem [8]. Multiple-input multiple-output (MIMO) systems were studied in [17], which proposed an adaptive scheme based on CS that used an efficient generalized multiple measurement vector approximate message passing (GMMV-AMP) algorithm to detect active users and estimate their channels. Moreno-Alvarado et al. [13] proposed a system to compress and encrypt audio signals based on CS, which segmented the audio signal into frames of 1024 samples and transformed each frame using DCT to obtain sparse frames, which were then multiplied by a sensing matrix. The sensing matrix was generated by a chaotic mixing system to satisfy the extended Wyner secrecy (EWS) criterion.
Chai et al. [9] used CS for security by proposing an efficient visually meaningful image compression and encryption (VMICE) scheme, which consisted of CS and least significant bit (LSB) embedding. A compression and steganography system was proposed in [1]. First, the discrete wavelet transform (DWT) was applied to transform the plain image into a sparse representation, and a confusion operation based on a logistic-tent map was then applied to the pixel positions. To obtain the cipher image, the sensed images were multiplied by a sensing matrix generated by a low-dimension complex tent-sine system. A steganography step was added to the resulting image by applying singular value decomposition to both the secret and cover images and then embedding the singular values of the secret image into the singular values of the cover image. Haneche et al. [10] proposed an approach for speech enhancement based on compressed sensing. First, it removed noise, which was estimated during pauses; voice activity detection (VAD) was then used to classify frames as speech or silence, while orthogonal matching pursuit was implemented as the sparse recovery method for speech enhancement.
Although the works detailed above have demonstrated capability and trustworthiness in file compression and security, several drawbacks must be discussed and addressed to build trusted compression and encryption systems based on CS. Some of these proposed systems have issues with high computational cost, time consumption, or recovery accuracy. To solve these problems, this paper proposes a system that uses basic linear operations to reduce implementation time, together with a pseudo-inverse and a Gaussian measurement matrix to improve the accuracy of the reconstruction, and a good compression ratio to save memory.

Compressed sensing standard formula
Compressive sensing (CS) is a technique for digital signal acquisition and reconstruction that has several benefits for signal processing applications [18]. According to the Nyquist sampling theorem, a signal must be acquired at a rate of more than twice its highest frequency, which yields a lot of redundant data in the acquired signal. Traditional compression algorithms are then used to eliminate this redundancy and produce a smaller number of bits for the signal representation [19]. The CS technique, on the other hand, exploits the information rate within the signal and eliminates the redundancy during the sampling process itself, which leads to a lower effective sampling rate [20]. The standard formula for compressive sensing can be represented as a linear system of equations:

$$y = Ax \tag{1}$$

where the signal vector $x \in \mathbb{R}^{n}$ has length n and is compressively sensed into $y \in \mathbb{R}^{m}$ of length m through the sensing matrix $A \in \mathbb{R}^{m \times n}$ [21]. Figure 1 presents a diagram of the CS system. Remark: this is called "compressive sensing" because m is much smaller than n (i.e., $m \ll n$), and A is the compressive sensing matrix (or measuring matrix), which can be defined as:

$$A = \Phi \Psi \tag{2}$$

where Ψ is used to transform the original signal to a sparse basis, while Φ represents the compressed sensing measurement. Both Ψ and Φ are combined in one matrix called the sensing matrix A [22]. The linear system in (1) can be written as a set of linear equations. To find the values of n variables, it is necessary to have n or more equations. Here the number of equations is much smaller than the number of variables, so there is an infinite number of possible solutions. In non-deterministic CS, the true solution vector x can nevertheless be recovered, provided that A satisfies certain properties [23]. The system may also be resolved by using optimization methods such as the metaheuristic evolutionary method [24] or linear programming (LP) based on pseudo-inverse methods.
Other solutions could use deterministic CS, which requires a specific recovery process to sense the signal vector and resembles an encoding-decoding technique [25]. The system proposed in this paper is based on CS technology to compress data and reduce the signal size by multiplying it by an appropriate sensing matrix. This technology allows the signal to be retrieved with very little data loss, generally less than 0.01. It is also computationally inexpensive and uncomplicated, making the system more efficient.
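As a minimal illustration of (1), the following Python sketch senses a random stand-in vector with a Gaussian matrix and shows why the system is underdetermined when m < n. The dimensions (n = 8, m = 3), the seed, and all variable names are assumptions chosen for illustration; they are not taken from the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)    # fixed seed, illustration only

n, m = 8, 3                       # signal length n, number of measurements m (m < n)
x = rng.standard_normal(n)        # stand-in for a signal vector x in R^n
A = rng.standard_normal((m, n))   # Gaussian sensing matrix A in R^(m x n)

y = A @ x                         # compressive measurements, y = A x as in (1)

# With m < n the system is underdetermined: adding any null-space vector of A
# to x produces exactly the same measurements.
_, _, vt = np.linalg.svd(A)       # full SVD; rows of vt beyond rank(A) span the null space
z = vt[-1]                        # one null-space direction of A
x_other = x + z                   # a different vector with the same measurements

print(y.shape)
print(np.allclose(A @ x_other, y))
```

The second printed value indicates that two different vectors share the same measurements, which is why a recovery rule (here, the pseudoinverse discussed next) is needed to select one solution.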

Moore-Penrose pseudoinverse
In linear algebra, and particularly in linear inverse problems such as (1), the Moore-Penrose inverse A+ of a matrix A is the most well-known generalized inverse for cases where m≠n [26]. It was first proposed by Moore in 1920 [27] and subsequently by many others. The Moore-Penrose pseudoinverse has been widely used to compute the best (least-squares) solution of a system of linear equations that has infinitely many solutions, and it can be used for proving results in linear algebra. The pseudo-inverse is uniquely defined for all matrices of real or complex numbers and can be computed with the singular value decomposition [28].
For $A \in \mathbb{R}^{m \times n}$, where m≠n, a pseudoinverse of the matrix A is a matrix $A^{+}$ that satisfies four criteria (the Moore-Penrose conditions):

$$A A^{+} A = A, \qquad A^{+} A A^{+} = A^{+}, \qquad (A A^{+})^{*} = A A^{+}, \qquad (A^{+} A)^{*} = A^{+} A$$

where $A^{+}$ is a generalized inverse, and $A A^{+}$ and $A^{+} A$ meet the Hermitian condition (for a real matrix, the conjugate transpose $^{*}$ is simply the transpose). The last two conditions provide the uniqueness property of $A^{+}$ [28], [29]. In general, every matrix has a pseudoinverse, and when A has unequal dimensions there are two possible situations:
a) If A has linearly independent columns, the least-squares solution exists and is unique, and $A^{+}$ can be calculated as:

$$A^{+} = (A^{T} A)^{-1} A^{T} \tag{5}$$

where $A^{+} A = I$.
b) If A has linearly independent rows, the system is underdetermined and has infinitely many solutions, and $A^{+}$ can be calculated as:

$$A^{+} = A^{T} (A A^{T})^{-1} \tag{6}$$

where $A A^{+} = I$.
The sparse pseudo-inverse is applicable in underdetermined systems, as well as in compressed sensing. Therefore, for an underdetermined linear inverse problem y=Ax in which x is low dimensional, the Moore-Penrose pseudoinverse is desirable due to its ability to reduce the complexity of the calculations [30]. To solve the linear system Y=AX, $A^{+}$ is calculated with (5) or (6), and X can then be found with the relation:

$$X = A^{+} Y \tag{7}$$
It can be said that (7) is the reconstruction equation. Remark: the Moore-Penrose pseudoinverse is used to compute the solution in this paper's proposed system because it is simple to implement and has few requirements. It provides the best-fit solution for systems that have multiple solutions [28].
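The relations in this section can be checked numerically. The sketch below, written under assumed dimensions (a 3×8 Gaussian matrix and an 8×4 frame) with a hypothetical helper `satisfies_moore_penrose`, builds the pseudoinverse for the independent-rows case, compares it with NumPy's SVD-based `np.linalg.pinv`, and applies the reconstruction relation X = A+Y. It is an illustration, not the authors' MATLAB code.

```python
import numpy as np

def satisfies_moore_penrose(A, G, atol=1e-8):
    """Return True if G satisfies the four Moore-Penrose conditions for A."""
    return bool(
        np.allclose(A @ G @ A, A, atol=atol)
        and np.allclose(G @ A @ G, G, atol=atol)
        and np.allclose((A @ G).conj().T, A @ G, atol=atol)
        and np.allclose((G @ A).conj().T, G @ A, atol=atol)
    )

rng = np.random.default_rng(1)          # fixed seed, illustration only
m, n = 3, 8                             # fewer rows than columns (m < n)
A = rng.standard_normal((m, n))         # Gaussian matrix, full row rank with probability 1

A_plus = A.T @ np.linalg.inv(A @ A.T)   # closed form for linearly independent rows
A_plus_svd = np.linalg.pinv(A)          # NumPy's SVD-based pseudoinverse

X = rng.standard_normal((n, 4))         # stand-in for one 8x4 frame of samples
Y = A @ X                               # sensed frame, Y = A X
X_hat = A_plus @ Y                      # reconstruction, X_hat = A+ Y

print(satisfies_moore_penrose(A, A_plus))
print(np.allclose(A_plus, A_plus_svd))
print(np.allclose(A @ X_hat, Y))
```

Note that `X_hat` is the minimum-norm solution among all solutions consistent with `Y`; how closely it matches `X` depends on the signal itself.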

PROPOSED SYSTEM
In this work, a simplified and efficient system is used for compressing an audio signal. The system is based on the compressive sensing principle to reduce the number of samples of the audio signal, and it uses the Moore-Penrose pseudo-inverse for the reconstruction operation. The general scheme of the system is shown in Figure 2. The system comprises three steps:
a) Measurement matrix generation: the choice of the measurement matrix A is an essential step in an audio compression system based on CS. Its values affect the quality of the recovered data, and the matrix can also act as a secret encryption key [31]. To obtain more accurate signal reconstruction, the measurement matrix values are selected as Gaussian random variables [32].
b) Audio compression: in this part, the audio signal is compressively sensed using the measurement matrix, as described in Algorithm 1.
c) Signal reconstruction: when the compressed data has been sent to the recipient or transferred through transmission lines, a reconstruction operation must take place to retrieve the original signal. This operation is carried out using Algorithm 2, whose steps include:
- Calculating the inverse of the measurement matrix using the Moore-Penrose pseudoinverse.
- Since the measurement matrix is not a square matrix (i.e., k≠n), the inverse matrix is estimated by (6).
- Multiplying the sensed frames by the inverse of the measurement matrix as in (7), Figure 2(b).
- Joining all resulting frames and reshaping them to get a one-dimensional vector of the original signal.
- End.
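The three steps above can be sketched in Python as follows. This is a minimal sketch under several assumptions that the text does not specify: the function names (`make_sensing_matrix`, `compress`, `reconstruct`) are hypothetical, the signal is zero-padded to a whole number of frames, samples are placed into each 8×4 frame in row-major order, and the seed stands in for the shared secret key.

```python
import numpy as np

def make_sensing_matrix(m, n, seed):
    """Gaussian measurement matrix; the seed plays the role of a shared key (assumption)."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((m, n))

def compress(signal, A, cols=4):
    """Split a 1-D signal into n x cols frames and sense each frame as Y = A X."""
    n = A.shape[1]
    frame_len = n * cols
    pad = (-len(signal)) % frame_len              # zero-padding to a whole number of frames
    padded = np.concatenate([signal, np.zeros(pad)])
    frames = padded.reshape(-1, n, cols)          # shape (num_frames, n, cols)
    return A @ frames, len(signal)                # sensed frames, shape (num_frames, m, cols)

def reconstruct(sensed, A, original_len):
    """Recover frames with the Moore-Penrose pseudoinverse and rejoin them."""
    A_plus = np.linalg.pinv(A)                    # shape (n, m)
    frames = A_plus @ sensed                      # shape (num_frames, n, cols)
    return frames.reshape(-1)[:original_len]      # back to a 1-D signal

# Small demonstration with a synthetic tone (not one of the paper's test files).
t = np.arange(1000) / 8000.0
signal = np.sin(2 * np.pi * 440.0 * t)
A = make_sensing_matrix(3, 8, seed=42)
sensed, length = compress(signal, A)
recovered = reconstruct(sensed, A, length)

print(sensed.shape, recovered.shape)
print(sensed.size / (32 * 32))                    # stored values relative to the padded length
```

The receiver needs the same matrix A (here, the same seed) to rebuild the pseudoinverse, which is the sense in which the measurement matrix acts as a key.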

EXPERIMENTS AND ANALYSIS
To demonstrate the ability of the proposed system, it was evaluated and tested with different audio signals, such as music, songs, and speech, at sampling frequencies ranging from 11 to 48 kHz. The system was simulated with MATLAB R2018a on an Intel® Core i7-3520M CPU at 2.90 GHz with 8.00 GB of RAM. The compression was implemented at two rates (30% and 50%) with two measurement matrices, so two CS systems were built: i) $Y_{3\times 4} = A_{3\times 8} X_{8\times 4}$ for the 30% compression rate; ii) $Y_{4\times 4} = A_{4\times 8} X_{8\times 4}$ for the 50% compression rate.
The results are shown in Figure 3. Figure 3(a) shows an original speech signal with a sampling frequency (FS) of 48 kHz and a length of 6.8e+4 samples. Figure 3(b) shows the sensed signal at a compression rate of 30%, with a length of 1.96e+4 samples, and Figure 3(c) shows the sensed signal at a compression rate of 50%, with a length of 2.95e+4 samples; Figure 3 also includes a signal with an FS of 44,100 Hz. The reconstructed signals matched their original signals in both length and peaks. This indicates the accuracy of retrieval due to the reliance on the Gaussian matrix as a sensing matrix, which retains the effective values of the compressed matrix and thereby supports the retrieved values. Furthermore, several statistical analysis tests were performed to evaluate the reconstruction quality of the system at both compression rates.

Elapsed time and compression rate
The pseudo-inverse technique allowed the system to score excellent results in execution time for both compression and reconstruction. Many files of different sizes were compressed at the 30% and 50% compression rates. The number of samples in the compressed file shrank by a fixed proportion, while the storage size in bytes shrank by an amount that depended on the original file size, as illustrated in Tables 1 and 2. Tables 1 and 2 show that the time was relatively low and suitable for online systems and smart devices, and that the compression rate was very convenient. Remark: if the file size is large, its compression rate would also be high.

Pearson correlation analysis
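The Pearson correlation coefficient is a significant metric for evaluating the similarity between the original and the reconstructed audio signals. It is computed as:

$$R = \frac{\sum_{i} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i} (x_i - \bar{x})^{2} \, \sum_{i} (y_i - \bar{y})^{2}}}$$

where $x_i$ is the sample value of the original signal, $\bar{x}$ is its mean, and $y_i$ and $\bar{y}$ are the corresponding sample value and mean of the reconstructed signal. The metric is scored from 1 to 0 according to the similarity between the signals: 1 means the signals are identical and 0 means they are uncorrelated (in general, the coefficient can also be negative, down to -1, for inversely related signals).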
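For readers who want to reproduce this metric, the following Python sketch computes the Pearson correlation between two signals directly from the definition. The function name `pearson_correlation` is hypothetical, and the example vectors are arbitrary illustrative values rather than data from the paper.

```python
import numpy as np

def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equal-length 1-D signals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()                    # deviations of the original signal from its mean
    dy = y - y.mean()                    # deviations of the reconstructed signal from its mean
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))

original = np.array([0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5])
reconstructed = original + 0.05 * np.array([1, -1, 1, -1, 1, -1, 1, -1])

print(pearson_correlation(original, original))
print(pearson_correlation(original, reconstructed))
```

The same value can be cross-checked with `np.corrcoef(original, reconstructed)[0, 1]`.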

PSNR and MSE analysis
To evaluate the recovery capability of the proposed system, the normalized mean square error (MSE) and peak signal-to-noise ratio (PSNR) metrics were used [33]. Both were computed between the original signal X and the reconstructed signal Y.
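The sketch below shows one common way to compute these two metrics in Python. The exact definitions used by the authors are not recoverable from the text, so the forms here are assumptions: the normalized MSE is taken as the squared error energy divided by the energy of the original signal, and the PSNR uses the peak absolute value of the original signal. The function names are hypothetical.

```python
import numpy as np

def normalized_mse(x, y):
    """Squared error energy divided by the energy of the original signal (assumed form)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum((x - y) ** 2) / np.sum(x ** 2))

def psnr_db(x, y):
    """PSNR in dB, using the peak absolute value of the original signal (assumed form)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mse = np.mean((x - y) ** 2)
    if mse == 0:
        return float("inf")              # identical signals: no error
    peak = np.max(np.abs(x))
    return float(10.0 * np.log10(peak ** 2 / mse))

x = np.array([1.0, -1.0, 1.0, -1.0])     # arbitrary illustrative samples
y = x + 0.1                              # reconstruction with a constant offset

print(normalized_mse(x, y))
print(psnr_db(x, y))
```

Other normalizations (for example, dividing by the number of samples or using the full dynamic range as the peak) shift the reported values, so comparisons across papers should use matching definitions.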

The structural similarity index (SSIM)
The SSIM is a perceptual metric that quantifies the quality distortion caused by processing operations such as data compression or encryption. It is used here to evaluate the system's ability to reconstruct the signal accurately. It is given by:

$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^{2} + \mu_y^{2} + C_1)(\sigma_x^{2} + \sigma_y^{2} + C_2)}$$

where $\mu_x$ and $\mu_y$ are the means, $\sigma_x^{2}$ and $\sigma_y^{2}$ the variances, and $\sigma_{xy}$ the covariance of the two signals; $C_1 = (Z_1 L)^{2}$ and $C_2 = (Z_2 L)^{2}$ are constants that avoid a zero denominator; L is the dynamic range of the signal sample values; and $Z_1$ and $Z_2$ have default values of 0.01 and 0.03, respectively. The score for two identical signals is 1, decreasing towards -1 as the signals differ [34]. See Table 3. Table 3 shows that the two compression ratios (i.e., the compressed file was sized 0.3 and 0.5 of the original size) were implemented on audio files of different sizes, from 31 kB to 9.9 MB. In testing the proposed system implementation, the PSNR, MSE, SSIM and correlation metrics all scored relatively good results. For the 30% compression ratio, the PSNR scored 16-20 dB, the MSE reached 3.0E-4, and the SSIM was in the range 0.5-0.99, demonstrating good recovery; the correlation factor was almost 0.99. Compared with these scores, the 50% compression ratio scored better results overall for all metrics.
Remark: despite the slight discrepancy between the results of the two compression ratios, it is clear that 50% compression gives a better retrieval ability, with a small difference from 30%, especially with bigger files. This is supported by the correlation factor (R) for both the 30% and 50% compression ratios. Both ratios therefore remain acceptable in terms of the size reduction and the implementation time of the system.
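The SSIM definition in this section can be sketched in Python as below. This is a simplified illustration that treats the whole signal as a single window; whether the authors used a sliding window is not stated, so the single-window choice, the function name `ssim_global`, and the dynamic range of 2.0 in the example (samples in [-1, 1]) are all assumptions.

```python
import numpy as np

def ssim_global(x, y, dynamic_range, z1=0.01, z2=0.03):
    """Single-window SSIM between two equal-length signals (simplified sketch)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    c1 = (z1 * dynamic_range) ** 2       # stabilizing constant C1 = (Z1 * L)^2
    c2 = (z2 * dynamic_range) ** 2       # stabilizing constant C2 = (Z2 * L)^2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()      # population variances
    cov_xy = np.mean((x - mu_x) * (y - mu_y))
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(numerator / denominator)

t = np.linspace(0.0, 1.0, 1000, endpoint=False)
tone = np.sin(2 * np.pi * 5.0 * t)       # synthetic test tone, not the paper's data

print(ssim_global(tone, tone, dynamic_range=2.0))
print(ssim_global(tone, 0.8 * tone, dynamic_range=2.0))
print(ssim_global(tone, -tone, dynamic_range=2.0))
```

Library implementations typically average the index over local windows, so their values can differ from this single-window version.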

Comparison with previous systems
Comparative computational complexity analysis
CS aims to reduce and unify the sampling and compression operations and to reduce computational complexity during encoding and decoding. CS greatly reduces coding complexity and storage requirements. The presented algorithm does not require scattering operations or pre-construction operations. The compression is carried out using the sensing matrix, which is a key in the form of an array, and encryption is done by multiplication. This algorithm provides inconsistency and security, and it reduces the size and therefore the storage required. The reconstruction operation requires only simple multiplication and division operations. Consequently, our system significantly reduces the computational complexity: the time complexity of the proposed algorithm is O(n), which is better than the complexities reported in [35] and [36].
Table 4 shows a comparison between the performance of the proposed system and that of the previous system in [13] in terms of the Pearson correlation and MSE when processing files with the same attributes; musical audio files and speech signals were recorded with the same characteristics as the files used in [13] in terms of length and size. The results indicate that the proposed system has good scores in both metrics, demonstrating its efficiency in compression and reconstruction. The proposed system has better correlation coefficients and MSE values than [13], which means that it has a better signal recovery ability. The proposed system can also be used for securing files as an encryption technique. The compression ratio in the proposed system also reached 30% of the original file, which is less than the compression ratio of [36], with MSE values that are close.

CONCLUSION
This paper presents a CS-based system for compressing and securing audio signals. The audio signals are segmented into frames, each a small 8×4 matrix. The frames are then multiplied by a sensing matrix of size 3×8 or 4×8, which is generated using Gaussian random numbers. The whole system is a linear system Y=AX, which can be solved to reconstruct X by using the Moore-Penrose pseudoinverse $A^{+}$. This makes the system low-cost and easy to implement with low time consumption, while it provides good compression ratios with a reasonable level of security.
The implementation results and statistical analysis metrics show that the proposed system provides reliable compression and good-quality signal reconstruction. This is demonstrated by the correlation coefficients and SSIM values, which are very close to 1, by the small MSE values, such as 5.0E-4, and by PSNR values within an acceptable range. The analytical results also show that the proposed system provides results that are close to, or better than, those of an alternative system. Finally, it should be noted that the system performs better when the file size is bigger.

ACKNOWLEDGMENT
This work is supported by Natural Science Foundation of Top Talent of SZTU (grant No. 20211061010016).