Image compression approach for improving deep learning applications

ABSTRACT


INTRODUCTION
The number of cell phones in use worldwide is estimated at 4.7 billion. Users of these phones take photos and videos that often consume up to 80% of their storage, so these images and videos should be compressed. Image compression means reducing the size of the graphics file while keeping the image quality within an acceptable level, which allows more images to be stored and reduces the time required to send and download an image over the internet [1]. Image compression is performed in five steps: color space conversion, down-sampling, discrete cosine transform (DCT), quantization, and entropy encoding [2]. The digital image is a two-dimensional array of pixels. Image manipulation uses mathematical functions and transformations, including smoothing, sharpening, and segmentation of the image [3]. Additionally, computer vision using images can solve more complex problems such as facial recognition (used, for example, by Snapchat to apply filters). Digital image processing problems that were previously unsolvable are now resolved by deep learning (DL) methods such as the convolutional neural network (CNN); a prime example is image classification [4].
Deep learning is a subfield of machine learning concerned with algorithms, inspired by the structure and function of the brain, called artificial neural networks. Deep learning technologies have evolved from Google Brain, founded by Andrew Ng, chief scientist at Baidu Research, and via various Google services [5]. The following examples also show where images are used in computer vision and deep learning: behavioral tracking (customers and how they behave) and inventory management.

RELATED WORK
A related study divided the optimization parameters of a deep neural network (DNN) into sections and optimized these sections individually. The authors implemented their strategy using the popular limited memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) optimization algorithm. The most prominent factor that impairs the performance of the L-BFGS algorithm is the total number of optimization parameters, which is very high. The researchers therefore suggested a simple but very effective strategy, called the partial limited memory BFGS (pL-BFGS) strategy, to implement the L-BFGS algorithm. Their classification experiments showed that the proposed method greatly enhances the training process of a DNN classifier and improves the classification accuracy. The researchers did not, however, address the problem of training time.
Gueguen et al. [13] proposed training the CNN directly on discrete cosine transform (DCT) coefficients computed in the middle of the JPEG codec. The researchers modified libjpeg to directly produce DCT coefficients, modified ResNet-50 to accommodate inputs of different sizes and strides, and evaluated performance on ImageNet. The study results were compelling: at performance equal to the ResNet-50 baseline, the researchers found a 1.77x speedup, and at performance significantly better than the baseline, they obtained a 1.3x speedup. These results may be effective for speeding up processing in a wide range of speed-sensitive applications, from processing large datasets in data centers to processing JPEG images locally on mobile devices.
Optimizing CNNs through parameter fine-tuning (transfer learning) was done by [14]. This method can improve CNN performance and can help overcome a lack of training data. The researchers investigated three main areas. The first is the optimal learning rate for transferred layers. The second is how different source datasets affect the outcome on several target datasets. Lastly, they compared an ensemble of fine-tuned networks to an ensemble of randomly initialized networks. The results indicated that parameter fine-tuning always improved the accuracy of image classification; the only possible downside is the increased training time required for pre-training.
Cheng et al. [15] start from the observation that current DNN models are computationally expensive and memory-intensive, which hinders their deployment in devices with low memory resources or in applications with strict latency requirements. Therefore, the normal procedure is to compress and accelerate the deep network model without significantly reducing its performance. The researchers reviewed recent techniques for compacting and accelerating DNN models, such as parameter pruning and quantization, low-rank factorization, transferred/compact convolutional filters, and knowledge distillation. They also reviewed some very recent successful methods, such as dynamic capacity networks and stochastic depth networks. A good compression method should achieve almost the same performance as the original model with far fewer parameters and less computing time.
The idea of integrating source coding and DL to obtain better image classification performance was pursued in [16], where the author proposed a new CNN topology that absorbs the original input along with its various compressed versions. This architecture feeds in compressed inputs from low quality to high quality and allows the network to learn from all potential compressed information by itself. The researcher was able to increase the accuracy of modern CNN image classification: a 0.374% increase in Top-1 accuracy and a 0.346% increase in Top-5 accuracy for the Inception V3 model, and a 0.39% increase in Top-1 accuracy and a 0.228% increase in Top-5 accuracy for the ResNet-50 V2 model. But like other techniques, the new CNN topology has some limitations: 11 branches of this structure are needed to insert the compressed versions of the originals, so the total number of parameters in the new architecture is 11 times the original. This can easily deplete the graphics processing unit (GPU) random-access memory (RAM) when training and evaluating the entire architecture. Therefore, a block-by-block training method is applied incrementally to avoid overusing computational resources at the start.
Jia et al. [17] aimed to improve system scalability and balance the communication-to-computation ratio for CNNs. The researchers built a highly scalable DL training system for dense GPU clusters with three main contributions: i) a mixed-precision training method, ii) an optimization approach for an extremely large minibatch size (up to 64K), and iii) highly optimized all-reduce algorithms. As a result, when training on the ImageNet dataset, they reached 58.7% Top-1 test accuracy with AlexNet (95 epochs) in just 4 minutes using 1024 Tesla P40 GPUs, and 75.8% Top-1 test accuracy with ResNet-50 (90 epochs) in 6.6 minutes using 2048 Tesla P40 GPUs, outperforming existing systems.

In summary, the related works show that there is a need to develop an image compression approach that reduces the size of the dataset and improves the classification accuracy, thereby speeding up the machine learning process while maintaining image quality.

METHOD
This section introduces the dataset compression method used to improve deep learning applications. The method consists of three main steps: first, creating a database of compressed images by applying the standard JPEG algorithm; second, creating a CNN model for image classification; third, examining the effect of image compression on the model training time, the classification accuracy, and the size of the image dataset on the storage device. The primary purpose of this method is to study the effect of compressing the dataset on the performance and accuracy of the deep learning model. This study will help determine whether it is worth compressing images for training purposes in a machine learning model.

The phases of the implemented compression method
In this paper, the approach was developed in six consecutive phases, as detailed in Figure 1 and in the following paragraphs. These six phases were implemented on a personal laptop with the following characteristics: Processor: Intel® Core™ i7-7500U CPU @ 2.70 GHz; Installed memory (RAM): 8.00 GB; System type: 64-bit operating system, x64-based processor; Windows edition: Windows 10 Pro © 2019 Microsoft Corporation.

In phase 1, the Cats vs. Dogs images dataset (25,000 images) was obtained and copied into the work folder, which was divided into two sub-folders: Cats and Dogs. The original images were copied into the sub-folders according to the label (the name of the image file); images are labeled by content using the word "dog" or "cat" followed by a number indicating the sequence of the image within the dataset. Meanwhile, the compressed image folders were prepared, each divided into two sub-folders: Cats and Dogs. The original images dataset was then read and compressed into the desired folder according to the given quality (10, 20, 40, 80, 100). Custom code was written to load the images into memory image-by-image from the dataset folder according to the desired color pattern (color or gray), resize them, and save them, ready for modeling, as a single NumPy array of images with a corresponding array of labels.

In phase 2, the Python, TensorFlow, and Keras libraries are imported into the Jupyter framework. Python constitutes the leading programming language in the artificial intelligence (AI) and machine learning (ML) fields [18]. It has become a favorite among developers and a great option for ML, data science, and the internet of things (IoT), which have been gaining momentum recently [19]. Keras is a powerful tool for developing DL models; it wraps the powerful numerical computation libraries Theano and TensorFlow [20] and has moved to the forefront of the libraries used for building and training DL models [21]. TensorFlow is a comprehensive system of tools and libraries that greatly eases and accelerates the application of neural network models [22]. The Jupyter framework was used because it is a great environment in which to develop code and communicate results; the primary programming languages that Jupyter supports are Julia (Ju), Python (Py), and R [23]. Jupyter Notebook has many features that have made it popular in the field of DL: it supports over 40 programming languages, notebooks can be shared with others, and it offers interactive output such as hypertext markup language (HTML), images, videos, LaTeX, and custom multipurpose internet mail extensions (MIME) types [24].

The main training parameters were chosen to cover different cases. The first parameter is epochs, which takes the values (10, 20, 40). One epoch is one complete pass of the entire dataset forward and backward through the neural network; an epoch is divided into several smaller batches. The second parameter is the batch size, which takes the values (64, 128, 256); these values balance training accuracy and test accuracy while also reducing training time. Batch size (Bs) is the number of training images processed in one iteration. When (N) represents the total number of training images and (In) represents the number of iterations needed to complete one epoch, the number of iterations is calculated as shown in (1):

In = N / Bs    (1)

For example, with N = 23,000 training images and Bs = 64, one epoch requires 23,000 / 64 ≈ 360 iterations (rounded up). The third parameter is the quality list, which takes the values (10, 20, 40, 80, and 100). These values form a doubling scale for image quality from 10 to 100. The quality value is the complement of the compression ratio; for example, quality 40 corresponds to a 60% compression ratio.
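To make phase 1 concrete, the following is a minimal sketch of the compression and loading steps, assuming the Pillow and NumPy libraries. The folder layout, the quality-based folder naming, and the helper functions compress_dataset() and load_dataset() are illustrative choices, not the paper's actual code:

    import os
    import numpy as np
    from PIL import Image

    QUALITIES = [10, 20, 40, 80, 100]  # quality list; 100 - q is the compression ratio
    IMAGE_SIZE = (80, 80)              # or (120, 120)

    def compress_dataset(src_dir, dst_root):
        # Re-encode every JPEG in src_dir at each quality level into its own
        # folder; in the paper's setup this would be applied to the Cats and
        # Dogs sub-folders in turn.
        for q in QUALITIES:
            dst_dir = os.path.join(dst_root, f"quality_{q}")
            os.makedirs(dst_dir, exist_ok=True)
            for name in os.listdir(src_dir):
                img = Image.open(os.path.join(src_dir, name)).convert("RGB")
                img.save(os.path.join(dst_dir, name), "JPEG", quality=q)

    def load_dataset(src_dir, color_mode="L"):
        # Load images one by one, resize them, and stack them into a single
        # NumPy array; color_mode is "L" for gray, "RGB" for color. Labels
        # follow the file names: 0 for "dog", 1 for "cat".
        images, labels = [], []
        for name in sorted(os.listdir(src_dir)):
            img = Image.open(os.path.join(src_dir, name)).convert(color_mode)
            img = img.resize(IMAGE_SIZE)
            images.append(np.asarray(img, dtype=np.float32) / 255.0)
            labels.append(0 if name.startswith("dog") else 1)
        return np.stack(images), np.array(labels)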
In phase 3, the CNN model was created and the inputs were adjusted (image size and color mode). The color mode takes the value 1 for gray and 3 for color, based on the number of channels representing each mode. The model was then compiled, ready for the training process. A neural network is a machine learning (ML) technique inspired by the human nervous system and the structure of the brain. It is built from artificial neurons, called nodes, stacked in three kinds of layers: the input layer, the hidden layer(s), and the output layer. Inputs are provided to each node; the node multiplies the inputs by (initially random) weights, sums them, and adds a bias. Finally, an activation function (a nonlinear function) is applied to determine which neuron fires, producing the output as shown in (2) [25]:

y = f(Σ wi·xi + b)    (2)

where xi are the inputs, wi the weights, b the bias, and f the activation function.
The bias value allows shifting the activation function left or right, giving more flexibility in the features that can be learned. Each layer needs a single bias node. A bias node can be added at the first few layers and not at the last ones [26]. Bias affects the output values only; its role is to help ensure that the output fits the incoming input better. Biases are typically initialized to zero, as the fraction of inconsistency is provided by the small random numbers in the weights [27]. The previous process is shown in Figure 2.
A CNN model was developed using the Keras library. The model stacks convolutional layers with small 3×3 filters followed by a max-pooling layer. Together, these layers form a block, and these blocks can be repeated, with the number of filters increasing with the depth of the network: 32, 64, and 128 for the three blocks of the model. The rectified linear unit (ReLU) activation function was used in each layer, as shown in (3):

f(x) = max(0, x)    (3)

The most commonly used activation functions are the linear activation function, the sigmoid activation function, the tanh function, the ReLU function, and the SoftMax activation function [28]. The researchers suggest that the CNN model can benefit from regularization techniques, such as dropout [17], which randomly drops some connections during training. The classification task here is binary, requiring the prediction of one value: either 0 ("dog") or 1 ("cat"), so an output layer with one node and a sigmoid activation function was used. In the compile step, the CNN model was optimized using the Adam optimizer and the binary cross-entropy loss function.
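The following is a sketch of this model using the Keras Sequential API. The three convolutional blocks (32, 64, and 128 filters of size 3×3, each followed by max pooling), the ReLU activations, the single sigmoid output node, and the Adam/binary cross-entropy compile settings all follow the text; the Flatten layer joining the blocks to the output node is an assumption, since the paper does not detail the classifier head:

    from tensorflow.keras import Sequential
    from tensorflow.keras.layers import Conv2D, Dense, Flatten, MaxPooling2D

    def build_model(image_size=80, channels=1):
        # channels = 1 for gray images, 3 for color images
        model = Sequential([
            Conv2D(32, (3, 3), activation="relu",
                   input_shape=(image_size, image_size, channels)),
            MaxPooling2D((2, 2)),
            Conv2D(64, (3, 3), activation="relu"),
            MaxPooling2D((2, 2)),
            Conv2D(128, (3, 3), activation="relu"),
            MaxPooling2D((2, 2)),
            Flatten(),
            Dense(1, activation="sigmoid"),  # binary output: 0 = dog, 1 = cat
        ])
        model.compile(optimizer="adam", loss="binary_crossentropy",
                      metrics=["accuracy"])
        return model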
In phase 4, the images dataset was divided into a training dataset containing 23,000 images of dogs and cats and a testing dataset containing 2,000 images. Each dataset was then split into images and labels. The image part of each dataset was converted to a NumPy array and reshaped to make it ready for the training/testing process; the label part was also converted to an array, but no reshaping is needed. The model was trained on the training dataset according to the batch size and number of epochs, and the training time was recorded. In phase 5, the testing images and their labels were passed to the model during the evaluation process to obtain the classification accuracy.
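The following is a sketch of phases 4 and 5, reusing the hypothetical load_dataset() and build_model() helpers from the earlier sketches; the folder paths and the particular parameter values (80×80 gray images, batch size 64, 10 epochs) are one example configuration from the scenario list:

    import time

    # Load the training (23,000 images) and testing (2,000 images) datasets;
    # the paths are illustrative.
    train_x, train_y = load_dataset("train/quality_40", color_mode="L")
    test_x, test_y = load_dataset("test/quality_40", color_mode="L")

    # Reshape the image arrays to (count, height, width, channels) for the CNN;
    # the label arrays need no reshaping.
    train_x = train_x.reshape((-1, 80, 80, 1))
    test_x = test_x.reshape((-1, 80, 80, 1))

    model = build_model(image_size=80, channels=1)

    # Phase 4: train and record the training time (in minutes).
    start = time.time()
    model.fit(train_x, train_y, batch_size=64, epochs=10)
    training_time = (time.time() - start) / 60

    # Phase 5: evaluate on the testing dataset to get the classification accuracy.
    loss, accuracy = model.evaluate(test_x, test_y)
    print(f"training time: {training_time:.1f} min, accuracy: {accuracy:.3f}")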
In phase 6, the results were printed on the screen and saved to a file. The printed and saved results included the scenario name, file name, file creation time, image size, epochs, batch size, image quality, model training time, and classification accuracy. Phases 4-6 were repeated, so that the training, evaluating, printing, and saving processes were executed for all compressed image datasets according to the different quality values (10, 20, 40, 80, and 100). The main flow of the developed approach across the six phases is given as pseudocode in Figure 3.
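A minimal sketch of the result-saving step in phase 6, assuming the results are appended to a comma-separated values (CSV) file; the file name, column names, and example row are illustrative:

    import csv
    import os
    from datetime import datetime

    def save_result(row, path="results.csv"):
        # Append one scenario's results, writing the header on first use.
        header = ["scenario", "file", "created", "image_size", "epochs",
                  "batch_size", "quality", "training_time_min", "accuracy"]
        write_header = not os.path.exists(path)
        with open(path, "a", newline="") as f:
            writer = csv.writer(f)
            if write_header:
                writer.writerow(header)
            writer.writerow(row)

    # Example row: the fields of Scenario (39) (80, 10, 64, 40 cg).
    save_result(["39", "dataset_cg40.npy", datetime.now().isoformat(),
                 "80x80", 10, 64, 40, 48.0, 0.86])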

The executed scenarios
After preparing the images dataset within the folders, the programming code necessary to perform all scenarios on this data was created. The scenarios were divided to cover all combinations of the variables used in this study and were organized into several groups. The dataset within these groups is processed according to multiple variables: the image size (80×80, 120×120), the batch size (64, 128, 256), the image quality (original color or original gray, color compressed (10, 20, 40, 80, 100), or gray compressed (10, 20, 40, 80, 100)), and the number of epochs (10, 20, 40). Thus, the number of groups used in applying the developed approach to the images dataset was 12, and the number of scenarios implemented was 112. Training the DL model across these scenarios took 289 hours. A summary of the groups and scenarios is given in the following sections.

Colored image with original quality
Nine scenarios were implemented for size 80×80, where epochs takes the values (10, 20, 40) and each of these values is combined with three values of the batch size (64, 128, 256). The training time nearly doubles due to the increase in the number of epochs. Note that the size of the original dataset on the hard disk remained 546 MB in each scenario. For size 120×120, another 9 scenarios, the training time increases dramatically; the size of the original dataset on the hard disk likewise remained 546 MB in each scenario.

Gray image with original quality
Nine scenarios were implemented for size 80×80, where epochs takes the values (10, 20, 40) and each of these values is combined with three values of the batch size (64, 128, 256). The training time nearly doubles due to the increase in the number of epochs. Note that the size of the original gray-pattern dataset on the hard disk is 941 MB. This is counterintuitive, since the size of gray images should be less than that of color images. For size 120×120, another 9 scenarios, the training time increases dramatically; the original gray-pattern dataset likewise occupies 941 MB on the hard disk.

Compressed gray, size 80x80, different qualities
This includes 15 scenarios where epochs is 10 and the image quality (the complement of the compression ratio) is doubled across (10, 20, 40, 80, 100), corresponding to compression ratios of (90%, 80%, 60%, 20%, 0%), respectively. Each quality value is combined with three values of the batch size (64, 128, 256), bringing the total to 15 scenarios. Note that the size of the compressed gray-pattern data on the hard disk varies with the image quality: 142, 211, 317, 594, and 1,180 MB, respectively. When epochs is 20 (another 15 scenarios), the training time doubles due to the increase in the number of epochs. For epochs of 40 with a batch size of 64, there are 5 scenarios (one per quality value), and the training time again doubles due to the increase in the number of epochs.

Compressed gray, size 120x120, different qualities
This includes 15 scenarios where epochs is 10 and the image quality is doubled across (10, 20, 40, 80, 100). Each quality value is combined with three values of the batch size (64, 128, 256), bringing the total to 15 scenarios. When increasing the image size from 80×80 to 120×120, the training time increases dramatically. Note that the size of the compressed gray-pattern data on the hard disk varies with the image quality: 142, 211, 317, 594, and 1,180 MB, respectively. When epochs is 20 (another 15 scenarios), the training time doubles due to the increase in the number of epochs. For epochs of 40 with a batch size of 64, there are 5 scenarios (one per quality value), and the training time again doubles due to the increase in the number of epochs.

Figure 3. Pseudocode of the developed approach

START
1. PREPARE a folder to contain the original images dataset
2. DIVIDE the folder into two sub-folders: Cats, Dogs
3. COPY the original images to the sub-folders according to the label
4. PREPARE the folders to contain the compressed images
5. DIVIDE each folder into two sub-folders: Cats, Dogs
6. COMPRESS images according to the required ratio and copy them to the desired folder
7. PREPARE the training/testing dataset and divide it into images and labels
8. CONVERT the images of each dataset to a Python array
9. CONVERT the labels of the training/testing dataset to a Python array
10. CREATE the CNN model and adjust inputs according to the given image size
11. COMPILE the model
12. TRAIN the model on the training dataset according to batch size, epochs
13. GET the training time
14. EVALUATE the model on the testing dataset to get the classification accuracy
15. PRINT the results on the screen and SAVE them to a file
16. REPEAT steps 7-15 for all compressed image datasets
END

Compressed color, size 80x80, quality 40
This includes 3 scenarios where epochs is 10 and the batch size varies over (64, 128, 256), one scenario per batch size. Note that the size of the compressed gray dataset on the hard disk at image quality 40 is 317 MB. For size 120×120, increasing the image size from 80×80 to 120×120 makes the training time increase dramatically.

RESULTS AND DISCUSSION
In principle, the number of epochs affects the training time, while the image compression ratio affects the classification accuracy. Therefore, scenarios that give too long a training time and scenarios that give low classification accuracy are excluded. The scenarios that fulfill the three study objectives are weighed together to obtain the best possible scenario. In this study, 112 scenarios were applied to the images dataset, which took approximately 289 hours of CNN model training. The following variables and values are used for each scenario: image size (80×80 and 120×120), number of epochs (10, 20, 40), batch size (64, 128, 256), and image quality (original color, original gray, compressed color quality (10, 20, 40, 80, 100), compressed gray quality (10, 20, 40, 80, 100)). A scenario is therefore expressed in the following way: scenario (number) (image size, number of epochs, batch size, image quality), using the abbreviations cc for compressed color image quality and cg for compressed grayscale image quality. For example, Scenario (37) (80, 10, 64, 10 cg) means scenario number 37, image size 80×80, 10 epochs, batch size 64, and quality value 10 (90% compression ratio) with compressed grayscale images.

The tested scenarios
When the scenarios were analyzed, the training times were found to fall into four categories: about 50 minutes, 1.5 hours, 3.5 hours, and 7.5 hours. The highest classification accuracy in each category is 86.4%, 88.2%, 89.4%, and 89.5%, respectively. The differences among these four accuracy values are not significant, and obtaining the highest possible accuracy is not among the aims of this study. Therefore, the first training-time category was chosen because it achieves one of the study objectives: to reduce the training time (not to exceed 50 minutes).
Within this category, the scenario that achieves the highest accuracy is Scenario (39) (80, 10, 64, 40 cg): images dataset quality 40, compressed gray (cg). It gave a training time of 48 minutes, a classification accuracy of 86%, and an image dataset size of 317 MB on the storage device. This size makes up 58% of the original images dataset size (317 of 546 MB) and saves 42% of the storage space. Note that when the gray compressed images dataset is also resized on the storage device to 80×80, the dataset size becomes 123 MB, which is approximately 23% of the original dataset size and saves 77% of the storage space.

Results summary
Although the research papers reviewed above have made great progress in improving deep learning algorithms and developing high-performance training systems, they have paid little attention to the size, on the storage device, of the images dataset used in the training process, or to the training time. The approach proposed in this paper reduces the size of the dataset by compressing it while preserving image quality, thus reducing the required training time and improving the classification accuracy. Table 1 shows a comparison between seven scenarios: the best five scenarios that achieve the study objectives, plus the overall best-case and worst-case scenarios, sorted in ascending order of classification accuracy. Figure 4 shows the best five scenarios that achieved the study objectives, in which the training time does not exceed 50 minutes, sorted in ascending order of classification accuracy.

Figure 4. The best five scenarios that achieved the study objectives (training time not exceeding 50 minutes), sorted in ascending order of classification accuracy

CONCLUSION
In this study, the Dogs vs. Cats image dataset from Kaggle was used, which contains 25,000 color images. 112 scenarios were applied in both the color and gray patterns; these scenarios took 289 hours of CNN model training. The study found that the best scenario giving very good and acceptable classification accuracy is Scenario (39) (80, 10, 64, 40 cg). The training time of this scenario is 48 minutes, the classification accuracy is 86%, and the image dataset size is 317 MB. This size makes up 58% of the original images dataset size and saves 42% of the storage space. Note that when the gray compressed images dataset is resized to 80×80, the dataset size becomes 123 MB, which is approximately 23% of the original dataset size and saves 77% of the storage space.