Medical Vision: web and mobile medical image retrieval system based on Google Cloud Vision

Information technology is being rapidly adopted in medical systems, and automatic methods for recognizing and detecting real-world objects have developed substantially. In this study, we present a system called Medical Vision, which is designed for people who have no medical expertise. Medical Vision is a web- and mobile-based application that provides initial knowledge about a medical image. The system has five features: object detection, web detection, object labeling, safe search, and image properties. These features are implemented by embedding the Google Vision API in the system. We evaluate the system by observing the results for several medical images given as input. The results show that our system delivers promising performance and is able to provide relevant information about a given image.


INTRODUCTION
In this era of globalization, technology touches numerous aspects of life. In education, massive technological development has brought a new learning method called online learning or distance learning. Barbara et al. [1] show that students who take online courses perform better than those who receive face-to-face instruction. In the medical field, moreover, intelligent technology is used to present and analyze medical images to help readers (doctors, nurses, etc.) make the right decision. A computerized analysis system was first initiated by Lusted in the 1960s, who showed that an automatic system could be used to determine abnormalities in chest photofluorograms [2]. Later work studied computer analysis and diagnosis of bone cancer images [3]. Since then, various computer-assisted diagnosis (CAD) systems have been developed for medical images.
When designing a CAD system, the characteristics of the image must be addressed first to determine the kind of method needed to improve system performance. A medical image is taken with a high-end medical device and captures information that cannot be obtained by human vision. Such an image has two characteristics: high resolution and high pixel depth [4]. However, under certain conditions the produced image is not clear enough because of noise. Therefore, improving the quality of the image is necessary to deliver valuable information to the doctor. Furthermore, an intelligent system can be used to provide them with an early diagnosis.
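As a generic illustration of the kind of quality improvement mentioned above (not part of Medical Vision itself), impulse noise in a grayscale image can be suppressed with a simple 3x3 median filter. The pixel matrix below is a hypothetical example:

```python
# Illustrative 3x3 median-filter denoising sketch (not the method used by
# Medical Vision); the input is a hypothetical grayscale pixel matrix.

def median_filter_3x3(image):
    """Return a denoised copy of `image` (a list of equal-length rows).

    Each interior pixel is replaced by the median of its 3x3 neighborhood;
    border pixels are kept unchanged for simplicity.
    """
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = sorted(
                image[y + dy][x + dx]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            )
            out[y][x] = window[4]  # median of the 9 sorted neighborhood values
    return out

# A salt-noise pixel (255) surrounded by mid-gray values is suppressed.
noisy = [
    [10, 12, 11, 10],
    [11, 255, 12, 11],
    [10, 12, 11, 10],
]
print(median_filter_3x3(noisy)[1][1])  # prints 11: the spike is removed
```

Real systems use far more sophisticated denoising, but the principle — replacing outlier pixels with a robust local statistic — is the same.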
To deal with medical image challenges, various studies have tried to adopt intelligent methods to process images and provide analysis automatically. In breast cancer, the research areas include cancer detection in hyperspectral imaging [5], breast cancer classification [6,7], and optical imaging with augmented reality visualization [8,9]. Automatic methods have also been built for lung diseases on CT images [10,11]. Moreover, similar methods have been created to analyze skin diseases from skin images [12][13][14][15][16][17][18][19][20][21]. Looking at the methods in more detail, deep learning has recently been chosen massively as one of the methods for automatically analyzing medical images [22][23][24][25][26][27][28][29][30][31][32][33][34][35][36][37][38][39]. Deep learning has been widely used to analyze medical images in skin lesion classification [19][20][21], breast cancer classification [23][24][25][26], and melanoma detection [27][28][29]. In those studies, the methods are built for a medical environment with a specific task. Given the advances in machine learning methods such as deep learning, and the availability of many freely usable image processing libraries, a system that can assist people in improving their understanding of medical images is highly needed. In this paper, unlike previous research, we propose a system that performs deep analysis on medical images. Our system is designed to produce an extensive analysis of a given medical image: it can detect a medical object and analyze it for better understanding. We designed our system to run as a web-based system and, to improve mobility, we also developed a mobile-based application. Five features are embedded in these systems. The first feature is medical object detection, whose purpose is to detect all objects occurring in the scene. The second feature is medical website detection, which is designed to search for related articles or images associated with the given image.
The third feature is object labeling. This feature performs entity categorization and ranks the results by confidence score. The fourth feature is safe search, which aims to classify the type of content related to the given image. The last feature is image properties, which provides details of the image based on pixel information.
This study is organized as follows. In section 2, we provide brief information about Google Cloud Vision. Then, we present our proposed system in section 3. After that, in section 4, we present the details of the system implementation, the results of the experiments, and some discussion of the results. Finally, we give the conclusion of this study and insights for future research in section 5.

GOOGLE CLOUD VISION API
Google Cloud Vision API is a service from the Google Cloud Platform (GCP) that provides analysis of an image. The API was released on May 18, 2017, with machine learning and big data technology as the engine behind it. The Cloud Vision API is used to digitally identify objects in an image such as text, symbols, and types of products.
Google supports many scenarios with the Cloud Vision API. For example, developers can use it to detect whether an object such as a mobile phone appears in an image, detect inappropriate content, analyze the emotions of a person captured in an image, and extract text. The Cloud Vision API supports object detection using the same technology as Google Photos, so developers can find out the names of objects in a photo. Besides, the API can be used to flag inappropriate image content detected with Google SafeSearch. It can also be used to analyze people's emotions and detect the logos of famous products. The API is likewise able to detect text contained in images, with automatic language identification.
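For reference, the detections above map onto feature types of a single `images:annotate` call in the public v1 REST API. The following sketch builds such a request body covering the five features used later in this paper; the helper name and the fake image bytes are our own, and a real call would additionally require an API key or credentials:

```python
# Sketch of a Cloud Vision v1 REST request body covering the five
# Medical Vision features. The feature-type names are the public API
# enums; `build_annotate_request` and the sample bytes are placeholders.
import base64
import json

VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"

def build_annotate_request(image_bytes, max_results=10):
    """Build the JSON body for a single images:annotate call."""
    features = [
        "OBJECT_LOCALIZATION",    # object detection with bounding boxes
        "WEB_DETECTION",          # related pages, images, and web entities
        "LABEL_DETECTION",        # ranked content labels
        "SAFE_SEARCH_DETECTION",  # adult/spoof/medical/violence/racy
        "IMAGE_PROPERTIES",       # dominant colors and pixel fractions
    ]
    return {
        "requests": [{
            "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
            "features": [{"type": t, "maxResults": max_results} for t in features],
        }]
    }

body = build_annotate_request(b"\x89PNG...fake image bytes...")
print(json.dumps(body)[:60])
# An actual call would POST `body` to VISION_ENDPOINT with an API key.
```

Bundling all feature types into one request lets the server return every annotation in a single JSON response, which suits the one-upload workflow described in the next section.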

PROPOSED SYSTEM

Generally, the process of our proposed system can be seen in Figure 1a. An image is uploaded to our system; our engine analyzes it and provides several pieces of information. We realize this information as features. We designed five features in our system: object detection, label detection, web detection, image properties, and safe search. Each feature delivers different information to the user. The features are explained as follows:

(a) Object Detection: This feature is used to detect objects that can be found in the uploaded image.
The detected objects are bounded with a box.

(b) Label Detection: For each image uploaded by the user, Medical Vision analyzes the type of content that occurs in it. For example, when an actinic keratosis image is uploaded, the system produces several labels correlated with the image, sorted by confidence score. For the actinic keratosis case, the detected labels are finger, skin, hand, thumb, joint, nail, gesture, flesh, and wrist.

(c) Web Detection: This feature is used to recommend related websites that explain the image in more detail. Two categories of results are produced: partially matched images and pages. Besides these links, two more pieces of information are given by this feature. The first is the best guess label, which best matches the given image. Second, this feature presents web entities, which represent diseases related to the image. For example, given an image labeled as actinic keratosis, our system produces several entities: actinic keratosis, keratosis, actinic cheilitis, skin cancer, a precancerous condition, keratoses, seborrheic keratosis, therapy, skin, and lesion. The first entity has the highest confidence score.

(d) Image Properties: This feature is used to analyze the image based on its pixel values.
It gives detailed information about the entities produced by the web detection feature. For each entity, three values are computed: RGB, score, and pixel fraction.

(e) Safe Search: This feature is used to categorize the image into content classes. The image is categorized into adult, spoof, medical, violence, and racy content. For each category, a score is given as very likely, likely, possible, unlikely, or very unlikely. For example, the image in Figure 1b is annotated as likely for medical content, possible for violence and racy content, and very unlikely for adult and spoof content.

To improve ease of access, we designed our system in the form of a web-based system and an Android application. As seen in Figure 2, the user can upload an image through either application. The image is sent to a web server, which makes a request to the Google Vision server using the Google Vision API. The Google Vision server then produces the image references and sends them back to the web server as a JSON file. Lastly, the web server sends the result to the user.

RESULT AND DISCUSSION

As stated previously, Medical Vision was developed as a web-based and a mobile-based system. The web-based system was developed in HTML, while the mobile application was designed for Android only. As shown in Figure 2, the image uploaded in either the browser or the mobile app is processed in the web server, which analyzes it using the Google Vision API. We evaluated the system by feeding it several medical images: actinic keratosis, bullous pemphigoid, chickenpox, eczema, herpes zoster, impetigo, keloid, keratoacanthoma, lichen planus, melanoma, pustular psoriasis, seborrheic keratosis, and tinea barbae.

Web-based implementation
Examples of our implementation can be seen in Figures 3-8. In this example, we test the system with an actinic keratosis image. Actinic keratosis (AK) is a skin disease caused by ultraviolet radiation. As shown in Figure 3, we upload a picture classified as AK. The object analysis labels the image as Person with a confidence score of 86.17%. Then, in Figure 4, the labels predicted by our system show that the image is related to finger, skin, hand, thumb, joint, nail, gesture, flesh, and wrist, where the highest score is obtained by finger. Interestingly, in the web detection feature, our system accurately predicts the image as AK with a confidence score of 87.34%, followed by keratosis, actinic cheilitis, skin cancer, a precancerous condition, keratoses, seborrheic keratosis, therapy, skin, and lesion, as shown in Figure 5. When a web detection result is clicked, the browser is directed to a new page showing information related to the image. Furthermore, in the safe search feature, the system gives the general type of content for the given image, as shown in Figure 8. For the AK image, the result is classified as likely for medical content, possible for violence and racy content, and very unlikely for adult and spoof content.

Mobile-based implementation
The implementation of Medical Vision in the mobile app can be seen in Figure 9. The image shows hands affected by leprosy, taken from https://www.who.int/lep/disease/en/. This disease infects the skin but may also affect the peripheral nerves. As shown in Figure 9a, the system detects two objects in the given image: glove and animal, where glove has the highest confidence score. In label detection, the resulting labels are qualitatively highly related to the image. Interestingly, web detection labels the image as Leprosy with an 81.49% confidence score. The app also provides similar images from the web.

Performance analysis
The previous subsections show examples of the implementation of Medical Vision in the web-based and mobile-based systems. In addition, we evaluated the system's performance on 13 medical images. The results are gathered in Table 1. The first column presents the actual label of the image. The second column presents the detected object for each image together with the highest confidence score. From those results, most of the detected objects are qualitatively highly correlated with the images. However, for pustular psoriasis and seborrheic keratosis, the object detection results do not fit the actual image: a watermelon object is detected in pustular psoriasis, while a baked good is identified in seborrheic keratosis. If we observe the pustular psoriasis image, however, the disease produces a pattern in the skin that resembles a watermelon.
The label detection results are promising, as all of them show a high relation to the disease. In the majority of cases, the skin label has the highest confidence score. For actinic keratosis, the sample image shown in Figure 3 is labeled as finger.
An interesting result is shown by the web search feature. In Table 1 (see appendix), most best guess labels closely match the actual image labels. The web search entities are also mostly related to the actual image labels. For impetigo, keratoacanthoma, pustular psoriasis, and seborrheic keratosis, the web search feature produces uncorrelated best guess labels. However, the entities show that while the results for the impetigo and pustular psoriasis images do not fit, the others do match the labels. These labeling errors may occur because of model limitations.
The safe search feature also shows promising results. All test images are categorized as medical content, although not all are predicted as very likely (VL). For actinic keratosis, the image is classified as likely (L) medical content. From these results, there is a relation between the medical and violence categories: when an image is labeled as medical content, it also tends to be labeled as violence content. For example, the impetigo image is predicted VL as medical content and L as violence content.
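The co-occurrence observation above amounts to comparing likelihood ratings on an ordinal scale. The ordering below follows the Vision API `Likelihood` enum; the per-image annotations are illustrative stand-ins, not the paper's measured data:

```python
# Sketch of the likelihood comparison behind the medical/violence
# observation. LIKELIHOOD follows the Vision API enum ordering; the
# per-image annotations below are hypothetical examples.

LIKELIHOOD = ["VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]
RANK = {name: i for i, name in enumerate(LIKELIHOOD)}

def at_least(annotation, category, threshold="POSSIBLE"):
    """True if `category` is rated at or above `threshold`."""
    return RANK[annotation[category]] >= RANK[threshold]

samples = {  # hypothetical safeSearchAnnotation results
    "impetigo": {"medical": "VERY_LIKELY", "violence": "LIKELY"},
    "actinic keratosis": {"medical": "LIKELY", "violence": "POSSIBLE"},
}

for name, ann in samples.items():
    both = at_least(ann, "medical") and at_least(ann, "violence")
    print(name, "-> medical and violence co-occur:", both)
```

Treating the five likelihood strings as ranks makes statements such as "medical content tends to imply violence content" checkable with a simple threshold comparison.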

CONCLUSION
In this study, we have presented our system, called Medical Vision, which utilizes the Google Cloud Vision API to process images. The system was built for ordinary people with little knowledge of medical images. Based on our evaluation, the system works properly on the given test cases. The system still needs some enhancements in its image processing method: the Google Cloud Vision model was trained on a wide variety of objects, so it does not cover all medical objects. Therefore, we recommend building a new model in the future.