Automatic Leukemia Cell Counting using Iterative Distance Transform for Convex Sets

Received Jan 4, 2018; Revised May 2, 2018; Accepted May 11, 2018

Counting white blood cells in microscopic images of acute leukemia is one of the stages in the diagnosis of leukemia. The main difficulty is counting accurately in areas where white blood cells overlap. Research on counting overlapping white blood cells is generally based on geometry, but counting errors still occur because of over-segmentation or under-segmentation. This paper proposes an Iterative Distance Transform for Convex Sets (IDTCS) method to determine the markers and count overlapping white blood cells. Markers are determined for every cell in both single and overlapping white blood cell areas. The study has three stages: segmentation of white blood cells, marker detection and white blood cell counting, and contour estimation of every white blood cell. The test data are microscopic acute leukemia images of Acute Lymphoblastic Leukemia (ALL) and Acute Myeloblastic Leukemia (AML). Based on the test results, the IDTCS method performs better than the Distance Transform (DT) and Ultimate Erosion for Convex Sets (UECS) methods.


INTRODUCTION
Leukemia is one of the most dangerous diseases and can be fatal if it is not treated quickly. It is caused by the abnormal development of white blood cells produced by the bone marrow [1]. Research on acute leukemia cells, covering both leukocyte segmentation and cell nucleus segmentation, has been carried out to support medical diagnosis. Manual counting is inefficient, and its accuracy suffers from subjectivity. A computerized system can be used as a support tool for physicians or pathology specialists in order to improve and accelerate the process of morphological analysis [2]. Currently, several studies use the analysis of image data to diagnose diseases, as in [3], [4].
One of the stages of a leukemia diagnosis system based on blood cell image data is white blood cell segmentation. Segmentation is an important step because good segmentation results yield accurate features, which in turn improve the accuracy of the diagnosis. White blood cell segmentation has been studied by many researchers [6]-[13]. In [6], segmentation is done by thresholding using the Zack algorithm and eliminating the background using arithmetic operations. Segmentation based on a parametric Gaussian mixture (GM) model was used in [7]. In the study by Huang et al. [8], leukocyte segmentation using the Otsu thresholding method was preceded by an image enhancement process. Segmentation of the nucleus using the Gram-Schmidt orthogonalization method, and of the cytoplasm using the snake algorithm, was performed in [9]. In [10], an Interest-based Ordering Scheme (IOS) for Fuzzy Morphology was proposed for the segmentation of White Blood Cell (WBC) images in order to improve the accuracy of nucleus segmentation. The study in [11] used Otsu thresholding and the K-Means method to detect cells. In [12], [13], white blood cells were segmented using morphological operations.
Based on previous studies, the segmentation of blood cell images in which white blood cells still overlap needs further development. The remaining problem is over-segmentation and under-segmentation in the white blood cell segmentation results, which reduces the accuracy of cell counting.
Nazlibilek et al. [12] counted white blood cells using a geometric approach based on connected-component labeling. Counting overlapping cells with a geometric approach was also performed in [13], using eccentricity and a thresholding region. In [6], Putzu et al. also used solidity and thresholding to detect and count cells.
In previous studies, the detection and counting of overlapping white blood cells has generally relied on a geometric approach. To count the cells in an overlapping area, the area of the overlapping white blood cells is divided by the average area of the single cells. However, single white blood cells vary in size: a single cell can have an area almost equal to that of an overlapping cell region, so the cell count becomes less accurate. In [14], an automated morphological analysis method was proposed for touching or overlapping nanoparticles, which separates the particles by iterating the morphological erosion process to obtain a marker on each particle. That work proposed the Ultimate Erosion for Convex Sets (UECS) method to identify seed points in a region of interest. It presented a framework for clusters of intersecting convex objects with three steps: seed point extraction, contour evidence extraction, and contour estimation for the marked objects.
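The weakness of area-based counting can be illustrated with a small numeric sketch. The function name `count_by_area` and all area values are hypothetical and chosen only for illustration; they are not measurements from the datasets used in this study.

```python
def count_by_area(region_area, mean_single_area):
    """Estimate the number of cells in a region as area / mean single-cell area."""
    return max(1, round(region_area / mean_single_area))


# Hypothetical areas in pixels (illustrative values only).
mean_single_area = 2800
large_single_cell = 4300       # one unusually large cell
two_small_overlapping = 3700   # two small cells that overlap

print(count_by_area(large_single_cell, mean_single_area))      # 2, true count is 1
print(count_by_area(two_small_overlapping, mean_single_area))  # 1, true count is 2
```

Both regions are miscounted because the estimate depends only on area, not on how many cells the region actually contains.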
Zafari [15] compared several methods for detecting overlapping or touching objects. The methods compared included the Distance Transform (DT) method and the UECS method proposed in [14]. According to the test results, the DT method is very good at marking objects that are generally of the same size. The number of objects identified by this method depends strongly on the given threshold value ρ. For overlapping cells, a small threshold value causes large objects to fail to separate, while a large ρ value also causes under-segmentation because some objects are lost in the thresholding process. As a result, the method fails to separate overlapping objects. In contrast to the DT method, the UECS method [14] is very good at segmenting uniformly sized objects, but over-segmentation may occur if the shape of the cell is irregular at the edges.
This research proposes the Iterative Distance Transform for Convex Sets (IDTCS) method. Unlike the usual DT, the IDTCS method repeats the DT process until no object is left, and a cell is marked if the size of its concavity is less than the threshold value (ρ). As a result, all the overlapping objects are marked as single objects. The method is named IDTCS because it applies the Distance Transform iteratively to obtain a marker for each object. The concept is similar to the UECS method, with the difference that UECS iterates the morphological erosion process to obtain the markers. The study is divided into three stages: segmentation of white blood cells, marker detection and white blood cell counting, and contour estimation of each white blood cell. The first stage obtains the white blood cell area, covering the whole cell area and the nucleus area. The second stage obtains the marker of each cell in both single and overlapping white blood cell areas, after which the white blood cells can be counted. In this second stage, an overlapping white blood cell area is detected when the area contains more than one marker. The third stage obtains the contour of every white blood cell in the single and overlapping white blood cell areas.

RESEARCH METHOD
The segmentation and counting of white blood cells is an important stage in the diagnosis of leukemia. In general, the design of the segmentation and counting process includes segmentation of the white blood cell area, detection of overlapping cells, cell counting, and separation of overlapping cells. The design proposed in this study includes segmentation of the white blood cell area, marker detection and white blood cell counting, and estimation of the white blood cell contours. Marker detection and white blood cell counting are performed first, without a separate step for detecting overlapping cells. The purpose of marker detection is to obtain a marker for every white blood cell in a white blood cell area, whether single or overlapping. A white blood cell area that has more than one marker is an overlapping white blood cell area. The design of the segmentation process and the white blood cell counting can be seen in Figure 1. The input data for this research are microscopic images of acute leukemia, consisting of images of Acute Lymphoblastic Leukemia (ALL) and Acute Myeloid Leukemia (AML). The first stage in Figure 1 is the segmentation of white blood cells, which aims to obtain the white blood cell area. The second stage is the determination of the marker of each white blood cell and the counting of the white blood cells. Marker detection is done on both single and overlapping white blood cell areas, so that the overlapping areas can be identified as those with more than one marker.
The third stage is the estimation of the edge or contour of each white blood cell in the single and overlapping white blood cell areas, so the output of this stage shows the separation of each cell in the overlapping white blood cell areas.

Description of the microscopic dataset of acute leukemia images
The datasets used are microscopic images of Acute Lymphocytic Leukemia (ALL) and microscopic images of Acute Myeloid Leukemia (AML). The ALL image dataset is obtained from the ALL-IDB1 dataset [16], a dataset of peripheral blood smear samples from ALL and non-ALL patients collected at the Tettamanti Research Center, Monza, Italy. The AML image dataset was obtained from the Local Government Health Laboratory of South Kalimantan, Banjarmasin, Indonesia [17].
This study uses 31 blood smear samples for the ALL image data and 50 samples for the AML image data. Examples of blood smears from the ALL and AML image data can be seen in Figure 2.

Segmentation of white blood cells
The first stage in this study is the segmentation of white blood cells, which separates the white blood cell area from other components such as red blood cells and the background. The flow diagram of this stage can be seen in Figure 3. The input is a microscopic image of acute leukemia, which is transformed from the RGB color space to the HSV color space. Thresholding with the Otsu method then separates the cell objects (white blood cells and red blood cells) from the background. The result of the thresholding process is a binary image with only two components: cells and background. The binary image is used as a mask to obtain the Hue component of the red and white blood cell objects in the HSV color space. To determine whether a candidate cell object is a white blood cell (WBC), its Hue component is checked using histogram analysis. In this research, the threshold value T for the ALL image data differs from that for the AML image data because of differences in lighting and in the staining process. In the ALL images, if the Hue component is less than T (T = 0.83), the cell object is a WBC candidate; otherwise it is a red blood cell (RBC). In the AML images, if the Hue component is less than T, which ranges from 0.89 to 0.95, the cell object is a WBC; otherwise it is an RBC. The next process removes objects that are categorized as noise and objects that lie on the edge of the image. Noise objects are identified from their area and solidity: if the area of a cell object is less than A and its solidity is less than S, the object is removed.
Different values of A are used for the ALL and AML images because ALL and AML white blood cells differ in size. This study uses A = 2000 for ALL cells and A = 7000 for AML cells, since the ALL cells are smaller than the AML cells. For the solidity criterion, S = 0.67 is used for both ALL and AML images. Next, every cell located on the image edges is removed, since it is regarded as an incomplete cell object. The result of the WBC segmentation stage is a binary image with only two components: the WBC objects and the background.
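The two decision rules of this stage can be written out as a minimal sketch. The function names and the example colours are hypothetical. How the Hue of an object is summarised from its histogram is not specified above, so a single representative Hue value per object is assumed here. Python's `colorsys` returns Hue in the range [0, 1], which is assumed to match the scale of T.

```python
import colorsys


def is_wbc_candidate(hue, t):
    """Hue rule: a cell object with Hue below T is a WBC candidate."""
    return hue < t


def is_noise(area, solidity, min_area, min_solidity=0.67):
    """Noise rule: remove an object whose area is below A and whose solidity is below S."""
    return area < min_area and solidity < min_solidity


# Hypothetical RGB colours (0-255), not measured from the datasets.
purple_hue = colorsys.rgb_to_hsv(120 / 255, 60 / 255, 180 / 255)[0]
pink_hue = colorsys.rgb_to_hsv(200 / 255, 120 / 255, 150 / 255)[0]

T_ALL = 0.83   # Hue threshold reported for the ALL images
A_ALL = 2000   # area threshold reported for the ALL images

print(is_wbc_candidate(purple_hue, T_ALL), is_wbc_candidate(pink_hue, T_ALL))
print(is_noise(1500, 0.50, A_ALL), is_noise(1500, 0.90, A_ALL), is_noise(8000, 0.50, A_ALL))
```

Note that, as the rule is stated, a small object is kept when its solidity is at least S; only objects that fail both criteria are removed.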

Marker detection and calculation of white blood cell count
The second stage in this study is marker detection and white blood cell counting. The input of this stage is the binary white blood cell image produced by the segmentation stage. Previous studies generally count white blood cells (WBC) based on geometry, such as area, solidity, or eccentricity. However, geometry-based methods are limited when the area of a single cell is almost the same as that of an overlapping region. Another method used watershed-based computation for white blood cells, but the number of cells is still not accurate due to over-segmentation. In this research, the method used to detect cell markers is based on the Distance Transform (DT), because DT can be used to detect the number of objects in an image. Based on previous research, the DT method is very good at marking objects of the same size and shape, but for objects of varying sizes and shapes it results in under-segmentation or over-segmentation [15]. This study proposes the Iterative Distance Transform for Convex Sets (IDTCS) method, which repeats the DT process until no object is left and marks a cell if the size of its concavity is less than the threshold value (ρ). The concept is similar to the Ultimate Erosion for Convex Sets (UECS) method, which iterates the morphological erosion process, whereas IDTCS iterates the DT process. The advantage of this iterative technique is that touching or overlapping cells become progressively smaller and more separated, with concavity used to decide when an object is a single cell. DT is calculated based on the distance of each pixel to the nearest non-zero pixel using the Euclidean distance formula.
The concavity is calculated as in [14], [18] using Equation (1). The algorithm of the Iterative Distance Transform for Convex Sets (IDTCS) method is as follows:
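The dependence of a single-pass DT on its threshold can be seen in a minimal sketch on synthetic data. The helper names `disk` and `dt_marker_count` are hypothetical. SciPy's `distance_transform_edt` is used, which measures the Euclidean distance from each object pixel to the nearest background pixel; this convention is an assumption about how the DT is applied to the binary WBC image.

```python
import numpy as np
from scipy import ndimage as ndi


def disk(shape, center, radius):
    """Boolean image containing one filled disk."""
    rows, cols = np.ogrid[:shape[0], :shape[1]]
    return (rows - center[0]) ** 2 + (cols - center[1]) ** 2 <= radius ** 2


def dt_marker_count(mask, rho):
    """Count the objects left after thresholding the normalized DT at rho."""
    dist = ndi.distance_transform_edt(mask)   # distance to nearest background pixel
    markers = dist / dist.max() > rho         # normalize to [0, 1], then threshold
    return ndi.label(markers)[1]


# One large and one small object, well separated (synthetic data).
image = disk((40, 60), (20, 18), 12) | disk((40, 60), (20, 48), 4)

for rho in (0.2, 0.5):
    print(rho, dt_marker_count(image, rho))
```

With the larger threshold the small object disappears entirely, which is the kind of under-segmentation that a single global threshold can produce when object sizes vary.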

Iterative Distance Transform For Convex Set (IDTCS) Algorithm
Input: Binary silhouette image
Output: Object marker (M)
Parameters: distance transform threshold ρ1 and concavity threshold ρ2
1. Perform the smart filling and smoothing algorithm on the input image
2. Initialize ( )
3. Compute the distance transform of ( ) and normalize it to [0,1]
4. Create a new binary image by thresholding the image using ρ1
5. Compute the concavity of all objects
6. Mark an object if the size of its concavity is less than ρ2
7. Repeat steps 3 to 6 until no object is left

In the IDTCS algorithm, the first process is smart filling and smoothing, which closes the small holes in the cells that remain because the white blood cell segmentation is not perfect, and smooths the edges of the white blood cells. The next process calculates the distance transform and normalizes it. A binary image is then created from the distance transform result using the threshold ρ1. The concavity of all objects is calculated with Equation (1), and an object is marked if the size of its concavity is smaller than the threshold ρ2. Steps 3-6 are repeated until no object is left. In this study, the values of ρ1 and ρ2 used in the IDTCS algorithm are 0.2 and 0.15. The smart filling and smoothing algorithm used in the IDTCS algorithm is as follows:

Smart Filling and Smoothing Algorithm
Input: Binary silhouette image
Output: Binary silhouette image
1. Set = imfill( ,'holes');
2. Compute
3. Calculate the mean and standard deviation of the sizes of all objects
4. Delete all objects with size less than the sum of the mean and standard deviation
5. Compute
6. Compute = imtophat( ,strel("disk",10))
7. Compute

Steps 1-5 of the smart filling and smoothing algorithm form the filling process, which closes the holes inside white blood cells but does not close the holes between cells in overlapping white blood cell areas. Steps 6-7 form the smoothing process, which smooths the edges of the white blood cells using the morphological top-hat method.
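One possible reading of the filling and smoothing steps is sketched below with SciPy. This is an interpretation, not the authors' implementation: several steps of the listing are incomplete, so it is assumed that the "objects" of steps 3-4 are the holes, that only holes smaller than the mean plus one standard deviation are filled, and that subtracting the top-hat result is equivalent to a morphological opening. The names `smart_fill` and `smooth` are hypothetical.

```python
import numpy as np
from scipy import ndimage as ndi


def smart_fill(mask):
    """Fill small holes only; large holes (assumed gaps between cells) stay open."""
    mask = np.asarray(mask, dtype=bool)
    holes = ndi.binary_fill_holes(mask) & ~mask
    labels, count = ndi.label(holes)
    if count == 0:
        return mask
    sizes = np.bincount(labels.ravel())[1:]          # pixel count of each hole
    limit = sizes.mean() + sizes.std()               # assumed size criterion
    small_ids = np.flatnonzero(sizes < limit) + 1    # labels of the small holes
    return mask | np.isin(labels, small_ids)


def smooth(mask, radius=10):
    """Opening with a disk; equals the image minus its white top-hat."""
    rows, cols = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    disk = rows ** 2 + cols ** 2 <= radius ** 2
    return ndi.binary_opening(mask, structure=disk)


# Synthetic object with five 1-pixel holes and one 30-pixel hole.
cell = np.zeros((40, 40), dtype=bool)
cell[5:35, 5:35] = True
small_holes = [(8, 8), (8, 30), (30, 8), (30, 30), (12, 20)]
for r, c in small_holes:
    cell[r, c] = False
cell[18:23, 15:21] = False

filled = smart_fill(cell)
print(all(filled[r, c] for r, c in small_holes), filled[18:23, 15:21].any())
```

In this sketch a single isolated hole is never filled, because its size equals the mean and the standard deviation is zero; whether the original algorithm behaves the same way cannot be determined from the listing.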

RESULTS AND ANALYSIS

Experimental results of white blood cell segmentation on acute leukemia images
The first stage of this study was the segmentation of white blood cells (WBC) in microscopic images of Acute Lymphocytic Leukemia (ALL) and Acute Myeloid Leukemia (AML). The output of the first stage is a segmented image that separates the white blood cell objects from other objects. The results of each process in this stage can be seen in Figure 4 and Figure 5. Figure 4 shows the results of each process for an ALL image, while Figure 5 shows the results for an AML image. Figure 4(a) and Figure 5(a) show the acute leukemia images, and Figure 4(b) and Figure 5(b) present the images transformed to the Hue Saturation Value (HSV) color space. Figure 4(c) and Figure 5(c) show the results of the thresholding process that separates the blood cell objects from the background. The next process obtains the Hue values of the blood cell objects, and the results are shown in Figure 4(d) and Figure 5(d). After that, the WBC candidates are obtained using the criterion that the Hue value is less than the threshold (T), giving the results shown in Figure 4(d) and Figure 5(d). The last process of the WBC segmentation stage removes the noise (small objects) and the objects on the edges of the image, and the results are given in Figure 4(e) and Figure 5(e).

The experimental results of marker detection on WBC overlapping in acute leukemia images
The second stage in this study was marker detection and WBC counting on microscopic images of acute leukemia. This research proposes the Iterative Distance Transform for Convex Sets (IDTCS) method for WBC marker detection. The result of each iteration of the IDTCS method for marker detection on overlapping WBC can be seen in Figure 6, which shows that the marker of each cell in the overlapping WBC is detected properly. The method is named IDTCS because it repeats the DT process until no object is left, and a cell is marked if the size of its concavity is less than the threshold value. The results of the IDTCS iterations can be seen in Figure 6(e) to Figure 6(l). In this research, the Distance Transform threshold is ρ1 = 0.2 and the concavity threshold is ρ2 = 0.15.
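The iteration described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The concavity measure of Equation (1) is not reproduced here, so a stand-in is assumed: one minus the ratio of the object's pixel count to the area of the convex hull of its pixel centers, clipped at zero. It is also assumed that the DT is normalized by its global maximum and that marked objects are removed from further iterations. The function names and the `max_iter` guard are hypothetical.

```python
import numpy as np
from scipy import ndimage as ndi
from scipy.spatial import ConvexHull


def concavity(obj):
    """ASSUMED stand-in for Equation (1): 1 - pixel count / convex hull area."""
    rows, cols = np.nonzero(obj)
    # A single row or column of pixels has no 2-D hull; treat it as convex.
    if rows.min() == rows.max() or cols.min() == cols.max():
        return 0.0
    points = np.column_stack([rows, cols]).astype(float)
    hull_area = ConvexHull(points).volume   # for 2-D input, .volume is the area
    return max(0.0, 1.0 - rows.size / hull_area)


def idtcs_markers(mask, rho1=0.2, rho2=0.15, max_iter=100):
    """Iterate DT thresholding and mark objects whose concavity is below rho2."""
    work = np.asarray(mask, dtype=bool).copy()
    markers = np.zeros_like(work)
    for _ in range(max_iter):               # guard against shapes that never shrink
        if not work.any():
            break
        dist = ndi.distance_transform_edt(work)
        work = dist / dist.max() > rho1     # normalized DT thresholded at rho1
        labels, count = ndi.label(work)
        for k in range(1, count + 1):
            obj = labels == k
            if concavity(obj) < rho2:
                markers |= obj              # keep the object as a marker
                work &= ~obj                # and drop it from later iterations
    return markers


def disk(shape, center, radius):
    """Boolean image containing one filled disk (synthetic test data)."""
    rows, cols = np.ogrid[:shape[0], :shape[1]]
    return (rows - center[0]) ** 2 + (cols - center[1]) ** 2 <= radius ** 2


# Two synthetic cells of radius 10 that touch in a single pixel.
touching = disk((40, 60), (20, 20), 10) | disk((40, 60), (20, 40), 10)
markers = idtcs_markers(touching)
print("markers found:", ndi.label(markers)[1])
```

How well strongly overlapping cells are separated depends on the concavity measure, so results with this stand-in are not expected to reproduce those reported for the measure in [14], [18].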
In this study, the results of the proposed IDTCS method were compared with the Distance Transform (DT) and Ultimate Erosion for Convex Sets (UECS) methods. An example of WBC marker detection results on ALL images using the DT, UECS, and IDTCS methods can be seen in Figure 7. Examples of WBC marker detection results on AML images using the DT, UECS, and IDTCS methods can be seen in Figure 8.
The DT method is excellent for marking objects that are generally the same size. In Figure 7(b) of the ALL image, however, the DT method under-segments because the cells differ in shape and area. In the AML image dataset, the DT method is quite successful for marker detection because the WBC objects generally have the same size, as shown in Figure 8(b). The results of marker detection using the UECS method can be seen in Figure 7(c) and Figure 8(c), where over-segmentation is more frequent, that is, the number of markers is greater than the actual number of cells.
Using the IDTCS method, the WBC marker detection results are more accurate, even for overlapping cells with varying cell sizes, as in Figure 7(d) and Figure 8(d). This can also be seen from the summary of the marker or cell counts obtained with the DT, UECS, and IDTCS methods in Table 1 and Table 2. Table 1 compares the number of markers counted for each cell in the ALL images, whereas Table 2 gives the results for the AML images. In Table 2, the IDTCS method also has higher accuracy than the DT or UECS method. The IDTCS method detects a number of markers equal to the ground truth in 48 of the 50 images, while the DT method processes 42 images correctly and UECS processes 25 images correctly. For the AML image dataset, the test results show that no under-segmentation occurred using the IDTCS method. The test results show that the UECS method tends to over-segment in the WBC marker detection process, while the DT method tends to under-segment.
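The AML figures above translate directly into accuracies if accuracy is taken to be the fraction of images whose marker count equals the ground truth. That definition is an assumption, although it is consistent with the reported counts. The function name `accuracy` is hypothetical.

```python
def accuracy(correct_images, total_images):
    """Fraction of images whose marker count matches the ground truth."""
    return correct_images / total_images


# Correctly processed AML images out of 50, as reported in the text.
aml_correct = {"IDTCS": 48, "DT": 42, "UECS": 25}

for method, correct in aml_correct.items():
    print(f"{method}: {accuracy(correct, 50):.2f}")
# IDTCS: 0.96
# DT: 0.84
# UECS: 0.50
```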

CONCLUSION
In this paper, we propose an Iterative Distance Transform for Convex Sets (IDTCS) method for detecting cell markers on overlapping white blood cells (WBC). The IDTCS method repeats the DT process until no object is left, and a cell is marked if the size of its concavity is less than the threshold value (ρ). As a result, all the overlapping objects are marked as single objects. The method is named IDTCS because the Distance Transform is applied iteratively to obtain a marker for each object. In the tests on the two datasets of ALL and AML images, the IDTCS method has higher accuracy than the DT and UECS methods. The IDTCS method obtains an accuracy of 0.70 for the ALL images and 0.96 for the AML images, while UECS has an accuracy of 0.41 for the ALL images and 0.50 for the AML images, and DT obtains an accuracy of 0.29 for the ALL images and 0.84 for the AML images. Accurate automatic counting of white blood cells will support the doctor or pathologist in diagnosing the level of acute leukemia. Future research will estimate the contours of the WBC after separating the touching or overlapping cells, in order to obtain a more accurate area and shape for each white blood cell. An accurate WBC area and shape can improve the accuracy of the acute leukemia classification process.