Machine learning for real-time prediction of complications induced by flexible uretero-renoscopy with laser lithotripsy

ABSTRACT


INTRODUCTION
A surgical intervention has a number of risks, some of which are more likely than others depending on the circumstances. Surgical procedures such as cardiothoracic or pulmonary surgery, hepatobiliary surgery, abdominal surgery, prostate removal, and major osseous and articular surgery all carry higher risks. Further, depending on their health, certain patients are at higher risk of complications: malnutrition, diabetes, obesity, and a history of cardiac insufficiency are all factors that increase the surgical risk.
Like all operations, kidney stone surgery by uretero-renoscopy (URS) presents an increasing rate of infectious complications. These URS-related complications may increase the risk of mortality and morbidity, especially in high-risk patients. Prophylactic antibiotic use, restriction of stent dwell time and operation time, early detection and treatment of urinary tract infections (UTIs) and urosepsis, and careful planning in patients presenting with a substantial stone burden and numerous comorbidities are some examples of measures used to limit these complications. With improving technology, flexible minimally invasive URS with laser lithotripsy (FURSLL) has recently been introduced as a new technique to improve the outcome of ureterorenal stone surgery. Even for individuals with proximal kidney and ureter stones smaller than 20 mm, FURSLL has become an attractive option [1], [2]. For stones of this kind, stone-free FURSLL rates can be as high as 1% with an overall 28% risk of preoperative complications [3], [4]. Although flexible URS lithotripsy is a less invasive procedure, complications can still emerge [5]-[7].
The complications caused by URS are divided into two categories: minor and major. Complications are considered minor if they can be effectively treated without surgery. They include asymptomatic ureteral perforations, ileus, and fever. Major complications are those that necessitate surgery or present a life-threatening situation. Avulsions, tears, perforations during basketing, intussusception, and infection are rare but serious complications in this second category and are commonly associated with stone extraction [8]. Notably, in some cases necrosis of ureteral segments is observed following URS for stone removal [9]. Major complications of URS can have serious and lasting consequences. Open or laparoscopic surgery is almost always required for major complications, and the basic goal is to restore ureteral continuity [10].
Machine learning (ML) is attracting more and more interest as a method for statistically analyzing the complex and expanding body of medical data in order to enhance patient treatment on an individual basis. Integrating ML-based decision support tools into surgical practice can have many benefits. ML refers to algorithms that can be designed to evaluate and make predictions based on new and complex features [11]. A system with an ML model that incorporates the full range of patient data offers a way to stratify patients with nephrolithiasis in order to avoid potential complications after surgery. These systems should be used in a supportive, complementary manner, feeding data inputs into a multidisciplinary, multifunctional protocol for surgical risk analysis and reduction. The necessary steps of the decision-making process should be based on the predicted risk stratification data they offer. For renal surgery teams, an equivalent process and technology integration point has the potential to be advantageous for patient safety. Thus, the aim of this work is to propose an intelligent decision support system based on ML to predict complications in patients who undergo a surgical operation using a flexible URS.
The remainder of this article is organized as follows: section 2 presents related works. Section 3 describes the methodology used to predict the complications of lithotripsy by flexible URS. The simulation results are presented in section 4. Section 5 discusses the experimental results. Finally, conclusions and some recommendations for research perspectives are given in section 6.

RELATED WORKS
Artificial intelligence (AI) refers to the capacity of computers to accomplish tasks by imitating cognitive capacities [12]. ML is a subset of AI that allows algorithms to train models that learn from massive and complicated data and then produce valuable predictive outputs [13]. New patterns and associations can be identified by applying ML algorithms to new, massive data sets, and these results may have positive effects on clinical practice in medicine [14]. A large number of studies have examined the use of ML approaches in healthcare and show how they significantly increase healthcare quality and safety [15], [16]. Table 1 illustrates the latest applications of ML methods in predicting surgical outcomes.

Although still in their infancy, machine learning algorithms give surgeons an opportunity to capitalize on the abundance of clinical data and enhance personalized patient care. Although artificial intelligence is increasingly being utilized in healthcare, research still mostly focuses on cardiovascular, nervous system, and cancer problems, since they are the main causes of disability and death. Chronic and infectious diseases like diabetes, inflammatory bowel disease, and Clostridium difficile infections have also received some attention. By increasing the quality of the extraction of clinical data and feeding that data into a properly trained and verified system, early diagnosis can now be achieved for a variety of conditions. To the best of our knowledge, no paper so far has addressed the prediction of complications related to flexible uretero-renoscopy with laser lithotripsy. We initiated this work in order to fill this gap.

METHOD
Predicting the risks of postoperative complications is very important for physicians in order to identify patients who need intensive monitoring or additional interventions. Risk prediction also matters to patients and their families for decision-making about surgery. An intelligent system for estimating the risk of complications is therefore of paramount importance.
The methodology proposed in this study aims to implement a real-time prediction system for the risks of complications during flexible ureteroscopy with laser lithotripsy operations, using ML. The methodology adopted is described in Figure 1. The proposed system is developed using an initial dataset consisting of 5 classes [31]. These classes correspond to the five types of complications that can occur during a flexible ureteroscopy with laser lithotripsy operation according to the Clavien-Dindo scale. This classification scale makes it possible to classify these complications into 5 different classes, according to their severity and the interventions required to treat them, ranging from the absence of complications to the death of the patient [32]. Since this dataset is imbalanced and has some classes with only one observation, it cannot be used without being preprocessed.

Data preprocessing
During the data preprocessing phase, the dataset is reorganized into three main classes instead of 5, according to the Clavien-Dindo classification scale: class 0, which corresponds to no complications; class 1, which represents complications of degree 1; and class 2, which represents complications of degree 2. Classes 3 and 4 are ignored because they contain a very limited number of entries that cannot be augmented in the following step. In this phase, the quality of the data is also verified by eliminating redundant values and resolving missing values. Although we preprocessed the dataset as well as we could, the number of observations per class remains small; hence, the dataset needs augmentation as well as balancing of its classes.
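The preprocessing steps described above can be sketched as follows. This is only an illustration under stated assumptions: the data are assumed to sit in a pandas DataFrame, the label column name `clavien_grade` and the feature names are hypothetical, and median imputation is an assumed strategy, since the text does not state how missing values were resolved.

```python
import numpy as np
import pandas as pd


def preprocess(df: pd.DataFrame, label: str = "clavien_grade") -> pd.DataFrame:
    """Keep classes 0-2, drop duplicate rows, and impute missing numeric values."""
    out = df[df[label].isin([0, 1, 2])]   # classes 3 and 4 are too rare to keep
    out = out.drop_duplicates()            # remove redundant observations
    features = out.columns.drop(label)
    # Assumption: numeric features, missing values replaced by the column median.
    out = out.assign(**{c: out[c].fillna(out[c].median()) for c in features})
    return out.reset_index(drop=True)


# Toy data; the feature names are placeholders, not the study's variables.
toy = pd.DataFrame({
    "size": [8.0, 8.0, np.nan, 15.0, 22.0],
    "density": [900.0, 900.0, 1100.0, np.nan, 1300.0],
    "clavien_grade": [0, 0, 1, 2, 4],
})
clean = preprocess(toy)
```

On the toy data, the duplicated class 0 row and the single class 4 row are dropped, and the two missing values are filled from the remaining rows.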

Data augmentation
The dataset resulting from the preprocessing contains a small number of observations of classes 1 and 2 compared to the number of observations of class 0. As a consequence, a prediction system trained on it tends to assign a large number of test observations to class 0, which degrades the performance of our predictive system. To fix this problem, data augmentation is applied to balance the dataset by generating observations for classes with a very low number of observations. The data augmentation method adopted in this study is the synthetic minority over-sampling technique (SMOTE), which generates synthetic observations between each sample of a class with a low number of elements and its closest observations [33].
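The interpolation idea behind SMOTE can be illustrated with a simplified NumPy sketch. It is not the implementation used in the study: the function name `smote_sketch`, the toy minority samples, and the parameter values are assumptions made for illustration only.

```python
import numpy as np


def smote_sketch(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic rows by interpolating between minority samples
    and their k nearest minority-class neighbours (simplified SMOTE)."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)
    # Pairwise Euclidean distances within the minority class.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)            # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)     # sample to start from
    pick = rng.integers(0, k, size=n_new)     # which of its neighbours to use
    gap = rng.random((n_new, 1))              # interpolation factor in [0, 1)
    target = X_min[neighbours[base, pick]]
    return X_min[base] + gap * (target - X_min[base])


# Hypothetical minority-class samples with two features.
X_minority = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0], [2.5, 2.5]])
X_synth = smote_sketch(X_minority, n_new=6, k=2)
```

Each synthetic row lies on the segment between a real minority sample and one of its neighbours, so it stays inside the region already occupied by that class. In practice, a maintained implementation such as the `SMOTE` class of the imbalanced-learn package would normally be preferred over a hand-written version.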

Features selection
Once the dataset is balanced, a selection of the most relevant features is applied. This selection is performed using six different feature selection algorithms. The feature selection techniques used in this study are presented in Table 2. The most relevant features retained after the selection phase are then used by the ML models.
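As an illustration of one such technique, the sketch below runs forward sequential feature selection with scikit-learn on synthetic data. The synthetic dataset, the logistic regression estimator, and the number of retained features are all assumptions; the text does not state which implementation or settings were used.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the balanced dataset: 21 features, 3 classes.
X, y = make_classification(n_samples=300, n_features=21, n_informative=5,
                           n_redundant=2, n_classes=3, random_state=0)

selector = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=5,   # assumed value; the paper does not state it
    direction="forward",      # add one feature at a time
    cv=3,                     # each candidate is scored by cross-validation
)
selector.fit(X, y)
selected = selector.get_support(indices=True)   # indices of retained columns
X_reduced = selector.transform(X)
```

At each step the selector adds the feature that most improves the cross-validated score, so `X_reduced` keeps only the five retained columns.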

Proposed classification system
In order to build a more efficient classification system, several methods were compared. First, classification is performed on the entire data set. Then, only the features retained by the best feature selection algorithm among the techniques compared are used.
The second step of classification consists of estimating the risks of complications. The estimation is done with a voting-based predictive model, without and with hyper-parameter tuning. The aim is to make a final decision while improving the performance of the proposed system in estimating the complication risks. The search for hyper-parameters is carried out using the tuning function. This technique consists of training the system with different candidate values for each hyper-parameter and keeping the configuration that produced the best score during classification.
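The search described above, which tries several values per hyper-parameter and keeps the best-scoring configuration, can be sketched with scikit-learn's `GridSearchCV`. The choice of `GridSearchCV`, the random forest estimator, the grid values, and the synthetic data are assumptions for illustration; the text does not name the tuning function or the values searched.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic three-class data standing in for the study's dataset.
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)

# Hypothetical grid; the paper does not list the values it searched.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}

search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid,
                      scoring="accuracy", cv=3)
search.fit(X, y)
best_params = search.best_params_   # configuration with the best mean CV score
best_score = search.best_score_     # that mean cross-validated accuracy
```

Every combination in the grid is trained and scored by cross-validation, so the cost grows with the product of the number of values per hyper-parameter.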
For the voting ensemble model proposed in this study, two voting modes were compared: soft voting, based on the sum of the prediction probabilities of each class over all models of the proposed system, and hard voting, based on the majority label among the models of the proposed system [36]. The results obtained for the estimation of complication risks are compared in order to retain the method providing the best prediction performance. Finally, the variables are compared according to their impact on the classification; this ranking of the retained variables by importance is obtained using the SHAP value. The different classifiers used in the proposed system are listed in Table 3.
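The difference between the two voting modes can be made concrete with a small NumPy sketch for a single observation. The three models and their probability values are made up for illustration and do not come from the study.

```python
import numpy as np

# Predicted class probabilities for ONE patient from three hypothetical models
# (rows = models, columns = classes 0, 1, 2). Values are made up.
proba = np.array([
    [0.50, 0.45, 0.05],   # model A prefers class 0
    [0.40, 0.35, 0.25],   # model B prefers class 0
    [0.05, 0.90, 0.05],   # model C prefers class 1
])


def hard_vote(proba):
    """Majority label: each model votes for its most probable class."""
    labels = proba.argmax(axis=1)
    return int(np.bincount(labels, minlength=proba.shape[1]).argmax())


def soft_vote(proba):
    """Class with the largest summed probability over all models."""
    return int(proba.sum(axis=0).argmax())


hard_label = hard_vote(proba)   # two of three models vote for class 0
soft_label = soft_vote(proba)   # summed probabilities: 0.95, 1.70, 0.35
```

Here hard voting returns class 0 while soft voting returns class 1, because one confident model outweighs two hesitant ones once probabilities are summed. In scikit-learn, the same two modes are available through the `voting="hard"` and `voting="soft"` options of `VotingClassifier`.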

Performance metrics
Performance metrics are an integral part of the ML model evaluation process. They are very useful for measuring and comparing model performance. The metrics used in this work are:
− The accuracy, which represents the rate of correct class predictions. The accuracy is described by (1) [37].
− The recall, which represents the number of true positives compared to true positives and false negatives.
− The F1-score, which is a metric combining precision and recall with respect to a specific positive class; it is the harmonic mean of precision and recall. The F1-score is described by (4) [37].
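The equations referenced as (1) to (4) are not reproduced in this text. For reference, the standard definitions of these metrics are given below, assuming that each class is evaluated one-versus-rest with $TP$, $TN$, $FP$ and $FN$ denoting true positives, true negatives, false positives and false negatives. The numbering of (2) and (3) is an assumption, and the formulation in [37] may differ in notation.

```latex
\begin{align}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \tag{1}\\
\text{Precision} &= \frac{TP}{TP + FP} \tag{2}\\
\text{Recall}    &= \frac{TP}{TP + FN} \tag{3}\\
F_1 &= 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4}
\end{align}
```

For example, with $TP = 8$, $FP = 2$ and $FN = 4$, the precision is $8/10 = 0.8$, the recall is $8/12 \approx 0.667$ and the F1-score is $2 \cdot 0.8 \cdot 0.667 / (0.8 + 0.667) \approx 0.727$.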

Dataset description
In order to predict the potential postoperative complications of flexible uretero-renoscopy with laser lithotripsy, the dataset used in this study is composed of 682 observations with 21 features, divided into three classes: class 0 represents the case of no complications, class 1 grade 1 complications, and class 2 grade 2 complications on the Clavien-Dindo scale. The distribution of these classes is shown in Table 4.
We observe that the dataset is unbalanced: class 0 is the majority class with 90% of the observations, compared to 5% for each of the two other classes, 1 and 2. The dataset was balanced using the SMOTE augmentation method. Figure 2 represents the data from the original dataset and the data balanced by class.

Technical description of the computer used
The experiments were executed on a computer equipped with one processor: an Intel® Core™ i7-8550U with a frequency of 1792 MHz. This processor has 2 cores and 2 threads. The computer is equipped with 8 GB of DDR3 RAM and no GPU.

Simulation results using all dataset features
Table 5 represents the results of the different predictive models applied to the dataset. The results are ranked by accuracy. The models generate different results, with accuracies varying between 34% and 96% for the classification of the different observations.

Results with sequential feature selector method
The previously compared algorithms were reused, this time on the features retained by the sequential feature selector (SFS) method. The SFS method was chosen because it gave better classification performance than the other feature selection techniques. Table 6 represents the results obtained in decreasing order of classification accuracy. We can observe that the accuracy range across models changes from between 34% and 96% with all features to between 72% and 89.6% with SFS: the weakest models improve substantially, while the best accuracy is lower than with all features.

Results using voting ensemble and SFS method without models hyperparameter tuning
In this section, we use hard and soft voting-based classification without hyper-parameter tuning of the models. This classification is performed on the features selected by the SFS method. The results of this experiment are summarized in Table 7. To better compare results, the performance of the voting-based ensemble prediction system (hard and soft), during training and testing on the whole dataset, is presented in Figure 3.

Table 8 represents the results obtained with hyper-parameter tuning. Figure 4 shows, as an example, the prediction performance, in terms of accuracy in estimating postoperative complication risks, obtained with both hard and soft voting over 10 epochs by the ensemble predictive model using SFS feature selection.

DISCUSSION
Based on the results obtained with all dataset features (Table 5), the random forest (RF), extra trees classifier (ETC), gradient boosting classifier, XGBClassifier and BaggingClassifier algorithms constitute the top five classifiers. These algorithms estimate the risk of complications with an accuracy of 96.45%, 96.29%, 94.82%, 93.83%, and 93.51% respectively. The result generated by RF is slightly better, with a standard deviation of 0.011473 compared to 0.013103 for ETC. The other algorithms are less efficient, with a classification accuracy varying between 34.26% and 89.10%. The PC classifier is the least efficient among all the models used. Figure 5 represents a comparison of the performance of the different classifiers on the whole dataset.
In terms of execution time, the RF and ETC algorithms provide estimations in 2.351014 s and 1.735743 s respectively. The most time-consuming algorithm is the GPC, with 36.914091 s and an accuracy that does not exceed 47%.

The SFS method was selected as the feature selection technique. Analyzing the results in Table 6, generated by the different algorithms proposed in this study using the SFS method, the ABC, GPC, LRC, RC, BNBC and LDAC algorithms allow a prediction with an accuracy of around 90% and an execution time varying between 0.04 s and 2.4 s. The PC algorithm is the least-performing classifier, with a classification accuracy of about 70%. The application of the SFS variable selection method has significantly improved the accuracy of the lowest-performing models, by 36 percentage points or more: the accuracy of the PC classifier increased from 34% to 70% and that of the GPC from 48% to 89.6%, hence the interest of using the most important features. Figure 6 represents a comparison of the classification results of the different models used in this section.

For the voting ensemble without hyper-parameter tuning (Table 7), the results show that the use of the variables selected by the SFS method allows an estimation with the highest accuracy. The accuracy in this case was 99.35% for training and 92.33% for testing with the hard vote, and 99.95% for training and 94.38% for testing with the soft vote. The execution of the same proposed model, but with hyper-parameter tuning, provides the results shown in Table 8 for the different feature selection methods. Hyper-parameter tuning significantly increases the estimation performance of the proposed model, and the SFS method is again the best feature selection method in this case. The accuracy is 100% for training and 94.33% for testing in hard voting, and 100% for training and 95.38% for testing in soft voting. The soft vote generates a better prediction in both cases, without and with hyper-parameter tuning.

The impact of the different variables on the performance of the predictive model during classification is studied by calculating the SHAP value for the different variables. Figure 7 represents the SHAP value of the selected features. The most important feature is nephretic_colic: high values of this variable have a significant negative impact on the estimation of the classification system, while low values have a positive impact on the performance of the system. The other variables that have a significant impact on the prediction are size and density_TDM. Figure 8 presents the impact of the different features on the different classes of the dataset. It shows that the variables nephretic_colic, size and density_TDM have a significant impact on all classes of the dataset. The three classes of the dataset use the most relevant variables almost equally, except that the size variable is used more by classes 1 and 2.
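To illustrate what a SHAP value measures, the sketch below computes exact Shapley values by brute-force enumeration for a toy three-feature function. It does not use the shap library or the study's model: the function `toy_risk`, the input `x` and the all-zero baseline are hypothetical, and absent features are simply replaced by their baseline value.

```python
from itertools import combinations
from math import factorial

import numpy as np


def exact_shapley(f, x, baseline):
    """Exact Shapley values of f at x; absent features take baseline values."""
    n = len(x)
    phi = np.zeros(n)

    def value(subset):
        z = np.array(baseline, dtype=float)
        for j in subset:              # features in the coalition keep x's value
            z[j] = x[j]
        return f(z)

    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight of a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for S in combinations(others, size):
                phi[i] += weight * (value(S + (i,)) - value(S))
    return phi


def toy_risk(z):
    """Hypothetical score: one additive term plus one interaction term."""
    return 2.0 * z[0] + z[1] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = exact_shapley(toy_risk, x, baseline)   # contributions 2, 3 and 3
```

The contributions sum to the difference between the score at `x` and the score at the baseline (8 here), and the interaction term is split equally between the two features involved. Exact enumeration grows exponentially with the number of features, which is why libraries such as shap rely on model-specific or sampling-based approximations.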

CONCLUSION
Despite their power, AI systems cannot replace medical and surgical teams, but they can be used in certain cases to help these teams make better decisions. For example, these systems can be used to predict the likelihood of complications during surgery. Kidney surgery is a high-risk field that seeks to reduce problems and enhance surgical methods and results. The contemporary surgeon operates in a technology-rich setting and has access to a wealth of data input before, during, and after surgery. ML has the potential to be employed in more creative ways to enhance patient safety during kidney surgery and raise the likelihood of successful outcomes.
The use of ML in urologic surgery has been highly heterogeneous, and this, together with the preponderance of single studies, shows that there is significant potential for further refining the studies in this field. The ensemble approach proposed in this work appeared to be successful in predicting postoperative complications in urological surgery, predicting a variety of surgical results with precision. Ongoing use of ML through well-developed clinical decision support systems is likely to reduce complication rates and improve the safety and quality of surgery. Our future work consists of developing operational decision support software for surgeons that incorporates these ML models.

− The precision, which represents the number of true positives compared to true positives and false positives.

Figure 3. Algorithms comparison for all balanced datasets using votes without hyper-parameter search

Figure 4. Algorithms comparison for SFS feature selection using votes with hyper-parameter search

Figure 5. Algorithms comparison using all dataset
Figure 6. Algorithms comparison using SFS method

Figure 7. SHAP value of the features selected by the SFS technique during the classification

Figure 8. SHAP value of the features selected by the SFS technique by class

Table 1. Latest applications of ML methods in predicting surgical outcomes
Machine learning for real-time prediction of complications induced by flexible uretero … (Chafik Baidada)

Table 4. Description of the dataset

Table 5. Obtained results using the whole dataset sorted by performance

Table 6. Results obtained by applying the SFS method of features selection

Table 7. Results by voting without hyper-parameters tuning and applying the SFS features selection algorithm

Table 8. Results by voting with hyper-parameter fine-tuning and using SFS algorithm for feature selection