Verification and comparison of MIT-BIH arrhythmia database based on number of beats

ABSTRACT


INTRODUCTION
The electrocardiogram (ECG) is widely used to observe cardiac physiology because it is cost-effective and non-invasive. The ECG signal shows how the heart is functioning, which allows the cardiologist to diagnose cardiac diseases. The ECG field has developed significantly, given that cardiovascular diseases are the most common cause of death [1]. Many applications are based on the ECG signal, such as heart-rate measurement, biometric identification, movement recognition, and abnormality diagnosis [2].
The MIT-BIH arrhythmia database was the first generally available set of standard ECG material for testing and performance evaluation [3]. Together with the American Heart Association (AHA) database, it played an interesting role in stimulating manufacturers of arrhythmia analyzers to compete on the basis of objectively measurable performance. The recognized value of common databases for basic research and for medical device development and evaluation is attributed to the MIT-BIH arrhythmia database. The database comprises ECG signals with varying noise, artifacts, beat types, and wave shapes. It includes 48 records, each with two ECG channels and an annotation file. The signals were recorded from 25 men and 22 women, each for a half-hour period at 360 samples per second. The database carries 112,647 annotations, and these annotations have been verified [3]. The annotations fall into two main categories: beats and non-beats. The beat annotations in the MIT-BIH arrhythmia database consist of 15 subtypes, and the non-beat annotations consist of 24 subtypes, as shown in Table 1. To date, more than two thousand works have cited the MIT-BIH arrhythmia database. It is unique for arrhythmia classification because it offers five standard arrhythmia groups [2]. QRS detection is essential to most of the citing works, including arrhythmia detection, classification, and diagnosis applications. Many QRS detection algorithms have been developed, tested, and evaluated on this database. These algorithms rely on the beat annotations of the database signals for testing and evaluation, and learning-based methods also use these beats as training data.
Many researchers implement their algorithms in MATLAB using the waveform database (WFDB) Toolbox [4]. This toolbox provides functions for reading, writing, and processing the signal files of the PhysioNet databases. The MIT-BIH arrhythmia database is one of the PhysioNet databases and contains data and annotation files. The WFDB Toolbox is used to extract the ECG signals and their annotations from the MIT-BIH arrhythmia database for all records. It can extract a single type of beat or non-beat annotation, or all annotations without any filter. It is therefore not easy to extract only the beat annotations, which leads to errors caused by reading non-beat annotations. When reviewing existing methods that use the MIT-BIH arrhythmia database, we noted that they do not all consider the same number of beats for the same database records. This difference affects, even if slightly, the evaluation results used to compare the performance of the methods. This work studies the reasons for the different beat counts and compares the methods, with correction and verification. Furthermore, a new function based on the WFDB Toolbox for MATLAB is designed to extract the correct beats and remove the non-beat annotations from the original database files.
Section 2 describes the MIT-BIH arrhythmia database and its annotation types in detail. Section 3 presents the proposed function that extracts the correct beats from the annotation files. Section 4 presents the results and discussion of the review of existing methods, with a comparison based on each method's number of beats. Finally, section 5 summarizes the conclusions.

MIT-BIH ARRHYTHMIA DATABASE
The MIT-BIH arrhythmia database is one of the most substantial ECG databases. Its varied signals, noise, and artifacts make it suitable for testing and evaluation. Moreover, it provides verified annotation files that contain beat and non-beat types, as shown in Table 2 and Table 3. These tables list the MIT-BIH arrhythmia database annotations for each record, based on the PhysioNet annotation descriptions for beat and non-beat annotations. PhysioNet defines further annotation types, which appear in other databases. Each beat annotation marks a QRS complex, either a normal beat or another beat type. The non-beat annotations, on the other hand, are the ventricular flutter wave, the start and end of ventricular flutter, and the start of rhythms such as sinus, paced, ventricular, supraventricular, atrial fibrillation, atrial flutter, and heart block. These annotations mark the point in the ECG signal at which a rhythm starts, so they are not beat (QRS) annotations. Many QRS-detection methods exclude the ventricular flutter (record 207) because it appears on the ECG as a sinusoidal wave without a clearly visible QRS complex or T wave. QRS detection methods based on the MIT-BIH arrhythmia database use only the beat annotations for testing, evaluation, and learning, because the non-beat annotations do not mark QRS waves. Table 2 gives the number of beat annotations as 109,494 beats for all 48 records. This number should be the standard number of beats, since it follows from the original database annotation details and the PhysioNet annotation descriptions for beat and non-beat types. The 472 ventricular flutter waves in record no. 207 are likewise excluded from QRS detection, because PhysioNet's annotation description classifies these waves as non-beat annotations, as shown in Table 3.
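The counts quoted in the text allow a simple consistency check. The sketch below only does arithmetic on figures stated in this paper (112,647 annotations, 109,494 beats, 472 flutter waves in record 207); the variable names are illustrative, and the derived values are implied by those figures rather than reported in the tables.

```python
# Counts quoted in the text for the MIT-BIH arrhythmia database.
TOTAL_ANNOTATIONS = 112_647   # all annotations, beats and non-beats
BEAT_ANNOTATIONS = 109_494    # beat annotations over all 48 records
FLUTTER_WAVES_207 = 472       # ventricular flutter waves in record 207 (non-beats)

# Non-beat annotations are what remains after removing the beats.
non_beat_annotations = TOTAL_ANNOTATIONS - BEAT_ANNOTATIONS

# A method that wrongly counts the flutter waves as beats would report this total.
inflated_total = BEAT_ANNOTATIONS + FLUTTER_WAVES_207

print(non_beat_annotations)  # 3153
print(inflated_total)        # 109966
```

A reported total of 109,966 beats would therefore suggest that the flutter waves of record 207 were counted as beats, although other combinations of errors could produce the same number.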

HEARTBEATS FILTER FUNCTION
In this paper, a MATLAB function is designed to filter the annotation file of any PhysioNet database, including the MIT-BIH arrhythmia database. The function removes the non-beat annotations listed in Table 3, so that the annotation file contains only the beat annotations listed in Table 2. The existing MATLAB-WFDB function for reading annotation files (rdann), in contrast, can read either all annotations or a single annotation type. Because rdann cannot filter annotations by beat or non-beat category, the new function is proposed to filter the data correctly. The function is simple, but it is important for standardizing the number of beats for any researcher who uses PhysioNet databases. It can be added to the MATLAB-WFDB toolbox to filter annotation files simply and precisely by removing the non-beat annotations, yielding the standard values. The function reads and searches all annotation data for each record, as shown in Figure 1. If an annotation is one of the non-beat types, it is removed from the annotation data. The function can also be used with any PhysioNet database to extract the beat annotations by removing the non-beat annotations, which prepares the data for many applications, including QRS-detection methods.

Figure 1. Flowchart of the heartbeats filter function: start; read the annotation (Ann); if Ann is a non-beat type (Table 3), delete Ann.
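The filtering step described in this section can be sketched in a few lines. This is a minimal Python illustration of the idea, not the authors' MATLAB function. It assumes the annotations are available as two parallel sequences (sample indices and one-character symbols), and the set of non-beat symbols follows the PhysioNet annotation code list as we understand it, so it should be checked against Table 3. The name `filter_beats` and the toy data are hypothetical.

```python
# Non-beat annotation symbols, following the PhysioNet annotation code list.
# Assumption: verify this set against Table 3 before relying on it.
NON_BEAT_SYMBOLS = frozenset({
    "[", "!", "]", "x", "(", ")", "p", "t", "u", "`", "'", "^",
    "|", "~", "+", "s", "T", "*", "D", "=", '"', "@",
})


def filter_beats(samples, symbols):
    """Return (samples, symbols) with every non-beat annotation removed."""
    kept = [(s, sym) for s, sym in zip(samples, symbols)
            if sym not in NON_BEAT_SYMBOLS]
    beat_samples = [s for s, _ in kept]
    beat_symbols = [sym for _, sym in kept]
    return beat_samples, beat_symbols


# Toy annotation stream (hypothetical values, not taken from a real record):
# a rhythm-change marker, two normal beats, a flutter wave, one premature
# ventricular contraction, and a signal-quality marker.
samples = [18, 77, 370, 662, 946, 1231]
symbols = ["+", "N", "N", "!", "V", "~"]

beat_samples, beat_symbols = filter_beats(samples, symbols)
print(beat_samples)   # [77, 370, 946]
print(beat_symbols)   # ['N', 'N', 'V']
```

With the `wfdb` Python package, the object returned by `wfdb.rdann` exposes `sample` and `symbol` attributes that could be passed to such a function. A deny-list is used here because it mirrors the description above (delete an annotation if it is a non-beat type); an allow-list of beat symbols would be an equally reasonable design.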

THE COMPARISON AND VERIFICATION RESULTS WITH DISCUSSION
This work focuses on the verification and comparison of the MIT-BIH arrhythmia database as used by QRS-detection algorithms. The proposed heartbeats filter function can be applied to all MIT-BIH databases from the PhysioNet site. The reviewed QRS-detection methods do not use the same number of heartbeats for the MIT-BIH arrhythmia database, even for the same records, although this number should be standard because it follows from the beat count of the original database. The review of existing QRS-detection methods that use the MIT-BIH arrhythmia database summarizes the errors of these methods in terms of beats per record, as shown in Table 4 (see Appendix). In this table, incorrect records are indicated in bold, along with the Total (T) and Errors (E). The methods should all use the same number of beats, but researchers introduce errors. All the reviewed methods are revised, compared, and verified based on the number of beats for each database record. Table 5 summarizes the total number of beats, the total error per record, and the total error per database for the different methods, in order to quantify how incorrect these methods are.
The total number of beats used by each reviewed method for the MIT-BIH arrhythmia database is calculated; this number should be 109,494 heartbeats for all database records, as shown in section 2. The beat errors of these methods relative to the correct number of beats are determined in order to find how many methods used the correct value. The remaining methods contain errors ranging from 1 beat to 1400 beats over the whole database. Table 5 shows, for each error value, the percentage of the reviewed references that exhibit it. It also shows, for each reference, the total error per record (the sum of the absolute values of the per-record errors) and the total error over the whole database, which may be positive or negative.
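The two error totals described above differ in whether per-record errors can cancel each other. The following sketch illustrates the distinction; the record labels and beat counts are hypothetical placeholders, not values from Table 4 or Table 5.

```python
# Hypothetical reference and reported beat counts for three records.
# These numbers are placeholders for illustration, not values from Table 4.
reference = {"A": 2000, "B": 1500, "C": 1800}
reported = {"A": 2003, "B": 1498, "C": 1800}

# Signed error per record: positive means extra beats, negative means missing beats.
errors = {rec: reported[rec] - reference[rec] for rec in reference}

# Total error per record: sum of absolute per-record errors (errors cannot cancel).
total_error_per_record = sum(abs(e) for e in errors.values())

# Total error per database: difference of the overall totals (signed, errors can cancel).
total_error_per_database = sum(reported.values()) - sum(reference.values())

print(errors)                    # {'A': 3, 'B': -2, 'C': 0}
print(total_error_per_record)    # 5
print(total_error_per_database)  # 1
```

In this toy case the database total is off by only one beat even though five beats are wrong at the record level, which is why both totals are worth reporting.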
The beat errors over all records reach up to 1400 beats. Only 29% of the reviewed methods use the correct number of beats, whereas 71% use an incorrect number, so the incorrect methods outnumber the correct ones in our comparison. This motivates the present study. Each record in the database has been examined for beat errors across the reviewed methods. Table 5 and Figure 2 show the differences between these methods for the same records of the same database. From the results, the following points are established:
a. The correct number of beats is 109,494, without adding or removing any data.
b. The designed function extracts the correct number of heartbeats for all records of the MIT-BIH arrhythmia database.
c. If the number of beats exceeds the correct number:
− Some non-beat annotations have been added, which should be mentioned in the method.
− Data have been repeated for a record, which should be mentioned in the method.
d. If the number of beats is less than the correct number:
− Some beat annotations have been removed, which should be mentioned in the method.
e. The database contained some errors before digitization and verification [1].
f. The WFDB toolbox does not include a beat or non-beat filter for the rdann function that reads the annotation files.
g. Researchers copy per-record beat numbers from one another without verification.
h. The high number of annotation types (39 annotations) confuses researchers.
i. According to Figure 2, most errors occur in record no. 207, because many researchers count its 472 ventricular flutter waves, although PhysioNet's annotation description classifies these waves as non-beat annotations.
j. In Figure 2, record no. 209 has the second most and record no. 214 the third most beat errors among the reviewed methods, but the errors are small, not exceeding eight and nine beats, respectively.
k. According to Figure 2, the records with the lowest errors are 102, 103, 112, 117, 119, 122, 123, and 230, because these records contain the fewest non-beat annotations.
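To illustrate why the flutter waves of record 207 matter for evaluation, the sketch below assumes the commonly used sensitivity measure Se = TP / (TP + FN), where TP + FN equals the number of reference beats. The measure, the detector result, and the record size of 2,000 beats are assumptions made for illustration; only the figure of 472 flutter waves comes from the text.

```python
def sensitivity(true_positives, reference_beats):
    """Se = TP / (TP + FN), with TP + FN equal to the reference beat count."""
    return true_positives / reference_beats


# Hypothetical record and detector result (illustrative values only).
true_beats = 2000        # assumed number of beat annotations in the record
detected_true = 1990     # assumed number of those beats the detector finds
flutter_waves = 472      # non-beat flutter annotations quoted for record 207

# Correct reference: beat annotations only.
se_correct = sensitivity(detected_true, true_beats)

# Inflated reference: flutter waves wrongly counted as beats, so each one
# appears as a missed beat (false negative) for the detector.
se_inflated = sensitivity(detected_true, true_beats + flutter_waves)

print(f"{100 * se_correct:.2f}")   # 99.50
print(f"{100 * se_inflated:.2f}")  # 80.50
```

Under these assumptions, the same detector output is scored very differently depending on which annotations are treated as reference beats, so results computed against different beat counts are not directly comparable.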

CONCLUSION
This paper presented a method for determining the correct number of beats in the MIT-BIH arrhythmia database, together with a comparison study and a MATLAB function that extracts the correct values for any PhysioNet database. In this way, the number of beats used by researchers can be standardized. Non-beat annotations affect the results of QRS-detection methods in two ways. First, the evaluation accuracy of a proposed method is not calculated correctly because the number of database beats is incorrect. Second, methods based on machine learning are trained on incorrect information, so the learning is flawed and the results are not correct. Most reviewed methods used an incorrect number of beats: 29% used the correct number and 71% used an incorrect number. The proposed function should be added to the MATLAB-WFDB Toolbox to filter the annotation files, correctly removing the non-beat annotations and extracting the standard beat values. The same approach can be implemented in any other programming language, such as Python, to read the annotation files of the PhysioNet databases.

Figure 2. Errors per record for the reviewed methods (vertical axis from 0 to 40; the records on the horizontal axis begin with 207, 209, 214, 104, and 210).