An algorithm for obtaining the frequency and the times of respiratory phases from nasal and oral acoustic signals

This work proposes a computational algorithm that extracts the frequency, timings, and signal segments corresponding to the respiratory phases, through buccal and nasal acoustic signal processing. The proposal offers a computational solution for medical applications that require on-site or remote patient monitoring and evaluation of pulmonary pathologies, such as coronavirus disease 2019 (COVID-19). The state of the art presents few respiratory evaluation proposals based on buccal and nasal acoustic signals; most focus on respiratory signals acquired by a medical professional using stethoscopes and electrodes located on the thorax. In this work, the signal acquisition process is carried out through a low-cost, easy-to-use mask equipped with strategically positioned and connected electret microphones, to maximize the proposed algorithm's performance. The algorithm employs signal processing techniques such as signal envelope detection, decimation, the fast Fourier transform (FFT), and detection of peaks and time intervals via estimation of local maxima and minima in the signal's envelope. For the validation process, a database of 32 signals with different respiratory modes and frequencies was used. Results show a maximum average error of 2.23% for the breathing rate, 2.81% for the expiration time, and 3.47% for the inspiration time.


INTRODUCTION
Most diseases related to an obstructed or restricted respiratory system can be characterized by the sounds generated during breathing. These include asthma and chronic obstructive pulmonary disease (COPD), among others [1]. Airway abnormalities can cause abnormal breathing sounds, such as the presence of unusual or added sounds, known as continuous adventitious sounds (CAS) [2]. Moreover, most analysis procedures for these ailments require auscultation and depend on the skill and experience of the medical professional [3]. Therefore, there has been a significant effort to research the acoustics of respiration and develop techniques that aid in the diagnosis of respiratory ailments. An expert in auscultation needs a great deal of experience to be able to classify the types of sounds and decide how this information can help in diagnosis or monitoring [4]. Symptoms might be missed, and their severity underestimated by both patients and physicians [5], resulting in proper care not being given. Besides, many evaluation procedures are done on-site with a stethoscope. Other devices, such as spirometers, can measure respiratory flow and rate, and thorax monitoring equipment with sensors and electrodes can analyze sound signals and waveforms to detect a possible pathology. Nonetheless, these evaluations need to be done on-site and with the aid of a medical professional, because it is complicated for a patient to have the required equipment and to acquire the signals accurately. In this context, the proposed algorithm aims to be part of a simple and low-cost product that can evaluate a patient either remotely or on-site, regarding the breathing rate and the timings of the respiratory phases.
This will enable the estimation of the inspired/expired air volume in each respiratory phase, the segmentation of waveforms, and the analysis of possible pathologies. Thus, this work develops a computational solution for medical applications that require the monitoring, evaluation, and on-site or remote tracking of patients with pulmonary ailments such as coronavirus disease 2019 (COVID-19). The objective is to build an easy-to-use application for mobile, desktop, web server, or small single-board-computer-based systems. It is important to note that, in order to maximize the algorithm's performance, the acoustic signals must be acquired via the low-cost mask equipped with strategically located and connected microphones, which is also described in this work.
The scientific literature presents proposals that aim to solve the described problem. For example, Nam et al. [6] proposed a method which detects air flow with a microphone located below the nose, to estimate the breathing rate by averaging the power of the nasal respiratory signals in each phase. The method uses two spectral analysis techniques on the sound envelope: the Welch periodogram and the autoregressive spectrum. Results show the method is highly precise for the considered breathing rate range (6-90 respirations/min), with a low average error of less than 1%. Nonetheless, the method struggles to detect low frequency components.
Li et al. [7] proposed an algorithm to estimate frequency parameters and characteristics for wheezing detection. It uses Fisher's linear discriminant analysis to separate wheezing sounds from regular respiratory sounds, resulting in a sensitivity and positive predictive rate of 91.51% and 100%, respectively. The algorithm only uses short-term data, or a few respiratory sound samples, so its computational complexity is low.
On the other hand, Yahya and Faezipour [8] used a support vector machine (SVM) to automatically detect and distinguish between the inspiration and expiration phases without air flow measurements. The respiratory signal is processed via a voiced-dull algorithm to distinguish between a voiced period and a dull period (silence). The features of interest are extracted from each voiced phase and used to train the SVM, achieving a classification precision of 95%. Nonetheless, this algorithm only works for deep breathing.
Nam et al. [9] presented an algorithm to estimate the resting breathing rate directly from the light intensity fluctuations of a pulsed light source captured by a smartphone camera, achieving a precision of 95%. However, the precision falls when the respiration rate exceeds 30 respirations/min. Meanwhile, Chatterjee et al. [10] developed an algorithm to detect respiratory phases from audio data. They transform the signals into spectral-temporal images and train a wheezing detection model based on convolutional neural networks (CNN), achieving a precision of 96.99%, a specificity of 97.96% and a sensitivity of 96.08%. Nonetheless, the training dataset is limited, which restricts the model to only two convolutional layers.
Avalur [11] presented a respiration detection and analysis algorithm which classifies a person's respiration as soft, light, or hard, and estimates the breathing rate with a precision of 94.68%. Its main limitations are a high sensitivity to background noise and a reduced performance under high amplitude errors. Furthermore, algorithms for feature extraction and analysis of breath sounds in the temporal, spectral and time-frequency domains are proposed in [12], [13]. Nonetheless, Díaz et al. [12] did not report any sensitivity or precision results, and in [13] the algorithm detects respiration peaks with a precision of 94.5%.
Javed et al. [14] presented ResCSRF, which takes four signals (nasal flow, thorax, abdomen and finger oxygen saturation) as input. It first detects Cheyne-Stokes respiration (CSR) cycles and then calculates the respiratory features (cycle length, lung-to-periphery circulation time and time to peak flow). It outputs nightly statistics (mean, median, standard deviation and percentiles) of these features. It was developed and blindly tested on a group of 49 chronic heart failure patients undergoing overnight, in-home, unattended respiratory polygraphy recordings.
Islam and Lubecke [15] investigated the feasibility of using the independent component analysis with joint approximation diagonalization of eigen-matrices (ICA-JADE) algorithm, together with a 24 GHz phase-comparison monopulse radar transceiver, for separating respiratory signatures from combined mixtures of varied breathing patterns. Meanwhile, Sun et al. [16] proposed an adaptive boosting (AdaBoost) method based on a multi-layer perceptron neural network (MLP-NN) to predict respiratory signal accuracy, and their experimental results demonstrated that the method could do so adequately.
The algorithm proposed in [17] is based on the analysis of respiratory rate variability (RRV) to detect difficulties in falling asleep. Moreover, a method to provide a quality level for the respiratory signal is also proposed; both methods are combined to reduce the false alarms caused by changes in the measured RRV associated not with drowsiness but with body movements. The method proposed in [18] adopts several vital parameters to quantify respiratory patterns and updates all the baselines dynamically, while Saatci and Saatci [19] used the Hurst exponent to reveal the fractal properties of respiratory signals and respiratory sound signals and to estimate the pressures in the respiratory system, applying a combination of well-known statistical signal processing and optimization methods to 23 experimentally acquired records. Oletic and Bilas [20] proposed quantifying wheezing with a sensor system consisting of a wearable wireless acoustic sensor and a smartphone performing respiratory sound classification, which may contribute to the diagnosis, long-term control and lowering of the treatment costs of asthma.
On the other hand, the aim of Paraschiv and Rotaru's research [21] was to review related work in this field and to propose a method for classifying the International Conference on Biomedical and Health Informatics 2017 (ICBHI'17) scientific challenge respiratory sound database. The method extracts features using Mel-frequency cepstral coefficients (MFCC) and applies a CNN to classify the database. The results reveal that the proposed method achieves an accuracy of 90.21%, making it suitable for quickly classifying respiratory sounds collected from different devices.
Wang et al. [22] proposed a system that may contribute to the development of low-cost, non-contact respiratory monitoring products for home or workplace health care. To achieve low-cost, remote measurement of the respiratory signal, a red-green-blue (RGB) camera combined with marker tracking is used as the data acquisition sensor, and a Raspberry Pi is used as the data processing platform. To overcome the challenges of real applications, the signal processing algorithms are designed to remove sudden body movements and to smooth the raw signal.
Bokov et al. [23] used an SVM classifier to perform wheezing detection. The signals were obtained with a single microphone (SP0410HR5H-PB), which recorded mouth breath sounds. A total of 95 recordings were collected, 27 of them containing wheezes. Seventy recordings (20 with wheezes) were used to train the SVM classifier, while the rest were used to test it. Spectral features were extracted from each frame of the segmented recordings. Using this method, 71.4% sensitivity and 88.9% specificity were achieved on the validation set at the recording level.
Finally, Sierra et al. [24] developed a non-invasive method for continuously monitoring respiratory rate (RR) based on tracheal sounds. Tracheal sounds were acquired using a contact piezoelectric sensor placed on the examinee's throat and analyzed using a combined investigation of the sound envelope and frequency content. RR estimates were compared to reference measurements taken from a pneumotachometer coupled to a face mask worn by the examinee. RR was also manually counted by a respiratory technician. Two types of breathing (mouth and nose) and three different positions were studied. RR estimated in volunteers had a success rate of 96%, a correlation coefficient of 0.99 and a standard error of the estimate of 0.56.
As can be seen, different proposals aim to detect characteristics and pathologies from the buccal or nasal sound of breathing, in most cases applying signal processing techniques and artificial intelligence. Moreover, several algorithms run on portable devices such as smartphones or tablets due to their low complexity. The results reported in the scientific literature have been satisfactory, although in some cases the methods were only evaluated with signals available in databases. Also, many algorithms have limitations due to the noise and distortion generated in the signal acquisition process.
In this context, the main contribution of the proposed work is the signal acquisition method (buccal or nasal), whose objective is to improve the precision of the breathing rate and timing interval estimation. The acquisition method employs a KN95 mask and electret microphones, which form a low-cost solution that lets patients acquire the signals by themselves via a personal computer or portable device, without the aid of medical professionals. Another contribution is that the proposed algorithm has a very low complexity and can be easily ported to mobile and portable devices, without the need for high-performance processors or expensive memory. The algorithm was validated with 32 signals corresponding to 3 different people. The maximum average relative error for the different breathing rates and periods was 2.23%, while the same metric for the expiration and inspiration timing intervals was 2.81% and 3.47%, respectively. These results are satisfactory and show the good performance of the proposed method. The following sections describe the stages of the method and report the results and conclusions. Figure 1 shows the block diagram of the proposed algorithmic method; the first two blocks (signal acquisition and signal segmentation) are explained jointly in section 2.

Signal acquisition and segmentation
As previously stated, the respiration acoustic signal is acquired via electret microphones installed on a KN95 mask, as shown in Figure 2. Figure 2(a) shows a drawing of a person using the mask. Figures 2(b) and 2(c) show the acquisition device in the nasal and buccal configurations, respectively. Finally, Figure 2(d) shows the schematic diagram of the microphone electrical circuit. Note that each microphone stands on a small support structure, which allows it to be adequately fixed on the KN95 mask. The microphone placement is crucial, because it enables acquiring an acoustic signal adequate for extracting the breathing rate and the inspiration and expiration timings. Microphone placement must satisfy the following requirements: microphone 1 must always be located on one of the sides of the mask, as shown in Figures 2(a) and 2(b), while microphone 2 must be located facing one of the orifices through which the person will expire (nasal or buccal configuration). The microphones are then connected to the input of an audio card; a personal computer card can be used, as well as that of a mobile phone or a USB audio adapter for a single-board computer. The digitization parameters were a sampling frequency of fs = 8,000 Hz, 16 bits per sample and a mono audio channel. The use of the lowest sampling frequency available in audio cards is justified because the maximum normal breathing rate is close to 25 cycles/minute (0.41 Hz), or close to 60 cycles/minute (1 Hz) for a baby [25]. Figure 3 shows an example of an acoustic respiration signal x(n) acquired through the microphone array device. The expiration signal will always have a larger amplitude than the inspiration signal due to microphone 2, which is an important attribute for distinguishing between the respiratory phases.
Noise and distortion in the acquired respiration signal make it difficult to differentiate the inspiration and expiration phases [26]. This significantly affects the performance of the proposed method (which is based on envelope detection), so it is essential to use the acquisition mask in order to maximize the signal-to-noise ratio (SNR) of the acquired signal and obtain results and measurements with a very low error percentage. The mask equipped with the electret microphones, the microphone support structures, the audio cables and the 3.5 mm audio jack has an estimated price of US$5.00, which is affordable for a patient, considering that the device is of strictly personal use for health reasons. Figure 3 shows that the inspiration and expiration periods and timings are easily detectable from the signal envelope. Starting from this observation, two signal envelopes were extracted: the first highlights the expiration timings and the signal periodicity, while the second highlights the same parameters for the inspiration signal. This second envelope is important to reduce the error in detecting the timing of the inspiration-phase signals.

Obtaining the signal envelope
The envelope extraction procedure is explained in the following steps. Step 1: the input signal x(n) is processed block by block, with a block duration of TB = 40 seconds, which translates into a total of N = fs × TB = 8,000 × 40 = 320,000 samples per block. Although the block duration is configurable, a minimum of 40 seconds is recommended, because an appropriate detection of the respiratory parameters requires an adequate number of signal periods. The signal block xb(n) corresponding to block b can be expressed as in (1) [27].
where B is the number of blocks in the acquired signal.
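The block segmentation above can be sketched as follows (a minimal illustration, not the authors' code; the 80-second recording of random samples stands in for a real acquisition):

```python
import numpy as np

FS = 8000            # sampling frequency fs (Hz)
TB = 40              # block duration TB (s)
N = FS * TB          # N = 320,000 samples per block

# Stand-in for an 80 s acquired signal x(n); a real recording would come
# from the mask's audio card.
x = np.random.default_rng(1).normal(size=FS * 80)

B = x.size // N                   # number of whole blocks in the signal
blocks = x[:B * N].reshape(B, N)  # row b holds block xb(n)
print(blocks.shape)  # (2, 320000)
```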
Step 2: a linear-phase finite impulse response (FIR) filter with impulse response h1(n) is used to filter the signal block xb(n), in order to obtain its envelope and decimate the sampling frequency by a factor of M = 200. This required a cutoff frequency of f1 = fs/(2M) and a very high filter order of 4,000, due to the narrowband nature of the signal. Decimation decreased the number of samples to process and reduced the fundamental Nyquist range to 20 Hz, which is enough for the low-frequency respiratory signals. The decimated envelope signal x2(m) can be expressed as in (2) [27].
where x1(n) results from the convolution defined in (3). In this case, if N = 320,000, then N1 = N/M = 1,600. The sampling frequency after decimation is fs1 = fs/M = 8,000/200 = 40 Hz. Sometimes, undesired high-frequency undulations may compromise the interval detection for the expiration signals. For this reason, signal x2(m) was filtered by a low-pass filter with impulse response h2(m), order 4,000 and cutoff frequency f2 = fs1/10. The resulting signal x3(m) can be expressed by (5). Figure 6 shows x3(m) for the signal in Figure 3; the expiration intervals are evident in x3(m), which will later help in detecting them. The inspiration-phase signals may present a high attenuation, which hampers their detection in x3(m). Thus, a fourth signal envelope was computed using a FIR filter with impulse response h3(m), order 4,000 and a cutoff frequency of f3 = 2fs1/5. This larger bandwidth results in a filtered envelope with larger amplitude and more detail in the inspiration phase. This signal x4(m) can be expressed as in (6). Figure 7 shows x4(m) for the signal in Figure 3; note the larger y-axis range with respect to Figure 6, such that the inspiration phase has larger amplitude and detail.
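The envelope stage can be sketched as follows. This is a hedged illustration, not the published implementation: the rectification step, the firwin filter design and the synthetic input are assumptions, while the decimation factor M = 200, the filter order of 4,000 and the cutoff frequencies follow the text.

```python
import numpy as np
from scipy import signal as sp

FS, M = 8000, 200        # original sampling rate and decimation factor
FS1 = FS // M            # 40 Hz after decimation

# Synthetic stand-in for a 40 s block xb(n): breathing at 0.25 Hz modeled
# as amplitude-modulated noise.
rng = np.random.default_rng(0)
t = np.arange(FS * 40) / FS
xb = (0.6 + 0.4 * np.cos(2 * np.pi * 0.25 * t)) * rng.normal(size=t.size)

# h1: long linear-phase FIR low-pass; rectify, filter, then decimate by M.
h1 = sp.firwin(4001, (FS / (2 * M)) / (FS / 2))
x1 = sp.lfilter(h1, 1.0, np.abs(xb))
x2 = x1[::M]                                   # decimated envelope x2(m)

# h2 (cutoff fs1/10) yields x3(m); h3 (cutoff 2*fs1/5) yields x4(m).
h2 = sp.firwin(4001, (FS1 / 10) / (FS1 / 2))
x3 = sp.lfilter(h2, 1.0, x2)
h3 = sp.firwin(4001, (2 * FS1 / 5) / (FS1 / 2))
x4 = sp.lfilter(h3, 1.0, x2)
print(x2.size, x3.size, x4.size)  # 1600 1600 1600
```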

Obtaining the breathing rate via FFT
The breathing rate was computed through the discrete Fourier transform (DFT) of envelope x3(m), because it shows the breathing rate more clearly through the expiration phases. Firstly, the DC component of the signal was eliminated, since the detection consists of finding the largest frequency component at a frequency different from zero. A Hamming window was used to reduce block effects, since the transform is applied to the envelope of each signal block. The transform was computed via the fast Fourier transform (FFT). The modulus of the frequency spectrum in the Nyquist interval can be expressed as in (7) [28]. In this case, since N1 = 1,600, a transform size of NF = 2,048 was used so that the spectrum could be computed via the FFT (NF being a power of 2). The position k0 corresponds to the largest non-zero frequency component of the spectrum. The breathing rate f0 (Hz) for block b is then f0 = k0·fs1/NF, and the breathing period is T0 = 1/f0 seconds. For every acquired signal, the average breathing period (in seconds) is expressed by (14).
Figure 8 shows the magnitude spectrum |X3(f)| in Hz for the signal envelope x3(m) in Figure 6. Finally, the breathing rate for block b in cycles per minute is given by (15).
For a complete acquired signal, the average breathing rate is obtained by averaging f0 over all blocks. A look-up is done in the spectrum, starting from position k0 and moving down by up to Δk = 6 samples (corresponding to about 6 cycles per minute), in search of an amplitude higher than 0.6|X3(k0)|. If such an amplitude is found, k0 is updated with the location of this new peak. This is done because, sometimes, the inspiration signal is larger than the expiration signal, which causes the 2nd spectral harmonic to be greater than the fundamental frequency component, the one that correctly defines the breathing rate. Figure 8. Magnitude frequency spectrum of x3(m), which was obtained from the envelope shown in Figure 4
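The rate estimation above can be sketched as follows. This is a hedged illustration: the synthetic envelope and the local-maximum condition in the harmonic check are assumptions, while the DC removal, the Hamming window, NF = 2,048 and the 0.6·|X3(k0)| rule follow the text.

```python
import numpy as np

FS1 = 40                      # envelope sampling rate after decimation (Hz)
N1, NF = 1600, 2048           # envelope length and FFT size (power of 2)

# Synthetic x3(m) at 15 cycles/min (0.25 Hz) stands in for a real envelope.
m = np.arange(N1)
x3 = 1.0 + 0.5 * np.cos(2 * np.pi * 0.25 * m / FS1)

# Remove DC, apply a Hamming window, compute the FFT magnitude.
w = np.hamming(N1)
X = np.abs(np.fft.rfft((x3 - x3.mean()) * w, NF))
k0 = 1 + int(np.argmax(X[1:]))            # largest non-DC component

# Sub-harmonic check: look up to ~6 bins below k0 for a spectral peak above
# 0.6*|X(k0)| (requiring a local maximum is an illustrative interpretation).
for k in range(max(2, k0 - 6), k0):
    if X[k] > 0.6 * X[k0] and X[k] >= X[k - 1] and X[k] >= X[k + 1]:
        k0 = k
        break

f0 = k0 * FS1 / NF            # breathing rate in Hz
T0 = 1.0 / f0                 # breathing period in seconds
print(round(f0 * 60, 1))      # breathing rate in cycles per minute
```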

Obtaining average expiration time
Envelope x3(m) is used to identify and segment the expiration intervals. This envelope clearly shows the signal peaks, thanks to the microphone distribution during signal acquisition. The detection steps of the algorithm are explained below.
Step 1: The number of expiration peaks Np in a signal block is estimated from the block duration TB and the calculated breathing rate f0 as Np = floor(TB·f0), where floor(x) returns the greatest integer less than or equal to x.
Step 2: A counter is initialized at i = 0 and a new discrete signal is generated, x33(m) = x3(m).
Step 3: The position of the minimum value of x3(m) is obtained and denoted mmin.
Step 4: The position of the minimum value of x33(m) is obtained and denoted mp. Step 5: Position P1 is obtained, corresponding to the minimum value of x3(m) between m = mmin + 1 and m = mp − 1.
Step 10: The values of x3(m) are looked up towards the left (decreasing m), starting from m = mp, until finding the first position where x3(m) < U2. This position is stored as P1(i).
Step 12: If i ≤ Np − 1, then i = i + 1 and the procedure is repeated from step 4. Otherwise, the procedure continues to step 13.
Step 13: P1 and P2 are sorted from smallest to largest. Figure 9 shows x3(m) with labels at the positions specified by P1 and P2; a red label marks the beginning of a respiratory phase and a blue label marks its end. Finally, the average expiration time for the whole acquired signal is obtained by averaging the durations (P2(i) − P1(i))/fs1 of all detected expiration intervals.
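The expiration-interval detection can be sketched as follows. Since steps 5 to 11 are only partially stated above, this is an illustrative reconstruction, not the authors' exact procedure: the toy envelope, the 10%-above-minimum threshold and the peak-suppression loop are assumptions, while Np = floor(TB·f0) and the left/right threshold look-ups follow the text.

```python
import numpy as np

FS1 = 40                                   # envelope sampling rate (Hz)
m = np.arange(1600)
# Toy envelope x3(m): ten clean expiration humps at 0.25 Hz.
x3 = np.maximum(0.0, np.sin(2 * np.pi * 0.25 * m / FS1)) ** 2
f0 = 0.25                                  # breathing rate (Hz), from the FFT
Np = int(np.floor(40 * f0))                # expected number of expiration peaks

P1, P2 = [], []
x33 = x3.copy()
for _ in range(Np):
    p = int(np.argmax(x33))                        # strongest remaining peak
    lo = x3.min() + 0.1 * (x3[p] - x3.min())       # assumed threshold
    a = p
    while a > 0 and x3[a - 1] >= lo:               # walk left to the start
        a -= 1
    b = p
    while b < x3.size - 1 and x3[b + 1] >= lo:     # walk right to the end
        b += 1
    P1.append(a); P2.append(b)
    x33[a:b + 1] = 0.0                             # suppress this peak
P1, P2 = sorted(P1), sorted(P2)

t_exp = float(np.mean((np.array(P2) - np.array(P1)) / FS1))
print(Np, round(t_exp, 2))                 # peaks found, avg expiration time
```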

Obtaining average inspiration time
Signal x4(m) is used to identify and segment each inspiration interval, using position P1(i) as the starting sample and position P2(i) as the end sample of each expiration phase detected in the previous procedure. As previously stated, envelope x4(m) has more detail than x3(m), which contributes to adequately detecting each inspiration interval. The procedure is described in the following steps.
Step 2: The position of the maximum value of x4(m) located between P2(i) and P2(i + 1) is stored as mp. Then, the position of the minimum value of x4(m) located between mp and P2(i + 1) is stored as m1. Finally, the position of the minimum value of x4(m) located between P2(i) and mp is stored as m2.
Step 3: A minimum decision threshold U1 is computed for the sample interval between mp and P1(i + 1). Step 4: A minimum decision threshold U2 is computed for the sample interval between P2(i) and mp. Step 5: A look-up is performed in envelope x4(m), starting from m = mp and moving towards the right, until finding the first position where x4(m) < U1. This position is stored as P4(i).
Step 6: A look-up is performed in envelope x4(m), starting from m = mp and moving towards the left, until finding the first position where x4(m) < U2. This position is stored as P3(i).
Step 7: If i < Np − 1, then i = i + 1 and the procedure repeats from step 2. Otherwise, the procedure continues to step 8. Figure 10 shows x4(m) with red markers at the start and blue markers at the end of the detected expiration intervals, as well as yellow markers at the start and cyan markers at the end of the detected inspiration phases. A valid inspiration phase is a detected inspiration phase located between two consecutive valid expiration phases. Thus, there will always be one less detected inspiration phase than detected expiration phases (i = 0, 1, …, Np − 2).
Step 8: The average inspiration time for block b is then estimated by averaging the durations (P4(i) − P3(i))/fs1 of the detected inspiration intervals. Finally, the average inspiration time for the complete signal can be expressed as in (29).
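A matching sketch of the inspiration detector follows. Again, this is illustrative rather than the authors' exact procedure: the toy envelope x4(m), the example P2 positions and the 10% threshold rule are assumptions, while searching between consecutive expiration ends and walking outwards from the inspiration peak until crossing U1/U2 follows the text.

```python
import numpy as np

FS1 = 40                                   # envelope sampling rate (Hz)
m = np.arange(1600)
# Toy x4(m): small inspiration humps (peak 0.35) over a 0.05 noise floor,
# placed between the expiration humps of the previous example.
x4 = 0.05 + 0.3 * np.maximum(0.0, np.sin(2 * np.pi * 0.25 * m / FS1 + np.pi)) ** 2
P2 = [71, 231, 391]                        # expiration end positions (toy values)

P3, P4 = [], []
for i in range(len(P2) - 1):
    lo_m, hi_m = P2[i], P2[i + 1]
    p = lo_m + int(np.argmax(x4[lo_m:hi_m]))           # inspiration peak
    u1 = x4[p:hi_m].min() + 0.1 * (x4[p] - x4[p:hi_m].min())       # assumed
    u2 = x4[lo_m:p + 1].min() + 0.1 * (x4[p] - x4[lo_m:p + 1].min())
    b = p
    while b < hi_m and x4[b + 1] >= u1:                # walk right until < U1
        b += 1
    a = p
    while a > lo_m and x4[a - 1] >= u2:                # walk left until < U2
        a -= 1
    P3.append(a); P4.append(b)

t_ins = float(np.mean((np.array(P4) - np.array(P3)) / FS1))
print(len(P3), round(t_ins, 2))            # intervals found, avg inspiration time
```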
As previously mentioned, the inspired and expired air volumes can be estimated from the expiration and inspiration times. Positions P1(i), P2(i), P3(i) and P4(i) serve as timestamps to obtain interval masks on the signal block xb(n). Thus, a medical specialist can visually verify the timing of each respiration phase. The expiration mask me(n) and the inspiration mask mi(n) take constant levels inside and outside the detected intervals; the values 0.7, 0.35 and -0.2 were chosen to adequately visualize me(n) and mi(n) when plotted next to xb(n). Figure 11 shows the graph of xb(n) in blue, me(n) in red and mi(n) in green. These masks can be used to segment the waveform of each respiration phase in each period. Thus, future works can use this segmentation for feature extraction and pathology detection via nasal or buccal respiratory acoustic signals.
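The interval masks can be sketched as follows. The level values 0.7, 0.35 and -0.2 come from the text; the assignment of levels to each mask, the example interval positions and the mapping back to the original rate via the decimation factor M are assumptions for illustration.

```python
import numpy as np

FS, M = 8000, 200    # original sampling rate and decimation factor

# Example envelope-rate positions: P1/P2 delimit expirations, P3/P4
# delimit inspirations (toy values, not real measurements).
P1, P2 = [9, 169], [71, 231]
P3, P4 = [101], [149]

N = 320_000
mask_exp = np.full(N, -0.2)          # baseline level outside intervals
mask_ins = np.full(N, -0.2)
for a, b in zip(P1, P2):
    mask_exp[a * M:b * M] = 0.7      # expiration interval at original rate
for a, b in zip(P3, P4):
    mask_ins[a * M:b * M] = 0.35     # inspiration interval at original rate
print(int((mask_exp > 0).sum()), int((mask_ins > 0).sum()))  # 24800 9600
```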

RESULTS AND DISCUSSION
Three signal acquisition masks were built for algorithm validation purposes. Three people, each using their own mask, generated the signal samples. A multimedia card and a personal computer were employed to acquire the signals. Person 1 generated 13 signals, Person 2 generated 12 signals and Person 3 generated 7 signals. Each person was adequately instructed in the following procedure: Step 1: The person puts the mask on.
Step 2: A verification of correct mask placement is performed.
Step 3: The mask is connected to the audio card in the personal computer.
Step 4: Three seconds of a "silence" signal are captured by asking the person not to breathe. This serves as a "floor" noise level.
Step 5: The person is told that the acquisition is starting and is asked to breathe at a low, medium, or high frequency, as previously agreed with each person.
Step 6: The signal acquisition process is performed for 80 seconds, to obtain two 40-second blocks with enough periods to yield well-defined frequency components. After building the dataset, each signal was visually inspected, and the breathing period, the inspiration and expiration timings and the breathing rate were manually labeled in the signal. The beginning and end of each time interval were determined using the floor noise level estimated in step 4 of the acquisition procedure.
The reference values of the average times and the breathing rate were manually determined for each signal. Performing the validation with a spirometer was not feasible, and thus this option was discarded: it would have required the simultaneous acquisition of the sound and the air flow emitted by the person, which was quite difficult. The relative error percentage was used to compare the algorithm results with the manual measurements [29], where Va is the value estimated by the proposed algorithm and Vm is the value obtained via the manual method. Tables 1, 2, and 3 show the results for each person. Figures 12, 13, and 14 show the relative error graphs for each person and signal. The following statements can be derived from the results: i) the relative error for times and frequencies spans a range from 0% to 7%, which is highly satisfactory since it translates to a minimum precision of 93%; ii) using two microphones in the acquisition method positively impacted the quality of the obtained results; iii) verifiably, the relative error is independent of the breathing rate of each person and of the inspiration and expiration time intervals; iv) in most cases, the inspiration times present the highest errors (although less than 7%), which can be explained because inspiration times are shorter than expiration times and much more susceptible to noise and distortion; v) the microphone acquisition circuit did not use any additional passive elements to minimize noise or distortion effects, in order to keep the mask cost low; nonetheless, this circuit could be modified to evaluate whether it can reduce the range of the relative error; and vi) finally, Table 4 shows the average relative error percentages for each person: the maximum average relative error is 2.23% for the breathing rate, 2.81% for the expiration time and 3.47% for the inspiration time.
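The relative error metric used for the comparison can be expressed as a one-line helper (Va is the algorithm's estimate and Vm the manual measurement):

```python
def relative_error_pct(va: float, vm: float) -> float:
    """Relative error percentage: 100 * |Va - Vm| / Vm."""
    return 100.0 * abs(va - vm) / vm

# e.g. an estimated breathing rate of 15.3 vs a manual count of 15.0
print(round(relative_error_pct(15.3, 15.0), 2))  # 2.0
```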

CONCLUSION
The proposed algorithm was developed in Python and implemented on both a web server and a small single-board computer. Regarding the server implementation, the signal was acquired through an Android smartphone connected to the acquisition mask via a purpose-built application. In it, the person registers as a patient, configures the acquisition time and starts the acquisition in a quiet, relaxed place. The signal is stored in a waveform audio (WAV) file and then uploaded to the server with the date, time and personal data of the patient. The server processes the signal and registers the results, which can be viewed in both a desktop and a mobile application. A medical professional can then visualize these results and add notes with the diagnosis and commentary. With this implementation, the system can be employed to remotely monitor patients with respiratory ailments and aid in their correct control, without the need to commute to a health establishment.
Regarding the single-board computer implementation, it serves as a first step towards developing portable respiratory evaluation equipment that patients can use in health establishments. The successful results motivated the development of the stated applications. Finally, the time masks output by the algorithm can be used in future projects that aim to identify respiratory pathologies or to estimate the volumes of inspired and expired air as determined by a spirometer. Thus, the developed applications can gain new functionalities to aid medical professionals in reducing their time to diagnosis.