Bivariate modified Hotelling's $T^2$ charts using bootstrap data

The conventional Hotelling's $T^2$ chart is inefficient when the data are disorganized and contain outliers, and this study therefore proposes a novel robust alternative to the Hotelling's $T^2$ chart. The approach uses the Hodges-Lehmann vector and the covariance matrix based on the robust scale estimator $S_n$ in place of the arithmetic mean vector and the sample covariance matrix, respectively. The performance of the proposed chart was examined using simulated bivariate bootstrap datasets under two conditions, namely independent variables and dependent variables. The robustness of the modified chart was then assessed by computing the probability of detecting outliers and the false alarm rate. In all the cases tested, the proposed charts outperformed the conventional ones.


INTRODUCTION
In manufacturing, statistical control charts are known as one of the main tools for monitoring the production process. Control charts were initially used to monitor individual product quality characteristics. Because product quality is determined by several characteristics, this approach is insufficient in practice. Multivariate control charts (MVCC), by contrast, can identify changes in the covariance matrix Σ and the mean vector μ, which helps achieve optimal product performance [1,2].
The Hotelling's $T^2$ chart is among the most common MVCC methods [3,4]. It can detect multiple outliers, mean shifts and deviations in the dispersion of the in-control distribution [5]. The statistic $T_i^2$ employs the estimators $\bar{x}$ and $S$, which are directly affected by the presence of outliers. This leads to false alarms and to a failure to control the production process. The increasing complexity of manufactured products, whose characteristics generally contain outliers, contributes to the failure of the chart to perform its designated task.
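For reference, the conventional statistic can be written in its standard textbook form. The notation below ($n$ observations $x_1, \ldots, x_n$ of dimension $p$, sample mean vector $\bar{x}$, sample covariance matrix $S$) is assumed here rather than taken from the text:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad S = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^{\top}, \qquad T_i^2 = (x_i - \bar{x})^{\top} S^{-1} (x_i - \bar{x}).$$

Both $\bar{x}$ and $S$ are moment-based, so a single extreme observation shifts them and thereby changes every $T_i^2$ value.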
To overcome the impact of outliers on the control chart, robust estimators are an appropriate solution. They are employed in place of the mean vector $\bar{x}$ and the variance-covariance matrix $S$ of the conventional Hotelling's $T^2$ chart. A chart is considered robust if it reacts strongly to changes in the production process, that is, if its false alarm rate stays controlled while its probability of detecting outliers is large and tends to 100%. The bootstrap is a nonparametric technique because it does not depend on parametric assumptions about the data distribution. In the monitoring of a single process characteristic, however, this technique generates single-variable control charts. In this regard, Phaladiganon et al. [6] mentioned the possibility of integrating multivariate control charts with bootstrap-based charts, which have proven effective.
Because samples are sometimes small, the normality assumption may be violated. In addition, the in-control state of a control chart generally has to be estimated, which adversely affects the performance of the chart. As indicated by Mostajeran et al. [7], non-parametric bootstrap control charts are appropriate when the distribution is unknown, when the process parameters are estimated from a Phase I dataset, or when it is impractical to gather a large sample.
Control charts generally require normally distributed observations. Non-parametric control charts, such as sign control charts, are appropriate for non-normal distributions. In this situation, the parameters of the control chart can be computed with the non-parametric bootstrap algorithm. Because no distributional assumptions are required, the original observations can be used directly.
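As a rough illustration of how a control limit can be obtained without a distributional assumption, the sketch below uses one common percentile-bootstrap recipe: resample the in-control statistics with replacement, take the $(1-\alpha)$ quantile of each resample, and average the quantiles. It is not claimed to be the exact algorithm of any work cited here. The function name `bootstrap_ucl`, the number of resamples and the chi-square stand-in data are all assumptions made for the example.

```python
import numpy as np

def bootstrap_ucl(stats, alpha=0.05, n_boot=1000, seed=0):
    """Average of the (1 - alpha) quantiles of n_boot resamples of `stats`."""
    rng = np.random.default_rng(seed)
    stats = np.asarray(stats, dtype=float)
    # Each row is one bootstrap resample drawn with replacement.
    resamples = rng.choice(stats, size=(n_boot, stats.size), replace=True)
    return float(np.quantile(resamples, 1.0 - alpha, axis=1).mean())

# Hypothetical in-control statistics: chi-square(2) draws stand in for T^2 values.
rng = np.random.default_rng(1)
t2_in_control = rng.chisquare(df=2, size=200)
ucl = bootstrap_ucl(t2_in_control)
print(f"bootstrap UCL at alpha = 0.05: {ucl:.3f}")
```

Only the observed statistics enter the computation, so the same function applies whatever chart statistic is being monitored.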
Jones and William [8] are among those who have applied the bootstrap to the construction of control charts. They described the bootstrap as a statistical technique that substitutes computing power for the conventional parametric assumptions. The proposed control chart was presented alongside extensive computer simulation results, and each control chart was assessed in terms of its average run length.
In Niaki and Abbasi [9], a novel bootstrap-based methodology for deriving control limits for attributes was proposed and formulated. The methodology allows the simultaneous construction of confidence intervals for the attributes. Its performance was examined using the in-control and out-of-control average run length criteria. The authors also carried out a simulation-based comparison with the comparable approaches of Bonferroni and Sidak, and the proposed method performed better. Lastly, for attributes, the authors compared the bootstrap method with the $T^2$ control chart.
The application of a bootstrap-based multivariate $T^2$ control chart was demonstrated by Phaladiganon et al. [6]. This chart can monitor a process whose data distribution is non-normal or unknown. The authors evaluated the performance of the proposed chart in a simulation study, comparing it with the kernel density estimation (KDE)-based $T^2$ control chart and the conventional Hotelling's $T^2$ control chart. The proposed method performed better than the conventional $T^2$ control chart and comparably to the KDE-based $T^2$ control chart.
Gandy and Kvaløy [10] proposed a bootstrap-based method in which the data are bootstrapped and then used to estimate the in-control state. The method is appropriate for diverse types of control charts, including charts based on regression models. With the non-parametric bootstrap, the method is considered robust. The authors employed large-sample properties of the adjustment and demonstrated the advantages of the approach in a simulation study.
Edopka and Ogbeide [11] employed a non-parametric approach to assess the cumulative sum (CUSUM) and exponentially weighted moving average (EWMA) control limits for a given dataset. The control limits were determined from the conditional distribution of the underlying dataset. The bootstrap method was applied to evaluate the control limits and to identify the in-control and out-of-control states of the distribution. No rigid assumption, such as normality of the process data, was required.
Mostajeran et al. [1] demonstrated a new bootstrap algorithm for constructing the Hotelling's $T^2$ control chart and assessed the proposed method in a simulation study. They compared its results with those of the conventional Hotelling's $T^2$ control chart and with the bootstrap results reported by Phaladiganon, using the in-control and out-of-control average run lengths, denoted $\mathrm{ARL}_0$ and $\mathrm{ARL}_1$, respectively.
Mostajeran et al. [7] presented the non-parametric bootstrap multivariate control charts |S|, W, and G, a method grounded in the use of bootstrapped data to estimate the in-control state, and obtained satisfactory performance from the bootstrap charts. The study in [12] demonstrated a bootstrap multivariate control chart and compared it, in a simulation study, with the parametric Hotelling's $T^2$ multivariate control chart, a multivariate sign control chart, and a multivariate Wilcoxon control chart.
This study proposes a new method to improve the performance of the Hotelling's $T^2$ chart, in particular by modifying its sensitivity to outliers. The new methodology is built on a robust estimator of location, the Hodges-Lehmann estimator, and on the covariance matrix of the robust scale estimator $S_n$. The bootstrap method is employed to resample from normally distributed data. The Hodges-Lehmann estimator and the properties of the scale estimator $S_n$ are presented in the next section (Section 2). This is followed by a description of the construction of the robust Hotelling's $T^2$ chart. The ensuing section (Section 4) summarizes the simulation findings. The final section concludes the paper.

ROBUST LOCATION AND SCALE ESTIMATORS
This paper demonstrates the application of a novel robust location estimator and three robust scale estimators. Aside from being easy to implement in the calculation and construction of the Hotelling's $T^2$ chart, these methods are technically appropriate for multivariate data.
The following subsections highlight the properties of each estimator.

Robust location estimator: Hodges-Lehmann estimator
The location estimator for a sample of $n$ observations was first introduced by Hodges and Lehmann (1963). It is the median of the averages of the $\frac{1}{2}n(n+1)$ possible pairs of observations. Following Brown and Kildea (1978), a simple Hodges-Lehmann estimator for $\theta$ is

$$\hat{\theta}_n = \operatorname{med}\left\{ \frac{x_i + x_j}{2} : 1 \le i \le j \le n \right\},$$

and an asymptotically equivalent estimator is obtained by restricting the pairs to $i < j$. The main properties of this location estimator are a breakdown point of 29%, symmetry about the parameter $\theta$, an asymptotic relative efficiency of about 0.955, and a computational cost of $O(n^2)$ operations when computed directly.
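A minimal sketch of this estimator is given below, assuming the pairwise averages include each observation paired with itself ($i \le j$). The function name `hodges_lehmann`, the `include_self` switch and the five-point sample are illustrative choices, not part of the original study.

```python
import numpy as np

def hodges_lehmann(x, include_self=True):
    """Median of the pairwise averages (x_i + x_j) / 2."""
    x = np.asarray(x, dtype=float)
    # k=0 keeps the pairs with i <= j; k=1 keeps only the pairs with i < j.
    i, j = np.triu_indices(x.size, k=0 if include_self else 1)
    return float(np.median((x[i] + x[j]) / 2.0))

sample = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # made-up data with one outlier
hl = hodges_lehmann(sample)
mean = float(sample.mean())
print(hl, mean)  # hl is 3.0, while the arithmetic mean is 22.0
```

The single outlier drags the arithmetic mean far from the bulk of the data but leaves the Hodges-Lehmann estimate near the centre, which is the behaviour the robust chart relies on. Forming all pairs explicitly costs on the order of $n^2$ operations.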

Robust scale estimator: $S_n$
In Rousseeuw and Croux [13], the estimator $S_n$ for the sample $x_1, \ldots, x_n$ was defined as follows:

$$S_n = c \, \operatorname{med}_i \left\{ \operatorname{med}_j \, |x_i - x_j| \right\},$$

where $c = 1.1926$ is a correction factor that makes $S_n$ consistent at the normal distribution. The primary properties of $S_n$ are a maximum breakdown point of 50%, an efficiency of 58% at the normal distribution, a bounded influence function, and affine equivariance. Further details on $S_n$ are given in Rousseeuw and Croux [13].

CONSTRUCTION OF THE ROBUST HOTELLING'S $T^2$ CHART
The robust chart is constructed in the following steps:
1. For each column of the $n \times p$ data matrix, compute the Hodges-Lehmann estimator.
2. Compute the robust variance-covariance matrix, following the steps described below:
Because the variance-covariance matrix is symmetric, its main diagonal contains the variances, given by $S_n^2(x_j)$, the squared robust scale estimate of the $j$-th variable, where $j = 1, 2$, as demonstrated in [14][15][16][17][18]. The remaining elements of the matrix are the covariances between each pair of the two variables.
3. Finally, the proposed robust Hotelling's $T^2$ chart is formed using the equation below:

$$T_R^2(x_i) = (x_i - \hat{\theta}_{HL})^{\top} S_R^{-1} (x_i - \hat{\theta}_{HL}),$$

where $\hat{\theta}_{HL}$ is the vector of Hodges-Lehmann estimates from step 1 and $S_R$ is the robust variance-covariance matrix from step 2.

The proposed robust Hotelling's $T^2$ chart was evaluated using simulated datasets with 5000 replications. The simulation settings are as follows:
a. The overall false alarm probability is set at α = 0.05,
b. The number of variables is p = 2, and
c. The sample sizes are n = 20, 30, 40, 50 and 100.

The chart is formed and assessed in two phases:
a. Phase I generates 5000 datasets from $N_p(0, I_p)$ under two conditions: Case A with independent variables and Case B with dependent variables. The location estimators and covariance matrices are then calculated for the conventional and robust charts, the latter using the Hodges-Lehmann (HL) estimator and the covariance matrix based on the robust scale estimator $S_n$.
b. Phase II generates a new observation for each dataset so that performance can be assessed. The performance of the new robust chart is evaluated in terms of its false alarm rate and its probability of detecting outliers. Each rate is the number of robust statistics for the new observations that exceed the upper control limit (UCL), divided by the number of replications (5000).

All computations were carried out in MATLAB version 2015.
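The construction steps and the two-phase evaluation can be sketched in Python as follows (the study itself used MATLAB). This is an illustration under stated assumptions, not the authors' implementation. The function names are hypothetical. $S_n$ is computed naively with plain medians. The off-diagonal covariance is assumed to be the Pearson correlation rescaled by the two $S_n$ values, because that formula is not given in the text. The UCL is assumed to be the empirical $(1-\alpha)$ quantile of in-control statistics. The replication count is reduced to 500 for speed, and the shift of 5 is a made-up outlier size.

```python
import numpy as np

def hodges_lehmann(x):
    """Median of the pairwise averages (x_i + x_j) / 2 over i <= j."""
    i, j = np.triu_indices(x.size)
    return np.median((x[i] + x[j]) / 2.0)

def sn_scale(x, c=1.1926):
    """Naive O(n^2) S_n = c * med_i { med_j |x_i - x_j| } using plain medians."""
    diffs = np.abs(x[:, None] - x[None, :])
    return c * np.median(np.median(diffs, axis=1))

def robust_estimates(data):
    """HL location vector and S_n-based covariance matrix for an (n, 2) array."""
    loc = np.array([hodges_lehmann(col) for col in data.T])
    scale = np.array([sn_scale(col) for col in data.T])
    # ASSUMPTION: off-diagonal = Pearson correlation * S_n(x_1) * S_n(x_2);
    # the diagonal then equals S_n^2 for each variable.
    cov = np.corrcoef(data, rowvar=False) * np.outer(scale, scale)
    return loc, cov

def t2(points, loc, cov):
    """Quadratic form (x - loc)' cov^{-1} (x - loc) for each row of points."""
    d = np.atleast_2d(points) - loc
    return np.einsum("ij,jk,ik->i", d, np.linalg.inv(cov), d)

def new_obs_stats(n, reps, rho, shift, rng):
    """Robust T^2 of one new observation per simulated Phase I dataset."""
    sigma = np.array([[1.0, rho], [rho, 1.0]])
    out = np.empty(reps)
    for k in range(reps):
        phase1 = rng.multivariate_normal(np.zeros(2), sigma, size=n)
        loc, cov = robust_estimates(phase1)
        new = rng.multivariate_normal(np.full(2, shift), sigma)
        out[k] = t2(new, loc, cov)[0]
    return out

rng = np.random.default_rng(0)
alpha, n, reps = 0.05, 20, 500  # the study uses 5000 replications
# ASSUMPTION: UCL = empirical (1 - alpha) quantile of in-control statistics.
ucl = np.quantile(new_obs_stats(n, reps, 0.0, 0.0, rng), 1 - alpha)
false_alarm = np.mean(new_obs_stats(n, reps, 0.0, 0.0, rng) > ucl)
detection = np.mean(new_obs_stats(n, reps, 0.0, 5.0, rng) > ucl)
print(f"UCL={ucl:.2f}  false alarm={false_alarm:.3f}  detection={detection:.3f}")
```

Setting `rho` to a non-zero value gives a dependent-variable setting in the spirit of Case B, and increasing `n` covers the other sample sizes listed in the settings.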

RESULTS
The outcomes generated by the conventional Hotelling's $T^2$ chart and by the modified Hotelling's $T^2$ chart indicate that the modified chart is more robust in reacting to changes in the production process. For Case B, which contains dependent variables, as shown in Table 2, the false alarm rates and the outlier detection rates of the robust charts are superior to the corresponding rates of the conventional chart when outliers are present, regardless of the sample size $n$ and the other simulation parameters. Notably, the false alarm rates decrease as the sample size $n$ increases, and the detection probabilities approach 100%.

EMPIRICAL CASE
We used the Vargas, Queensberry data set to compare the performance of the conventional and modified control charts. The data comprise two quality characteristics, the random variables $X_1$ and $X_2$, measured on 30 different products taken from the production process. The observations of both random variables are shown in Table 3 (Appendix), which also lists the values of the new Hotelling's $T^2$ statistics alongside the conventional $T^2$ statistics.
Using simulation, we calculated the UCLs of the robust and conventional $T^2$ charts to be 8.03 and 6.4619, respectively. Both UCLs were set for a false alarm probability of α = 0.05 with 30 observations. The final results show that the conventional chart signals an out-of-control process at two observations, the second and the twentieth, whereas the robust chart signals only at the second observation.

CONCLUSION AND DISCUSSION
The modified robust alternative to the Hotelling's $T^2$ chart outperforms the conventional Hotelling's $T^2$ chart, particularly with respect to false alarms. It is also better than the conventional chart at detecting outliers.