Modified Hotelling's T² control charts using modified Mahalanobis distance

Received Apr 12, 2020; Revised Jun 13, 2020; Accepted Jul 5, 2020. This paper proposes a new adjusted Hotelling's T² control chart for individual observations. To this end, the bootstrap method was used to generate the individual observations. The arithmetic mean vector and the covariance matrix of the traditional Hotelling's T² chart were replaced by the trimmed mean vector and the covariance matrix based on the robust scale estimator Qn, respectively, and the performance of the resulting chart was evaluated by simulation. The false alarm rate and the probability of detecting outliers were used to assess the validity of the modified chart. The findings show a considerable improvement in its performance.


INTRODUCTION
Statistical process control charts use statistical tools to monitor the performance of a production process. In practice, the overall quality of a product is determined by more than one quality characteristic. For example, the quality of a product may be defined by its hardness, thickness, weight, width, length, and so on. Several characteristics of a manufactured component therefore require simultaneous monitoring, for which either the multivariate Shewhart-type χ² chart or Hotelling's T² control chart can be used.
Hotelling's T² statistic is one of the most popular methods in multivariate statistical process control [1,2]. It is the multivariate generalization of Student's t-statistic: squaring both sides of the t-statistic and extending it to p variables yields the T² statistic. For an individual observation vector $\mathbf{X}_i$, the Hotelling T² statistic is
$$ T^2_i = (\mathbf{X}_i - \bar{\mathbf{X}})^\top \mathbf{S}^{-1} (\mathbf{X}_i - \bar{\mathbf{X}}) $$
where $\bar{\mathbf{X}}$ and $\mathbf{S}$ are the sample mean vector and the p×p sample covariance matrix, respectively. Although Hotelling's T² control chart is effective and appropriate when the data come from a normal distribution, extreme values among the measurements of the characteristics make the traditional T² chart ineffective. Mostajeran, Iranpanah and Noorossana [3] therefore proposed nonparametric bootstrap control charts for data whose distribution is unknown. Such charts are suitable when a large Phase I sample for estimating the process parameters cannot be obtained. In general, conventional control charts assume normally distributed observations; for non-normal distributions, nonparametric control charts, such as sign control charts, are applicable. In this study, the nonparametric bootstrap algorithm is used to compute the control chart parameters, so that the original observations can be used without any distributional assumption. Similarly, robust statistics are used instead of the sensitive classical statistics in the Hotelling T² chart; such methods are effective in overcoming the chart's poor performance when extreme values are present in the product characteristics.
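The classical statistic and the bootstrap step described above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code: the function names `hotelling_t2` and `bootstrap_rows` are our own, and resampling whole observation vectors (rows) with replacement is assumed to be how individual observations are bootstrapped.

```python
import numpy as np

def hotelling_t2(x_new, X_ref):
    """Classical T^2 of each row of x_new w.r.t. the mean and covariance of X_ref."""
    X_ref = np.asarray(X_ref, dtype=float)
    xbar = X_ref.mean(axis=0)                       # sample mean vector
    S = np.cov(X_ref, rowvar=False)                 # p x p sample covariance, n - 1 divisor
    D = np.atleast_2d(np.asarray(x_new, dtype=float)) - xbar
    # Row-wise quadratic form (x - xbar)^T S^{-1} (x - xbar)
    return np.einsum("ij,jk,ik->i", D, np.linalg.inv(S), D)

def bootstrap_rows(X, rng):
    """Nonparametric bootstrap: resample whole observation vectors with replacement."""
    X = np.asarray(X, dtype=float)
    return X[rng.integers(0, X.shape[0], size=X.shape[0])]

rng = np.random.default_rng(1)
X = rng.standard_normal((30, 2))                    # Phase I individual observations
t2_phase1 = hotelling_t2(X, X)                      # T^2 of each Phase I observation
Xb = bootstrap_rows(X, rng)                         # one bootstrap Phase I sample
t2_new = hotelling_t2([[3.0, 3.0]], Xb)             # new observation vs. bootstrap estimates
```

Because the covariance uses the n − 1 divisor, the Phase I T² values always sum to (n − 1)p, here 29 × 2 = 58, which is a quick sanity check on the quadratic form.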
Several previous studies have employed these techniques to improve this model; see [4]. These studies used bootstrap samples, robust location and scale estimators, and the trimmed mean and trimmed covariance matrix. For instance, [5] used the trimmed mean and trimmed covariance matrix in place of the arithmetic mean and covariance matrix, respectively. Surtihadi [6] employed the median as a robust location estimator when constructing the robust bivariate sign tests of Blumen and Hodges. Alfaro and Ortega [7] suggested a new alternative chart by replacing the sample mean vector with the trimmed mean vector and the covariance matrix with the trimmed covariance matrix.
Jones [8] used the bootstrap instead of the traditional parametric assumptions when constructing control charts; bootstrap methods rely on computing power. Phaladiganon et al. [9] employed a bootstrap-based multivariate T² control chart and showed its ability to monitor a process when the data distribution is non-normal or unknown. They used a simulation study to assess the performance of the control chart.
The method proposed in [10] is based on bootstrapping: the authors bootstrapped the data and then used them to estimate the in-control state. The authors of [11] used a nonparametric approach to estimate the control limits of cumulative sum (CUSUM) and exponentially weighted moving average (EWMA) charts for a given dataset. An innovative bootstrap algorithm for constructing Hotelling's T² control chart was presented in [12].
The nonparametric bootstrap multivariate control charts |S|, W, and G, discussed in [3], use bootstrapped data to evaluate the in-control state. The findings showed that these bootstrap control charts achieve reasonable performance.
Similarly, [13] applied a bootstrap multivariate control chart based on Hotelling's T² statistic. The author of [14] modified three robust Hotelling's T² charts by replacing the mean vector and the covariance matrix with trimmed estimators. The trimming was performed using the modified Mahalanobis distance, in which the location estimator is the median and the scale estimator is one of the robust scale estimators MADn, Sn and Qn.
Tukey and McLaughlin [15] proposed a new alternative bivariate robust Hotelling's T² chart, replacing the arithmetic mean with the winsorized modified one-step M-estimator (wMOM) vector and the sensitive covariance matrix with the covariance matrix of a robust scale estimator, respectively.
The current study aims to improve the performance of Hotelling's T² chart by proposing a method that reduces its sensitivity to outliers. A wide range of robust location and scale estimators could be considered for this purpose. Following [5], this research replaces the mean vector and the covariance matrix of the Hotelling T² chart with the trimmed mean and its corresponding covariance matrix, where the trimming is based on the robust scale estimator Qn. To evaluate the new modified robust Hotelling T² control chart, outliers are inserted into data generated from the standard normal distribution, with the bootstrap method used to generate the samples; the false alarm rate and the probability of detecting outliers are then calculated to judge the performance of the modified chart. Section 2 presents the research method, Section 3 presents the results and discussion, and Section 4 gives the conclusion.

RESEARCH METHOD
2.1. Trimming method
The covariance matrix is affected by the presence of outliers. We therefore used the trimmed variance-covariance matrix in place of the usual covariance matrix. Its computation is based on the winsorized covariance matrix, i.e. the covariance matrix of the winsorized sample. To construct the modified Hotelling T² control charts, the trimmed mean was used as the robust location estimator together with the trimmed variance-covariance matrix. Outlying observations are trimmed using the modified Mahalanobis distance, obtained by replacing the sample mean vector in the Mahalanobis distance formula with the median vector and the variance-covariance matrix with the robust scale covariance matrix based on Qn [16]. The trimming and replacement of observations that produce the winsorized sample are based on the values of the modified Mahalanobis distance, trimming 20% from each end. In the studies of [5,17], the observations with the two largest Mahalanobis distances were trimmed from each end. While that technique is appropriate for subgroup observations, the present case concerns bootstrap individual observations, for which a trimming percentage is more appropriate. The authors of [16] suggested that the best percentage for a symmetric distribution is 20%-25%, and the authors of [18,19] proposed trimming 20% from each tail of the data. Consequently, the trimming percentage in this paper is 20% for each end.
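A minimal sketch of the modified Mahalanobis distance, assuming NumPy, is given below. `qn_scale` implements the Rousseeuw-Croux Qn estimator with the asymptotic constant 2.2219 and no small-sample correction factor. The text does not spell out how the Qn covariance matrix is assembled, so `qn_cov` is a hypothetical construction that pairs Qn scales with a Spearman rank correlation, mirroring the Spearman-based step the paper uses for its trimmed covariance matrix. Ranks are computed without tie handling, which is adequate for continuous simulated data, and at least two variables are assumed.

```python
import numpy as np

def qn_scale(x):
    """Rousseeuw-Croux Qn scale estimate (asymptotic constant, no small-sample factor)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    i, j = np.triu_indices(n, k=1)
    diffs = np.abs(x[i] - x[j])                     # all pairwise distances, i < j
    h = n // 2 + 1
    k = h * (h - 1) // 2                            # Qn uses the k-th smallest distance
    return 2.2219 * np.partition(diffs, k - 1)[k - 1]

def spearman_corr(X):
    """Spearman correlation matrix as the Pearson correlation of column ranks (no ties)."""
    ranks = np.argsort(np.argsort(X, axis=0), axis=0).astype(float)
    return np.corrcoef(ranks, rowvar=False)

def qn_cov(X):
    """Hypothetical Qn-based covariance: Qn scales combined with Spearman correlation."""
    X = np.asarray(X, dtype=float)
    s = np.array([qn_scale(X[:, j]) for j in range(X.shape[1])])
    return spearman_corr(X) * np.outer(s, s)

def modified_mahalanobis(X):
    """Squared distance of each row from the median vector, scaled by qn_cov(X)."""
    X = np.asarray(X, dtype=float)
    D = X - np.median(X, axis=0)                    # center at the coordinatewise median
    return np.einsum("ij,jk,ik->i", D, np.linalg.inv(qn_cov(X)), D)

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 2))
X[0] = [8.0, 8.0]                                   # plant one gross outlier
d = modified_mahalanobis(X)
```

For x = (1, 2, 3, 4), the third-smallest pairwise difference is 1, so `qn_scale` returns 2.2219. Because the median and Qn are barely moved by a single gross outlier, the planted point at (8, 8) receives by far the largest distance.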

Constructing control charts
The arithmetic mean vector $\bar{\mathbf{X}}$ is replaced by an alternative robust location estimator, the trimmed mean $\bar{\mathbf{X}}_{tQn}$, and the sample variance-covariance matrix $\mathbf{S}$ is replaced by the trimmed variance-covariance matrix $\mathbf{S}_{tQn}$; both rely on the robust scale estimator Qn. Computing the trimmed covariance matrix first requires the winsorized covariance matrix, i.e. the variance-covariance matrix of the winsorized sample. The winsorized sample is obtained by trimming the outliers from the data, and this study uses the Mahalanobis distance for that purpose. Because the Mahalanobis distance formula relies on the arithmetic mean vector and the variance-covariance matrix, this paper modifies it by replacing the arithmetic mean with the median and the variance-covariance matrix with the covariance matrix of the robust scale estimator Qn [20]. The winsorized sample therefore depends on the chosen robust scale estimator; a single winsorized sample is formed here, from which one trimmed mean $\bar{\mathbf{X}}_{tQn}$ and one trimmed variance-covariance matrix $\mathbf{S}_{tQn}$ are calculated. Based on this winsorized sample, the modified Hotelling's T² control chart is constructed as follows:
− Compute the modified Mahalanobis distance for each vector $\mathbf{X}_i = (X_{i1}, \ldots, X_{ip})$ in the individual bootstrap data, where i = 1, …, n and p is the number of variables:
$$ \Delta_i = (\mathbf{X}_i - \mathbf{M})^\top \mathbf{S}_{Qn}^{-1} (\mathbf{X}_i - \mathbf{M}) $$
where $\mathbf{M}$ is the median vector and $\mathbf{S}_{Qn}$ is the covariance matrix based on Qn.
− Sort the modified Mahalanobis distances in order and trim them. According to [16], the best trimming percentage for a symmetric distribution is 20%-25% from each end, and [18,19] proposed trimming 20% from each tail of the data. We therefore used a total trimming percentage of 40% (20% from each end).
− Compute the trimmed variance-covariance matrix $\mathbf{S}_{tQn}$ from the winsorized covariance matrix $\mathbf{S}_w$ and $n_t$, the number of observations remaining after trimming. The diagonal elements of $\mathbf{S}_{tQn}$ are the sample trimmed variances $s_{jj} = s_t^2(X_j)$, and the off-diagonal elements are the sample trimmed covariances $s_{jg}$ between the variables $X_j$ and $X_g$, computed as follows:
a) Compute $s_t(X_j)$ and $s_t(X_g)$, for j = 1, …, p, g = 1, …, p and j ≠ g.
b) Compute the Spearman rank correlation between $X_j$ and $X_g$, denoted $\mathrm{corr}_S(X_j, X_g)$; this type of correlation is robust against extreme data [21].
c) Compute the sample covariance between $X_j$ and $X_g$ as
$$ s_{jg} = \mathrm{corr}_S(X_j, X_g)\, s_t(X_j)\, s_t(X_g) $$
− Compute the inverse of the sample trimmed covariance matrix, $\mathbf{S}_{tQn}^{-1}$.
− Define the modified Hotelling's T² control chart by replacing the sample mean vector in the traditional Hotelling's T² chart with the robust location estimator $\bar{\mathbf{X}}_{tQn}$ and the inverse of the sample covariance matrix with the inverse robust scale covariance matrix $\mathbf{S}_{tQn}^{-1}$. The modified Hotelling's T² statistic is then
$$ T^2_{tQn,i} = (\mathbf{X}_i - \bar{\mathbf{X}}_{tQn})^\top \mathbf{S}_{tQn}^{-1} (\mathbf{X}_i - \bar{\mathbf{X}}_{tQn}) $$
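The trimming, winsorizing, and trimmed-estimator steps above can be sketched as follows. Several details are not fixed by the text and are assumptions here: trimming removes the fractions `lower` and `upper` from the two ends of the ordered distances (20% each by default, read literally from the text; set `lower=0` to trim only the largest distances), each trimmed row is winsorized to the nearest retained row in that order, the trimmed standard deviations use the Yuen-type rescaling $(n-1)s_w^2/(n_t-1)$ of the winsorized variances, and the Spearman correlation is computed on the full sample. The distance `d` is passed in so that any version of the modified Mahalanobis distance can be used; the example uses a simple placeholder. All function names are hypothetical.

```python
import numpy as np

def trim_and_winsorize(X, d, lower=0.2, upper=0.2):
    """Sort rows of X by distance d, trim both ends, and winsorize the trimmed rows.

    Assumption: each trimmed row is replaced by the nearest retained row in distance order.
    Returns (retained rows, winsorized sample of the original size).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    g_lo, g_hi = int(np.floor(lower * n)), int(np.floor(upper * n))
    Xs = X[np.argsort(d)]                           # rows ordered by distance
    retained = Xs[g_lo:n - g_hi]
    wins = Xs.copy()
    wins[:g_lo] = Xs[g_lo]                          # lower end -> first retained row
    wins[n - g_hi:] = Xs[n - g_hi - 1]              # upper end -> last retained row
    return retained, wins

def spearman_corr(X):
    """Spearman correlation matrix via column ranks (assumes no ties)."""
    ranks = np.argsort(np.argsort(X, axis=0), axis=0).astype(float)
    return np.corrcoef(ranks, rowvar=False)

def trimmed_estimates(X, d, lower=0.2, upper=0.2):
    """Trimmed mean vector and Spearman-based trimmed covariance matrix."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    retained, wins = trim_and_winsorize(X, d, lower, upper)
    n_t = retained.shape[0]
    xbar_t = retained.mean(axis=0)                  # trimmed mean vector
    S_w = np.cov(wins, rowvar=False)                # winsorized covariance matrix
    s_t = np.sqrt((n - 1) * np.diag(S_w) / (n_t - 1))  # assumed trimmed std devs, step (a)
    S_t = spearman_corr(X) * np.outer(s_t, s_t)     # steps (b)-(c): corr_S * s_t(X_j) * s_t(X_g)
    return xbar_t, S_t

def modified_t2(x_new, xbar_t, S_t):
    """Modified T^2 of each new row using the trimmed estimators."""
    D = np.atleast_2d(np.asarray(x_new, dtype=float)) - xbar_t
    return np.einsum("ij,jk,ik->i", D, np.linalg.inv(S_t), D)

rng = np.random.default_rng(3)
X = rng.standard_normal((50, 2))
d = np.sum((X - np.median(X, axis=0)) ** 2, axis=1)   # placeholder for the modified distance
retained, wins = trim_and_winsorize(X, d)
xbar_t, S_t = trimmed_estimates(X, d)
t2_new = modified_t2([[0.0, 0.0], [6.0, 6.0]], xbar_t, S_t)
```

Building $\mathbf{S}_{tQn}$ as a Spearman correlation matrix scaled by positive standard deviations keeps it positive semidefinite, which is one practical reason for the step (b)-(c) construction.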

Independent and dependent variables
This study considers independent and dependent variables, referred to as Case A and Case B, respectively. Outliers are generated from the contaminated (mixture) normal model
$$ (1-\varepsilon)\,N_p(\boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0) + \varepsilon\,N_p(\boldsymbol{\mu}_1, \boldsymbol{\Sigma}_1) $$
where ε is the proportion of outliers, $N_p(\boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0)$ is the in-control distribution with in-control parameters $\boldsymbol{\mu}_0$ and $\boldsymbol{\Sigma}_0$, and $N_p(\boldsymbol{\mu}_1, \boldsymbol{\Sigma}_1)$ is the out-of-control distribution with out-of-control parameters $\boldsymbol{\mu}_1$ and $\boldsymbol{\Sigma}_1$. Two cases are considered: (A), in which the variables are independent, and (B), in which they are dependent. The model for each case is given below.

Case (A):
$$ (1-\varepsilon)\,N_p(\mathbf{0}, \mathbf{I}_p) + \varepsilon\,N_p(\boldsymbol{\mu}_1, \mathbf{I}_p) $$
Without loss of generality, the in-control mean vector $\boldsymbol{\mu}_0$ is 0, and the variance-covariance matrix of both the in-control and out-of-control distributions is the identity matrix $\mathbf{I}_p$. As indicated in [22,23], this is a homogeneous variance-covariance matrix with 1 on the main diagonal and 0 elsewhere, which implies no correlation among the variables. The out-of-control parameter $\boldsymbol{\mu}_1$ depends on the non-centrality parameter
$$ \delta = \sqrt{(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_0)^\top \boldsymbol{\Sigma}_0^{-1} (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_0)} = \sqrt{\boldsymbol{\mu}_1^\top \boldsymbol{\mu}_1} $$
where $\boldsymbol{\mu}_1$ is the vector representing the shift in the mean vector. A larger non-centrality parameter corresponds to more extreme outliers. Following [7,24,25], δ takes the values 0 (no shift), 3 and 5, the latter producing more extreme outliers.
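Case (A) can be simulated with a short sketch like the one below. The text fixes only the length of the shift, so placing $\boldsymbol{\mu}_1$ along (1, …, 1) and contaminating exactly round(εn) observations are our assumptions; `case_a_sample` is a hypothetical helper name.

```python
import numpy as np

def case_a_sample(n, p, eps, delta, rng=None):
    """Case (A): (1 - eps) N_p(0, I_p) + eps N_p(mu1, I_p) with independent variables.

    Assumptions: exactly round(eps * n) rows are outliers, and mu1 points along
    (1, ..., 1) with length delta, so that sqrt(mu1^T mu1) = delta.
    Returns the sample and a boolean mask of the outlying rows.
    """
    rng = np.random.default_rng() if rng is None else rng
    m = int(round(eps * n))
    mu1 = np.full(p, delta / np.sqrt(p))            # shift vector with norm delta
    X = rng.standard_normal((n, p))                 # in-control part: N_p(0, I_p)
    X[:m] += mu1                                    # first m rows follow N_p(mu1, I_p)
    is_outlier = np.zeros(n, dtype=bool)
    is_outlier[:m] = True
    return X, is_outlier

X, is_out = case_a_sample(n=50, p=2, eps=0.1, delta=5.0, rng=np.random.default_rng(0))
```

With p = 2 and δ = 5, each coordinate of the shift is 5/√2 ≈ 3.54.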

Case (B):
$$ (1-\varepsilon)\,N_p(\mathbf{0}, \boldsymbol{\Sigma}_0) + \varepsilon\,N_p(\boldsymbol{\mu}_1, \boldsymbol{\Sigma}_0) $$
Without loss of generality, the in-control mean vector $\boldsymbol{\mu}_0$ is 0, and the variance-covariance matrices of the in-control and out-of-control distributions are equal (i.e. $\boldsymbol{\Sigma}_1 = \boldsymbol{\Sigma}_0$). $\boldsymbol{\Sigma}_0$ is a homogeneous p × p variance-covariance matrix with a high level of correlation between the variables: the elements of its main diagonal are 1 and all other elements are 0.9 [23,26]. The out-of-control parameter $\boldsymbol{\mu}_1$ takes the values 0 (no shift) and 5 (when there are …) [27]. Whenever the Hotelling's T² statistic is out of control, this indicates that the correlation among the variables has changed. Because case (B) has a high correlation between the variables, it can be used to measure whether a change in this correlation affects the probability of a Type I error and the probability of detecting outliers for the modified Hotelling's T² statistic.
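For case (B), $\boldsymbol{\Sigma}_0$ has 1 on the diagonal and 0.9 elsewhere. The sketch below builds $\boldsymbol{\Sigma}_0$ and, assuming that the shift size is defined through $\boldsymbol{\mu}_1^\top \boldsymbol{\Sigma}_0^{-1} \boldsymbol{\mu}_1 = \delta^2$ and that the shift again points along (1, …, 1), scales $\boldsymbol{\mu}_1$ accordingly. The helper names are hypothetical.

```python
import numpy as np

def case_b_sigma(p, rho=0.9):
    """Sigma_0 for Case (B): 1 on the main diagonal, rho everywhere else."""
    return np.full((p, p), rho) + (1.0 - rho) * np.eye(p)

def shift_for_delta(sigma, delta):
    """Shift along (1, ..., 1) scaled so that mu1^T Sigma^{-1} mu1 = delta**2 (assumed)."""
    ones = np.ones(sigma.shape[0])
    q = ones @ np.linalg.solve(sigma, ones)         # 1^T Sigma^{-1} 1
    return ones * delta / np.sqrt(q)

def case_b_sample(n, p, eps, delta, rho=0.9, rng=None):
    """(1 - eps) N_p(0, Sigma_0) + eps N_p(mu1, Sigma_0), exactly round(eps * n) outliers."""
    rng = np.random.default_rng() if rng is None else rng
    sigma = case_b_sigma(p, rho)
    m = int(round(eps * n))
    X = rng.multivariate_normal(np.zeros(p), sigma, size=n)
    X[:m] += shift_for_delta(sigma, delta)          # first m rows are shifted
    return X

sigma = case_b_sigma(2)
mu1 = shift_for_delta(sigma, 5.0)
X = case_b_sample(50, 2, eps=0.1, delta=5.0, rng=np.random.default_rng(0))
```

With p = 2 and ρ = 0.9, $\mathbf{1}^\top\boldsymbol{\Sigma}_0^{-1}\mathbf{1} = 2/1.9$, so δ = 5 under this definition requires a shift of about 4.87 per coordinate, compared with about 3.54 under case (A).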

Upper control limits (UCL)
Since the distribution of the modified Hotelling's T² statistic is unknown when the sample size is small, the upper control limits are calculated by simulation in two phases. In Phase I, bootstrap data sets are generated from the standard normal distribution $N_p(\mathbf{0}, \mathbf{I}_p)$, and the traditional and robust estimators are calculated from them. In Phase II, a new additional observation is generated from $N_p(\mathbf{0}, \mathbf{I}_p)$. This is repeated 5000 times, and the modified Hotelling's T² statistic is computed for each of the 5000 new observations. The 95th percentile of these values is taken as the UCL.
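The UCL simulation can be sketched as below. The estimator is pluggable through `fit`; `classical_fit` is only a placeholder for the trimmed mean and trimmed covariance matrix, regenerating the Phase I data in every repetition is an assumption, and the example uses 2000 repetitions instead of 5000 to keep it fast.

```python
import numpy as np

def classical_fit(X):
    """Placeholder estimator; swap in the trimmed mean and trimmed covariance matrix."""
    return X.mean(axis=0), np.cov(X, rowvar=False)

def t2(x, center, cov):
    """T^2 of one observation for a given center and covariance matrix."""
    d = x - center
    return float(d @ np.linalg.solve(cov, d))

def simulate_ucl(n, p, fit=classical_fit, reps=5000, alpha=0.05, rng=None):
    """UCL as the (1 - alpha) percentile of T^2 over reps new in-control observations.

    Phase I: n rows from N_p(0, I_p) are bootstrapped and passed to fit
    (regenerating Phase I in every repetition is an assumption).
    Phase II: one new N_p(0, I_p) observation is scored against the fitted estimates.
    """
    rng = np.random.default_rng() if rng is None else rng
    stats = np.empty(reps)
    for r in range(reps):
        X = rng.standard_normal((n, p))
        Xb = X[rng.integers(0, n, size=n)]          # bootstrap individual observations
        center, cov = fit(Xb)
        stats[r] = t2(rng.standard_normal(p), center, cov)
    return float(np.percentile(stats, 100 * (1 - alpha)))

ucl = simulate_ucl(n=30, p=2, reps=2000, rng=np.random.default_rng(42))
```

Passing a robust `fit` function that returns the trimmed mean vector and trimmed covariance matrix turns this into the UCL for the modified chart without changing the simulation loop.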
The simulation for computing the false alarm rate and the probability of detecting outliers also proceeds in two phases. In Phase I, the bootstrap method is applied to individual observations generated from $N_p(\mathbf{0}, \mathbf{I}_p)$, and outliers are then added according to case (A) or case (B) (independent or dependent variables); the traditional and robust estimators are computed in this phase. In Phase II, the false alarm rate and the probability of detecting outliers are calculated: a false alarm is counted when a new observation generated from the in-control distribution exceeds the UCL, and a detection is counted when a new observation generated from the out-of-control distribution exceeds the UCL.
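A matching sketch of the Phase II evaluation for case (A) follows. As before, `classical_fit`, the shift direction, and the fixed number of contaminated rows are assumptions, and the UCL value in the example is a placeholder that would normally come from the UCL simulation.

```python
import numpy as np

def classical_fit(X):
    """Placeholder estimator; swap in the trimmed mean and trimmed covariance matrix."""
    return X.mean(axis=0), np.cov(X, rowvar=False)

def t2(x, center, cov):
    """T^2 of one observation for a given center and covariance matrix."""
    d = x - center
    return float(d @ np.linalg.solve(cov, d))

def alarm_rates(n, p, eps, delta, ucl, fit=classical_fit, reps=2000, rng=None):
    """Estimate (false alarm rate, probability of detecting outliers) for Case (A).

    Phase I: n rows from N_p(0, I_p), round(eps * n) of them shifted by mu1,
    bootstrapped and passed to fit. Phase II: one new in-control and one new
    out-of-control observation per repetition, each compared with the UCL.
    """
    rng = np.random.default_rng() if rng is None else rng
    mu1 = np.full(p, delta / np.sqrt(p))            # assumed shift direction, norm delta
    m = int(round(eps * n))
    false_alarms = detections = 0
    for _ in range(reps):
        X = rng.standard_normal((n, p))
        X[:m] += mu1                                # contaminate Phase I
        Xb = X[rng.integers(0, n, size=n)]          # bootstrap individual observations
        center, cov = fit(Xb)
        false_alarms += t2(rng.standard_normal(p), center, cov) > ucl
        detections += t2(rng.standard_normal(p) + mu1, center, cov) > ucl
    return false_alarms / reps, detections / reps

fa, p_detect = alarm_rates(n=30, p=2, eps=0.1, delta=5.0, ucl=10.0, reps=1000,
                           rng=np.random.default_rng(5))
```

Running the same function with the classical and the robust `fit` on identical random seeds gives a paired comparison of the two charts.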

RESULTS AND DISCUSSION
Table 1 reports the false alarm rates and the probabilities of detecting outliers for the new modified robust chart. In case (A), with bivariate variables and significance level α = 0.05, the false alarm rates of the robust chart in the presence of outliers are better than those of the traditional Hotelling's T² chart. In general, the false alarm rates are under control when the percentage of outliers is ε = 0.1. As the sample size increases, the false alarm rates become conservative when the percentage of outliers is ε = 0.2. The probability of detecting outliers is also higher for the robust Hotelling's T² control chart than for the traditional chart. In particular, when ε = 0.2, there is a large difference in detection probability between the robust and traditional charts, in favor of the robust chart. Furthermore, the detection probability of the robust chart increases with the sample size, reaching 100% when the sample size reaches 100. In contrast, the detection probability of the traditional chart decreases as the sample size increases. It can therefore be concluded that the modified robust Hotelling's T² control chart outperforms the traditional chart, particularly in detecting outliers.
For case (B), with dependent variables, the robust chart performs better than the traditional chart in terms of both false alarms and the probability of detection. The detection probabilities are notably better when the percentage of outliers is ε = 0.1, although the false alarm rates in this case are not reasonable. Increasing the sample size affects the probability of detection but not the false alarm rates. This suggests that the modified robust Hotelling's T² control chart is better than the traditional chart, particularly in detecting outliers.
Figures 1-4 show the results for case A (independent variables) with p = 2 characteristics. They show that the new chart detects outliers better than the conventional chart, with particularly strong performance at sample sizes 30 and 40. For the dependent variables (case B), Figures 5 and 6 confirm the superiority of the new chart over the conventional one, indicating that the new chart is appropriate for this case.

Empirical case study
The results of Vargas [28] were used for comparison: the findings of the proposed technique were compared with those of the conventional and adjusted control charts. The data contain two random variables, X1 and X2, measured on 30 different products from a production process; Vargas [28] took these two variables from the Quesenberry dataset. Table 2 lists the observations of these variables together with the values of the new Hotelling's T² statistic and the conventional T² chart statistic. The UCL of the robust charts was obtained by simulation and set at 9.4787 for α = 0.05. The conventional chart detected only one outlier (the 2nd observation), whereas the robust charts detected five outliers (the 2nd, 5th, 14th, 17th and 20th observations).

CONCLUSION
In this study, the trimmed mean and the trimmed covariance matrix were used as the location vector and the scale covariance matrix, respectively, of the Hotelling's T² chart. The modified and conventional charts were compared in terms of false alarms and their ability to detect outliers under two cases: case (A), with independent variables, and case (B), with dependent variables. The simulation results show that the modified chart controlled the false alarm rate under most conditions; however, this ability decreased as the shift in the mean vector µ and the proportion of outliers increased. The robust chart showed a higher probability of detecting outliers than the conventional T² chart.