# What is high leverage

## One-way ANCOVA: Finding outliers

Outliers are another possible source of **distortion** in statistical analyses, and most methods are not, or only slightly, robust to outliers in the data set. A single outlier can turn a significant result non-significant, or vice versa. You can easily check this yourself by quadrupling one value in the example data set: the effect is immediately reflected in the significance and effect sizes of the ANCOVA.
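To illustrate this point with a small, purely hypothetical simulation (invented data, not the example data set itself), here is a sketch in Python showing how inflating a single value shifts a test's *p*-value:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Two invented groups with a small true mean difference
a = rng.normal(10.0, 2.0, 30)
b = rng.normal(11.0, 2.0, 30)

p_before = stats.ttest_ind(a, b).pvalue

# Quadruple a single value in group a, as suggested above
a_out = a.copy()
a_out[0] *= 4

p_after = stats.ttest_ind(a_out, b).pvalue
print(p_before, p_after)
```

The exact values depend on the random seed; the point is only that one manipulated case is enough to change the *p*-value noticeably.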

We will check for outliers here using two different methods: **leverage values** and **Cook's distances**.

### Leverage values

The leverage value is a measure of how far the value of an independent variable lies from the other values. A high leverage means that no other cases lie near this case; it could be an outlier. Leverage values range between 0 and 1, where 0 means the case has no influence on the prediction and 1 means the prediction is completely determined by this one case.
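Behind the scenes, leverage values are the diagonal elements of the hat matrix \(H = X(X^TX)^{-1}X^T\), where \(X\) is the design matrix. A minimal numpy sketch with one invented covariate (the data and the extreme case are hypothetical, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
covariate = rng.normal(50, 10, 20)
covariate[0] = 120.0  # one hypothetical case far from the others

# Design matrix: intercept plus the covariate
X = np.column_stack([np.ones_like(covariate), covariate])

# Hat matrix H = X (X'X)^{-1} X'; its diagonal holds the leverage values
H = X @ np.linalg.inv(X.T @ X) @ X.T
leverage = np.diag(H)

# Each leverage lies between 0 and 1; their sum equals the number of
# columns of X (here 2). The far-out case gets the largest leverage.
print(leverage.max(), leverage.sum())
```

The same quantities are what SPSS saves as LEV_1; this sketch only shows where they come from.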

There are various formulas and cut-offs for deciding when a leverage value is large enough to be classified as an outlier. Many of them depend on the number of groups *k*, the number of covariates *c*, and the number of cases *n*. From *k* and *c*, the value *p* is calculated as *p* = *k* − 1 + *c*. With one covariate and three groups, our *p* is therefore 3.

- Huber (1981) recommends a general cut-off value of .2, regardless of other parameters
- Igo (2010) recommends the formula \(\frac{2 \cdot p}{n}\) for reasonably large data sets with *n* − *p* > 50
- Velleman & Welsch (1981), however, recommend \(\frac{3 \cdot p}{n}\) for *p* > 6 and *n* − *p* > 12
- Hoaglin & Welsch (1978) recommend \(2 \cdot \frac{p + 1}{n}\) as a rule of thumb for "large leverage values"

Now we are spoiled for choice. Our sample data set has *p* = 3 and *n* = 145 cases. According to Igo (2010), a leverage value of .0414 or greater would mark an outlier; according to Hoaglin & Welsch (1978), .0552; and according to Huber (1981), regardless of all other parameters, .2.
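The arithmetic behind these numbers can be checked directly; the values below just follow the formulas above for *p* = 3 and *n* = 145:

```python
# Cut-offs for classifying leverage values as outliers, for p = 3, n = 145
p, n = 3, 145

huber = 0.2                       # Huber (1981): fixed cut-off
igo = 2 * p / n                   # Igo (2010): for n - p > 50
velleman_welsch = 3 * p / n       # Velleman & Welsch (1981): p > 6, n - p > 12
hoaglin_welsch = 2 * (p + 1) / n  # Hoaglin & Welsch (1978): rule of thumb

print(round(igo, 4), round(hoaglin_welsch, 4), huber)
```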

Leverage values are checked by sorting the corresponding column in the SPSS Data View in descending order. To do this, we go to the Data View, right-click on the column LEV_1, and sort it in descending order, as in the video below.

Afterwards, the largest values of LEV_1 appear at the top:

The first value (.05816) counts as an outlier according to both Igo (2010) and Hoaglin & Welsch (1978). Here we could consider whether or not to exclude this observation from further analysis. According to Igo (2010), the first 13 cases would in fact all be outliers.

### Cook's distances

Cook's distance is likewise a measure of the influence that a single case has on the entire model: it measures how much the regression line would change if the case were excluded. In general, values greater than 1 are considered outliers and should be investigated more closely.
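As a sketch of what SPSS computes here: Cook's distance combines a case's residual and its leverage. One common closed form is \(D_i = \frac{e_i^2}{k \cdot MSE} \cdot \frac{h_{ii}}{(1 - h_{ii})^2}\), where \(k\) is the number of model parameters. With invented data (all values hypothetical, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0, 1, 30)
y = 2 * x + rng.normal(0, 1, 30)
y[0] += 8.0  # one hypothetical influential case

# Fit a simple linear model by least squares
X = np.column_stack([np.ones_like(x), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta

n, k = X.shape
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)  # leverage values
mse = resid @ resid / (n - k)

# Cook's distance: how much the fit shifts if case i is dropped
cooks_d = (resid**2 / (k * mse)) * (h / (1 - h) ** 2)
print(cooks_d.max())
```

The manipulated case ends up with by far the largest distance, which is exactly the pattern one looks for when sorting COO_1.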

The check proceeds similarly to before: we sort the variable COO_1 in descending order:

After we've sorted, our dataset would look like this:

The highest value here is .06 and thus far below the cut-off criterion of 1.

### What to do if...

If we have outliers in our data set, we can consider whether to exclude them from further data analysis. It is advisable to exclude the values and then run the analysis again; this often improves statistics such as the *p*-value or the explained variance, which we will discuss later.

Since there are several ways to classify outliers, the methods should be **used in combination**. A value is most likely to be an outlier when multiple procedures identify it as such.

We would generally recommend caution when excluding cases. Every exclusion is an intervention in the data that should be carefully considered. If several cases emerge as outliers, it should also be checked whether a **systematic pattern** lies behind them. It can often happen that "outliers" cluster on another variable, e.g. participants in a visual experiment who could not see the stimulus material. If further variables have been recorded, it can be worthwhile to get to the bottom of a possible cause.

As always when excluding cases from further data analysis, the following applies: Document and report everything! If we exclude cases, this must be stated and justified in the paper.

### Bibliography

- Huber, P. J. (1981). *Robust statistics*. New York: John Wiley.
- Igo, R. P. (2010). Influential Data Points. In N. J. Salkind (Ed.), *Encyclopedia of Research Design* (Vol. 2, pp. 600–602). Los Angeles: Sage.
- Velleman, P. F., & Welsch, R. E. (1981). Efficient Computing of Regression Diagnostics. *The American Statistician*, *35*(4), 234. doi: 10.2307/2683296
- Hoaglin, D. C., & Welsch, R. E. (1978). The Hat Matrix in Regression and ANOVA. *The American Statistician*, *32*(1), 17–22. doi: 10.1080/00031305.1978.10479237
