What is the purpose of standardizing data?

z transformation

Author: Hans Lohninger

One often faces the problem of making sample values comparable, and it is usually important to characterize individual values of a sample with respect to their position in the sample distribution. A frequently used tool for this is the so-called z-transformation, in which the values of a sample are converted into z-values (z-scores):

$$ z_i = \frac{x_i - \bar{x}}{s} $$

with

$z_i$ ... z-transformed sample values
$x_i$ ... original values of the sample
$\bar{x}$ ... mean of the sample
$s$ ... standard deviation of the sample
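As a minimal sketch of the z-transformation $z_i = (x_i - \bar{x})/s$, the Python function below standardizes a list of values. It assumes that $s$ is the sample standard deviation with divisor $n - 1$ (as computed by Python's `statistics.stdev`); the function name `z_transform` and the sample data are illustrative only.

```python
from statistics import mean, stdev

def z_transform(values):
    """Return the z-scores z_i = (x_i - mean) / s of a sample.

    Assumes s is the sample standard deviation with divisor n - 1,
    as computed by statistics.stdev.
    """
    xs = list(values)
    m = mean(xs)
    s = stdev(xs)  # raises StatisticsError for fewer than two values
    if s == 0:
        raise ValueError("standard deviation is zero; z-scores are undefined")
    return [(x - m) / s for x in xs]

sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # illustrative data, mean 5
print(z_transform(sample))
```

A constant sample has $s = 0$, so its z-values are undefined; the sketch raises an error in that case instead of dividing by zero.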

The z-transformation is also called standardization or autoscaling. Z-transformed values can be compared mainly because, after the transformation, the sample values are no longer expressed in the original units of measurement but in multiples of the standard deviation of the sample. In addition, the mean of the z-values is always zero and their standard deviation is always one. If the original values are normally distributed, the z-values follow the standard normal distribution (μ = 0, σ = 1).
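The claim that z-values no longer carry the original unit can be checked directly. The sketch below uses hypothetical height data recorded once in centimetres and once in metres, and assumes the sample standard deviation (divisor $n - 1$). Both versions should yield the same z-values, with mean 0 and standard deviation 1, up to floating-point rounding.

```python
from statistics import mean, stdev

def z_transform(values):
    # z_i = (x_i - mean) / s, with s the sample standard deviation (n - 1)
    m, s = mean(values), stdev(values)
    return [(x - m) / s for x in values]

heights_cm = [162.0, 170.0, 175.0, 181.0, 190.0]  # hypothetical data
heights_m = [h / 100 for h in heights_cm]          # same data, other unit

z_cm = z_transform(heights_cm)
z_m = z_transform(heights_m)

# The change of unit cancels out: both lists agree up to rounding error.
print(all(abs(a - b) < 1e-9 for a, b in zip(z_cm, z_m)))
# Mean and standard deviation of the z-values (0 and 1 up to rounding).
print(mean(z_cm), stdev(z_cm))
```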

The following example shows the effect of standardizing data. Suppose we have two normal distributions, one with a mean of 10.0 and a standard deviation of 30.0 (top left), the other with a mean of 200 and a standard deviation of 20.0 (top right). Standardizing both data sets yields comparable distributions, since both z-transformed distributions have a mean of 0.0 and a standard deviation of 1.0 (bottom row).
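The two-distribution example can be reproduced numerically. The sketch below is an illustration under stated assumptions, not the code behind the original figure: it draws 10,000 values from each normal distribution with Python's `random.gauss` (sample size and seed chosen arbitrarily), then standardizes both samples using the sample standard deviation.

```python
import random
from statistics import mean, stdev

def z_transform(values):
    # z_i = (x_i - mean) / s, with s the sample standard deviation (n - 1)
    m, s = mean(values), stdev(values)
    return [(x - m) / s for x in values]

rng = random.Random(1)  # fixed seed so the run is reproducible
a = [rng.gauss(10.0, 30.0) for _ in range(10_000)]   # mean 10, SD 30
b = [rng.gauss(200.0, 20.0) for _ in range(10_000)]  # mean 200, SD 20

za, zb = z_transform(a), z_transform(b)

for name, raw, z in (("a", a, za), ("b", b, zb)):
    print(f"{name}: raw mean={mean(raw):8.2f} sd={stdev(raw):6.2f} | "
          f"z mean={mean(z):+.2e} sd={stdev(z):.6f}")
```

The raw samples differ widely in location and spread, while both standardized samples have mean 0 and standard deviation 1, apart from floating-point rounding.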

Note: One sometimes reads in the literature that z-values are normally distributed. That is wrong: the z-transformation does not change the shape of the distribution; it changes only the mean and the standard deviation. Figuratively speaking, the distribution is shifted along the x-axis and stretched or compressed. The z-values are therefore normally distributed only if the original values already were.
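One way to see that the shape is preserved is to standardize a clearly skewed sample and compare a shape measure before and after. The sketch below assumes exponentially distributed data (generated with `random.expovariate`) and uses the moment coefficient of skewness $g_1 = m_3 / m_2^{3/2}$ as the shape measure; the helper `skewness` is introduced here for illustration. Because the z-transformation only shifts and rescales, $g_1$ should be identical up to rounding, and the z-values remain strongly right-skewed rather than normal.

```python
import random
from statistics import mean, stdev

def z_transform(values):
    # z_i = (x_i - mean) / s, with s the sample standard deviation (n - 1)
    m, s = mean(values), stdev(values)
    return [(x - m) / s for x in values]

def skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (central moments / n)."""
    n = len(values)
    m = mean(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

rng = random.Random(7)
x = [rng.expovariate(1.0) for _ in range(5_000)]  # right-skewed sample
z = z_transform(x)

print(skewness(x), skewness(z))  # equal up to rounding; clearly nonzero
```

A normal distribution has $g_1 = 0$, so a clearly positive value for the z-values shows that standardization did not make them normal.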