What does different than zero mean statistically?

(Statistical) significance, significance test, significance level

In empirical social research, S. generally stands for statistical significance and relates to the problem of inferring from a (random) sample to the population. The result of a hypothesis test (the significance test) is called significant if it is plausible to assume that a theoretically expected relationship between characteristics, or difference between groups, that was found in the data cannot be explained solely by the uncertainty associated with sampling.

This assumption can never be proven with certainty, and within the framework of the established tests one cannot even say that the assumption is "probably" correct; still less is it possible to state the probability with which the relationship or difference found in the sample also holds in the population. Rather, the statistical test is based on the following reasoning:

  • We assume hypothetically that in truth there is no relationship / difference, or at least a different relationship / difference from the one observed. (This hypothetical "counter-assumption" to the relationship actually expected is called the null hypothesis.)
  • In many cases we can then state the probability with which particular sample results would occur if the null hypothesis were true.
  • If there is a result in the current sample that would be quite improbable under this hypothetical assumption, then we have some justification to assume that the null hypothesis does not apply.
  • Since the null hypothesis asserts the opposite of what we suspected, this also suggests that our assumption, the research hypothesis (often also referred to as the alternative hypothesis), could be correct.
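The reasoning in the list above can be made concrete with a minimal sketch in Python. It assumes a hypothetical sample of 20 respondents, 15 of whom prefer one of two options, and a null hypothesis that both options are equally likely in the population. The function names and the data are illustrative assumptions, not taken from the literature cited in this entry.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def prob_at_least_as_extreme(k_observed, n, p_null=0.5):
    """Probability, under the null hypothesis, of a count at least as far
    from the expected count n * p_null as the observed count."""
    expected = n * p_null
    distance = abs(k_observed - expected)
    return sum(
        binomial_pmf(k, n, p_null)
        for k in range(n + 1)
        if abs(k - expected) >= distance
    )

# Hypothetical sample: 15 of 20 respondents prefer option A.
p_extreme = prob_at_least_as_extreme(15, 20)
print(f"{p_extreme:.4f}")  # prints 0.0414
```

Under these assumptions, a result this far from an even split would occur in only about 4 percent of samples if the null hypothesis were true, which gives some justification for doubting the null hypothesis.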

The property "quite improbable" has to be quantified. As a rule, one chooses a probability of 0.05 (sometimes 0.01 or 0.001), i.e. 5 (or 1 or 0.1) percent. This quantity is known as the significance level. Sometimes the term probability of error is also used here, but the probability in question relates only to the Type I error, that is, the erroneous conclusion that the null hypothesis is false.
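The link between the significance level and the Type I error can be illustrated by simulation. The following sketch assumes normally distributed data with known standard deviation and a two-sided z-test; these choices, the seed and all names are assumptions made for the illustration. Because the null hypothesis is true in every simulated sample, each rejection is a Type I error.

```python
import random
from statistics import NormalDist, fmean

def type_one_error_rate(n_samples=10_000, n=30, alpha=0.05, seed=1):
    """Share of samples in which a true null hypothesis (mean = 0) is rejected."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    rejections = 0
    for _ in range(n_samples):
        # Draw a sample from a population in which the null hypothesis holds.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = fmean(sample) / (1.0 / n**0.5)  # z statistic with known sigma = 1
        if abs(z) > critical:
            rejections += 1
    return rejections / n_samples

rate = type_one_error_rate()
print(f"share of false rejections: {rate:.3f}")
```

The share of false rejections should come out close to the chosen level of 0.05, apart from simulation noise.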

In general, the S. test starts from a research hypothesis that concerns a relationship, a difference or an influence (for example in the form of a coefficient in a regression equation). This is set against a null hypothesis, which usually states that there is no relationship / difference / influence; however, null hypotheses are also conceivable which state that the relationship etc. does not exceed a certain magnitude. (The null hypothesis is often denoted H0; the research hypothesis, also known as the alternative hypothesis, is denoted H1.)
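The two kinds of null hypothesis mentioned above can be written compactly. In this sketch the symbol θ is an assumed placeholder for the population quantity of interest (for example a difference of means or a regression coefficient), and θ₀ is an assumed bound; neither symbol appears in the sources cited here.

```latex
% Usual case: no relationship / difference / influence in the population
H_0\colon \theta = 0 \quad \text{versus} \quad H_1\colon \theta \neq 0

% Alternative case: the quantity does not exceed a given magnitude \theta_0
H_0\colon \theta \le \theta_0 \quad \text{versus} \quad H_1\colon \theta > \theta_0
```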

A test statistic is then calculated from the data; which statistic is used depends on the research question and on the type of data available (especially their level of measurement). This test statistic is a random variable that, under the null hypothesis, follows a known probability distribution. It can therefore be used to state the probability of obtaining a result like the present sample result, or an even more extreme one (deviating even further from the null hypothesis), if the null hypothesis holds in the population. Depending on the selected significance level, there is a critical value for the test statistic that corresponds to the relevant quantile of the distribution of that random variable (e.g. if a significance level of 5 percent or 0.05 has been selected, the critical value separates the 95 percent of values that are most likely under the null hypothesis from the 5 percent that are least likely).
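As a minimal sketch of how a critical value corresponds to a quantile, assume that the test statistic follows a standard normal distribution under the null hypothesis (which holds only for some tests, e.g. a z-test) and that large values speak against the null hypothesis. The function names are illustrative.

```python
from statistics import NormalDist

# Assumed distribution of the test statistic under the null hypothesis.
standard_normal = NormalDist()

def upper_critical_value(alpha):
    """Quantile that cuts off the upper alpha share of the distribution."""
    return standard_normal.inv_cdf(1 - alpha)

def upper_tail_probability(z):
    """Probability of a test statistic at least as large as z under the null."""
    return 1 - standard_normal.cdf(z)

for alpha in (0.05, 0.01, 0.001):
    print(alpha, round(upper_critical_value(alpha), 3))
```

For a significance level of 0.05 the critical value is about 1.645: under the null hypothesis, 95 percent of the possible values of the test statistic lie below it and 5 percent above it.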

If the test statistic calculated from the data (where applicable: its absolute value) is greater than this critical value (one also says: it lies in the rejection region), the null hypothesis is rejected; otherwise it is retained (until further notice). A more precise distinction must be made here between one-sided and two-sided hypotheses. In the first case, the critical value typically separates (for a selected significance level of α percent) the lower 100 − α percent of the distribution from the upper α percent or, in the case of negative differences / correlations, the lower α percent from the upper 100 − α percent; in the second case there are two critical values, namely the α / 2 quantile and the 100 − (α / 2) quantile.
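The decision rule for one-sided and two-sided hypotheses can be sketched as follows. The sketch again assumes a test statistic that is standard normal under the null hypothesis; the labels "greater", "less" and "two-sided" are naming assumptions, and alpha is given as a proportion (0.05) rather than as a percentage.

```python
from statistics import NormalDist

def reject_null(z, alpha=0.05, alternative="two-sided"):
    """Return True if the test statistic z lies in the rejection region."""
    null_dist = NormalDist()
    if alternative == "two-sided":
        # Two critical values, symmetric around zero: alpha / 2 in each tail.
        return abs(z) > null_dist.inv_cdf(1 - alpha / 2)
    if alternative == "greater":
        # One-sided: the rejection region is the upper alpha share.
        return z > null_dist.inv_cdf(1 - alpha)
    if alternative == "less":
        # One-sided, negative direction: the lower alpha share.
        return z < null_dist.inv_cdf(alpha)
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")

z_observed = 1.8  # hypothetical test statistic
print(reject_null(z_observed, alternative="two-sided"))  # False
print(reject_null(z_observed, alternative="greater"))    # True
```

The same test statistic can thus lead to different decisions depending on whether the hypothesis was formulated as one-sided or two-sided, which is why the direction has to be fixed before looking at the data.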

In research practice one usually proceeds somewhat differently: one calculates the probability with which, if the null hypothesis is true, the observed sample result or a sample result even less compatible with the null hypothesis would be expected. This probability is usually referred to as the p-value. One can then say: a result is statistically significant if the p-value is lower than the significance level chosen beforehand. Often it is simply reported which of the usual significance levels the p-value falls below (e.g. one asterisk [*] for a p-value < 0.05, two asterisks for p < 0.01 and three for p < 0.001). In this way, readers can choose, within limits, which significance level they want to apply (reporting the p-value directly would, however, allow more options).
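A short sketch of the p-value approach and the asterisk convention described above. It assumes a two-sided test with a test statistic that is standard normal under the null hypothesis; the function names are illustrative.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """Probability, under the null hypothesis, of a test statistic at least
    as far from zero as the observed z."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def significance_stars(p):
    """Map a p-value to the conventional asterisk notation."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""

for z in (1.0, 2.2, 2.9, 3.6):  # hypothetical test statistics
    p = two_sided_p_value(z)
    print(f"z = {z:.1f}  p = {p:.4f} {significance_stars(p)}")
```

Reporting the p-value itself, as in the printed lines, lets readers apply any significance level they consider appropriate, whereas the asterisks only show which of the three conventional levels was reached.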

Frequently used S. tests are, for example, the t-test, the F-test of the analysis of variance, or the chi-square test for cross-tabulations.
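As one example of such a test, the following sketch computes the Pearson chi-square statistic for a 2×2 cross-tabulation, without continuity correction. The table is hypothetical, the function names are assumptions, and the p-value formula used here is valid only for one degree of freedom, which is the case for a 2×2 table.

```python
from math import erfc, sqrt

def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2

def p_value_df1(chi2):
    """Upper-tail probability of a chi-square distribution with 1 degree of freedom."""
    return erfc(sqrt(chi2 / 2))

# Hypothetical crosstab: rows = two groups, columns = yes / no.
crosstab = [[30, 20], [20, 30]]
chi2 = chi_square_2x2(crosstab)
p = p_value_df1(chi2)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")  # prints chi2 = 4.00, p = 0.0455
```

With these hypothetical counts the null hypothesis of no relationship between group and answer would be rejected at the 0.05 level but not at the 0.01 level.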

Whether or not a statistical test turns out significant depends not only on the significance level (the probability of error), but above all on the size of the sample. With increasing sample size, even small and substantively unimportant relationships or differences can turn out to be significant. A significant (test) result therefore cannot be equated with an important (research) result without closer examination.
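The dependence on sample size can be illustrated with a small calculation. The sketch assumes a z-test with known standard deviation and a fixed, small observed difference of 0.1 standard deviations; only the sample size changes. All numbers and names are illustrative assumptions.

```python
from statistics import NormalDist

def two_sided_p(mean_difference, sigma, n):
    """Two-sided p-value of a z-test for an observed mean difference."""
    z = mean_difference / (sigma / n**0.5)  # the standard error shrinks with n
    return 2 * (1 - NormalDist().cdf(abs(z)))

sample_sizes = (100, 400, 1600)
p_values = [two_sided_p(0.1, 1.0, n) for n in sample_sizes]
for n, p in zip(sample_sizes, p_values):
    print(f"n = {n:5d}  p = {p:.5f}")
```

The same small difference is far from significant with 100 cases, falls just below the 0.05 level with 400 cases, and is highly significant with 1600 cases, although its substantive importance has not changed.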

Note that the term “significance test” is not used entirely uniformly in the statistical literature. The explanation given here corresponds roughly to the usage of Fahrmeir et al. (1997) or Hartung et al. (1982); Kühnel & Krebs (2001), on the other hand, reserve the term for tests in which the null hypothesis states that there is no relationship (or difference) in the population, while the alternative hypothesis states that the relationship (or difference) is different from zero. This usage is probably the most widespread in statistics; it is also the one underlying the recurring debates about the meaning and purpose of significance tests (see Morrison et al. 1970 and Harlow et al. 1997). Finally, Daly et al. (1995) speak of a significance test when, on the basis of the calculated test statistic, the "empirical significance level" is reported, i.e. the probability of obtaining a test value of the calculated magnitude if the null hypothesis is true (testing against a significance level chosen in advance is referred to there as "fixed-level testing").

See also: Type 1 and 2 errors, inferential statistics, confidence interval.


  • Daly, F. / Hand, D. J. / Jones, M. C. / Lunn, A. D. / McConway, K. J.: Elements of Statistics. Harlow: The Open University / Addison-Wesley, 1995
  • Fahrmeir, Ludwig / Künstler, Rita / Pigeot, Iris / Tutz, Georg: Statistics. The way to data analysis. Berlin, Heidelberg, New York: Springer, 1997, further editions since then
  • Harlow, Lisa L. / Mulaik, Stanley A. / Steiger, James H. (Eds.): What if There Were No Significance Tests? Mahwah, New Jersey; London: Erlbaum, 1997
  • Hartung, Joachim / Elpelt, Bärbel / Klösener, Karl-Heinz: Statistics. Teaching and handbook of applied statistics. Munich, Vienna: Oldenbourg, numerous editions since 1982; the 14th edition appeared in 2005
  • Kühnel, Steffen-M. / Krebs, Dagmar: Statistics for the social sciences. Basics, methods, applications. Reinbek near Hamburg: Rowohlt, 2001
  • Morrison, Denton E. / Henkel, Ramon E. (Eds.): The Significance Test Controversy. Chicago: Aldine, 1970

© W. Ludwig-Mayerhofer, ILMES | Last update: 11 Nov 2016