Evaluation of experiment F1 (error distribution) with GnuPlot


Experiment F1 (Introductory lab course (2007), pages 1 and 2) deals with one cause of measurement uncertainty and its statistical distribution. To measure means to compare. A physical quantity, e.g. a length, is compared with a defined standard by means of a measuring device, e.g. a ruler. To simplify this comparison for different values of the measured quantity, the measuring device is almost always equipped with a scale on which the value can be read, either directly or with the help of a pointer. The reading process itself is one cause of the measurement uncertainty, regardless of whether the experimenter reads the scale personally or whether the reading is done by electronic aids (digital measuring devices).

Anyone can carry out experiment F1 on their own computer. This requires the program GnuPlot and the GnuPlot script F1.gnuplot. Entering the command

load "F1.gnuplot"

in the terminal window of GnuPlot starts the experiment.

In the graphics window that opens, a scale from 0 to 10 with subdivisions in steps of 0.5 is shown after a short introduction. A pointer is shown at a randomly selected position on this scale (left side of Figure 1). The position of the pointer should be estimated and noted with 2 decimal places. After 5 seconds, the true value of the pointer position, rounded to three decimal places, is displayed for another 4 seconds (right side of Figure 1). This value must also be noted. In this way a table of measured values with 100 pairs of values is created.

The further computer-aided evaluation can also be done with GnuPlot. The first step is to transfer this table to a file, e.g. "data.txt", using a simple text editor: each value pair on a separate line, the read value in the first column and the true value in the second column. The difference between the read value and the true value is the reading error, whose statistics are examined below. Entering the command

stats "data.txt" using ($1-$2)

in the terminal window of GnuPlot yields a large amount of information about the distribution of the reading error. If the warning "Can't read data file" appears instead, the command

cd "Path to the file data.txt"

changes to the directory in which the data file is saved. The current working directory can be queried with

pwd

The up and down arrow keys scroll through the list of previously entered commands, so that they can be edited and executed again.
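The summary reported by stats can be cross-checked outside GnuPlot. The following Python sketch computes the same kind of quantities for the reading error. The function name reading_error_stats and the four value pairs are hypothetical illustrations, not measured data, and the standard deviation is assumed to be the population form (division by n).

```python
import statistics

def reading_error_stats(pairs):
    """Summarize the reading error (read value minus true value)."""
    errors = [read - true for read, true in pairs]
    return {
        "records": len(errors),
        "min": min(errors),
        "max": max(errors),
        "mean": statistics.fmean(errors),
        # Assumption: population standard deviation (divides by n).
        "stddev": statistics.pstdev(errors),
    }

# Hypothetical value pairs (read value, true value), not measured data.
pairs = [(3.20, 3.184), (7.85, 7.902), (0.40, 0.371), (5.55, 5.561)]
print(reading_error_stats(pairs))
```

With a real data file, the pairs would be read line by line from "data.txt" instead of being typed in.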

The minimum and the maximum of the reading error and the number of data points are needed below.

print STATS_min
print STATS_max
ntotal = STATS_records

To display the histogram, the values for xmin and xmax are set manually so that all data points are included. In this example:

xmin = -0.26
xmax = 0.26

The values for the mean and the standard deviation can also be stored for later use.

mw = STATS_mean
std = STATS_stddev

The range from xmin to xmax is divided into a suitable number of equal intervals. Each interval should contain at least one data point. There are different recommendations for the number of intervals. The introductory script (2007), page 23, specifies a rule without further explanation. Many statistics textbooks refer to Sturges (1926), who suggests the interval width $h = R / (1 + \log_2 n)$, where $R$ is the range of the data values and $n$ the number of data points. Scott (1979) uses the standard deviation $s$ determined from the data to determine the interval width, $h = 3.49 \, s \, n^{-1/3}$. The number of intervals then follows from the interval width and the range of the data values. Here we continue with

ncolumn = 10
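The two recommendations can be compared numerically. The sketch below assumes the commonly quoted forms of the rules, h = R / (1 + log2 n) for Sturges and h = 3.49 s n^(-1/3) for Scott. The function names are hypothetical, the range 0.52 and n = 100 come from the example above, and the standard deviation 0.1 is an assumed placeholder.

```python
import math

def sturges_width(data_range, n):
    """Interval width after Sturges: range divided by (1 + log2 n)."""
    return data_range / (1 + math.log2(n))

def scott_width(stddev, n):
    """Interval width after Scott: 3.49 * s * n^(-1/3)."""
    return 3.49 * stddev * n ** (-1 / 3)

def interval_count(data_range, width):
    """Number of intervals needed to cover the range, rounded up."""
    return math.ceil(data_range / width)

data_range = 0.26 - (-0.26)   # xmax - xmin from the example
n = 100                       # number of value pairs
stddev = 0.1                  # assumed placeholder, not a measured value

print(interval_count(data_range, sturges_width(data_range, n)))
print(interval_count(data_range, scott_width(stddev, n)))
```

Both rules give a count of the same order as the 10 intervals used here, so the choice mainly affects how finely the histogram is resolved.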

The next step is to create the frequency table. Its first column contains the midpoint of the respective interval, its second column the number of data points that lie within the interval. The third column holds the cumulative frequency, the running sum of the counts. This table, which is saved in a second file, e.g. "histogrammdata.txt", can be created either manually or with GnuPlot. To do this, the next step is to calculate the interval width, define the file name and set the start value of the cumulative frequency to 0.

xdelta = (xmax-xmin) / ncolumn
set print 'histogrammdata.txt'
shn = 0

The already familiar command stats can be used to query the number of measured values contained in each interval. To do this, the range limits must be specified. Since only one column of values is processed, the difference between the read value and the true value, GnuPlot treats it as a column of y-values. In this case, the line number in the data file is regarded as the x-value. The compound command

do for [i=1:ncolumn] {
    xmax = xmin + xdelta
    set yrange [xmin:xmax]
    stats "data.txt" using ($1-$2) nooutput
    n = STATS_records
    shn = shn + n
    print sprintf("%.3f %3i %3i", (xmin+xmax)/2, n, shn)
    xmin = xmax
}

generates the file with the histogram data. Execution starts only after the final closing brace has been entered. The error message "All points out of range" means that an interval does not contain any data points. In this case the number of intervals should be reduced or the values of xmin and xmax changed slightly. Because the loop overwrites xmin, xmax and shn, they must be set to their start values again before the loop is repeated. Next, the file with the calculated histogram data is closed with the command

set print

The last value of the cumulative frequency should match the number of data points. This is easily checked with

print shn, ntotal

The definition of the y range limits must then be reset to the default behavior

unset yrange

and the axis labels defined.

set xlabel "measurement deviation"
set ylabel "absolute frequency"

The histogram can now be drawn.

plot "histogrammdata.txt" using 1:2 with boxes notitle
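The structure of the frequency table (interval midpoint, count, cumulative frequency) can also be reproduced outside GnuPlot. The following Python sketch is one possible way to do it. The function name frequency_table and the five reading errors are hypothetical, and half-open intervals are an assumption made here so that a value lying exactly on an interval boundary is counted only once.

```python
def frequency_table(errors, xmin, xmax, ncolumn):
    """Return rows (interval midpoint, count, cumulative count)."""
    xdelta = (xmax - xmin) / ncolumn
    counts = [0] * ncolumn
    for e in errors:
        if not xmin <= e <= xmax:
            continue  # outside the histogram range, not counted
        # Half-open intervals [lower, upper); the value xmax goes into the last one.
        index = min(int((e - xmin) / xdelta), ncolumn - 1)
        counts[index] += 1
    rows = []
    shn = 0  # cumulative frequency
    for i, n in enumerate(counts):
        shn += n
        midpoint = xmin + (i + 0.5) * xdelta
        rows.append((midpoint, n, shn))
    return rows

# Hypothetical reading errors, not measured data.
errors = [-0.20, -0.05, 0.01, 0.05, 0.26]
for midpoint, n, shn in frequency_table(errors, -0.26, 0.26, 4):
    print("%.3f %3i %3i" % (midpoint, n, shn))
```

The last cumulative value equals the number of data points whenever all values lie between xmin and xmax, which is the same consistency check as print shn, ntotal.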

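The height of the individual bars indicates the number of data points in the respective interval. The ratio $n_i / (n \, \Delta x)$ of the count $n_i$ in an interval to the product of the total number of data points $n$ and the interval width $\Delta x$ corresponds to the probability density function of the underlying statistical distribution, which would result for $n \to \infty$ and $\Delta x \to 0$. In very many cases this is the density function of the Gaussian distribution

$$ f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{1}{2} \left( \frac{x-\mu}{\sigma} \right)^2 \right) $$

with the mean $\mu$ and the standard deviation $\sigma$.

dg(my, sigma, x) = exp(-0.5*((x-my)/sigma)**2) / (sigma*sqrt(2*pi))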

With the values for the mean and the standard deviation determined from the data at the beginning, the density distribution matching the histogram can be drawn.

plot "histogrammdata.txt" using 1:2 with boxes notitle, \
xdelta*ntotal*dg(mw, std, x) with lines notitle

Probability density functions are always normalized, i.e. the area under the density curve has the value 1. The additional factor xdelta*ntotal is the total area of all bars in the histogram, so the area under the drawn density curve is just as large as the area of the histogram.
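The area argument can be checked numerically. The sketch below integrates the scaled density with a simple midpoint rule and compares the result with the histogram area xdelta * ntotal. The function area_under_scaled_density is hypothetical, xdelta = 0.052 and ntotal = 100 follow from the example values above, and the mean 0.0 and standard deviation 0.1 are assumed placeholders.

```python
import math

def dg(my, sigma, x):
    """Density of the Gaussian distribution, as defined in the GnuPlot session."""
    return math.exp(-0.5 * ((x - my) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def area_under_scaled_density(mw, std, xdelta, ntotal, steps=20000):
    """Midpoint-rule integral of xdelta*ntotal*dg over mw +/- 8 std."""
    lower = mw - 8 * std
    upper = mw + 8 * std
    h = (upper - lower) / steps
    total = sum(dg(mw, std, lower + (k + 0.5) * h) for k in range(steps)) * h
    return xdelta * ntotal * total

xdelta = (0.26 - (-0.26)) / 10   # interval width from the example
ntotal = 100                     # number of data points
histogram_area = xdelta * ntotal  # sum of all bar areas

print(histogram_area, area_under_scaled_density(0.0, 0.1, xdelta, ntotal))
```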

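The agreement between the distribution of the data points and a given probability distribution can be assessed much better with the aid of the distribution function. In general it is defined by

$$ F(x) = \int_{-\infty}^{x} f(t) \, dt $$

For the normal distribution, this results in

$$ F(x) = \frac{1}{\sigma \sqrt{2\pi}} \int_{-\infty}^{x} \exp\left( -\frac{1}{2} \left( \frac{t-\mu}{\sigma} \right)^2 \right) dt $$

With the substitution $z = (x-\mu)/\sigma$ this can be reduced to the distribution function $\Phi(z)$ of the standard normal distribution, the normal distribution with $\mu = 0$ and $\sigma = 1$.

If it is not available as a function of its own, the distribution function of the standard normal distribution can also be calculated with the help of the error function $\operatorname{erf}$.

$$ \Phi(z) = \frac{1}{2} \left( 1 + \operatorname{erf}\left( \frac{z}{\sqrt{2}} \right) \right) \qquad (1) $$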
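The relation between the distribution function of the standard normal distribution and the error function can be verified numerically. The Python sketch below uses the standard form Phi(z) = (1 + erf(z / sqrt(2))) / 2 and compares it with the library implementation. The function names phi and normal_cdf are hypothetical.

```python
import math
from statistics import NormalDist

def phi(z):
    """Standard normal distribution function expressed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_cdf(x, mu, sigma):
    """Distribution function of a normal distribution via the substitution z = (x - mu) / sigma."""
    return phi((x - mu) / sigma)

for z in (-2.0, -1.0, 0.0, 1.0, 2.0):
    # Both columns should agree to within rounding error.
    print(z, phi(z), NormalDist().cdf(z))
```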

In GnuPlot the distribution function of the standard normal distribution is available as the function norm(x). This allows the calculated cumulative frequencies to be plotted together with the expected distribution function.

set yrange [0:105]
set ylabel "absolute cumulative frequency"
plot "histogrammdata.txt" using 1:3 notitle, \
ntotal*norm((x-mw)/std) with lines notitle

The deviations are easier to see in this representation (Figure 3). They would become even clearer if the y-axis of the graph were divided in such a way that the distribution function appeared as a straight line. This can be achieved by scaling with the inverse of the distribution function (equation 1).

$$ z = \Phi^{-1}(p) = \sqrt{2} \, \operatorname{erf}^{-1}(2p - 1) \qquad (2) $$

Here $\operatorname{erf}^{-1}$ is the inverse error function, the inverse function of the error function. The inverse distribution function is already defined in GnuPlot as invnorm(x). Applying it to the y-values of the graph displays the distribution function as a straight line.

unset yrange
set ylabel "sigma"
plot "histogrammdata.txt" using 1:(invnorm($3/ntotal)) notitle, \
((x-mw)/std) with lines notitle
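The transformation of the cumulative frequencies with the inverse distribution function can be sketched in Python as follows. The function name probit_points and the example table are hypothetical. Rows whose relative cumulative frequency is exactly 0 or 1 are skipped here, because the inverse distribution function is not finite at those values; this affects in particular the last row, whose cumulative frequency equals ntotal.

```python
from statistics import NormalDist

def probit_points(midpoints, cumulative, ntotal):
    """Map cumulative counts to multiples of sigma with the inverse normal distribution function."""
    std_normal = NormalDist()
    points = []
    for x, shn in zip(midpoints, cumulative):
        p = shn / ntotal  # relative cumulative frequency
        if 0.0 < p < 1.0:  # the inverse is not finite at 0 and 1
            points.append((x, std_normal.inv_cdf(p)))
    return points

# Hypothetical table columns, not measured data.
midpoints = [-0.1, 0.0, 0.1, 0.2]
cumulative = [16, 50, 84, 100]
for x, z in probit_points(midpoints, cumulative, 100):
    print("%.3f %.3f" % (x, z))
```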

In this representation (Figure 4) the y-axis is divided in multiples of $\sigma$. The values of the cumulative frequency can no longer be read directly. To change this, the right y-axis can be given a correspondingly scaled division. The commands

set yrange [-3.0:3.0]
set y2range [0:1.0]
set ytics nomirror
set link y2 via norm(y) inverse invnorm(y)
set y2tics 0.1 format "%.2f" nomirror
set y2tics add (0.01, 0.05, 0.95, 0.99)

link the scaling of the y2-axis to the y-axis via the distribution function. In addition, both axes should be labeled.

set ylabel "sigma"
set y2label "relative cumulative frequency"

To make it easier to read values from the graph, the command

set grid xtics y2tics

adds a coordinate grid. Printed on paper, this grid was called "probability paper" and was used in pre-computer times for the quick comparison of an empirically determined distribution with the normal distribution. Now everything can be drawn again together.

plot "histogrammdata.txt" using 1:(invnorm($3/ntotal)) notitle, \
((x-mw)/std) with lines notitle

The straight line drawn in Figures 4 and 5 was calculated from the values for the mean and the standard deviation. The question now arises whether there might be a straight line that fits the data points better. The command stats can be used again to answer it. This time it is applied to the histogram data, to the columns with the interval midpoints and the cumulative frequency. The cumulative frequencies are scaled with the inverse distribution function (equation 2).

stats "histogrammdata.txt" using 1:(invnorm($3/ntotal))

In addition to the statistical information on the data in the two columns, this also yields information on the statistical relationship between the two data columns, including the parameters a and b of the straight line that best describes an existing linear relationship. They can be stored with the commands

a = STATS_slope
b = STATS_intercept

The fitted straight line is defined as

y(x) = a*x + b

and the graph is redrawn with it (Figure 6).

plot "histogrammdata.txt" using 1:(invnorm($3/ntotal)) notitle, \
y(x) with lines notitle

From the two parameters a and b, the mean -b/a and the standard deviation 1/a of the associated normal distribution can be determined and output in the terminal window of GnuPlot with

print -b/a, 1/a

These values should agree with the values of mw and std

print mw, std

within the error intervals that are also included in the output of the command stats.
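The step from the line parameters to the parameters of the normal distribution follows from comparing y = a*x + b with y = (x - mean) / sigma, which gives a = 1/sigma and b = -mean/sigma. The Python sketch below illustrates this with an ordinary least-squares fit. The function names fit_line and normal_parameters are hypothetical, and the points are constructed to lie exactly on a line with mean 0.02 and standard deviation 0.1, so they are not measured data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares straight line y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

def normal_parameters(a, b):
    """Mean and standard deviation from slope a = 1/sigma and intercept b = -mean/sigma."""
    return -b / a, 1 / a

# Constructed points on the line y = (x - 0.02) / 0.1, not measured data.
xs = [-0.2, -0.1, 0.0, 0.1, 0.2]
ys = [(x - 0.02) / 0.1 for x in xs]
a, b = fit_line(xs, ys)
print(normal_parameters(a, b))
```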

Literature

Basic physics laboratory course: Introductory lab course
Faculty of Mathematics and Natural Sciences at the HUB
Institute for Physics, 2007
http://gpr.physik.hu-berlin.de/Skripten/Einfuehrungspraktikum/PDF-Dateien/Einfuehrungspraktikum.pdf
Retrieved on December 14, 2016, 9:49 a.m.

Basic physics laboratory course:
Introduction to the measurement, evaluation and presentation of experimental results in physics
Faculty of Mathematics and Natural Sciences at the HUB
Institute for Physics, 2007
http://gpr.physik.hu-berlin.de/Skripten/Einfuehrung/PDF-Datei/Einfuehrung.pdf
Retrieved on January 12, 2017, 11:57 a.m.

H.A. Sturges, The choice of a class interval.
Journal of the American Statistical Association 21, 65-66, 1926

D.W. Scott, On optimal and data-based histograms.
Biometrika 66, 605-610, 1979


Peter Schaefer 2018-01-30