A thought experiment

Consider the mean daily temperature in Berlin for a nice summer day in June. Let us say we measure a mean daily temperature of \(19^\circ C\). Now, the task is to estimate the mean temperature of tomorrow.

There are, of course, several more or less sophisticated approaches to this task. One may look up archived data and base the estimate on descriptive statistics, apply an elaborate model with a large number of parameters, or simply make a guess based on common sense. However, no matter which approach is chosen, the result will always be an estimate, associated with some level of uncertainty.

To reflect that uncertainty, we do not present a point estimate of tomorrow’s temperature but an interval estimate. To achieve a high accuracy, meaning that we want to be very confident that our interval contains the actual value, we apply a very large margin of error. For example, we state that tomorrow’s temperature is \(19 \pm 20^\circ C\). Based on common sense, we would probably agree that the mean daily temperature of any summer day in June in Berlin is very likely between \(-1\) and \(39^\circ C\).

Although the interval is very large and may even contain all mean daily temperatures in June in Berlin's weather observation history, there is still a small chance that we are wrong. Imagine a natural or man-made cataclysmic event such as a huge volcanic eruption, an asteroid impact or a nuclear war; in those fortunately very unlikely cases, even such a huge margin of error may not guarantee that tomorrow’s mean temperature is within the given interval. Nonetheless, it is important to note that in order to achieve a high accuracy we increase the confidence level and thus the width of the confidence interval.

We stated that tomorrow’s temperature is in the range of \(19 \pm 20^\circ C\). But is such a statement, despite the high accuracy, of any value? Does this estimate help us decide what clothes to wear tomorrow? No, not at all!

Thus, in many applications we are not interested in accuracy alone, but also in the precision of an estimate. A more precise guess at tomorrow’s temperature would be \(19 \pm 2^\circ C\). Such a prediction definitely helps us decide which clothes to wear; however, the chance that we are wrong is much higher.
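To put rough numbers on the two guesses, the following minimal Python sketch computes how likely each interval is to contain tomorrow's value. It rests on an assumption that is not part of the thought experiment: tomorrow's mean temperature is taken to deviate from today's \(19^\circ C\) by a normally distributed error with a standard deviation of \(3^\circ C\). The value `ASSUMED_SD` and the helper `coverage` are hypothetical and serve only as an illustration.

```python
from statistics import NormalDist

# Assumption (not from the text): the day-to-day change in mean temperature
# is normally distributed around 0 with a standard deviation of 3 degrees C.
ASSUMED_SD = 3.0


def coverage(margin, sd=ASSUMED_SD):
    """Probability that a normal error with the given sd lies within +/- margin."""
    return 2 * NormalDist(0.0, sd).cdf(margin) - 1


# Compare the precise guess (+/- 2) with the very wide guess (+/- 20).
for margin in (2, 20):
    print(f"19 +/- {margin:>2} deg C -> coverage probability {coverage(margin):.4f}")
```

Under this assumed error model, the narrow interval is right only about half of the time, whereas the wide interval is almost never wrong. A different assumed standard deviation changes the numbers but not the direction of the effect.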


Note: It is important to remember that by increasing the precision we narrow the confidence interval and thus decrease the confidence level. This trade-off between accuracy and precision when selecting a proper confidence level is very important in real-life applications.
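The trade-off in the note can be sketched numerically by going the other way: fix a confidence level and see how wide the interval has to be. The Python sketch below again assumes, purely for illustration, a normally distributed error with a standard deviation of \(3^\circ C\); `ASSUMED_SD` and `margin_of_error` are hypothetical names, not part of the original material.

```python
from statistics import NormalDist

# Assumption (not from the text): normally distributed error, sd = 3 degrees C.
ASSUMED_SD = 3.0


def margin_of_error(confidence, sd=ASSUMED_SD):
    """Half-width of a symmetric interval that covers `confidence` of a normal error."""
    # Two-sided interval: put (1 - confidence) / 2 of the probability in each tail.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sd


# A higher confidence level requires a wider interval.
for confidence in (0.50, 0.90, 0.95, 0.99, 0.999):
    margin = margin_of_error(confidence)
    print(f"confidence {confidence:.3f} -> 19 +/- {margin:.1f} deg C")
```

Each step up in confidence level widens the interval, and the widening accelerates as the level approaches 1. This is why a very confident statement tends to be an imprecise one.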


Citation

The E-Learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us by e-mail at soga[at]zedat.fu-berlin.de.

Creative Commons License
You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follows: Hartmann, K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.