‘Sizeless science’ and the cult of significance testing
"A typical situation is the following. A scientist formulates a null hypothesis. By means of asignificance test, she tries to falsify it. The analysis leads to a p-value, which indicates how likely it would have been, if the null hypothesis were true, to obtain data at least as extreme as those she actually got. If the p-value is below a certain prespecified threshold (typically 0.01 or 0.05), the result is deemed statistically significant, which, although far from constituting a definite disproof of the null hypothesis, counts as evidence against it. (...) The Cult of Statistical Significance is written in an entertaining and polemical style. Sometimes the authors push their position a bit far, such as when they ask themselves: “If nullhypothesis significance testing is as idiotic as we and its other critics have so long believed, how on earth has it survived?” (p. 240). Granted, the single-minded focus on statistical significance that they label sizeless science is bad practice. Still, to throw out the use of significance tests would be a mistake, considering how often it is a crucial tool for concluding with confidence that what we see really is a pattern, as opposed to just noise. For a data set to provide reasonable evidence of an important deviation from the null hypothesis, we typically need both statistical and subject-matter significance."
https://rwer.wordpress.com/2015/03/10/sizeless-science-and-the-cult-of-significance-testing/
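To make the quoted logic concrete, here is a minimal sketch of a null-hypothesis significance test, assuming made-up data and an ordinary one-sample t-test (none of the numbers come from the reviewed book):

```python
# Minimal sketch of the significance-test logic described above.
# The data are invented; the null hypothesis is that the population mean is 0.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=0.4, scale=1.0, size=30)  # hypothetical observations

# p-value: probability, under H0 (mean = 0), of data at least as extreme
# as the sample actually observed.
res = stats.ttest_1samp(sample, popmean=0.0)
print(f"t = {res.statistic:.2f}, p = {res.pvalue:.3f}")

# The conventional (and, per the review, easily abused) decision rule:
alpha = 0.05
print("statistically significant" if res.pvalue < alpha else "not significant")
```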
"How statistics skew research results (...) "Consider this statement – a change is statistically significant if we are unlikely to get the observed results, assuming the treatment under study actually has no effect. If you find that difficult to understand, you’re in good company. Statistical significance testing relies on weird backward logic, and there’s clear evidence that most students and many researchers don’t understand it.
Another problem is that statistical significance is very sensitive to how many people we observe. A small experiment studying only a few patients probably won’t identify even a large effect as statistically significant. On the other hand, a very large experiment is likely to label even a tiny, worthless effect as statistically significant. For this and other reasons, it’s far better to avoid statistical significance as a measure and use estimation, an alternative statistical approach that’s well known, but sadly, little used.
Estimation tells us things such as “the average reduction in pain was 1.2 ± 0.5 points on the 10-point pain scale” (1.2 plus or minus 0.5). That’s far more informative than any statement about significance. And we can interpret the 1.2 (the average improvement) in clinical terms — in terms of how patients actually felt.
The “± 0.5” tells us the precision of our estimate. Instead of 1.2 ± 0.5 we could write 0.7 to 1.7. Such a range is called a confidence interval. The usual convention is to report 95% confidence intervals, which mean we can be 95% confident the interval includes the true average reduction in pain. That’s a highly informative summary of the findings."
https://larspsyll.wordpress.com/2012/06/19/how-statistics-skew-research-results/
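As a rough illustration of the estimation approach the quote recommends, the sketch below computes a point estimate and a 95% confidence interval for a set of invented pain-reduction scores (the numbers only mimic the 1.2 ± 0.5 example; they are not from the original post):

```python
# Sketch of "estimation" reporting: point estimate plus 95% confidence interval.
# The pain-reduction data are fabricated for illustration only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
reductions = rng.normal(loc=1.2, scale=1.5, size=40)  # per-patient reductions

mean = reductions.mean()
sem = stats.sem(reductions)                                # standard error of the mean
margin = sem * stats.t.ppf(0.975, df=len(reductions) - 1)  # 95% margin of error

print(f"average reduction: {mean:.1f} ± {margin:.1f} points")
print(f"95% CI: {mean - margin:.1f} to {mean + margin:.1f}")
```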
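The earlier point about sample size can also be checked with a quick simulation: under plausible assumptions, the same t-test tends to miss a large effect in a tiny sample and to flag a negligible effect in a huge one (the effect sizes and sample sizes below are arbitrary choices):

```python
# Simulation of the sample-size sensitivity of significance testing.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

small_study = rng.normal(loc=1.0, scale=2.0, size=8)        # large true effect, n = 8
huge_study = rng.normal(loc=0.02, scale=1.0, size=100_000)  # tiny true effect, n = 100,000

for name, data in [("small study, large effect", small_study),
                   ("huge study, tiny effect", huge_study)]:
    res = stats.ttest_1samp(data, popmean=0.0)
    print(f"{name}: mean = {data.mean():.3f}, p = {res.pvalue:.4f}")
```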
Example: my estimation method gives me an effect of 2 jobs (effect = 2). I then test: if the null hypothesis (effect = 0) were true, what would be the probability of obtaining an estimate of 2 jobs, or one even further from zero, purely because of sampling error? In other words, what is the probability of drawing a sample with mean 2 when the true distribution has mean 0? That probability is the p-value. If it is below 5%, the data are judged unlikely under the null hypothesis, so I take this as evidence that my estimated effect reflects a real pattern rather than noise.
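A hedged sketch of that jobs example, using an invented sample of effect estimates centred near 2 (the spread and sample size are assumptions, chosen only to make the logic runnable):

```python
# Worked version of the jobs example: H0 is that the true effect is 0 jobs.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
effects = rng.normal(loc=2.0, scale=3.0, size=25)  # hypothetical effect estimates

res = stats.ttest_1samp(effects, popmean=0.0)
print(f"estimated effect: {effects.mean():.2f} jobs, p = {res.pvalue:.3f}")

# p < 0.05: a sample mean this far from 0 would be unlikely if the true
# effect were zero, so the estimate is conventionally called significant.
print("significant at 5%" if res.pvalue < 0.05 else "not significant at 5%")
```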