**I. (May 21) Introduction: Controversies in Phil Stat**

**Reading:**

**SIST\*: Preface, Excursion 1**

**Preface**

Excursion 1 Tour I

**Excursion 1 Tour II**

*Note: The above are from proofs; participants should have a copy of the book.*

**Notes:** NOTES on Excursion 1; Postcard

Please ask any questions from the First Meeting in the comments of this blog.

### **General Info Items:**

**–References:** Captain’s Bibliography

**–Souvenirs** from Meeting 1: Souvenir A, A Postcard to Send; Souvenir B, Likelihood versus Error Statistical; Souvenir C, A Severe Tester’s Translation Guide; Souvenir D, Why We Are So New

**–Summaries of 16 Tours (abstracts & keywords)**

**–Excerpts & Mementos on Error Statistics Philosophy Blog**

**Slides & Video Links for Meeting 1:**

**Slides:** (PDF)

**Intro video from July 28, 2019**

**(Viewing in full screen mode helps with buffering issues.)**

Prof. Mayo,

Thank you for preparing this interesting course. I am a statistics student and followed eagerly the first live lecture. Then I went to the book to pass through the concepts again. I have the following questions (in order of interest priority for me):

1. Is it fair to come away from this discussion thinking that our statistical methods only allow us to reject a null hypothesis but never confirm one? Does this extend to other sciences that depend on experimentation?

2. Your description of severe testing makes sense to me. Is passing a severe test the best thing that a proposed theory can hope for?

3. On page 38 you mention a very interesting example of the trick deck and explain how restricting all information to come from the likelihood will lead one to believe that the deck is tricked every time they condition on a random draw from the deck. This is very disturbing from a practical point of view. Is it fair to say that using prior knowledge is the only way to avoid this seeming paradox (at least in practical terms)?

4. On page 31 of your book you mention the problem of a phenomenon being explained equally well (or perfectly) by two competing theories. Is this dichotomy, in principle, solvable?

5. On page 35 you say “the Likelihoodist disagrees with the significance tester”. Can you explain the difference?

Thank you again – looking forward to the next class.

Konstantinos:

These are all great questions, and I will answer as many as I can as soon as possible.

Dear Professor Mayo,

I have a question!

On page 47 you write the following:

Armitage […] thinks stopping rules should be reflected in overall inferences. He goes further:

[Savage] remarked that, using conventional significance tests, if you go on long enough you can be sure of achieving any level of significance; does not the same sort of result happen with Bayesian methods? (ibid., p. 72)

He has in mind using a type of uniform prior probability for μ, wherein the posterior for the null hypothesis matches the significance level.

Indeed, on page 430, you seem to show, following Berger and Wolpert, that it is possible to design an experiment with the stopping rule ‘keep sampling until the 95% confidence interval excludes 0’ that is guaranteed to stop for the Bayesian too. In other words, it seems possible to trick the Bayesian into assigning a probability of 0.95 to an interval estimate for the parameter θ that does not include 0, despite θ = 0. But if this is right (i.e., if I understood this correctly!), then I have trouble understanding the following remark made by Sprenger (2013, 29):

The posterior probability of a hypothesis cannot be arbitrarily manipulated (Kadane et al. 1996). If we stop an experiment if and only if the posterior of a hypothesis exceeds a certain threshold, there will be a substantial chance that the experiment never terminates. It is therefore not possible to reason to a foregone conclusion with certainty by choosing a suitable stopping rule. Similar results that bound the probability of observing misleading evidence have been proved by Savage (1962) and Royall (2000).

Do you think Sprenger’s wrong here? Or perhaps you think he’s right, and that this does not contradict your point (i.e. that I’m misinterpreting what you wrote)?

Thank you, and looking forward to tomorrow’s seminar!

Thanks for your question. This is a new blog, and the comment system is rather mysterious. A quick reply:

See the discussion on p. 431 by Berger and Wolpert (in *The Likelihood Principle*). They grant the corresponding example with confidence intervals that are sure to omit 0 but, given the prior, do not really consider this “being misled”. Various ways to bound the error probabilities in Bayesian accounts are given; see, for example, Note 4 on p. 47. It will generally involve stipulating a fixed set of hypotheses and/or restricting or altering the prior. I believe that updating priors on each round also works. Jay Kadane’s ‘reasoning to a foregone conclusion’ gives stipulations to avoid this for a given hypothesis: https://errorstatistics.com/screen-shot-2020-05-27-at-12-02-45-pm/

I worked on this long ago with Michael Kruse.
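To make Savage’s remark concrete, here is a small simulation (my own illustration, not from SIST) of the “try and try again” stopping rule: observations are drawn under a true null, yet checking for significance after every observation pushes the chance of eventually declaring significance far beyond the nominal 5%.

```python
import math
import random

def stops_by(n_max, z_crit=1.96, seed=0):
    """Sample from N(0, 1) (so H0: mu = 0 is true) and stop as soon as
    the running z-statistic is 'significant' at the two-sided 5% level."""
    rng = random.Random(seed)
    total = 0.0
    for n in range(1, n_max + 1):
        total += rng.gauss(0.0, 1.0)
        if abs(total / math.sqrt(n)) > z_crit:
            return True   # declared "significant" despite a true null
    return False

# Proportion of simulated experiments reaching "significance" within
# 1000 observations; the nominal error rate is 0.05, but optional
# stopping inflates it severalfold.
rate = sum(stops_by(1000, seed=i) for i in range(2000)) / 2000
print(rate)
```

Letting `n_max` grow without bound, the law of the iterated logarithm guarantees the walk eventually crosses any fixed cutoff, which is the sense in which “if you go on long enough you can be sure of achieving any level of significance.”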

Dear Konstantinos:

These are some partial replies that I will get back to after our seminar tomorrow.

1. Is it fair to come away from this discussion thinking that our statistical methods only allow us to reject a null hypothesis but never confirm one? Does this extend to other sciences that depend on experimentation?

No. I do think we “corroborate” claims as severely tested. Most often this is with a one-sided test rather than a point hypothesis. Failing to find statistical significance – something we’ll discuss tomorrow – can minimally be construed as: these data do not provide evidence against H0. After repeated failure to demonstrate an effect with a test that is capable of revealing its presence, were it to exist, we are warranted to infer its absence – at least with existing data. However, we can also set upper bounds to discrepancies that have been well ruled out statistically. We can do this with confidence bound(s) or a severity assessment. Also see the example of an inference to a “null” effect (SIST p. 157) in the case of inferring the “equivalence principle” in physics. SIST contains much more on this topic, as will the next two sessions, so we can leave it until then.
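As a rough numerical sketch of the upper-bound reasoning above (the function name and the particular numbers are my own illustration; the setup is the one-sided Normal test of H0: μ ≤ 0): after a nonsignificant result, the severity with which the claim “μ ≤ μ1” passes is the probability of a result even larger than the one observed, computed under μ = μ1.

```python
from math import erf, sqrt

def phi(z):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def severity_upper_bound(xbar, mu1, sigma, n):
    # Severity for the claim "mu <= mu1" after observing sample mean
    # xbar in a one-sided Normal test of H0: mu <= 0 vs H1: mu > 0:
    #   SEV(mu <= mu1) = P(X-bar > xbar; mu = mu1)
    return phi((mu1 - xbar) / (sigma / sqrt(n)))

# Illustrative numbers: xbar = 0.2, sigma = 1, n = 100.
# The claim "mu <= 0.4" passes with high severity (about 0.977),
# so a discrepancy as large as 0.4 is well ruled out.
print(round(severity_upper_bound(0.2, 0.4, 1.0, 100), 3))
```

The same function shows why nearby discrepancies are not ruled out: with μ1 equal to the observed mean itself, the severity is only 0.5.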

2. Your description of severe testing makes sense to me. Is passing a severe test the best thing that a proposed theory can hope for?

No. It should also be informative and solve problems of interest. A claim may be severely tested but trivial.

3. On page 38 you mention a very interesting example of the trick deck and explain how restricting all information to come from the likelihood will lead one to believe that the deck is tricked every time they condition on a random draw from the deck. This is very disturbing from a practical point of view. Is it fair to say that using prior knowledge is the only way to avoid this seeming paradox (at least in practical terms)?

No. We can rule it out as passing a terribly insevere test. To be fair to the likelihoodists, however, they would not say you should believe it, even though it is best supported.
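For readers following along, the arithmetic behind the trick-deck example is just this (a toy calculation of my own, not from the book’s text): whatever card is drawn, the hypothesis “the deck is 52 copies of that card” makes the observation certain, versus probability 1/52 on a fair deck, so the likelihood ratio favoring trickery is 52 every single time.

```python
# Whatever card c happens to be drawn:
p_fair = 1 / 52    # fair deck: probability of drawing c
p_trick = 1.0      # "deck is 52 copies of c": drawing c is certain
likelihood_ratio = p_trick / p_fair
print(likelihood_ratio)  # 52.0, no matter which card was drawn
```

This is what makes the test maximally insevere: the tricked-deck hypothesis is constructed after the fact to fit whatever is observed, so it would be “best supported” regardless of the truth.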

I’ll get to 4 and 5 later on.

Thanks for the queries.