I noticed an interesting phenomenon while reading this paper about a gene therapy that was intended to treat advanced heart failure.

The paper (which documents a Phase I/II study) found an 82% reduction in the risk of cardiac events in the high-dose group vs. the placebo group, with a p-value of 0.048. Weakly significant, but a big effect size.

On the other hand, if you look at the raw numbers, you see that some people have a lot of cardiac events, and some have few or none. If you divide people into “healthy” vs. “unhealthy”, where “unhealthy” people have at least one cardiac event within a year and “healthy” people don’t, then the placebo group had 7 healthy and 7 unhealthy patients, while the high-dose group had 7 healthy and 2 unhealthy patients.

If you run a one-sided t-test on this split, you get a non-significant p-value of 0.07.
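The dichotomized comparison is easy to check directly. Here is a sketch using scipy; note that the exact p-value depends on which test you run (Fisher’s exact test and a two-sample t-test on the 0/1 outcomes give somewhat different numbers, so these won’t necessarily match the 0.07 above):

```python
from scipy import stats

# 2x2 table: rows are groups, columns are [unhealthy, healthy].
table = [[7, 7],   # placebo: 7 with >=1 cardiac event, 7 without
         [2, 7]]   # high dose: 2 with >=1 cardiac event, 7 without

# Fisher's exact test, one-sided (tests whether placebo patients
# are more likely to be unhealthy).
_, p_fisher = stats.fisher_exact(table, alternative="greater")

# One-sided two-sample t-test on the 0/1 outcomes.
placebo   = [1] * 7 + [0] * 7
high_dose = [1] * 2 + [0] * 7
_, p_t = stats.ttest_ind(placebo, high_dose, alternative="greater")

print(f"Fisher exact: p = {p_fisher:.3f}, t-test: p = {p_t:.3f}")
```

Either way, the dichotomized framing comes out non-significant at the 0.05 level, which is the point: same data, different framing, different verdict.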

And intuitively, it makes *sense* that 7 unhealthy patients out of 14 vs. 2 unhealthy patients out of 9 could very easily be a fluke.

How you frame the problem, what you consider to be an “event” in your probability space, matters. Do you count cardiac events? Mostly healthy vs. mostly unhealthy people? People with no cardiac events vs. people with any cardiac events? (the latter gives you p=0.089).

One way of framing it is that you posit some kind of hierarchical model. In this case, your risk of having a cardiac event is drawn from a probability distribution which is something like a mixture of two gamma distributions, one with a “low risk” parameter and one with a “high risk” parameter.

You could make a generative model to test the null hypothesis. Under the assumption that the therapy doesn’t work, you could randomly choose the size of the “high risk” vs. “low risk” population, and then for each patient, draw to see whether they’re high risk or low risk, and then draw again (repeatedly) from the appropriate gamma distribution to get their pattern of cardiac events. Sampling from this repeatedly gives you the distribution of outcomes under the null hypothesis, and hence the probability of seeing data at least as extreme as what was actually observed.
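A minimal sketch of this generative null test, assuming one concrete variant: each patient’s event rate is drawn from a two-cluster gamma mixture and their yearly count is Poisson given that rate. The per-patient event counts below are made up, since the post only reports the healthy/unhealthy split, and the rate parameters and the Beta prior on the mixture weight are likewise arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder per-patient yearly event counts (NOT the study's raw data).
placebo   = np.array([0, 0, 0, 0, 0, 0, 0, 3, 4, 5, 6, 8, 9, 12])
high_dose = np.array([0, 0, 0, 0, 0, 0, 0, 2, 5])
observed_diff = placebo.mean() - high_dose.mean()

def simulate_group(n, p_high, rng, low_rate=0.1, high_rate=5.0, shape=2.0):
    """One group's yearly event counts under the null: each patient is
    high- or low-risk, their underlying rate is gamma-distributed around
    the cluster mean, and their count is Poisson given that rate."""
    is_high = rng.random(n) < p_high
    mean_rate = np.where(is_high, high_rate, low_rate)
    rates = rng.gamma(shape, mean_rate / shape)
    return rng.poisson(rates)

# Monte Carlo null distribution of the group difference. The size of the
# high-risk population is itself random, and is shared by both groups
# because under the null the therapy does nothing.
n_sims = 10_000
diffs = np.empty(n_sims)
for i in range(n_sims):
    p_high = rng.beta(2.0, 5.0)   # assumed prior on the mixture weight
    diffs[i] = (simulate_group(14, p_high, rng).mean()
                - simulate_group(9, p_high, rng).mean())

p_value = (diffs >= observed_diff).mean()
print(f"Monte Carlo p = {p_value:.3f}")
```

Every distributional choice here (gamma-Poisson, the cluster rates, the Beta prior) is exactly the kind of ontological commitment the post is talking about: change them and the p-value moves.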

You could even make the number of clusters in your mixture, or the cutoffs of the clusters, random variables themselves, and average over different models. That doesn’t really *eliminate* the fact that choice of model matters; it just pushes your agnosticism up a meta-level. But it may be general enough to be *practically* like model-agnosticism: adding more levels of hierarchy to the model might eventually cease to change the answer to “is this therapy significantly effective?” Note that we’re only getting p-value differences of a few percentage points even when tweaking a single parameter. At some point I should try this empirically and see how much difference added model flexibility actually makes.

But there’s a basic principle here which I see in a lot of contexts, where the output of a statistical algorithm is dependent on your choice of ontology. And, I think, your choice of ontology is ultimately dependent on your *goals*. Do I want to measure reduction in number of heart attacks or do I want to measure number of people who become heart-attack-free? There can’t be an answer to that question that’s wholly independent of my priorities. Even averaging over models is essentially saying “over a wide range of possible priority structures, you tend to get answers to your question lying in such-and-such a range.” It doesn’t mean you couldn’t construct a really weird ontology that would cause the algorithm to spit out something completely different.

This is interesting… I nearly wrote a reply to your post on “Uncertainty and Confidence,” in which I was going to suggest that not knowing the right ontology is one reason you can be in a state of Knightian uncertainty. Working out the details of that suggestion seemed complicated enough that I decided I didn’t have time for it… but now that you have raised the ontology issue yourself, I’m wondering if that one-sentence version makes sense to you?

My post “How To Think Real Good” (http://meaningness.com/metablog/how-to-think) lays out some related issues. I was going to develop this point on LessWrong, having introduced Knightian uncertainty at http://lesswrong.com/lw/igv/probability_knowledge_and_metaprobability/ ; but ran out of time to finish the projected sequence there.

I’m not yet sure how to make that notion rigorous, but I think I may agree with you. Knightian uncertainty says “you can’t know the probability of event X.” If X is not an event, but something like a set of overlapping events in different ontologies — like “has a heart attack within a year”, “has more than 10 heart attacks”, “has a nonzero number of heart attacks”, “has a fatal heart attack”, all of which serve as proxies for “you know, the *bad* cluster”, then a claim that Knightian uncertainty is true is a claim about the probabilities of these overlapping “events” being all over the map. (Or, sort-of-equivalently, a claim that the structure of the hidden variables in your model matters a lot to whether your significance test spits out “yes” or “no.”)

In general I’ve become more open to “subjectiveness” recently — claims that you can only know things *relative* to an ontology or goal structure. (*Given* your ontology, well-posed questions have only one correct answer.)