This sequence of posts is an experiment in fleshing out how I see the world. I expect to revise and correct things, especially in response to discussion.
“Ontology” is an answer to the question “what are the things that exist?”
Consider a reasoning agent making decisions. This can be a person or an algorithm. It has a model of the world, and it chooses the decision that has the best outcome, where “best” is rated by some evaluative standard.
A structure like this requires an ontology — you have to define what the states of the world are, what the decision options are, and so on. If outcomes are probabilistic, you have to define a sample space. If you are trying to choose the decision that maximizes the expected value of the outcome, you have to have probability distributions over outcomes that sum to one.
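To make the structure concrete, here is a minimal sketch of such an agent in Python. Everything in it — the outcome set, the utilities, the probabilities — is a hypothetical choice of ontology, which is exactly the point: the agent can't run until those choices are made.

```python
# Toy expected-utility agent. All outcomes, utilities, and probabilities
# below are hypothetical, chosen just to illustrate the structure.

# The agent's "ontology": a fixed set of outcomes it can distinguish.
OUTCOMES = ["recover", "no_change", "worsen"]

# Evaluative standard: a utility for each outcome.
utility = {"recover": 1.0, "no_change": 0.0, "worsen": -2.0}

# Model of the world: for each decision, a probability distribution
# over outcomes (each distribution must sum to one).
model = {
    "treat": {"recover": 0.6, "no_change": 0.3, "worsen": 0.1},
    "wait":  {"recover": 0.3, "no_change": 0.6, "worsen": 0.1},
}

def expected_utility(decision):
    dist = model[decision]
    assert abs(sum(dist.values()) - 1.0) < 1e-9  # sanity: sums to one
    return sum(p * utility[o] for o, p in dist.items())

best = max(model, key=expected_utility)
# "treat": 0.6*1.0 + 0.3*0.0 + 0.1*(-2.0) = 0.4, vs. 0.1 for "wait"
```

Note that nothing in the algorithm itself tells you where `OUTCOMES` comes from — it is an input, not an output, of the decision procedure.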
[You could, in principle, have a decision-making agent that has no model of the world at all, but just responds to positive and negative feedback with the algorithm “do more of what rewards you and less of what punishes you.” This is much simpler than what humans do or what interesting computer programs do, and leads to problems with wireheading. So in this sequence I’ll be restricting attention to decision theories that do require a model of the world.]
The problem with standard decision theory is that you can define an “outcome” in lots of ways, seemingly arbitrarily. You want to partition all possible configurations of the universe into categories that represent “outcomes”, but there are infinitely many ways to do this, and most of them would wind up being very strange, like the taxonomy in Borges’ Celestial Emporium of Benevolent Knowledge:
Those that belong to the emperor
Those that are trained
Mermaids (or Sirens)
Those that are included in this classification
Those that tremble as if they were mad
Those drawn with a very fine camel hair brush
Those that have just broken the flower vase
Those that, at a distance, resemble flies
We know that statistical measurements, including how much “better” one decision is than another, can depend on the choice of ontology. So we’re faced with a problem here. One would presume that an agent, given a model of the world and a way to evaluate outcomes, would be able to determine the best decision to make. But the best decision depends on how you construct what the world is “made of”! Decision-making seems to be disappointingly ill-defined, even in an idealized mathematical setting.
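One well-known instance of a measurement flipping with the choice of categories is Simpson's paradox. The sketch below uses the figures from the classic kidney-stone study (Charig et al., 1986): if your ontology distinguishes stone sizes, treatment A looks better; if it lumps them together, treatment B does.

```python
# Simpson's paradox: which treatment looks "better" depends on whether
# the outcome categories include stone size. Figures follow the classic
# kidney-stone dataset (Charig et al., 1986).
success = {  # (treatment, stone_size): (successes, trials)
    ("A", "small"): (81, 87),
    ("A", "large"): (192, 263),
    ("B", "small"): (234, 270),
    ("B", "large"): (55, 80),
}

def rate(treatment, size):
    s, n = success[(treatment, size)]
    return s / n

def overall(treatment):
    s = sum(success[(treatment, size)][0] for size in ("small", "large"))
    n = sum(success[(treatment, size)][1] for size in ("small", "large"))
    return s / n

# Within each category, A beats B; aggregated, B beats A.
a_wins_small = rate("A", "small") > rate("B", "small")  # True
a_wins_large = rate("A", "large") > rate("B", "large")  # True
b_wins_overall = overall("B") > overall("A")            # True
```

The data never changed; only the partition into “outcomes” did — and the decision flipped with it.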
This is akin to the measure problem in cosmology. In a multiverse, for every event, we think of there as being universes where the event happens and universes where the event doesn’t happen. The problem is that there are infinitely many universes where the event happens, and infinitely many where it doesn’t. We can construct the probability of the event as a limit as the number of universes becomes large, but the result depends sensitively on precisely how we do the scaling; there isn’t a single well-defined probability.
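A toy version of this order-dependence: the limiting frequency of “even” among the positive integers depends on how you enumerate them. This is only an analogy to the cosmological measure problem, but it shows how a “probability defined as a limit” can depend on the scaling scheme.

```python
# The "probability" that an integer is even, computed as a limiting
# frequency, depends on the order of enumeration.
from itertools import count, islice

def density_of_evens(seq, n=300_000):
    window = list(islice(seq, n))
    return sum(x % 2 == 0 for x in window) / n

def two_odds_per_even():
    # Same set of integers, enumerated as odd, odd, even, odd, odd, even, ...
    odds, evens = count(1, 2), count(2, 2)
    while True:
        yield next(odds)
        yield next(odds)
        yield next(evens)

natural_density = density_of_evens(count(1))            # -> 0.5
reordered_density = density_of_evens(two_odds_per_even())  # -> ~1/3
```

Both enumerations eventually list every positive integer exactly once, yet the limits disagree.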
The direction I’m going to go in this sequence is to suggest a possible model for dealing with ontology, and cash it out somewhat into machine-learning language. My thoughts on this are very speculative, and drawn mostly from introspection and a little bit of what I know about computational neuroscience.
The motivation is basically a practical one, though. When trying to model a phenomenon computationally, there are a lot of judgment calls made by humans. Statistical methods can abstract away model selection to some degree (e.g. generate a lot of features and select the most relevant ones algorithmically) but never completely. To some degree, good models will always require good modelers. So it’s important to understand what we’re doing when we do the illegible, low-tech step of framing the problem and choosing which hypotheses to test.
Back when I was trying to build a Bayes net model for automated medical diagnosis, I thought it would be relatively simple. The medical literature is full of journal articles of the form “A increases/decreases the risk of B by X%.” A might be a treatment that reduces incidence of disease B; A might be a risk factor for disease B; A might be a disease that sometimes causes symptom B; etc. So, think of a graph, where A and B are nodes and X is the weight between them. Have researchers read a bunch of papers and add the corresponding nodes to the graph; then, when you have a patient with some known risk factors, symptoms, and diseases, just fill in the known values and propagate the probabilities throughout the graph to get the patient’s posterior probability of having various diseases.
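A stripped-down sketch of that kind of graph, with entirely hypothetical likelihood ratios standing in for the literature-derived weights (a real Bayes net would use proper conditional probability tables; this naive odds update assumes findings are conditionally independent given the disease):

```python
# Toy literature-derived diagnosis graph. Every number here is made up,
# standing in for weights harvested from papers of the form
# "A increases the risk of B by X%".
likelihood_ratio = {
    # (finding, disease): P(finding | disease) / P(finding | no disease)
    ("smoking", "heart_attack"): 2.0,
    ("chest_pain", "heart_attack"): 4.0,
    ("family_history", "heart_attack"): 1.5,
}
prior = {"heart_attack": 0.05}  # hypothetical baseline prevalence

def posterior(disease, findings):
    # Naive-Bayes odds update: multiply prior odds by each finding's
    # likelihood ratio, assuming findings are conditionally independent
    # given the disease.
    odds = prior[disease] / (1 - prior[disease])
    for f in findings:
        odds *= likelihood_ratio.get((f, disease), 1.0)
    return odds / (1 + odds)

p = posterior("heart_attack", ["smoking", "chest_pain"])
# prior odds 0.05/0.95, times 2.0 and 4.0, gives roughly a 0.30 posterior
```

Even in this tiny sketch the ontology problem is visible: the dictionary keys commit us to “heart_attack” being a single node.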
This is pretty computationally impractical at large scales, but that wasn’t the main problem. The problem was deciding what a node is. Do you have a node for “heart attack”? Well, one study says a certain risk factor increases the risk of having a heart attack before 50, while another says that a different risk factor increases the lifetime number of heart attacks. Does this mean we need two nodes? How would we represent the relationship between them? Probably having early heart attacks and having lots of heart attacks are correlated, but we aren’t likely to be able to find a paper that quantifies that correlation. On the other hand, if we fuse the two nodes into one, then the strengths of the risk factors will be incommensurate. There’s a difficult judgment call inherent in just deciding what the primary “objects” of our model of the world are.
One reaction is to say “automating human judgment is harder than you thought”, which, of course, is true. But how do we make judgments, then? Obviously I’m not going to solve open problems in AI here, but I can at least think about how to concretize quantitatively the sorts of things that minds seem to be doing when they define objects and make judgments about them.
10 thoughts on “Epistemology Sequence, Part 1: Ontology”
Thank you so much for writing this and your previous ontology post! The class of examples you use has helped me untangle some of my confused thoughts about how to think about the ontology problem. An abstraction with a tiny leak at one level can become a fire hose destroying your values at another level.
You can make decisions without an ontology if you focus solely on reward signals. That has its own problems (wireheading), but it should be mentioned! Regarding multiverses, causally separated universes aren’t particularly an issue, but quantum multiverses are hairier. It’s not even clear to me what evolution produces in terms of utility function over a quantum multiverse and why.
ok, the decision-making w/o ontology makes sense (take the decision that has the highest expected reward).
I don’t know a lot about physics, tbh. My motivation was thinking about Tegmark universes and difficulties with the notion of defining a probability over all mathematical statements, but I thought the inflation context was a less controversial example of the “measure problem.” Am I wrong and there are actually multiple different “measure problems”?
also it occurs to me that you still need to define what qualifies as a “decision”, even in the case that you’re wireheading.
With the medical example, am I right in concluding that the reason we can’t have separate nodes for “heart attack under 50” and “lifetime number of heart attacks” is lack of data? If we had the complete medical records of a million people we would be able to find the correlation between these nodes, but we currently only have medical papers and “we aren’t likely to be able to find a paper that quantifies that correlation.” Is that correct?
Yes, that’s true. Highly multivariate data would make a lot of medical research easier, but as far as I know it’s very difficult to obtain. Even with more variables observed we wouldn’t necessarily be able to observe every conceivable variable, but I could imagine an experiment with so much data that it answered all questions of this form that a human would be likely to pose.
One more thing: from the description you’re giving of how you intended to build your Bayes net, the error you’re committing is trying to read the probabilities directly off the papers. You can’t do that, for the reasons you mention and others. You’re getting the arrow backwards; the principled way to do this is to update your model conditional on such-and-such a paper having been published. Given your model of human disease and metabolism, what is the probability that X et al. wrote and published a paper showing increased heart disease among 50-year-olds? Update on *that*. And you get to deal with small sample sizes and publication bias for free, because you can incorporate them straightforwardly into that model.
Now, your *computational* problems become much worse, but those you can deal with by making simplifications and compromises, and then you know exactly what compromises you’re making. An unprincipled approach might be tempting because it seems that it could almost work, but there’s a debt you keep on paying.
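A minimal sketch of the update being suggested here, with entirely made-up numbers: instead of reading the reported effect off the paper, condition on the event “a paper reporting a positive effect was published.” A fuller model would also fold sample size and publication bias into the likelihood terms.

```python
# Update on "a positive paper was published", not on the paper's number.
# All probabilities below are hypothetical.
prior_effect = 0.20    # P(the risk factor really matters)
power = 0.60           # P(study reports the effect | effect is real)
false_positive = 0.05  # P(study reports the effect | no real effect)

# Bayes' rule: P(effect is real | a positive paper was published)
num = prior_effect * power
den = num + (1 - prior_effect) * false_positive
posterior = num / den  # 0.12 / 0.16 = 0.75
```

Note the posterior (0.75) is well short of certainty even though the paper “found” the effect — the false-positive channel does real work in the denominator.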
That would require my having a model of the human body. I can see that this would be better, but it would require way more domain-specific medical knowledge than I was prepared to invest in at the time. Existing “expert systems” for automated diagnosis are significantly less principled than the way I was trying to do it; their idea of a conditional probability for “disease, given symptom” is the frequency of co-occurrence of disease words and symptom words in medical documents.
If we throw away results that are not easily commensurable, keeping only the strongest and most general, do we expect to make non-negligibly better predictions than without this model?
If we try commensurability with various approximations/heuristics (e.g., “risk factor A increases heart attack risk under age 50 by 5%” => “and gives us zero knowledge for ages 50+” combined with existing actuarial data on heart attack risk by age to get the result’s impact on lifetime heart attack risk), do we expect to make better predictions than without this model?
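The heuristic in that last question can be cashed out numerically. The sketch below uses hypothetical baseline annual risks (real actuarial tables would replace them): apply the 5% relative risk only under age 50, assume no effect afterward, and compare lifetime risks.

```python
# Commensuration heuristic from the comment above, with made-up baseline
# annual heart-attack risks standing in for real actuarial data.
baseline_annual = {(30, 50): 0.002, (50, 70): 0.008, (70, 85): 0.02}
relative_risk_under_50 = 1.05  # "increases risk under 50 by 5%"

def lifetime_risk(rr_young=1.0):
    # Probability of at least one event over all age bands, applying the
    # relative risk only to bands that end by age 50.
    p_no_event = 1.0
    for (lo, hi), annual in baseline_annual.items():
        rr = rr_young if hi <= 50 else 1.0
        p_no_event *= (1 - annual * rr) ** (hi - lo)
    return 1 - p_no_event

base = lifetime_risk()
with_factor = lifetime_risk(relative_risk_under_50)
delta = with_factor - base  # small: the under-50 bump barely moves lifetime risk
```

With these (invented) baselines, the under-50 finding shifts lifetime risk by well under a percentage point — which bears on whether keeping such results in the model buys any predictive power.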