Yes, the probability will go up in roughly the circumstances where you’d expect it to. But there will be a “gap”: it will approach some number less than 1, rather than approaching 1. How much less than 1? Unfortunately, that’s totally arbitrary; it depends on the initial distribution of wealth among traders in the system. We can manipulate the belief in a target undecidable sentence to be high or low by adding traders whose only job in life is to bet for or against that sentence.
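To make the dependence on initial wealth concrete, here is a deliberately crude toy model in Python. It is not the logical induction algorithm: the `Trader` class, the `market_price` function, and the wealth-weighted-average pricing rule are all assumptions made up for illustration. The only point is that if a sentence is never settled, nobody's wealth changes, so the price keeps reflecting whatever wealth the traders started with.

```python
from dataclasses import dataclass


@dataclass
class Trader:
    """Hypothetical trader with a fixed opinion about one undecidable sentence."""
    wealth: float  # initial wealth; never updated, because the sentence is never settled
    belief: float  # price in [0, 1] that this trader pushes the market toward


def market_price(traders):
    """Wealth-weighted average belief: a toy stand-in for a market-clearing price."""
    total_wealth = sum(t.wealth for t in traders)
    return sum(t.wealth * t.belief for t in traders) / total_wealth


# Two traders who both think the sentence is probably true.
base = [Trader(wealth=1.0, belief=0.9), Trader(wealth=1.0, belief=0.8)]
price_before = market_price(base)

# Add one well-funded trader whose only job is to bet against the sentence.
price_after = market_price(base + [Trader(wealth=2.0, belief=0.0)])

print(price_before, price_after)
```

In this toy, the price drops as soon as the contrarian trader is added, and nothing ever happens to correct it, since the sentence is never proved or refuted.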

Can logical inductors jump to believing the universal statement with some probability that is *almost* 1?

> We generally use the symbols φ, ψ, χ to denote well-formed formulas in some language of propositional logic L (such as a theory of first order logic; see below),

Then you say:

> The “sentences” which are being assigned probability values by the logical inductors are sentences in L, the propositional logic language. […] Am I missing something?

Logical inductors can use an L which is first-order logic, in which case they learn about the structure of first-order proofs by observing what the deductive process outputs over time. They then make good guesses about what first-order sentences will be proved later, in a way guaranteed to eventually do as well as any poly-time conjecture-making algorithm. So, they’re eventually pretty good at first-order logic.

What they cannot do is make the leap from assigning probabilities which approach 1 to all instances of some universal statement to believing the universal statement itself. The ability to make that leap is called the Gaifman property, and it implies a very strong kind of uncomputability; hence it is impossible to achieve in this kind of theory. This might seem like saying it “can’t do first-order logic” in a way. But really, this is connected with the compactness theorem for first-order logic. No finite set of instances implies a forall statement, so it cannot be that *all* the instances together imply the forall statement, either. The Gaifman property actually only makes sense in the context of second-order logic, where we can say that the set of natural numbers is the *least* set generated by the successor operation. Then, we really *should* conclude the universal statement from all the instances, because we *know* that’s all the instances. But, as I said, this becomes rather strongly uncomputable. It is not clear what kind of weaker property we might actually want, in order to model what humans are doing when forming beliefs expressible in second-order logic.
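For reference, the Gaifman condition is usually stated along the following lines. This is a sketch in notation I am introducing here, not notation from the discussion: $\mathbb{P}$ is the belief assignment, $\varphi$ is a formula with one free variable, and $\bar{n}$ is the numeral for $n$.

```latex
\mathbb{P}\bigl(\forall x\, \varphi(x)\bigr)
  \;=\;
  \lim_{n \to \infty}
  \mathbb{P}\bigl(\varphi(\bar{0}) \wedge \varphi(\bar{1}) \wedge \cdots \wedge \varphi(\bar{n})\bigr)
```

So if the limiting beliefs assign probability 1 to every instance (and hence, by coherence, to every finite conjunction of instances), the condition forces the universal statement to probability 1 as well. That is the leap described above.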

A paper of mine (“Logical Prior Probability”) dealt with what we can do if we drop the computability requirement but keep it approximable. As you say, probability works just as well in this setting; it’s only that we have to drop computability requirements (because the undecidability of first-order logic makes it impossible to computably assign probability zero to contradictions). Fortunately, we can approximate it in a way which gets closer to a coherent distribution over time.

What logical induction does, and that kind of approach could never do, is give some guarantees about the _way_ in which a coherent distribution gets approximated; it guarantees that good heuristic guesses will be made. So logical induction formalizes the way mathematicians can make conjectures, and illustrates that it necessarily goes beyond probability theory in a certain way (while also keeping as close to probability theory as possible).

If anything, I would say that a better characterization of the limits of logical induction is that it doesn’t do _second-order_ logic. It doesn’t have notions of “finite” vs “infinite” like those which second-order logic can supply. It doesn’t come to believe in a standard model of the natural numbers.

We would be looking for a regularization that identifies necessary and sufficient conditions instead of statistical properties beyond a given threshold, which might require that the system uses language to build a conceptual graph and synchronize it with other speakers. Eventually, this does not seem to be very different from training the system to minimize a loss function with respect to safe transportation of the learned categories between ontologies?

To give just one relevant example: feed-forward neural nets are acyclic, so message passing algorithms provably work for them, but not for arbitrary Bayes nets.
