Do the Best Thing

A Googler friend of mine once asked me, “If you had a program that was running slow, what would you do to fix it?”

I said, “Run it through a profiler, see which step was slowest, and optimize that step.”

He said, “Yeah, that’s the kind of thinking style Google optimizes for in hiring. I’m the odd one out because I don’t think that way.”

“That way” of thinking is a straightforward, naive, and surprisingly powerful mindset. Make a list of all your problems, and try to fix the biggest tractable one.  Sure, there are going to be cases when that’s not the best possible solution — maybe the slowest step can’t be optimized very much as is, but if you rearrange the entire program, or obviate the need for it in the first place, your problem would be solved. But if you imagine a company of people who are all drilled in “fix the biggest problem first,” that company would have a systematic advantage over a company full of people who poke at the code at random, or according to their varied personal philosophies, or not at all.  Just doing the naively best thing is a powerful tool, powerful enough that it’s standard operating procedure in a company reputed to have the best software engineers in the world.
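That workflow can be sketched in a few lines. This is a hypothetical three-stage pipeline (the stage names `parse`, `transform`, and `report` are invented for illustration), profiled with Python’s standard `cProfile` and `pstats` modules; the stage with the largest cumulative time is the one you’d optimize first.

```python
import cProfile
import pstats

# Hypothetical three-stage pipeline; stage names are made up for illustration.
def parse(data):
    return [int(x) for x in data]

def transform(nums):
    return [n * n for n in nums]

def report(nums):
    return sum(nums)

def pipeline(data):
    return report(transform(parse(data)))

profiler = cProfile.Profile()
profiler.enable()
result = pipeline([str(i) for i in range(100_000)])
profiler.disable()

# pstats rows are keyed by (file, line, function name); value index 3 is
# cumulative time -- the naive metric we sort by.
stats = pstats.Stats(profiler).stats
slowest = max(
    (k for k in stats if k[2] in ("parse", "transform", "report")),
    key=lambda k: stats[k][3],
)
print("optimize this first:", slowest[2])
```

Nothing clever is happening here: measure every step with one obvious metric, then attack the top of the list.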

There are other heuristics that have a similar spirit.

Making a list of pros and cons, a decision procedure started by Ben Franklin and validated by Gerd Gigerenzer’s experiments, is an example of “do the best thing” thinking. You make your considerations explicit, and then you score them, and then you decide.
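As a sketch (the considerations and their weights below are invented), the whole procedure is: list the considerations, attach signed scores, and sum.

```python
# Hypothetical job-offer decision; each consideration gets a signed weight.
considerations = {
    "shorter commute": +3,
    "higher salary": +2,
    "leave friends behind": -4,
    "better weather": +1,
}

score = sum(considerations.values())
decision = "take it" if score > 0 else "decline"
print(decision, score)
```

The crudeness is the point: once the considerations are explicit and scored, the decision reduces to reading off a sign.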

Double-entry bookkeeping, which was arguably responsible for the birth of modern capitalism, is a similar innovation; you simply keep track of expenses and revenues, and aim to reduce the former and increase the latter.  It sounds like an obvious thing to do; but reliably tracking profits and losses means that you can allocate resources to the activities that produce the highest profits.  For the first time you have the technology to become a profit-maximizer.
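A minimal sketch of the mechanism, with invented accounts and amounts: every transaction posts a debit to one account and an equal credit to another, so the books always balance, and the profitability of an activity can be read straight off the ledger.

```python
# Toy double-entry ledger; account names and amounts are invented.
ledger = []

def post(debit_account, credit_account, amount):
    # Every transaction touches two accounts with equal and opposite entries.
    ledger.append((debit_account, amount))
    ledger.append((credit_account, -amount))

post("inventory", "cash", 100)   # buy goods for 100
post("cash", "revenue", 150)     # sell them for 150

balances = {}
for account, amount in ledger:
    balances[account] = balances.get(account, 0) + amount

# Invariant of double-entry: total debits and credits always cancel.
assert sum(balances.values()) == 0
print(balances)  # cash is up 50 net: this activity was profitable
```

The invariant is what makes the tracking reliable: errors that break the balance are immediately visible, which is what lets you trust the profit numbers enough to allocate resources by them.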

The modern craze of fitness-tracking is a “do the best thing” heuristic; you pick a naive metric, like “calories eaten – calories burned”, and keep track of it, and try to push it in the desired direction.  It’s crude, but it’s often a lot more effective than people’s default behavior for achieving goals — people who self-monitor diet and weight are more likely to achieve long-term deliberate weight loss.

Deciding to give to the charity that saves the most lives per dollar is another example of “do the best thing” — you pick a reasonable-sounding ranking criterion, like cost-effectiveness, and pick things at the top of the list.

Notice that I’m not calling this optimization, even though that’s what it’s often called in casual language.  Optimization, in the mathematical sense, is about algorithms for maximizing some quantity.  “Do the best thing” (DTBT) isn’t an algorithm, it’s what comes before implementing an algorithm. It’s just “pick an obvious-seeming measure of importance, and then prioritize by that.”  The “algorithm” may be trivial — just “sort by this metric”.  The characteristic quality is picking a metric and tracking it; and, in particular, picking an obvious, straightforward, reasonable-sounding metric.

Now, there are critics of the DTBT heuristic. “Optimize Everything” can evoke, to some people, a dystopian future of robotic thinking and cold indifference to human values.  “Minimize calories”, taken in isolation, is obviously a flawed approach to health.  “Maximize GDP growth” is obviously an imperfect approach to economic policy.  One can be very skeptical of DTBT because of the complicated values that are being erased by a simple, natural-seeming policy.  This skepticism is present in debates over legibility.  I suspect that some Marxist critiques of “neoliberalism” are partly pointing at the fact that a measure of goodness (like “GDP growth” or even “number of people no longer living in poverty”) is not identical with goodness as judged by humans, even though it’s often treated as though it obviously is.

The DTBT response is “Yeah, sure, simplifications simplify.  Some simplifications oversimplify to the point of being counterproductive, but a lot of them are clearly productive. What people were doing before we systematized and improved processes was a lot of random and counterproductive cruft, not deep ancestral wisdom. Ok, Westerners undervalued traditional societies’ agriculture techniques because they were racist; that’s an admitted failure. Communists didn’t understand economics; that’s another failure. Nobody said that it’s impossible to be wrong about the world. But use your common sense — people shoot themselves in the foot through procrastination and weakness of will and cognitive bias and subconscious self-sabotage all the time.  Organizations are frequently disorganized and incompetent and just need some serious housecleaning. Do you seriously believe it’s never possible to just straighten things up?”

Here’s another way of looking at things. Behaviors are explained by a multitude of causes. Some of those causes are unendorsed. You don’t, for example, usually consider “I got a bribe” as a good reason to fund a government program.  DTBT is about picking a straightforwardly endorsable cause and making it master. This applies both intrapersonally and interpersonally. “Optimizing for a personal goal” means taking one of the drivers of your behavior (the goal) and setting it over your other internal motivations.  “Optimizing for a social outcome” means setting the outcome above all the motivations of the individual people who make up your plan.

In some cases, you can reduce the conflict between the “master goal” and the sub-agents’ goals. Popular vote is one way of doing this: the “master goal” (the choice that gets the most votes) minimizes the sum of differences between the chosen outcome and the preferences of each voter.  Free trade is another example: in a model where all the agents have conventional utility functions, permitting all mutually-beneficial trades between individuals maximizes the sum of individual utilities.  If your “master goal” is arbitrary, you can cause a lot of pain for sub-agents.  (E.g.: governments that tried to ‘settle’ nomadic peoples did not treat the nomadic peoples very well.) If your “master goal” is universal, in some sense, if it includes everybody or lets everybody choose, then you can minimize total frustration.

Of course, this isn’t an objectively universal solution to the problem — some people might say “my frustration inherently matters more than his frustration” or  “you aren’t properly measuring the whole of my frustration.”

Another way to reduce conflict is to see if there are any illusory conflicts that disappear upon greater communication.  This is what “dialogue” and “the public sphere” and “town halls” are all about. It’s what circling is about.  It’s what IFS is about. (And, more generally, conflict resolution and psychotherapy.)

And, of course, once again, this isn’t an objectively universal solution to the problem — there might actually be irreconcilable differences.

The pure antithesis of DTBT would be wu-wei — don’t try to do anything, everything is already fine as it is, because it is caused by human motivations, and all human motivations are legitimate. It would be “conservative” in a way that political conservatives would hate: if the world is going to hell in a handbasket, let it, because that’s clearly what people want and it would be arrogant to suppose that you know better.

This extreme is obviously at least as absurd as the DTBT extreme of “All the world’s problems would be solved if people would just stop being idiots and just do the best thing.”

It seems more productive to resolve conflicts by the kinds of “universalizing” or “discourse” moves described above.  In particular, to try to discuss which kinds of motivations are endorsable, and argue for them.

One example of this kind of argument is “No, we can’t use the CEO’s new ‘optimized’ procedure, because it wouldn’t work in our department; here’s where it would break.”  Sheer impracticality is pretty much always considered a licit reason not to do something among reasonable people, so a reasonable CEO should listen to this criticism.

Another, more meta example is discussing the merits of a particular kind of motivation. Some people think status anxiety is a legitimate reason to push for egalitarian policies; some don’t. You can argue about the legitimacy of this reason by citing some other shared value — “people with lower relative status are less healthy” appeals to concerns about harm, while “envy is an ugly motivation that prompts destructive behavior” appeals to concerns about harm and virtue.

Geeks are often accused of oversimplifying human behavior along DTBT lines.  “Why can’t we just use ‘ask culture’ instead of ‘guess culture’?” “Why can’t we just get rid of small talk?” “Why do people do so much tribal signaling?”  Well, because what most people want out of their social interactions is more complicated than a naive view of optimality and involves a lot of subconscious drives and hard-to-articulate desires. What it comes down to is actually a debate about what motivations are endorsable.  Maybe some drives, like the desire to feel superior to others, are ugly and illegitimate and should be bulldozed by a simple policy that doesn’t allow people to satisfy them.  Or maybe those drives are normal and healthy and a person wouldn’t be quite human without them.  Which drives are shallow, petty nonsense and which are valuable parts of our common humanity?  That’s the real issue that gets hidden under debates about optimality.

I happen to lean more DTBT than most people, and it’s because I’m fairly Blue in a spiral dynamics sense.  While the stereotypical Blue is a rigid, conformist religious extremist, the fundamental viewpoint underlying it is the more general notion of “loyalty to Truth” — there are good things and bad things, and one should prefer the good to the bad and not swerve from it.  “I have set before you life and death, blessing and cursing: therefore choose life.”  From a Blue perspective, some motivations are much, much more legitimate than others, and one should sanction only the legitimate ones.  A Blue who values intellectual inquiry doesn’t sanction “saving mental effort” as a valid reason to believe false things; a Blue who values justice doesn’t sanction “desire for approval” as a valid motivation to plagiarize.  Some things are just bad and should feel bad.  From a Blue perspective, people do “counterproductive” things all the time — choices that bring no benefit, if the only benefits we count are the licit ones.  (If you counted all motivations as legitimate, then no human behavior would be truly counterproductive, because it’s always motivated by something.)  And, so, from a Blue perspective, there are lots of opportunities to make the world “more optimal”, by subordinating illegitimate motivations to legitimate ones.

The best way to argue to me against some DTBT policy is to show it fails at some legitimate goal (is impractical, harms people, etc).  A more challenging way is to argue that I ought to consider some motivation more legitimate than I do.  For instance, sex-positive philosophy and evolutionary psychology can attempt to convey to a puritanical person that sexual motivations are legitimate and valid rather than despicable.  A flat assertion that I ought to value something I don’t is not going to work, but an attempt to communicate the value might.

I think it would be better if we all moved beyond naive DTBT or simple critique of DTBT, and started trying to put into practice the kinds of dialogue that have a chance of resolving conflicts.

Measures of Awesomeness

Epistemic status: exploratory. I’m building out a model.  I know zero anthropology, so my speculations may very well be reinventing some wheel.

A visit to the anthropological wings of the Museum of Natural History can cure you of cultural relativism in a hurry. Some cultures, in some times and places, made cooler stuff than others.  In other words, the concept of “technology level” refers to a real thing.

In the context of looking at ancient pottery or metalwork, a casual museumgoer won’t see anything too strange about that assumption.  But there are a lot of uncertainties smuggled in.  How do we know that this pot is superior to that pot?  Doesn’t that depend on who you are and what you value?  When we look at an object and consider it “primitive”, does that mean anything besides mere cultural chauvinism?

Tech trees

One potential way to make the idea of “more advanced/less advanced” technology objective is to talk about a dependency graph.  If one technology is a prerequisite for another, then the “child” technology can be identified as “more advanced” than the “parent” technology. This concept has been referred to as fabricatory depth.  You need kiln-firing technology before you can produce glazed pottery; therefore kilns are a prerequisite for glazing, and glazing is more technologically advanced than kilns.  If you see people who can only make unglazed pottery and not glazed pottery, then, in that particular respect, those people are lower-tech than their glazing neighbors.

The computer game concept of a tech tree (really, it’s a tech DAG) is a simplified version of this concept. The “roots” of the tree are primitive technologies; applications and advancements on these technologies take you to higher levels of the “tech tree”, which in turn can lead to even higher levels.

This puts a partial ordering but not a total ordering on technologies. Not every pair of technologies is directly comparable.  Which means that it’s more nuanced than categories like “Stone Age” — it’s possible for Culture A to be more advanced than Culture B in one sector, but less advanced in some other sector. We’re not assuming that technologies line up in one single March of Progress; but we are noticing that some technologies are structurally, by necessity, more “foundational” or “basic” or “primitive” than others.

Thinking in terms of dependencies/prerequisites means we can talk about technology level while keeping some distance away from value judgments. Forget what’s more “useful” or “higher quality.”  A high-tech object is just an object that depends on a lot of accumulated technologies.  It’s an object that requires a long chain of skills to produce.

Note that this isn’t quite equivalent to a high degree of skill. It takes very high skill to hunt with a throwing stick. But probably not a long sequence of techniques, each of which produces many applications.  We don’t have to assume that a low degree of technological advancement implies a low degree of effort or intelligence; it just means that, for whatever reason, you don’t have a big stack of technologies that build on each other.
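That partial ordering can be made concrete with a toy dependency graph (the edges below mix the kiln/glazing example from above with invented links): one technology is strictly “more advanced” than another exactly when the other appears among its transitive prerequisites.

```python
from collections import deque

# Toy tech DAG; an edge parent -> child means "parent is a prerequisite of child".
tech_tree = {
    "fire": ["kiln"],
    "kiln": ["glazing", "smelting"],
    "smelting": ["steel"],
    "glazing": [],
    "steel": [],
}

def ancestors(tech):
    """All technologies that transitively feed into `tech`."""
    parents = {t: [] for t in tech_tree}
    for parent, children in tech_tree.items():
        for child in children:
            parents[child].append(parent)
    seen, frontier = set(), deque([tech])
    while frontier:
        for p in parents[frontier.popleft()]:
            if p not in seen:
                seen.add(p)
                frontier.append(p)
    return seen

def more_advanced(a, b):
    # Partial, not total, order: incomparable pairs return False both ways.
    return b in ancestors(a)
```

Note that `more_advanced("glazing", "smelting")` and its reverse are both false: the two are incomparable, which is exactly what distinguishes this structure from a single “March of Progress” ranking.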

Prowess Metrics

If you stroll through the museum and ask yourself what makes “higher-quality” objects, you’ll notice some commonalities.

Usually, finer, more precise work is intuitively higher quality. Finer brushwork or filigree or carving, smoother carvings, finer textiles with tighter weave, straighter or more symmetrical shapes, etc.

Stronger and more durable objects tend to be higher quality.  Steel is harder than iron.  Glazing makes pottery water- and stain-resistant.

Highly replicable objects tend to be higher quality. Molds and casts and potter’s wheels allow identical objects to be produced with little effort.

More efficient objects tend to be higher quality. Structures that are lighter relative to their strength. Machines that consume less fuel or physical effort.

Bigger objects can be higher quality.  Buildings or sculptures or cities on a colossal scale.

These kinds of criteria are still relevant even in the modern day. The semiconductor industry runs on making finer, more precise circuits. Materials science continues to make glass, ceramics, and other substances stronger, more durable, and lighter.  The software and manufacturing industries run on making objects more replicable.  “Big data” refers to the technologies necessary for handling information at scale.

There seem to be some simple qualities like these which continue to be valued in technologies, over time and across industries.  I’ll call them prowess metrics, inspired by Venkat Rao’s discussion, because they’re usually related to excelling at a single property rather than being very well suited to a market niche.

Human wants are enormously varied; but certain inputs tend to be common among them. At the most elemental level, almost anything anyone could want will require things like mass and energy; therefore mass and energy are close to universally valuable.  Prowess metrics are capacities which permit a wide variety of applications.

As you go up a tech tree, producing technologies that are necessary for technologies that are necessary for technologies, the technologies that have a lot of descendants will tend to be high on prowess metrics.  If you develop a technique for very reliable duplication, or a stronger construction material, there are a lot of technologies that can be derived from it.  In fact, we can even define prowess metrics as the qualities that predict having a lot of descendants on the tech tree.  They are what make a technology “generative”, productive of new technology.  Prowess metrics might also be expected to correlate with being high on the tech tree, which makes sense if you picture a long-tailed distribution of technology — most “chains” peter out early, but if you’ve reached a certain level of technology, that means you’re more likely to continue going to yet more advanced technologies.

Being high on a prowess metric is no guarantee that an object will be useful.  Usefulness is defined by humans and the context in which the object would be used.  The fastest cars in the world are novelty items, because most people don’t actually need or want the fastest cars in the world.  Identifying the usefulness of an object to actual humans is the basic function of marketing, and prowess metrics can’t substitute for that.  Usefulness is about utility and value judgments and all that squishy stuff.

However, I hypothesize, prowess metrics are decent predictors of the utility of objects. If you have a way to make your widget faster, bigger, finer, stronger, lighter, cheaper, etc, it’s at least worth privileging the hypothesis that there’s going to be demand for it.

The Innovator’s Dilemma defines a disruptive innovation as one which satisfices on a bunch of the standard metrics, optimizes hard on a different metric, and finds a new market that really values this new metric. Usually, the metrics the book cites fit the pattern of prowess metrics: things like size, speed, cost, etc.  Which prowess metrics matter depends on the market and the use case. But that prowess metrics matter is not really disputed.

In engineering-focused domains like the excavator industry or the semiconductor industry, the technical performance of the machinery matters a lot to purchasers. As you move “up the tech tree” to higher-level applications and consumer-facing products, technical prowess becomes less obviously relevant, but still in some sense underlies what’s possible.  Computing power still ultimately determines limits on what software applications are available.

Prowess metrics seem to be behind intuitions that look like the labor theory of value.  A worthy or excellent object, you feel, gives you a lot of something you can measure: many tons of wheat, high tensile strength, etc.  Objects that are “merely” well adapted to their context and highly desirable to their users may be perceived as having “fake” or “superficial” value, as opposed to the “real” value captured by prowess metrics.  “I care about the fuel economy of my tractor, not what color it’s painted!”

From a conventional economic perspective, this is exactly backwards: the prowess metric is only a correlate, a proxy, for the things that really matter, namely supply and demand.  And it’s not even always a good proxy!  But it’s an understandable fallacy once you accept that prowess metrics are frequently good predictors of value.  Moreover, prowess metrics tend to indicate something like “downstream” value — they mean that future applications of the technology can go farther and likely be worth more.

This is the intuition behind “We wanted flying cars, instead we got 140 characters.” Getting better prowess metrics on basic technologies (as you’d need to, to build flying cars) is a big deal because it tends to open the doors to a lot of future technology and future value. Getting good product-market fit on an app built from off-the-shelf parts is less valuable in the long term because it isn’t causally necessary for as much future innovation.  (Twitter’s not a great example of a non-technological “tech” company, but it’s easy to think of better ones.)

Obviously, a lot of this is influenced by glamour — modern logistics is arguably as big a technological advance as flying cars would have been — but there still may be a meaningful, semi-rigorous notion of a foundational rather than a trivial technological improvement, and it seems to have to do with prowess metrics and going to nodes that have a lot of descendants on the tech tree.


There’s an intuition that a civilization can have a certain amount of motive power or mana or ability to do stuff.  Thriving cultures are increasing it; declining cultures are stagnating or losing it. And of course trying to make this intuition rigorous is hard, and potentially impracticable. You can’t directly rank cultures on how awesome they are.

But an armchair-observer, outside-view perspective might point to a handful of prowess metrics (literacy rates, cost of a loaf of bread, etc) and try to use them to get a rough, multidimensional picture of “ok, how rich and powerful is this society really? How much mana is there here?”

Studying material culture in this way is how, for instance, Kenneth Pomeranz argued that China was richer than Europe until the 19th century.  The Chinese consistently consumed more calories and more meat, had more furniture in their homes, and even read more books, than the Europeans. Comparing the historical “GDPs” of China and Europe is uncertain and subject to statistical shenanigans; but if the Chinese consistently seem to have more of all the necessities and luxuries of life, then it starts to seem undeniable that, for most definitions of “rich”, they were richer.

The “material culture” approach is pretty similar to the “look at a bunch of prowess metrics” approach.  You make no attempt to have a single metric of “intrinsic value”. You can only make pretty modest claims. You merely observe that if a culture seems to be booming along a lot of highly general and “upstream” metrics, then there’s probably something vaguely positive going on.  This is the heuristic behind the kinds of claims in The Great Stagnation — things like ‘maximum vehicular speed isn’t increasing’ or ‘life expectancy isn’t increasing.’  Taken together, a lot of stagnant metrics paint a dispiriting portrait.

With a tech-tree model, most of the dependencies are unobserved, including (of course) all the future ones.  It’s hard to work with empirically, and even if you did know the structure, it would be impossible to put a single number on “the tech level.” If we can talk about that kind of structure at all, it’ll be with simplifying models — things like prowess metrics that are shared across many technologies and correlate with technological advancement.  You still can’t say much objectively about “how much mana do we have?” — as always, there’s an irreducible element of selection and storytelling.  But this at least, I think, gives us a starting point to concretize the questions and hypotheses.

The Peril of the Sublime

The “sublime”, as defined by writers such as Burke, Kant, and Keats, is an experience of immensity and awe.  “THE PASSION caused by the great and sublime in nature, when those causes operate most powerfully, is astonishment; and astonishment is that state of the soul, in which all its motions are suspended, with some degree of horror.”  We experience the sublime when we see vast mountains, violent storms, towering pyramids, dazzling details of pattern, the infinity of space.

The standard psychedelic or religious experiences are classic examples of the sublime.  The impression of infinite hugeness or infinite smallness, the impression of endless fractal intricacy, the impression of infinite recursion, the impression of vast significance  — these are intimations of infinity.

Indeed, it may be appropriate to simply define the sublime as the subjective experience of infinity.

But what, concretely, is the experience of infinity?

I suspect that it is merely the experience of being unable to measure or count. A person can innately see one or two objects and recognize them as one or two, without counting; if you show her more than seven or so, her first perception is of “many”.  The experience of uncountable multiplicity is the experience of losing count. “And he brought him forth abroad, and said, Look now toward heaven, and tell the stars, if thou be able to number them: and he said unto him, So shall thy seed be.”  Our metaphor for impossibly many is the innumerable stars. We measure the size of infinities by trying (and failing) to put them into one-to-one correspondence with each other.  Countlessness provokes awe.

So, too, does scalelessness: when we cannot estimate size, we become dizzy with vastness, with smallness, with the scale-free multiplicity of fractals. Timelessness provokes awe, with thoughts of “eternity in a grain of sand.”  When something breaks our units of measure, when it appears to go beyond them, we experience that as infinity.

If you think of perception as working through convolutional neural nets, you notice that higher level nodes are averages or invariants over measurements — the same object, independent of position or rotation or color shift.  Allow the neural net to run its outputs into its inputs long enough, and you begin to see the kinds of images that show up in DeepDream — highly multiscale, intricately patterned.  Some of these higher-level invariants are, clearly, being activated very intensely if the network is allowed to “ruminate” on its own contents.

I might speculate that ordinary perception puts something like frames or limits on this kind of recursive rumination.  As an artist drawing a picture first sketches the proportions of the main objects, before filling in details, to make sure nothing is out of balance, in ordinary perception we put objects or ideas in proportion or in context with the rest of our world. They have a finite size, a particular place in time, a finite importance, and so on.  If this ability to gauge proportion is baffled or broken, we get the impression of infinity and sublimity.

The sublime naturally inspires worship. When something appears to be infinitely important, infinitely vast or complex, eternal or beyond time, how can we not ascribe huge significance to it?  We can easily claim that some particular sacred cow is not, in fact, sacred; but to deny the importance of the sublime is tantamount to saying sacredness itself is not sacred.  From the perspective of someone who has experienced raw barefaced wonder, an enemy of the sublime is a desecrator, a dirty vandal, trying to reduce us all to his level of prosaic blindness.

I am not a vandal. But I am a scientist by training. And so, I find myself in a complicated relationship with the sublime.

The danger of worshiping the sublime is that it can all too easily reduce to worshiping one’s own incapacity.  The sublime’s favorite phrase is “I can’t even.”  It is the inability to put things into context and perspective.  To be overwhelmed by a wildflower is a kind of elevated sensitivity and acuteness of observation; but if you can be overwhelmed by anything, then you have a failure to prioritize.  If you perceive “infinity” as simply beyond what you can measure or comprehend, then seeking a sense of infinity is seeking your own ignorance.  You find yourself looking backwards and inwards, towards childhood, towards faith, towards “unmediated” perception, trying to peel back the layers of ordinary reason towards something “beyond” the workaday world.

I suspect that this kind of a backwards mental move is a fundamental kind of error. I’ve done it myself, enough times to recognize the pattern. You remember experiencing something as awe-inspiring and mysterious; you want to recreate that experience; you try to come up with a rational structure that preserves that intuition of mystery, that delicious sublimity; and look! you find you have come to a dead end, and the facts force you to acknowledge failure.

The first and most canonical example of this pattern is trying to prove the existence of God.  I recognize a similar kind of flavor in trying to defend superrationality, trying to refuse the No-Free-Lunch theorem, and trying to argue against digital physics.  There’s a deep appeal in ideas that seem to cut through our finite, parochial, incremental limitations to something “beyond,” but I’ve frequently found those intuitions impossible to justify.

Beyondness is sublime; locality is mundane.  But “beyond” is not a place you can get to.  We always represent infinity in terms of the failure of the finite. “For every N, there exists an n such that a_n > N.”  In other words: every bound will break. This is a temptation towards falling in love with brokenness.

The danger is that in reaching for infinity, reaching for the sublime, you wind up committing a kind of self-harm. Stunting your actual, real-world powers; admitting frank impossibilities into your belief system; seeking not the universe’s bigness but your own smallness.

The universe really is vast and awe-inspiring — it is not an accident that Carl Sagan, our contemporary poet of transcendence, was an astronomer — but to experience awe at the genuinely vast, you have to actually be moving outward along with the scientists, claiming old territory as comprehensible and well-mapped even as you look toward uncharted skies.  There’s a robust, outward-facing experience of the sublime that is dual to the “stolid, prosaic” approach that treats the world as finite and moderate in importance; if you can be cool-headed and proportionate and realistic, you can take on grand adventures and explorations.


Epistemology Sequence, Part 1: Ontology

This sequence of posts is an experiment in fleshing out how I see the world. I expect to revise and correct things, especially in response to discussion.

“Ontology” is an answer to the question “what are the things that exist?”

Consider a reasoning agent making decisions. This can be a person or an algorithm.  It has a model of the world, and it chooses the decision that has the best outcome, where “best” is rated by some evaluative standard.

A structure like this requires an ontology — you have to define what are the states of the world, what are the decision options, and so on.  If outcomes are probabilistic, you have to define a sample space.  If you are trying to choose the decision that maximizes the expected value of the outcome, you have to have probability distributions over outcomes that sum to one.

[You could, in principle, have a decision-making agent that has no model of the world at all, but just responds to positive and negative feedback with the algorithm “do more of what rewards you and less of what punishes you.” This is much simpler than what humans do or what interesting computer programs do, and leads to problems with wireheading. So in this sequence I’ll be restricting attention to decision theories that do require a model of the world.]
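A minimal sketch of such an agent, with an invented two-state ontology: a sample space whose probabilities sum to one, a utility for each (decision, state) pair as the evaluative standard, and a rule that picks the decision with the highest expected utility.

```python
# Toy ontology: two world-states, two decisions; all numbers are invented.
p = {"rain": 0.3, "sun": 0.7}  # probabilities over the sample space sum to one
utility = {
    ("picnic", "rain"): -5, ("picnic", "sun"): 10,
    ("stay_in", "rain"): 2,  ("stay_in", "sun"): 3,
}

def expected_utility(decision):
    return sum(p[state] * utility[(decision, state)] for state in p)

assert abs(sum(p.values()) - 1.0) < 1e-12  # a well-formed distribution
best = max(("picnic", "stay_in"), key=expected_utility)
```

Everything here hangs on the choice of ontology: the states “rain” and “sun” had to be stipulated before the machinery could run at all, and carving the world into different states could change which decision comes out on top.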

The problem with standard decision theory is that you can define an “outcome” in lots of ways, seemingly arbitrarily. You want to partition all possible configurations of the universe into categories that represent “outcomes”, but there are infinitely many ways to do this, and most of them would wind up being very strange, like the taxonomy in Borges’ Celestial Emporium of Benevolent Knowledge:

  • Those that belong to the emperor
  • Embalmed ones
  • Those that are trained
  • Suckling pigs
  • Mermaids (or Sirens)
  • Fabulous ones
  • Stray dogs
  • Those that are included in this classification
  • Those that tremble as if they were mad
  • Innumerable ones
  • Those drawn with a very fine camel hair brush
  • Et cetera
  • Those that have just broken the flower vase
  • Those that, at a distance, resemble flies

We know that statistical measurements, including how much “better” one decision is than another, can depend on the choice of ontology. So we’re faced with a problem here. One would presume that an agent, given a model of the world and a way to evaluate outcomes, would be able to determine the best decision to make.  But the best decision depends on how you construct what the world is “made of”! Decision-making seems to be disappointingly ill-defined, even in an idealized mathematical setting.
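Here is a toy illustration of that dependence (my own, with invented numbers): the same fine-grained world states, grouped under two different ontologies, rank the same two actions in opposite orders, because the evaluation only ever sees the categories.

```python
# Two actions, each inducing a distribution over fine-grained states
# (health, wealth).  All numbers invented for illustration.
world = {
    "X": {("healthy", "poor"): 0.8, ("sick", "rich"): 0.2},
    "Y": {("healthy", "poor"): 0.3, ("sick", "rich"): 0.7},
}

def score(action, ontology):
    # The ontology decides which coordinate of the state counts as "the
    # outcome", with utility 1 for the good category and 0 for the bad.
    good = {"health": "healthy", "wealth": "rich"}[ontology]
    return sum(p for state, p in world[action].items() if good in state)

best_by_health = max(world, key=lambda a: score(a, "health"))   # "X"
best_by_wealth = max(world, key=lambda a: score(a, "wealth"))   # "Y"
```

Neither ranking is wrong; they just carve the same underlying states differently, which is exactly the ill-definedness at issue.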

This is akin to the measure problem in cosmology.  In a multiverse, for every event, we think of there as being universes where the event happens and universes where the event doesn’t happen. The problem is that there are infinitely many universes where the event happens, and infinitely many where it doesn’t. We can construct the probability of the event as a limit as the number of universes becomes large, but the result depends sensitively on precisely how we do the scaling; there isn’t a single well-defined probability.

The direction I’m going to go in this sequence is to suggest a possible model for dealing with ontology, and cash it out somewhat into machine-learning language. My thoughts on this are very speculative, and drawn mostly from introspection and a little bit of what I know about computational neuroscience.

The motivation is basically a practical one, though. When trying to model a phenomenon computationally, there are a lot of judgment calls made by humans.  Statistical methods can abstract away model selection to some degree (e.g. generate a lot of features and select the most relevant ones algorithmically) but never completely. To some degree, good models will always require good modelers.  So it’s important to understand what we’re doing when we do the illegible, low-tech step of framing the problem and choosing which hypotheses to test.

Back when I was trying to build a Bayes net model for automated medical diagnosis, I thought it would be relatively simple. The medical literature is full of journal articles of the form “A increases/decreases the risk of B by X%.”  A might be a treatment that reduces incidence of disease B; A might be a risk factor for disease B; A might be a disease that sometimes causes symptom B; etc.  So, think of a graph, where A and B are nodes and X is the weight between them. Have researchers read a bunch of papers and add the corresponding nodes to the graph; then, when you have a patient with some known risk factors, symptoms, and diseases, just fill in the known values and propagate the probabilities throughout the graph to get the patient’s posterior probability of having various diseases.

This is pretty computationally impractical at large scales, but that wasn’t the main problem. The problem was deciding what a node is. Do you have a node for “heart attack”? Well, one study says a certain risk factor increases the risk of having a heart attack before 50, while another says that a different risk factor increases the lifetime number of heart attacks. Does this mean we need two nodes? How would we represent the relationship between them? Probably having early heart attacks and having lots of heart attacks are correlated, but we aren’t likely to be able to find a paper that quantifies that correlation.  On the other hand, if we fuse the two nodes into one, then the strengths of the risk factors will be incommensurate.  There’s a difficult judgment call inherent in just deciding what the primary “objects” of our model of the world are.
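The propagation scheme described above might be sketched like this (a toy with invented odds ratios; it does naive single-pass odds multiplication, not exact Bayes-net inference):

```python
# Hypothetical miniature of the risk-factor graph; all priors and odds
# ratios are invented, and real literature values would replace them.

def odds(p):
    return p / (1 - p)

def prob(o):
    return o / (1 + o)

# prior probability of each condition in the general population (invented)
priors = {"heart_attack": 0.05, "stroke": 0.03}

# (risk_factor, condition) -> odds ratio from the (invented) literature
odds_ratios = {
    ("smoking", "heart_attack"): 2.0,
    ("hypertension", "heart_attack"): 1.8,
    ("hypertension", "stroke"): 2.5,
}

def posterior(condition, observed_factors):
    """Multiply the prior odds by each observed factor's odds ratio."""
    o = odds(priors[condition])
    for f in observed_factors:
        o *= odds_ratios.get((f, condition), 1.0)
    return prob(o)

p = posterior("heart_attack", {"smoking", "hypertension"})
```

The ontology problem shows up immediately: the sketch only works because "heart_attack" is assumed to be a single well-defined node.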

One reaction is to say “automating human judgment is harder than you thought”, which, of course, is true. But how do we make judgments, then? Obviously I’m not going to solve open problems in AI here, but I can at least think about how to concretize quantitatively the sorts of things that minds seem to be doing when they define objects and make judgments about them.

Changing My Mind: Radical Acceptance

I used to be really against the notion of radical acceptance.  Or, indeed, any kind of philosophy that counseled not getting upset about bad things or not stressing out over your own flaws.

The reason why is that I don’t like the loss of distinctions.  “Science” means “to split.”

If you dichotomize  “justice vs. mercy”, “intense vs. relaxed”, “logic vs. intuition”, and so on, I’m more attracted to the first category. I identify with Inspector Javert and Toby Ziegler. I admire adherence to principle.

And there’s a long tradition of maligning “intense” people like me, often with anti-Semitic or ableist overtones, and I tend to be suspicious of rhetoric that pattern-matches to those associations.  There’s a pattern that either frames intense people as cruel, in a sort of “Mean Old Testament vs. Nice New Testament” way, or as pathetic (“rigid”, “obsessive”, “high need for cognitive closure”, etc).  “Just relax and don’t sweat the small stuff” can be used to excuse backing out of one’s commitments, stretching the truth, or belittling others’ concerns.

There’s also an aesthetic dimension to this. One can prefer crispness and sharpness and intensity to gooey softness.  I think of James Joyce, an atheist with obvious affection for the Jesuitical tradition that taught him.

So, from where I stand, “radical acceptance” sounds extremely unappealing. Whenever I heard “You shouldn’t get mad at reality for being the way it is”, I interpreted it as “You shouldn’t care about the things you care about, you shouldn’t try to change the world, you shouldn’t stand up for yourself, you shouldn’t hold yourself to high standards.  You’re a weird little girl and you don’t matter.”

And of course I reject that. I’m still passionate, still intense, still trying to have integrity, and I don’t ever want to stop caring about the difference between true and false.

But I do finally grok some things about acceptance.

  • It’s just not objectively true that anything short of perfection is worth scrapping.  I can be a person with flaws and my life is still on net extremely worthwhile.  That’s not “bending the rules”, it’s understanding cost-benefit analysis.
  • There’s a sense in which imperfections are both not good and completely okay.  For example: I have a friend that I’ve often had trouble communicating with. Sometimes I’ve hurt his feelings, sometimes he’s hurt mine, pretty much always through misunderstanding.  My Javertian instinct would be to feel like “This friendship is flawed, I’ve sullied it, I need to wipe the slate clean.” But that’s impossible.  The insight is that the friendship is not necessarily supposed to be unsullied.  Friction and disagreement are what happens when you’re trying to connect deeply to people who aren’t exactly like you.  The friendship isn’t falling short of perfection, it’s something rough I’m building from scratch.
  • “Roughness” is a sign that you’re at a frontier. “Mistakes are the portals of discovery.”  Even the most admirable people have experienced disappointment and tried things that didn’t work.  Life doesn’t have to be glossy or free of trouble to be glorious.  Getting through hard times, or making yourself a better person, are legitimate achievements.  Optimizing for “build something” is life-giving; optimizing for “have no flaws” is sterile.
  • Hating injustice, or hating death, is only a starting point. Yes, bad things really are bad, and it’s important to validate that.  Sometimes you have to mourn, or rage, or protest. But what then?  How do you fix the problem?  Once you’ve expressed your grief or anger, once you’ve made people understand that it’s really not all right, what are you going to do?  It becomes a question to investigate, not a flag to raise.  And sometimes people seem less angry, not because they care less, but because they’ve already moved on to the investigation and strategy-building phase of the work.
  • One idea that allows me to grok this is the Jewish idea that G-d chooses not to destroy the world.  Is the world flawed? Heck yes! Is it swarming with human beings who screw up every day?  You bet!  Is it worth wiping out?  No, and there’s a rainbow to prove it.  Which means that the world, in all its messy glory, is net good.  It beats hell out of hard vacuum.

Dopamine, Perception, and Values

The pop-neuroscience story is that dopamine is the “reward” chemical.  Click a link on Facebook? That’s a hit of dopamine.

And there’s obviously an element of truth to that. It’s no accident that popular recreational drugs are usually dopaminergic.  But the reality is a little more complicated. Dopamine’s role in the brain — including its role in reinforcement learning — isn’t limited to “pleasure” or “reward” in the sense we’d usually understand it.

The basal ganglia, located at the base of the forebrain, below the cerebral cortex and close to the limbic system, have a large concentration of dopaminergic neurons.  This area of the brain deals with motor planning, procedural learning, habit formation, and motivation.  Damage causes movement disorders (Parkinson’s, Huntington’s, tardive dyskinesia, etc) or mental illnesses that have something to do with “habits” (OCD and Tourette’s).  Dopaminergic neurons are relatively rare in the brain, and confined to a small number of locations: the striatal area (basal ganglia and ventral tegmental area), projections to the prefrontal cortex, and a few other areas where dopamine’s function is primarily neuroendocrine.

Dopamine, in other words, is not an all-purpose neurotransmitter like, say, glutamate (which is what the majority of neurons use.)  Dopamine does a specific thing or handful of things.

The important thing about dopamine response to stimuli is that it is very fast.  A stimulus associated with a reward causes a “phasic” (spiky) dopamine release within 70-100 ms. This is faster than the gaze shift (mammals instinctively focus their eyes on an unexpected stimulus).  It’s even faster than the ability of the visual cortex to distinguish different images.  Dopamine response happens faster than you can feel an emotion.  It’s prior to emotion, it’s prior even to the more complicated parts of perception.  This means that it’s wrong to interpret dopamine release as a “feel-good” response — it happens faster than you can feel at all.

What’s more, dopamine release is also associated with things besides rewards, such as an unexpected sound or an unpredictably flashing light.  And dopamine is not released in response to a stimulus associated with an expected reward; only an unexpected reward.  This suggests that dopamine has something to do with learning, not just “pleasure.”

Redgrave’s hypothesis is that dopamine release is an agency detector or a timestamp.  It’s fast because it’s there to assign a cause to a novel event.  “I get juice when I pull the lever”, emphasis on when.  There’s a minimum of sensory processing; just a little basic “is this positive or negative?”  Dopamine release determines what you perceive.  It creates the world around you.  What you notice and feel and think is determined by a very fast, pre-conscious process that selects for the surprising, the pleasurable, and the painful.

Striatal dopamine responses are important to perception.  Parkinson’s patients and schizophrenics treated with neuroleptics (both of whom have lowered dopamine levels) have abnormalities in visual contrast sensitivity.  Damage to dopaminergic neurons in rats causes sensory inattention and inability to orient towards new stimuli.

A related theory is that dopamine responds to reward prediction errors — not just rewards, but surprising rewards (or surprising punishments, or a surprisingly absent reward or punishment).  These prediction errors can depend on models of what the individual expects to happen — for example, if the stimulus regularly reverses on alternate trials, the dopamine spikes stop coming because the pattern is no longer surprising.
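A toy reward-prediction-error simulation (my own illustration, in the Rescorla-Wagner style, not a model from the sources below) shows the basic dynamic: the "dopamine" signal is large for a surprising reward and decays to zero as the reward becomes expected.

```python
# A cue is followed by a reward of 1 on every trial; the "phasic dopamine"
# signal is modeled as the prediction error, which shrinks to zero as the
# reward becomes fully expected.  Learning rate is an invented parameter.

alpha = 0.3   # learning rate
V = 0.0       # learned value of the cue
errors = []
for trial in range(20):
    reward = 1.0
    delta = reward - V       # prediction error ("phasic dopamine")
    V += alpha * delta
    errors.append(delta)

# errors[0] is 1.0 (fully surprising); errors[-1] is near zero (expected)
```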

In other words, what you perceive and prioritize depends on what you have learned.  Potentially, even, your explicit and conscious models of the world.

The incentive salience hypothesis is an attempt to account for the fact that damage to dopamine neurons does not prevent the individual from experiencing pleasure, but damages his motivation to work for desired outcomes.  Dopamine must have something to do with initiating purposeful actions.  The hypothesis is that dopamine assigns an “incentive value” to stimuli; it prioritizes the signals, and then passes them off to some other system (perceptual, motor, etc.)  Dopamine seems to be involved in attention, and tonic dopamine deficiency tends to be associated with inattentive behavior in humans and rats.  (Note that the drugs used to treat ADHD are dopamine reuptake inhibitors.)  A phasic dopamine response says “Hey, this is important!”  If the baseline is too low, you wind up thinking everything is important — hence, deficits in attention.

One way of looking at this is in the context of “objective” versus “subjective” reality.  An agent with bounded computation necessarily has to approximate reality.  There’s always a distorting filter somewhere.  What we “see” is always mediated; there is no place in the brain that maps to a “photograph” of the visual field.   (That doesn’t mean that there’s no such thing as reality — “objective” probably refers to invariants and relationships between observers and time-slices, ways in which we can infer something about the territory from looking at the overlap between maps.)

And there’s a sort of isomorphism between your filter and your “values.”  What you record and pay attention to, is what’s important to you. Things are “salient”, worth acting on, worth paying attention to, to the extent that they help you gain “good” stuff and avoid “bad” stuff.  In other words, things that spike your dopamine.

Values aren’t really the same as a “utility function” — there’s no reason to suppose that the brain is wired to obey the Von Neumann-Morgenstern axioms, and in fact, there’s lots of evidence suggesting that it’s not.  Phasic dopamine release actually corresponds very closely to “values” in the Ayn Rand sense.  They’re pre-conscious; they shape perceptions; they are responses to pleasure and pain; values are “what one acts to gain and keep”, which sounds a whole lot like “incentive salience.”

Values are fundamental, in the sense that an initial evaluation of something’s salience is the lowest level of information processing. You are not motivated by your emotions, for instance; you are motivated by things deeper and quicker than emotions.

Values change in response to learning new things about one’s environment. Once you figure out a pattern, repetition of that pattern no longer surprises you. Conscious learning and intellectual thought might even affect your values, but I’d guess that it only works if it’s internalized; if you learn something new but still alieve in your old model, it’s not going to shift things on a fundamental level.

The idea of identifying with your values is potentially very powerful.  Your striatum is not genteel. It doesn’t know that sugar is bad for you or that adultery is wrong.  It’s common for people to disavow their “bestial” or “instinctive” or “System I” self.  But your values are also involved in all your “higher” functions.  You could not speak, or understand language, or conceive of a philosophical idea, if you didn’t have reinforcement learning to direct your attention towards discriminating specific perceptions, motions, and concepts. Your striatum encodes what you actually care about — all of it, “base” and “noble.”  You can’t separate from it.  You might be able to rewire it.  But in a sense nothing can be real to you unless it’s grounded in your values.

Glimcher, Paul W. “Understanding dopamine and reinforcement learning: the dopamine reward prediction error hypothesis.” Proceedings of the National Academy of Sciences 108.Supplement 3 (2011): 15647-15654.

Ljungberg, T., and U. Ungerstedt. “Sensory inattention produced by 6-hydroxydopamine-induced degeneration of ascending dopamine neurons in the brain.” Experimental neurology 53.3 (1976): 585-600.

Marshall, John F., Norberto Berrios, and Steven Sawyer. “Neostriatal dopamine and sensory inattention.” Journal of comparative and physiological psychology 94.5 (1980): 833.

Masson, G., D. Mestre, and O. Blin. “Dopaminergic modulation of visual sensitivity in man.” Fundamental & clinical pharmacology 7.8 (1993): 449-463.

Nieoullon, André. “Dopamine and the regulation of cognition and attention.” Progress in neurobiology 67.1 (2002): 53-83.

Redgrave, Peter, Kevin Gurney, and John Reynolds. “What is reinforced by phasic dopamine signals?.” Brain research reviews 58.2 (2008): 322-339.

Schultz, Wolfram. “Updating dopamine reward signals.” Current opinion in neurobiology 23.2 (2013): 229-238.

The World is Simple

In the world of image and signal processing, a paper usually takes the form “We can prove that this algorithm gives such-and-such accuracy on images that have such-and-such regularity property.  We tried it on some examples and it worked pretty well.  We’re going to assume that most of the images we might care about have the regularity property.”

For instance, sparse coding and compressed sensing techniques assume that the representation of the image in some dictionary or basis is “sparse”, i.e. has very few nonzero coefficients.

There’s some biological justification for this: the mammalian brain seems to recognize images with an overcomplete dictionary of Gabor filters, only a few of which are firing at any given time.

There’s a basic underlying assumption that the world is, in some sense, simple. This is related to ideas around the “unreasonable effectiveness of mathematics.”  Observations from nature can be expressed compactly.  That’s what it means to live in an intelligible universe.

But what does this mean specifically?

One observation is that the power spectrum, that is, the square of the (discrete) Fourier transform, of natural images obeys a power law

S(k) = \frac{A}{k^{2-\eta}}

where \eta is usually small.  It’s been hypothesized that this is because natural images are composed of statistically independent objects, whose scales follow a power-law distribution.

What does this mean?  One way of thinking about it is that a signal with a power-law spectrum exists at all scales.  It’s referred to as “pink noise”.  You can generate a signal with a power-law spectrum by defining a “fractional Brownian motion”.  This is like a Brownian motion, except the increment from time t to time s is a normal distribution with mean zero and variance |t-s|^{2H} for the Hurst exponent H, which equals 1/2 in the special case of a Brownian motion.  The covariance function of a fractional Brownian motion is homogeneous of degree 2H.  Fractional Brownian motions are Lipschitz-continuous with exponent H.

As a matter of fact, any function whose wavelet transform is homogeneous of degree \lambda is a fractional Brownian motion of degree (\lambda-1)/2.
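As a sanity check on the spectral claim (my own construction, not an example from the literature): synthesize a signal whose Fourier amplitudes fall off like 1/k, so its power spectrum is S(k) ∝ 1/k² (i.e. η = 0), and confirm that re-measuring the spectrum of the synthesized signal recovers a log-log slope of −2.

```python
import cmath
import math
import random

# Build a real signal as a sum of cosines with 1/k amplitudes and random
# phases, then re-measure its power spectrum with a direct DFT and fit the
# slope of log(power) against log(k).  By construction the slope is -2.

random.seed(0)
n = 256
ks = range(1, n // 2)

phases = {k: random.uniform(0, 2 * math.pi) for k in ks}
signal = [
    sum((1.0 / k) * math.cos(2 * math.pi * k * t / n + phases[k]) for k in ks)
    for t in range(n)
]

def power(k):
    # squared magnitude of the k-th Fourier coefficient
    f = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
    return abs(f) ** 2

# least-squares slope of log(power) vs log(k)
xs = [math.log(k) for k in ks]
ys = [math.log(power(k)) for k in ks]
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
    (x - mx) ** 2 for x in xs
)
# slope comes out at -2, up to floating point, by construction
```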

Cosma Shalizi has a good post on this phenomenon.  Systems in thermodynamic equilibrium, i.e. “boring” systems, have correlations that decay exponentially in space and time. Systems going through phase transitions, like turbulent flows, and like most things you’ll observe in nature, have correlations that decay slower, with a power law. There are many simple explanations for why things might wind up being power-law-ish.

Imagine you have some set of piles, each of which grows, multiplicatively, at a constant rate. New piles are started at random times, with a constant probability per unit time. (This is a good model of my office.) Then, at any time, the age of the piles is exponentially distributed, and their size is an exponential function of their age; the two exponentials cancel and give you a power-law size distribution. The basic combination of exponential growth and random observation times turns out to work even if it’s only the mean size of piles which grows exponentially.
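The pile story is easy to simulate (invented parameters):

```python
import math
import random

# Piles start at random times (rate start_rate per unit time) and each
# grows like e^(growth_rate * age).  Observed at a late fixed time, pile
# ages are exponentially distributed, sizes are exponential in age, and
# the two exponentials cancel into a power-law size distribution.

random.seed(0)
growth_rate = 1.0
start_rate = 1.0

ages = [random.expovariate(start_rate) for _ in range(100_000)]
sizes = [math.exp(growth_rate * age) for age in ages]

def tail(s):
    """Empirical fraction of piles bigger than s."""
    return sum(size > s for size in sizes) / len(sizes)

# With these rates, P(size > s) = 1/s exactly, so the empirical tail
# should be near 0.5 at s=2, 0.1 at s=10, 0.01 at s=100.
```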

If we’re in a domain with \eta < 1 or H > 1/2, we’re looking at a function with a square-summable Fourier transform.  This is why L^2 assumptions are not completely insane in the domain of signal processing, and why it makes sense to apply wavelet (and related) transforms and truncate after finitely many coefficients.
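The \eta < 1 part of that claim is just summability of the spectrum (spelling out the step):

```latex
\sum_{k=1}^{\infty} S(k) \;=\; \sum_{k=1}^{\infty} \frac{A}{k^{2-\eta}} \;<\; \infty
\quad\Longleftrightarrow\quad 2-\eta > 1
\quad\Longleftrightarrow\quad \eta < 1
```

and since the power spectrum is the squared magnitude of the Fourier transform, a summable spectrum is exactly a square-summable transform.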

Not all regularity assumptions in the signal-processing and machine-learning world are warranted.  Image processing is full of bounded-variation methods like Mumford-Shah, and at least one paper is claiming to observe that natural images are not actually BV.  My own research deals with the fact that Riemannian assumptions in manifold learning are not realistic for most real-world datasets, and that manifold-learning methods need to be extended to sub-Riemannian manifolds (or control manifolds).  And I’m genuinely uncertain when sparsity assumptions are applicable.

But decaying power spectra are a pretty robust empirical observation, and not for shocking reasons. It’s a pretty modest criterion of “simplicity” or “intelligibility”, it’s easy to reproduce with simple random processes, it’s kind of unsurprising that we see it all over the place. And it allows us to pretend we live in Hilbert spaces, which is always a win, because you can assume that Fourier transforms converge and discrete approximations of projections onto orthogonal bases/dictionaries are permissible.

Power-law spectrum decay is a sort of minimum assumption of simplicity that we can expect to see in all kinds of data sets that were not generated “adversarially.”  It’s not a great mystery that the universe is set up that way; it’s what we would expect.  Stronger assumptions of simplicity are more likely to be domain-specific.  If you know what you’re looking at (it’s a line drawing; it’s a set of photographs of the same thing taken at different angles; it’s a small number of anomalies in an otherwise featureless plane; etc) you can make more stringent claims about sparsity or regularity.  But there’s a certain amount of simplicity that we’re justified in assuming almost by default, which allows us to use the tools of applied harmonic analysis in the first place.

Taste and Consumerism

Peter Drucker, whose writings form the intellectual foundation behind the modern management corporation, defined “consumerism” as follows:

What consumerism demands of business is that it actually market.  It demands that business start out with the needs, the realities, the values of the customer.  It demands that business base its reward on its contribution to the customer. … It does not ask “What do we want to sell?” but “What does the customer want to buy?”  It does not say, “This is what our product or service does.” It says “These are the satisfactions the customer looks for, values, and needs.”

Peter Drucker, Management

A consumerist business, then, is like an optimization process, and customer feedback (in the form of sales, surveys, complaints, usage statistics, and so on) is its reward function.  A consumerist business is exquisitely sensitive to customer feedback, and adapts continually in order to better satisfy customers. The consumerist philosophy is antithetical to preconceived ideas about what the company “should” make.  Lean Startups, an extreme implementation of consumerist philosophy, don’t even start with a definite idea of what the product is; the company constantly evolves into selling whatever customers want to buy.

Another way of thinking about this: in a market, there are many possible voluntary trades that could happen.  A consumerist company tries to swim towards one of these trade points and slot itself into a convenient niche.  The whole purpose of trade is to produce win-win exchanges; “consumerism” just means being flexible enough to be willing to search through all the possibilities, instead of leaving opportunities unexploited. 

Yet another, more negative slant on consumerism, is that consumerism is the absence of taste.

A manager, according to Drucker, should not ask “What do we want to sell?”  But an artist always asks “What do I want to make?”

Computer scientist Richard Hamming famously said:

And I started asking, “What are the important problems of your field?” And after a week or so, “What important problems are you working on?” And after some more time I came in one day and said, “If what you are doing is not important, and if you don’t think it is going to lead to something important, why are you at Bell Labs working on it?”

A scientist, in other words, has to care what he’s working on.  Problems that are interesting, that have the potential to be world-changing.  Any good scientist is intrinsically motivated by the problem.  If you told Hamming you’d pay him a million dollars to crochet shawls all year, he’d laugh and refuse.  If he were the kind of person who could be induced to quit working on information theory, he wouldn’t be Hamming in the first place.

Ira Glass on creativity and taste:

All of us who do creative work … we get into it because we have good taste. But it’s like there’s a gap, that for the first couple years that you’re making stuff, what you’re making isn’t so good, OK? It’s not that great. It’s really not that great. It’s trying to be good, it has ambition to be good, but it’s not quite that good. But your taste — the thing that got you into the game — your taste is still killer, and your taste is good enough that you can tell that what you’re making is kind of a disappointment to you, you know what I mean?

J.D. Salinger on writing

You wrote down that you were a writer by profession. It sounded to me like the loveliest euphemism I’ve ever heard. When was writing ever your profession? It’s never been anything but your religion. Never…

If only you’d remember before ever you sit down to write that you’ve been a reader long before you were ever a writer. You simply fix that fact in your mind, then sit very still and ask yourself, as a reader, what piece of writing in all the world Buddy Glass would most want to read if he had his heart’s choice. The next step is terrible, but so simple I can hardly believe it as I write it. You just sit down shamelessly and write the thing yourself. I won’t even underline that. It’s too important to be underlined. 

Eric S. Raymond on software:

Every good work of software starts by scratching a developer’s personal itch.

There’s very clearly a tradition, across the creative disciplines, that a creator must be intrinsically motivated by love of the work and by the ambition to make something great.  Great by what standard?  Well, this is often informed by the standards of the professional community, but it’s heavily driven by the creator’s own taste.  She has some sense of what makes a great photograph, what makes a beautiful proof, what makes an ingenious design.  

Is taste universal? Is there some sense in which Beethoven’s 9th is “really” good — is there some algorithmic regularity in it, or some resonance with the human ear, something that makes its value more than a matter of opinion?  Maybe, and maybe not.  I’m inclined to be intrigued but skeptical of simple explanations of what humans find beautiful, like Schmidhuber’s notion of low Kolmogorov complexity.  My own speculation is that hidden symmetry or simplicity is also a fundamental principle of aesthetics: a perfect circle is all right, but an intricate and non-obvious pattern, which takes more attention to notice, is more interesting to the eye, because minds take pleasure in recognition.

Whether there are some universal principles behind aesthetics or not, in practice aesthetics are mediated through individual taste. You cannot write a book by committee, or by optimizing around a dashboard of reader feedback stats.  You can’t write a proof that way either.  

Creative original work isn’t infinitely fungible and modifiable, like other commodities. The mindset of infinitely flexible responsiveness to feedback is extremely different from the mindset of focused creation of a particular thing.  The former involves lots of task switching; the latter involves blocks of uninterrupted time.  You can’t be a maker and a manager at the same time.  Managing, responding to feedback, being a “consumerist,” requires engaging your social brain: modeling people’s responses to what you do, and adapting accordingly.  Making things involves turning that part of your brain off, and engaging directly with physical objects and senses, or abstract concepts.

Creative work is relevant to businesses.  Design, for instance, matters. So does technological innovation.  But, for a consumerist business, the constraints of creative work are unwelcome limitations.  Makers want to make a particular thing, while the company as a whole needs to find any niche where it can be profitable.

Drucker defines “knowledge workers” as skilled experts, whose loyalty is stronger to their profession than to their company.  They’ll introduce themselves with “I’m a natural language processing guy”, not “I work for IBM.”  Drucker’s “knowledge workers” seem somewhat analogous to “makers.” A cynical view of his book Management is that it’s about how to organize and motivate knowledge workers without giving them any real power.  The NLP guy’s goal is to make a tool that does an excellent job at machine translation. The manager’s goal is to promote the growth and survival of the organization.  These goals are, ideally, aligned, but when they conflict, in a Druckerian organization, the manager’s goal has to take priority.

What this means is that makers, people with taste, have a few options.

1. Work for a manager in a successful company. You’ll have serious constraints on the type of work you do, and you won’t be able to capture much of its financial value, but your work will be likely to be implemented at a large scale out in the world, and you’ll have steady income.

2. Have a small lifestyle business that caters only to the few who share your taste.  You’ll never have much money, and you won’t have large-scale impact on the world, but you’ll be able to keep your aesthetic integrity absolutely.

3. Find a patron. (Universities are the classic example, but this applies to some organizations that are nominally companies as well. A hedge fund that has a supercomputer to model protein folding is engaging in patronage.  Family money is an edge case of patronage.)  A patron is a high-status individual or group that seeks to enhance its status by funding exceptional creators and giving them freedom in their work.  You can make a moderate amount of money, you’ll get a lot of creative freedom (but you’ll be uncertain how much or for how long) and you might be able to have quite a lot of impact. The main problem here is uncertainty, because patrons are selective and their gifts often have strings attached.

4. Start a business that bets hard on your taste.  If you’re Steve Jobs, or Larry Page, your personal vision coincides with market success. You can win big on all fronts: money, impact, and creative freedom.  The risk is, of course, that the overwhelming majority of people trying this strategy fail, and you’re likely to wind up with much less freedom than 1-3.

Howard Roark, the prototypical absolutist of personal taste, picked option 2: he made the buildings he liked, for the people who shared his taste in architecture, refused to engage in any marketing whatsoever, and was nearly broke most of the time.  In fact, Ayn Rand, who has a reputation as a champion of big business, is, if anything, a consistent advocate of a sort of Stoic retirement: you’d be happier and more virtuous, on her view, if you gave up trying to “make it big,” and instead went to a small town to ply your craft.  “Making it”, in the sense of wealth or fame or power, means making yourself beholden to lots of people and losing your individuality.

I’m not sure I’m that much of a hipster. I don’t think the obvious thing for a creative person to do is “retirement.”  Especially not if you care about scope.  If you’ve designed a self-driving car, you don’t want to make one prototype, you want a fleet of self-driving taxis on the streets of New York.  Even more so, if you’ve discovered a cure for a disease, you want it prescribed in hospitals everywhere, not just kept as a home remedy for your family.

What I actually plan to do is something between 1 and 3 (there’s an emerging trend of tech companies straddling the line between patrons and employers, though I’m not certain what that looks like on the inside) and explore what it would take to do 4.

How I Read: the Jointed Robot Metaphor

“All living beings, whether born from eggs, from the womb, from moisture, or spontaneously; whether they have form or do not have form; whether they are aware or unaware, whether they are not aware or not unaware, all living beings will eventually be led by me to the final Nirvana, the final ending of the cycle of birth and death. And when this unfathomable, infinite number of living beings have all been liberated, in truth not even a single being has actually been liberated.” (The Diamond Sutra)

What do you do when you read a passage like this?

If you’re not a Buddhist, does it read like nonsense?

Does it seem intuitively true or deep right away?

What I see when I read this is a lot of uncertainty.  What is a living being that does not have form?  What is Nirvana anyway, and could there be a meaning of it that’s not obviously incompatible with the laws of physics?  And what’s up with saying that everyone has been liberated and nobody has been liberated?

Highly metaphorical, associative ideas, the kind you see in poetry or religious texts or Continental philosophy, require a different kind of perception than you use for logical arguments or proofs.

The concept of steelmanning is relevant here. When you strawman an argument, you refute the weakest possible version; when you steelman an argument, you engage with the strongest possible version.   Strawmanning impoverishes your intellectual life. It does you no favors to spend your time making fun of idiots.  Steelmanning gives you a way to test your opinions against the best possible counterarguments, and a real possibility of changing your mind; all learning happens at the boundaries, and steelmanning puts you in contact with a boundary.

A piece of poetic language isn’t an argument, exactly, but you can do something like steelmanning here as well.

When I read something like the Diamond Sutra, my mental model is something like a robot or machine with a bunch of joints.

Each sentence or idea could mean a lot of different things. It’s like a segment with a ball-and-socket joint and some degrees of freedom.  Put in another idea from the text and you add another piece of the robot, with its own degrees of freedom, but there’s a constraint now, based on the relationship of those ideas to each other.  (For example: I don’t know what the authors mean by the word “form”, but I can assume they’re using it consistently from one chapter to another.)  And my own prior knowledge and past experiences also constrain things: if I want the Diamond Sutra to click into the machine called “Sarah’s beliefs,” it has to be compatible with materialism (or at least represent some kind of subjective mental phenomenon encoded in our brains, which are made of cells and atoms).

If I read the whole thing and wiggle the joints around, sooner or later I’ll either get a sense of “yep, that works, I found an interpretation I can use” when things click into place, or “nope, that’s not actually consistent/meaningful” when I get some kind of contradiction.

I picture each segment of the machine as having a continuous range of motion. But the set of globally stable configurations of the whole machine is discrete. They click into place, or jam.

You can think of this with energy landscape or simulated-annealing metaphors. Or you can think of it with moduli space metaphors.
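The simulated-annealing metaphor can be made concrete with a toy sketch. Here, a few “ideas” from a text each have several candidate readings (the joint’s range of motion), a made-up energy function charges a cost whenever adjacent ideas jam against each other, and the annealer wiggles one joint at a time until the whole assignment clicks into a low-energy configuration. Every name, constraint, and number below is invented for illustration; nothing here comes from the Sutra itself.

```python
import math
import random

random.seed(0)  # deterministic, so the sketch is reproducible

# Three "ideas" from a text, each with three candidate readings.
ideas = ["form", "being", "liberation"]
readings = {i: [0, 1, 2] for i in ideas}

def energy(state):
    """Total inconsistency: adjacent ideas 'click' when their
    readings agree, and 'jam' (cost 1.0) when they disagree."""
    e = 0.0
    for a, b in [("form", "being"), ("being", "liberation")]:
        if state[a] != state[b]:
            e += 1.0
    return e

def anneal(steps=1000, temp=2.0, cooling=0.995):
    # Start from a random assignment and wiggle one joint at a time.
    state = {i: random.choice(readings[i]) for i in ideas}
    for _ in range(steps):
        idea = random.choice(ideas)
        proposal = dict(state)
        proposal[idea] = random.choice(readings[idea])
        delta = energy(proposal) - energy(state)
        # Always accept improvements; sometimes accept regressions
        # while the temperature is high, so the search can escape
        # configurations that are merely locally "stuck".
        if delta <= 0 or random.random() < math.exp(-delta / temp):
            state = proposal
        temp *= cooling
    return state, energy(state)

state, e = anneal()
print(state, e)  # on this easy landscape the search settles at energy 0.0
```

The early high-temperature phase is the “wiggling the joints in the back of my mind” stage; as the temperature drops, the interpretation either clicks into a zero-energy configuration or jams at a positive one.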

This gives me a way to think about mystical or hand-wavy notions that’s not just free-association or “it could mean anything”, which don’t give me enough structure.  There is structure, even when we’re talking about mysticism; concepts have relationships to other concepts, and some ways of fitting them together are kludgey while others are harmonious.

It can be useful to entertain ideas, to work out their consequences, before you accept or reject them.

And not just ideas. When I go to engage in a group activity like CFAR, the cognitive-science-based self-improvement workshop where I spent this weekend, I naturally fall into a state of provisionally accepting the frame of that group.  For the moment, I assumed that their techniques would work and engaged energetically with the exercises; I’m waiting until after I’ve tried them to evaluate the results objectively.  My “machine” hasn’t clicked completely yet — there are still some parts of the curriculum I haven’t grokked or fit into place, and I obviously don’t know about the long-term effects on my life.  But I’m going to be wiggling the joints in the back of my mind until it does click or jam.  People who went into the workshop with a conventionally “skeptical” attitude, or who went in with something like an assumption that it could only mean one thing, tended to think they’d already seen the curriculum and that it was mundane.

I’m not trying to argue for credulousness.  It’s more like a kind of radical doubt: being aware that there are many possible meanings or models and that you may not yet have pinned down “the” single correct one.