Dopamine, Perception, and Values

The pop-neuroscience story is that dopamine is the “reward” chemical.  Click a link on Facebook? That’s a hit of dopamine.

And there’s obviously an element of truth to that. It’s no accident that popular recreational drugs are usually dopaminergic.  But the reality is a little more complicated. Dopamine’s role in the brain — including its role in reinforcement learning — isn’t limited to “pleasure” or “reward” in the sense we’d usually understand it.

The basal ganglia, located at the base of the forebrain, below the cerebral cortex and close to the limbic system, have a large concentration of dopaminergic neurons.  This area of the brain deals with motor planning, procedural learning, habit formation, and motivation.  Damage causes movement disorders (Parkinson’s, Huntington’s, tardive dyskinesia, etc.) or mental illnesses that have something to do with “habits” (OCD and Tourette’s).  Dopaminergic neurons are relatively rare in the brain, and confined to a small number of locations: the striatal area (basal ganglia and ventral tegmental area), projections to the prefrontal cortex, and a few other areas where dopamine’s function is primarily neuroendocrine.

Dopamine, in other words, is not an all-purpose neurotransmitter like, say, glutamate (which is what the majority of neurons use).  Dopamine does a specific thing, or a small handful of things.

The important thing about the dopamine response to stimuli is that it is very fast.  A stimulus associated with a reward causes a “phasic” (spiky) dopamine release within 70-100 ms. This is faster than a gaze shift (mammals instinctively orient their eyes toward an unexpected stimulus).  It’s even faster than the visual cortex can distinguish one image from another.  The dopamine response happens faster than you can feel an emotion.  It’s prior to emotion; it’s prior even to the more complicated parts of perception.  This means that it’s wrong to interpret dopamine release as a “feel-good” response — it happens faster than you can feel at all.

What’s more, dopamine release is also associated with things besides rewards, such as an unexpected sound or an unpredictably flashing light.  And dopamine is not released in response to an expected reward, only an unexpected one.  This suggests that dopamine has something to do with learning, not just “pleasure.”

Redgrave’s hypothesis is that dopamine release is an agency detector or a timestamp.  It’s fast because it’s there to assign a cause to a novel event.  “I get juice when I pull the lever”, emphasis on when.  There’s a minimum of sensory processing; just a little basic “is this positive or negative?”  Dopamine release determines what you perceive.  It creates the world around you.  What you notice and feel and think is determined by a very fast, pre-conscious process that selects for the surprising, the pleasurable, and the painful.

Striatal dopamine responses are important to perception.  Parkinson’s patients and schizophrenics treated with neuroleptics (both of whom have reduced dopamine signaling) have abnormalities in visual contrast sensitivity.  Damage to dopaminergic neurons in rats causes sensory inattention and an inability to orient towards new stimuli.

A related theory is that dopamine responds to reward prediction errors — not just rewards, but surprising rewards (or surprising punishments, or a surprisingly absent reward or punishment).  These prediction errors can depend on models of what the individual expects to happen — for example, if the stimulus regularly reverses on alternate trials, the dopamine spikes stop coming because the pattern is no longer surprising.
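
To make the prediction-error idea concrete, here’s a minimal sketch of the textbook temporal-difference formulation (the chain task, learning rate, and discount factor are invented for illustration, not taken from the papers cited below):

```python
# A minimal sketch of the reward-prediction-error idea (illustrative
# parameters): an agent learns value estimates V(s), and the
# "dopamine-like" signal is the TD error
#   delta = r + gamma * V(s') - V(s).
import numpy as np

n_states = 5                 # a little chain; reward arrives in the last state
V = np.zeros(n_states)       # learned value estimates
alpha, gamma = 0.1, 1.0      # learning rate, discount factor

for trial in range(200):
    for s in range(n_states):
        r = 1.0 if s == n_states - 1 else 0.0
        v_next = V[s + 1] if s + 1 < n_states else 0.0
        delta = r + gamma * v_next - V[s]   # prediction error ("dopamine spike")
        V[s] += alpha * delta

# Early on, delta spikes when the reward arrives (it's surprising).
# After learning, V(s) ~ 1 everywhere and delta ~ 0 at reward time:
# a fully predicted reward no longer produces an error signal.
```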

In other words, what you perceive and prioritize depends on what you have learned.  Potentially, even, your explicit and conscious models of the world.

The incentive salience hypothesis is an attempt to account for the fact that damage to dopamine neurons does not prevent the individual from experiencing pleasure, but damages his motivation to work for desired outcomes.  Dopamine must have something to do with initiating purposeful actions.  The hypothesis is that dopamine assigns an “incentive value” to stimuli; it prioritizes the signals, and then passes them off to some other system (perceptual, motor, etc.)  Dopamine seems to be involved in attention, and tonic dopamine deficiency tends to be associated with inattentive behavior in humans and rats.  (Note that the drugs used to treat ADHD are dopamine reuptake inhibitors.)  A phasic dopamine response says “Hey, this is important!”  If the baseline is too low, you wind up thinking everything is important — hence, deficits in attention.

One way of looking at this is in the context of “objective” versus “subjective” reality.  An agent with bounded computation necessarily has to approximate reality.  There’s always a distorting filter somewhere.  What we “see” is always mediated; there is no place in the brain that maps to a “photograph” of the visual field.   (That doesn’t mean that there’s no such thing as reality — “objective” probably refers to invariants and relationships between observers and time-slices, ways in which we can infer something about the territory from looking at the overlap between maps.)

And there’s a sort of isomorphism between your filter and your “values.”  What you record and pay attention to, is what’s important to you. Things are “salient”, worth acting on, worth paying attention to, to the extent that they help you gain “good” stuff and avoid “bad” stuff.  In other words, things that spike your dopamine.

Values aren’t really the same as a “utility function” — there’s no reason to suppose that the brain is wired to obey the von Neumann-Morgenstern axioms, and in fact, there’s lots of evidence suggesting that it’s not.  Phasic dopamine release actually corresponds very closely to “values” in the Ayn Rand sense.  They’re pre-conscious; they shape perceptions; they are responses to pleasure and pain; values are “what one acts to gain and keep”, which sounds a whole lot like “incentive salience.”

Values are fundamental, in the sense that an initial evaluation of something’s salience is the lowest level of information processing. You are not motivated by your emotions, for instance; you are motivated by things deeper and quicker than emotions.

Values change in response to learning new things about one’s environment. Once you figure out a pattern, repetition of that pattern no longer surprises you. Conscious learning and intellectual thought might even affect your values, but I’d guess that it only works if it’s internalized; if you learn something new but still alieve in your old model, it’s not going to shift things on a fundamental level.

The idea of identifying with your values is potentially very powerful.  Your striatum is not genteel. It doesn’t know that sugar is bad for you or that adultery is wrong.  It’s common for people to disavow their “bestial” or “instinctive” or “System I” self.  But your values are also involved in all your “higher” functions.  You could not speak, or understand language, or conceive of a philosophical idea, if you didn’t have reinforcement learning to direct your attention towards discriminating specific perceptions, motions, and concepts. Your striatum encodes what you actually care about — all of it, “base” and “noble.”  You can’t separate from it.  You might be able to rewire it.  But in a sense nothing can be real to you unless it’s grounded in your values.

Glimcher, Paul W. “Understanding dopamine and reinforcement learning: the dopamine reward prediction error hypothesis.” Proceedings of the National Academy of Sciences 108.Supplement 3 (2011): 15647-15654.

Ljungberg, T., and U. Ungerstedt. “Sensory inattention produced by 6-hydroxydopamine-induced degeneration of ascending dopamine neurons in the brain.” Experimental neurology 53.3 (1976): 585-600.

Marshall, John F., Norberto Berrios, and Steven Sawyer. “Neostriatal dopamine and sensory inattention.” Journal of comparative and physiological psychology 94.5 (1980): 833.

Masson, G., D. Mestre, and O. Blin. “Dopaminergic modulation of visual sensitivity in man.” Fundamental & clinical pharmacology 7.8 (1993): 449-463.

Nieoullon, André. “Dopamine and the regulation of cognition and attention.” Progress in neurobiology 67.1 (2002): 53-83.

Redgrave, Peter, Kevin Gurney, and John Reynolds. “What is reinforced by phasic dopamine signals?.” Brain research reviews 58.2 (2008): 322-339.

Schultz, Wolfram. “Updating dopamine reward signals.” Current opinion in neurobiology 23.2 (2013): 229-238.

Do Rational People Exist?

Does it make sense to talk about “rational people”?

That is, is there a sub-population of individuals who consistently exhibit less cognitive bias and better judgment under uncertainty than average people?  Do these people have the dispositions we’d intuitively associate with more thoughtful habits of mind?  (Are they more flexible and deliberative, less dogmatic and impulsive?)

And, if so, what are the characteristics associated with rationality?  Are rational people more intelligent? Do they have distinctive demographics, educational backgrounds, or neurological features?

This is my attempt to find out what the scientific literature has to say about the question.  (Note: I’m going to borrow heavily from Keith Stanovich, as he’s the leading researcher in individual differences in rationality. My positions are very close, if not identical, to his, though I answer some questions that he doesn’t cover.)

A minority of people avoid cognitive biases

Most of the standard tests for cognitive bias find that most study participants “fail” (display bias) but a minority “pass” (give the rational or correct answer).

The Wason Selection Task is a standard measure of confirmation bias.  Fewer than 10% got it right in Wason’s original experiment.[1]

The “feminist bank teller” question famous from Kahneman and Tversky’s experiments is a measure of the conjunction fallacy. Only 10% got it right — 15% for students in the decision science program of the Stanford Business School, who had taken advanced courses in statistics, probability, and decision theory.[2]

Overconfidence bias shows up in 88% of study participants. [6]

The Cognitive Reflection Test measures the ability to avoid choosing intuitive-but-wrong answers.  Only 17% get all three questions right. [7]

The incidence of framing effects appears to be lower and more variable. Stanovich’s experiments find a framing effect in 30-50% of subjects.[3]   The frequency of framing effects in Kahneman and Tversky’s experiments occupies roughly the same range. [4]  The incidence of the sunk cost fallacy in experiments is only about 25%. [5]

On 15 cognitive bias questions developed by Keith Stanovich, subjects’ accuracy rates ranged from 92.2% (for a gambler’s fallacy question) to 15.6% (for a sample size neglect question). The average score was 6.88 (46%) with a standard deviation of 2.32.  [8]

Many standard measures of cognitive bias find that a minority of subjects get the correct answer. There is significant individual variation in cognitive bias.

Correlation between cognitive bias tasks

Is “rationality” a cluster in thingspace?  “Rational” only makes sense as a descriptor of people if the same people are systematically better at cognitive bias tasks across the board.  This appears to be true.

Stanovich found that the Cognitive Reflection Test was correlated (r=0.49) with score on the 15-question cognitive bias test.  Performance also correlated (r=0.41) with IQ, measured with the Wechsler Abbreviated Scale of Intelligence.

Stanovich also found that four rational thinking tasks (a syllogistic reasoning task, a Wason selection task, a statistical reasoning task, and an argument evaluation task) were correlated at the 0.001 significance level (r = 0.2-0.4). These were also correlated with SAT score (r = 0.53) and more weakly with math background (r = 0.145).[9]

Stanovich found, however, that many types of cognitive bias tests failed to correlate with measures of intelligence such as SAT or IQ scores. [10]

The Cognitive Reflection Test was found to be significantly correlated with correct responses on the base rate fallacy, conservatism, and overconfidence bias, but not with the endowment effect. [12]

Philip Tetlock’s “super-forecasters” — the top 2% most successful predictors on current-events questions in IARPA’s Good Judgment Project — outperformed the average by 65% and the best learning algorithms by 35-60%.  The best forecasters scored significantly higher than average on IQ, the Cognitive Reflection Test, and political knowledge. [11]

Correct responses on probability questions correlate with lower rates of the conjunction fallacy. [16]

In short, there appears to be significant correlation between a variety of tests of cognitive biases. Higher IQ is also correlated with avoiding cognitive biases, though many individual cognitive biases are uncorrelated with IQ and the variation in cognitive bias is not fully explained by IQ differences.  The cognitive reflection test is correlated with less cognitive bias and with IQ, as well as with forecasting ability.  There’s a compelling case that “rationality” is a distinct skill, related to intelligence and math or statistics ability.

Cognitive bias performance and dispositions

“Dispositions” are personal qualities that reflect one’s priorities — “curiosity” would be an example of a disposition.  Performance on cognitive bias tests is correlated with the types of dispositions we’d associate with being a thoughtful and reasonable person.

People scoring higher on the Cognitive Reflection Test are more patient, as measured by how willing they are to wait for a larger financial reward.[7]

Higher scores on the Cognitive Reflection Test also correlate with utilitarian thinking (as measured by willingness to throw the switch on the trolley problem). [13]

Belief in the paranormal is correlated with higher rates of the conjunction fallacy. [17]

Score on rational thinking tasks (argument evaluation, syllogisms, and statistical reasoning) is correlated (r = 0.413) with score on a Thinking Dispositions questionnaire (which measures Actively Open-Minded Thinking, Dogmatism, Paranormal Beliefs, etc.)

Basically, it appears that lower rates of cognitive bias correlate with certain behavioral traits one could intuitively characterize as “reasonable.”  They’re less dogmatic, and more open-minded. They’re less likely to believe in the supernatural. They behave more like ideal economic actors.  Most of this seems to add up to being more WEIRD (Western, educated, industrialized, rich, democratic), though this may be a function of the features that researchers chose to investigate.

Factors correlating with cognitive bias

Men score higher on the Cognitive Reflection Test than women — the group that answers all three questions correctly is two-thirds men, while the group that answers all three questions wrong is two-thirds women. [7]

Scientists [14] and mathematicians [15] performed no better than undergraduates on the Wason Selection Task, though mathematics undergraduates did better than history undergraduates.

Autistics [18] are less susceptible to the conjunction fallacy than neurotypicals.

Correct responses on conjunction fallacy and base rate questions correspond to better performance on “No-Go” tasks and greater N2, an EEG measure believed to reflect executive inhibition ability. [19]  Response inhibition is thought to be based in the striatum and associated with striatal dopamine receptors.

COMT mutations predict greater susceptibility to confirmation bias. [20]  COMT is involved in the degradation of dopamine. The Val/Met polymorphism makes the enzyme less efficient, which increases prefrontal cortex activation and working memory for abstract rules.  Met carriers exhibited more confirmation bias (p = 0.005).

There doesn’t seem to be that much data on the demographic characteristics of the most and least rational people.

There’s some suggestive neuroscience on the issue; the ability to avoid intuitive-but-wrong choices has to do with executive function and impulsivity, while the ability to switch tasks and avoid being anchored on earlier beliefs has to do with prefrontal cortex learning.  As we’ll see later, Stanovich (independently of the neuroscience evidence) categorizes cognitive biases into two distinct types, more or less matching this distinction between “consciously avoiding the intuitive-but-wrong answer” skills and the “considering that you might be wrong” skills.

Is there a hyper-rational elite?

It seems clear that there’s such a thing as individual variation in rationality, that people who are more rational in one area tend to be more rational in others, and that rationality correlates with the kinds of things you’d expect: intelligence, mathematical ability, and a flexible cognitive disposition.

It’s not obvious that “cognitive biases” are a natural category — some are associated with IQ, while some aren’t, and it seems quite probable that different biases have different neural correlates. But tentatively, it seems to make sense to talk about “rationality” as a single phenomenon.

A related question is whether there exists a small population of extreme outliers with very low rates of cognitive bias, a rationality elite.  Tetlock’s experiments seem to suggest this may be true — that there are an exceptional 2% who forecast significantly better than average people, experts, or algorithms.

In order for the “rationality elite” hypothesis to be generally valid, we’d have to see the same people score exceptionally high on a variety of cognitive bias tests.  There doesn’t yet appear to be evidence to confirm this.

Stanovich’s tripartite model

Stanovich proposes dividing “System II”, or the reasoning mind, into two further parts: the “reflective mind” and the “algorithmic mind.”  The reflective mind engages in self-skepticism; it interrupts processes and asks “is this right?”  The algorithmic mind is involved in working memory and cognitive processing capacity — it is what IQ tests and SATs measure.

This would explain why some cognitive biases, but not others, correlate with IQ.  Intelligence does not protect against myside bias, the bias blind spot, sunk costs, and anchoring effects.  Intelligence is correlated with various tests of probabilistic reasoning (base rate neglect, probability matching), tests of logical reasoning (belief bias, argument evaluation), expected value maximization in gambles, overconfidence bias, and the Wason selection test.

One might argue that the skills that correlate with intelligence are tests of symbolic manipulation skill, the ability to consciously follow rules of logic and math, while the skills that don’t correlate with intelligence require cognitive flexibility, the ability to change one’s mind and avoid being tied to past choices.

Stanovich talks about “cognitive decoupling”, the ability to block out context and experiential knowledge and just follow formal rules, as a main component of both performance on intelligence tests and performance on the cognitive bias tests that correlate with intelligence.  Cognitive decoupling is the opposite of holistic thinking. It’s the ability to separate, to view things in the abstract, to play devil’s advocate.

Cognitive flexibility, for which the “actively open-minded thinking scale” is a good proxy measure, is the ability to question your own beliefs.  It predicts performance on a forecasting task, because the open-minded people sought more information. [21]  Less open-minded individuals are more biased towards their own first opinions and do less searching for information.[22]  Actively open-minded thinking increases with age (in middle schoolers) and correlates with cognitive ability.[23]

Under this model, people with high IQs, and especially people with training in probability, economics, and maybe explicit rationality, will be better at the cognitive bias skills that have to do with cognitive decoupling, but won’t be better at the others.

Speculatively, we might imagine that there is a “cognitive decoupling elite” of smart people who are good at probabilistic reasoning and score high on the cognitive reflection test and the IQ-correlated cognitive bias tests. These people would be more likely to be male, more likely to have at least undergrad-level math education, and more likely to have utilitarian views.  Speculating a bit more, I’d expect this group to be likelier to think in rule-based, devil’s-advocate ways, influenced by economics and analytic philosophy.  I’d expect them to be more likely to identify as rational.

I’d expect them not to be much better than average at avoiding the cognitive biases uncorrelated with intelligence. The cognitive decoupling elite would be just as prone to dogmatism and anchoring as anybody else.  However, the subset that were cognitively flexible would probably be noticeably better at predicting the future.  Tetlock’s finding that the most accurate political pundits are “foxes” not “hedgehogs” seems to be related to this idea of the “reflective mind.”  Most smart abstract thinkers are not especially open-minded, but those who are, get things right a lot more than everybody else.

It’s also important to note that experiments on cognitive bias pinpoint a minority, but not a tiny minority, of less biased individuals.  17% of college students at all colleges, but 45% of college students at MIT, got all three questions on the cognitive reflection test right.  MIT has about 120,000 living alumni; 27% of Americans have a bachelor’s or professional degree.  The number of Americans getting the Cognitive Reflection Test right is probably on the order of a few million — that is, a few percent of the total population.  Obviously, conjunctively adding more cognitive bias tests should narrow down the population of ultra-rational people further, but we’re not talking about a tiny elite cabal here.  In the terminology of my previous post, the evidence points to the existence of unusually rational people, but only at the “One-Percenter” level.  If there are Elites, Ultra-Elites, and beyond, we don’t yet have the tests to detect them.
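
Spelling out that Fermi calculation (a rough sketch: the inputs are the figures quoted above, plus an assumed adult-population figure; the selectivity caveat at the end is my own):

```python
# Back-of-the-envelope estimate of how many Americans would pass the CRT.
# All inputs are the rough figures from the text, not fresh data.
us_adults = 240e6        # assumed number of US adults
bachelors_share = 0.27   # share with a bachelor's or professional degree
crt_pass_share = 0.17    # share of tested college students scoring 3/3

print(f"{us_adults * bachelors_share * crt_pass_share:.2e}")  # ~1.1e7

# The 17% figure comes from students at fairly selective colleges, so the
# rate among all degree-holders is lower; that pulls the estimate down
# toward the "few million" order of magnitude claimed above.
```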

Conclusion

Yes, there are people who are consistently less cognitively biased than average.  They are a minority, but not a tiny minority.  They are smarter and more reasonable than average.  When you break down the measures of cognitive bias into two types, you find that intelligence is correlated with measures of ability to reason formally, but not with measures of ability to question one’s own judgment; the latter are more correlated with dispositions like “active open-mindedness.”  There’s no evidence to suggest that there’s a very small (e.g. less than 1% of the population) group of extremely rational people, probably because we don’t have enough experimental power to detect extremes of performance on cognitive bias tests.

References

[1] Wason, Peter C. “Reasoning about a rule.” The Quarterly Journal of Experimental Psychology 20.3 (1968): 273-281.

[2] Tversky, Amos, and Daniel Kahneman. “Extensional versus intuitive reasoning: The conjunction fallacy in probability judgment.” Psychological review 90.4 (1983): 293.

[3] Stanovich, Keith E., and Richard F. West. “Individual differences in framing and conjunction effects.” Thinking & Reasoning 4.4 (1998): 289-317.

[4] Tversky, Amos, and Daniel Kahneman. “Rational choice and the framing of decisions.” Journal of business (1986): S251-S278.

[5] Friedman, Daniel, et al. “Searching for the sunk cost fallacy.” Experimental Economics 10.1 (2007): 79-104.

[6] West, R. F., & Stanovich, K. E. (1997). The domain specificity and generality of overconfidence: Individual differences in performance estimation bias. Psychonomic Bulletin & Review, 4, 387-392.

[7] Frederick, Shane. “Cognitive reflection and decision making.” Journal of Economic perspectives (2005): 25-42.

[8] Toplak, Maggie E., Richard F. West, and Keith E. Stanovich. “The Cognitive Reflection Test as a predictor of performance on heuristics-and-biases tasks.” Memory & Cognition 39.7 (2011): 1275-1289.

[9] Stanovich, K. E., & West, R. F. (1998). Individual differences in rational thought. Journal of Experimental Psychology: General, 127, 161-188

[10] Stanovich, K. E., West, R. F., & Toplak, M. E. (2011).  Intelligence and rationality.  In R. J. Sternberg & S. B. Kaufman (Eds.), Cambridge Handbook of Intelligence (pp. 784-826).  New York: Cambridge University Press.

[11] Ungar, Lyle, et al. “The Good Judgment Project: A Large Scale Test of Different Methods of Combining Expert Predictions.” 2012 AAAI Fall Symposium Series. 2012.

[12] Hoppe, Eva I., and David J. Kusterer. “Behavioral biases and cognitive reflection.” Economics Letters 110.2 (2011): 97-100.

[13] Paxton, Joseph M., Leo Ungar, and Joshua D. Greene. “Reflection and reasoning in moral judgment.” Cognitive Science 36.1 (2012): 163-177.

[14] Griggs, Richard A., and Sarah E. Ransdell. “Scientists and the selection task.” Social Studies of Science 16.2 (1986): 319-330.

[15] Inglis, Matthew, and Adrian Simpson. “Mathematicians and the selection task.” Proceedings of the 28th International Conference on the Psychology of Mathematics Education. Vol. 3. 2004.

[16] Benassi, Victor A., and Russell L. Knoth. “The intractable conjunction fallacy: Statistical sophistication, instructional set, and training.” Journal of Social Behavior & Personality (1993).

[17] Rogers, Paul, Tiffany Davis, and John Fisk. “Paranormal belief and susceptibility to the conjunction fallacy.” Applied cognitive psychology 23.4 (2009): 524-542.

[18] Morsanyi, Kinga, Simon J. Handley, and Jonathan St. B. T. Evans. “Decontextualised minds: Adolescents with autism are less susceptible to the conjunction fallacy than typically developing adolescents.” Journal of autism and developmental disorders 40.11 (2010): 1378-1388.

[19] De Neys, Wim, et al. “What makes a good reasoner?: Brain potentials and heuristic bias susceptibility.” Proceedings of the Annual Conference of the Cognitive Science Society. Vol. 32. 2010.

[20] Doll, Bradley B., Kent E. Hutchison, and Michael J. Frank. “Dopaminergic genes predict individual differences in susceptibility to confirmation bias.” The Journal of neuroscience 31.16 (2011): 6188-6198.

[21] Haran, Uriel, Ilana Ritov, and Barbara A. Mellers. “The role of actively open-minded thinking in information acquisition, accuracy, and calibration.” Judgment & Decision Making 8.3 (2013).

[22] Baron, Jonathan. “Beliefs about thinking.” Informal reasoning and education (1991): 169-186.

[23] Kokis, Judite V., et al. “Heuristic and analytic processing: Age trends and associations with cognitive ability and cognitive styles.” Journal of Experimental Child Psychology 83.1 (2002): 26-52.


Exit, Voice, and Empire

Economist Albert O. Hirschman introduced the concepts of “voice” and “exit” to describe the ways members can respond to dissatisfaction with firms or political institutions.

If you don’t like the way your group works, you can exercise voice by participating in the decision-making process: voting, registering grievances, lobbying, writing letters to the editor, making your case in a meeting, and so on.  Or, you can exercise exit by leaving the group: emigrating, quitting your job, buying from a different company, forking the project, starting your own meetup, etc.

Exit, in many ways, is more attractive than voice. Voice requires conflict, persuasion, coalition-building: in short, politics. Voice is slow; exit is fast. Voice is often coercive; exit is peaceful.  Voice is messy; exit is clean. Balaji Srinivasan thinks exit is just plain better than voice.

In politics, ideas like seasteading, intentional communities, free cities, federalism, Archipelago, and so on, which revolve around a patchwork of voluntary communities, are based on increasing the role of exit relative to voice.  In democracies, most of what we think of as “politics” is voice.  A whole nation votes on whether we choose X or Y.  Instead, some say, we should side-step the conflict by letting the X-lovers have X and the Y-lovers have Y.  Let people vote with their feet or their dollars.

The problem with exit is that it’s not always practical to fragment groups into ever smaller splinters. There are returns to scale in large companies. There are network effects to living in large cities that become commercial and cultural hubs.  There are advantages to having a common language, common technological conventions, shared communication networks, and so on, across wide numbers of people.  And when one group is hugely dominant and successful, it’s more in your interest to try to shift it slightly towards your point of view than to try to “build your own” from scratch.

As long as there are network effects and advantages to large-scale organizations, there will be reasons to use voice rather than exit.

Empires are the original large-scale organizations. And empires have provided many historical prototypes for the advantages of unified institutions. Roman roads and Roman law.  Qin Shi Huang Di’s unification of language, weights, and measures. The railroads, telegraphs, and trade routes of the British Empire.  The metric system and the Napoleonic Code.

Empires, by definition, do not rule over a single “people”, so they must accommodate cultural diversity. Sometimes they were remarkably tolerant.  (The Jews have traditionally remembered Darius and Alexander fondly.)  Imperial rules have a quality of impartiality, compared with local customs; they must be applicable to a vast and diverse population.

It’s often in your interest to belong to an empire.  The empire has the technology, the comforts of civilization, the military power.  Secede, and you’ll be “free”, but poor, provincial, and vulnerable.

There are profound problems with academic science, for instance. That doesn’t mean it’s obvious that one should just do science outside of academia.  The universities still have the top people, the funds, the equipment, and so on.  It’s not clear in every case that your new fledgling institute will do better than the old Leviathan.

You might choose voice over exit if you want the dominant institution to be more inclusive. Is it better to give gays civil marriage, or for gays to champion non-marital romantic arrangements?  The pro-marriage argument says that marriage is the dominant social institution, it comes with useful advantages, so gays should want to be included in that institution.  Gay marriage proponents want in, not out.  Marriage is nifty; it’s easier to gain access to an existing nifty thing than to create an alternative of equal niftiness.

Imperial structures — by which I mean, rules and institutions meant to work at large scale, for diverse populations — can be made more universal, abstract, and customizable, up to a point.  Chinese writing is standardized; Chinese pronunciation is local.  That which needs to be shared across an empire should follow a single rule; everything else should be up to local or individual choice.  Think of this like a structure with a steel skeleton clothed in colorful tissue paper.  A few rules are firm and universal; everything else is up to choice.

This heuristic shows up everywhere where there are network effects, not just in politics.  Think of any social media platform.  The steel skeleton is the site’s code; a page/feed/tumblr/etc has a certain structure.  The tissue paper is the content, which is endlessly customizable.  The skeleton is not truly impartial: the structure of the site does shape the culture.  But it aims at impartiality.  It has an impartial flavor. Wikipedia is more successful than Conservapedia because Wikipedia is structured to be as universal and neutral as possible.

In an imperial structure, there isn’t much voice.  The rules are hard to change, and the emperor’s power is absolute.  Ideally, though, the rules are somewhat abstract or unbiased.  This allows them to be more persistent over time: the strength of different factions may rise and fall, and you’d like to have a structure that endures those shifts.  This makes some kinds of “tolerance” or “inclusiveness” or “cosmopolitanism” very much in the interest of the empire.

A limited form of exit is used within the empire, to choose local customs within subgroups that still have access to the imperial resources and play by the “skeleton” of imperial ground rules.  True exit, leaving the empire altogether, is much more costly, and usually not worth it.

Empires, I think, basically map to “no voice, plenty of freedom to mini-‘exit’ within the boundaries of the empire, true exit is high opportunity cost to the emigrant but harmless to the empire.”

Small, tight-knit personal communities have pretty much the opposite structure.  Imagine a five-person startup, or a nuclear family.  Voice obviously plays a big role here — if you don’t like something your spouse is doing, you talk to them about it.  When you have a group so small that it would cease to function if it split, communal cooperation begins to make sense.  It even makes sense to have the intuition that unanimity or consensus should be necessary for a decision; if one disgruntled person could destroy a project, it’s important to make sure everybody’s on board.

Internal diversity is impractical in very small groups; if you’re making turkey for Thanksgiving, everybody has to eat turkey, or at least be satisfied with a plate full of sides.  Hard-and-fast rules also don’t make a lot of sense.  The right thing to do in a situation is always a function of the people involved. Things get very granular at small scales, and it matters that Bob can’t stand Alice and Eve is having a family emergency and Dave is being a prima donna but he’s the best geneticist we have.

So very small, personal groups are more like “lots of voice, no mini-‘exit’, true exit is dangerous to the group but can be cheap for the emigrant.”

Should you reform or reject a failing institution?

Would you rather operate in something more like an empire or more like a family?

In a family, you can negotiate if you don’t like how things are being done.  In an empire, you can (up to a point) go off and do your own thing, but the ground rules of the empire are rigid.  An empire has the advantages of scale — network effects, organizational infrastructure, lots of resources.  A family has the advantages of smallness — it can take account of individual needs and situations, it’s “closer to the ground.”

What I’d like to propose is taking account of tradeoffs and being aware of what tactics are appropriate to what situations.

You can’t have a “national discussion about X” because America is a nation of 300 million people, not a friend cluster.  You also can’t split up your meetup group or activist organization every time somebody has a disagreement, because you won’t have a group any more.

Exit always has costs. If you leave an empire, you lose its large-scale resources. If you leave a family, you can break the family.  Exit is worth it if you can easily get what you need outside the group, but it’s not free.

Balaji’s idea of tech companies building better alternative versions of existing institutions is promising, but not because exit is always awesome. Rather, it works to the extent that technological infrastructure can substitute for institutional infrastructure. If what you really need to run a school, say, is superstar teachers, good programmers, and adaptive learning algorithms (in the Coursera/Khan Academy vein), then the infrastructure of the public school system or traditional academia is just not very useful, and you can exit without much opportunity cost.

It’s telling that Balaji talks about web-based education, not about homeschooling, which is also a form of exit from public school. But homeschooling is not scalable — you do it one family at a time.  That makes it harder to make homeschooling a real alternative for vast numbers of people.  Using the tech industry to make independent education convenient and memetically viral — that’s a different story.  It has the potential to make independent education into a new kind of “empire.”

I’d say that Silicon Valley is a growing empire (or interlocking collection of empires) that is beginning to poach people from the post-New Deal American empire(s).  It’s not about people leaving the big city to homestead on the lonesome prairie; it’s people leaving the big city to go to another big city.

Exit from a big institution is easy in two kinds of situations: either you don’t need big institutions, or you have another big institution to emigrate to.  “Tune in, turn on, drop out” means telling people they don’t need an empire at all.  “Go to App Academy, not college” means telling people they can switch from one empire to a different one.  They’re both forms of exit, but they’re structurally very different.

My own view is that empires are very useful in a lot of contexts, and that the ideal (not always attainable) way to deal with a dying empire is to build a new empire to compete with it.  Radical decentralization (like 19th-century homesteading) tends not to last forever; people will always be building cities, businesses will always be trying to become big, frontiers get populated, there are normal human pressures towards centralization.  Institutions start off small and scrappy, grow to mature success, and then become cargo-culted and corrupt.  It doesn’t make sense to fight that life cycle; it makes sense to join it, by being the scrappy upstart David taking on an already-failing Goliath.

The World is Simple

In the world of image and signal processing, a paper usually takes the form “We can prove that this algorithm gives such-and-such accuracy on images that have such-and-such regularity property.  We tried it on some examples and it worked pretty well.  We’re going to assume that most of the images we might care about have the regularity property.”

For instance, sparse coding and compressed sensing techniques assume that the representation of the image in some dictionary or basis is “sparse”, i.e. has very few nonzero coefficients.

There’s some biological justification for this: the mammalian brain seems to recognize images with an overcomplete dictionary of Gabor filters, only a few of which are firing at any given time.

There’s a basic underlying assumption that the world is, in some sense, simple. This is related to ideas around the “unreasonable effectiveness of mathematics.”  Observations from nature can be expressed compactly.  That’s what it means to live in an intelligible universe.

But what does this mean specifically?

One observation is that the power spectrum, that is, the squared magnitude of the (discrete) Fourier transform, of natural images obeys a power law

$$S(k) = \frac{A}{k^{2-\eta}},$$

where $\eta$ is usually small.  It’s been hypothesized that this is because natural images are composed of statistically independent objects, whose scales follow a power-law distribution.

What does this mean?  One way of thinking about it is that a signal with a power-law spectrum has structure at all scales.  It’s referred to as “pink noise”.  You can generate a signal with a power-law spectrum by defining a “fractional Brownian motion”.  This is like a Brownian motion, except that the increment from time $s$ to time $t$ is normally distributed with mean zero and variance $|t-s|^{2H}$, where $H$ is the Hurst exponent ($H = 1/2$ recovers ordinary Brownian motion).  The covariance function of a fractional Brownian motion is homogeneous of degree $2H$.  Fractional Brownian motions are Hölder-continuous with every exponent less than $H$.

As a matter of fact, any function whose wavelet transform is homogeneous of degree $\lambda$ is a fractional Brownian motion of degree $(\lambda-1)/2$.
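
As a concrete illustration, here’s a minimal sketch that synthesizes such a signal directly in the frequency domain (spectral synthesis rather than an fBm construction; the length, $\eta$, and seed are arbitrary choices of mine):

```python
# Spectral synthesis of a 1-D signal with power spectrum S(k) ~ A / k^(2 - eta):
# give each frequency a power-law amplitude and a random phase, then invert.
import numpy as np

def power_law_noise(n=4096, eta=0.1, seed=0):
    rng = np.random.default_rng(seed)
    k = np.fft.rfftfreq(n)                      # nonnegative frequencies
    amplitude = np.zeros_like(k)
    amplitude[1:] = k[1:] ** (-(2 - eta) / 2)   # so |X(k)|^2 ~ k^-(2 - eta)
    phases = np.exp(2j * np.pi * rng.random(k.shape))
    return np.fft.irfft(amplitude * phases, n=n)

signal = power_law_noise()   # "pink-ish" noise, with structure at all scales
```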

Cosma Shalizi has a good post on this phenomenon.  Systems in thermodynamic equilibrium, i.e. “boring” systems, have correlations that decay exponentially in space and time. Systems going through phase transitions, like turbulent flows, and like most things you’ll observe in nature, have correlations that decay slower, with a power law. There are many simple explanations for why things might wind up being power-law-ish.

Imagine you have some set of piles, each of which grows, multiplicatively, at a constant rate. New piles are started at random times, with a constant probability per unit time. (This is a good model of my office.) Then, at any time, the age of the piles is exponentially distributed, and their size is an exponential function of their age; the two exponentials cancel and give you a power-law size distribution. The basic combination of exponential growth and random observation times turns out to work even if it’s only the mean size of piles which grows exponentially.
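
Here’s a minimal simulation of that story (the growth rate and mean pile age are arbitrary assumptions): exponentially distributed ages composed with exponentially growing sizes yield a power-law size distribution.

```python
# Piles model: age ~ Exponential(mean_age), size = exp(growth * age).
# Then P(size > s) = s^(-1 / (growth * mean_age)): a power law.
import numpy as np

rng = np.random.default_rng(0)
growth, mean_age = 0.5, 2.0                    # assumed rates
ages = rng.exponential(mean_age, size=100_000)
sizes = np.sort(np.exp(growth * ages))

# The empirical survival function is ~linear on log-log axes,
# with slope -1/(growth * mean_age) = -1 for these parameters.
survival = 1.0 - np.arange(sizes.size) / sizes.size
```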

If we’re in a domain with $\eta < 1$ or $H > 1/2$, we’re looking at a function with a square-summable Fourier transform.  This is why $L^2$ assumptions are not completely insane in the domain of signal processing, and why it makes sense to apply wavelet (and related) transforms and truncate after finitely many coefficients.
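
As a small sanity check of that truncation claim (synthetic data again; keeping the 64 largest of 2,049 Fourier coefficients is an arbitrary cutoff):

```python
# With a decaying power-law spectrum, a handful of coefficients
# carry almost all of the signal's L^2 energy.
import numpy as np

rng = np.random.default_rng(0)
n, eta = 4096, 0.5
k = np.fft.rfftfreq(n)
amp = np.zeros_like(k)
amp[1:] = k[1:] ** (-(2 - eta) / 2)
x = np.fft.irfft(amp * np.exp(2j * np.pi * rng.random(k.shape)), n=n)

energy = np.abs(np.fft.rfft(x)) ** 2
kept = np.sort(energy)[::-1][:64].sum()   # 64 largest of 2,049 coefficients
print(kept / energy.sum())                # ~0.9 of the total energy
```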

Not all regularity assumptions in the signal-processing and machine-learning world are warranted.  Image processing is full of bounded-variation methods like Mumford-Shah, and at least one paper is claiming to observe that natural images are not actually BV.  My own research deals with the fact that Riemannian assumptions in manifold learning are not realistic for most real-world datasets, and that manifold-learning methods need to be extended to sub-Riemannian manifolds (or control manifolds).  And I’m genuinely uncertain when sparsity assumptions are applicable.

But decaying power spectra are a pretty robust empirical observation, and not for shocking reasons. It’s a pretty modest criterion of “simplicity” or “intelligibility”, it’s easy to reproduce with simple random processes, it’s kind of unsurprising that we see it all over the place. And it allows us to pretend we live in Hilbert spaces, which is always a win, because you can assume that Fourier transforms converge and discrete approximations of projections onto orthogonal bases/dictionaries are permissible.

Power-law spectrum decay is a sort of minimum assumption of simplicity that we can expect to see in all kinds of data sets that were not generated “adversarially.”  It’s not a great mystery that the universe is set up that way; it’s what we would expect.  Stronger assumptions of simplicity are more likely to be domain-specific.  If you know what you’re looking at (it’s a line drawing; it’s a set of photographs of the same thing taken at different angles; it’s a small number of anomalies in an otherwise featureless plane; etc) you can make more stringent claims about sparsity or regularity.  But there’s a certain amount of simplicity that we’re justified in assuming almost by default, which allows us to use the tools of applied harmonic analysis in the first place.

Beyond the One Percent: Categorizing Extreme Elites

A lot of people talk about “1%” as though it were synonymous with “almost nothing.”  Except that when it comes to people, that’s extremely misleading.  One percent of the US population is more than three million people!

Confused thinking is especially common when we talk about extreme elites, of achievement or wealth.  If “top 1%” means millions of people, what about even smaller, even more extreme elites?  The top 0.01% is as far removed from the 1% as the 1% is from the general population; and yet that’s still tens of thousands of people!  How do you have any kind of gauge for these numbers?

Because human intuition is evolved for much smaller social groups than the United States, our mental models can be very badly wrong. If you’re a mathematician at a top-tier school, it feels like “lots” of people are at that level of mathematical ability.  To you, that’s “normal”, so you don’t have much intuition for exactly how rare it is.  Anecdotally, it seems very common for intellectual elites to implicitly imagine that the community of people “like them” is orders of magnitude bigger than it actually is.

So I’ve done a little “Powers of Ten” exercise, categorizing elite groups by size and giving a few illustrative examples.  All numbers are for the US. Fermi calculations have been used liberally.

Of course, people don’t belong to one-and-only-one group: you could be a One-Percenter in money, an Elite in programming ability, and average in athletic ability.

Historical figures: people who achieve things of a caliber only seen a few times a century. People who show up in encyclopedias and history books.

Superstars: people who win prizes that are only awarded to a handful of people a year or so — there are usually dozens alive/active at that level at any given time. Nobel Prize winners (per field) and Fields medalists. Movie stars and pop music celebrities. Cabinet members. Tennis grand slam winners and Olympic medalists (per event).  People at the superstar level of wealth are household names and have tens of billions of dollars in net worth.  Groups of superstars are usually too small to develop a distinctive community or culture.

Leaders: members of a group of several hundred. International Mathematics Olympiad contestants. National Academy of Sciences or American Academy of Arts and Sciences members, per field. Senators and congressmen. NBA players. Generals (in the US military). Billionaires. Groups of Leaders form roughly Dunbar-sized tribes: a Leader can personally get to know all the people at his level.

Ultra-elites: members of a group of a few thousand. PhDs from top-ten universities, by department. Chess grandmasters. Major league baseball players. TED speakers. Fashion models.

Elites: members of a group of tens of thousands. “Ultra high net worth individuals” owning more than $30 million in assets. Google software engineers. AIME qualifiers. Symphony orchestra musicians. Groups of Elites are about the size of the citizen population of classical Athens, or the number of Burning Man attendees. Too large to get to know everyone personally; small enough to govern by assembly and participate in collective rituals.

Aristocrats: members of a group of hundreds of thousands. Ivy League alumni. Doctors. Lawyers. Officers (in US military). People of IQ over 145. People with household incomes of over $1 million a year (the “0.1%”). Groups of Aristocrats are large enough to be professions, as in law or medicine, or classes, like the career military class or the socioeconomic upper class.

One-percenters: members of a group of a few million. Engineers. Programmers.  People of IQ over 130, or people who scored over 1500 on SAT’s (out of 1600). People who pass the Cognitive Reflection Test. People with over $1 million in assets, or household income over $200,000. If you are in a group of One-Percenters, it’s a whole world; you have little conception of what it might be like to be outside that group, and you may have never had a serious conversation with someone outside it.


Fun with BLS statistics

What do people in America do for a living?

What is a “normal” job, statistically?

What are the best-paying jobs?

Most of us don’t know, even though these are incredibly relevant facts for career choice, education, and having some idea of what kind of country you live in.  And even though all the statistics are available free to the public from the Bureau of Labor Statistics!

What Jobs Pay Best?

Doctors. Definitely doctors. The top ten highest mean annual wage occupations are all medical specialties. Anesthesiologists top the list, with an average salary of $235,070.

Obviously doctors are not the richest people in the US. The Forbes 400 consists largely of executives.  But “chief executive” as a profession actually ranks behind “psychiatrist.” The average CEO makes $178,400 a year.

Dentists, nurse anesthetists, and petroleum engineers make over $150,000 a year. Managers of all sorts, as well as lawyers, range in the $120,000-$140,000s.

Air traffic controllers make about as much as physicists, at $118,000 a year.

Yep, you got that right: the average air traffic controller is slightly richer than the average physicist.

Physicists are the richest pure-science specialty, followed by astronomers and computer scientists ($110,000) and mathematicians ($103,000).  Actuaries, software engineers, computer hardware engineers, and nuclear, aerospace, and chemical engineers, cluster around the $100,000-110,000 range.

Bottom line: if you want a high-EV profession, be a doctor. Or a dentist — the pay is almost as good. The “professions” — medicine, law, engineering — are, in fact, high-paying, and sort by income in that order.  It is, obviously, good to be a manager; but still not as good as being a doctor. Going into the hard sciences is, as far as income goes, basically the same as going into engineering. It’s the bottom of the 6-figure range.  There are a few underappreciated jobs, like air traffic controllers, pilots, anesthetists, pharmacists, actuaries, and optometrists, which aren’t generally given as much social status as doctors and lawyers, but pay comparably.

What Jobs Pay Worst?

Flipping burgers. It’s not just a punchline: fast food cooks are the lowest-paid occupation, at $18,870 a year.

For comparison purposes, the federal poverty line is $11,670 for a single person, and $23,850 for a family of four. So a burger-flipper is only technically living in poverty if she supports at least two dependents (the threshold for a family of three is $19,790). 15% of Americans live below the poverty line.  Since a fair number (19%) of people living alone are poor, this suggests that unemployment or underemployment is a bigger factor in poverty than low wages.

We have a lot of low-paid fast-food cooks and servers. Three million Americans work in fast-food preparation and service.

The lowest of the low-paid jobs, making under $30,000 a year, are service workers. Cooks, cashiers, desk clerks, maids, bartenders, parking lot attendants, manicurists.  When somebody waits on you in a commercial establishment, you’re looking at one of the poorest people who have jobs at all.

The other kind of ultra-low-paid jobs are laborers. Agricultural workers, graders and sorters, cleaners of vehicles and equipment, meat cutters and trimmers and meatpackers, building cleaning and pest control workers. Groundskeeping workers.  Not, it’s important to note, people who work in manufacturing and repair; most of those jobs are in the $30,000-$40,000 range.

As you get to the top of the <$30,000 range, you begin to see office workers. Office clerks (and there are two million of them!) get paid about $29,000 a year. Data entry. File clerks. Despite living in the age of computers, we still have lots of people whose jobs are low-level paperwork. And they’re very poorly paid.

This is the depressing side of the income scale.  Where are all the poor people? They’re in customer service, unskilled labor, or low-level office work.

Who is the Middle Class?

The median US household income is $51,000.  The average household is 2.55 people.  The median US salary is $48,872.  (Since the median household earns barely more than one median salary yet contains 2.55 people, most wage earners seem to support at least one dependent.)  So let’s look at jobs that pay around the median.

Firefighters, at $48,270. Social workers, at $48,370, as well as librarians, at $47,750, counselors, at $47,820, teachers, at $54,740, and clergy, at $47,540. Fine artists, at $50,900, and graphic designers, at $49,610.   Things like “mine cutting and channeling machine operators”, “aircraft cargo handling supervisors”, “tool and die makers”, “civil engineering technicians”, “derrick operators, oil and gas”, “explosive workers, ordnance handling experts, and blasters”, “railroad brake, signal, and switch operators”, and so on, get paid in the $48,000-51,000 range.  Basically, jobs that involve the skilled use of machinery, the actual making and operating of an industrial civilization.

Who is the middle class? “Teachers and firemen” isn’t far off, as stereotypes go.  It’s mostly unionized jobs, either in the “helping professions” or in manufacturing/industry.

How do you get a job like that?  For example, CNC programmers are pretty evenly split between people with associate’s degrees (36%), people with post-secondary certificates (31%), and people with college degrees (15%). You need to pass a licensing exam and spend several years as an apprentice.  Mining machine operators, on the other hand, mostly don’t even need a high school diploma. Tool and die makers need a post-secondary certificate but generally not a college degree. By contrast, you usually need a master’s to be a counselor, for comparable pay.

Where do most people work?

Of the broad sectors defined by the BLS, the most common is “office and administrative support occupations.”

Who are these? Things like “data entry keyers”, “human resources assistants”, “shipping clerks”, “payroll and timekeeping clerks”, and so on. They make an average salary of $34,900, and they are mostly employed by government, banks, hospitals and medical practices.  A full 16% of employed Americans work in this sector.

The second most common sector is “sales and related occupations.”

Who are these? Everything from counter clerks to real estate brokers to sales engineers, but not management of sales departments.  The mean annual wage is $38,200 — most people in “sales” are clerks in stores (grocery stores, department stores, clothing stores, etc.) 14 million people work in sales altogether, around 11% of employed Americans.

The next most common sector is “food preparation and services”, at 8% of employed Americans.  The mean wage is $21,580.

By single occupation, the most common occupations in America are “retail sales workers”, “food and beverage serving workers”, and “information and record clerks.”  We are, more than anything else, a nation of shitty retail jobs.

We have a lot of school teachers (4 million), a lot of people working in construction (3.7 million), a lot of nurses (2.7 million) and health technicians (2.8 million).  But the most common occupations are very heavily weighted towards retail, service, unskilled labor, and low-level office work.

What about cool jobs?

Shockingly, there are only 3030 mathematicians. Maybe a lot of them are calling themselves something else, like the 89,740 “post-secondary math and computer teachers”, though that’s hardly how I’d describe my professors.  There are 24,950 statisticians, 24,380 computer scientists, 17,340 physicists, and 87,560 chemists.

By contrast, there are 1.4 million software developers and programmers. In my little bubble, it feels like almost all the smart people wind up as software engineers; by the numbers, it looks like this is more or less true. All non-software engineers combined only make up 1.5 million jobs.  I hear a lot of rhetoric about “Silicon Valley only does software, real atom-pushing engineering technology is lagging” — I don’t have a basis for evaluating the truth of that, but we definitely have a lot of people employed in software compared to the rest of engineering.

There are 87,240 artists, more than half of whom are animators and art directors; there are 420,130 designers;  there are 63,230 actors, 39,260 musicians and singers, 11,540 dancers, and 43,590 writers.  Writers don’t actually do so badly: average wage is $69,250.  For all the hand-wringing about the end of writing as a profession, it’s still a real job.

There are a ton of doctors (623,380) and almost as many therapists (600,650).  Therapists here refers to physical therapists, occupational therapists, speech therapists, and so on, not psychological counselors.  There are far more people lower on the totem pole: 2.8 million medical technicians, 2.7 million registered nurses, and 3.9 million “healthcare support occupations” (nurses’ aides, orderlies, etc.), which fall into the “shitty service jobs” category, with an average yearly income of $28,300.

There are 592,670 lawyers, and 27,190 judges.

Basically, when it comes to the arts and professions, doctors and lawyers are the most common as well as the best-paid, followed by engineers and programmers, and then scientists and artists.

What does the BLS tell you about what you should do for a living?

Of course, it depends on who you are and what resources are available to you. But here’s a few things that popped out to me.

1.) The most reliable way to make a high salary is to be a doctor.  There is absolutely no ambiguity on that point.

2.) Programming/engineering/hard science and management are the skills involved in most of the top-paid jobs.

3.) The best-paid job that doesn’t require a college degree is airline pilot. If you’re broke or you hate school, consider learning to fly.

4.) Writers and visual artists are not that poor, so long as they’re willing to work on commercial projects.

EDIT: Michael Vassar has questioned the numbers of doctors and lawyers.  It turns out the BLS numbers may be slight underestimates but aren’t too far off from other sources.

The Kaiser Foundation says there are 834,769 “professionally active physicians” in the US as of 2012.  The Federation of State Medical Boards reports 878,194 licensed physicians as of 2012. We have roughly one physician for every 400 people, according to the World Bank.

The ABA gives 1,225,452 licensed lawyers.  Harvard Law School says the BLS numbers are lower because there are more people licensed to practice law than currently employed as attorneys.

All in all, I’m fairly confident that the number of “professionals” (doctors, lawyers, and engineers, including software engineers) is around 5 million, and likely not more than 10 million. It’s two or three percent of the population.

Taste and Consumerism

Peter Drucker, whose writings form the intellectual foundation behind the modern management corporation, defined “consumerism” as follows:

What consumerism demands of business is that it actually market.  It demands that business start out with the needs, the realities, the values of the customer.  It demands that business base its reward on its contribution to the customer. … It does not ask “What do we want to sell?” but “What does the customer want to buy?”  It does not say, “This is what our product or service does.” It says “These are the satisfactions the customer looks for, values, and needs.”

Peter Drucker, Management

A consumerist business, then, is like an optimization process, and customer feedback (in the form of sales, surveys, complaints, usage statistics, and so on) is its reward function.  A consumerist business is exquisitely sensitive to customer feedback, and adapts continually in order to better satisfy customers. The consumerist philosophy is antithetical to preconceived ideas about what the company “should” make.  Lean Startups, an extreme implementation of consumerist philosophy, don’t even start with a definite idea of what the product is; the company constantly evolves into selling whatever customers want to buy.

Another way of thinking about this: in a market, there are many possible voluntary trades that could happen.  A consumerist company tries to swim towards one of these trade points and slot itself into a convenient niche.  The whole purpose of trade is to produce win-win exchanges; “consumerism” just means being flexible enough to be willing to search through all the possibilities, instead of leaving opportunities unexploited. 

Yet another, more negative, slant on consumerism is that it is the absence of taste.

A manager, according to Drucker, should not ask “What do we want to sell?”  But an artist always asks “What do I want to make?”

Computer scientist Richard Hamming famously said:

And I started asking, “What are the important problems of your field?” And after a week or so, “What important problems are you working on?” And after some more time I came in one day and said, “If what you are doing is not important, and if you don’t think it is going to lead to something important, why are you at Bell Labs working on it?”

A scientist, in other words, has to care what he’s working on: problems that are interesting, that have the potential to be world-changing.  Any good scientist is intrinsically motivated by the problem.  If you told Hamming you’d pay him a million dollars to crochet shawls all year, he’d laugh and refuse.  If he were the kind of person who could be induced to quit working on information theory, he wouldn’t be Hamming in the first place.

Ira Glass on creativity and taste:

All of us who do creative work … we get into it because we have good taste. But it’s like there’s a gap, that for the first couple years that you’re making stuff, what you’re making isn’t so good, OK? It’s not that great. It’s really not that great. It’s trying to be good, it has ambition to be good, but it’s not quite that good. But your taste — the thing that got you into the game — your taste is still killer, and your taste is good enough that you can tell that what you’re making is kind of a disappointment to you, you know what I mean?

J.D. Salinger on writing:

You wrote down that you were a writer by profession. It sounded to me like the loveliest euphemism I’ve ever heard. When was writing ever your profession? It’s never been anything but your religion. Never…

If only you’d remember before ever you sit down to write that you’ve been a reader long before you were ever a writer. You simply fix that fact in your mind, then sit very still and ask yourself, as a reader, what piece of writing in all the world Buddy Glass would most want to read if he had his heart’s choice. The next step is terrible, but so simple I can hardly believe it as I write it. You just sit down shamelessly and write the thing yourself. I won’t even underline that. It’s too important to be underlined. 

Eric S. Raymond on software:

Every good work of software starts by scratching a developer’s personal itch.

There’s very clearly a tradition, across the creative disciplines, that a creator must be intrinsically motivated by love of the work and by the ambition to make something great.  Great by what standard?  Well, this is often informed by the standards of the professional community, but it’s heavily driven by the creator’s own taste.  She has some sense of what makes a great photograph, what makes a beautiful proof, what makes an ingenious design.  

Is taste universal? Is there some sense in which Beethoven’s 9th is “really” good — is there some algorithmic regularity in it, or some resonance with the human ear, something that makes its value more than a matter of opinion?  Maybe, and maybe not.  I’m inclined to be intrigued but skeptical of simple explanations of what humans find beautiful, like Schmidhuber’s notion of low Kolmogorov complexity.  My own speculation is that hidden symmetry or simplicity is also a fundamental principle of aesthetics: a perfect circle is all right, but an intricate and non-obvious pattern, which takes more attention to notice, is more interesting to the eye, because minds take pleasure in recognition.

Whether there are some universal principles behind aesthetics or not, in practice aesthetics are mediated through individual taste. You cannot write a book by committee, or by optimizing around a dashboard of reader feedback stats.  You can’t write a proof that way either.  

Creative original work isn’t infinitely fungible and modifiable, like other commodities. The mindset of infinitely flexible responsiveness to feedback is extremely different from the mindset of focused creation of a particular thing.  The former involves lots of task switching; the latter involves blocks of uninterrupted time.  You can’t be a maker and a manager at the same time.  Managing, responding to feedback, being a “consumerist,” requires engaging your social brain: modeling people’s responses to what you do, and adapting accordingly.  Making things involves turning that part of your brain off, and engaging directly with physical objects and senses, or abstract concepts.

Creative work is relevant to businesses.  Design, for instance, matters. So does technological innovation.  But, for a consumerist business, the constraints of creative work are unwelcome limitations.  Makers want to make a particular thing, while the company as a whole needs to find any niche where it can be profitable.

Drucker defines “knowledge workers” as skilled experts, whose loyalty is stronger to their profession than to their company.  They’ll introduce themselves with “I’m a natural language processing guy”, not “I work for IBM.”  Drucker’s “knowledge workers” seem somewhat analogous to “makers.” A cynical view of his book Management is that it’s about how to organize and motivate knowledge workers without giving them any real power.  The NLP guy’s goal is to make a tool that does an excellent job at machine translation. The manager’s goal is to promote the growth and survival of the organization.  These goals are, ideally, aligned, but when they conflict, in a Druckerian organization, the manager’s goal has to take priority.

What this means is that makers, people with taste, have a few options.

1. Work for a manager in a successful company. You’ll have serious constraints on the type of work you do, and you won’t be able to capture much of its financial value, but your work is likely to be implemented at a large scale out in the world, and you’ll have a steady income.

2. Have a small lifestyle business that caters only to the few who share your taste.  You’ll never have much money, and you won’t have large-scale impact on the world, but you’ll be able to keep your aesthetic integrity absolutely.

3. Find a patron. (Universities are the classic example, but this applies to some organizations that are nominally companies as well. A hedge fund that has a supercomputer to model protein folding is engaging in patronage.  Family money is an edge case of patronage.)  A patron is a high-status individual or group that seeks to enhance its status by funding exceptional creators and giving them freedom in their work.  You can make a moderate amount of money, you’ll get a lot of creative freedom (but you’ll be uncertain how much or for how long) and you might be able to have quite a lot of impact. The main problem here is uncertainty, because patrons are selective and their gifts often have strings attached.

4. Start a business that bets hard on your taste.  If you’re Steve Jobs, or Larry Page, your personal vision coincides with market success. You can win big on all fronts: money, impact, and creative freedom.  The risk is, of course, that the overwhelming majority of people trying this strategy fail, and you’re likely to wind up with much less freedom than in options 1-3.

Howard Roark, the prototypical absolutist of personal taste, picked option 2: he made the buildings he liked, for the people who shared his taste in architecture, refused to engage in any marketing whatsoever, and was nearly broke most of the time.  In fact, Ayn Rand, who has a reputation as a champion of big business, is if anything a consistent advocate of a sort of Stoic retirement. You’d be happier and more virtuous if you gave up trying to “make it big,” and instead went to a small town to ply your craft.  “Making it”, in the sense of wealth or fame or power, means making yourself beholden to lots of people and losing your individuality. 

I’m not sure I’m that much of a hipster. I don’t think the obvious thing for a creative person to do is “retirement.”  Especially not if you care about scope.  If you’ve designed a self-driving car, you don’t want to make one prototype, you want a fleet of self-driving taxis on the streets of New York.  Even more so, if you’ve discovered a cure for a disease, you want it prescribed in hospitals everywhere, not just a home remedy for your family.

What I actually plan to do is something between 1 and 3 (there’s an emerging trend of tech companies that seem to straddle the line between patrons and employers, though I’m not certain what that looks like on the inside), and to explore what it would take to do 4.

Hungarian Mathematics Education

The Fasori Gimnazium in Budapest, open from 1864 to 1952, might fairly be claimed to have been the best high school in the world. It educated Eugene Wigner, John von Neumann, Edward Teller, Alfred Haar, and John Harsanyi.

So it might be useful to know what they were doing right.

Laszlo Ratz, who designed the curriculum, was the driving force behind the school.  He founded the high school math journal KoMaL, which presented challenging problems so students could write in solutions.  The journal is still in print; you can see sample problems here. Harsanyi and Erdos, along with other prominent mathematicians, were especially good at these competitions.

Ratz also cultivated personal relationships with his most talented students, inviting them to his house and giving them book recommendations.

Here is some biographical information about Ratz, which gives some insight into his ideas on curriculum.  

The basic principle of the Fasori Gimnazium was that students were presented with examples first, and rules for how to solve the problem only after they’d tried to figure it out for themselves.  They also practiced with real statistics, from things like national railway schedules and tables of wheat production. 

Ratz had a particular axe to grind about calculus: he insisted that the concept of derivatives be taught by starting with finite differences.  
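
To make that concrete, here’s a minimal sketch (a toy example of my own, not anything from Ratz’s actual curriculum) of the finite-differences-first idea: tabulate difference quotients at shrinking step sizes and watch them settle toward the derivative, before any limit is ever mentioned.

```python
# A toy illustration of teaching derivatives via finite differences:
# compute difference quotients of f at shrinking steps h and watch
# them converge to the true derivative. (Function and steps are my
# own illustrative choices.)

def difference_quotient(f, x, h):
    """Forward difference quotient (f(x+h) - f(x)) / h."""
    return (f(x + h) - f(x)) / h

f = lambda x: x ** 2  # the true derivative at x = 3 is 6

for h in [1.0, 0.5, 0.25, 0.125, 0.0625]:
    print(h, difference_quotient(f, 3.0, h))
# Prints 7.0, 6.5, 6.25, 6.125, 6.0625 -- visibly converging to 6.
```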

Wigner’s recollections about high school noted that they learned Latin, poetry, German, French, botany and zoology, and physics taught from a history-of-science perspective.  The physics teacher had written his own textbook. Wigner remembered Ratz as exceptionally friendly and encouraging, giving private lessons to von Neumann and lots of books to Wigner himself.

It’s hard to determine which, if any, of these things made the Fasori Gimnazium special. But it does point in some useful directions: one-on-one attention from exceptional teachers, a focus on problem-solving and examples, math contests.  It matches my intuition that you only really understand a mathematical concept when you’ve computed it by hand with examples.

The Calderon-Zygmund Decomposition as Metaphor

The Calderon-Zygmund decomposition is a classic tool in harmonic analysis.

It has also reframed how I think, ever since I became immersed in this field.

The basic statement of the lemma is that all integrable functions can be decomposed into a “good” part, where the function is bounded by a small number, and a “bad” part, where the function can be large, but locally has average value zero; and we have a guarantee that the “bad” part is supported on a relatively small set.

Explicitly,

Let f \in L^1(\mathbb{R}^n), that is, \int_{\mathbb{R}^n} |f(x)| dx < \infty, and let \alpha > 0.  Then there exists a countable collection of disjoint cubes Q_j such that for each j

\alpha < \frac{1}{|Q_j|} \int_{Q_j} |f(x)| dx < 2^n \alpha

(that is, the average value of f on the “bad” cubes is not too much bigger than \alpha)

\sum |Q_j| \le \frac{1}{\alpha} \int_{\mathbb{R}^n} |f(x)|dx

(that is, we have an upper bound on the size of the “bad” cubes)

and |f(x)| \le \alpha for almost all x not in the union of the Q_j. In other words, f is small outside the cubes, the total size of the cubes isn’t too big, and even on the cubes the average of f isn’t that big.

In particular, if we define

g(x) = f(x) outside the cubes, g(x) = \frac{1}{|Q_j|} \int_{Q_j} f(t) dt on each cube, and b(x) = f(x) - g(x), then b(x) = 0 outside the cubes, and has average value zero on each cube.  The “good” function g is bounded by 2^n \alpha; the “bad” function b is supported only on the cubes, and has average value zero on those cubes.

Why is this true? The basic sketch of the proof involves taking a grid of large cubes, on each of which the average of |f| is at most \alpha (possible because f is integrable), and repeatedly subdividing each cube into 2^n daughter cubes.  If a daughter cube’s average exceeds \alpha, it becomes one of the “bad” cubes Q_j; its average can’t exceed 2^n \alpha, because its parent’s average was at most \alpha.  If not, we keep subdividing.
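
Here’s a rough sketch of that stopping-time argument in Python, in one dimension (so 2^n = 2), with f given as samples on a dyadic grid; the function and the threshold \alpha are my own toy choices, just to show the mechanics.

```python
# A toy 1-D Calderon-Zygmund stopping-time decomposition (my own sketch):
# starting from an interval where the average of |f| is at most alpha,
# subdivide; select an interval as "bad" the first time its average
# exceeds alpha.

import numpy as np

def cz_bad_intervals(samples, alpha, lo=0.0, hi=1.0):
    """Return (lo, hi, average) for each selected 'bad' dyadic interval."""
    avg = np.mean(np.abs(samples))
    if avg > alpha:
        # Stopping time: the parent's average was <= alpha, and halving an
        # interval at most doubles the average, so alpha < avg <= 2 * alpha.
        return [(lo, hi, avg)]
    if len(samples) == 1:
        return []  # finest resolution reached; this point stays "good"
    mid, half = (lo + hi) / 2, len(samples) // 2
    return (cz_bad_intervals(samples[:half], alpha, lo, mid)
            + cz_bad_intervals(samples[half:], alpha, mid, hi))

x = np.linspace(0, 1, 1024, endpoint=False)
f = 1.0 / np.sqrt(np.abs(x - 0.3) + 1e-3)  # integrable-looking spike at x = 0.3
alpha = 2 * np.mean(np.abs(f))             # so the root interval starts "good"
for lo, hi, avg in cz_bad_intervals(f, alpha):
    print(f"bad interval [{lo:.4f}, {hi:.4f}), average {avg:.1f}")
```

The printed “bad” intervals cluster around the spike at 0.3, and everywhere else the function stays below the threshold, exactly as the lemma promises.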

The intuition here is that functions which are more or less regular (an integrable function has to decay at infinity and not be too singular at zero) can be split into a “good” part that’s either small or locally constant, and a “bad” part that can be wiggly, but only on small regions, and always with average value zero on those regions.

This is the basic principle behind multiscale decompositions.  You take a function on, say, the plane; you decompose it into a “gist” function which is constant on squares of size 1, and a “wiggle” function which is the difference. Then throw away the gist, look at the wiggle, decompose it on squares of side-length 1/2 into a gist which is constant on those squares and a wiggle which is everything else.  And keep going.  Your original function is going to be the sum of all the gists, one per scale; the leftover wiggle shrinks away as the squares do.

But the nice thing about this is that you’re only using local information.  To compute f(x), you need to know what size-1 box x is in, for the first gist, and then which size-1/2 box, for the first wiggle, and then which size-1/4 box, for the second wiggle, and so on, but you only need to know the wiggles and gists in those boxes.  And if the value of f changed outside the box, the decomposition and approximate value of f(x) wouldn’t change.
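
Here’s a toy version of the gist/wiggle scheme, in one dimension with a function given as 2^k samples (my own simplification, not a standard library routine): each scale’s gist is the block average of what’s left, the wiggle is the remainder, and the gists sum back to the original function.

```python
# A toy 1-D gist/wiggle multiscale decomposition (my own sketch).
# At each scale, the "gist" is the remainder averaged over blocks,
# the "wiggle" is what's left over, and the wiggle feeds into the
# next, finer scale.

import numpy as np

def gist(samples, block):
    """Average the samples over consecutive blocks of the given length."""
    return samples.reshape(-1, block).mean(axis=1).repeat(block)

def multiscale(samples):
    """Decompose into per-scale gists; their sum reconstructs the input."""
    layers = []
    remainder = samples.astype(float)
    block = len(samples)
    while block >= 1:
        g = gist(remainder, block)
        layers.append(g)           # the gist at this scale
        remainder = remainder - g  # the wiggle, passed down to finer scales
        block //= 2
    return layers

f = np.array([4.0, 2.0, 5.0, 7.0, 1.0, 0.0, 3.0, 2.0])
layers = multiscale(f)
print(np.allclose(sum(layers), f))  # True: the gists sum back to f
```

Note the locality: changing f far from a given point only alters the gists of blocks that don’t contain that point, so the layers near the point are untouched.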

So how does this reframe how you see things in the real world?

Well, there are endless debates about whether you “can” capture a complex phenomenon with a simple model.  Can human behavior “really” be reduced to an algorithm? Can you “really” describe economics or biology with equations?  Is this the “right” definition to capture this idea?

In my view, that’s the wrong question. The right question is always “How much information do I lose by making this simplifying approximation?”  A “natural” degree of roughness for your approximation is the turning point where more detail won’t give you much more accuracy.

Multiscale decompositions give you a way of thinking about the coarseness of approximations.

In regions where a function is almost constant, or varying slowly, one layer of approximation is pretty good. In regions where it fluctuates rapidly and at varying scales (think “more like a fractal”), you need more layers of approximation.  A function that has rapid decay in its wavelet coefficients (the “wiggles” shrink quickly) can be approximated more coarsely than a function with slow decay.  The slow-decay functions are the ones where the “bad part” of the “bad part” of the “bad part,” and so on (in the Calderon-Zygmund sense), remains fairly big rather than rapidly disappearing.  (Of course, since the “bad part” is restricted to cubes, you can compute this separately in each cube, and require a different level of accuracy in different parts of the domain of the function.)
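
Sticking with the toy one-dimensional setup (again my own construction, not a theorem): compare how the error of block-average approximation decays as the blocks shrink, for a smooth function versus a jagged one.

```python
# A toy comparison of "nicer functions permit rougher approximations":
# the max error of a block-average approximation, at shrinking block
# sizes, for a smooth curve vs. a random-walk-like path.

import numpy as np

def approx_error(f, block):
    """Max error of the piecewise-constant (block-average) approximation."""
    g = f.reshape(-1, block).mean(axis=1).repeat(block)
    return float(np.abs(f - g).max())

x = np.linspace(0, 1, 256, endpoint=False)
smooth = np.sin(2 * np.pi * x)
rng = np.random.default_rng(0)
jagged = np.cumsum(rng.standard_normal(256))  # rough, fractal-ish path

for name, f in [("smooth", smooth), ("jagged", jagged)]:
    print(name, [round(approx_error(f, b), 3) for b in (64, 32, 16, 8, 4)])
# The smooth function's error roughly halves each time the blocks shrink;
# the jagged one's error decays much more slowly.
```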

Definitions are approximations. You can define a category by its prototypical, average member, and then define subcategories by how they differ from that average, and sub-sub-categories by how they differ from the average of the sub-categories.

The hierarchical structure allows you to be much more efficient; you can skip the extra detail when it’s not warranted.  In fact, there’s a fair amount of evidence that this is how the human brain structures information.

The language of harmonic analysis deals a lot with how to relate measures of regularity (basically, bounds on integrals, or measures of smoothness) to measures of coefficient decay (basically, how deep down the tree of successive approximations you need to go to get a good estimate). The Calderon-Zygmund decomposition is just one of the simpler cases.  But the basic principle of “nicer functions permit rougher approximations” is a really good framing device for dissolving questions about choosing definitions and models. Debates about “this model can never capture all the complexity of the real thing” vs. “this model is a useful simplification” should be replaced by debates about how amenable the phenomenon is to approximation, and which model gives you the most accurate picture relative to its simplicity.

How I Read: the Jointed Robot Metaphor

“All living beings, whether born from eggs, from the womb, from moisture, or spontaneously; whether they have form or do not have form; whether they are aware or unaware, whether they are not aware or not unaware, all living beings will eventually be led by me to the final Nirvana, the final ending of the cycle of birth and death. And when this unfathomable, infinite number of living beings have all been liberated, in truth not even a single being has actually been liberated.” The Diamond Sutra

What do you do when you read a passage like this?

If you’re not a Buddhist, does it read like nonsense?

Does it seem intuitively true or deep right away?

What I see when I read this is a lot of uncertainty.  What is a living being that does not have form?  What is Nirvana anyway, and could there be a meaning of it that’s not obviously incompatible with the laws of physics?  And what’s up with saying that everyone has been liberated and nobody has been liberated?

Highly metaphorical, associative ideas, the kind you see in poetry or religious texts or Continental philosophy, require a different kind of perception than you use for logical arguments or proofs.

The concept of steelmanning is relevant here. When you strawman an argument, you refute the weakest possible version; when you steelman an argument, you engage with the strongest possible version.   Strawmanning impoverishes your intellectual life. It does you no favors to spend your time making fun of idiots.  Steelmanning gives you a way to test your opinions against the best possible counterarguments, and a real possibility of changing your mind; all learning happens at the boundaries, and steelmanning puts you in contact with a boundary.

A piece of poetic language isn’t an argument, exactly, but you can do something like steelmanning here as well.

When I read something like the Diamond Sutra, my mental model is something like a robot or machine with a bunch of joints.

Each sentence or idea could mean a lot of different things. It’s like a segment with a ball-and-socket joint and some degrees of freedom.  Put in another idea from the text and you add another piece of the robot, with its own degrees of freedom, but there’s a constraint now, based on the relationship of those ideas to each other.  (For example: I don’t know what the authors mean by the word “form”, but I can assume they’re using it consistently from one chapter to another.)  And my own prior knowledge and past experiences also constrain things: if I want the Diamond Sutra to click into the machine called “Sarah’s beliefs,” it has to be compatible with materialism (or at least represent some kind of subjective mental phenomenon encoded in our brains, which are made of cells and atoms.)

If I read the whole thing and wiggle the joints around, sooner or later I’ll either get a sense of “yep, that works, I found an interpretation I can use” when things click into place, or “nope, that’s not actually consistent/meaningful” when I get some kind of contradiction.

I picture each segment of the machine as having a continuous range of motion. But the set of globally stable configurations of the whole machine is discrete. They click into place, or jam.

You can think of this with energy landscape or simulated-annealing metaphors. Or you can think of it with moduli space metaphors.

This gives me a way to think about mystical or hand-wavy notions that’s not just free-association or “it could mean anything,” neither of which gives me enough structure.  There is structure, even when we’re talking about mysticism; concepts have relationships to other concepts, and some ways of fitting them together are kludgey while others are harmonious.

It can be useful to entertain ideas, to work out their consequences, before you accept or reject them.

And not just ideas. When I go to engage in a group activity like CFAR, the cognitive-science-based self-improvement workshop where I spent this weekend, I naturally fall into the state of provisionally accepting the frame of that group.  For the moment, I assumed that their techniques would work, engaged energetically with the exercises, and am waiting until after I’ve tried them to evaluate the results objectively.  My “machine” hasn’t clicked completely yet — there are still some parts of the curriculum I haven’t grokked or fit into place, and I obviously don’t know about the long-term effects on my life.  But I’m going to be wiggling the joints in the back of my mind until it does click or jam.  People who went into the workshop with a conventionally “skeptical” attitude, or who went in with something like an assumption that it could only mean one thing, tended to think they’d already seen the curriculum and found it mundane.

I’m not trying to argue for credulousness.  It’s more like a kind of radical doubt: being aware there are many possible meanings or models and that you may not have pinned down “the” single correct one yet.