Life Update

I’ve just started a job as a data scientist at Recursion Pharmaceuticals.  I’m using machine learning to find new drug compounds.

The basic model is:

  • take some rare diseases that are caused by single genes;
  • simulate these diseases cheaply and at scale with siRNA knockdowns;
  • detect (here’s the machine learning part) how images of sick cells look different from healthy cells;
  • observe (machine learning again) which drugs make sick cells look like healthy cells;
  • send the promising drugs on to in-vitro and in-vivo screens.

This is basically my dream job.  I’ve been torn between math and biology since I was maybe 9; now I get to do both.  And I get to work towards precisely the problems I care about: curing diseases, getting as much purchase as possible out of computational methods in practical applications, reversing Eroom’s Law, etc.  I’m thrilled to be working at Recursion.

Due to company policy, I won’t be able to continue doing freelance lit review any more, at least not for paid projects; I expect to keep doing the occasional free project here and there.

I’ve also made a few updates in my views recently that I thought I’d share here.

  • The boost in my productivity and overall well-being from having a meaningful job, working with people I trust and respect on problems I care about, is enormous. Much more than I’d have expected. I am now much more sympathetic to messages like “too many people are trapped in bullshit jobs”, “pointless busywork in school is harmful”, “it’s bad to be alienated from one’s labor”, etc. I’m more bullish on things like self-employment, unschooling, quitting your job to pursue your passion, and so on; stagnation is a real cost to your soul.
    • I’m reminded of the theories of people like Gabriel Kolko, who said that the bigness of “big business” is an artifact of regulatory capture, in which large businesses are subsidized by the state. In this model, the “natural”, undistorted size of businesses would be smaller, and fewer things would be done that had no real purpose besides checking an officially-required box.  Pointless activity, under this model, is not “natural”; it’s usually forced.
    • I’m sort of playing with the idea of a philosophy of “makerism”, in which the good guys are simply the people who do self-evidently useful things. Building a house or preparing a meal is obviously Useful Work. As is discovering a drug or inventing a tool. In makerism, if you’d have trouble explaining to a precocious twelve-year-old why you’re doing a useful thing, there’s a chance that what you’re doing is bullshit. I’ve sort of poked at the idea of measures of awesomeness and the ecosystem of industry before.  The thing I’m trying to grope towards is productiveness. Not productivity, as in number of hours worked per day, or number of widgets produced per worker, but reaching towards usefulness, value, fruitfulness, substantialness, good-for-humans-ness.
  • My main update from job searching this time around (in mostly Silicon-Valley-based data science jobs) is that there is a thing called “fit” — how close the applicant’s background and skills are to what the employer is looking for — and the jobs you are an exact fit for will love you, and the ones you’re an imperfect fit for will reject you. For instance, it’s basically not worth it for me to even apply for jobs as a “data engineer”, because I’m not one. “Oh, it’s close to what I know and I can learn it on the job”? Nope. The right job is the one that’s dead center in the middle of your skillset.
    • Also, I had significantly better results applying to companies in the biomedical industry, I assume because I’ve done biomedical stuff in the past (systems-biology research in grad school, a personalized-medicine startup).  The takeaway here is that I expect you have the best shot in jobs that correspond well to your entire background, including things that you might classify as a “side interest”.  If you have a unique combination of skills, look for places that actively want that.
  • Bay Area software companies seem mostly pretty sane, in that they do not hire the flagrantly unqualified. Don’t expect to bluff your way in.
  • Because there are so many people sharing stories about the opposite experience, I think it behooves me to share mine; I didn’t experience anything that I’d classify as sexism during my job search, even though nearly all my interviewers were male, and so were nearly all the data scientists at the companies where I applied. The closest thing was being told that I was too “nervous” by one interviewer, which is sort of gendered in a statistical sense, but is also legitimately true of me, and not true of all women.
  • I have noticed that a fair number of companies are “segregated”, in that all the engineers are Asian (and foreign-born) while all the managers are white. It seems to correlate really well with, for lack of a better word, “lameness” — companies that are stagnant, hierarchical, complacent, don’t have a strong engineering culture, etc.  I now consider racial glass ceilings to be a red flag.
  • Skills I wish I’d had: better memory for SQL syntax (yes, really), deep learning, computer vision, ETL pipelines.
  • Skills I was glad I had: Spark, familiarity with the Python scientific computing & ML libraries, basic ML skills at the level of Hastie & Tibshirani, basic algorithms & data structures.
  • In technical interviews, a lot comes down to “fluency” or “execution” — can you solve simple math and programming problems correctly and quickly? are you checking for small errors? It’s very g-loaded, but I think there’s a skill of “turning your g on”, getting into “performance mode”, which I learned from years of being a math contest kid, and felt myself relearning as I went through the job search process. If you know what I’m talking about, focus on cultivating that, through repetitive practice of fairly-easy things with a high bar for accuracy, rather than studying super-advanced topics.

What’s Your Type: Identity and its Discontents

[Image: sea lion photo]

My type is Lisa Frank Sea Lion.

When I was a teenager, I had the intuition that third-wave feminism was a genre of feminine content.  A lot of the feminist books and magazines I came across had pink covers. A lot of them were about sex and relationships and clothes and pop culture — the same sorts of things I looked for in Seventeen magazine. I liked those topics; they gave me a deliciously wicked frisson; and I liked the kind of pop-feminist writing that was about Expressing Yourself; but I was obviously not a predominantly pink-flavored person. I was a serious person.

I am embarrassed to say that I never really appreciated the achievements of Rosalind Franklin until I was much older. I had grown up hearing about her as a “women in STEM” sermon.   I was a woman and I was a scientist, but I had decided that “women in STEM” was not my genre, or at least not so much that I would be in danger of being typecast.  The story of Watson and Crick was about DNA, but the story of Rosalind Franklin was about politics and unfairness and the HR-office side of a scientific career.  Obviously, DNA was more exciting to me at the time. It was only later that it clicked — if she independently discovered the double-helix structure, then she’s as much of a genius and pioneer as they were, arguably more so.  Her discovery belongs in the story of scientific progress, not on the shelf of books with pink covers.

In a liberal paradigm, things like feminism or anti-racism or LGBT rights or religious freedom are about liberating people.  You want to get rid of irrational prejudice and oppression so that people of any origin or creed can be free to do human stuff as they choose. The operative word is people. Sexual harassment, for instance, is wrong because it is an unjust harm to people.  None of this has anything to do with being pink-flavored or rainbow-flavored; you can be a middle-aged man with a dark suit and sober habits and speak out against injustice because it harms people, and you care about people, full stop.

The idea that feminism could be a flavor or a subculture or a genre is bizarre, if you look at it from the liberal paradigm.

But there’s also a market segmentation paradigm in which to think about this.

Market segmentation is a technique that marketers use to target products to certain demographics — and “products” include “content”, that is, books and articles and TV shows and so on.  And, with the rise of the internet and the abundance of consumer data, marketers have become very good at it.

Market segmentation involves identifying you with a type of person. A subculture, a demographic, a style, a flavor, a personality type. Cambridge Analytica, the internet marketing firm behind Trump’s success and the Brexit vote, categorizes people by their personality type in order to target political advertising at them. Marketers write profiles of a “typical” buyer of a product — a simplified bio of what kind of person they’re targeting.

“Red state” vs. “Blue state” is market segmentation. Personality types are market segmentation.  Exaggerated gender dimorphism — all women’s products are pink, all men’s products are black — is market segmentation. Subcultures (“nerd”, “goth”, “hipster”) are market segmentation.  Generations (Boomer, Gen X, Millennial) are market segmentation.

Statistical differences between groups of people obviously exist in the real world, but “identifying as” a category, exaggerating how much you match the category’s flavor and style, choosing a “type” to belong to, is a form of actively playing along with market segmentation, over and above whatever statistical differences exist.  One doesn’t “identify as” being born in 1988, but one does “identify as” a Millennial.

What flavor are you? What’s your type? What product is right for you?

There’s something irresistible about a personality quiz.  Tell me what type I belong to!  Tell me about myself!  It gratifies my vanity, and it helps me feel like I know my place in the world.

(I’m an INTP and a Gryffindor, natch.)

It took me a long time, and Dreyfus’ excellent commentary, to realize this, but Heidegger’s concept of Dasein, which literally translates to “being-there”, is really better understood as the behavior of “identifying as.”

Dasein is what you do when you assert what it means to be human, what it means to be you, what it means to be a member of your community.  Dasein is self-definition.  And, in particular, self-definition with respect to a social context. Where do I fit in society? Who is my tribe? Who am I relative to other people? What’s my type?

“Identifying as” always includes an element of misdirection. Merely describing yourself factually (“I was born in 1988”) is not Dasein. Placing an emphasis, exaggerating, cartoonifying, declaring yourself for a team, is Dasein.  But when you identify as, you say “I am such-and-such”, as though you were merely describing. You’re aligning yourself with your flavor of choice, while at the same time declaring vehemently that you’re only describing the way things are.

Your identity, no matter what it is, is always sort of bullshit or arbitrary or performative.  It’s role-playing. It’s kind of like wearing a mask.

And, for people who like it, there’s a delight in “identifying-as”, of putting yourself in a category, of knowing your type.  It makes you feel simple, well-defined, and important.

I knew a psychologist once who worked with businesses, and loved giving his clients the Myers-Briggs personality test. He told me that the main reason he used it was not the particular personality breakdown, but the simple fact that it divided people into 16 types. People would get into workplace disputes that were basically dominance hierarchies, arguments over who’s right or who’s best or who’s in charge. And he would resolve those disputes by helping people understand that Alice is one Myers-Briggs type and Bob is another; not better, not worse, just different.  “There are 16 kinds of people in the world” allows everyone to feel special (“A type! Just for me!”) and defuses hierarchical tussles, because no one type is on top.

But, of course, there are problems with “identifying-as.”

Paul Graham’s essay “Keep Your Identity Small” treats the very feature that my psychologist acquaintance liked about personality types (that no type is better than any other) as a problem, one that makes it impossible to assess merit when identities are in play.

For example, the question of the relative merits of programming languages often degenerates into a religious war, because so many programmers identify as X programmers or Y programmers. This sometimes leads people to conclude the question must be unanswerable—that all languages are equally good. Obviously that’s false: anything else people make can be well or badly designed; why should this be uniquely impossible for programming languages? And indeed, you can have a fruitful discussion about the relative merits of programming languages, so long as you exclude people who respond from identity.

Sometimes there are objective things that can be said about topics that people have chosen to build identities out of. Sometimes a programming language has strengths or weaknesses. Sometimes a government policy has benefits or harms. You might, in some circumstances, care about those objective, on-the-merits evaluations; maybe you want to achieve some goal and want to choose the best programming language for the job. You’re not going to be able to do that if the discussion gets taken over by identity; what people are doing when they’re identifying-as is self-expression or self-definition or self-assertion, which is lovely when you want it, but doesn’t answer any of your practical questions.  Unfortunately, people often do self-expression in the guise of answering your practical questions, and you may not know, or your interlocutor may not even know himself, that he’s really saying “I am a Lisp programmer!!” and not describing anything about the properties of Lisp.  One of the qualities of Dasein is that it’s very very stealthy, and it wants everything to be about Dasein, so it winds up muddying the waters, even when you don’t intend it to.

Coming back to the issue of politics, Dasein can mess up the attempt to solve social problems. If, when you say “sexual harassment”, people hear “feminist shibboleth”, then if they don’t identify as feminists, they may not actually notice the possibility that sexual harassment is a big problem that hurts a lot of human beings and that they might want to take seriously.  Sexual harassment gets perceived as a flag for pink-flavored people to wave, and if you’re not pink-flavored, you’re not the target market, so you don’t take it seriously.

If something matters generally, or is true objectively, regardless of subcultures, personality types, and tribes, then the identity mindset will be inadequate to deal with it.

Identity is obviously a really big part of the human experience. Heidegger thinks it’s essential and cannot be excised, and people who think they’ve achieved objectivity are fooling themselves. Without making that strong an absolute claim, I think it’s fair to say that identity is pervasive, and if you think it’s not an issue for you and have never considered it before, you should probably take a closer look and see how much it affects your life.

It’s also worth noting that Heidegger was a member of the Nazi Party, and that Nazism (as described in Mein Kampf) is all about how objectivity is terrible and how strong feelings of identity, specifically national and racial identity, are the best thing ever.  So there are some reasons to be suspicious of putting identity first at the expense of all other considerations.

Identity is always vivid, personal, flavorful.  It’s not “mere” fact, it’s alive with emphasis and exaggeration.  It’s never bland or dry.  I think that’s part of its appeal.  It makes you special, it makes you valid, it makes you distinctive.  It adds vim and verve to your self-image. It’s like all-caps and italics for your soul.

It may be dull in terms of information content (what it says is, always and forever, “I AM!!!”) but it’s never lacking in personal flair.

Most people I know who think about “identity” are rather like Paul Graham; they don’t have that strong a craving for it, and they’re frequently getting annoyed that other people are caught up in it. Or, they seek very specialized and cordoned-off ways to provide it for themselves: think of secular atheists who create rituals or highly independent introverts who contemplate the human need for community. I come at this from the opposite direction: I am a person who likes things hot-pink and in all caps, who always craves a higher emotional temperature, and who has been learning about how to navigate the fact that this is sometimes damaging and worth avoiding.

So, coming from that perspective, I’m genuinely unsure: do we want to channel identifying-as into safe, satisfying forms of pretend-play, or do we want to just have less of it?  To what extent is it even possible to channel or reduce it?

Strong AI Isn’t Here Yet

Epistemic Status: moderately confident. Thanks to Andrew Critch for a very fruitful discussion that clarified my views on this topic.  Some edits due to Thomas Colthurst.

I’ve heard a fair amount of discussion by generally well-informed people who believe that bigger and better deep learning systems, not fundamentally different from those which exist today, will soon become capable of general intelligence — that is, human-level or higher cognition.

I don’t believe this is true.

In other words, I believe that if we develop strong AI in some reasonably short timeframe (less than a hundred years from now or something like that), it will be due to some conceptual breakthrough, and not merely due to continuing to scale up and incrementally modify existing deep learning algorithms.

To be clear on what I mean by a “breakthrough”, I’m thinking of things like neural networks (1957) and backpropagation (1986) [ETA: actually dates back to 1974, from Paul Werbos’ thesis] as major machine learning advances, and types of neural network architecture such as LSTMs (1997), convolutional neural nets (1998), or neural Turing machines (2014) as minor advances.

I’ve spoken to people who think that we will not need even minor advances before we get to strong AI; I think this is very unlikely.

Predicate Logic and Probability

As David Chapman points out in Probability Theory Does Not Extend Logic, one of the important things humans can do is predicate calculus, also known as first-order logic. Predicate calculus allows you to use the quantifiers “for all” and “there exists” as well as the operators “and”, “or”, and “not.”

Predicate calculus makes it possible to make general claims like “All men are mortal”.  Propositional calculus, which consists only of “and”, “or”, and “not”, cannot make such statements; it is limited to statements like “Socrates is mortal” and “Plato is mortal” and “Socrates and Plato are men.”
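To make the distinction concrete, here is a minimal Python sketch. The toy domain, the names in it, and the helper functions `is_man` and `is_mortal` are all made up for illustration. The propositional statements are just individual Booleans combined with “and”, “or”, and “not”; the predicate statements quantify over a domain.

```python
# Hypothetical toy domain; the individuals and facts are illustrative only.
domain = ["Socrates", "Plato", "Fido"]
men = {"Socrates", "Plato"}
mortals = {"Socrates", "Plato", "Fido"}


def is_man(x):
    return x in men


def is_mortal(x):
    return x in mortals


# Propositional style: each atomic sentence is its own Boolean value,
# and sentences are combined with "and", "or", and "not".
socrates_is_mortal = is_mortal("Socrates")
plato_is_mortal = is_mortal("Plato")
both_mortal = socrates_is_mortal and plato_is_mortal

# Predicate style: "for all x, man(x) implies mortal(x)".
# "p implies q" is written here as "(not p) or q".
all_men_are_mortal = all((not is_man(x)) or is_mortal(x) for x in domain)

# "There exists an x such that mortal(x) and not man(x)".
some_mortal_is_not_a_man = any(is_mortal(x) and not is_man(x) for x in domain)

print(both_mortal, all_men_are_mortal, some_mortal_is_not_a_man)
```

Note that this sketch cheats in an important way: the domain is a short list known in advance, so “for all” reduces to a finite loop. The hard cases are the ones where the set of things being quantified over is not handed to you.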

Inductive reasoning is the process of making predictions from data. If you’ve seen 999 men who are mortal, Bayesian reasoning tells you that the 1000th man is also likely to be mortal. Deductive reasoning is the process of applying general principles: if you know that all men are mortal, you know that Socrates is mortal.  In human psychological development, according to Piaget, deductive reasoning is more difficult and comes later — people don’t learn it until adolescence.  Deductive reasoning depends on predicate calculus, not just propositional calculus.
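As a rough illustration of the inductive half, here is one standard way to put a number on the “999 mortal men” example. It assumes a uniform Beta(1, 1) prior over the unknown fraction of mortal men, which is my choice for the sketch and not something the argument depends on; the function name `posterior_predictive` is likewise hypothetical.

```python
from fractions import Fraction


def posterior_predictive(successes, trials, prior_a=1, prior_b=1):
    """P(next observation is a success) under a Beta(prior_a, prior_b) prior.

    With the default uniform prior this is Laplace's rule of succession.
    """
    return Fraction(successes + prior_a, trials + prior_a + prior_b)


# 999 men observed, all 999 mortal.
p_next_mortal = posterior_predictive(999, 999)
print(p_next_mortal, float(p_next_mortal))
```

Under that prior the answer is 1000/1001: very likely, but never exactly 1. The inductive estimate approaches the general claim without ever becoming the deductive statement “all men are mortal”.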

It’s possible to view probability theory as an extension of propositional calculus. For instance, MIRI’s logical induction paper constructs a (not very efficient) algorithm for assigning probabilities to all sentences in a propositional logic language plus some axioms, such that the probabilities learn to approximate the true computed values faster than it would take to compute the truth of propositions.  For example, if we are given the axioms of first-order logic, the logical induction criterion gives us a probability distribution over all “worlds” consistent with those axioms. (A “world” is an assignment of Boolean truth values to sentences in propositional calculus.)

What’s not necessarily known is how to assign probabilities to sentences in predicate calculus in a way consistent with the laws of probability.

Part of why this is so difficult is that it touches on questions of ontology. To translate “All men are mortal” into probability theory, one has to define a sample space. What are “men”?  How many “men” are there? If your basic units of data are 64×64 pixel images, how are you going to divide that space up into “men”?  And if tomorrow you upgrade to 128×128 images, how can you be sure that the collection of “men” you construct from the new data is consistent with the old collection of “men”?  And how do you set up your statements about “all men” so that none of them break when you change the raw data?

This is the problem I alluded to in Choice of Ontology.  A type of object that behaves properly under ontology changes is a concept, as opposed to a percept (a cluster of data points that are similar along some metric.)  Images that are similar in Euclidean distance to a stick-figure form a percept, but “man” is a concept. And I don’t think we know how to implement concepts in machine-learning language, and I think we might have to do so in order to “learn” predicate-logic statements.

Stuart Russell wrote in 2014,

An important consequence of uncertainty in a world of things: there will be uncertainty about what things are in the world. Real objects seldom wear unique identifiers or preannounce their existence like the cast of a play. In the case of vision, for example, the existence of objects must be inferred from raw data (pixels) that contain no explicit object references at all. If, however, one has a probabilistic model of the ways in which worlds can be composed of objects and of how objects cause pixel values, then inference can propose the existence of objects given only pixel values as evidence. Similar arguments apply to areas such as natural language understanding, web mining, and computer security.

The difference between knowing all the objects in advance and inferring their existence and identity from observation corresponds to an important but often overlooked distinction between closed-universe languages such as SQL and logic programs and open-universe languages such as full first-order logic.

How to deduce “things” or “objects” or “concepts” and then perform inference about them is a hard and unsolved conceptual problem.  Since humans do manage to reason about objects and concepts, this seems like a necessary condition for “human-level general AI”, even though machines do outperform humans at specific tasks like arithmetic, chess, Go, and image classification.

Neural Networks Are Probabilistic Models

A neural network is composed of nodes, which take as inputs values from their “parent” nodes, combine them according to the weights on the edges, transform them according to some transfer function, and then pass along a value to their “child” nodes. All neural nets, no matter the difference in their architecture, follow this basic format.
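A minimal sketch of that basic format in plain Python follows. The sigmoid transfer function, the bias term, the specific weights, and the function names are assumptions made for the example; real networks vary in all of them.

```python
import math


def sigmoid(z):
    """One common choice of transfer function (an assumption of this sketch)."""
    return 1.0 / (1.0 + math.exp(-z))


def node_output(parent_values, edge_weights, bias=0.0, transfer=sigmoid):
    """Combine parent values by edge weight, then apply the transfer function."""
    total = sum(w * v for w, v in zip(edge_weights, parent_values)) + bias
    return transfer(total)


def layer_output(parent_values, weight_rows, biases):
    """Each row of weights feeds one node in the layer."""
    return [node_output(parent_values, row, b) for row, b in zip(weight_rows, biases)]


# Two inputs -> two hidden nodes -> one output node, with made-up weights.
inputs = [0.5, -1.0]
hidden = layer_output(inputs, [[1.0, 2.0], [-1.0, 0.5]], [0.0, 0.0])
output = node_output(hidden, [1.0, -1.0])
print(hidden, output)
```

Different architectures change which nodes are connected and how weights are shared, but each node is still doing this weighted-sum-then-transform step.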

A neural network is, in a sense, a simplification of a Bayesian probability model. If you put probability distributions rather than single numbers on the edge weights, then the neural network architecture can be interpreted probabilistically. The probability of a target classification given the input data is given by a likelihood function; there’s a prior over the distribution of weights; and as data comes in, you can update to a posterior distribution over the weights, thereby “learning” the correct weights on the network.  Doing gradient descent on the weights (as you do in an ordinary neural network) finds a point estimate rather than a full posterior: the maximum-likelihood values of the weights or, if the loss includes a regularization term that plays the role of a prior, the mode of the posterior distribution over the weights in the Bayesian network paradigm.
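Here is a toy sketch of that correspondence, using the smallest “network” I can think of: one weight, one sigmoid output. The data points, learning rate, and Gaussian prior are all invented for illustration. Gradient descent on the negative log-likelihood gives a maximum-likelihood weight; adding the gradient of a zero-mean Gaussian prior (ordinary L2 regularization) gives the posterior mode instead.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Toy (input, label) pairs, deliberately not perfectly separable so that
# the maximum-likelihood weight is finite.
data = [(-2.0, 0), (-1.0, 0), (-0.5, 1), (0.5, 0), (1.0, 1), (2.0, 1)]


def neg_log_posterior_grad(w, prior_precision):
    # Gradient of the negative log-likelihood of a one-weight logistic model...
    grad = sum((sigmoid(w * x) - y) * x for x, y in data)
    # ...plus the gradient of the negative log of a zero-mean Gaussian prior.
    return grad + prior_precision * w


def fit(prior_precision, steps=5000, lr=0.05):
    w = 0.0
    for _ in range(steps):
        w -= lr * neg_log_posterior_grad(w, prior_precision)
    return w


w_mle = fit(prior_precision=0.0)  # no prior term: maximum likelihood
w_map = fit(prior_precision=1.0)  # Gaussian prior: posterior mode (MAP)
print(w_mle, w_map)
```

Either way the result is a single number per weight. The full Bayesian treatment would keep a whole distribution over `w`, which is exactly the part an ordinary neural network throws away.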

What this means is that neural networks are simplifications or restrictions of probabilistic models. If we don’t know how to solve a problem with a Bayesian network, then a fortiori we don’t know how to solve it with deep learning either (except for considerations of efficiency and scale — deep neural nets can be much larger and faster than Bayes nets.)

We don’t know how to assign and update probabilities on predicate statements using Bayes nets, in a coherent and general manner. So we don’t know how to do that with neural nets either, except to the degree that neural nets are simpler or easier to work with than general Bayes nets.

For instance, as Thomas Colthurst points out in the comments, message passing algorithms don’t provably work in general Bayes nets, but do work in feedforward neural nets, which don’t have cycles. It may be that neural nets provide a restricted domain in which modeling predicate statements probabilistically is more tractable. I would have to learn more about this.

Do You Feel Lucky?

If you believe that learning “concepts” or “objects” is necessary for general intelligence (either for reasons of predicate logic or otherwise), then in order to believe that current deep learning techniques are already capable of general intelligence, you’d have to believe that deep networks are going to figure out how to represent objects somehow under the hood, without human beings needing to have conceptual understanding of how that works.

Perhaps, in the process of training a robot to navigate a room, that robot will represent the concept of “chairs” and “tables” and even derive general claims like “objects fall down when dropped”, all via reinforcement learning.

I find myself skeptical of this.

In something like image recognition, where convolutional neural networks work very well, there’s human conceptual understanding of the world of vision going on under the hood. We know that natural 2-d images generally are fairly smooth, so expanding them in terms of a multiscale wavelet basis is efficient, and that’s pretty much what convnets do.  They’re also inspired by the structure of the visual cortex.  In some sense, researchers know some things about how image recognition works on an algorithmic level.

I suspect that, similarly, we’d have to have understanding of how concepts work on an algorithmic level in order to train conceptual learning.  I used to think I knew how they worked; now I think I was describing high-level percepts, and I really don’t know what concepts are.

The idea that you can throw a bunch of computing power at a scientific problem, without understanding of fundamentals, and get out answers, is something that I’ve become very skeptical of, based on examples from biology where bigger drug screening programs and more molecular biology understanding don’t necessarily lead to more successful drugs.  It’s not in-principle impossible that you could have enough data to overcome the problem of multiple hypothesis testing, but modern science doesn’t have a great track record of actually doing that.
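To give a sense of scale for the multiple hypothesis testing problem, here is a small sketch. It assumes the tests are independent and that every null hypothesis is true, which is a simplification, and it uses the Bonferroni correction as one standard (and conservative) fix; the function names are hypothetical.

```python
def prob_at_least_one_false_positive(num_tests, alpha=0.05):
    """Chance of one or more spurious hits across independent null tests."""
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_threshold(num_tests, alpha=0.05):
    """Per-test significance threshold that keeps the overall rate near alpha."""
    return alpha / num_tests


for m in (1, 10, 100, 1000):
    print(m, prob_at_least_one_false_positive(m), bonferroni_threshold(m))
```

The more hypotheses a screen tests, the stricter each individual test has to be, and so the more data each one needs before a real effect can clear the bar.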

Getting artificial intelligence “by accident” from really big neural nets seems unlikely to me in the same way that getting a cure for cancer “by accident” from combining huge amounts of “omics” data seems unlikely to me.

What I’m Not Saying

I’m not saying that strong AI is impossible in principle.

I’m not saying that strong AI won’t be developed, with conceptual breakthroughs.  Researchers are working on conceptually novel approaches like differentiable computing and program induction that might lead to machines that can learn concepts and predicates.

I’m not saying that narrow AI might not be a very big deal, economically and technologically and culturally.

I’m not trying to malign the accomplishments of people who work on deep learning. (I admire them greatly and am trying to get up to speed in the field myself, and think deep learning is pretty awesome.)

I’m saying that I don’t think we’re done.

More on Image Recognition Progress

In my post on AI progress, I picked a few benchmark tasks, to see how machine learning algorithms have improved over the past few decades. Obviously those aren’t a comprehensive list, so I thought I’d add a few more.

I also claimed in that post that image recognition was “slowing down”, because the rate of improvement in accuracy percent was diminishing. I’ve since been convinced that a more meaningful metric for performance in image classification (or any classification task where perfect accuracy means 100% correct) is the negative log of the error rate. Obviously, as we approach perfect classification, no matter how quickly we do so, the raw percent accuracy score must “flatten out” because it’s bounded above by 100%. Transforming it into a log scale means that an error that decayed exponentially to zero over time would look like “linear progress”, which seems more natural. “Linear progress” on a log scale, given a continuation of Moore’s law, also means something like linear returns to computing power — i.e. scaling and parallelization don’t present much in the way of an impediment.
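A quick sketch of why the transformation matters. The error rates below are hypothetical (an error that halves every year, starting from 32%), not real benchmark results, and `neg_log_error` is just a name I picked.

```python
import math


def neg_log_error(accuracy_percent):
    """Negative natural log of the error rate, given accuracy in percent."""
    error = 1.0 - accuracy_percent / 100.0
    return -math.log(error)


# Hypothetical error rates that halve every year (not real benchmark data).
years = list(range(2010, 2016))
errors = [0.32 * 0.5 ** i for i in range(len(years))]
accuracies = [100.0 * (1.0 - e) for e in errors]
scores = [neg_log_error(a) for a in accuracies]

# Year-over-year improvement in each metric.
accuracy_gains = [b - a for a, b in zip(accuracies, accuracies[1:])]
score_gains = [b - a for a, b in zip(scores, scores[1:])]

for year, acc, score in zip(years, accuracies, scores):
    print(year, acc, score)
```

In raw accuracy the yearly gains shrink each year, which looks like a slowdown. In -log(error) the same series gains a constant log(2) per year, which is a straight line.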

From this dataset, I found (crowdsourced) papers and dates for performance on six image recognition benchmark datasets, and graphed -log(error) over time.

[Figures: -log(error) over time for the image recognition benchmark datasets]

With the possible exception of MNIST, all of these show a positive trend over time, and some are clearly linear. Most of these data points come from deep learning algorithms, except a few of the very earliest ones.

For reference, here’s the ImageNet performance data from the past post, but transformed to -log(error) instead of accuracy percent. It, too, looks linear.

[Figure: ImageNet performance over time, plotted as -log(error)]

The picture here looks quite similar to the performance over time of AI at chess and Go, which looks roughly linear in Elo score (also a measure that is roughly logarithmic in the “raw” percent of games won.)  Progress since the advent of deep learning has been steady. Returns to computing power appear roughly linear in Elo score, and also roughly linear in -log(error).
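For readers who haven’t seen it, the sense in which Elo is “roughly logarithmic” can be sketched with the standard Elo expected-score formula. The function names are mine; the formula itself, with its 400-point scale, is the usual one.

```python
import math


def expected_score(elo_diff):
    """Expected fraction of points for a player rated elo_diff above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def elo_diff_from_score(score):
    """Inverse: the rating gap implied by an expected score strictly between 0 and 1."""
    return 400.0 * math.log10(score / (1.0 - score))


for diff in (0, 100, 200, 400, 800):
    print(diff, expected_score(diff))
```

The rating gap is a scaled log of the odds of winning, so every extra 400 points multiplies the odds by ten. As the win rate gets close to 100%, that behaves much like -log(error): steady gains in Elo correspond to the loss rate shrinking by a constant factor.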

What does this mean, as a bottom line for the future of AI?

In those areas where deep learning can be successful, it seems like scaling is not an insurmountable problem: if you put more computational resources in, you can get more performance out.  The curve’s not going to bend upward (deep learning algorithms don’t get smarter per GPU if you add more GPUs), but, at least for the past few years, marginal returns to GPUs and training data have been more or less constant, not falling.  “Just do exactly what you’re doing, but more so” should yield steady improvements in narrow AI performance, at least for a while.

Freelancing Announcement

For those of you who don’t know, I do freelance research, somewhat similar to the lit review on this blog, for people who want it, and I have been persuaded to set up a website where people can buy projects from me.

https://constantinresearch.net/

What I do as a freelancer is very much like what I used to do at MetaMed (at much lower cost to the customer). If you have a medical question, say “I have chronic fatigue syndrome, is there anything that actually works for that?” or “are the chemicals in my rug safe for children?” then I’ll head to the scientific literature, pick out as much relevant information as I can find, and write up a well-referenced, concrete, direct answer, to the best of what my ability and the current state of research can offer.

I can also do this for non-personal questions (my STD stats post is an example of a general-audience overview of a topic) and to some extent for non-medical questions (I’ve dabbled in lit review on social-science topics, though I’m less sure of my footing there.) I also have experience with statistics and data analysis, so where data is available I can do more quantitative analysis.

I am not a doctor; in a medical emergency you should find a professional, not me.

What I am useful for is information gathering and synthesis.  I can help people with the kind of research this guy did on himself — a medical student with a rare disease who figured out a new treatment that worked on him when the standard of care didn’t.

If this sounds good to you, click on the site and check it out!

Performance Trends in AI

Epistemic Status: moderately confident

Edit To Add: It’s been brought to my attention that I was wrong to claim that progress in image recognition is “slowing down”. As classification accuracy approaches 100%, obviously improvements in raw scores will be smaller, by necessity, since accuracy can’t exceed 100%. If you look at negative log error rates rather than raw accuracy scores, improvement in image recognition (as measured by performance on the ImageNet competition) is increasing roughly linearly over 2010-2016, with a discontinuity in 2012 with the introduction of deep learning algorithms.

Deep learning has revolutionized the world of artificial intelligence. But how much does it improve performance?  How have computers gotten better at different tasks over time, since the rise of deep learning?

In games, what the data seems to show is that exponential growth in data and computation power yields exponential improvements in raw performance. In other words, you get out what you put in. Deep learning matters, but only because it provides a way to turn Moore’s Law into corresponding performance improvements, for a wide class of problems.  It’s not even clear it’s a discontinuous advance in performance over non-deep-learning systems.

In image recognition, deep learning clearly is a discontinuous advance over other algorithms.  But the returns to scale and the improvements over time seem to be flattening out as we approach or surpass human accuracy.

In speech recognition, deep learning is again a discontinuous advance. We are still far away from human accuracy, and in this regime, accuracy seems to be improving linearly over time.

In machine translation, neural nets seem to have made progress over conventional techniques, but it’s not yet clear if that’s a real phenomenon, or what the trends are.

In natural language processing, trends are positive, but deep learning doesn’t generally seem to do better than trendline.

Chess

chesselo1

These are Elo ratings of the best computer chess engines over time.

There was a discontinuity in 2008, corresponding to a jump in hardware; this was Rybka 2.3.1, a tree-search-based engine without any deep learning or indeed probabilistic elements. Apart from that, progress looks roughly linear.

Here again is the Swedish Chess Computer Association data on Elo scores over time:

chesselo2

Deep learning chess engines have only just recently been introduced; Giraffe, originated by Matthew Lai at Imperial College London, was created in 2015. It only has an Elo rating of 2412, about equivalent to late-90’s-era computer chess engines. (Of course, learning to predict patterns in good moves probabilistically from data is a more impressive achievement than brute-force computation, and it’s quite possible that deep-learning-based chess engines, once tuned over time, will improve.)

Go

(Figures from the Nature paper on AlphaGo.)

alphago.png

Fan Hui is a human player.  AlphaGo performed notably better than its predecessors Crazy Stone (2008, beat human players in mini go games), Pachi (2011), Fuego (2010), and GnuGo, all MCTS programs, but without deep learning or GPUs. AlphaGo uses much more hardware and more data.

Miles Brundage has argued that AlphaGo doesn’t represent that much of a surprise given the improvements in hardware and data (and effort).  He also graphed the returns in Elo rating to hardware by the AlphaGo team:

alphagovhardware

In other words, exponential growth in hardware produces only roughly linear (or even sublinear) growth in performance as measured by Elo score. To do better would require algorithmic innovation as well.

Arcade Games

AI performance on Atari games is scored relative to a human professional playtester: (Computer score – random play)/(Human score – random play).

Compare to Elo scores: the ratio of expected scores for player A vs. player B is Q_A / Q_B, where Q_A = 10^(E_A/400), E_A being the Elo score.

Linear growth in Elo scores is equivalent to exponential growth in absolute scores.
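To make the two scoring conventions concrete, here is a minimal sketch in Python. The function names are my own, and the Atari numbers in the usage note are made up; only the formulas come from the text above. It shows that a fixed gap in Elo rating always corresponds to the same multiplicative gap in expected score, which is why linear Elo growth means exponential growth in the underlying ratio.

```python
def elo_q(rating: float) -> float:
    """Q = 10^(E/400), the quantity whose ratio gives relative expected scores."""
    return 10 ** (rating / 400)


def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of A against B: Q_A / (Q_A + Q_B)."""
    q_a, q_b = elo_q(rating_a), elo_q(rating_b)
    return q_a / (q_a + q_b)


def score_ratio(rating_a: float, rating_b: float) -> float:
    """Ratio of A's expected score to B's expected score: Q_A / Q_B."""
    return elo_q(rating_a) / elo_q(rating_b)


def human_normalized_score(agent: float, human: float, random_play: float) -> float:
    """(agent - random) / (human - random); 0 is random play, 1 is the human tester."""
    return (agent - random_play) / (human - random_play)
```

For example, `score_ratio(1900, 1500)` and `score_ratio(2700, 2300)` are both about 10: every 400 Elo points multiplies the ratio by ten. With hypothetical game scores, `human_normalized_score(210, 110, 10)` is 2.0, i.e. twice the human tester's margin over random play.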

Miles Brundage’s blog also offers a trend in Atari performance that looks exponential:

atari

This would, of course, still be plausibly linear in Elo score.

Superhuman performance at arcade games is already here:

ataribygame

This was a single reinforcement learner trained with a convolutional neural net over images of the game screen outputting behaviors (arrows).  Basically it’s dynamic programming, with a nonlinear approximation of the Q-function that estimates the quality of a move; in DeepMind’s case, that Q-function approximator is a convolutional neural net.  Apart from the convnet, Q-learning with function approximation has been around since the 90’s and Q-learning itself since 1989.
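For readers who haven't seen Q-learning, here is a minimal tabular sketch on a toy problem. This is not DeepMind's setup: the five-state corridor, the reward, and the hyperparameters are all assumptions I made up for illustration, and a plain lookup table stands in for the convolutional net. The update rule is the standard one: nudge Q(state, action) toward reward + gamma * max Q(next state, ·).

```python
import random

N_STATES = 5        # toy corridor: states 0..4, state 4 is the goal (hypothetical)
ACTIONS = (0, 1)    # 0 = move left, 1 = move right
ALPHA, GAMMA = 0.5, 0.9   # learning rate and discount factor (arbitrary choices)


def step(state, action):
    """Deterministic toy environment: reward 1 on reaching the goal, else 0."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Q-learning is off-policy, so a purely random behavior policy suffices here.
            action = rng.choice(ACTIONS)
            next_state, reward, done = step(state, action)
            target = reward if done else reward + GAMMA * max(q[next_state])
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
    return q


q_table = train()
# Greedy policy read off the learned table for the non-terminal states.
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
```

The learned greedy policy moves right in every state, and the value of moving right from the start state approaches 0.9 cubed, the discounted reward three steps before the goal. A deep Q-network replaces `q_table` with a neural net that takes screen pixels as input.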

Interestingly enough, here’s a video of a computer playing Breakout:

https://www.youtube.com/watch?v=UXgU37PrIFM

It obviously doesn’t “know” the law of reflection as a principle, or it would place the bar near where the ball will eventually land, and it doesn’t.  There are erratic jerky movements that obviously could not in principle be optimal.  It does, however, find the optimal strategy of tunnelling through the bricks and hitting the ball behind the wall.  This is creative learning but not conceptual learning.

You can see the same phenomenon in a game of Pong:

https://www.youtube.com/watch?v=YOW8m2YGtRg

The learned agent performs much better than the hard-coded agent, but moves more jerkily and “randomly” and doesn’t know the law of reflection.  Similarly, the reports of AlphaGo producing “unusual” Go moves are consistent with an agent that can do pattern-recognition over a broader space than humans can, but which doesn’t find the “laws” or “regularities” that humans do.

Perhaps, contrary to the stereotype that contrasts “mechanical” with “outside-the-box” thinking, reinforcement learners can “think outside the box” but can’t find the box?

ImageNet

Image recognition as measured by ImageNet classification performance has improved dramatically with the rise of deep learning.

imagenet

There’s a dramatic performance improvement starting in 2012, corresponding to Geoffrey Hinton’s winning entry, followed by a leveling-off.  Plausibly accuracy is an S-shaped curve.

How does accuracy scale with processing power?

This paper from Baidu illustrates:

baiduscurve

The performance of a deep neural net follows an S-shaped curve over time spent training, but works faster with more GPUs.  How much faster?

baiduscaling

Each doubling in GPUs provides only a linear boost in speed.  At a given time interval for training (as one would have in a timed competition), this means that doubling the number of GPUs would result in a sublinear boost in accuracy.

MNIST

mnist

Using the performance data from Yann LeCun’s website, we can see that deep neural nets hugely improved MNIST digit recognition accuracy. The best algorithms of 1998, which were convolutional nets and boosted convolutional nets due to LeCun, had error rates of 0.7-0.8%. Within 5 years, that had dropped to 0.4%; within 10 years, to 0.39% (also a convolutional net); within 15 years, to 0.23%; and within 20 years, to 0.21%.  Clearly, performance on MNIST is leveling off; the error rate halved in the first five years and then took roughly fifteen more years to halve again.

As with ImageNet, we may be getting close to the limits of deep-learning performance (which may easily be human-level.)
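The Edit To Add at the top of this post suggests looking at negative log error rather than raw accuracy. As a rough sketch, here is that transformation applied to the MNIST figures quoted above. The year offsets are approximate ("within 5 years" is treated as exactly 5), and I use 0.8% for 1998 from the quoted 0.7-0.8% range; both are simplifying assumptions.

```python
import math

# (years since 1998, error rate in percent), taken from the paragraph above.
mnist_errors = [(0, 0.8), (5, 0.4), (10, 0.39), (15, 0.23), (20, 0.21)]


def neg_log_error(error_percent: float) -> float:
    """-log10 of the error rate expressed as a fraction (1.0 means 10% error)."""
    return -math.log10(error_percent / 100.0)


neg_log_series = [(years, neg_log_error(err)) for years, err in mnist_errors]

for years, value in neg_log_series:
    print(f"{1998 + years}: -log10(error) = {value:.3f}")
```

On this scale each halving of the error rate adds the same amount (about 0.30), so it is easier to compare early and late progress than on the raw accuracy scale.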

Speech Recognition

Before the rise of deep learning, speech recognition was already progressing rapidly, though it was leveling off in conversational speech at a word error rate well above 10%.

speech

Then, in 2011, the advent of context-dependent deep neural network hidden Markov models produced a discontinuity in performance:

speechdeep.png

More recently, accuracy has continued to progress:

Nuance, a dictation software company, shows steadily improving performance on word recognition through to the present day, with a plausibly exponential trend.

nuance

Baidu has progressed even faster, as of 2015, in speech recognition on Mandarin.

baiduspeech.png

As of 2016, the best performance on the NIST 2000 Switchboard set (of phone conversations) is due to Microsoft, with a word-error rate of 6.3%.

Translation

Machine translation is evaluated by BLEU score, which compares the translation to the reference via overlap in words or n-grams.  BLEU scores range from 0 to 1, with 1 being perfect translation.  As of 2012, Tilde’s  had BLEU scores in the 0.25-0.45 range, with Google and Microsoft performing similarly but worse.
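To give a feel for what "overlap in words or n-grams" means, here is a simplified, sentence-level sketch of a BLEU-style score. It is an illustration under assumptions, not the official metric: real BLEU is usually computed over a whole corpus with n-grams up to length 4 and possibly several references, while this version uses one reference, whitespace tokenization, and unigrams plus bigrams by default. The function names are my own.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-gram in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate: str, reference: str, max_n: int = 2) -> float:
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngram_counts(cand, n)
        ref_counts = ngram_counts(ref, n)
        # Clipped overlap: a candidate n-gram only counts as often as the reference has it.
        overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if total == 0 or overlap == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: punish candidates shorter than the reference.
    brevity = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return brevity * math.exp(sum(log_precisions) / max_n)
```

An exact match scores 1, a candidate sharing no words with the reference scores 0, and `simple_bleu("the cat sat on the mat", "the cat is on the mat")` comes out near 0.71, since 5 of 6 unigrams and 3 of 5 bigrams match.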

In 2016, Google came out with a new neural-network-based version of its translation tool.  BLEU scores on English -> French and English -> German were 0.404 and 0.263 respectively. Human evaluations, however, rated the neural machine translations 60-87% better.

OpenMT, the machine translation contest, had top BLEU scores in 2012 of about 0.4 for Arabic-to-English, 0.32 for Chinese-to-English, 0.24 for Dari-to-English, 0.27 for Farsi-to-English, and 0.11 for Korean-to-English.

In 2008, Urdu-to-English had top BLEU scores of 0.32, Arabic-to-English scores of 0.48, and Chinese-to-English scores of 0.30.

Comparing the 2008 and 2012 OpenMT results, there is no visible improvement in BLEU scores for machine translation. Apart from Google’s improvement in human ratings, celebrated in this New York Times Magazine article, it’s unclear whether neural networks actually improve BLEU scores at all. On the other hand, scoring metrics may be an imperfect match to translation quality.

Natural Language Processing

The Association for Computational Linguistics Wiki has some numbers on state of the art performance for various natural language processing tasks.

Performance on SAT analogy questions has improved over time, roughly linearly, and the best systems today are roughly as accurate as the average US college applicant.  None of these are deep learning techniques.

satanalogies.png

Question answering (multiple choice of sentences that answer the question) has improved roughly steadily over time, with a discontinuity around 2006.  Neural nets did not start being used until 2014, but were not a discontinuous advance from the best models of 2013.

questions.png

Paraphrase identification (recognizing if one paragraph is a paraphrase of another) seems to have risen steadily over the past decade, with no special boost due to deep learning techniques; the top performance is not from deep learning but from matrix factorization.

paraphrase

On NLP tasks that have a long enough history to graph, there seems to be no clear indication that deep learning performs above trendline.

Trends relative to processing power and time

Performance/accuracy returns to processing power seem to differ based on problem domain.

In image recognition, we see sublinear returns to linear improvements in processing power, and gains leveling off over time as computers reach and surpass human-level performance. This may mean simply that image recognition is a nearly-solved problem.

In NLP, we see roughly linear improvements over time, and in machine translation, it’s unclear if we see any trend in improvement over time; both suggest sublinear returns to processing power, though I’m not very confident in this.

In games, we see roughly linear returns to linear improvements in processing power, which means exponential improvements in performance over time (because of Moore’s law and increasing investment in AI).

This would suggest that far-superhuman abilities are more likely to be possible in game-like problem domains.

What does this imply about deep learning?

What we’re seeing here is that deep learning algorithms can provide improvements in narrow AI across many types of problem domains.

Deep learning provides discontinuous jumps relative to previous machine learning or AI performance trendlines in image recognition and speech recognition; it doesn’t in strategy games or natural language processing, and machine translation and arcade games are ambiguous (machine translation because metrics differ; arcade games because there is no pre-deep-learning comparison.)

A speculative thought: perhaps deep learning is best for problem domains oriented around sensory data? Images or sound, rather than symbols. If current neural net architectures, like convolutional nets, mimic the structure of the sensory cortex of the brain, which I think they do, one would expect this result.

Arcade games would be more analogous to the motor cortex, and perceptual control theory suggests that something similar to Q-learning may be going on in motor learning, though I’d have to learn more to be confident in that.  If mammalian motor learning turns out to look like Q-learning, I’d expect deep reinforcement learning to be especially good in arcade games and robotics, just as deep neural networks are especially good in visual and audio classification.

Deep learning hasn’t really proven itself better than trendline in strategy games (Go and chess) or in natural language tasks.

I might wonder if there are things humans can do with concepts and symbols and principles, the traditional tools of the “higher intellect”, the skills that show up on highly g-loaded tasks, that deep learning cannot do with current algorithms. Obviously hard-coding rules into an AI has grave limitations (the failure of such hard-coded systems was what caused several of the AI winters), but there may also be limitations to non-conceptual pattern recognition.  The continued difficulty of automating language-based tasks may be related to this issue.

Miles Brundage points out,

Progress so far has largely been toward demonstrating general approaches for building narrow systems rather than general approaches for building general systems. Progress toward the former does not entail substantial progress toward the latter. The latter, which requires transfer learning among other elements, has yet to have its Atari/AlphaGo moment, but is an important area to keep an eye on going forward, and may be especially relevant for economic/safety purposes.

I agree.  General AI systems, as far as I know, do not exist today, and the million-dollar question is whether they can be built with algorithms similar to those used today, or if there are further fundamental algorithmic advances that have yet to be discovered. So far, I think there is no empirical evidence from the world of deep learning to indicate that today’s deep learning algorithms are headed for general AI in the near future.  Discontinuous performance jumps in image recognition and speech recognition with the advent of deep learning are the most suggestive evidence, but it’s not clear whether those are above and beyond returns to processing power. And so far I couldn’t find any estimates of trends in cross-domain generalization ability.  Whether deep learning algorithms can be general-purpose is perhaps a more theoretical question; what we can say is that recent AI progress doesn’t offer any reason to suspect that they already are.

Life Extension Possibilities

Epistemic Status: Pretty confident

This is my first pass of a lit review of life-extension interventions apart from caloric restriction, with a focus on things that work in mammals (rather than fruit flies or other invertebrates.)

Intervention                     Longevity Increase
Ames dwarf mice                  50%
PAPP-A knockout mice             38%
Irs1 knockout mice               32% (female only)
AC5 knockout mice                32%
Low methionine diet              30%
High dose rapamycin              25%
High dose vitamin E              15% females, 40% males
Lower core body temperature      12% males, 20% females
Low dose rapamycin               10-18%
NDGA                             10% (male only)
Statins + ACE inhibitors         9%
Selegiline                       7%
Metformin                        4-5%

Bottom Lines

  • Low methionine diets (roughly, vegan diets) work really well at extending life in mice, and there’s a plausible mechanism (avoiding homocysteine buildup) that they might work in humans as well.  If it worked as well on humans as it does on mice, the average person would live to over 100.
  • Rapamycin extends life in mice by quite a lot. Unfortunately it’s a strong immunosuppressant, so isn’t very safe to use as a drug.
  • There’s a lot of evidence that the IGF/insulin signaling/growth hormone metabolic pathway is associated with aging and short lifespan, and that inhibiting genes on that pathway results in longer lifespan.  IGF-receptor-inhibiting or growth-hormone-inhibiting drugs could be studied for longevity, but haven’t yet.
  • The MAO inhibitor selegiline extends life in both mice and dogs.
  • Metformin seems to work, and is currently being studied in a human trial.
  • NDGA, an antioxidant derived from the creosote bush, might work, but it’s also toxic.
  • Sirtuin drugs and resveratrol don’t work.
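As a back-of-the-envelope check on the "over 100" figure in the first bullet, here is the arithmetic spelled out. The 79-year baseline is my assumption for a typical developed-country life expectancy, not a number from the studies; the 30% figure is the rat result reported below.

```python
BASELINE_LIFE_EXPECTANCY_YEARS = 79.0   # assumption: rough developed-country average
LOW_METHIONINE_EXTENSION = 0.30         # 30% longer lifespan in the rodent study


def extended_lifespan(baseline_years: float, fractional_increase: float) -> float:
    """Naively apply a rodent lifespan percentage to a human baseline."""
    return baseline_years * (1 + fractional_increase)


projected_years = extended_lifespan(BASELINE_LIFE_EXPECTANCY_YEARS, LOW_METHIONINE_EXTENSION)
print(f"Projected life expectancy: {projected_years:.1f} years")
```

This is only a proportional extrapolation; whether rodent percentages transfer to humans at all is exactly the open question.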

Low methionine

60 Fischer rats fed a low-methionine diet lived 30% longer than control rats. The low-methionine rats grew significantly less as well.[1]

80 female mice fed a low-methionine diet lived longer than control mice, at p < 0.02; they also were lower in weight, lower in IGF, insulin, glucose, and thyroxine, had fewer cataracts, and experienced less loss of liver function in response to injected acetaminophen.[2]

Some tumors are dependent on methionine to grow and will not kill methionine-starved mice as fast.[28]

Homocysteine is biosynthesized from methionine.  Homocysteine levels rise as we age and are associated with many diseases of aging, such as heart disease, cancer, stroke, Alzheimer’s, and presbyopia. Genetic conditions that cause homocystinuria in younger people cause similar problems: vascular thrombosis, intellectual disability, lens dislocation.  Homocysteine levels are also associated with depression[32] and schizophrenia.[33]  Homocysteine is toxic and reacts to “homocysteinylate” many different kinds of proteins, rendering them ineffective.[29]  It might also cause its damage through oxidation, impaired methylation, or other chemical mechanisms.[30]  If you give a rabbit homocysteine injections, it’ll develop atherosclerosis.[31]

Children with homocystinuria have been successfully treated with low-methionine diets.[34][35][36] This is now the standard treatment for patients with genetic homocystinuria who don’t respond to vitamin B supplementation. A low-methionine diet in humans consists of abstaining from meat, fish, and dairy, instead getting protein from soy and vegetables, and making up the caloric deficit with fat.

Growth Hormone and IGF Inhibition

Rats which were heterozygous for an antisense growth-hormone transgene lived 7-10% longer than control rats. They were also smaller and had lower levels of IGF. [3]

Ames dwarf mice lack growth hormone, prolactin, and TSH, and live about 50% longer than normal mice due to a Prop1 mutation.[22]

Humans with Prop1 mutations lack growth hormone and so have short stature, hypothyroidism, cortisol deficiency, and failure to go through puberty.[37]  Humans with growth hormone receptor deficiency in Ecuador had short stature and were obese but had a much lower incidence of cancer and diabetes, and greater insulin sensitivity, than their normal relatives.  They did not have higher longevity because they had higher rates of alcoholism and accidents.[38]

Female mice missing insulin receptor substrate 1 (Irs1 -/-), a signaling protein downstream of the insulin and IGF1 receptors, live 32% longer on average; male Irs1 -/- mice have no change in longevity.  These mice are insulin resistant but have reduced fat mass despite eating more.[23]  In a cohort of Ashkenazi Jewish centenarians, female offspring had 35% higher IGF1 and were 2.5 centimeters shorter than age- and sex-matched controls.  The centenarians had many mutations in the IGF1 receptor gene. The centenarians with mutations had higher IGF1 and a trend towards shorter height than those without.[39]

Pegvisomant is a growth hormone receptor antagonist used to treat acromegaly; it could be investigated as an anti-aging therapy.  Somatostatin analogs such as octreotide and pasireotide could also be investigated; somatostatin inhibits the release of growth hormone.  There are also IGF receptor kinase inhibitors being investigated for antitumor properties, such as NVP-AEW-541.

Metformin

If started at 3 months of age (but not later), metformin increased mean lifespan of female SHR mice by 14%. It also delayed the onset of the first tumor by 22%.[4]

Metformin increases the mean lifespan of mice by 4-5%. Treated mice had lower cholesterol, lower LDL, and lower insulin.[7]

Rapamycin

If fed to mice near the end of lifespan (600 days), rapamycin extends mean lifespan by 14% for females and 9% for males.[5]  Rapamycin fed to mice starting at 9 months extends median survival by 10% in males and 18% in females.[6]  Rapamycin fed to HER-2/neu transgenic (cancer-prone) mice caused a 4% extension in mean lifespan and a 12.4% increase in maximum lifespan.  Rapamycin-treated mice were 25% less likely to develop tumors.[8]

High-dose rapamycin given to mice at 9 months extends life by 23% in males and 26% in females.[9]

Rapamycin increases the lifespan of Rb1+/- mice (a model of neuroendocrine tumors) by inhibiting the incidence of neuroendocrine tumors.  Mean lifespan increased by 9% in females and 14% in males. Treated mice were significantly less likely to have thyroid tumors, and had smaller tumors of all kinds.[15]

NDGA

Nordihydroguaiaretic acid, an antioxidant derived from the creosote bush, increased mean lifespan by 12% in male but not female mice. It did not increase the proportion of extremely long-lived mice.[11]

NDGA increased median lifespan in male mice, but not female mice, by 8-10%.[12]

On the other hand, there have been reports of hepatitis and kidney damage from human consumption of NDGA or creosote.

High-dose Vitamin E

Male mice given tocopherol (an antioxidant) at a dose of 5g/kg of food from 28 weeks of age had 40% longer median lifespan than control, and 17% increased maximal lifespan; female mice given tocopherol had 15% increased median lifespan.[10]  Mice given tocopherol from 28 weeks and maintained in the cold (45 degrees Fahrenheit) lived 15% longer.[56]  On the other hand, high-dose vitamin E in humans, according to a meta-analysis, did not reduce all-cause mortality.[57]

Lower Core Body Temperature

Mice genetically engineered to overexpress the Hcrt-UCP2 transgene, which causes a 0.3-0.5 degree drop in core body temperature, had median lifespans increased by 12% in males and 20% in females.[13]  Lower core body temperature is one of the results of caloric restriction, and cooler humans tend to live longer and be less obese.[55]

Young Ovaries

Old mice transplanted with young mouse ovaries lived an average of 6% longer.[14]  In particular, mice ovariectomized before puberty and transplanted with ovaries at 11 months lived longer than intact mice, by 17%. Transplantation with ovaries at 11 months seems to shift the survival curve to the right, postponing aging.[54]

Selegiline

Male rats treated with deprenyl (aka selegiline, a Parkinson’s drug and MAO-B inhibitor) lived on average 35% longer than controls, according to a 1988 study.[16] However, later studies could never find an equally dramatic effect. Mice treated with selegiline starting at 18 months had no increase in survival.[17] Selegiline extends life in female but not male Syrian hamsters.[18] Fischer rats treated starting at 18 months with selegiline lived 7% longer.[19] Male Fischer rats treated starting at 12 months with selegiline lived 7% longer.[20] Female hamsters, but not male, treated with selegiline, lived significantly longer than controls.[24]

ACE Inhibitors

High dose ACE inhibition with ramipril doubled the lifespan of hypertensive rats, bringing it up to that of normal rats.[21] Statins + ramipril increased lifespan of long-lived mice by 9%.[53]

Ramipril is a standard drug for high blood pressure.

AC5 Knockout

Adenylyl cyclase 5 is primarily expressed in the heart and brain, and catalyzes the synthesis of cyclic AMP, an important second messenger which relays signals from hormones that cannot pass through the plasma membrane and activates protein kinases, in particular to regulate glucose and fat metabolism.

AC5 knockout mice have a median lifespan 32% longer than wild-type mice. Bones were less brittle, body weights were smaller, and GH levels were lower.[25]  AC5 knockout mice also have markedly attenuated responses to pain (heat, cold, mechanical, inflammation, and neuropathic.)[50]  The effects of morphine and mu or delta opioid receptor agonists are attenuated in AC5 knockout mice.[52] However, AC5 knockout mice had Parkinson’s-like motor symptoms.[51]

SIRT1 Activators

Sirtuin 1, encoded by the SIRT1 gene, is downregulated in cells that have high insulin resistance, and upregulated in mice undergoing caloric restriction; mice with low levels of SIRT1 don’t live longer in response to caloric restriction, while mice with high levels mimic the caloric restriction phenotype.[49]

SRT1720, a SIRT1 activator, extends life by 8% in mice on a standard diet, and by 21.7% in mice fed a high-fat diet (who are generally shorter-lived).  SRT1720 also reduces the incidence of cataracts, improves glucose tolerance, and lowers LDL and cholesterol.[26]  SRT1720 reduces liver lipid accumulation in strains of mice bred for obesity and insulin resistance, and preserved liver function.[45]

A phase I trial of SRT1720 in elderly human volunteers found that it was safe and well-tolerated and reduced cholesterol, LDL, and triglycerides over the course of a month of treatment.[46]

However, a subsequent study found that SRT1720 does not in fact activate SIRT1 except when its substrate carries a fluorophore (used in the assay), so the apparent activation may be an artifact. This study also found that SRT1720 had no effect on glucose tolerance in mouse models of diabetes.[47]

The putative SIRT1 activator SRT2104 did not affect insulin or glucose in a randomized trial of type II diabetes.[48]

Investigation of the sirtuin drugs has shut down, due to these failures to replicate.

PAPP-A Knockout

Mice missing pregnancy-associated plasma protein A live 38% longer than control mice, not associated with changes in serum glucose, cholesterol, or dietary intake. Wild-type mice had many more tumors than knockout mice. (70% of wild-type vs. 15% of knockout had tumors.)[27]  Knockout mice are smaller than wild-type, and consume less food, though similar as a proportion of bodyweight; they also show more spontaneous physical activity. Knockout mice are not significantly different from wild-type in terms of insulin sensitivity, fasting glucose, or insulin levels.[42]  PAPP-A knockout mice do not demonstrate as much thymic atrophy in old age as wild-type mice: more immature thymus cells, more new T cells, less IGF1 expression, easier to activate T cells.  IGF-1 promotes differentiation of T cells, so releasing it slower could keep the thymus young longer.[43]  PAPP-A knockout and wild-type mice both gain similar amounts of subcutaneous fat on high-fat diets, but the knockout mice gain significantly less visceral fat; PAPP-A is most highly expressed in mesenteric fat.[44] PAPP-A may have some tissue-specific effects on promoting IGF-axis activity, without altering metabolism that much across the board.

PAPP-A encodes a metalloproteinase that cleaves insulin-like growth factor binding proteins.  These IGFBPs are inhibitors of IGF activity, and if you cleave them, the ability to inhibit IGF diminishes; so PAPP-A knockouts make IGF less bioavailable.[40]  PAPP-A is expressed in unstable atherosclerotic plaques but not in stable ones; serum PAPP-A levels are higher in patients with unstable angina or acute myocardial infarction than in patients with stable angina or controls, by about a factor of two.[41]

Dogs

Selegiline

Among elderly dogs (age 10-15), 80% of those receiving selegiline survived to the end of the two-year study, compared to 39% of those receiving placebo.[65]

Ovaries

Female dogs who had their ovaries removed lived no longer than male dogs, while female dogs that kept their ovaries were twice as likely as male dogs to achieve “exceptional” longevity (>13 years).[66]

IGF and Weight

IGF is positively correlated with weight, and negatively correlated with age, in dogs across various breeds.  Larger dogs have shorter lifespans.[67]

Humans

FOXO3A Mutation

Homozygous minor mutations in the FOXO3A gene were associated with a 2.75 odds ratio of being in a cohort of long-lived men, compared to controls.  They were 29% more likely to be “healthy” at baseline (free of cardiovascular disease, cancer, stroke, Parkinson’s, and diabetes, able to pass a walking and a cognitive test). The mutations were 85% more common in people who lived to more than 100 than in people who died at 72-74.[58]  A German sample of long-lived people found that minor alleles were 1.53x as common in centenarians as in controls.[59]
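For readers unfamiliar with odds ratios, here is a small sketch of how one is computed from a two-by-two table. The counts are hypothetical, chosen only so the result matches the 2.75 quoted above; they are not the study's actual data, and the function name is my own.

```python
def odds_ratio(carrier_cases: int, noncarrier_cases: int,
               carrier_controls: int, noncarrier_controls: int) -> float:
    """Odds of carrying the genotype among cases divided by the odds among controls.

    Written in cross-product form: (a * d) / (b * c).
    """
    return (carrier_cases * noncarrier_controls) / (noncarrier_cases * carrier_controls)


# Hypothetical counts: 11 carriers and 40 non-carriers among long-lived men,
# 4 carriers and 40 non-carriers among controls.
example_or = odds_ratio(11, 40, 4, 40)
print(example_or)
```

An odds ratio of 1 would mean the genotype is equally common in both groups; values above 1 mean it is over-represented among the long-lived.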

Insulin-like growth factor signaling inhibits FOXO3 activity, while oxidative stress activates FOXO3.  FOXO3 represses the mTOR pathway and promotes DNA repair.  It is also anti-inflammatory: suppresses IL-2 and IL-6, reduces proliferation of T cells and lymphocytes, reduces inflammation.[60]

FOXO3 is activated by AMPK.[61] You can do this via metformin in vitro — meanwhile changing glioma precursor cells into non-tumor cells.[62]  You can also do it with AICAR, an AMP analogue that stimulates AMPK.[63]  Note that AICAR reduces triglycerides, increases HDL, lowers blood pressure, and reverses insulin resistance in mice.[64]

Unsupported Musings

I don’t think antioxidants generally have come out looking too good for anti-aging, and there are a lot of counterexamples to the “aging is oxidative damage” hypothesis.

I think the growth-hormone-and-insulin-signaling cluster of life extension techniques and mutations is probably a real thing, and matches well to an explanation for why caloric restriction works. It also makes sense evolutionarily; in times of food abundance you want to reproduce, while in times of food scarcity you just want to survive the season, so it would make sense if you had two hormonal modes, “reproductive mode” and “survival mode.”

I also think there’s probably an mTOR mechanism, possibly just due to cancer, that explains both the effectiveness of rapamycin and the significance of the FOXO3 genes.  AMPK, which is activated by exercise, is upstream of both the mTOR stuff and the insulin-signaling stuff; this would explain why both exercise and metformin seem to be helpful for longevity.

References

[1]Orentreich, Norman, et al. “Low methionine ingestion by rats extends life span.” The Journal of Nutrition 123.2 (1993): 269-274.

[2]Miller, Richard A., et al. “Methionine‐deficient diet extends mouse lifespan, slows immune and lens aging, alters glucose, T4, IGF‐I and insulin levels, and increases hepatocyte MIF levels and stress resistance.” Aging cell 4.3 (2005): 119-125.

[3]Shimokawa, Isao, et al. “Life span extension by reduction in growth hormone-insulin-like growth factor-1 axis in a transgenic rat model.” The American journal of pathology 160.6 (2002): 2259-2265.

[4]Anisimov, Vladimir N., et al. “If started early in life, metformin treatment increases life span and postpones tumors in female SHR mice.” Aging (Albany NY) 3.2 (2011): 148-157.

[5]Harrison, David E., et al. “Rapamycin fed late in life extends lifespan in genetically heterogeneous mice.” Nature 460.7253 (2009): 392-395.

[6]Miller, Richard A., et al. “Rapamycin, but not resveratrol or simvastatin, extends life span of genetically heterogeneous mice.” The Journals of Gerontology Series A: Biological Sciences and Medical Sciences (2010): glq178.

[7]Martin-Montalvo, Alejandro, et al. “Metformin improves healthspan and lifespan in mice.” Nature communications 4 (2013).

[8]Anisimov, Vladimir N., et al. “Rapamycin extends maximal lifespan in cancer-prone mice.” The American journal of pathology 176.5 (2010): 2092-2097.

[9]Miller, Richard A., et al. “Rapamycin‐mediated lifespan increase in mice is dose and sex dependent and metabolically distinct from dietary restriction.” Aging cell 13.3 (2014): 468-477.

[10]Navarro, Ana, et al. “Vitamin E at high doses improves survival, neurological performance, and brain mitochondrial function in aging male mice.” American Journal of Physiology-Regulatory, Integrative and Comparative Physiology 289.5 (2005): R1392-R1399.

[11]Strong, Randy, et al. “Nordihydroguaiaretic acid and aspirin increase lifespan of genetically heterogeneous male mice.” Aging cell 7.5 (2008): 641-650.

[12]Harrison, David E., et al. “Acarbose, 17‐α‐estradiol, and nordihydroguaiaretic acid extend mouse lifespan preferentially in males.” Aging cell 13.2 (2014): 273-282.

[13]Conti, Bruno, et al. “Transgenic mice with a reduced core body temperature have an increased life span.” Science 314.5800 (2006): 825-828.

[14]Mason, Jeffrey B., et al. “Transplantation of young ovaries to old mice increased life span in transplant recipients.” The Journals of Gerontology Series A: Biological Sciences and Medical Sciences 64.12 (2009): 1207-1211.

[15]Livi, Carolina B., et al. “Rapamycin extends life span of Rb1+/-mice by inhibiting neuroendocrine tumors.” Aging (Albany NY) 5.2 (2013): 100-110.

[16]Knoll, Joseph. “The striatal dopamine dependency of life span in male rats. Longevity study with (−) deprenyl.” Mechanisms of ageing and development 46.1 (1988): 237-262.

[17]Ingram, Donald K., et al. “Chronic treatment of aged mice with L-deprenyl produces marked striatal MAO-B inhibition but no beneficial effects on survival, motor performance, or nigral lipofuscin accumulation.” Neurobiology of aging 14.5 (1993): 431-440.

[18]Stoll, S., et al. “Chronic treatment of Syrian hamsters with low-dose selegiline increases life span in females but not males.” Neurobiology of aging 18.2 (1997): 205-211.

[19]Kitani, K., et al. “Chronic treatment of (-) deprenyl prolongs the life span of male Fischer 344 rats. Further evidence.” Life sciences 52.3 (1993): 281-288.

[20]Bickford, P. C., et al. “Long-term treatment of male F344 rats with deprenyl: assessment of effects on longevity, behavior, and brain function.” Neurobiology of aging 18.3 (1997): 309-318.

[21]Linz, Wolfgang, et al. “Long-term ACE inhibition doubles lifespan of hypertensive rats.” Circulation 96.9 (1997): 3164-3172.

[22]Bartke, Andrzej, et al. “Longevity: extending the lifespan of long-lived mice.” Nature 414.6862 (2001): 412-412.

[23]Selman, Colin, et al. “Evidence for lifespan extension and delayed age-related biomarkers in insulin receptor substrate 1 null mice.” The FASEB Journal 22.3 (2008): 807-818.

[24]Stoll, S., et al. “Chronic treatment of Syrian hamsters with low-dose selegiline increases life span in females but not males.” Neurobiology of aging 18.2 (1997): 205-211.

[25]Yan, Lin, et al. “Type 5 adenylyl cyclase disruption increases longevity and protects against stress.” Cell 130.2 (2007): 247-258.

[26]Mitchell, Sarah J., et al. “The SIRT1 activator SRT1720 extends lifespan and improves health of mice fed a standard diet.” Cell reports 6.5 (2014): 836-843.

[27]Conover, Cheryl A., and Laurie K. Bale. “Loss of pregnancy‐associated plasma protein A extends lifespan in mice.” Aging cell 6.5 (2007): 727-729.

[28]Hoffman, Robert M. “Methioninase: a therapeutic for diseases related to altered methionine metabolism and transmethylation: cancer, heart disease, obesity, aging, and Parkinson’s disease.” Human cell 10 (1997): 69-80.

[29]Krumdieck, Carlos L., and Charles W. Prince. “Mechanisms of homocysteine toxicity on connective tissues: implications for the morbidity of aging.” The Journal of nutrition 130.2 (2000): 365S-368S.

[30]Perna, Alessandra F., et al. “Possible mechanisms of homocysteine toxicity.” Kidney International 63 (2003): S137-S140.

[31]McCully, Kilmer S., and Bruce D. Ragsdale. “Production of arteriosclerosis by homocysteinemia.” The American journal of pathology 61.1 (1970): 1.

[32]Tolmunen, Tommi, et al. “Association between depressive symptoms and serum concentrations of homocysteine in men: a population study.” The American journal of clinical nutrition 80.6 (2004): 1574-1578.

[33]Applebaum, Julia, et al. “Homocysteine levels in newly admitted schizophrenic patients.” Journal of psychiatric research 38.4 (2004): 413-416.

[34]Perry, Thomas L., et al. “Treatment of homocystinuria with a low-methionine diet, supplemental cystine, and a methyl donor.” The Lancet 292.7566 (1968): 474-478.

[35]Kolb, Felix O., Jerry M. Earll, and Harold A. Harper. ““Disappearance” of cystinuria in a patient treated with prolonged low methionine diet.” Metabolism 16.4 (1967): 378-381.

[36]Sardharwalla, I. B., et al. “Homocystinuria: a study with low-methionine diet in three patients.” Canadian Medical Association Journal 99.15 (1968): 731.

[37]Reynaud, Rachel, et al. “A familial form of congenital hypopituitarism due to a PROP1 mutation in a large kindred: phenotypic and in vitro functional studies.” The Journal of Clinical Endocrinology & Metabolism 89.11 (2004): 5779-5786.

[38]Guevara-Aguirre, Jaime, et al. “Growth hormone receptor deficiency is associated with a major reduction in pro-aging signaling, cancer, and diabetes in humans.” Science translational medicine 3.70 (2011): 70ra13-70ra13.

[39]Suh, Yousin, et al. “Functionally significant insulin-like growth factor I receptor mutations in centenarians.” Proceedings of the National Academy of Sciences 105.9 (2008): 3438-3442.

[40]Lawrence, James B., et al. “The insulin-like growth factor (IGF)-dependent IGF binding protein-4 protease secreted by human fibroblasts is pregnancy-associated plasma protein-A.” Proceedings of the National Academy of Sciences 96.6 (1999): 3149-3153.

[41]Bayes-Genis, Antoni, et al. “Pregnancy-associated plasma protein A as a marker of acute coronary syndromes.” New England Journal of Medicine 345.14 (2001): 1022-1029.

[42]Conover, Cheryl A., et al. “Metabolic consequences of pregnancy-associated plasma protein-A deficiency in mice: exploring possible relationship to the longevity phenotype.” Journal of Endocrinology 198.3 (2008): 599-605.

[43]Vallejo, Abbe N., et al. “Resistance to age-dependent thymic atrophy in long-lived mice that are deficient in pregnancy-associated plasma protein A.” Proceedings of the National Academy of Sciences 106.27 (2009): 11252-11257.

[44]Conover, Cheryl A., et al. “Preferential impact of pregnancy-associated plasma protein-A deficiency on visceral fat in mice on high-fat diet.” American Journal of Physiology-Endocrinology and Metabolism 305.9 (2013): E1145-E1153.

[45]Yamazaki, Yu, et al. “Treatment with SRT1720, a SIRT1 activator, ameliorates fatty liver with reduced expression of lipogenic enzymes in MSG mice.” American Journal of Physiology-Endocrinology and Metabolism 297.5 (2009): E1179-E1186.

[46]Libri, Vincenzo, et al. “A pilot randomized, placebo controlled, double blind phase I trial of the novel SIRT1 activator SRT2104 in elderly volunteers.” PLoS One 7.12 (2012): e51395.

[47]Pacholec, Michelle, et al. “SRT1720, SRT2183, SRT1460, and resveratrol are not direct activators of SIRT1.” Journal of Biological Chemistry 285.11 (2010): 8340-8351.

[48]Baksi, Arun, et al. “A phase II, randomized, placebo‐controlled, double‐blind, multi‐dose study of SRT2104, a SIRT1 activator, in subjects with type 2 diabetes.” British journal of clinical pharmacology 78.1 (2014): 69-77.

[49]Cantó, Carles, and Johan Auwerx. “Caloric restriction, SIRT1 and longevity.” Trends in Endocrinology & Metabolism 20.7 (2009): 325-331.

[50]Kim, K‐S., et al. “Markedly attenuated acute and chronic pain responses in mice lacking adenylyl cyclase‐5.” Genes, Brain and Behavior 6.2 (2007): 120-127.

[51]Iwamoto, Tamio, et al. “Motor dysfunction in type 5 adenylyl cyclase-null mice.” Journal of Biological Chemistry 278.19 (2003): 16936-16940.

[52]Kim, Kyoung-Shim, et al. “Adenylyl cyclase type 5 (AC5) is an essential mediator of morphine action.” Proceedings of the National Academy of Sciences of the United States of America 103.10 (2006): 3908-3913.

[53]Spindler, Stephen R., Patricia L. Mote, and James M. Flegal. “Combined statin and angiotensin-converting enzyme (ACE) inhibitor treatment increases the lifespan of long-lived F1 male mice.” AGE 38.5-6 (2016): 379-391.

[54]Cargill, Shelley L., et al. “Age of ovary determines remaining life expectancy in old ovariectomized mice.” Aging cell 2.3 (2003): 185-190.

[55]Waalen, Jill, and Joel N. Buxbaum. “Is older colder or colder older? The association of age with body temperature in 18,630 individuals.” The Journals of Gerontology Series A: Biological Sciences and Medical Sciences 66.5 (2011): 487-492.

[56]Banks, Ruth, John R. Speakman, and Colin Selman. “Vitamin E supplementation and mammalian lifespan.” Molecular nutrition & food research 54.5 (2010): 719-725.

[57]Miller, Edgar R., et al. “Meta-analysis: high-dosage vitamin E supplementation may increase all-cause mortality.” Annals of internal medicine 142.1 (2005): 37-46.

[58]Willcox, Bradley J., et al. “FOXO3A genotype is strongly associated with human longevity.” Proceedings of the National Academy of Sciences 105.37 (2008): 13987-13992.

[59]Flachsbart, Friederike, et al. “Association of FOXO3A variation with human longevity confirmed in German centenarians.” Proceedings of the National Academy of Sciences 106.8 (2009): 2700-2705.

[60]Morris, Brian J., et al. “FOXO3: a major gene for human longevity-a mini-review.” Gerontology 61.6 (2015): 515-525.

[61]Greer, Eric L., et al. “The energy sensor AMP-activated protein kinase directly regulates the mammalian FOXO3 transcription factor.” Journal of Biological Chemistry 282.41 (2007): 30107-30119.

[62]Sato, Atsushi, et al. “Glioma‐Initiating Cell Elimination by Metformin Activation of FOXO3 via AMPK.” Stem cells translational medicine 1.11 (2012): 811-824.

[63]Li, Xiao-Nan, et al. “Activation of the AMPK-FOXO3 pathway reduces fatty acid–induced increase in intracellular reactive oxygen species by upregulating thioredoxin.” Diabetes 58.10 (2009): 2246-2257.

[64]Buhl, Esben S., et al. “Long-term AICAR administration reduces metabolic disturbances and lowers blood pressure in rats displaying features of the insulin resistance syndrome.” Diabetes 51.7 (2002): 2199-2206.

[65]Ruehl, W. W., et al. “Treatment with L-deprenyl prolongs life in elderly dogs.” Life sciences 61.11 (1997): 1037-1044.

[66]Waters, David J., et al. “Exploring mechanisms of sex differences in longevity: lifetime ovary exposure and exceptional longevity in dogs.” Aging Cell 8.6 (2009): 752-755.

[67]Greer, Kimberly A., Larry M. Hughes, and Michal M. Masternak. “Connecting serum IGF-1, body size, and age in the domestic dog.” Age 33.3 (2011): 475-483.

[68]Arteaga, Silvia, Adolfo Andrade-Cetto, and René Cárdenas. “Larrea tridentata (Creosote bush), an abundant plant of Mexican and US-American deserts and its metabolite nordihydroguaiaretic acid.” Journal of ethnopharmacology 98.3 (2005): 231-239.

Reply to Criticism on my EA Post

My previous post, “EA Has A Lying Problem”, received a lot of criticism, and I’d like to address some of it here.

I was very impressed by what I learned about EA discourse norms from preparing this post and responding to comments on it. I’m appreciating anew that this is a community where people really do the thing of responding directly to arguments, updating on evidence, and continuing to practice discourse instead of collapsing into verbal fights.  I’m going to try to follow that norm in this post.

Structurally, what I did in my previous post was

  • quote some EAs making comments on forums and Facebook
  • interpret what I think is the attitude behind those quotes
  • claim that the quotes show a pervasive problem in which the EA community doesn’t value honesty enough.

There are three possible points of failure to this argument:

  • The quotes don’t mean what I took them to mean
  • The views I claimed EAs hold are not actually bad
  • The quotes aren’t evidence of a broader problem in EA.

There’s also a possible prudential issue: that I may have, indeed, called attention to a real problem, but that my tone was too extreme or my language too sloppy, and that this is harmful.

I’m going to address each of these possibilities separately.

Possibility 1: The quotes don’t mean what I took them to mean

Case 1: Ben Todd’s Quotes on Criticism

I described Ben Todd as asking people to consult with EA orgs before criticizing them, and as heavily implying that it’s more useful for orgs to prioritize growth over engaging with the kind of highly critical people who are frequent commenters on EA debates.

I went on to claim that this underlying attitude is going to favor growth over course correction, and prioritize “movement-building” by gaining appeal among uncritical EA fans, while ignoring real concerns.

I said,

Essentially, this maps to a policy of “let’s not worry over-much about internally critiquing whether we’re going in the right direction; let’s just try to scale up, get a bunch of people to sign on with us, move more money, grow our influence.”  An uncharitable way of reading this is “c’mon, guys, our marketing doesn’t have to satisfy you, it’s for the marks!”  

This is a pretty large extrapolation from Todd’s actual comments, and I think I was putting words in his mouth that are much more extreme than anything he’d agree with. The quotes I pulled didn’t come close to proving that Todd actually wants to ignore criticism and pander to an uncritical audience.  It was wrong of me to give the impression that he’s deliberately pursuing a nefarious strategy.

And in the comments, he makes it clear that this wasn’t his intent and that he’s actually made a point of engaging with criticism:

Hi Sarah,

The 80,000 Hours career guide says what we think. That’s true even when it comes to issues that could make us look bad, such as our belief in the importance of the risks from artificial intelligence, or when are issues could be offputtingly complex, such as giving now vs. giving later and the pros and cons of earning to give. This is the main way we engage with users, and it’s honest.

As an organisation, we welcome criticism, and we post detailed updates on our progress, including our mistakes:

https://80000hours.org/about/credibility/evaluations/

https://80000hours.org/about/credibility/evaluations/mistakes/

I regret that my comments might have had the effect of discouraging important criticism.

My point was that public criticism has costs, which need to be weighed against the benefits of the criticism (whether or not you’re an act utilitarian). In extreme cases, organisations have been destroyed by memorable criticism that turned out to be false or unfounded. These costs, however, can often be mitigated with things like talking to the organisation first – this isn’t to seek permission, but to do things like check whether they’ve already written something on the topic, and whether your understanding of the issues is correct. For instance, GiveWell runs their charity reviews past the charity before posting, but that doesn’t mean their reports are always to the charity’s liking. I’d prefer a movement where people bear these considerations in mind as well, but it seems like they’re often ignored.

None of this is to deny that criticism is often extremely valuable.

I think this is plainly enough to show that Ben Todd is not anti-criticism. I’m also impressed that 80,000 Hours has a “mistakes page” in which they describe past failures, which is an unusual and especially praiseworthy sign of transparency in an organization.

Todd did, in his reply to my post, reiterate that he thinks criticism should face a high burden of proof because “organisations have been destroyed by memorable criticism that turned out to be false or unfounded.” I’m not sure this is a good policy; Ben Hoffman articulates some problems with it here.

But I was wrong to conflate this with an across-the-board opposition to criticism.  It’s probably fairer to say that Todd opposes adversarial criticism and prefers cooperative or friendly criticism: for example, he thinks critics should privately ask organizations to change their policies rather than publicly lambasting them for having bad policies.

I still think this is a mistake on his part, but when I framed it as “EA Leader says criticizing EA orgs is harmful to the movement”, I was exaggerating for effect, and I probably shouldn’t have done that.

Case 2: Robert Wiblin on Promises

I quoted Robert Wiblin on his interpretation of the Giving What We Can pledge, and interpreted Wiblin’s words to mean that he doesn’t think the pledge is morally binding.

I think this is pretty clear-cut and I interpreted Wiblin correctly.

The context there was that Alyssa Vance, in the original post, had said that many people might rationally choose not to take the pledge because unforeseen financial circumstances might make it inadvisable in future. She said that Wiblin had previously claimed that this was not a problem, because he didn’t view the pledge as binding on his future self:

“pledge taker Rob Wiblin said that, if he changed his mind about donating 10% every year being the best choice, he would simply un-take the pledge.”

Wiblin doesn’t think that “maybe I won’t be able to afford to give 10% of my income in future” is a good enough reason for people to choose not to pledge 10% of their lifetime income, because if they ever did become poor, they could just stop giving.

Some commenters claimed that Wiblin doesn’t have a cavalier attitude towards promises; he just thinks that in extreme cases it’s okay to break them.  By analogy, in Jewish ritual law, it’s permissible to break a commandment if it’s necessary to save a human life, but that doesn’t mean that the overall attitude to the commandments is casual.

However, I think it does imply a cavalier attitude towards promises to say that you shouldn’t hesitate to make them on the grounds that you might not want to keep them.  If you don’t think, before making a lifelong pledge, that people should think “hmm, am I prepared to make this big a commitment?” and in some cases answer “no”, then you clearly don’t think that the pledge is a particularly strong commitment.

Case 3: Robert Wiblin on Autism

Does Robert Wiblin actually mean it as a pejorative when he speculates that maybe the reason some people are especially hesitant to commit to the GWWC pledge is that they’re on the autism spectrum?

Some people (including the person he said it to, who is autistic) didn’t take it as a negative.  And, in principle, if we aren’t biased against disabled people, “autistic” should be a neutral descriptive word, not a pejorative.

But we do live in a society where people throw around “autistic” as an insult to refer to anybody who supposedly has poor social skills, so in context, Wiblin’s comment does have a pejorative connotation.

Moreover, Wiblin was using the accusation of autism as a reason to dismiss the concerns of people who are especially serious about keeping promises. It’s equivalent to saying “your beliefs are the result of a medical condition, so I don’t have to take them seriously.”  He’s medicalizing the beliefs of those who disagree with him.  Even if his opponents are autistic, if he respected them, he’d take their disagreement seriously.

Case 4: Jacy Reese on Evidence from Intuition

I quoted Jacy Reese responding to criticism about his cost-effectiveness estimates by saying that the evidence base in favor of leafleting includes his intuition and studies that are better described as “evidence against the effectiveness of leafleting.”

His and ACE’s defense of the leafleting studies as merely “weak evidence” for leafleting is a matter of public record in many places. He definitely believes this.

Does he really think that his intuition is evidence, or did he just use ambiguous wording? I don’t know, and I’d be willing to concede that this isn’t a big deal.

Possibility 2: The views I accused EAs of holding are not actually bad.

Case 1: Dishonesty for the greater good might sometimes be worthwhile.

A number of people in the comments to my previous post are making the argument that I need to weigh the harms of dishonest or misleading information against its benefits.

First of all, the fact that people are making these arguments at least partly belies the notion that all EAs oppose lying across the board; I’ll say more about the prevalence of these views in the community in the next section.

Holly Elmore:

What if, for the sake of argument, it *was* better to persuade easy marks to take the pledge and give life-saving donations than to persuade fewer people more gently and (as she perceives it) respectfully? How many lives is extra respect worth? She’s acting like this isn’t even an argument.

This is a more general problem I’ve had with Sarah’s writing and medical ethics in general– the fixation on meticulously informed consent as if it’s the paramount moral issue.

Gleb Tsipursky:

If you do not lie, that’s fine, but don’t pretend that you care about doing the most good, please. Just don’t. You care about being as transparent and honest as possible over doing the most good.

I’m including Gleb here, even though he’s been kicked out of the EA community, because he is saying the same things as Holly Elmore, who is a respected member of the community.  There may be more EAs out there sharing the same views.

So, cards on the table: I am not an act-utilitarian. I am a eudaimonistic virtue ethicist. What that means is that I believe:

  • The point of behaving ethically is to have a better life for yourself.
  • Dishonesty will predictably damage your life.
  • If you find yourself tempted to be dishonest because it seems like a good idea, you should instead trust that the general principle of “be honest” is more reliable than your guess that lying is a good idea in this particular instance.

(Does this apply to me and my lapses in honesty?  YOU BET.  Whenever it seems like a good idea at the time for me to deceive, I wind up suffering for it later.)

I also believe consent is really important.

I believe that giving money to charitable causes is a non-obligatory personal decision, while respecting consent to a high standard is not.

Are these significant values differences with many EAs? Yes, they are.

I wasn’t honest enough in my previous post about this, and I apologize for that. I should have owned my beliefs more candidly.

I also exaggerated for effect in my previous post, and that was misleading, and I apologize for that. Furthermore, in the comments, I claimed that I intended to exaggerate for effect; that was an emotional outburst and isn’t what I really believe. I don’t endorse dishonesty “for a good cause”, and on the occasions when I’ve gotten upset and yielded to temptation, it has always turned out to be a bad idea that came back to bite me.

I do think that even if you are a more conventional utilitarian, there are arguments in favor of being honest always and not just when the local benefits outweigh the costs.

Eliezer Yudkowsky and Paul Christiano have written about why utilitarians should still have integrity.

One way of looking at this is rule-utilitarianism: there are gains from being known to be reliably trustworthy.

Another way of looking at this is the comment about “financial bubbles” I made in my previous post.  If utilitarians take their best guess about what action is the Most Good, and inflate public perceptions of its goodness so that more people will take that action, and encourage the public to further inflate perceptions of its goodness, then errors in people’s judgments of the good will expand without bound.  A highly popular bad idea will end up dominating most mindshare and charity dollars. However, if utilitarians critique each other’s best guesses about which actions do the most good, then bad ideas will get shot down quickly, to make room for good ones.

Case 2: It’s better to check in with EA orgs before criticizing them

Ben Todd, and some people in the comments, have argued that it’s better to run critical blog posts by EA orgs before making those criticisms public.  This rhymes a little with the traditional advice to admonish friends in private but praise them in public, in order to avoid causing them to lose face.  The idea seems to be that public criticism will be less accurate and also that it will draw negative attention to the movement.

Now, some of the issues I criticized in my blog post have also been brought up by others, both publicly and privately, which is where I first heard about them. But I don’t agree with the basic premise in the first place.

It’s true that journalists check quotes with sources, and usually get quotes from the organizations they’re reporting on. But bloggers are not journalists.  A blog post is more like engaging in an extended conversation than reporting the news. Some of that conversation is with EA orgs and their leaders: this post, and the discussions it pulls from, respond to writings of various degrees of “officialness” coming from EA orgs.  I think the public record of discussion is enough of a “source” for this purpose; we know what was said, by whom, and when, and there’s no ambiguity about whether the comments were really made.

What we don’t necessarily know without further discussion is what leaders of EA orgs mean, and what they say behind closed doors. It may be that their quotes don’t represent their intent.  I think this is the gist of what people saying “talk to the orgs in private” mean — if we talked to them, we’d understand that they’re already working on the problem, or that they don’t really have the problematic views they seem to have, etc.

However, I think this is an unfair standard.  “Talk to us first to be sure you’re getting the real story” is extra work for both the blogger and the EA org (do you really have to check in with GWWC every time you discuss the pledge?)

And it’s trying to massage the discussion away from sharp, adversarial critics. A journalist who got his stories about politics almost entirely from White House sources, and relied very heavily on his good relationship with the White House, would have a conflict of interest and would probably produce biased reporting. You don’t want all the discussion of EA to be coming from people who are cozy with EA orgs.  You don’t necessarily want all discussion to be influenced by “well, I talked to this EA leader, and I’m confident his heart’s in the right place.”

There’s something valuable about having a conversation going on in public. It’s useful for transparency and it’s useful for common knowledge. EA orgs like GiveWell and 80K are unusually transparent already; they’re engaging in open dialogue with their readers and donors, rather than funneling all communication through a narrow, PR-focused information stream.  That’s a remarkable and commendable choice.

But it’s also a risky one; because they’re talking a lot, they can incur reputational damage if they’re quoted unfavorably (as I quoted them in my previous post).  So they’re asking us, the EA and EA-adjacent community, to do some work in guarding their reputation.

I think this is not necessarily a fair thing to expect from everyone discussing an EA topic. Some people are skeptical of EA as a whole, and thus don’t have a reason to protect its reputation. Some people, like Alyssa in her post on the GWWC pledge, aren’t even accusing an org of doing anything wrong, just discussing a topic of interest to EAs like “who should and shouldn’t take the pledge?” She couldn’t reasonably have foreseen that this would be perceived as an attack on GWWC’s reputation.

I think, if an EA org says or does something in public that people find problematic, they should expect to be criticized in public, and not necessarily get a chance to check over the criticism first.

Possibility 3: The quotes I pulled are not strong evidence of a big problem in EA.

I picked quotes that I and a few friends had noticed offhand as unusually bad.  So, obviously, it’s not the same thing as a survey of EA-wide attitudes.

On the other hand, “picking egregious examples” is a perfectly fine way to suggest that there may be a broader trend. If you know that a few people in a Presidential administration have made racist remarks, for instance, it’s not out of line to suggest that the administration has a racism problem.

So, I stand behind cherry-picked examples as a way to highlight trends, in the context of “something suspicious is going on here, maybe we should pay attention to it.”

The fact that people are, in response to my post, defending the practice of lying for the greater good is also evidence that these aren’t entirely isolated cases.

Of course, it’s possible that the quotes I picked aren’t egregiously bad, but I think I’ve covered my views on that in the previous two sections.

I think that, given the Intentional Insights scandal, it’s reasonable to ask the question “was this just one guy, or is the EA community producing a climate that shelters bullshit artists?”  And I think there’s enough evidence to be suspicious that the latter is true.

Possibility 4: My point stands, but my tactics were bad

I did not handle this post like a pro.

I used the title “EA Has a Lying Problem”, which is inflammatory, and also (worse, in my view) not quite the right word. None of the things I quoted were lies. They were defenses of dishonest behavior, or, in Ben Todd’s case, what I thought was a bias against transparency and open debate. I probably should have called it “dishonesty” rather than “lying.”

In general, I think I was inflammatory in a careless rather than a pointed way. I do think it’s important to make bad things look bad, but I didn’t take care to avoid discrediting a vast swath of innocents, and that was wrong of me.

Then, I got emotional in the comments section, and expressed an attitude of “I’m a bad person who does bad things on purpose”, which is rude, untrue, and not a good look on me.

I definitely think these were mistakes on my part.

It’s also been pointed out to me that I could have raised my criticisms privately, within EA orgs, rather than going public with a potentially reputation-damaging post (damaging to my own reputation or to the EA movement.)

I don’t think that would have been a good idea in my case.

When it comes to my own reputation, for better or for worse I’m a little reckless.  I don’t have a great deal of ability to consciously control how I’m perceived — things tend to slip out impulsively — so I try not to worry about it too much.  I’ll live with how I’m judged.

When it comes to EA’s reputation, I think it’s possible I should have been more careful. Some of the organizations I’ve criticized have done really good work promoting causes I care about.  I should have thought of that, and perhaps worded my post in a way that produced less risk of scandal.

On the other hand, I never had a close relationship with any EA orgs, and I don’t think internal critique would have been a useful avenue for me.

In general, I think I want to sanity-check my accusatory posts with more beta readers in future.  My blog is supposed to represent a pretty close match to my true beliefs, not just my more emotional impulses, and I should be more circumspect before posting stuff.

EA Has A Lying Problem

I am currently writing up a response to criticism of this post and will have it up shortly.

Why hold EA to a high standard?

“Movement drama” seems to be depressingly common  — whenever people set out to change the world, they inevitably pick fights with each other, usually over trivialities.  What’s the point, beyond mere disagreeableness, of pointing out problems in the Effective Altruism movement? I’m about to start some movement drama, and so I think it behooves me to explain why it’s worth paying attention to this time.

Effective Altruism is a movement that claims that we can improve the world more effectively with empirical research and explicit reasoning. The slogan of the Center for Effective Altruism is “Doing Good Better.”

This is a moral claim (they say they are doing good) and a claim of excellence (they say that they offer ways to do good better.)

EA is also a proselytizing movement. It tries to raise money, for EA organizations as well as for charities; it also tries to “build the movement”, increase attendance at events like the EA Global conference, get positive press, and otherwise get attention for its ideas.

The Atlantic called EA “generosity for nerds”, and I think that’s a fair assessment. The “target market” for EA is people like me and my friends: young, educated, idealistic, Silicon Valley-ish.

The origins of EA are in academic philosophy. Peter Singer and Toby Ord were the first to promote the idea that people have an obligation to help the developing world and reduce animal suffering, on utilitarian grounds.  The leaders of the Center for Effective Altruism, Giving What We Can, 80,000 Hours, The Life You Can Save, and related EA orgs, are drawn heavily from philosophy professors and philosophy majors.

What this means, first of all, is that we can judge EA activism by its own standards. These people are philosophers who claim to be using objective methods to assess how to do good; so it’s fair to ask “Are they being objective? Are they doing good? Is their philosophy sound?”  It’s admittedly hard for young organizations to prove they have good track records, and that shouldn’t count against them; but honesty, transparency, and sound arguments are reasonable to expect.

Second of all, it means that EA matters.  I believe that individuals and small groups who produce original thinking about big-picture issues have always had outsize historical importance. Philosophers and theorists who capture mindshare have long-term influence.  Young people with unusual access to power and interest in “changing the world” stand a good chance of affecting what happens in the coming decades.

So it matters if there are problems in EA. If kids at Stanford or Harvard or Oxford are being misled or influenced for the worse, that’s a real problem. They actually are, as the cliche goes, “tomorrow’s leaders.” And EA really seems to be prominent among the ideologies competing for the minds of the most elite and idealistic young people.  If it’s fundamentally misguided or vulnerable to malfeasance, I think that’s worth talking about.

Lying for the greater good

Imagine that you are a perfect act-utilitarian. You want to produce the greatest good for the greatest number, and, magically, you know exactly how to do it.

Wouldn’t a pretty plausible course of action be “accumulate as much power and resources as possible, so you can do even more good”?

Taken to an extreme, this would look indistinguishable from the actions of someone who just wants to acquire as much power as possible for its own sake.  Actually building Utopia is always something to get around to later; for now you have to build up your strength, so that the future utopia will be even better.

Lying and hurting people in order to gain power can never be bad, because you are always aiming at the greater good down the road, so anything that makes you more powerful should promote the Good, right?

Obviously, this is a terrible failure mode. There’s a reason J.K. Rowling gave her Hitler-like figure Grindelwald the slogan “For the Greater Good.”  Ordinary, children’s-story morality tells us that when somebody is lying or hurting people “for the greater good”, he’s a bad guy.

A number of prominent EA figures have made statements that seem to endorse lying “for the greater good.”  Sometimes these statements are arguably reasonable, taken in isolation. But put together, there starts to be a pattern.  It’s not quite storybook-villain-level, but it has something of the same flavor.

There are people who are comfortable sacrificing honesty in order to promote EA’s brand.  After all, if EA becomes more popular, more people will give to charity, and that charity will do good, and that good may outweigh whatever harm comes from deception.

The problem with this reasoning should be obvious. The argument would work just as well if EA did no good at all, and only claimed to do good.

Arbitrary or unreliable claims of moral superiority function like bubbles in economic markets. If you never check the value of a stock against some kind of ground-truth reality, if everyone only looks at its current price and buys or sells based on that, we’ll see prices being inflated based on no reason at all.  If you don’t insist on honesty in people’s claims of “for the greater good”, you’ll get hijacked into helping people who aren’t serving the greater good at all.

I think it’s worth being suspicious of anybody who says “actually, lying is a good idea” and has a bunch of intelligence and power and moral suasion on their side.

It’s a problem if a movement is attracting smart, idealistic, privileged young people who want to “do good better” and teaching them that the way to do the most good is to lie.  It’s arguably even more of a problem than, say, lobbyists taking young Ivy League grads under their wing and teaching them to practice lucrative corruption.  The lobbyists are appealing to the most venal among the youthful elite.  The nominally-idealistic movement is appealing to the most ethical, and corrupting them.

The quotes that follow are going to look almost reasonable. I expect some people to argue that they are in fact reasonable and innocent and I’ve misrepresented them. That’s possible, and I’m going to try to make a case that there’s actually a problem here; but I’d also like to invite my readers to take the paranoid perspective for a moment. If you imagine mistrusting these nice, clean-cut, well-spoken young men, or mistrusting Something that speaks through them, could you see how these quotes would seem less reasonable?

Criticizing EA orgs is harmful to the movement

In response to an essay on the EA forums criticizing the Giving What We Can pledge (a promise to give 10% of one’s income to charity), Ben Todd, the CEO of 80,000 Hours, said:

Topics like this are sensitive and complex, so it can take a long time to write them up well. It’s easy to get misunderstood or make the organisation look bad.

At the same time, the benefits might be slight, because (i) it doesn’t directly contribute to growth (if users have common questions, then add them to the FAQ and other intro materials) or (ii) fundraising (if donors have questions, speak to them directly).

Remember that GWWC is getting almost 100 pledges per month atm, and very few come from places like this forum. More broadly, there’s a huge number of pressing priorities. There’s lots of other issues GWWC could write about but hasn’t had time to as well.

If you’re wondering whether GWWC has thought about these kinds of questions, you can also just ask them. They’ll probably respond, and if they get a lot of requests to answer the same thing, they’ll probably write about it publicly.

With figuring out strategy (e.g. whether to spend more time on communication with the EA community or something else) GWWC writes fairly lengthy public reviews every 6-12 months.

He also said:

None of these criticisms are new to me. I think all of them have been discussed in some depth within CEA.

This makes me wonder if the problem is actually a failure of communication. Unfortunately, issues like this are costly to communicate outside of the organisation, and it often doesn’t seem like the best use of time, but maybe that’s wrong.

Given this, I think it also makes sense to run critical posts past the organisation concerned before posting. They might have already dealt with the issue, or have plans to do so, in which posting the criticism is significantly less valuable (because it incurs similar costs to the org but with fewer benefits). It also helps the community avoid re-treading the same ground.

In other words: the CEO of 80,000 Hours thinks that people should “run critical posts past the organization concerned before posting”, but also thinks that it might not be worth it for GWWC to address such criticisms because they don’t directly contribute to growth or fundraising, and addressing criticisms publicly might “make the organization look bad.”

This cashes out to saying “we don’t want to respond to your criticism, and we also would prefer you didn’t make it in public.”

It’s normal for organizations not to respond to every criticism — the Coca-Cola company doesn’t have to respond to every internet comment that says Coke is unhealthy — but Coca-Cola’s CEO doesn’t go around shushing critics either.

Todd seems to be saying that the target market of GWWC is not readers of the EA forum or similar communities, which is why answering criticism is not a priority. (“Remember that GWWC is getting almost 100 pledges per month atm, and very few come from places like this forum.”) Now, “places like this forum” seems to mean communities where people identify as “effective altruists”, geek out about the details of EA, spend a lot of time researching charities and debating EA strategy, etc.  Places where people might question, in detail, whether pledging 10% of one’s income to charity for life is actually a good idea or not.  Todd seems to be implying that answering the criticisms of these people is not useful — what’s useful is encouraging outsiders to donate more to charity.

Essentially, this maps to a policy of “let’s not worry over-much about internally critiquing whether we’re going in the right direction; let’s just try to scale up, get a bunch of people to sign on with us, move more money, grow our influence.”  An uncharitable way of reading this is “c’mon, guys, our marketing doesn’t have to satisfy you, it’s for the marks!”  Jane Q. Public doesn’t think about details, she doesn’t nitpick, she’s not a nerd; we tell her about the plight of the poor, she feels moved, and she gives.  That’s who we want to appeal to, right?

The problem is that it’s not quite fair to Jane Q. Public to treat her as a patsy rather than as a peer.

You’ll see echoes of this attitude come up frequently in EA contexts — the insinuation that criticism is an inconvenience that gets in the way of movement-building, and movement-building means obtaining the participation of the uncritical.

In responding to a criticism of a post on CEA fundraising, Ben Todd said:

I think we should hold criticism to a higher standard, because criticism has more costs. Negative things are much more memorable than positive things. People often remember criticism, perhaps just on a gut level, even if it’s shown to be wrong later in the thread.

This misses the obvious point that criticism of CEA has costs to CEA, but possibly has benefits to other people if CEA really has flaws.  It’s a sort of “EA, c’est moi” narcissism: what’s good for CEA is what’s good for the Movement, which is what’s good for the world.

Keeping promises is a symptom of autism

In the same thread criticizing the Giving What We Can pledge, Robert Wiblin, the director of research at 80,000 Hours, said:

Firstly: I think we should use the interpretation of the pledge that produces the best outcome. The use GWWC and I apply is completely mainstream use of the term pledge (e.g. you ‘pledge’ to stay with the person you marry, but people nonetheless get divorced if they think the marriage is too harmful to continue).

A looser interpretation is better because more people will be willing to participate, and each person gain from a smaller and more reasonable push towards moral behaviour. We certainly don’t want people to be compelled to do things they think are morally wrong – that doesn’t achieve an EA goal. That would be bad. Indeed it’s the original complaint here.

Secondly: An “evil future you” who didn’t care about the good you can do through donations probably wouldn’t care much about keeping promises made by a different kind of person in the past either, I wouldn’t think.

Thirdly: The coordination thing doesn’t really matter here because you are only ‘cooperating’ with your future self, who can’t really reject you because they don’t exist yet (unlike another person who is deciding whether to help you).

One thing I suspect is going on here is that people on the autism spectrum interpret all kinds of promises to be more binding than neurotypical people do (e.g. https://www.reddit.com/r/aspergers/comments/46zo2s/promises/). I don’t know if that applies to any individual here specifically, but I think it explains how some of us have very different intuitions. But I expect we will be able to do more good if we apply the neurotypical intuitions that most people share.

Of course if you want to make it fully binding for yourself, then nobody can really stop you.

In other words: Rob Wiblin thinks that promising to give 10% of income to charity for the rest of your life, which the Giving What We Can website describes as “a promise, or oath, to be made seriously and with every expectation of keeping it”, does not literally mean committing to actually do that. It means that you can quit any time you feel like it.

He thinks that you should interpret words with whatever interpretation will “do the most good”, instead of as, you know, what the words actually mean.

If you respond to a proposed pledge with “hm, I don’t know, that’s a really big commitment”, you must just be a silly autistic who doesn’t understand that you could just break your commitment when it gets tough to follow!  The movement doesn’t depend on weirdos like you, it needs to market to normal people!

I don’t know whether to be more frustrated with the ableism or the pathologization of integrity.

Once again, there is the insinuation that the growth of EA depends on manipulating the public — acquiring the dollars of the “normal” people who don’t think too much and can’t keep promises.

Jane Q. Public is stupid, impulsive, and easily led.  That’s why we want her.

“Because I Said So” is evidence

Jacy Reese, a prominent animal-rights-focused EA, responded to some criticism of Animal Charity Evaluators’ top charities on Facebook as follows:

Just to note, we (or at least I) agree there are serious issues with our leafleting estimate and hope to improve it in the near future. Unfortunately, there are lots of things that fit into this category and we just don’t have enough research staff time for all of them.

I spent a good part of 2016 helping make significant improvements to our outdated online ads quantitative estimate, which now aggregates evidence from intuition, experiments, non-animal-advocacy social science, and veg pledge rates to come up with the “veg-years per click” estimate. I’d love to see us do something similar with the leafleting estimate, and strongly believe we should keep trying, rather than throwing our hands in the air and declaring weak evidence is “no evidence.”

For context here, the “leafleting estimate” refers to the rate at which pro-vegan leaflets cause people to eat less meat (and hence the impact of leafleting advocacy at reducing animal suffering.)  The studies ACE used to justify the effectiveness of leafleting actually showed that leafleting was ineffective: an uncontrolled study of 486 college students shown a pro-vegetarianism leaflet found that only one student (0.2%) went vegetarian, while a controlled study conducted by ACE itself found that consumption of animal products was no lower in the leafleted group than the control group.  The criticisms of ACE’s leafleting estimate were not merely that it was flawed, but that it literally fabricated numbers based on a “hypothetical.”  ACE publishes “top charities” that it claims are effective at saving animal lives; the leafleting effectiveness estimates are used to justify why people should give money to certain veganism-activist charities.  A made-up reason to support a charity isn’t “weak evidence”, it’s lying.

In that context, it’s exceptionally shocking to hear Reese talking about “evidence from intuition,” which is…not evidence.

Reese continues:

Intuition is certainly evidence in this sense. If I have to make quick decisions, like in the middle of a conversation where I’m trying to inspire someone to help animals, would I be more successful on average if I flipped a coin for my responses or went with my intuition?

But that’s not the point.  Obviously, my intuition is valuable to me in making decisions on the fly.  But my intuition is not a reason why anybody else should follow my lead. For that, I’d have to give, y’know, reasons.

This is what the word “objectivity” means. It is the ability to share data between people, so that each can independently judge for themselves.

Reese is making the same kind of narcissistic fallacy we saw before. Reese is forgetting that his readers are not Jacy Reese and therefore “Jacy Reese thinks so” is not a compelling reason to them.  Or perhaps he’s hoping that his donors can be “inspired” to give money to organizations run by his friends, simply because he tells them to.

In a Facebook thread on Harrison Nathan’s criticism of leafleting estimates, Jacy Reese said:

I have lots of demands on my time, and like others have said, engaging with you seems particularly unlikely to help us move forward as a movement and do more for animals.

Nobody is obligated to spend time replying to anyone else, and it may be natural to get a little miffed at criticism, but I’d like to point out the weirdness of saying that criticism doesn’t “help us move forward as a movement.”  If a passenger in your car says “hey, you just missed your exit”, you don’t complain that he’s keeping you from moving forward. That’s the whole point. You might be moving in the wrong direction.

In the midst of this debate somebody commented,

“Sheesh, so much grenade throwing over a list of charities!  I think it’s a great list!”

This is a nice, Jane Q. Public, kind of sentiment.  Why, indeed, should we argue so much about charities? Giving to charity is a nice thing to do.  Why can’t we all just get along and promote charitable giving?

The point is, though — it’s giving to a good cause that’s a praiseworthy thing to do.  Giving to an arbitrary cause is not a good thing to do.

The whole point of the “effective” in “Effective Altruism” is that we, ideally, care about whether our actions actually have good consequences or not. We’d like to help animals or the sick or the poor, in real life. You don’t promote good outcomes if you oppose objectivity.

So what? The issue of exploitative marketing

These are informal comments by EAs, not official pronouncements.  And the majority of discussion of EA topics I’ve seen is respectful, thoughtful, and open to criticism.  So what’s the big deal if some EAs say problematic things?

There are some genuine scandals within the EA movement that pertain to deceptive marketing.  Intentional Insights, a supposed “EA” organization led by history professor Gleb Tsipursky, used astroturfing, paid for likes and positive comments, made false claims about his social media popularity, falsely claimed affiliation with other EA organizations, and may have required his employees to “volunteer” large amounts of unpaid labor for him.

To their credit, CEA repudiated Intentional Insights; Will MacAskill’s excellent post on the topic argued that EA needs to clarify shared values and guard against people co-opting the EA brand to do unethical things.  One of the issues he brought up was

People engaging in or publicly endorsing ‘ends justify the means’ reasoning (for example involving plagiarism or dishonesty)

which is a perfect description of Tsipursky’s behavior.

I would argue that the problem goes beyond Tsipursky.  ACE’s claims about leafleting, and the way ACE’s advocates respond to criticism about it, are very plausibly examples of dishonesty defended with “ends justify the means” rhetoric.

More subtly, the most central effective altruism organizations and the custodians of the “Effective Altruism” brand are CEA and its offshoots (80,000 Hours and Giving What We Can), which are primarily focused on movement-building. And sometimes the way they do movement-building risks promoting an exploitative rather than cooperative relationship with the public.

What do I mean by that?

When you communicate cooperatively with a peer, you give them “news they can use.”  Cooperative advertising is a benefit to the consumer — if I didn’t know that there are salons in my neighborhood that specialize in cutting curly hair, then you, as the salon, are helping me by informing me about your services. If you argue cooperatively in favor of an action, you are telling your peer “hey, you might succeed better at your goals if you did such-and-such,” which is helpful information. Even making a request can be cooperative; if you care about me, you might want to know how best to make me happy, so when I express my preferences, I’m offering you helpful information.

When you communicate exploitatively with someone, you’re trying to gain from their weaknesses. Some of the sleazier forms of advertising are good examples of exploitation; if you make it very difficult to unsubscribe from your service, or make spammy websites whose addresses are misspellings of common website names, or make the “buy” button large and the “no thanks” button tiny, you’re trying to get money out of people’s forgetfulness or clumsiness rather than their actual interest in your product.  If you back a woman into an enclosed space and try to kiss her, you’re trying to get sexual favors as a result of her physical immobility rather than her actual willingness.

Exploitativeness is treating someone like a mark; cooperativeness is treating them like a friend.

A remarkable amount of EA discourse is framed cooperatively.  It’s about helping each other figure out how best to do good.  That’s one of the things I find most impressive about the EA movement — compared to other ideologies and movements, it’s unusually friendly, exploratory, and open to critical thinking.

However, if there are signs that EA orgs, as they grow and professionalize, are deliberately targeting growth among less-critical, less-intellectually-engaged, lower-integrity donors, while being dismissive towards intelligent and serious critics, which I think some of the discussions I’ve quoted on the GWWC pledge suggest, then it makes me worry that they’re trying to get money out of people’s weaknesses rather than gaining from their strengths.

Intentional Insights used the traditional tactics of scammy, lowest-common-denominator marketing. To a sophisticated reader, their site would seem lame, even if you didn’t know about their ethical lapses. It’s buzzwordy, clickbaity, and unoriginal.  And this isn’t an accident, any more than it’s an accident that spam emails have poor grammar. People who are fussy about quality aren’t the target market for exploitative marketing. The target market for exploitative marketing is and always has been the exceptionally unsophisticated.  Old people who don’t know how to use the internet; people too disorganized to cancel their subscriptions; people too impulsive to resist clicking on big red buttons; sometimes even literal bots.

The opposite approach, if you don’t want to drift towards a pattern of exploitative marketing, is to target people who seek out hard-to-fake signals of quality.  In EA, this would mean paying attention to people who have high standards in ethics and accuracy, and treating them as the core market, rather than succumbing to the temptation to farm metrics of engagement from whomever it’s easiest to recruit in the short-term.

Using “number of people who sign the GWWC pledge” as a metric of engagement in EA is nowhere near as shady as paying for Facebook likes, but I think there’s a similar flavor of exploitability between them.  You don’t want to be measuring how good you are at “doing good” by counting how many people make a symbolic or trivial gesture.  (And the GWWC pledge isn’t trivial or symbolic for most people…but it might become so if people keep insisting it’s not meant as a real promise.)

EAs can fight the forces of sleaze by staying cooperative — engaging with those who make valid criticisms, refusing the temptation to make strong but misleading advertising claims, respecting rather than denigrating people of high integrity, and generally talking to the public like we’re reasonable people.

CORRECTION

A previous version of this post used the name and linked to a Facebook comment by a volunteer member of an EA organization. He isn’t an official employee of any EA organization, and his views are his own, so he viewed this as an invasion of his privacy, and he’s right. I’ve retracted his name and the link.

Transcranial Direct Current Stimulation

Epistemic status: rough-draft, I wouldn’t be surprised if my conclusions reversed

Transcranial direct current stimulation (tDCS) consists of a pair of sponge electrodes on the head, through which a constant current is passed, at about 0.029-0.08 mA per square centimeter. Locations vary based on the intended effect of the treatment.  Extending treatment is usually done by prolonging duration rather than increasing intensity, as higher currents cause more cutaneous pain. When done correctly, the stimulation is painless, and therefore can be compared to sham stimulation as a control.[1]
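
The density range quoted above is just applied current divided by electrode contact area. A minimal sketch of that arithmetic, assuming currents of 1 to 2 mA and sponge electrodes of 25 to 35 square centimeters (the electrode sizes are an illustrative assumption, not something stated in this post):

```python
def current_density(current_ma: float, electrode_area_cm2: float) -> float:
    """Return current density in mA per square centimeter."""
    if electrode_area_cm2 <= 0:
        raise ValueError("electrode area must be positive")
    return current_ma / electrode_area_cm2


# Assumed, illustrative values: the electrode areas are not given in the text.
low = current_density(1.0, 35.0)   # weakest current over the largest sponge
high = current_density(2.0, 25.0)  # strongest current over the smallest sponge

print(f"{low:.3f} to {high:.3f} mA/cm^2")  # prints "0.029 to 0.080 mA/cm^2"
```

The same current through a smaller electrode gives a higher density, which is one reason electrode size matters as much as the mA setting.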

Bottom lines: there are some serious methodological flaws in tDCS studies.  “Sham” stimulation isn’t a perfect control, so some significant proportion of the effect may be placebo. And there’s quite significant variation in how much a given application of current increases the evoked potential in the brain.  Also, almost all the studies are quite small.

Given that, though, the effect sizes on working memory are quite good: comparable to or better than the best nootropics (caffeine, modafinil, and amphetamine).

As a treatment for depression, tDCS looks less impressive; aggregating the best-quality studies gives no net effect compared to sham stimulation.

As a treatment for chronic pain, tDCS looks quite good, though there aren’t very many studies.

Cognition

A study of 15 healthy females found a slight improvement on a working memory task from anodal stimulation to the DLPFC, but not from cathodal stimulation of the DLPFC, stimulation of M1, or sham stimulation.  Cohen’s d is 0.66. [2]
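
Cohen’s d, the effect-size measure quoted throughout this section, is the difference between two group means divided by their pooled standard deviation. A minimal sketch of that textbook formula, using made-up scores (the numbers are hypothetical, not data from any study cited here, and individual papers may have computed d differently, for example in within-subject designs):

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(treated, control):
    """Cohen's d for two independent groups, using the pooled sample SD."""
    n1, n2 = len(treated), len(control)
    # Pooled variance weights each group's sample variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(treated) + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    return (mean(treated) - mean(control)) / sqrt(pooled_var)


# Hypothetical task scores, for illustration only.
treated = [8, 9, 7, 10, 9]
control = [7, 8, 6, 8, 7]

d = cohens_d(treated, control)
print(f"Cohen's d = {d:.2f}")  # prints "Cohen's d = 1.40"
```

By the usual rule of thumb, d around 0.2 is small, 0.5 is medium, and 0.8 or more is large, so values of 3 or 4 from a small study deserve some skepticism.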

A study of 18 patients with Parkinson’s given 1-2 mA of tDCS to the DLPFC for 20 min found a 20% increase in correct answers on a 3-back task compared to sham stimulation at 2 mA.  Stimulation with 1 mA improved accuracy by only 5%. Stimulation of M1 produced a significant improvement in reaction time but not accuracy. Cohen’s d is 3.5.[3]

A study of 32 patients given sham, anodal, or cathodal stimulation of the DLPFC or M1 found that accuracy on a word-memorization task was significantly better with anodal DLPFC stimulation than sham (88% correct vs. 80% correct), while cathodal stimulation was worse than sham.  Sham and M1 stimulation were similar.  Cohen’s d was 3.5.[7]

18 subjects given a verbal-associative task with anodal DLPFC tDCS vs sham or cathodal tDCS had significantly improved mean scores (9 vs. 7 out of 12 correct). There was no effect on verbal fluency scores (a test of how many unique words one can produce in a short timespan).  Cohen’s d was 0.8.[8]

12 patients given a 2-back task with sham, anodal DLPFC tDCS, or transcranial random noise stimulation showed a significant improvement in speed but not accuracy for anodal tDCS vs. sham.  Cohen’s d was 0 for accuracy, 0.36 for speed.[9]

10 Alzheimer’s patients treated with tDCS on the DLPFC and left temporal cortex gave significantly more correct responses with tDCS vs sham on a memorization task (30 vs. 35 correct responses out of 55) but showed no improvement in Stroop or digit span tests.  Cohen’s d was 1.[12]

16 Parkinson’s patients given tDCS to the DLPFC significantly improved phonemic verbal fluency relative to sham and TPC stimulation (p < 0.002) but did not improve semantic verbal fluency.[17]

In a study of 12 healthy subjects given a naming task, reaction times were decreased with anodal tDCS to the DLPFC and increased with cathodal tDCS to the DLPFC.[18]

15 healthy subjects given a 3-back working memory task with anodal tDCS to the DLPFC showed significantly improved accuracy with tDCS vs sham (80% correct vs 69% correct).  Cohen’s d of 0.87.[18]

10 stroke patients given a 2-back working memory task, treated with anodal DLPFC tDCS or sham, showed significant improvement in accuracy in the anodal but not the sham group. Cohen’s d of 2.4.[19]

28 patients with major depression given a 2-back task with tDCS or sham on the DLPFC showed a significant improvement in accuracy with the active version vs. sham: 58% vs 42% correct, p = 0.04, Cohen’s d about 4.[20]

58 healthy subjects given working memory training had an effect size of DLPFC tDCS vs. sham of 1.5 on digit span (p = 0.025), 1.35 for Stroop accuracy, 1.3 on the CVLT, and no effect on Raven’s.[21]

30 healthy older adults were given sham or real anodal tDCS to the left DLPFC and given a 3-back test; there was no significant effect of stimulation on working memory performance.[24]

37 patients with temporal lobe epilepsy had no improvement in working or episodic memory from anodal tDCS to the left DLPFC.[25]

Mean Cohen’s d for working memory accuracy, weighted by sample size: 1.5.
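
A sketch of how a sample-size-weighted mean like this can be computed. The inclusion list is an assumption: it takes every study above that reports an accuracy-type effect size (using the digit-span figure for [21]) and counts the two null results, [24] and [25], as d = 0. The post does not say exactly which studies went into the 1.5 figure.

```python
# (sample size, Cohen's d for accuracy), copied from the study summaries above.
studies = [
    (15, 0.66),  # [2]
    (18, 3.5),   # [3]
    (32, 3.5),   # [7]
    (18, 0.8),   # [8]
    (12, 0.0),   # [9]
    (10, 1.0),   # [12]
    (15, 0.87),  # [18]
    (10, 2.4),   # [19]
    (28, 4.0),   # [20]
    (58, 1.5),   # [21], digit span
    (30, 0.0),   # [24], null result, assumed d = 0
    (37, 0.0),   # [25], null result, assumed d = 0
]


def weighted_mean_d(studies):
    """Mean effect size with each study weighted by its sample size."""
    total_n = sum(n for n, _ in studies)
    return sum(n * d for n, d in studies) / total_n


print(round(weighted_mean_d(studies), 2))  # prints 1.57
```

Under these assumptions the result lands close to the 1.5 quoted above. Dropping the two null studies raises it to about 2.06, which shows how sensitive the figure is to which studies are counted.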

A meta-analysis of 16 studies of anodal DLPFC tDCS found a mean effect size of 0.14 for accuracy and 0.15 for reaction time.[22]

I’m not certain why I’m getting such different numbers, except that my “review” seems to have included different studies than the meta-analysis did.  If you simply averaged the two accuracy figures, (1.5 + 0.14) / 2, you’d still get a mean effect size of about 0.8, which corresponds to a strong effect.

Speech

10 aphasic stroke patients treated with anodal tDCS over Wernicke’s area vs. sham: significantly improved accuracy on a picture-naming task (40% vs 20% correct before training, and 70% vs 50% after training.)  Anodal tDCS also improved mean reaction time (1.8 sec vs 2.5 sec.)  The improvement persisted 3 weeks after treatment.[6]

In 10 healthy subjects, anodal tDCS over Broca’s area vs sham increased verbal fluency: mean number of words was 22 vs. 16, and mean number of syllables was 15 vs. 14.  There was no effect when the tDCS was switched to the right-hemisphere analogue of Broca’s area.[10]

Pain

A study of 17 patients with central pain due to traumatic spinal cord injury, given 2 mA of tDCS to the motor cortex M1 or sham tDCS, found a significant improvement in pain scores, from a 7 (out of 9) to a 4.  The effects of consecutive sessions were cumulative.  There was no significant effect of treatment on anxiety or cognitive function.

32 female patients with fibromyalgia were treated with sham tDCS, tDCS of M1, or tDCS of the DLPFC.  M1 stimulation worked, sham and DLPFC did not. On a subjective improvement scale (where 2 is “much improvement”, 3 is “minimal improvement”, and 4 is “no change”), the group treated with M1 tDCS was at 2.5 and the sham group was at 3.5; the DLPFC group was at 3.  This was 2 mA, 20 min/day, for 5 days.[4]

41 female patients with fibromyalgia treated with tDCS on M1, DLPFC, or sham found that M1 stimulation significantly improved pain scores compared to DLPFC or sham: from about a 6 (which was baseline) to a 4.  No significant effect on depression scores.[11]

A meta-analysis of tDCS for chronic pain found a pooled effect size of 2.29 on pain symptoms.[23]

Depression

A meta-analysis of the use of tDCS in depression (directed to the DLPFC) found that the mean effect on depressive symptoms was significant: a Hedges’ g score of 0.743, significant at a p-value of 0.006. There’s a bit of a bias in the data: the Fregni and Boggio labs had significantly larger effect sizes than the other labs, and there was significant heterogeneity in results. Only a minority of patients (10-30%) were responders.  The average reduction in symptom severity was about 30%.[5]

Another meta-analysis of 6 RCTs of tDCS in depression found no significant effect of tDCS vs. sham on response rates or remission rates for depression.[13]

Blinding issues

There are more frequent reports of itching and burning with real than with sham tDCS, suggesting that blinding may not be sufficient.[14]  Participants are able to guess more accurately than chance whether they are in the active or sham treatment.[15]
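To make “more accurately than chance” concrete: if blinding worked, each participant’s guess about active vs. sham would be a coin flip, so you can ask how likely it is to see at least as many correct guesses as were observed.  Below is a rough sketch of that one-sided binomial calculation.  The function name and the counts are hypothetical, chosen only for illustration; they are not taken from the cited studies, which may have used a different test.

```python
from math import comb


def prob_at_least(k, n, p=0.5):
    """Probability of k or more correct guesses out of n, if each guess
    is independently correct with probability p (p = 0.5 means pure chance)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical example: 8 of 10 participants guess their condition correctly.
print(prob_at_least(8, 10))
```

A small probability here means the guesses are hard to explain by chance alone, which is the sense in which blinding would be “inadequate”.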

Other problems

The MEP (motor evoked potential, a measure of the change in electrical activity) due to tDCS is extremely variable both between individuals and within the same individual.  The MEP effect of tDCS can be abolished by moving or thinking while the current is being administered.[16]

DIY

If you want to zap your brain, there are a variety of places that sell tDCS devices.

The Brain Stimulator is $59.95

The foc.us  stimulator is $249, plus headsets and cables.

The Apex is $139.99

The Fisher Wallace Stimulator is $699.

Soterix Medical makes the standard clinical-use device, for investigational use only.

And, of course, a lot of people make DIY versions.

Safety issue to keep in mind: too much current through your brain is not good.  Anything above 2 mA is outside the range of what’s been studied and probably a bad idea.  If it hurts your skin, it’s too strong.  A TENS unit is too strong.  A 9-volt battery is too strong.  Do not do the thing.

References

[1]Nitsche, Michael A., et al. “Transcranial direct current stimulation: state of the art 2008.” Brain stimulation 1.3 (2008): 206-223.

[2]Fregni, Felipe, et al. “Anodal transcranial direct current stimulation of prefrontal cortex enhances working memory.” Experimental brain research 166.1 (2005): 23-30.

[3]Boggio, Paulo S., et al. “Effects of transcranial direct current stimulation on working memory in patients with Parkinson’s disease.” Journal of the neurological sciences 249.1 (2006): 31-38.

[4]Fregni, Felipe, et al. “A randomized, sham‐controlled, proof of principle study of transcranial direct current stimulation for the treatment of pain in fibromyalgia.” Arthritis & Rheumatism 54.12 (2006): 3988-3998.

[5]Kalu, U. G., et al. “Transcranial direct current stimulation in the treatment of major depression: a meta-analysis.” Psychological medicine 42.09 (2012): 1791-1800.

[6]Fiori, Valentina, et al. “Transcranial direct current stimulation improves word retrieval in healthy and nonfluent aphasic subjects.” Journal of Cognitive Neuroscience 23.9 (2011): 2309-2323.

[7]Javadi, Amir Homayoun, and Vincent Walsh. “Transcranial direct current stimulation (tDCS) of the left dorsolateral prefrontal cortex modulates declarative memory.” Brain stimulation 5.3 (2012): 231-241.

[8]Cerruti, Carlo, and Gottfried Schlaug. “Anodal transcranial direct current stimulation of the prefrontal cortex enhances complex verbal associative thought.” Journal of Cognitive Neuroscience 21.10 (2009): 1980-1987.

[9]Mulquiney, Paul G., et al. “Improving working memory: exploring the effect of transcranial random noise stimulation and transcranial direct current stimulation on the dorsolateral prefrontal cortex.” Clinical Neurophysiology 122.12 (2011): 2384-2389.

[10]Cattaneo, Z., A. Pisoni, and C. Papagno. “Transcranial direct current stimulation over Broca’s region improves phonemic and semantic fluency in healthy individuals.” Neuroscience 183 (2011): 64-70.

[11]Valle, Angela, et al. “Efficacy of anodal transcranial direct current stimulation (tDCS) for the treatment of fibromyalgia: results of a randomized, sham-controlled longitudinal clinical trial.” Journal of pain management 2.3 (2009): 353.

[12]Boggio, Paulo S., et al. “Temporal cortex direct current stimulation enhances performance on a visual recognition memory task in Alzheimer disease.” Journal of Neurology, Neurosurgery & Psychiatry 80.4 (2009): 444-447.

[13]Berlim, Marcelo T., Frederique Van den Eynde, and Z. Jeff Daskalakis. “Clinical utility of transcranial direct current stimulation (tDCS) for treating major depression: a systematic review and meta-analysis of randomized, double-blind and sham-controlled trials.” Journal of psychiatric research 47.1 (2013): 1-7.

[14]Kessler, Sudha Kilaru, et al. “Differences in the experience of active and sham transcranial direct current stimulation.” Brain stimulation 5.2 (2012): 155-162.

[15]O’Connell, Neil E., et al. “Rethinking clinical trials of transcranial direct current stimulation: participant and assessor blinding is inadequate at intensities of 2mA.” PloS one 7.10 (2012): e47514.

[16]Horvath, Jared Cooney, Olivia Carter, and Jason D. Forte. “Transcranial direct current stimulation: five important issues we aren’t discussing (but probably should be).” Frontiers in systems neuroscience 8 (2014): 2.

[17]Pereira, Joana B., et al. “Modulation of verbal fluency networks by transcranial direct current stimulation (tDCS) in Parkinson’s disease.” Brain stimulation 6.1 (2013): 16-24.

[18]Ohn, Suk Hoon, et al. “Time-dependent effect of transcranial direct current stimulation on the enhancement of working memory.” Neuroreport 19.1 (2008): 43-47.

[19]Jo, Jung Mi, et al. “Enhancing the working memory of stroke patients using tDCS.” American Journal of Physical Medicine & Rehabilitation 88.5 (2009): 404-409.

[20]Oliveira, Janaina F., et al. “Acute working memory improvement after tDCS in antidepressant-free patients with major depressive disorder.” Neuroscience letters 537 (2013): 60-64.

[21]Richmond, Lauren L., et al. “Transcranial direct current stimulation enhances verbal working memory training performance over time and near transfer outcomes.” Journal of Cognitive Neuroscience (2014).

[22]Hill, Aron T., Paul B. Fitzgerald, and Kate E. Hoy. “Effects of anodal transcranial direct current stimulation on working memory: a systematic review and meta-analysis of findings from healthy and neuropsychiatric populations.” Brain stimulation 9.2 (2016): 197-208.

[23]Luedtke, Kerstin, et al. “Transcranial direct current stimulation for the reduction of clinical and experimentally induced pain: a systematic review and meta-analysis.” The Clinical journal of pain 28.5 (2012): 452-461.

[24]Nilsson, Jonna, Alexander V. Lebedev, and Martin Lövdén. “No significant effect of prefrontal tDCS on working memory performance in older adults.” Frontiers in aging neuroscience 7 (2015).

[25]Liu, Anli, et al. “Exploring the efficacy of a 5-day course of transcranial direct current stimulation (TDCS) on depression and memory function in patients with well-controlled temporal lobe epilepsy.” Epilepsy & Behavior 55 (2016): 11-20.