Information Processing

Pessimism of the Intellect, Optimism of the Will

Sunday, August 30, 2015

Jiujitsu renaissance

John Danaher discusses his coaching philosophy. Danaher trained UFC champions Georges St. Pierre and Chris Weidman, among others.

Danaher student Garry Tonon on wrestling and jiujitsu. He's shown rolling with AJ Agazarm, a former All-Big10 wrestler and no-gi BJJ world champ.

These fights are from a no-time-limit submission tournament a few years ago, featuring the top brown belts in the world. Some of the matches lasted over an hour; others ended after only 5 or 10 minutes. I like this style of competition much more than fighting for points.

Thursday, August 27, 2015

Trump on carried interest and hedge funds: "They didn't build this country."

Say what you want about Trump, he's one of the few candidates who isn't beholden to oligarch campaign contributors. Below he goes after the crazy tax break that hedge fund managers enjoy.
Bloomberg: ... “I know a lot of bad people in this country that are making a hell of a lot of money and not paying taxes,” Trump said in an interview with Time, in apparent reference to hedge fund and private equity fund managers. “The tax law is totally screwed up.”

"They're paying nothing and it's ridiculous," he added on CBS a few days later. “The hedge fund guys didn't build this country. These are guys that shift paper around and they get lucky." He went on: “They’re energetic, they’re very smart. But a lot of them, it’s like they’re paper pushers. They make a fortune, they pay no tax... The hedge funds guys are getting away with murder.”

Trump was apparently referring to carried interest. Most hedge funds and private equity funds are structured as partnerships where the fund managers serve as general partners and the investors as limited partners. Carried interest represents the fund managers’ share of the income generated by the fund, which is typically 20 percent of the fund’s profits at the end of the year. For most funds, this share of the profits, called an “incentive fee,” makes up most of the fund managers’ income, and, depending on the size and performance of the fund, it can stretch into the hundreds of millions of dollars. It’s largely what pays for 40,000 square foot mansions in Greenwich, Conn., and major league baseball teams and $100 million works of art. Under current tax rules, much of that incentive fee income is taxed at the long-term capital gains rate of 20 percent. If it was taxed as ordinary income, the top rate would be 39.6 percent. For hedge fund managers, the carried interest tax provision is something of a third rail, the one thing that unites them in furious opposition.

Monday, August 24, 2015

Man and Superman

These are some of my favorite panels from Frank Miller's graphic novel The Dark Knight Returns (1986). See also I Love Jack Kirby. Click for larger versions.

Education and Achievement Gaps

This recent talk by Harvard economist and education researcher Roland Fryer reviews studies of student incentives, charter schools, best educational practices, and their effects on achievement gaps.  Audio  Slides (the features in the image below are not clickable).

A very recent preprint on a study of parental incentives:
Parental Incentives and Early Childhood Achievement: A Field Experiment in Chicago Heights

Roland G. Fryer, Jr.
Harvard University and NBER

Steven D. Levitt
University of Chicago and NBER

John A. List
University of Chicago and NBER

August 2015

This article describes a randomized field experiment in which parents were provided financial incentives to engage in behaviors designed to increase early childhood cognitive and executive function skills through a parent academy. Parents were rewarded for attendance at early childhood sessions, completing homework assignments with their children, and for their child’s demonstration of mastery on interim assessments. This intervention had large and statistically significant positive impacts on both cognitive and non-cognitive test scores of Hispanics and Whites, but no impact on Blacks. These differential outcomes across races are not attributable to differences in observable characteristics (e.g., family size, mother’s age, mother’s education) or to the intensity of engagement with the program. Children with above-median (pre-treatment) non-cognitive scores accrue the most benefits from treatment.

Saturday, August 22, 2015

Now go train jiujitsu: choked out terrorist edition

Spencer Stone (left) is a blue belt at Gracie Lisboa. He choked out the terrorist gunman on the Amsterdam-Paris train yesterday.
NYTimes: ... Alek Skarlatos, a specialist in the National Guard from Oregon vacationing in Europe with a friend in the Air Force, Airman First Class Spencer Stone and another American, Anthony Sadler, looked up and saw the gunman. Mr. Skarlatos, who was returning from a deployment in Afghanistan, looked over at the powerfully built Mr. Stone, a martial arts enthusiast. “Let’s go, go!” he shouted.

... In the train carriage, Mr. Stone was the first to act, jumping up at the command of Mr. Skarlatos. He sprinted through the carriage toward the gunman, running “a good 10 meters to get to the guy,” Mr. Skarlatos said. Mr. Stone was unarmed; his target was visibly bristling with weapons.

With Mr. Skarlatos close behind, Mr. Stone grabbed the gunman’s neck, stunning him. But the gunman fought back furiously, slashing with his blade, slicing Mr. Stone in the neck and hand and nearly severing his thumb. Mr. Stone did not let go.

The gunman “pulled out a cutter, started cutting Spencer,” Mr. Norman, the British consultant, told television interviewers. “He cut Spencer behind the neck. He nearly cut his thumb off.”

Mr. Skarlatos grabbed the gunman’s Luger pistol and threw it to the side. Incongruously, the gunman yelled at the men to return it, even as Mr. Stone was choking him. A train conductor rushed up and grabbed the gunman’s left arm, Mr. Norman recalled.

... Mr. Stone, wounded and bleeding, kept the suspect in a chokehold. “Spencer Stone is a very strong guy,” Mr. Norman said. The suspect passed out.

Wednesday, August 19, 2015

Lackeys of the plutocracy?

This essay is an entertaining read, if somewhat wrongheaded. See here for an earlier post that discusses Steven Pinker's response to Deresiewicz's earlier article Don’t Send Your Kid to the Ivy League.
The Neoliberal Arts (Harpers): ... Now that the customer-service mentality has conquered academia, colleges are falling all over themselves to give their students what they think they want. Which means that administrators are trying to retrofit an institution that was designed to teach analytic skills — and, not incidentally, to provide young people with an opportunity to reflect on the big questions — for an age that wants a very different set of abilities. That is how the president of a top liberal-arts college can end up telling me that he’s not interested in teaching students to make arguments but is interested in leadership. That is why, around the country, even as they cut departments, starve traditional fields, freeze professorial salaries, and turn their classrooms over to adjuncts, colleges and universities are establishing centers and offices and institutes, and hiring coordinators and deanlets, and launching initiatives, and creating courses and programs, for the inculcation of leadership, the promotion of service, and the fostering of creativity. Like their students, they are busy constructing a parallel college. What will happen to the old one now is anybody’s guess.

So what’s so bad about leadership, service, and creativity? What’s bad about them is that, as they’re understood on campus and beyond, they are all encased in neoliberal assumptions. Neoliberalism, which dovetails perfectly with meritocracy, has generated a caste system: “winners and losers,” “makers and takers,” “the best and the brightest,” the whole gospel of Ayn Rand and her Übermenschen. That’s what “leadership” is finally about. There are leaders, and then there is everyone else: the led, presumably — the followers, the little people. Leaders get things done; leaders take command. When colleges promise to make their students leaders, they’re telling them they’re going to be in charge. ...

We have always been, in the United States, what Lionel Trilling called a business civilization. But we have also always had a range of counterbalancing institutions, countercultural institutions, to advance a different set of values: the churches, the arts, the democratic tradition itself. When the pendulum has swung too far in one direction (and it’s always the same direction), new institutions or movements have emerged, or old ones have renewed their mission. Education in general, and higher education in particular, has always been one of those institutions. But now the market has become so powerful that it’s swallowing the very things that are supposed to keep it in check. Artists are becoming “creatives.” Journalism has become “the media.” Government is bought and paid for. The prosperity gospel has arisen as one of the most prominent movements in American Christianity. And colleges and universities are acting like businesses, and in the service of businesses.

What is to be done? Those very same WASP aristocrats — enough of them, at least, including several presidents of Harvard and Yale — when facing the failure of their own class in the form of the Great Depression, succeeded in superseding themselves and creating a new system, the meritocracy we live with now. But I’m not sure we possess the moral resources to do the same. The WASPs had been taught that leadership meant putting the collective good ahead of your own. But meritocracy means looking out for number one, and neoliberalism doesn’t believe in the collective. As Margaret Thatcher famously said about society, “There’s no such thing. There are individual men and women, and there are families.” As for elite university presidents, they are little more these days than lackeys of the plutocracy, with all the moral stature of the butler in a country house.

Neoliberalism disarms us in another sense as well. For all its rhetoric of freedom and individual initiative, the culture of the market is exceptionally good at inculcating a sense of helplessness. So much of the language around college today, and so much of the negative response to my suggestion that students ought to worry less about pursuing wealth and more about constructing a sense of purpose for themselves, presumes that young people are the passive objects of economic forces. That they have no agency, no options. That they have to do what the market tells them. A Princeton student literally made this argument to me: If the market is incentivizing me to go to Wall Street, he said, then who am I to argue?

I have also had the pleasure, over the past year, of hearing from a lot of people who are pushing back against the dictates of neoliberal education: starting high schools, starting colleges, creating alternatives to high school and college, making documentaries, launching nonprofits, parenting in different ways, conducting their lives in different ways. I welcome these efforts, but none of them address the fundamental problem, which is that we no longer believe in public solutions. We only believe in market solutions, or at least private-sector solutions: one-at-a-time solutions, individual solutions.

The worst thing about “leadership,” the notion that society should be run by highly trained elites, is that it has usurped the place of “citizenship,” the notion that society should be run by everyone together. Not coincidentally, citizenship — the creation of an informed populace for the sake of maintaining a free society, a self-governing society — was long the guiding principle of education in the United States. ...

Crossfit Games 2015

Some great highlights.

Friday, August 14, 2015

Pinker on bioethics

Progress in biomedical research is slow enough. It does not need to be slowed down even further.
Boston Globe: A POWERFUL NEW technique for editing genomes, CRISPR-Cas9, is the latest in a series of advances in biotechnology that have raised concerns about the ethics of biomedical research and inspired calls for moratoria and new regulations. Indeed, biotechnology has moral implications that are nothing short of stupendous. But they are not the ones that worry the worriers.

... A truly ethical bioethics should not bog down research in red tape, moratoria, or threats of prosecution based on nebulous but sweeping principles such as “dignity,” “sacredness,” or “social justice.” Nor should it thwart research that has likely benefits now or in the near future by sowing panic about speculative harms in the distant future. These include perverse analogies with nuclear weapons and Nazi atrocities, science-fiction dystopias like “Brave New World” and “Gattaca,” and freak-show scenarios like armies of cloned Hitlers, people selling their eyeballs on eBay, or warehouses of zombies to supply people with spare organs. Of course, individuals must be protected from identifiable harm, but we already have ample safeguards for the safety and informed consent of patients and research subjects.

Some say that it’s simple prudence to pause and consider the long-term implications of research before it rushes headlong into changing the human condition. But this is an illusion.

First, slowing down research has a massive human cost. Even a one-year delay in implementing an effective treatment could spell death, suffering, or disability for millions of people.

Second, technological prediction beyond a horizon of a few years is so futile that any policy based on it is almost certain to do more harm than good. Contrary to confident predictions during my childhood, the turn of the 21st century did not bring domed cities, jetpack commuting, robot maids, mechanical hearts, or regularly scheduled flights to the moon. This ignorance, of course, cuts both ways: few visionaries foresaw the disruptive effects of the World Wide Web, digital music, ubiquitous smartphones, social media, or fracking. ...

Tuesday, August 11, 2015

Explain it to me like I'm five years old

An MIT Technology Review reporter interviewed me yesterday about my Nautilus Magazine article Super-Intelligent Humans Are Coming. I had to do the interview by gchat because my voice is recovering from a terrible cold and too much yakking with brain scientists at the Allen Institute in Seattle.

I realized I need to find an explanation for the thesis of the article which is as simple as possible -- so that MIT graduates can understand it ;-)

Let me know what you think of the following.
1. Cognitive ability is highly heritable. At least half the variance is genetic in origin.

2. It is influenced by many (probably thousands of) common variants (see GCTA estimates of heritability due to common SNPs). We know there are many because if there were only a few, the average effect size of each variant would have to be large, and such large-effect SNPs would be easy to detect even with small sample sizes.

Recent studies with large sample sizes detected ~70 SNP hits, but would have detected many more if effect sizes were consistent with, e.g., only hundreds of causal variants in total.

3. Since these are common variants, the probability of carrying the negative allele (the one with a (-) effect on g score) is not small: e.g., 10% or more.

4. So each individual is carrying around many hundreds (if not thousands) of (-) variants.

5. As long as effects are roughly additive, we know that changing ALL or MOST of these (-) variants into (+) variants would push an individual many standard deviations (SDs) above the population mean. Such an individual would be far beyond any historical figure in cognitive ability. 
Given more details we can estimate the average number of (-) variants carried by individuals, and how many SDs are up for grabs from flipping (-) to (+). As is the case with most domesticated plants and animals, we expect that the existing variation in the population allows for many SDs of improvement (see figure below).
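The additive logic in point 5 can be checked with a toy simulation. All numbers below (10,000 causal variants, a (-) allele frequency of 0.1 at every locus, equal additive effects) are illustrative assumptions, not estimates from data:

```python
import numpy as np

rng = np.random.default_rng(0)

n_variants = 10_000   # assumed number of causal variants (illustrative)
p_minus = 0.1         # assumed frequency of the (-) allele at each locus
n_people = 100_000

# Under an additive model with equal effect sizes, a person's score is just
# (minus) the number of (-) variants carried, which is binomially distributed.
minus_counts = rng.binomial(n_variants, p_minus, size=n_people)

mean_minus = minus_counts.mean()   # average number of (-) variants carried
sd = minus_counts.std()            # population SD, in units of one variant's effect

print(f"average (-) variants per person: {mean_minus:.0f}")   # ~ 1000
print(f"population SD (variant units):   {sd:.1f}")           # ~ sqrt(10000 * 0.1 * 0.9) = 30
print(f"SDs gained by flipping all (-) variants: {mean_minus / sd:.0f}")  # ~ 33
```

The punchline is the ratio: the typical person carries ~1000 (-) variants, but the population SD is only ~30 variants, so flipping all of them is worth ~30+ SDs under these assumptions.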
For references and more detailed explanation, see On the Genetic Architecture of Cognitive Ability and Other Heritable Traits.

Monday, August 10, 2015


I watched this on the flight back from Asia. It's a kids' movie, but it operates on more than one level. The girl robot Athena is really fun.

Saturday, August 08, 2015

Caltech crushes Harvard, MIT, and all the rest

A few years ago I posted a list of Nobel prize counts aggregated by the undergraduate institution of the winner. A social science researcher who reads this blog got interested in the topic and has compiled much more complete information, which he is preparing to publish.

He reports that the school with the most Nobel + Fields + Turing prizes, normalized to size of (undergraduate) alumni population, is Caltech, which leads both Harvard and MIT (the next highest ranked schools) by a factor of 3 or 4. Caltech beats Michigan by a factor of ~50, and Ohio State (typical of good public flagships) by a factor of ~500!

To obtain a higher statistics measurement of exceptional achievement, he aggregated living members of the National Academy of Science, National Academy of Engineering, and Institute of Medicine, and normalized to size of alumni population over the last 100 years or so. Caltech again comes out first, beating both Harvard and MIT by a factor of about 1.5. Caltech beats Yale and Princeton by a factor of ~4, and Stanford by a factor of ~5. Swarthmore and Amherst are the leading liberal arts colleges. (See list below.) Caltech beats very good public universities by factors ~100 and more typical public universities by factors ~1000.

Berkeley is the best public university in both the Nobel+ and National Academies rankings. Berkeley is roughly tied with Stanford in Nobels+ per alum, but behind in academicians per capita.

As you might expect, the rank order in these lists correlates strongly with average SAT score. Likelihood ratios of ~500 or 1000 for high-end achievement suggest that (1) psychometric scores used in college admissions have significant validity, and (2) high-end achievement is correlated with unusually high ability: two schools with very different mean SAT scores have very different population fractions above a threshold such as +3 SD. For example, at Caltech perhaps half the students are above +3 SD in ability, whereas at an average university only about 1 in 500 are at that level, leading to ratios as large as 100 or 1000!
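A quick Gaussian-tail calculation shows how a modest shift in mean produces enormous ratios in the +3 SD fraction. The school parameters here (mean ability +2.5 SD, restricted within-school SD of 0.5) are hypothetical, chosen only to illustrate the effect:

```python
from math import erfc, sqrt

def frac_above(threshold, mean=0.0, sd=1.0):
    """Fraction of a normal(mean, sd) population above `threshold` (in SD units)."""
    return 0.5 * erfc((threshold - mean) / (sd * sqrt(2)))

# Hypothetical numbers: a selective school whose students have mean ability
# +2.5 SD with a restricted within-school SD of 0.5, vs. the general
# population (mean 0, SD 1).
elite = frac_above(3.0, mean=2.5, sd=0.5)
general = frac_above(3.0)

print(f"fraction above +3 SD, selective school:   {elite:.3f}")
print(f"fraction above +3 SD, general population: {general:.6f}")
print(f"ratio: {elite / general:.0f}")
```

With these assumed parameters roughly 1 in 6 students at the selective school clears +3 SD, versus about 1 in 740 in the general population, a ratio of order 100.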
Colleges ranked by per capita production of National Academy (Science, Engineering, Medicine) members:

California Institute of Technology
Massachusetts Institute of Technology
Harvard University
Swarthmore College
Yale University
Princeton University
Amherst College
Stanford University
Oberlin College
Columbia University
Haverford College
Cooper Union
Dartmouth College
See also Annals of Psychometry: IQs of eminent scientists, and Vernon Smith at Caltech.


Correction! The original post quoted results using an estimate of alumni population derived from recent US News data. However, some schools have changed over time in enrollment, so more precise estimates are required. The lists below use graduation numbers reported to IPEDS from 1966-2013 and probably yield more accurate rankings than what was reported above. The main difference on the Nobel+ list is that the University of Chicago jumps to #3 and MIT falls several notches. On the NAS/NAE/IOM list MIT is #2 and Harvard #3.

Undergraduate Institution | Nobel+ | Bachelor's degrees awarded (1966-2013) | Prize per capita ratio
California Institute of Technology | 11 | 9348 | 0.001176722
Harvard University | 34 | 81553 | 0.000416907
University of Chicago | 15 | 37171 | 0.000403540
Swarthmore College | 5 | 15825 | 0.000315956
Columbia University | 20 | 68982 | 0.000289931
Massachusetts Institute of Technology | 14 | 52891 | 0.000264695
Yale University | 13 | 60107 | 0.000216281
Amherst College | 4 | 18716 | 0.000213721

[ For comparison: Penn State and Ohio State ~ 0.0000028 and 0.0000026; many schools have zero Nobel+ winners. ]

Undergraduate Institution | NAS+NAE+IOM | Bachelor's degrees awarded (1966-2013) | ratio
California Institute of Technology | 78 | 9348 | 0.0083440308
Massachusetts Institute of Technology | 255 | 52891 | 0.0048212361
Harvard University | 326 | 81553 | 0.0039974005
Swarthmore College | 49 | 15825 | 0.0030963665
Princeton University | 109 | 50633 | 0.0021527462
Amherst College | 35 | 18716 | 0.0018700577
Yale University | 112 | 60107 | 0.0018633437
University of Chicago | 56 | 37171 | 0.0015065508
Stanford University | 117 | 79683 | 0.0014683182

[ For comparison: Arizona State and Florida State ~ 0.000013; University of Georgia ~ 0.000008 ]
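The per capita ratios above are simple divisions of prize counts by degrees awarded; a quick spot-check of a few rows from the Nobel+ table:

```python
# Spot-check the Nobel+ per capita ratios from the table above.
counts = {
    "California Institute of Technology": (11, 9348),
    "Harvard University": (34, 81553),
    "University of Chicago": (15, 37171),
}
for school, (prizes, degrees) in counts.items():
    print(f"{school}: {prizes / degrees:.9f}")
```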

Deep Learning in Nature

When I travel I often carry a stack of issues of Nature and Science to read (and then discard) on the plane.

The article below is a nice review of the current state of the art in deep neural networks. See earlier posts Neural Networks and Deep Learning 1 and 2, and Back to the Deep.
Deep learning
Yann LeCun, Yoshua Bengio, Geoffrey Hinton
Nature 521, 436–444 (28 May 2015) doi:10.1038/nature14539 
Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state-of-the-art in speech recognition, visual object recognition, object detection and many other domains such as drug discovery and genomics. Deep learning discovers intricate structure in large data sets by using the backpropagation algorithm to indicate how a machine should change its internal parameters that are used to compute the representation in each layer from the representation in the previous layer. Deep convolutional nets have brought about breakthroughs in processing images, video, speech and audio, whereas recurrent nets have shone light on sequential data such as text and speech.
The article seems to give a somewhat, er, compressed, version of the history of the field. See these comments by Schmidhuber:
Machine learning is the science of credit assignment. The machine learning community itself profits from proper credit assignment to its members. The inventor of an important method should get credit for inventing it. She may not always be the one who popularizes it. Then the popularizer should get credit for popularizing it (but not for inventing it). Relatively young research areas such as machine learning should adopt the honor code of mature fields such as mathematics: if you have a new theorem, but use a proof technique similar to somebody else's, you must make this very clear. If you "re-invent" something that was already known, and only later become aware of this, you must at least make it clear later.

As a case in point, let me now comment on a recent article in Nature (2015) about "deep learning" in artificial neural networks (NNs), by LeCun & Bengio & Hinton (LBH for short), three CIFAR-funded collaborators who call themselves the "deep learning conspiracy" (e.g., LeCun, 2015). They heavily cite each other. Unfortunately, however, they fail to credit the pioneers of the field, which originated half a century ago. All references below are taken from the recent deep learning overview (Schmidhuber, 2015), except for a few papers listed beneath this critique focusing on nine items.

1. LBH's survey does not even mention the father of deep learning, Alexey Grigorevich Ivakhnenko, who published the first general, working learning algorithms for deep networks (e.g., Ivakhnenko and Lapa, 1965). A paper from 1971 already described a deep learning net with 8 layers (Ivakhnenko, 1971), trained by a highly cited method still popular in the new millennium. Given a training set of input vectors with corresponding target output vectors, layers of additive and multiplicative neuron-like nodes are incrementally grown and trained by regression analysis, then pruned with the help of a separate validation set, where regularisation is used to weed out superfluous nodes. The numbers of layers and nodes per layer can be learned in problem-dependent fashion.

2. LBH discuss the importance and problems of gradient descent-based learning through backpropagation (BP), and cite their own papers on BP, plus a few others, but fail to mention BP's inventors. BP's continuous form was derived in the early 1960s (Bryson, 1961; Kelley, 1960; Bryson and Ho, 1969). Dreyfus (1962) published the elegant derivation of BP based on the chain rule only. BP's modern efficient version for discrete sparse networks (including FORTRAN code) was published by Linnainmaa (1970). Dreyfus (1973) used BP to change weights of controllers in proportion to such gradients. By 1980, automatic differentiation could derive BP for any differentiable graph (Speelpenning, 1980). Werbos (1982) published the first application of BP to NNs, extending thoughts in his 1974 thesis (cited by LBH), which did not have Linnainmaa's (1970) modern, efficient form of BP. BP for NNs on computers 10,000 times faster per Dollar than those of the 1960s can yield useful internal representations, as shown by Rumelhart et al. (1986), who also did not cite BP's inventors. [ THERE ARE 9 POINTS IN THIS CRITIQUE ]

... LBH may be backed by the best PR machines of the Western world (Google hired Hinton; Facebook hired LeCun). In the long run, however, historic scientific facts (as evident from the published record) will be stronger than any PR. There is a long tradition of insights into deep learning, and the community as a whole will benefit from appreciating the historical foundations. 
One very striking aspect of the history of deep neural nets, which is acknowledged both by Schmidhuber and by LeCun et al., is that the subject was marginal to "mainstream" AI and CS research for a long time, and that new technologies (e.g., GPUs) were crucial to its current flourishing in terms of practical results. The theoretical results, such as they are, appeared decades ago! It is clear that there are many unanswered questions concerning guarantees of optimal solutions, the relative merits of alternative architectures, use of memory networks, etc.

Some additional points:

1. Prevalence of saddle points over local minima in high dimensional geometries: apparently early researchers were concerned about incomplete optimization of DNNs due to local minima in parameter space. But saddle points are much more common in high dimensional spaces and local minima have turned out not to be a big problem.

2. Optimized neural networks are similar in important ways to biological (e.g., monkey) brains! When monkeys and a ConvNet are shown the same pictures, the activation of high-level units in the ConvNet explains about half of the variance of random sets of 160 neurons in the monkey's inferotemporal cortex.
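A toy illustration of point 1: sample random symmetric "Hessians" and count how often all eigenvalues are positive (a local minimum) rather than mixed in sign (a saddle). This is a Gaussian random-matrix caricature, not an actual network Hessian, but it captures why minima become rare as dimension grows:

```python
import numpy as np

rng = np.random.default_rng(0)

def frac_minima(dim, trials=2000):
    """Fraction of random symmetric Gaussian 'Hessians' that are positive
    definite, i.e. would correspond to a local minimum rather than a saddle."""
    count = 0
    for _ in range(trials):
        a = rng.standard_normal((dim, dim))
        h = (a + a.T) / 2  # random symmetric matrix
        if np.all(np.linalg.eigvalsh(h) > 0):
            count += 1
    return count / trials

for d in (1, 2, 5, 10):
    print(f"dim {d:2d}: fraction of minima = {frac_minima(d):.3f}")
```

In one dimension about half of the critical points are minima, but the fraction collapses rapidly with dimension; by dim 10 essentially every sampled critical point is a saddle.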

Some comments on the relevance of all this to the quest for human-level AI from an earlier post:
.. evolution has encoded the results of a huge environment-dependent optimization in the structure of our brains (and genes), a process that AI would have to somehow replicate. A very crude estimate of the amount of computational power used by nature in this process leads to a pessimistic prognosis for AI even if one is willing to extrapolate Moore's Law well into the future. [ Moore's Law (Dennard scaling) may be toast for the next decade or so! ] Most naive analyses of AI and computational power only ask what is required to simulate a human brain, but do not ask what is required to evolve one. I would guess that our best hope is to cheat by using what nature has already given us -- emulating the human brain as much as possible.

If indeed there are good (deep) generalized learning architectures to be discovered, that will take time. Even with such a learning architecture at hand, training it will require interaction with a rich exterior world -- either the real world (via sensors and appendages capable of manipulation) or a computationally expensive virtual world. Either way, I feel confident in my bet that a strong version of the Turing test (allowing, e.g., me to communicate with the counterpart over weeks or months; to try to teach it things like physics and watch its progress; eventually for it to teach me) won't be passed until at least 2050 and probably well beyond.
Relevant remarks from Schmidhuber:
[Link] ...Ancient algorithms running on modern hardware can already achieve superhuman results in limited domains, and this trend will accelerate. But current commercial AI algorithms are still missing something fundamental. They are no self-referential general purpose learning algorithms. They improve some system’s performance in a given limited domain, but they are unable to inspect and improve their own learning algorithm. They do not learn the way they learn, and the way they learn the way they learn, and so on (limited only by the fundamental limits of computability). As I wrote in the earlier reply: "I have been dreaming about and working on this all-encompassing stuff since my 1987 diploma thesis on this topic." However, additional algorithmic breakthroughs may be necessary to make this a practical reality.
[Link] The world of RNNs is such a big world because RNNs (the deepest of all NNs) are general computers, and because efficient computing hardware in general is becoming more and more RNN-like, as dictated by physics: lots of processors connected through many short and few long wires. It does not take a genius to predict that in the near future, both supervised learning RNNs and reinforcement learning RNNs will be greatly scaled up. Current large, supervised LSTM RNNs have on the order of a billion connections; soon that will be a trillion, at the same price. (Human brains have maybe a thousand trillion, much slower, connections - to match this economically may require another decade of hardware development or so). In the supervised learning department, many tasks in natural language processing, speech recognition, automatic video analysis and combinations of all three will perhaps soon become trivial through large RNNs (the vision part augmented by CNN front-ends). The commercially less advanced but more general reinforcement learning department will see significant progress in RNN-driven adaptive robots in partially observable environments. Perhaps much of this won’t really mean breakthroughs in the scientific sense, because many of the basic methods already exist. However, much of this will SEEM like a big thing for those who focus on applications. (It also seemed like a big thing when in 2011 our team achieved the first superhuman visual classification performance in a controlled contest, although none of the basic algorithms was younger than two decades:

So what will be the real big thing? I like to believe that it will be self-referential general purpose learning algorithms that improve not only some system’s performance in a given domain, but also the way they learn, and the way they learn the way they learn, etc., limited only by the fundamental limits of computability. I have been dreaming about and working on this all-encompassing stuff since my 1987 diploma thesis on this topic, but now I can see how it is starting to become a practical reality. Previous work on this is collected here:
See also Solomonoff universal induction. I don't believe that completely general purpose learning algorithms have to become practical before we achieve human-level AI. Humans are quite limited, after all! When was the last time you introspected to learn about the way you learn the way you learn ...? Perhaps it is happening "under the hood" to some extent, but not in maximum generality; we have hardwired limits.
Do we really need Solomonoff? Did Nature make use of his Universal Prior in producing us? It seems like cheaper tricks can produce "intelligence" ;-)

Tuesday, August 04, 2015

Seattle: quantum thermalization and genomic prediction

I'll be at the Institute for Nuclear Theory at the University of Washington tomorrow to discuss quantum thermalization in heavy ion collisions. Some brief slides.

On Thursday I'll be at the Allen Institute for Brain Science to give a talk (video and slides):
Title:  Genetic Architecture and Predictive Modeling of Quantitative Traits

Abstract: I discuss the application of Compressed Sensing (L1-penalized optimization or LASSO) to genomic prediction. I show that matrices comprised of human genomes are good compressed sensors, and that LASSO applied to genomic prediction exhibits a phase transition as the sample size is varied. When the sample size crosses the phase boundary complete identification of the subspace of causal variants is possible. For typical traits of interest (e.g., with heritability ~ 0.5), the phase boundary occurs at N ~ 30s, where s (sparsity) is the number of causal variants. I give some estimates of sparsity associated with complex traits such as height and cognitive ability, which suggest s ~ 10k. In practical terms, these results imply that powerful genomic prediction will be possible for many complex traits once ~ 1 million genotypes are available for analysis.
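The phase-transition claim (complete identification of the causal subspace once the sample size exceeds N ~ 30s, for heritability ~ 0.5) can be sketched in a toy simulation. Everything below is an illustrative assumption, not the actual analysis from the talk: random 0/1/2 genotypes, equal causal effects, and cross-validated LASSO via scikit-learn's `LassoCV`.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
p, s = 1000, 10                 # candidate SNPs, causal (sparse) variants
n = 30 * s                      # sample size at the claimed boundary N ~ 30s

X = rng.binomial(2, 0.5, size=(n, p)).astype(float)  # 0/1/2 genotype counts
beta = np.zeros(p)
beta[:s] = 1.0                  # equal causal effects (toy assumption)
g = X @ beta
y = g + rng.normal(0.0, g.std(), size=n)  # noise var = genetic var, h2 ~ 0.5

fit = LassoCV(cv=5).fit(X, y)
recovered = set(np.flatnonzero(fit.coef_))
print(len(recovered & set(range(s))))     # how many causal variants were found
```

At this sample size the fit typically recovers essentially all of the causal support; shrink n well below 30s and recovery collapses, which is the phase transition in miniature.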

Sunday, August 02, 2015

Brooklyn with palm trees

Third wave coffee in Niles Canyon.

Saturday, August 01, 2015

Crossing the Pacific

So long, Hong Kong...

Foo Camp!

Someone is mining ether!

Tuesday, July 28, 2015

HaploSNPs and missing heritability

By constructing haplotypes using adjacent SNPs the authors arrive at a superior set of genetic variables with which to compute genetic similarity. These haplotypes tag rare variants and seem to recover a significant chunk of heritability not accounted for by common SNPs.

See also ref 32: Yang, J. et al. Estimation of genetic variance from imputed sequence variants reveals negligible missing heritability for human height and body mass index. Nature Genetics, submitted
Haplotypes of common SNPs can explain missing heritability of complex diseases

While genome-wide significant associations generally explain only a small proportion of the narrow-sense heritability of complex disease (h^2), recent work has shown that more heritability is explained by all genotyped SNPs (h_g^2). However, much of the heritability is still missing (h_g^2 < h^2). For example, for schizophrenia, h^2 is estimated at 0.7-0.8 but h_g^2 is estimated at ~0.3. Efforts at increasing coverage through accurately imputed variants have yielded only small increases in the heritability explained, and poorly imputed variants can lead to assay artifacts for case-control traits. We propose to estimate the heritability explained by a set of haplotype variants (haploSNPs) constructed directly from the study sample (h_hap^2). Our method constructs a set of haplotypes from phased genotypes by extending shared haplotypes subject to the 4-gamete test. In a large schizophrenia data set (PGC2-SCZ), haploSNPs with MAF > 0.1% explained substantially more phenotypic variance (h_hap^2 = 0.64 (S.E. 0.084)) than genotyped SNPs alone (h_g^2 = 0.32 (S.E. 0.029)). These estimates were based on cross-cohort comparisons, ensuring that cohort-specific assay artifacts did not contribute to our estimates. In a large multiple sclerosis data set (WTCCC2-MS), we observed an even larger difference between h_hap^2 and h_g^2, though data from other cohorts will be required to validate this result. Overall, our results suggest that haplotypes of common SNPs can explain a large fraction of missing heritability of complex disease, shedding light on genetic architecture and informing disease mapping strategies.

The excerpt below is my response to an excellent comment by Gwern:
Your summary is correct, AFAIU. Below is a bit more detail about the 4 gamete test, which differentiates between a recombination event (which breaks the haploblock for descendants of that individual; recombination = scrambling due to sexual reproduction) and a simple mutation at that locus. The goal is to impute identical blocks of DNA that are tagged by SNPs on standard chips.
Algorithm to generate haploSNPs 
... Given two alleles at the haploSNPs and two at the mismatch SNP, a maximum of four possible allelic combinations can be observed. If all four combinations are observed, this indicates that a recombination event is required to explain the mismatch, and the haploSNP will be terminated. If, however, only three combinations are observed, the mismatch may be explained by a mutation on the shared haplotype background. These mismatches are ignored and the haploSNP is extended further. We note that this approach can produce a very large number of haploSNPs and very long haploSNPs that could tag signals of cryptic relatedness. ...
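The 4-gamete check described above is simple enough to sketch in a few lines. The function name and toy phased data below are hypothetical, not from the paper's code:

```python
def requires_recombination(hap_alleles, snp_alleles):
    """4-gamete test: if all four allelic combinations of (haploSNP allele,
    mismatch-SNP allele) appear across the phased haplotypes, a recombination
    event is required and the haploSNP is terminated; with three or fewer,
    the mismatch can be explained by a single mutation and is ignored."""
    combos = set(zip(hap_alleles, snp_alleles))
    return len(combos) == 4

# toy phased data: alleles carried by six haplotypes at the two loci
hap = [0, 0, 1, 1, 0, 1]
snp = [0, 1, 0, 1, 0, 0]
print(requires_recombination(hap, snp))  # True: all four combos observed
```

When only three combinations are present (e.g. perfectly correlated loci), the test returns False and the haploSNP would be extended past the mismatch.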

>> This estimated heritability is much closer to the full-strength twin study estimates, showing that a lot of the 'missing' heritability is lurking in the rarer SNPs << 
This was already suspected by some researchers (including me), but the haploSNP results provide support for the hypothesis. It means that, e.g., with whole genomes we could potentially recover nearly all the predictive power implied by classical h2 estimates ...

Sunday, July 26, 2015

Greetings from HK

Meetings with BGI, HKUST, and financiers. Will stop in SV and Seattle (Allen Institute) on the way back.

Thursday, July 23, 2015

Drone Art

I saw this video at one of the Scifoo sessions on drones. Beautiful stuff!

I find this much more pleasing than fireworks. The amount of waste and debris generated by a big fireworks display is horrendous.

Monday, July 20, 2015

What is medicine’s 5 sigma?

Editorial in the Lancet, reflecting on the Symposium on the Reproducibility and Reliability of Biomedical Research held April 2015 by the Wellcome Trust.
What is medicine’s 5 sigma?

... much of the [BIOMEDICAL] scientific literature, perhaps half, may simply be untrue. Afflicted by studies with small sample sizes, tiny effects, invalid exploratory analyses, and flagrant conflicts of interest, together with an obsession for pursuing fashionable trends of dubious importance, [BIOMEDICAL] science has taken a turn towards darkness. As one participant put it, “poor methods get results”. The Academy of Medical Sciences, Medical Research Council, and Biotechnology and Biological Sciences Research Council have now put their reputational weight behind an investigation into these questionable research practices. The apparent endemicity of bad research behaviour is alarming. In their quest for telling a compelling story, scientists too often sculpt data to fit their preferred theory of the world. ...

One of the most convincing proposals came from outside the biomedical community. Tony Weidberg is a Professor of Particle Physics at Oxford. ... the particle physics community ... invests great effort into intensive checking and rechecking of data prior to publication. By filtering results through independent working groups, physicists are encouraged to criticise. Good criticism is rewarded. The goal is a reliable result, and the incentives for scientists are aligned around this goal. Weidberg worried we set the bar for results in biomedicine far too low. In particle physics, significance is set at 5 sigma—a p value of 3 × 10–7 or 1 in 3·5 million (if the result is not true, this is the probability that the data would have been as extreme as they are). The conclusion of the symposium was that something must be done ...
I once invited a famous evolutionary theorist (MacArthur Fellow) at Oregon to give a talk in my institute, to an audience of physicists, theoretical chemists, mathematicians and computer scientists. The Q&A was, from my perspective, friendly and lively. A physicist of Hungarian extraction politely asked the visitor whether his models could ever be falsified, given the available field (ecological) data. I was shocked that he seemed shocked to be asked such a question. Later I sent an email thanking the speaker for his visit and suggesting he come again some day. He replied that he had never been subjected to such aggressive and painful attack and that he would never come back. Which community of scientists is more likely to produce replicable results?

See also Medical Science? and Is Science Self-Correcting?

To answer the question posed in the title of the post / editorial, an example of a statistical threshold which is sufficient for high confidence of replication is the p < 5 x 10^{-8} significance requirement in GWAS. This is basically the traditional p < 0.05 threshold corrected for multiple testing of ~10^6 SNPs. Early "candidate gene" studies which did not impose this correction have very low replication rates. See comment below for what this implies about the validity of priors based on biological intuition.
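Both thresholds can be checked in a couple of lines; the 5-sigma tail probability comes from the standard normal via the stdlib error function:

```python
import math

# GWAS genome-wide significance: Bonferroni correction of p < 0.05
# for ~10^6 independent SNP tests
alpha, n_tests = 0.05, 10**6
gwas_threshold = alpha / n_tests
print(gwas_threshold)                      # 5e-08

# particle physics' 5-sigma: one-sided tail of a standard normal
p_5sigma = 0.5 * math.erfc(5 / math.sqrt(2))
print(p_5sigma)                            # ~2.9e-07, i.e. ~1 in 3.5 million
```

So the GWAS convention is in fact slightly stricter than the 5-sigma rule the Lancet editorial holds up as the physics gold standard.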

I discuss this a bit with John Ioannidis in the video below.

Sunday, July 19, 2015

Technically Sweet

Regular readers will know that I've been interested in the so-called Teller-Ulam mechanism used in thermonuclear bombs. Recently I read Kenneth Ford's memoir Building the H Bomb: A Personal History. Ford was a student of John Wheeler, who brought him to Los Alamos to work on the H-bomb project. This led me to look again at Richard Rhodes's Dark Sun: The Making of the Hydrogen Bomb. There is quite a lot of interesting material in these two books on the specific contributions of Ulam and Teller, and whether the Soviets came up with the idea themselves, or had help from spycraft. See also Sakharov's Third Idea and F > L > P > S.

The power of a megaton device is described below by a witness to the Soviet test.
The Soviet Union tested a two-stage, lithium-deuteride-fueled thermonuclear device on November 22, 1955, dropping it from a Tu-16 bomber to minimize fallout. It yielded 1.6 megatons, a yield deliberately reduced for the Semipalatinsk test from its design yield of 3 MT. According to Yuri Romanov, Andrei Sakharov and Yakov Zeldovich worked out the Teller-Ulam configuration in conversations together in early spring 1954, independently of the US development. “I recall how Andrei Dmitrievich gathered the young associates in his tiny office,” Romanov writes, “… and began talking about the amazing ability of materials with a high atomic number to be an excellent reflector of high-intensity, short-pulse radiation.” ...

Victor Adamsky remembers the shock wave from the new thermonuclear racing across the steppe toward the observers. “It was a front of moving air that you could see that differed in quality from the air before and after. It came, it was really terrible; the grass was covered with frost and the moving front thawed it, you felt it melting as it approached you.” Igor Kurchatov walked in to ground zero with Yuli Khariton after the test and was horrified to see the earth cratered even though the bomb had detonated above ten thousand feet. “That was such a terrible, monstrous sight,” he told Anatoli Alexandrov when he returned to Moscow. “That weapon must not be allowed ever to be used.”
The Teller-Ulam design uses radiation pressure (reflected photons) from a spherical fission bomb to compress the thermonuclear fuel. The design is (to quote Oppenheimer) "technically sweet" -- a glance at the diagram below should convince anyone who understands geometrical optics!

In discussions of human genetic engineering (clearly a potentially dangerous future technology), the analogy with nuclear weapons sometimes arises: what role do moral issues play in the development of new technologies with the potential to affect the future of humanity? In my opinion, genetic engineering of humans carries nothing like the existential risk of arsenals of Teller-Ulam devices. Genomic consequences will play out over long (generational) timescales, leaving room for us to assess outcomes and adapt accordingly. (In comparison, genetic modification of viruses, which could lead to pandemics, seems much more dangerous.)
It is my judgment in these things that when you see something that is technically sweet, you go ahead and do it and you argue about what to do about it only after you have had your technical success. -- Oppenheimer on the Teller-Ulam design for the H-bomb.
What is technically sweet about genomics? (1) the approximate additivity (linearity) of the genetic architecture of key traits such as human intelligence; (2) the huge amounts of extant variance in the human population, enabling large improvements; (3) matrices of human genomes are good compressed sensors, and one can estimate how much data is required to "solve" the genetic architecture of complex traits. See, e.g., Genius (Nautilus Magazine) and Genetic architecture and predictive modeling of quantitative traits.

More excerpts from Dark Sun below.

Enthusiasts of trans-generational epigenetics would do well to remember the danger of cognitive bias and the lesson of Lysenko. Marxian notions of heredity are dangerous because, although scientifically incorrect, they appeal to our egalitarian desires.
A commission arrived in Sarov one day to make sure everyone agreed with Soviet agronomist Trofim Lysenko's Marxian notions of heredity, which Stalin had endorsed. Sakharov expressed his belief in Mendelian genetics instead. The commission let the heresy pass, he writes, because of his “position and reputation at the Installation,” but the outspoken experimentalist Lev Altshuler, who similarly repudiated Lysenko, did not fare so well ...
The transmission of crucial memes from Szilard to Sakharov, across the Iron Curtain.
Andrei Sakharov stopped by Victor Adamsky's office at Sarov one day in 1961 to show him a story. It was Leo Szilard's short fiction “My Trial as a War Criminal,” one chapter of his book The Voice of the Dolphins, published that year in the US. “I'm not strong in English,” Adamsky says, “but I tried to read it through. A number of us discussed it. It was about a war between the USSR and the USA, a very devastating one, which brought victory to the USSR. Szilard and a number of other physicists are put under arrest and then face the court as war criminals for having created weapons of mass destruction. Neither they nor their lawyers could make up a cogent proof of their innocence. We were amazed by this paradox. You can't get away from the fact that we were developing weapons of mass destruction. We thought it was necessary. Such was our inner conviction. But still the moral aspect of it would not let Andrei Dmitrievich and some of us live in peace.” So the visionary Hungarian physicist Leo Szilard, who first conceived of a nuclear chain reaction crossing a London street on a gray Depression morning in 1933, delivered a note in a bottle to a secret Soviet laboratory that contributed to Andrei Sakharov's courageous work of protest that helped bring the US-Soviet nuclear arms race to an end.

Thursday, July 16, 2015

Frontiers in cattle genomics

A correspondent updates us on advances in genomic cattle breeding. See also Genomic Prediction: No Bull and It's all in the gene: cows.
More than a million cattle in the USDA dairy GWAS system (updated with new breeding value predictions weekly), as cost per marker drops exponentially:
The NM$ (Net Merit in units of dollars) utility function for selection is more and more sophisticated (able to avoid bad trade-offs from genetic correlations):
Cheap genotyping has allowed mass testing of cows, and made it practical to use dominance in models and to match up semen and cow for dominance synergies and heterosis (the dominance component is small compared to the additive one, as usual: for milk yield 5-7% dominance variance, 21-35% additive):
[Note: additive heritability for the traits cattle breeders work on is significantly lower than for cognitive ability.]
Matching mates to reduce inbreeding (without specific markers for dominance effects) by looking at predicted ROH:
Identifying recessive lethals and severe diseases:
For humans, see Genetic architecture and predictive modeling of quantitative traits.
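The additive/dominance split mentioned above (dominance variance small relative to additive, as usual) can be illustrated with toy genotype codings. This is a generic illustration of how the two components enter a model, not the breeders' actual evaluation software:

```python
import numpy as np

# minor-allele counts for one locus across six animals
geno = np.array([0, 1, 2, 1, 0, 2])

additive  = geno.astype(float)           # linear in allele count
dominance = (geno == 1).astype(float)    # heterozygote indicator

# phenotype with both components (effect sizes a, d are made up)
a, d = 1.0, 0.3
y = a * additive + d * dominance
print(y.tolist())  # [0.0, 1.3, 2.0, 1.3, 0.0, 2.0]
```

Matching semen to cow for "dominance synergies" amounts to choosing pairings whose offspring are expected to be heterozygous at loci where d is favorable.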

Monday, July 13, 2015

Productive Bubbles

These slides are from one of the best sessions I attended at scifoo. Bill Janeway's perspective was both theoretical and historical, but in addition we had Sam Altman of Y Combinator to discuss Airbnb and other examples of two-way market platforms (Uber, etc.) that may be enjoying speculative bubbles at the moment.

See also Andrew Odlyzko (Caltech '71 ;-) on British railway manias for specific cases of speculative funding of useful infrastructure: here, here and here.

Friday, July 10, 2015

Rustin Cohle: True Detective S1 (HBO)

"I think human consciousness is a tragic misstep in evolution. We became too self-aware. Nature created an aspect of nature separate from itself. We are creatures that should not exist by natural law. We are things that labor under the illusion of having a self; an accretion of sensory experience and feeling, programmed with total assurance that we are each somebody, when in fact everybody is nobody."
"To realize that all your life—you know, all your love, all your hate, all your memory, all your pain—it was all the same thing. It was all the same dream. A dream that you had inside a locked room. A dream about being a person. And like a lot of dreams there's a monster at the end of it."
More quotes. More video.

Matthew McConaughey on the role:

McConaughey as Wooderson in Dazed and Confused:

Monday, July 06, 2015

I call this progress

The tail of the (green) 2000 curve seems slightly off to me: ~10 million individuals with >$100k annual income? (~ $400k per annum for a family of four; but there are many more than 10 million "one percenters" in the US/Europe/Japan/China/etc.)

Via Roger Chen.

Astrophysical Constraints on Dark Energy v2

This is v2 of a draft we posted earlier in the year. The new version has much more detail on whether rotation curve measurements of an isolated dwarf galaxy might be able to constrain the local dark energy density. As we state in the paper (c is the local dark energy density):
In Table V, we simulate the results of measurements on v^2(r) with corresponding error of 1%. We take ρ_0 ∼ 0.2 GeV cm^{-3} and R_s ∼ 0.795 kpc for the dwarf galaxies. We vary the number of satellites N and their (randomly generated) orbital radii. For example, at 95% confidence level, one could bound c to be positive using 5 satellites at r ∼ 1 − 10 kpc. In order to bound c close to its cosmological value, one would need, e.g., at least 5 satellites at r ∼ 10 − 20 kpc or 10 satellites at r ∼ 5 − 15 kpc.
... In Table VI, we simulate the results from measurements on v^2(r), assuming that the corresponding error is 5%. Again, we take ρ_0 ∼ 0.2 GeV cm^{-3} and R_s ∼ 0.795 kpc for the dwarf galaxies. The table indicates that even at the sensitivity of 5%, one could rule out (at 95% confidence level) any Λ that is significantly larger than 1.58×10^{-84} GeV^2 by using, e.g., 5 satellites at r ∼ 1−10 kpc. The very existence of satellites of dwarf galaxies (even those close to the Milky Way, and hence subject to significant tidal forces that limit r) provides an upper limit on the local dark energy density, probably no more than an order of magnitude larger than the cosmological value.
Since we are not real astronomers, it is unclear to us whether measurements of the type described above are pure science fiction or something possible, say, in the next 10-20 years. Multiple conversations with astronomers (and referees) have failed to completely resolve this issue. Note that papers in reference [11] (Swaters et al.) report velocity measurements for satellites of dwarf galaxies at radii ~ 10 kpc with existing technology.
Astrophysical Constraints on Dark Energy

Chiu Man Ho, Stephen D. H. Hsu
(Submitted on 23 Jan 2015 (v1), last revised 3 Jul 2015 (this version, v2))

Dark energy (i.e., a cosmological constant) leads, in the Newtonian approximation, to a repulsive force which grows linearly with distance and which can have astrophysical consequences. For example, the dark energy force overcomes the gravitational attraction from an isolated object (e.g., dwarf galaxy) of mass 10^7 M⊙ at a distance of 23 kpc. Observable velocities of bound satellites (rotation curves) could be significantly affected, and therefore used to measure or constrain the dark energy density. Here, isolated means that the gravitational effect of large nearby galaxies (specifically, of their dark matter halos) is negligible; examples of isolated dwarf galaxies include Antlia or DDO 190.
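The ~23 kpc figure in the abstract can be checked from the Newtonian balance G M / r^2 = (8πG/3) ρ_Λ r (the outward acceleration from a cosmological constant, using ρ + 3p = −2ρ_Λ). The dark energy density below is an assumed ~0.7 × critical density, not a value taken from the paper:

```python
import math

G      = 6.674e-11    # m^3 kg^-1 s^-2
M_sun  = 1.989e30     # kg
kpc    = 3.086e19     # m
rho_de = 6.0e-27      # kg m^-3, roughly 0.7 x critical density (assumption)

M = 1e7 * M_sun       # isolated dwarf galaxy
# repulsive acceleration (8 pi G / 3) rho_de r balances G M / r^2
r = (3.0 * M / (8.0 * math.pi * rho_de)) ** (1.0 / 3.0)
print(r / kpc)        # ~24 kpc, consistent with the quoted 23 kpc
```

Note that G cancels in the balance, so the radius depends only on M and ρ_Λ; beyond r the dwarf cannot bind satellites, which is what makes rotation curves at large radius a probe of the local dark energy density.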
