Information Processing: 07/2015

Tuesday, July 28, 2015

HaploSNPs and missing heritability

By constructing haplotypes using adjacent SNPs the authors arrive at a superior set of genetic variables with which to compute genetic similarity. These haplotypes tag rare variants and seem to recover a significant chunk of heritability not accounted for by common SNPs.

See also ref 32: Yang, J. et al. Estimation of genetic variance from imputed sequence variants reveals negligible missing heritability for human height and body mass index. Nature Genetics, submitted

Haplotypes of common SNPs can explain missing heritability of complex diseases (http://dx.doi.org/10.1101/022418)

While genome-wide significant associations generally explain only a small proportion of the narrow-sense heritability of complex disease (h^2), recent work has shown that more heritability is explained by all genotyped SNPs (h_g^2). However, much of the heritability is still missing (h_g^2 < h^2). For example, for schizophrenia, h^2 is estimated at 0.7-0.8 but h_g^2 is estimated at ~0.3. Efforts at increasing coverage through accurately imputed variants have yielded only small increases in the heritability explained, and poorly imputed variants can lead to assay artifacts for case-control traits. We propose to estimate the heritability explained by a set of haplotype variants (haploSNPs) constructed directly from the study sample (h_hap^2). Our method constructs a set of haplotypes from phased genotypes by extending shared haplotypes subject to the 4-gamete test. In a large schizophrenia data set (PGC2-SCZ), haploSNPs with MAF > 0.1% explained substantially more phenotypic variance (h_hap^2 = 0.64 (S.E. 0.084)) than genotyped SNPs alone (h_g^2 = 0.32 (S.E. 0.029)). These estimates were based on cross-cohort comparisons, ensuring that cohort-specific assay artifacts did not contribute to our estimates. In a large multiple sclerosis data set (WTCCC2-MS), we observed an even larger difference between h_hap^2 and h_g^2, though data from other cohorts will be required to validate this result. Overall, our results suggest that haplotypes of common SNPs can explain a large fraction of missing heritability of complex disease, shedding light on genetic architecture and informing disease mapping strategies.

The excerpt below is my response to an excellent comment by Gwern:

Your summary is correct, AFAIU. Below is a bit more detail about the 4 gamete test, which differentiates between a recombination event (which breaks the haploblock for descendants of that individual; recombination = scrambling due to sexual reproduction) and a simple mutation at that locus. The goal is to impute identical blocks of DNA that are tagged by SNPs on standard chips.

Algorithm to generate haploSNPs

... Given two alleles at the haploSNPs and two at the mismatch SNP, a maximum of four possible allelic combinations can be observed. If all four combinations are observed, this indicates that a recombination event is required to explain the mismatch, and the haploSNP will be terminated. If, however, only three combinations are observed, the mismatch may be explained by a mutation on the shared haplotype background. These mismatches are ignored and the haploSNP is extended further. We note that this approach can produce a very large number of haploSNPs and very long haploSNPs that could tag signals of cryptic relatedness. ...
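
The four-gamete check described in the excerpt can be sketched in a few lines. This is only an illustration under simplifying assumptions, not the authors' implementation: alleles are coded 0/1 per phased chromosome, and the names `four_gametes_observed` and `count_extendable_mismatches` are hypothetical.

```python
def four_gametes_observed(haplosnp_alleles, snp_alleles):
    """Return True if all four allelic combinations (00, 01, 10, 11) appear.

    Both arguments are equal-length sequences of 0/1 alleles, one entry
    per phased chromosome in the sample (an assumed encoding).
    """
    combos = set(zip(haplosnp_alleles, snp_alleles))
    return len(combos) == 4


def count_extendable_mismatches(haplosnp_alleles, mismatch_snps):
    """Count how many mismatch SNPs can be passed before termination.

    mismatch_snps is a list of allele vectors ordered by distance from
    the haploSNP. Extension stops at the first SNP showing all four
    gametes (recombination); SNPs showing three or fewer combinations
    are treated as mutations on the shared background and ignored.
    """
    passed = 0
    for snp_alleles in mismatch_snps:
        if four_gametes_observed(haplosnp_alleles, snp_alleles):
            break  # recombination required: terminate the haploSNP
        passed += 1  # explainable by mutation: keep extending
    return passed


# Toy data: four chromosomes, two of which carry the haploSNP.
carriers = [1, 1, 0, 0]
mutation_like = [1, 0, 0, 0]       # three combinations observed
recombination_like = [1, 0, 1, 0]  # all four combinations observed
```

With the toy data, the first mismatch is skipped and the second terminates the extension.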

>> This estimated heritability is much closer to the full-strength twin study estimates, showing that a lot of the 'missing' heritability is lurking in the rarer SNPs <<

This was already suspected by some researchers (including me), but the haploSNP results provide support for the hypothesis. It means that, e.g., with whole genomes we could potentially recover nearly all the predictive power implied by classical h^2 estimates ...

Sunday, July 26, 2015

Greetings from HK

Meetings with BGI, HKUST, and financiers. Will stop in SV and Seattle (Allen Institute) on the way back.

Thursday, July 23, 2015

Drone Art

I saw this video at one of the Scifoo sessions on drones. Beautiful stuff!

I find this much more pleasing than fireworks. The amount of waste and debris generated by a big fireworks display is horrendous.

Monday, July 20, 2015

What is medicine’s 5 sigma?

Editorial in the Lancet, reflecting on the Symposium on the Reproducibility and Reliability of Biomedical Research held April 2015 by the Wellcome Trust.

What is medicine’s 5 sigma?

... much of the [BIOMEDICAL] scientific literature, perhaps half, may simply be untrue. Afflicted by studies with small sample sizes, tiny effects, invalid exploratory analyses, and flagrant conflicts of interest, together with an obsession for pursuing fashionable trends of dubious importance, [BIOMEDICAL] science has taken a turn towards darkness. As one participant put it, “poor methods get results”. The Academy of Medical Sciences, Medical Research Council, and Biotechnology and Biological Sciences Research Council have now put their reputational weight behind an investigation into these questionable research practices. The apparent endemicity of bad research behaviour is alarming. In their quest for telling a compelling story, scientists too often sculpt data to fit their preferred theory of the world. ...

One of the most convincing proposals came from outside the biomedical community. Tony Weidberg is a Professor of Particle Physics at Oxford. ... the particle physics community ... invests great effort into intensive checking and rechecking of data prior to publication. By filtering results through independent working groups, physicists are encouraged to criticise. Good criticism is rewarded. The goal is a reliable result, and the incentives for scientists are aligned around this goal. Weidberg worried we set the bar for results in biomedicine far too low. In particle physics, significance is set at 5 sigma, a p value of 3 × 10^-7 or 1 in 3·5 million (if the result is not true, this is the probability that the data would have been as extreme as they are). The conclusion of the symposium was that something must be done ...

I once invited a famous evolutionary theorist (MacArthur Fellow) at Oregon to give a talk in my institute, to an audience of physicists, theoretical chemists, mathematicians and computer scientists. The Q&A was, from my perspective, friendly and lively. A physicist of Hungarian extraction politely asked the visitor whether his models could ever be falsified, given the available field (ecological) data. I was shocked that he seemed shocked to be asked such a question. Later I sent an email thanking the speaker for his visit and suggesting he come again some day. He replied that he had never been subjected to such aggressive and painful attack and that he would never come back. Which community of scientists is more likely to produce replicable results?

See also Medical Science? and Is Science Self-Correcting?

To answer the question posed in the title of the post / editorial, an example of a statistical threshold which is sufficient for high confidence of replication is the p < 5 x 10^{-8} significance requirement in GWAS. This is basically the traditional p < 0.05 threshold corrected for multiple testing of 10^6 SNPs. Early "candidate gene" studies which did not impose this correction have very low replication rates. See comment below for what this implies about the validity of priors based on biological intuition.
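
Both thresholds discussed in this post can be checked with a little arithmetic. The sketch below assumes a one-sided Gaussian tail for the 5 sigma figure and a plain Bonferroni correction for GWAS; the function name `one_sided_p` is just illustrative.

```python
import math


def one_sided_p(n_sigma):
    """One-sided tail probability of a standard normal beyond n_sigma."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))


# Particle physics convention: 5 sigma.
p_five_sigma = one_sided_p(5.0)
odds_five_sigma = 1.0 / p_five_sigma  # "1 in N" form

# GWAS convention: p < 0.05 divided by ~10^6 independent tests.
alpha = 0.05
n_tests = 1e6
gwas_threshold = alpha / n_tests

print(f"5 sigma one-sided p = {p_five_sigma:.2e}")
print(f"1 in {odds_five_sigma / 1e6:.1f} million")
print(f"GWAS threshold = {gwas_threshold:.0e}")
```

The 5 sigma tail comes out near 3 x 10^-7 (about 1 in 3.5 million), and the corrected GWAS threshold is 5 x 10^-8, so the two communities' bars are within an order of magnitude of each other.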

I discuss this a bit with John Ioannidis in the video below.

Sunday, July 19, 2015

Technically Sweet

Regular readers will know that I've been interested in the so-called Teller-Ulam mechanism used in thermonuclear bombs. Recently I read Kenneth Ford's memoir Building the H Bomb: A Personal History. Ford was a student of John Wheeler, who brought him to Los Alamos to work on the H-bomb project. This led me to look again at Richard Rhodes's Dark Sun: The Making of the Hydrogen Bomb. There is quite a lot of interesting material in these two books on the specific contributions of Ulam and Teller, and whether the Soviets came up with the idea themselves, or had help from spycraft. See also Sakharov's Third Idea and F > L > P > S.

The power of a megaton device is described below by a witness to the Soviet test.

The Soviet Union tested a two-stage, lithium-deuteride-fueled thermonuclear device on November 22, 1955, dropping it from a Tu-16 bomber to minimize fallout. It yielded 1.6 megatons, a yield deliberately reduced for the Semipalatinsk test from its design yield of 3 MT. According to Yuri Romanov, Andrei Sakharov and Yakov Zeldovich worked out the Teller-Ulam configuration in conversations together in early spring 1954, independently of the US development. “I recall how Andrei Dmitrievich gathered the young associates in his tiny office,” Romanov writes, “… and began talking about the amazing ability of materials with a high atomic number to be an excellent reflector of high-intensity, short-pulse radiation.” ...

Victor Adamsky remembers the shock wave from the new thermonuclear racing across the steppe toward the observers. “It was a front of moving air that you could see that differed in quality from the air before and after. It came, it was really terrible; the grass was covered with frost and the moving front thawed it, you felt it melting as it approached you.” Igor Kurchatov walked in to ground zero with Yuli Khariton after the test and was horrified to see the earth cratered even though the bomb had detonated above ten thousand feet. “That was such a terrible, monstrous sight,” he told Anatoli Alexandrov when he returned to Moscow. “That weapon must not be allowed ever to be used.”

The Teller-Ulam design uses radiation pressure (reflected photons) from a spherical fission bomb to compress the thermonuclear fuel. The design is (to quote Oppenheimer) "technically sweet" -- a glance at the diagram below should convince anyone who understands geometrical optics!

In discussions of human genetic engineering (clearly a potentially dangerous future technology), the analogy with nuclear weapons sometimes arises: what role do moral issues play in the development of new technologies with the potential to affect the future of humanity? In my opinion, genetic engineering of humans carries nothing like the existential risk of arsenals of Teller-Ulam devices. Genomic consequences will play out over long (generational) timescales, leaving room for us to assess outcomes and adapt accordingly. (In comparison, genetic modification of viruses, which could lead to pandemics, seems much more dangerous.)

It is my judgment in these things that when you see something that is technically sweet, you go ahead and do it and you argue about what to do about it only after you have had your technical success. -- Oppenheimer on the Teller-Ulam design for the H-bomb.

What is technically sweet about genomics? (1) the approximate additivity (linearity) of the genetic architecture of key traits such as human intelligence (2) the huge amounts of extant variance in the human population, enabling large improvements (3) matrices of human genomes are good compressed sensors, and one can estimate how much data is required to "solve" the genetic architecture of complex traits. See, e.g., Genius (Nautilus Magazine) and Genetic architecture and predictive modeling of quantitative traits.

More excerpts from Dark Sun below.

Enthusiasts of trans-generational epigenetics would do well to remember the danger of cognitive bias and the lesson of Lysenko. Marxian notions of heredity are dangerous because, although scientifically incorrect, they appeal to our egalitarian desires.

A commission arrived in Sarov one day to make sure everyone agreed with Soviet agronomist Trofim Lysenko's Marxian notions of heredity, which Stalin had endorsed. Sakharov expressed his belief in Mendelian genetics instead. The commission let the heresy pass, he writes, because of his “position and reputation at the Installation,” but the outspoken experimentalist Lev Altshuler, who similarly repudiated Lysenko, did not fare so well ...

The transmission of crucial memes from Szilard to Sakharov, across the Iron Curtain.

Andrei Sakharov stopped by Victor Adamsky's office at Sarov one day in 1961 to show him a story. It was Leo Szilard's short fiction “My Trial as a War Criminal,” one chapter of his book The Voice of the Dolphins, published that year in the US. “I'm not strong in English,” Adamsky says, “but I tried to read it through. A number of us discussed it. It was about a war between the USSR and the USA, a very devastating one, which brought victory to the USSR. Szilard and a number of other physicists are put under arrest and then face the court as war criminals for having created weapons of mass destruction. Neither they nor their lawyers could make up a cogent proof of their innocence. We were amazed by this paradox. You can't get away from the fact that we were developing weapons of mass destruction. We thought it was necessary. Such was our inner conviction. But still the moral aspect of it would not let Andrei Dmitrievich and some of us live in peace.” So the visionary Hungarian physicist Leo Szilard, who first conceived of a nuclear chain reaction crossing a London street on a gray Depression morning in 1933, delivered a note in a bottle to a secret Soviet laboratory that contributed to Andrei Sakharov's courageous work of protest that helped bring the US-Soviet nuclear arms race to an end.

Thursday, July 16, 2015

Frontiers in cattle genomics

A correspondent updates us on advances in genomic cattle breeding. See also Genomic Prediction: No Bull and It's all in the gene: cows.

More than a million cattle in the USDA dairy GWAS system (updated with new breeding value predictions weekly), as cost per marker drops exponentially: https://www.cdcb.us/Genotype/cur_freq.html

The NM$ (Net Merit in units of dollars) utility function for selection is more and more sophisticated (able to avoid bad trade-offs from genetic correlations): http://www.ars.usda.gov/research/publications/publications.htm?SEQ_NO_115=310013

Cheap genotyping has allowed mass testing of cows, and made it practical to use dominance in models and to match up semen and cow for dominance synergies and heterosis (the dominance component is small compared to the additive one, as usual: for milk yield 5-7% dominance variance, 21-35% additive): http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0103934

[Note: additive heritability for the traits cattle breeders work on is significantly lower than for cognitive ability.]

Matching mates to reduce inbreeding (without specific markers for dominance effects) by looking at predicted ROH: http://www.ars.usda.gov/research/publications/publications.htm?SEQ_NO_115=294115

Identifying recessive lethals and severe diseases: http://aipl.arsusda.gov/reference/recessive_haplotypes_ARR-G3.html http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0054872

For humans, see Genetic architecture and predictive modeling of quantitative traits.

Monday, July 13, 2015

Productive Bubbles

These slides are from one of the best sessions I attended at Scifoo. Bill Janeway's perspective was both theoretical and historical, but in addition we had Sam Altman of Y Combinator to discuss Airbnb and other examples of two-way market platforms (Uber, etc.) that may be enjoying speculative bubbles at the moment.

See also Andrew Odlyzko (Caltech '71 ;-) on British railway manias for specific cases of speculative funding of useful infrastructure: here, here and here.

Friday, July 10, 2015

Rustin Cohle: True Detective S1 (HBO)

"I think human consciousness is a tragic misstep in evolution. We became too self-aware. Nature created an aspect of nature separate from itself. We are creatures that should not exist by natural law. We are things that labor under the illusion of having a self; an accretion of sensory experience and feeling, programmed with total assurance that we are each somebody, when in fact everybody is nobody."

"To realize that all your life—you know, all your love, all your hate, all your memory, all your pain—it was all the same thing. It was all the same dream. A dream that you had inside a locked room. A dream about being a person. And like a lot of dreams there's a monster at the end of it."

More quotes. More video.

Matthew McConaughey on the role:

McConaughey as Wooderson in Dazed and Confused:

Monday, July 06, 2015

I call this progress

The tail of the (green) 2000 curve seems slightly off to me: ~10 million individuals with >$100k annual income? (~ $400k per annum for a family of four; but there are many more than 10 million "one percenters" in the US/Europe/Japan/China/etc.)

Via Roger Chen.

Astrophysical Constraints on Dark Energy v2

This is v2 of a draft we posted earlier in the year. The new version has much more detail on whether rotation curve measurements of an isolated dwarf galaxy might be able to constrain the local dark energy density. As we state in the paper (c is the local dark energy density):

In Table V, we simulate the results of measurements on v^2(r) with corresponding error of 1%. We take ρ_0 ∼ 0.2 GeV cm^-3 and R_s ∼ 0.795 kpc for the dwarf galaxies. We vary the number of satellites N and their (randomly generated) orbital radii. For example, at 95% confidence level, one could bound c to be positive using 5 satellites at r ∼ 1 − 10 kpc. In order to bound c close to its cosmological value, one would need, e.g., at least 5 satellites at r ∼ 10 − 20 kpc or 10 satellites at r ∼ 5 − 15 kpc.

... In Table VI, we simulate the results from measurements on v^2(r), assuming that the corresponding error is 5%. Again, we take ρ_0 ∼ 0.2 GeV cm^-3 and R_s ∼ 0.795 kpc for the dwarf galaxies. The table indicates that even at the sensitivity of 5%, one could rule out (at 95% confidence level) any Λ that is significantly larger than 1.58 × 10^-84 GeV^2 by using, e.g., 5 satellites at r ∼ 1 − 10 kpc. The very existence of satellites of dwarf galaxies (even those close to the Milky Way, and hence subject to significant tidal forces that limit r) provides an upper limit on the local dark energy density, probably no more than an order of magnitude larger than the cosmological value.

Since we are not real astronomers, it is unclear to us whether measurements of the type described above are pure science fiction or something possible, say, in the next 10-20 years. Multiple conversations with astronomers (and referees) have failed to completely resolve this issue. Note that papers in reference [11] (Swaters et al.) report velocity measurements for satellites of dwarf galaxies at radii ~ 10 kpc with existing technology.

Astrophysical Constraints on Dark Energy

Chiu Man Ho, Stephen D. H. Hsu
(Submitted on 23 Jan 2015 (v1), last revised 3 Jul 2015 (this version, v2))

Dark energy (i.e., a cosmological constant) leads, in the Newtonian approximation, to a repulsive force which grows linearly with distance and which can have astrophysical consequences. For example, the dark energy force overcomes the gravitational attraction from an isolated object (e.g., dwarf galaxy) of mass 10^7 M⊙ at a distance of 23 kpc. Observable velocities of bound satellites (rotation curves) could be significantly affected, and therefore used to measure or constrain the dark energy density. Here, isolated means that the gravitational effect of large nearby galaxies (specifically, of their dark matter halos) is negligible; examples of isolated dwarf galaxies include Antlia or DDO 190.
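
The distance quoted in the abstract can be roughly reproduced by balancing the two Newtonian accelerations. This is a back-of-the-envelope sketch, not the calculation in the paper: it treats the dwarf galaxy as a point mass, and the cosmological parameters (H0 = 67.7 km/s/Mpc, Ω_Λ = 0.69) are assumed values chosen for illustration.

```python
import math

# Physical constants (SI units).
G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30     # solar mass, kg
KPC = 3.086e19       # kiloparsec, m
MPC = 3.086e22       # megaparsec, m

# Assumed cosmological parameters (not taken from the paper).
H0 = 67.7e3 / MPC    # Hubble constant, s^-1
OMEGA_LAMBDA = 0.69  # dark energy fraction of the critical density


def dark_energy_density():
    """Dark energy mass density in kg/m^3, as a fraction of critical."""
    rho_crit = 3.0 * H0**2 / (8.0 * math.pi * G)
    return OMEGA_LAMBDA * rho_crit


def balance_radius_kpc(mass_in_solar_masses):
    """Radius where repulsion from dark energy equals point-mass gravity.

    Newtonian limit: a(r) = -G M / r^2 + (8 pi G / 3) rho_L r,
    which vanishes at r^3 = 3 M / (8 pi rho_L).
    """
    mass = mass_in_solar_masses * M_SUN
    r_cubed = 3.0 * mass / (8.0 * math.pi * dark_energy_density())
    return r_cubed ** (1.0 / 3.0) / KPC


r_dwarf = balance_radius_kpc(1e7)
print(f"balance radius for 10^7 solar masses: {r_dwarf:.1f} kpc")
```

With these assumed inputs the balance radius lands in the low 20s of kpc, in line with the figure in the abstract. Because the radius scales as M^(1/3), a galaxy 1000 times more massive only pushes it out by a factor of 10.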

Friday, July 03, 2015

Humans on AMC

This is a new AMC series, done in collaboration with Channel 4 in the UK. I just watched the first episode and it is really good.

Directional dominance on stature and cognition

Interesting results in this recent Nature article. The dominance effect is quite strong: the equivalent of first cousin inbreeding (offspring inbreeding coefficient ~ 1/16) results in a decrease in height or cognitive ability of about 1/6 or 1/3 of an SD, respectively. That means the effect from alleles which depress the trait increases by significantly more than 2x in the homozygous (AA) as opposed to heterozygous (aA) case.

Directional dominance on stature and cognition in diverse human populations
(Nature July 2015; doi:10.1038/nature14618)

Homozygosity has long been associated with rare, often devastating, Mendelian disorders [1], and Darwin was one of the first to recognize that inbreeding reduces evolutionary fitness [2]. However, the effect of the more distant parental relatedness that is common in modern human populations is less well understood. Genomic data now allow us to investigate the effects of homozygosity on traits of public health importance by observing contiguous homozygous segments (runs of homozygosity), which are inferred to be homozygous along their complete length. Given the low levels of genome-wide homozygosity prevalent in most human populations, information is required on very large numbers of people to provide sufficient power [3, 4]. Here we use runs of homozygosity to study 16 health-related quantitative traits in 354,224 individuals from 102 cohorts, and find statistically significant associations between summed runs of homozygosity and four complex traits: height, forced expiratory lung volume in one second, general cognitive ability and educational attainment (P < 1 × 10^-300, 2.1 × 10^-6, 2.5 × 10^-10 and 1.8 × 10^-10, respectively). In each case, increased homozygosity was associated with decreased trait value, equivalent to the offspring of first cousins being 1.2 cm shorter and having 10 months’ less education. Similar effect sizes were found across four continental groups and populations with different degrees of genome-wide homozygosity, providing evidence that homozygosity, rather than confounding, directly contributes to phenotypic variance. Contrary to earlier reports in substantially smaller samples [5, 6], no evidence was seen of an influence of genome-wide homozygosity on blood pressure and low density lipoprotein cholesterol, or ten other cardio-metabolic traits. Since directional dominance is predicted for traits under directional evolutionary selection [7], this study provides evidence that increased stature and cognitive function have been positively selected in human evolution, whereas many important risk factors for late-onset complex diseases may not have been.

From the paper:

... After exclusion of outliers, these effect sizes translate into a reduction of 1.2 cm in height and 137 ml in FEV1 for the offspring of first cousins, and into a decrease of 0.3 s.d. in g and 10 months’ less educational attainment.

These results support the claim that height and cognitive ability have been under positive selection in humans / hominids, so that causal variants tend to be rare and deleterious. For related discussion, see, e.g., section 3.1 in my article On the genetic architecture of intelligence and other quantitative traits and earlier post Deleterious variants affecting traits that have been under selection are rare and of small effect.
