Table of Contents
Abstract
Acknowledgments
Table of Contents
Introduction
Content Attribution
Natural selection is a fundamental biological force
Computational genomics can be used to quantitatively infer natural selection in action
Part I: The assumptions of dN/dS
The fitness effects of synonymous mutations are dwarfed by those of nonsynonymous mutations
Selection has indirect effects on fitness-neutral sites
Sequencing is a mature technology that falters in special cases
Uncertainty about mutation generation rates would, if large, invalidate dN/dS
Part II: Selection in the noncoding genome
Part III: dN/dS is insufficient for Personalized Medicine
Natural selection would seem not to be inferable in a single individual
The billions of cells present in a single tumor allow natural selection to be inferred within that tumor
Simulations are a well-established tool in bioinformatics generally and in tumor evolution specifically
Statement of Purpose
Part I: Estimate the uncertainty in our knowledge of human mutation generation rates using variants shared between the germline and somatic settings
Part II: Devise a new measure to quantify evolutionary selection pressure in noncoding regions
Part III: Benchmark a framework for estimating the fitness effects of individual mutations from individual tumors
Methods
Part I
Overall approach to estimating our uncertainty in mutation generation rates
The number of mutations that arise independently in multiple samples can be used to estimate the total implied heterogeneity of mutation generation rates
Simulations were used to benchmark this approach
A new statistic to quantify the portion of explainable heterogeneity
The main databases used were large, high-quality public databases of somatic and germline variants
Variants were partitioned into nucleotide contexts and genomic regions to apply my statistical framework
Part II
Developing an analogue of dN/dS for the noncoding genome
Applying dC/dU to detect negative selection in the noncoding genome in cancer
Part III
Simulations for benchmarking a new evolutionary tool
Tumors were simulated as a stochastic, time-branching process
Results
Part I
Simulations indicate that the approach is well-powered
Mutation rates are sufficiently heterogeneous to result in three times as many recurrent variants as expected by chance
Nucleotide context is a major determinant of variants shared between the soma and germline
Genomic region is a minor determinant of variants shared between the soma and germline
Part II
Trace levels of negative selection in the cancer noncoding genome generally
Higher signals of negative selection in the most critical regions of the noncoding genome
Part III
Overall, our approach was able to infer information about the identity and fitness impact of subclonal drivers in simulated tumors
Challenges & Troubleshooting
Discussion
Part I
Part II
Part III
References

