Last updated: 2019-07-31

In this document I summarize my current progress analyzing the latest GWAS results. First I describe the dataset(s) I’m working with. Then I briefly explain how torus does enrichment analysis (and what I mean by “enrichment analysis”) and go over the results of that analysis. Finally I give a brief overview of susie and show some fine-mapping results.



A few numbers about the GWAS summary statistics:

  • 14991824 variants
  • top hit has \(p = 3.036 \times 10^{-26}\)
  • variants with \(p < 10^{-8}\) on chromosomes 1, 3 and 5

Enrichment analysis and fine mapping pipeline overview

  1. Estimate the relationship between a particular genomic annotation (e.g. ATAC-seq peaks) and GWAS significance genome-wide (torus).
  2. Use the enrichment estimate to specify a per-variant prior.
  3. Using the prior and a reference LD panel, identify putative causal variants (susie).
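Step 2 can be illustrated with a minimal sketch, assuming the enrichment model is logistic in the annotation (log-odds of being causal = intercept + enrichment × annotation indicator). The coefficient values and the names `alpha0`, `alpha1` and `in_peak` are made up for illustration and are not estimates from this analysis.

```r
# Hypothetical enrichment estimates on the log-odds scale (made-up values).
alpha0 <- -8   # baseline log-odds that a variant is causal
alpha1 <- 2    # log-odds enrichment for variants inside the annotation

# Toy annotation indicator: 1 if the variant overlaps a peak, 0 otherwise.
in_peak <- c(0, 1, 0, 0, 1)

# Per-variant prior probability of being causal under the logistic model.
prior <- plogis(alpha0 + alpha1 * in_peak)

# Within one chunk, rescale to weights that sum to 1.
prior_weights <- prior / sum(prior)
```

Annotated variants end up with a larger share of the prior mass in their chunk; the odds ratio between an annotated and an unannotated variant is exp(alpha1).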

For both the enrichment analysis and the fine mapping, the genome is broken into chunks based on the approximately independent LD blocks determined by ldetect. These blocks are then split into chunks no greater than 50000 and no less than data_config$min_snp. Torus (enrichment) assumes that there is at most 1 causal variant per chunk, and susie (fine mapping) assumes that there are at most \(L\) causal variants per chunk, where \(L\) is a tunable parameter. I ran susie with 3 values of \(L\): 1, 3 and 10. Because of these assumptions, chunk size is an important parameter for both fine mapping and enrichment analysis.
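The splitting step can be sketched as follows, assuming both bounds are SNP counts (the text does not state the unit). The function `split_block` and the names `max_size` and `min_size` are hypothetical, and the sketch only enforces the upper bound: it reports chunks below the lower bound rather than guessing how they should be handled.

```r
# Assign each SNP in an LD block to a sub-chunk of at most `max_size` SNPs.
split_block <- function(n_snp, max_size) {
  n_chunks   <- ceiling(n_snp / max_size)  # fewest chunks that respect the cap
  chunk_size <- ceiling(n_snp / n_chunks)  # spread SNPs roughly evenly
  ceiling(seq_len(n_snp) / chunk_size)     # chunk index for each SNP
}

chunk_id    <- split_block(n_snp = 120000, max_size = 50000)
chunk_sizes <- as.vector(table(chunk_id))

min_size  <- 1000                          # stand-in for data_config$min_snp
too_small <- which(chunk_sizes < min_size) # chunks violating the lower bound
```

In this toy case a block of 120000 SNPs is split into three chunks of 40000 SNPs each, and none falls below the lower bound.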

Epigenomic Data

Noboru has provided me with BED file annotations of the genome generated from the following experiments performed on the cell lines:

  1. ATAC-seq
  2. H3K4me1 ChIP
  3. H3K4me3 ChIP
  4. H3K27ac ChIP

Within these categories there are three subcategories:

  1. Per-cell-line. Each cell line was either a control (ctr) or underwent decidualization (dec).
  2. Annotations consistent across the three control or decidualized cell line samples.
  3. Differential peaks. Peaks that have increased read counts in decidualized over control.

In addition I have:

  • Endometrial eQTL (Ober Lab)
  • Hi-C from one cell line (hic_all_interacting_DT1_dTL4_D_48h)
  • Annotation predicting repressive regions (from Hoffman et al.)

All the univariate results

Below are the univariate enrichment results. For each epigenomic dataset on its own, I ran torus and obtained an effect size and standard error for the enrichment of GWAS hits in that dataset. These are plotted below.
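As a reading aid for effect-size-and-standard-error output of this kind, here is a minimal sketch that turns each estimate into a Wald z-score, a two-sided p-value and a 95% confidence interval. It assumes the estimates are on a log-odds scale and approximately normal; the annotation names and numbers are made up.

```r
# Made-up univariate estimates; names and values are hypothetical.
enrich <- data.frame(
  annotation = c("annot_A", "annot_B", "annot_C"),
  estimate   = c(1.8, 0.4, -0.2),
  se         = c(0.5, 0.6, 0.3)
)

enrich$z       <- enrich$estimate / enrich$se                # Wald z-score
enrich$p       <- 2 * pnorm(-abs(enrich$z))                  # two-sided p-value
enrich$ci_low  <- enrich$estimate - qnorm(0.975) * enrich$se # 95% CI, lower
enrich$ci_high <- enrich$estimate + qnorm(0.975) * enrich$se # 95% CI, upper
enrich$signif  <- enrich$ci_low > 0 | enrich$ci_high < 0     # CI excludes zero
```

An annotation whose interval excludes zero shows evidence of enrichment (or depletion, if the estimate is negative) at the 5% level.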

Multivariate effect size estimates

Below are the multivariate effect size estimates for 5 features that came out of a forward selection.

Note that not all of these features are significant when each is fit on its own in a univariate model.
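For readers unfamiliar with forward selection, here is a generic sketch of the greedy idea: start from an intercept-only model and repeatedly add the feature that most improves the fit, stopping when the gain is small. It uses a plain logistic regression scored by AIC on simulated data as a stand-in; the selection criterion actually used with torus is not described here, and `forward_select`, `min_gain` and the `annot*` columns are hypothetical.

```r
set.seed(1)
n   <- 2000
x   <- matrix(rbinom(n * 4, 1, 0.3), n, 4,
              dimnames = list(NULL, paste0("annot", 1:4)))
# Only annot1 and annot2 have a true effect in this simulation.
y   <- rbinom(n, 1, plogis(-2 + 1.5 * x[, 1] + 1.0 * x[, 2]))
dat <- data.frame(y = y, x)

forward_select <- function(dat, response, candidates, min_gain = 2) {
  selected <- character(0)
  current  <- AIC(glm(reformulate("1", response), data = dat, family = binomial))
  repeat {
    remaining <- setdiff(candidates, selected)
    if (length(remaining) == 0) break
    # AIC of the current model plus each remaining candidate, one at a time.
    aics <- sapply(remaining, function(f) {
      AIC(glm(reformulate(c(selected, f), response), data = dat, family = binomial))
    })
    if (current - min(aics) < min_gain) break  # no candidate helps enough
    selected <- c(selected, names(which.min(aics)))
    current  <- min(aics)
  }
  selected
}

selected <- forward_select(dat, "y", paste0("annot", 1:4))
```

Because each feature is judged conditional on those already chosen, a feature can be selected in the joint model without being significant on its own, and vice versa.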

Fine mapping results

Each SNP that underwent fine mapping has a \(p\)-value, a prior, and a posterior inclusion probability, or pip, which is the estimated probability that the SNP is a causal variant (i.e. that its effect size estimate is not a sample from the null distribution).
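To make the roles of the prior and the data concrete, consider the simplest case, assuming exactly one causal variant in the region (a simplification of what susie fits when \(L > 1\)). Writing \(\pi_i\) for the prior weight of SNP \(i\) (with \(\sum_j \pi_j = 1\)) and \(\mathrm{BF}_i\) for the Bayes factor comparing "SNP \(i\) has a nonzero effect" against the null, the pip is:

```latex
\[
\mathrm{pip}_i
  = \Pr(\text{SNP } i \text{ is causal} \mid \text{data})
  = \frac{\pi_i \, \mathrm{BF}_i}{\sum_j \pi_j \, \mathrm{BF}_j}
\]
```

The symbols \(\pi_i\) and \(\mathrm{BF}_i\) are notation introduced here for illustration. With a uniform prior the \(\pi_i\) cancel and the pip depends only on the Bayes factors.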

Below is an example of what this looks like for a region

Susie plots of top regions

What is the effect of the prior?

How does the pip change as a consequence of the prior? I ran susie using a prior derived from the enrichment model and compared the result to a run with a uniform prior.
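The mechanism can be seen in a toy calculation. This is not susie itself: it is a minimal sketch that assumes exactly one causal variant in the region, uses Wakefield-style approximate Bayes factors, and sets pip proportional to prior weight times Bayes factor. The function `single_effect_pip`, the prior variance of 0.04, and all z-scores and coefficients are made up.

```r
# pips under a one-causal-variant assumption (hypothetical helper).
single_effect_pip <- function(z, se, prior_weights, prior_var = 0.04) {
  v      <- se^2
  # Log approximate Bayes factor for each SNP having a nonzero effect.
  log_bf <- 0.5 * log(v / (v + prior_var)) +
    0.5 * z^2 * prior_var / (v + prior_var)
  w <- prior_weights * exp(log_bf - max(log_bf))  # subtract max for stability
  w / sum(w)
}

z  <- c(0.5, 4.9, 5.0, 1.1)   # toy z-scores; SNPs 2 and 3 are nearly tied
se <- rep(0.05, 4)            # toy standard errors

uniform_prior    <- rep(1 / 4, 4)
enrichment_prior <- plogis(-8 + 2 * c(0, 1, 0, 0))  # SNP 2 sits in an annotation
enrichment_prior <- enrichment_prior / sum(enrichment_prior)

comparison <- data.frame(
  snp            = seq_along(z),
  pip_uniform    = single_effect_pip(z, se, uniform_prior),
  pip_enrichment = single_effect_pip(z, se, enrichment_prior)
)
comparison$pip_change <- comparison$pip_enrichment - comparison$pip_uniform
```

In this toy region the uniform prior favors SNP 3, which has the slightly larger z-score, while the enrichment-derived prior shifts most of the posterior mass to the annotated SNP 2.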

And here’s another perspective on the same data

R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Manjaro Linux

Matrix products: default
BLAS/LAPACK: /usr/lib/

 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] tidyselect_0.2.5   RSSp_0.9.0.9000    ldmap_0.0.0.9000  
 [4] daprcpp_1.0.0.9000 ldshrink_1.0-1     bigsnpr_0.11.5    
 [7] bigstatsr_0.9.9    vroom_1.0.2.9000   RSQLite_2.1.1     
[10] glue_1.3.1         drake_7.4.0.9000   fs_1.3.1          
[13] susieR_0.8.1.0525  here_0.1           forcats_0.4.0     
[16] stringr_1.4.0      dplyr_0.8.3        purrr_0.3.2       
[19] readr_1.3.1        tidyr_0.8.3        tibble_2.1.3      
[22] tidyverse_1.2.1    dbplyr_1.4.2       MonetDBLite_0.6.1 
[25] plotly_4.9.0       ggplot2_3.2.0      gtable_0.3.0      
[28] gridExtra_2.3      scales_1.0.0      

loaded via a namespace (and not attached):
 [1] colorspace_1.4-1       RcppEigen_0.    rprojroot_1.3-2       
 [4] XVector_0.24.0         GenomicRanges_1.36.0   rstudioapi_0.10       
 [7] DT_0.7                 bit64_0.9-7            lubridate_1.7.4       
[10] xml2_1.2.0             codetools_0.2-16       knitr_1.23            
[13] jsonlite_1.6           workflowr_1.4.0        broom_0.5.2           
[16] shiny_1.3.2            compiler_3.6.1         httr_1.4.0            
[19] backports_1.1.4        assertthat_0.2.1       Matrix_1.2-17         
[22] lazyeval_0.2.2         cli_1.1.0              later_0.8.0           
[25] htmltools_0.3.6        tools_3.6.1            igraph_1.2.4.1        
[28] GenomeInfoDbData_1.2.1 reshape2_1.4.3         Rcpp_1.0.2            
[31] cellranger_1.1.0       nlme_3.1-140           iterators_1.0.10      
[34] crosstalk_1.0.0        wavethresh_4.6.8       xfun_0.7              
[37] rvest_0.3.4            mime_0.7               zlibbioc_1.30.0       
[40] MASS_7.3-51.4          hms_0.4.2              promises_1.0.1        
[43] parallel_3.6.1         yaml_2.2.0             memoise_1.1.0         
[46] stringi_1.4.3          highr_0.8              S4Vectors_0.22.0      
[49] foreach_1.4.4          BiocGenerics_0.30.0    filelock_1.0.2        
[52] storr_1.2.2            GenomeInfoDb_1.20.0    rlang_0.4.0.9000      
[55] pkgconfig_2.0.2        bitops_1.0-6           evaluate_0.14         
[58] lattice_0.20-38        htmlwidgets_1.3        labeling_0.3          
[61] cowplot_1.0.0          bit_1.1-14             plyr_1.8.4            
[64] magrittr_1.5           R6_2.4.0               IRanges_2.18.1        
[67] generics_0.0.2         base64url_1.4          txtq_0.1.3            
[70] DBI_1.0.0              pillar_1.4.2           haven_2.1.0           
[73] whisker_0.3-2          withr_2.1.2            RCurl_1.95-4.12       
[76] modelr_0.1.4           crayon_1.3.4           rmarkdown_1.13        
[79] grid_3.6.1             readxl_1.3.1           data.table_1.12.2     
[82] blob_1.1.1             git2r_0.26.1           digest_0.6.20         
[85] xtable_1.8-4           httpuv_1.5.1           RcppParallel_4.4.3    
[88] stats4_3.6.1           munsell_0.5.0          viridisLite_0.3.0