Last updated: 2017-04-02

Code version: 91bc83d

This notebook shows the SVD of the DeepSea response matrix.

library(ggplot2)

The mean was subtracted from each column before singular values were computed.

Loading data

evals_df <- readRDS("../data/DeepSea_evals_df.RDS")
deepsea_pca_df <- readRDS("../data/DeepSeaPCA_df.RDS")
ggplot(evals_df,aes(x=ind,y=evals/sum(evals)))+geom_point()+ggtitle("Scree plot")+xlab("Rank")+ylab("Eigenvalues")

The Scree plot shows us that the first two eigenvalues (PC’s) contain a significant amount of the variance, but that there’s no sharp dropoff.

PCA plots

ggplot(deepsea_pca_df,aes(x=PC_1,y=PC_2,col=Source))+geom_point()+ggtitle("PC 1  1 vs PC 2")

Plot by annotation types (TF Binding/DNase/Histone Mark). See to_annotation_type function here

library(stringr)
source('../../cell_type/yanyu_lib.R')
aucs_danq <- read.table('../data/aucs.with.tissue.txt', sep = '\t', header = T, fill = T)
aucs_danq$DeepSEA.ROC.AUC <- as.numeric(as.character(aucs_danq$DeepSEA.ROC.AUC))
Warning: NAs introduced by coercion
aucs_danq <- aucs_danq[!is.na(aucs_danq$DeepSEA.ROC.AUC),]
deepsea_pca_df$AnnotationType <- to_annotation_type(aucs_danq$TF.DNase.HistoneMark)
deepsea_pca_df$ROC.AUC <- aucs_danq$DeepSEA.ROC.AUC
deepsea_pca_df$PR.AUC <- as.numeric(as.character(aucs_danq$DeepSEA.PR.AUC))
cleaned_tissue <- as.character(aucs_danq$Tissue)
cleaned_tissue[!cleaned_tissue %in% names(table(cleaned_tissue))[table(cleaned_tissue) > 30]] <- 'Minor'
deepsea_pca_df$Tissue <- cleaned_tissue
cleaned_karyotype <- as.character(aucs_danq$Karyotype)
cleaned_karyotype[cleaned_karyotype == ''] <- 'unknown'
deepsea_pca_df$Karyotype <- cleaned_karyotype
ggplot(deepsea_pca_df,aes(x=PC_1,y=PC_2,col=AnnotationType))+geom_point()+ggtitle("PC 1  1 vs PC 2")

PC1 and PC2 separate Histone marks, TF binding, and DNase.

ggplot(deepsea_pca_df,aes(x=PC_1,y=PC_2,col=Tissue))+geom_point()+ggtitle("PC 1  1 vs PC 2")

ggplot(deepsea_pca_df,aes(x=PC_1,y=PC_2,col=Karyotype))+geom_point()+ggtitle("PC 1  1 vs PC 2")

Code

to_annotation_type
function(x){
  histone.ind <- sapply(x, function(x){
    temp <- str_extract_all(pattern = '^H[0-9]', x)[[1]]
    if(length(temp) != 0){
      return(TRUE)
    }else{
      return(FALSE)
    }
  })
  dnase.ind <- x == 'DNase'
  re <- rep('TF', length(x))
  re[histone.ind] <- 'Histone'
  re[dnase.ind] <- 'DNase'
  return(re)
}

Session information

sessionInfo()
R version 3.3.3 (2017-03-06)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X Yosemite 10.10.5

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] stringr_1.2.0 ggplot2_2.2.1 rmarkdown_1.4

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.10     assertthat_0.1   digest_0.6.12    rprojroot_1.2   
 [5] plyr_1.8.4       grid_3.3.3       gtable_0.2.0     backports_1.0.5 
 [9] git2r_0.18.0     magrittr_1.5     evaluate_0.10    scales_0.4.1    
[13] stringi_1.1.3    lazyeval_0.2.0   labeling_0.3     tools_3.3.3     
[17] munsell_0.4.3    yaml_2.1.14      colorspace_1.3-2 htmltools_0.3.5 
[21] knitr_1.15.1     tibble_1.2      

This R Markdown site was created with workflowr