Last updated: 2017-04-02
Code version: 91bc83d
This notebook shows the SVD of the DeepSea response matrix.
library(ggplot2)
The mean was subtracted from each column before singular values were computed.
evals_df <- readRDS("../data/DeepSea_evals_df.RDS")
deepsea_pca_df <- readRDS("../data/DeepSeaPCA_df.RDS")
ggplot(evals_df,aes(x=ind,y=evals/sum(evals)))+geom_point()+ggtitle("Scree plot")+xlab("Rank")+ylab("Eigenvalues")
The Scree plot shows us that the first two eigenvalues (PC’s) contain a significant amount of the variance, but that there’s no sharp dropoff.
ggplot(deepsea_pca_df,aes(x=PC_1,y=PC_2,col=Source))+geom_point()+ggtitle("PC 1 1 vs PC 2")
Plot by annotation types (TF Binding/DNase/Histone Mark). See to_annotation_type
function here
library(stringr)
source('../../cell_type/yanyu_lib.R')
aucs_danq <- read.table('../data/aucs.with.tissue.txt', sep = '\t', header = T, fill = T)
aucs_danq$DeepSEA.ROC.AUC <- as.numeric(as.character(aucs_danq$DeepSEA.ROC.AUC))
Warning: NAs introduced by coercion
aucs_danq <- aucs_danq[!is.na(aucs_danq$DeepSEA.ROC.AUC),]
deepsea_pca_df$AnnotationType <- to_annotation_type(aucs_danq$TF.DNase.HistoneMark)
deepsea_pca_df$ROC.AUC <- aucs_danq$DeepSEA.ROC.AUC
deepsea_pca_df$PR.AUC <- as.numeric(as.character(aucs_danq$DeepSEA.PR.AUC))
cleaned_tissue <- as.character(aucs_danq$Tissue)
cleaned_tissue[!cleaned_tissue %in% names(table(cleaned_tissue))[table(cleaned_tissue) > 30]] <- 'Minor'
deepsea_pca_df$Tissue <- cleaned_tissue
cleaned_karyotype <- as.character(aucs_danq$Karyotype)
cleaned_karyotype[cleaned_karyotype == ''] <- 'unknown'
deepsea_pca_df$Karyotype <- cleaned_karyotype
ggplot(deepsea_pca_df,aes(x=PC_1,y=PC_2,col=AnnotationType))+geom_point()+ggtitle("PC 1 1 vs PC 2")
PC1 and PC2 separate Histone marks, TF binding, and DNase.
ggplot(deepsea_pca_df,aes(x=PC_1,y=PC_2,col=Tissue))+geom_point()+ggtitle("PC 1 1 vs PC 2")
ggplot(deepsea_pca_df,aes(x=PC_1,y=PC_2,col=Karyotype))+geom_point()+ggtitle("PC 1 1 vs PC 2")
to_annotation_type
function(x){
histone.ind <- sapply(x, function(x){
temp <- str_extract_all(pattern = '^H[0-9]', x)[[1]]
if(length(temp) != 0){
return(TRUE)
}else{
return(FALSE)
}
})
dnase.ind <- x == 'DNase'
re <- rep('TF', length(x))
re[histone.ind] <- 'Histone'
re[dnase.ind] <- 'DNase'
return(re)
}
sessionInfo()
R version 3.3.3 (2017-03-06)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X Yosemite 10.10.5
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] stringr_1.2.0 ggplot2_2.2.1 rmarkdown_1.4
loaded via a namespace (and not attached):
[1] Rcpp_0.12.10 assertthat_0.1 digest_0.6.12 rprojroot_1.2
[5] plyr_1.8.4 grid_3.3.3 gtable_0.2.0 backports_1.0.5
[9] git2r_0.18.0 magrittr_1.5 evaluate_0.10 scales_0.4.1
[13] stringi_1.1.3 lazyeval_0.2.0 labeling_0.3 tools_3.3.3
[17] munsell_0.4.3 yaml_2.1.14 colorspace_1.3-2 htmltools_0.3.5
[21] knitr_1.15.1 tibble_1.2
This R Markdown site was created with workflowr