Introduction

Because library preparation is complex and the amount of starting material is low in scRNA-Seq experiments, non-biological variation (batch effects) is common and can be a major source of variation in single-cell experiments (Hicks et al. 2017). ComBat is a widely used method for adjusting for batch effects in microarray and RNA-Seq data (Johnson, Li, and Rabinovic 2007). If users identify variation associated with a technical effect, ComBat can be run within the SCTK to remove this variation before further downstream analysis. Users can choose a batch annotation from the annotation data frame and add additional covariates to the ComBat model before performing batch correction. After batch correction, the ComBat results are stored as an additional assay in the SCtkExperiment object, which can then be used in the other analysis tabs within the SCTK.

Plot Batch Effect

Analysis on the batch correction tab is performed on the assay selected in the “Select Assay” field. To visualize the batch effect present in the data, select the annotation column that records batch in the “Select Batch Annotation” drop-down and the annotation column that records experimental condition in the “Select Condition Annotation” drop-down. A set of boxplots will appear showing the percentage of variation explained by condition+batch, by condition alone, and by batch alone.
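
To make the quantity shown in these boxplots concrete, here is a minimal sketch of how a per-gene "percent variation explained" could be computed as the R² of a linear model with batch, condition, or both as categorical predictors. It is plain Python for illustration only and is not the SCTK implementation. The function names, the toy expression values, and the use of backfitting to fit the additive condition+batch model are all assumptions of this sketch.

```python
from collections import defaultdict


def group_means(values, labels):
    """Return the mean of `values` within each level of `labels`."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, lab in zip(values, labels):
        sums[lab] += v
        counts[lab] += 1
    return {lab: sums[lab] / counts[lab] for lab in sums}


def r_squared(y, fitted):
    """Fraction of the total variation in `y` captured by `fitted`."""
    mean_y = sum(y) / len(y)
    total_ss = sum((v - mean_y) ** 2 for v in y)
    if total_ss == 0:  # constant gene: nothing to explain
        return 0.0
    resid_ss = sum((v - f) ** 2 for v, f in zip(y, fitted))
    return 1.0 - resid_ss / total_ss


def explained_by_factor(y, labels):
    """R^2 of a one-factor model: each cell is predicted by its group mean."""
    means = group_means(y, labels)
    return r_squared(y, [means[lab] for lab in labels])


def explained_by_both(y, batch, condition, n_iter=100):
    """R^2 of the additive model y ~ condition + batch.

    Fitted by backfitting (an assumption of this sketch): batch and
    condition effects are re-estimated in turn from each other's residuals.
    """
    mu = sum(y) / len(y)
    b_eff = {lab: 0.0 for lab in batch}
    c_eff = {lab: 0.0 for lab in condition}
    for _ in range(n_iter):
        b_eff = group_means([v - mu - c_eff[c] for v, c in zip(y, condition)], batch)
        c_eff = group_means([v - mu - b_eff[b] for v, b in zip(y, batch)], condition)
    fitted = [mu + b_eff[b] + c_eff[c] for b, c in zip(batch, condition)]
    return r_squared(y, fitted)


# Hypothetical expression values for one gene across eight cells.
expr = [4.5, 5.5, 5.5, 6.5, 6.5, 7.5, 7.5, 8.5]
batch = ["A", "A", "A", "A", "B", "B", "B", "B"]
condition = ["x", "x", "y", "y", "x", "x", "y", "y"]

pct_batch = 100 * explained_by_factor(expr, batch)
pct_condition = 100 * explained_by_factor(expr, condition)
pct_both = 100 * explained_by_both(expr, batch, condition)
print(f"batch: {pct_batch:.1f}%  condition: {pct_condition:.1f}%  "
      f"condition+batch: {pct_both:.1f}%")
```

Computing these three percentages for every gene and summarizing each set as a boxplot gives a display of the kind described above. A batch box that sits well above the condition box suggests that technical variation dominates the assay.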

Run Batch Correction

Select a batch correction method from the “Select Method” drop-down. Currently, only ComBat is supported, but additional methods will be added in later versions of the toolkit. After batch correction, the corrected data will be saved as an additional assay in the SCtkExperiment object. Choose a name for this assay in the “Assay Name to Use” field.

ComBat

To run ComBat batch correction, select a batch annotation, add any additional covariates to the model, and adjust any of the available ComBat parameters. For details about the available options for ComBat analysis, see the documentation for ComBat in the sva package.
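
As a rough intuition for what a batch adjustment does to a single gene, the sketch below shifts and rescales each batch so that all batches share the overall mean and the pooled within-batch standard deviation. This is a deliberately simplified stand-in and not ComBat itself: it omits the empirical Bayes shrinkage of batch parameters across genes and has no covariates in the model. The function name and the toy values are hypothetical.

```python
import math
from collections import defaultdict


def location_scale_adjust(y, batch):
    """Align the per-batch mean and spread of one gene (simplified, no covariates)."""
    groups = defaultdict(list)
    for v, b in zip(y, batch):
        groups[b].append(v)

    n = len(y)
    grand_mean = sum(y) / n

    # Per-batch mean and standard deviation, plus the pooled within-batch spread.
    stats = {}
    within_ss = 0.0
    for b, vals in groups.items():
        m = sum(vals) / len(vals)
        ss = sum((v - m) ** 2 for v in vals)
        stats[b] = (m, math.sqrt(ss / len(vals)))
        within_ss += ss
    pooled_sd = math.sqrt(within_ss / n)

    adjusted = []
    for v, b in zip(y, batch):
        m, sd = stats[b]
        z = (v - m) / sd if sd > 0 else 0.0  # a constant batch carries no spread
        adjusted.append(grand_mean + pooled_sd * z)
    return adjusted


# Hypothetical values for one gene measured in two batches.
expr = [1.0, 2.0, 3.0, 10.0, 12.0, 14.0]
batch = ["A", "A", "A", "B", "B", "B"]
corrected = location_scale_adjust(expr, batch)
print([round(v, 3) for v in corrected])
```

Because this sketch models nothing but batch, it would also remove a genuine biological difference that happens to coincide with batch. Adding known biological covariates to the ComBat model is intended to protect that variation during correction.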

Session info

## R version 3.6.0 (2019-04-26)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Mojave 10.14.4
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] BiocStyle_2.12.0
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.1         rstudioapi_0.10    knitr_1.22        
##  [4] xml2_1.2.0         magrittr_1.5       roxygen2_6.1.1    
##  [7] MASS_7.3-51.4      R6_2.4.0           rlang_0.3.4       
## [10] stringr_1.4.0      tools_3.6.0        xfun_0.6          
## [13] htmltools_0.3.6    commonmark_1.7     yaml_2.2.0        
## [16] digest_0.6.18      assertthat_0.2.1   rprojroot_1.3-2   
## [19] bookdown_0.9       pkgdown_1.3.0      crayon_1.3.4      
## [22] BiocManager_1.30.4 fs_1.3.0           memoise_1.1.0     
## [25] evaluate_0.13      rmarkdown_1.12     stringi_1.4.3     
## [28] compiler_3.6.0     desc_1.2.0         backports_1.1.4

References

Hicks, Stephanie C, F William Townes, Mingxiang Teng, and Rafael A Irizarry. 2017. “Missing Data and Technical Variability in Single-Cell RNA-Sequencing Experiments.” bioRxiv.

Johnson, W Evan, Cheng Li, and Ariel Rabinovic. 2007. “Adjusting Batch Effects in Microarray Expression Data Using Empirical Bayes Methods.” Biostatistics 8 (1): 118–27.