This vignette suggests a workflow for processing single cell RNA-Seq data into an SCtkExperiment object that can be used in the single cell toolkit.

Suggested Software

Some of the steps in this workflow are performed outside of R and are optional, but they produce useful metrics that can be explored using the single cell toolkit. Please see the individual tool websites for installation instructions.

  • FastQC - Command line or GUI program for performing quality control on sequencing files
  • MultiQC - Command line tool that combines FastQC results from individual files into a single report
  • Rsubread - R package for aligning FASTQ files to a reference genome

Tutorial, Single Cell RNA-Seq Reads to Single Cell Object

Step 1: FastQC Quality Control

Quality control performed on the FASTQ files can identify low quality or failed samples that you may want to exclude from downstream analysis. FastQC is a widely used tool for quality control of FASTQ files. Run FastQC on the command line on each individual file:

fastqc input_read_1.fastq.gz

This command will create a FastQC HTML report and a zip file of FastQC result files for the input file. To combine the individual sample reports into a single report, run MultiQC in the directory that contains the FastQC results:

multiqc .
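When there are many FASTQ files, the two commands above can also be driven from R. The following is a minimal sketch, assuming fastqc and multiqc are installed and on the PATH. The directory names fastq and qc are hypothetical, and the -o option is used to set the output directory of each tool.

```r
# Hypothetical locations; adjust to your project layout.
fastq_dir <- "fastq"
qc_dir <- "qc"

# Collect every gzipped FASTQ file in the input directory.
fastq_files <- list.files(fastq_dir, pattern = "\\.fastq\\.gz$",
                          full.names = TRUE)

# Only call the external tools if both are installed and there is input.
tools_found <- all(nzchar(Sys.which(c("fastqc", "multiqc"))))
if (tools_found && length(fastq_files) > 0) {
  dir.create(qc_dir, showWarnings = FALSE)
  # One FastQC report per file, written to qc_dir.
  system2("fastqc", args = c("-o", qc_dir, shQuote(fastq_files)))
  # Combine the per-file reports into a single MultiQC report.
  system2("multiqc", args = c("-o", qc_dir, qc_dir))
} else {
  message("fastqc/multiqc not found or no FASTQ files; skipping QC.")
}
```

The guard makes the script safe to source on a machine where the tools are not installed; it simply reports that the QC step was skipped.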

Step 2: Gather sample annotations and feature annotations

The single cell object contains raw read counts, normalized expression data, sample annotations, and feature annotations, along with the results of downstream analyses. You can make a single cell object with just a count matrix, but some of the analyses available in the single cell toolkit make use of sample annotations and feature annotations.

Sample Annotation Data Frame

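The sample annotation data frame should contain columns of annotation information and row names that match the sample names in the count matrix. The MultiQC output table multiqc_fastqc.txt (written to the multiqc_data directory by default) can be read into R and used as a sample annotation data frame, provided its sample names match those in the count matrix.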
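As an illustration, the MultiQC table could be turned into a sample annotation data frame roughly as follows. This is a sketch under stated assumptions: the inline text stands in for the real file, the metric columns shown are illustrative rather than the full MultiQC output, and the sample names are hypothetical. MultiQC typically reports one row per FASTQ file, so the names may need adjusting before they match the count matrix.

```r
# Stand-in for multiqc_data/multiqc_fastqc.txt (tab-separated).
# Column names other than "Sample" are illustrative assumptions.
multiqc_text <- paste(
  "Sample\tTotal Sequences\t%GC",
  "sample1_1\t1000000\t48",
  "sample2_1\t950000\t47",
  sep = "\n"
)

# With a real file, replace `text = multiqc_text` with the file path.
sample.annotation.df <- read.delim(
  text = multiqc_text,
  row.names = "Sample",      # use the Sample column as row names
  check.names = FALSE,       # keep column names such as "%GC" unchanged
  stringsAsFactors = FALSE
)

# Hypothetical sample names taken from the count matrix columns.
count_samples <- c("sample1_1", "sample2_1")

# Fail early if any sample in the count matrix has no annotation row.
missing_samples <- setdiff(count_samples, rownames(sample.annotation.df))
stopifnot(length(missing_samples) == 0)

# Put the annotation rows in the same order as the count matrix columns.
sample.annotation.df <- sample.annotation.df[count_samples, , drop = FALSE]
```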

Feature Annotation Data Frame

The feature annotation data frame should contain columns of annotation information and row names that match the row names in the count matrix.
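A small sketch of this requirement, using a toy count matrix and made-up annotation columns (the gene names, biotype, and chromosome values are all hypothetical):

```r
# Toy count matrix: rows are features (genes), columns are samples.
counts <- matrix(
  c(10L, 0L, 3L, 7L, 2L, 5L),
  nrow = 3,
  dimnames = list(c("GeneA", "GeneB", "GeneC"), c("sample1", "sample2"))
)

# Hypothetical annotations, deliberately listed in a different order.
feature.annotation.df <- data.frame(
  biotype = c("lncRNA", "protein_coding", "protein_coding"),
  chromosome = c("chr2", "chr1", "chrX"),
  row.names = c("GeneC", "GeneA", "GeneB"),
  stringsAsFactors = FALSE
)

# Fail early if any feature in the count matrix lacks an annotation row.
stopifnot(all(rownames(counts) %in% rownames(feature.annotation.df)))

# Reorder the annotation rows to match the count matrix rows.
feature.annotation.df <- feature.annotation.df[rownames(counts), , drop = FALSE]
stopifnot(identical(rownames(feature.annotation.df), rownames(counts)))
```

Indexing by row name both reorders the annotations and drops any extra features that are not present in the count matrix.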

Step 3: Align the data with alignSingleCellData()

For basic alignment and feature counting, you can use the alignSingleCellData() function in the single cell toolkit to align FASTQ data to a reference genome, count the number of reads per gene, and create a single cell object that contains annotation information. After running the alignSingleCellData() function, you can take the single cell object directly into the shiny app for downstream analysis. Detailed information about the available options can be found in the function help.

NOTE: Alignment to large genomes (such as the human genome) can take more than 8 GB of memory. Make sure to run the alignSingleCellData() function on a computer with sufficient memory.

Here is an example command:

singlecellobject <- alignSingleCellData(
  # First read file for each sample
  inputfile1 = c("/path/to/sample1_1.fastq.gz",
                 "/path/to/sample2_1.fastq.gz"),
  # Second read file for each sample, in the same sample order
  inputfile2 = c("/path/to/sample1_2.fastq.gz",
                 "/path/to/sample2_2.fastq.gz"),
  indexPath = "/path/to/genome/index",
  gtfAnnotation = "/path/to/gene/annotations.gtf",
  # Sample annotation data frame gathered in Step 2
  sampleAnnotations = sample.annotation.df,
  threads = 4)

Parallelization

Setting the number of threads in the alignSingleCellData() function increases the speed of each individual alignment, but the function still aligns the samples sequentially. To save time, you may want to perform alignment and feature counting on each file separately in a parallel computing environment, since the alignSingleCellData() function can be run on an individual file. Once the individual sample objects have been created, combine them into a single SCtkExperiment object.
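The vignette does not prescribe how to combine the samples. One possible approach, sketched here at the count matrix level with toy data, is to extract the counts from each per-sample object, bind them into one matrix, and build a new object from the result (for example with createSCE()). The matrices and names below are hypothetical stand-ins for counts extracted from separate alignSingleCellData() runs.

```r
# Toy per-sample count matrices with features in different orders.
counts_sample1 <- matrix(
  c(10L, 0L, 3L), ncol = 1,
  dimnames = list(c("GeneA", "GeneB", "GeneC"), "sample1")
)
counts_sample2 <- matrix(
  c(5L, 7L, 2L), ncol = 1,
  dimnames = list(c("GeneC", "GeneA", "GeneB"), "sample2")
)
per_sample <- list(counts_sample1, counts_sample2)

# All samples must have been counted against the same set of features.
features <- rownames(per_sample[[1]])
same_features <- vapply(
  per_sample,
  function(m) setequal(rownames(m), features),
  logical(1)
)
stopifnot(all(same_features))

# Put every matrix in the same feature order, then bind the columns.
aligned <- lapply(per_sample, function(m) m[features, , drop = FALSE])
combined_counts <- do.call(cbind, aligned)
```

Reordering by feature name before binding matters: cbind() matches rows by position, not by name, so matrices with differently ordered features would otherwise be combined incorrectly.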

Creating a Single Cell Object from aligned BAM reads

If your data require additional processing steps (for example, UMI normalization), you can use the alignSingleCellData() function to perform feature counting and create a single cell object without the alignment step by providing ‘.bam’ files in the inputfile1 parameter. Note that if your data were sequenced as paired-end reads, you should set the isPairedEnd parameter to TRUE.

Manually Creating a Single Cell Object with createSCE()

If you choose to align and quantify your data using alternative tools, you can use the createSCE() function to create a single cell object that can be used in the single cell toolkit by providing a count data frame and optionally providing sample and feature annotations. See the createSCE() function help page for details.
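A minimal sketch of such a call with toy data follows. The argument names (assayFile, annotFile, featureFile, inputDataFrames) are assumptions about the createSCE() interface and may differ between package versions, so confirm them on the help page before relying on them. The call is guarded so that the example still runs when the package or the function is not available.

```r
# Toy count data frame: rows are features, columns are samples.
counts_df <- data.frame(
  sample1 = c(10L, 0L, 3L),
  sample2 = c(7L, 2L, 5L),
  row.names = c("GeneA", "GeneB", "GeneC")
)

# Hypothetical annotations whose row names match the count data frame.
sample_annot <- data.frame(
  batch = c("b1", "b2"),
  row.names = colnames(counts_df),
  stringsAsFactors = FALSE
)
feature_annot <- data.frame(
  biotype = c("protein_coding", "protein_coding", "lncRNA"),
  row.names = rownames(counts_df),
  stringsAsFactors = FALSE
)

# Argument names below are assumptions; check ?createSCE for your version.
if (requireNamespace("singleCellTK", quietly = TRUE) &&
    "createSCE" %in% getNamespaceExports("singleCellTK")) {
  sce <- singleCellTK::createSCE(
    assayFile = counts_df,
    annotFile = sample_annot,
    featureFile = feature_annot,
    inputDataFrames = TRUE
  )
}
```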