
Demultiplexes sequencing reads arranged in the common format produced by sequencers such as Illumina, typically for 16S data. The function takes a matrix of sample names and barcodes, a .fastq file of barcodes keyed by sequence header, and a .fastq file of the reads corresponding to those barcodes. For each barcode given, it extracts all reads whose index matches that barcode and writes them to a separate .fastq file.

demultiplex(
  barcodeFile,
  indexFile,
  readFile,
  rcBarcodes = TRUE,
  location = NULL,
  threads = 1,
  hammingDist = 0,
  quiet = TRUE
)


barcodeFile
Path to a tab-separated (.tsv) file with a header row, followed by sample names in column 1 and barcodes in column 2.
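To make the expected barcodeFile layout concrete, here is a minimal sketch that writes and reads back a two-sample file in that shape. The sample names and the column headers SampleName and Barcode are hypothetical placeholders; the barcodes are taken from the example output below. This only illustrates the file format, not how demultiplex() parses it internally.

```r
# Hypothetical barcode table: header row, sample names in column 1,
# barcodes in column 2 (names and headers are illustrative only)
barcode_tbl <- data.frame(
  SampleName = c("SampleA", "SampleB"),
  Barcode = c("TCCACGT", "ACAGGCT")
)
barcode_file <- tempfile(fileext = ".tsv")
write.table(barcode_tbl, barcode_file, sep = "\t",
            quote = FALSE, row.names = FALSE)

# Reading it back: header = TRUE consumes the header row
bc <- read.delim(barcode_file, header = TRUE, stringsAsFactors = FALSE)
sample_names <- bc[[1]]  # column 1: sample names
barcodes <- bc[[2]]      # column 2: barcodes
```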


indexFile
Path to a .fastq file that contains the barcode for each read. Its headers should be identical to, and in the same order as, those in readFile, and each sequence in indexFile should be the barcode of the corresponding read. Quality scores are not considered.


readFile
Path to the sequencing read .fastq file that corresponds to indexFile.
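Because indexFile and readFile must share the same headers in the same order, the i-th record of one describes the same read as the i-th record of the other. The sketch below illustrates that pairing with a hypothetical helper, read_fastq_simple(), which assumes unwrapped 4-line FASTQ records; it is not part of MetaScope, and the toy files are made up.

```r
# Hypothetical helper: split a FASTQ file into headers and sequences,
# assuming unwrapped 4-line records (header, sequence, "+", qualities).
# Quality lines are ignored, as in the indexFile description above.
read_fastq_simple <- function(path) {
  lines <- readLines(path)
  starts <- seq(1, length(lines), by = 4)
  list(header = sub("^@", "", lines[starts]),
       seq = lines[starts + 1])
}

# Hypothetical toy files: one barcode record and one read record per header
index_file <- tempfile(fileext = ".fastq")
read_file <- tempfile(fileext = ".fastq")
writeLines(c("@r1", "TCCACGT", "+", "IIIIIII",
             "@r2", "ACAGGCT", "+", "IIIIIII"), index_file)
writeLines(c("@r1", "ACGTTGCAAC", "+", "IIIIIIIIII",
             "@r2", "GGCATTACGA", "+", "IIIIIIIIII"), read_file)

idx <- read_fastq_simple(index_file)
reads <- read_fastq_simple(read_file)

# Same headers in the same order, so each barcode lines up with its read
stopifnot(identical(idx$header, reads$header))
barcode_of_read <- setNames(idx$seq, reads$header)
```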


rcBarcodes
Should the barcodes in barcodeFile be reverse complemented to match the sequences in indexFile? Defaults to TRUE.
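The reverse complement that rcBarcodes refers to can be sketched in base R as below; rev_comp() is a hypothetical helper, not a MetaScope function. For DNAStringSet objects, Biostrings provides reverseComplement(), which is the more usual choice in Bioconductor code.

```r
# Hypothetical base-R helper: complement each base, then reverse the string
rev_comp <- function(x) {
  comp <- chartr("ACGTacgt", "TGCAtgca", x)
  vapply(strsplit(comp, ""), function(b) paste(rev(b), collapse = ""),
         character(1))
}

rc <- rev_comp(c("TCCACGT", "ACAGGCT"))
```

With rcBarcodes = TRUE, a barcode listed as TCCACGT would therefore be compared against index sequences reading ACGTGGA.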


location
Directory in which to store the demultiplexed read files. Defaults to NULL, in which case a new temporary directory is generated.


threads
Number of threads to use for parallelization with BiocParallel. Defaults to 1. The function parallelizes over the barcodes, extracting the reads for each barcode separately and writing them to separate demultiplexed files.
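The per-barcode parallelization described for threads can be sketched as below. This sketch swaps in base R's parallel::mclapply() so it runs without Bioconductor; the description above says demultiplex() uses BiocParallel, whose analogous call would be bplapply(barcodes, FUN, BPPARAM = MulticoreParam(workers = threads)). The index sequences and sample names are hypothetical toy data, and the sketch only collects matching read positions rather than writing files.

```r
library(parallel)

# Hypothetical toy data: one index sequence per read, in read order
index_seqs <- c("TCCACGT", "ACAGGCT", "TCCACGT", "ATCGTGC")
barcodes <- c(CDV = "TCCACGT", LaCrosse = "ACAGGCT")

# Each barcode is handled independently, so the work splits cleanly
# across workers; mc.cores = 1 runs serially (and works on Windows)
threads <- 1
hits <- mclapply(barcodes, function(bc) which(index_seqs == bc),
                 mc.cores = threads)
```

Because no barcode's work depends on another's, adding workers does not change the result, only how the barcodes are scheduled.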


hammingDist
Hamming distance, i.e. the number of base differences, allowed when matching indexes to barcodes, which permits inexact matches. Defaults to 0 (exact matches only). Warning: if hammingDist is 1 or greater and a read's index falls within that distance of more than one barcode, the read will be written to more than one demultiplexed read file.
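The warning about multiple matches can be seen in a small sketch. The hamming() helper and the barcodes S1 and S2 are hypothetical; S2 is chosen to differ from S1 only in its last base, so an index one mismatch from both falls within hammingDist = 1 of each. The sketch assumes a read matches when its distance is at most hammingDist, which is how the description reads but is not confirmed from the MetaScope source.

```r
# Hypothetical helper: Hamming distance between two equal-length strings
hamming <- function(a, b) {
  stopifnot(nchar(a) == nchar(b))
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

barcodes <- c(S1 = "AAGTCGC", S2 = "AAGTCGA")  # hypothetical barcodes
index_seq <- "AAGTCGT"                          # one mismatch from each
hammingDist <- 1

dists <- vapply(barcodes, hamming, integer(1), b = index_seq)
matched <- names(barcodes)[dists <= hammingDist]
```

Here matched contains both S1 and S2, so this read would end up in both samples' files.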


quiet
Suppresses most messages. Defaults to TRUE.


Produces multiple .fastq files, each containing all reads whose index matches one of the given barcodes. The files are written to the location directory and are named after the sample names and barcodes in barcodeFile, e.g. './demultiplex_fastq/SampleName1_GGAATTATCGGT.fastq.gz'.
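The naming pattern above, <sample name>_<barcode>.fastq.gz inside location, can be reproduced as follows. This is a sketch only: the FASTQ record is made up, and writing through a gzfile() connection is an assumption about how such a compressed file could be produced, not a description of MetaScope's internals.

```r
# A fresh temporary directory stands in for the location argument
location <- tempfile("demultiplex_fastq")
dir.create(location)

sample_name <- "SampleName1"
barcode <- "GGAATTATCGGT"
out_file <- file.path(location,
                      paste0(sample_name, "_", barcode, ".fastq.gz"))

# Hypothetical single read in 4-line FASTQ format, gzip-compressed on write
record <- c("@read1", "ACGTACGT", "+", "IIIIIIII")
con <- gzfile(out_file, "w")
writeLines(record, con)
close(con)
```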


## Get barcode, index, and read data locations
barcodePath <- system.file("extdata", "barcodes.txt", package = "MetaScope")
indexPath <- system.file("extdata", "virus_example_index.fastq",
                         package = "MetaScope")
readPath <- system.file("extdata", "virus_example.fastq",
                         package = "MetaScope")

## Demultiplex
demult <- demultiplex(barcodePath, indexPath, readPath, rcBarcodes = FALSE,
                      hammingDist = 2)
#> Warning: metadata columns on input DNAStringSet object were dropped
#>   SampleName Barcode NumberOfReads
#> 1        CDV TCCACGT            25
#> 2   LaCrosse ACAGGCT            25
#> 3        RSV ATCGTGC            25
#> 4       EboV ACTACAG            25
#> 5    Measles AAGTCGC            25
#> 6        VSV TCTCAGG            25