Demultiplexing sequencing reads
Demultiplexes sequencing reads arranged in a common format produced by sequencers (such as Illumina), generally for 16S data. The function takes a matrix of sample names and barcodes, a .fastq file that holds the barcode for each read (matched by sequence header), and a .fastq file of the reads corresponding to those barcodes. For each barcode given, the function extracts all reads whose index matches that barcode and writes them to a separate .fastq file.
demultiplex( barcodeFile, indexFile, readFile, rcBarcodes = TRUE, location = NULL, threads = 1, hammingDist = 0, quiet = TRUE )
barcodeFile: File name of a tab-separated (.tsv) file containing a header row, followed by sample names (column 1) and barcodes (column 2).
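As an illustration of the barcode file layout described above, the following base R sketch writes a small tab-separated file and reads it back. The sample names and barcodes are taken from the example output on this page; the column headings `SampleName` and `Barcode` are an assumption, since the text only requires that a header row be present.

```r
# Sample names and barcodes borrowed from the example output on this page.
# The header names are an assumption; only the column order is documented.
barcodes <- data.frame(
  SampleName = c("CDV", "LaCrosse", "RSV"),
  Barcode = c("TCCACGT", "ACAGGCT", "ATCGTGC"),
  stringsAsFactors = FALSE
)

# Write a tab-separated file with a header row and no row names or quotes.
barcode_path <- tempfile(fileext = ".tsv")
write.table(barcodes, barcode_path, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back to confirm the layout: sample names first, barcodes second.
check <- read.delim(barcode_path, stringsAsFactors = FALSE)
check
```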
indexFile: Location of a .fastq file that contains the barcode for each read. The headers should be the same (and in the same order) as in readFile, and the sequence in the indexFile should be the corresponding barcode for each read. Quality scores are not considered.
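Because the index and read files must share headers in the same order, it can help to check this before demultiplexing. The sketch below is not part of the package: `fastq_headers` and `headers_match` are hypothetical helpers, and they assume uncompressed FASTQ files with exactly four lines per record.

```r
# Hypothetical helper: return the header line of every FASTQ record,
# assuming exactly four lines per record (header, sequence, "+", quality).
fastq_headers <- function(path) {
  lines <- readLines(path)
  lines[seq_along(lines) %% 4 == 1]
}

# Hypothetical helper: TRUE when both files list the same headers in the same order.
headers_match <- function(index_path, read_path) {
  identical(fastq_headers(index_path), fastq_headers(read_path))
}

# Tiny made-up files to exercise the check.
index_path <- tempfile(fileext = ".fastq")
read_path <- tempfile(fileext = ".fastq")
writeLines(c("@read1", "TCCACGT", "+", "IIIIIII"), index_path)
writeLines(c("@read1", "ACGTACGTAC", "+", "IIIIIIIIII"), read_path)

headers_match(index_path, read_path)
```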
readFile: Location of the sequencing read .fastq file that corresponds to the indexFile.
rcBarcodes: Should the barcode indexes in the barcodeFile be reverse complemented to match the sequences in the indexFile? Defaults to TRUE.
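To make the reverse complement step concrete, here is a minimal base R sketch. `reverse_complement` is a hypothetical helper written for illustration; it handles only the bases A, C, G, and T and is not necessarily how the function performs the conversion internally.

```r
# Hypothetical helper: reverse complement DNA barcodes using base R only.
# Only A, C, G, and T are handled; other characters pass through unchanged.
reverse_complement <- function(barcode) {
  complemented <- chartr("ACGT", "TGCA", toupper(barcode))
  vapply(
    strsplit(complemented, ""),
    function(bases) paste(rev(bases), collapse = ""),
    character(1)
  )
}

reverse_complement("GGAATTATCGGT")
# "ACCGATAATTCC"
```

Applying the helper twice returns the original barcode, which is a quick way to check which orientation a barcode file uses.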
location: A directory in which to store the demultiplexed read files. By default, a new temporary directory is generated.
threads: The number of threads to use for parallelization (BiocParallel). The function parallelizes over the barcodes, extracting the reads for each barcode separately and writing them to separate demultiplexed files.
hammingDist: The Hamming distance (number of base differences) allowed for inexact matches between barcodes and indexes. Defaults to 0. Warning: if the Hamming distance is >= 1 and this leads to inexact index matches to more than one barcode, that read will be written to more than one demultiplexed read file.
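The warning above can be illustrated with a small base R sketch of Hamming distance matching. `hamming_distance` and `matching_barcodes` are hypothetical helpers, not the package's implementation. The barcodes come from the example output on this page, and the index sequence is made up so that it sits two mismatches away from two different barcodes.

```r
# Hypothetical helper: count mismatched positions between two equal-length sequences.
hamming_distance <- function(a, b) {
  stopifnot(nchar(a) == nchar(b))
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

# Hypothetical helper: names of every barcode within max_dist mismatches of an index read.
matching_barcodes <- function(index_seq, barcodes, max_dist = 0) {
  dists <- vapply(barcodes, hamming_distance, integer(1), b = index_seq)
  names(barcodes)[dists <= max_dist]
}

# Barcodes from the example output on this page.
barcodes <- c(CDV = "TCCACGT", EboV = "ACTACAG", VSV = "TCTCAGG")

# Made-up index read: two mismatches from CDV and two from VSV.
matching_barcodes("TCTCCGT", barcodes, max_dist = 2)
# "CDV" "VSV"

# With exact matching, the same read is assigned to no barcode.
matching_barcodes("TCTCCGT", barcodes, max_dist = 0)
# character(0)
```

In this sketch the read matches both CDV and VSV at a distance of 2, which corresponds to the case where a read would be written to more than one output file.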
quiet: Turns off most messages. Defaults to TRUE.
Returns multiple .fastq files that contain all reads whose index matches the barcodes given. These files will be written to the location directory, and will be named based on the given sampleNames and barcodes, e.g. './demultiplex_fastq/SampleName1_GGAATTATCGGT.fastq.gz'
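The naming pattern in the example path above can be reproduced with a short base R sketch. `output_path` is a hypothetical helper that mirrors the documented pattern of sample name, underscore, barcode, and a `.fastq.gz` extension; the function's internal naming code may differ.

```r
# Hypothetical helper mirroring the documented file naming pattern:
# <location>/<sample name>_<barcode>.fastq.gz
output_path <- function(location, sample_name, barcode) {
  file.path(location, paste0(sample_name, "_", barcode, ".fastq.gz"))
}

output_path("./demultiplex_fastq", "SampleName1", "GGAATTATCGGT")
# "./demultiplex_fastq/SampleName1_GGAATTATCGGT.fastq.gz"
```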
## Get barcode, index, and read data locations
barcodePath <- system.file("extdata", "barcodes.txt", package = "MetaScope")
indexPath <- system.file("extdata", "virus_example_index.fastq", package = "MetaScope")
readPath <- system.file("extdata", "virus_example.fastq", package = "MetaScope")

## Demultiplex
demult <- demultiplex(barcodePath, indexPath, readPath, rcBarcodes = FALSE, hammingDist = 2)
#> Warning: metadata columns on input DNAStringSet object were dropped
demult
#>   SampleName Barcode NumberOfReads
#> 1        CDV TCCACGT            25
#> 2   LaCrosse ACAGGCT            25
#> 3        RSV ATCGTGC            25
#> 4       EboV ACTACAG            25
#> 5    Measles AAGTCGC            25
#> 6        VSV TCTCAGG            25