Skip to contents

This function will read in a .bam or .csv.gz file, annotate the taxonomy and genome names, reduce the mapping ambiguity using a mixture model, and output a .csv file with the results. Currently, it assumes that the genome library/.bam files use NCBI accession names for reference names (rnames in .bam file).

Usage

metascope_id(
  input_file,
  input_type = "csv.gz",
  aligner = "bowtie2",
  NCBI_key = NULL,
  out_dir = dirname(input_file),
  convEM = 1/10000,
  maxitsEM = 25,
  num_species_plot = NULL,
  quiet = TRUE
)

Arguments

input_file

The .bam or .csv.gz file that needs to be identified.

input_type

Extension of file input. Should be either "bam" or "csv.gz". Default is "csv.gz".

aligner

The aligner which was used to create the bam file. Default is "bowtie2" but can also be set to "subread" or "other".

NCBI_key

(character) NCBI Entrez API key. optional. See taxize::use_entrez(). Due to the high number of requests made to NCBI, this function will be less prone to errors if you obtain an NCBI key. You may enter the string as an input or set it as ENTREZ_KEY in .Renviron.

out_dir

The directory to which the .csv output file will be output. Defaults to dirname(input_file).

convEM

The convergence parameter of the EM algorithm. Default set at 1/10000.

maxitsEM

The maximum number of EM iterations, regardless of whether the convEM is below the threshhold. Default set at 50. If set at 0, the algorithm skips the EM step and summarizes the .bam file 'as is'

num_species_plot

The number of genome coverage plots to be saved. Default is NULL, which saves coverage plots for the ten most highly abundant species.

quiet

Turns off most messages. Default is TRUE.

Value

This function returns a .csv file with annotated read counts to genomes with mapped reads. The function itself returns the output .csv file name.

Examples

#### Align reads to reference library and then apply metascope_id()
## Assuming filtered bam files already exist

## Create temporary directory
file_temp <- tempfile()
dir.create(file_temp)

#### Subread aligned bam file

## Create object with path to filtered subread csv.gz file
filt_file <- "subread_target.filtered.csv.gz"
bamPath <- system.file("extdata", filt_file, package = "MetaScope")
file.copy(bamPath, file_temp)
#> [1] TRUE

## Run metascope id with the aligner option set to bowtie2
metascope_id(input_file = file.path(file_temp, filt_file),
             aligner = "subread", num_species_plot = 0,
             input_type = "csv.gz")
#> No ENTREZ API key provided
#>  Get one via taxize::use_entrez()
#> See https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/
#> # A tibble: 2 × 6
#>   TaxonomyID Genome                              read_…¹ Propo…² readsEM EMPro…³
#>   <chr>      <chr>                                 <int>   <dbl>   <dbl>   <dbl>
#> 1 273036     Staphylococcus aureus RF122, compl…     102    0.85   102.    0.847
#> 2 418127     Staphylococcus aureus subsp. aureu…      18    0.15    18.3   0.153
#> # … with abbreviated variable names ¹​read_count, ²​Proportion, ³​EMProportion

#### Bowtie 2 aligned .csv.gz file

## Create object with path to filtered bowtie2 bam file
bowtie_file <- "bowtie_target.filtered.csv.gz"
bamPath <- system.file("extdata", bowtie_file, package = "MetaScope")
file.copy(bamPath, file_temp)
#> [1] TRUE

## Run metascope id with the aligner option set to bowtie2
metascope_id(file.path(file_temp, bowtie_file), aligner = "bowtie2",
             num_species_plot = 0, input_type = "csv.gz")
#> No ENTREZ API key provided
#>  Get one via taxize::use_entrez()
#> See https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/
#> # A tibble: 1 × 6
#>   TaxonomyID Genome                         read_count Proport…¹ readsEM EMPro…²
#>   <chr>      <chr>                               <int>     <dbl>   <dbl>   <dbl>
#> 1 11234      Measles virus, complete genome          6         1       6       1
#> # … with abbreviated variable names ¹​Proportion, ²​EMProportion

## Remove temporary directory
unlink(file_temp, recursive = TRUE)