Evaluating Signatures Using Original Models
Xutao Wang, Department of Biostatistics, Boston University, Boston, MA
Some of the gene signatures included in the TBSignatureProfiler were originally trained using a machine learning or statistical model. For completeness, the package includes these original models so that users can run them and compare their results with the scoring methods that serve as the main mechanism for scoring gene signatures in the TBSignatureProfiler.
This vignette provides examples that show how to evaluate the performance of certain signatures using these original models. The package currently incorporates the original methods for the gene signatures listed in the code chunk below. The genes that make up each signature can be found by looking up the signature's name in the TBsignatures data object.
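As a minimal sketch of that lookup, assuming TBsignatures is the named list of gene-symbol vectors shipped with the package, the genes of one signature can be retrieved by name. The variable names sig_name and genes are illustrative only.

```r
library(TBSignatureProfiler)

# Assumption: TBsignatures is a named list in which each element is a
# character vector of gene symbols for one signature.
sig_name <- "Zak_RISK_16"

# Stop early if the requested signature is not present in the list
stopifnot(sig_name %in% names(TBsignatures))

# Extract the genes for the chosen signature
genes <- TBsignatures[[sig_name]]
length(genes)
head(genes)
```

The same pattern applies to any other name in names(TBsignatures).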
library(TBSignatureProfiler)

signatureOriginalModel <- c("Anderson_42", "Anderson_OD_51", "Kaforou_27",
                            "Kaforou_OD_44", "Kaforou_OD_53", "Sweeney_OD_3",
                            "Maertzdorf_4", "Maertzdorf_15", "LauxdaCosta_OD_3",
                            "Verhagen_10", "Jacobsen_3", "Sambarey_HIV_10",
                            "Leong_24", "Berry_OD_86", "Berry_393",
                            "Bloom_OD_144", "Suliman_RISK_4", "Zak_RISK_16",
                            "Leong_RISK_29", "Zhao_NANO_6")
In this tutorial, we will work with HIV and Tuberculosis (TB) gene expression data in a SummarizedExperiment format. First, we evaluate the performance of all available TB gene signatures whose original models have been included in the package by setting geneSignaturesName = "".
# HIV/TB gene expression data, included in the package
hivtb_data <- TB_hiv

out <- evaluateOriginalModel(input = hivtb_data,
                             geneSignaturesName = "",
                             useAssay = "counts")

out$Zak_RISK_16_OriginalModel
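Before a call like the one above, it can help to confirm that the assay passed to useAssay is actually present in the input object. The sketch below is one possible check and not part of the package workflow; it relies on assayNames() from the SummarizedExperiment package, and the variable name available_assays is illustrative.

```r
library(TBSignatureProfiler)
library(SummarizedExperiment)

# HIV/TB gene expression data, included in the package
hivtb_data <- TB_hiv

# List the assays stored in the object
available_assays <- assayNames(hivtb_data)
available_assays

# Stop with an error if the assay used in this vignette is missing
stopifnot("counts" %in% available_assays)
```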
Users can also evaluate only a selected subset of gene signatures by passing their names to geneSignaturesName.
outSub <- evaluateOriginalModel(input = hivtb_data,
                                geneSignaturesName = c("Anderson_42", "Sweeney_OD_3",
                                                       "Verhagen_10", "Zak_RISK_16"),
                                useAssay = "counts")

# The predicted score from each signature can be viewed by calling:
colData(outSub)[, paste0(c("Anderson_42", "Sweeney_OD_3",
                           "Verhagen_10", "Zak_RISK_16"), "_OriginalModel")]
The returned object is also a SummarizedExperiment. The scores are returned as part of its colData, with column names formatted as “Name_Of_Signature_OriginalModel”. The structure of the returned object is the same as that returned by runTBsigProfiler, so users can follow the guidance in the main package vignette for downstream analysis.
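Because the score columns share the “_OriginalModel” suffix, they can be collected by pattern rather than typed out one by one. The following is a minimal sketch under stated assumptions, not the package's prescribed workflow: the names sigs, score_cols, and scores are illustrative, and the per-group summary assumes that the sample annotation contains a column called "Disease", so that step is skipped if the column is absent.

```r
library(TBSignatureProfiler)
library(SummarizedExperiment)

# Same signatures as in the subset example from this vignette
sigs <- c("Anderson_42", "Sweeney_OD_3", "Verhagen_10", "Zak_RISK_16")
outSub <- evaluateOriginalModel(input = TB_hiv,
                                geneSignaturesName = sigs,
                                useAssay = "counts")

# Select every colData column whose name ends in "_OriginalModel"
score_cols <- grep("_OriginalModel$", colnames(colData(outSub)), value = TRUE)
scores <- as.data.frame(colData(outSub)[, score_cols, drop = FALSE])

# Keep numeric columns only, then summarize the score distributions
scores <- scores[, vapply(scores, is.numeric, logical(1)), drop = FALSE]
summary(scores)

# Assumption: a "Disease" annotation column exists; skip the step otherwise
if ("Disease" %in% colnames(colData(outSub))) {
  print(aggregate(scores,
                  by = list(Disease = colData(outSub)$Disease),
                  FUN = mean))
}
```

Selecting by suffix keeps the code unchanged when the list of evaluated signatures grows or shrinks.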