Training on half of all cells to find optimal ElasticNet and LDA models to predict a subpopulation

training(
  genes = NULL,
  cluster_mixedpop1 = NULL,
  mixedpop1 = NULL,
  mixedpop2 = NULL,
  c_selectID = NULL,
  listData = list(),
  out_idx = 1,
  standardize = TRUE,
  trainset_ratio = 0.5,
  LDA_run = FALSE,
  log_transform = FALSE
)

Arguments

genes

a vector of gene names (for ElasticNet shrinkage); gene symbols must be in the same format as the gene names in mixedpop2. Genes should be listed in order of importance (e.g. differentially expressed genes, most significant first), because if the list contains too many genes, only the top 500 are used.

cluster_mixedpop1

a vector of cluster assignments for the cells in mixedpop1

mixedpop1

a SingleCellExperiment object containing the training mixed population

mixedpop2

a SingleCellExperiment object containing the target mixed population

c_selectID

the cluster ID (as used in cluster_mixedpop1) of the subpopulation to be used for training

listData

a list in which to store the output

out_idx

the index at which results are written into the output list. This is needed when running bootstrap iterations.

standardize

a logical value specifying whether or not to standardize the training matrix

trainset_ratio

a number specifying the proportion of cells in the selected subpopulation used for training; the remaining cells are held out to estimate prediction accuracy

LDA_run

a logical value specifying whether to also fit an LDA model for comparison with ElasticNet

log_transform

a logical value specifying whether to log-transform the expression values
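The arguments above imply several preprocessing steps: truncating the gene list, optional log transformation, optional standardization, and splitting cells by trainset_ratio. The sketch below illustrates those steps in base R. It is not the scGPS implementation: the helper names (select_genes, log_counts, standardize_genes, split_cells), the log2(x + 1) transform, z-scoring each gene across cells, and truncating before filtering genes are all assumptions. The 112-cell split mirrors the counts printed in the example log.

```r
# Hypothetical sketch, not scGPS code: preprocessing steps suggested by the
# arguments of training(). Genes are rows and cells are columns.

# genes: keep at most the first max_genes entries (assumed ordered by
# importance), then drop any gene absent from the expression matrix
select_genes <- function(genes, available, max_genes = 500) {
  top <- head(genes, max_genes)
  top[top %in% available]
}

# log_transform: assumed log2(x + 1); the real base/pseudocount may differ
log_counts <- function(counts) log2(counts + 1)

# standardize: drop zero-variance genes, then z-score each gene across cells
standardize_genes <- function(mat) {
  keep <- apply(mat, 1, sd) > 0
  t(scale(t(mat[keep, , drop = FALSE])))
}

# trainset_ratio: train on a fraction of the source subpopulation and on an
# equal number of cells drawn from the remaining subpopulations
split_cells <- function(cluster, c_selectID, trainset_ratio = 0.5) {
  source_idx <- which(cluster == c_selectID)
  other_idx  <- which(cluster != c_selectID)
  n_train <- round(length(source_idx) * trainset_ratio)
  train_source <- source_idx[sample.int(length(source_idx), n_train)]
  list(train_source = train_source,
       train_other  = other_idx[sample.int(length(other_idx), n_train)],
       leave_out_source = setdiff(source_idx, train_source))
}

# Toy usage: 224 source cells and 366 others, as in the example log
# (the 200/166 split between clusters 2 and 3 is made up)
set.seed(42)
cluster <- c(rep(1, 224), rep(2, 200), rep(3, 166))
s <- split_cells(cluster, c_selectID = 1, trainset_ratio = 0.5)
lengths(s)  # 112 cells in each of the three sets

counts <- rbind(g1 = c(0, 1, 3, 7), g2 = c(1, 1, 1, 1), g3 = c(7, 3, 1, 0))
sel <- select_genes(c("g3", "g1", "g9", "g2"), rownames(counts))
z <- standardize_genes(log_counts(counts[sel, ]))
rownames(z)  # "g3" "g1"; g2 is dropped for having no variance
```

Balancing the two classes (112 source cells against 112 cells from the other clusters) matches the "there are 112 cells in class 2_3 and 112 cells in class 1" line in the example output.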

Value

a list (the input listData) with the prediction results written at index out_idx
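When training() is run repeatedly, for example in bootstrap iterations, the returned list can be passed back in as listData together with a new out_idx so that each run's results land in their own slot. The sketch below shows only that indexing pattern. fake_training() is a hypothetical stand-in that stores a dummy accuracy; the real function writes several elements (Accuracy, ElasticNetGenes, Deviance and others, as listed in the example output).

```r
# Hypothetical stand-in for training(): it only stores a dummy accuracy at
# position out_idx of listData$Accuracy, mimicking the indexing pattern.
fake_training <- function(listData = list(), out_idx = 1, accuracy = NA) {
  if (is.null(listData$Accuracy)) listData$Accuracy <- list()
  listData$Accuracy[[out_idx]] <- accuracy
  listData
}

listData <- list()
for (i in 1:3) {
  # each bootstrap run passes the accumulated list back in with a new index
  listData <- fake_training(listData, out_idx = i, accuracy = 0.6 + i / 100)
}
length(listData$Accuracy)  # 3
```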

Author

Quan Nguyen, 2017-11-25

Examples

library(scGPS)
library(SingleCellExperiment)  # attaches SummarizedExperiment, which provides colData()

c_selectID <- 1  # cluster 1 is the source subpopulation
out_idx <- 1
day2 <- day_2_cardio_cell_sample
mixedpop1 <- new_scGPS_object(ExpressionMatrix = day2$dat2_counts,
    GeneMetadata = day2$dat2geneInfo, CellMetadata = day2$dat2_clusters)
day5 <- day_5_cardio_cell_sample
mixedpop2 <- new_scGPS_object(ExpressionMatrix = day5$dat5_counts,
    GeneMetadata = day5$dat5geneInfo, CellMetadata = day5$dat5_clusters)
genes <- training_gene_sample
genes <- genes$Merged_unique
listData <- training(genes, cluster_mixedpop1 = colData(mixedpop1)[, 1],
    mixedpop1 = mixedpop1, mixedpop2 = mixedpop2, c_selectID = c_selectID,
    listData = list(), out_idx = out_idx, trainset_ratio = 0.5)
#> Total 224 cells as source subpop
#> Total 366 cells in remaining subpops
#> subsampling 112 cells for training source subpop
#> subsampling 112 cells in remaining subpops for training
#> use 6 genes for training model
#> use 6 genes 224 cells for testing model
#> rename remaining subpops to 2_3
#> there are 112 cells in class 2_3 and 112 cells in class 1
#> removing 1 genes with no variance
#> standardizing prediction/target dataset
#> performning elasticnet model training...
#> extracting deviance and best gene features...
#> lambda min is at location 17
#> the leave-out cells in the source subpop is 112
#> use 112 target subpops cells for leave-out test set
#> standardizing the leave-out target and source subpops...
#> start ElasticNet prediction for estimating accuracy...
#> evaluation accuracy ElasticNet 0.660377358490566
names(listData)
#> [1] "Accuracy"        "ElasticNetGenes" "Deviance"        "ElasticNetFit"
#> [5] "LDAFit"          "predictor_S1"
listData$Accuracy
#> [[1]]
#> [[1]][[1]]
#> [[1]][[1]][[1]]
#> [1] 140
#>
#> [[1]][[1]][[2]]
#> [1] 72
#>
#>
#>
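The ElasticNet accuracy printed in the log, 0.660377358490566, equals 140 / (140 + 72), which suggests that the two numbers stored in listData$Accuracy are the counts of correctly and incorrectly predicted leave-out cells. This reading is an inference from the example output, not documented behavior; the arithmetic can be checked directly:

```r
# Numbers taken from the listData$Accuracy output above; their meaning
# (correct vs. incorrect predictions) is an assumption
n_correct   <- 140
n_incorrect <- 72
accuracy <- n_correct / (n_correct + n_incorrect)
accuracy  # matches the logged "evaluation accuracy ElasticNet"
```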