CORE is an algorithm for generating reproducible clustering. It was first implemented in the ascend R package. Here, CORE V2.0 uses bagging analysis to find a stable clustering result and to detect rare clusters in a mixed population.

CORE_bagging(
  mixedpop = NULL,
  bagging_run = 20,
  subsample_proportion = 0.8,
  windows = seq(from = 0.025, to = 1, by = 0.025),
  remove_outlier = c(0),
  nRounds = 1,
  PCA = FALSE,
  nPCs = 20,
  ngenes = 1500,
  log_transform = FALSE
)

Arguments

mixedpop

a SingleCellExperiment object containing the training mixed population.

bagging_run

an integer specifying the number of bagging runs to be computed.

subsample_proportion

a numeric specifying the proportion of the tree to be chosen in subsampling.

windows

a numeric vector specifying the ranges of each window.

remove_outlier

a vector containing the IDs of clusters to be removed. The default vector contains 0, because 0 is the cluster with singletons.

nRounds

an integer specifying the number of rounds in which to attempt outlier removal.

PCA

a logical specifying whether PCA is applied before calculating the distance matrix.

nPCs

an integer specifying the number of principal components to use.

ngenes

number of genes used for clustering calculations.

log_transform

a logical specifying whether a log transform should be applied.
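
The preprocessing arguments (log_transform, ngenes, PCA, nPCs) can be pictured with the base R sketch below. It is not the scGPS implementation: the helper prepare_distance, the toy count matrix, the log2(x + 1) transform, the Euclidean distance, and the "ward.D2" linkage are all assumptions chosen for illustration.

```r
# Illustrative sketch only; not the scGPS internals.
set.seed(1)

# Toy count matrix: 200 genes (rows) x 30 cells (columns)
counts <- matrix(rpois(200 * 30, lambda = 5), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("cell", 1:30)))

# Hypothetical helper: build a cell-to-cell distance matrix
prepare_distance <- function(counts, ngenes = 50, log_transform = FALSE,
                             PCA = FALSE, nPCs = 5) {
  expr <- counts
  if (log_transform) {
    expr <- log2(expr + 1)  # assumed transform; the package may use another
  }
  # Keep the `ngenes` most variable genes
  gene_var <- apply(expr, 1, var)
  top <- order(gene_var, decreasing = TRUE)[seq_len(min(ngenes, nrow(expr)))]
  cells_by_genes <- t(expr[top, , drop = FALSE])
  if (PCA) {
    # Replace gene values with the first `nPCs` principal component scores
    pcs <- prcomp(cells_by_genes)$x
    cells_by_genes <- pcs[, seq_len(min(nPCs, ncol(pcs))), drop = FALSE]
  }
  dist(cells_by_genes)  # Euclidean distance between cells
}

d_raw <- prepare_distance(counts, ngenes = 50)
d_pca <- prepare_distance(counts, ngenes = 50, log_transform = TRUE,
                          PCA = TRUE, nPCs = 5)

# Hierarchical clustering on the resulting distances
tree <- hclust(d_pca, method = "ward.D2")
```

With PCA = TRUE the distances are computed on a handful of component scores rather than on all selected genes, which usually reduces noise and computation at the cost of discarding low-variance signal.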

Value

a list containing the clustering results of all iterations and the selected optimal resolution.

Author

Quan Nguyen, 2018-05-11

Examples

day5 <- day_5_cardio_cell_sample
cellnames <- colnames(day5$dat5_counts)
cluster <- day5$dat5_clusters
cellnames <- data.frame('cluster' = cluster, 'cellBarcodes' = cellnames)
# day5$dat5_counts needs to be in a matrix format
mixedpop2 <- new_summarized_scGPS_object(ExpressionMatrix = day5$dat5_counts,
    GeneMetadata = day5$dat5geneInfo, CellMetadata = day5$dat5_clusters)
test <- CORE_bagging(mixedpop2, remove_outlier = c(0), PCA = FALSE,
    bagging_run = 2, subsample_proportion = .7)
#> Performing 1 round of filtering
#> Identifying top variable genes
#> Calculating distance matrix
#> Performing hierarchical clustering
#> Finding clustering information
#> No more outliers detected in filtering round 1
#> Identifying top variable genes
#> Calculating distance matrix
#> Performing hierarchical clustering
#> Finding clustering information
#> 500 cells left after filtering
#> Running 2 bagging runs, with 0.7 subsampling...
#> Done clustering, moving to stability calculation...
#> Done finding optimal clustering
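
To give a feel for what bagging_run, subsample_proportion and windows control, here is a minimal base R sketch of bagged hierarchical clustering. It is a simplified stand-in, not the CORE V2.0 algorithm: the toy data, the interpretation of each window as a fraction of the tallest merge height, and the "most frequent cluster count" summary are all assumptions made for illustration.

```r
# Illustrative sketch only; not the scGPS internals.
set.seed(42)

# Toy data: 60 cells x 20 features, drawn from two shifted groups
cells <- rbind(matrix(rnorm(30 * 20, mean = 0), nrow = 30),
               matrix(rnorm(30 * 20, mean = 4), nrow = 30))

bagging_run <- 5
subsample_proportion <- 0.8
windows <- seq(from = 0.025, to = 1, by = 0.025)  # 40 windows

cluster_counts <- sapply(seq_len(bagging_run), function(run) {
  # Subsample cells without replacement
  keep <- sample(nrow(cells), size = floor(subsample_proportion * nrow(cells)))
  tree <- hclust(dist(cells[keep, ]), method = "ward.D2")
  # Assumption: each window is a fraction of the tallest merge height
  sapply(windows, function(w) {
    length(unique(cutree(tree, h = w * max(tree$height))))
  })
})
# cluster_counts has one row per window and one column per bagging run

# Per run: the cluster count that persists over the most windows
stable_k <- apply(cluster_counts, 2, function(k) {
  as.integer(names(which.max(table(k))))
})

# Across runs: the cluster count chosen most often
optimal_k <- as.integer(names(which.max(table(stable_k))))
```

A cluster count that survives across many cut heights and many subsamples is less likely to be an artifact of one particular sample of cells, which is the intuition behind choosing a stable resolution.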