Main clustering SCORE (CORE V2.0): Stable Clustering at Optimal REsolution with bagging and bootstrapping

CORE is an algorithm for generating reproducible clustering. It was first implemented in the ascend R package. CORE V2.0 uses bagging analysis to find a stable clustering result and to detect rare clusters in a mixed population.

CORE_bagging(
  mixedpop = NULL,
  bagging_run = 20,
  subsample_proportion = 0.8,
  windows = seq(from = 0.025, to = 1, by = 0.025),
  remove_outlier = c(0),
  nRounds = 1,
  PCA = FALSE,
  nPCs = 20,
  ngenes = 1500,
  log_transform = FALSE
)
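As a quick check of what the default `windows` argument in the usage above expands to, the snippet below evaluates the same `seq()` call in base R. It only inspects the grid itself and makes no claim about how CORE_bagging uses each value internally.

```r
# The default grid passed as `windows` in the usage above
windows <- seq(from = 0.025, to = 1, by = 0.025)

# Number of windows in the grid
n_windows <- length(windows)
print(n_windows)

# First and last few values of the grid
print(head(windows, 3))
print(tail(windows, 3))
```

A coarser grid, for example `seq(from = 0.05, to = 1, by = 0.05)`, could be passed through the same argument; whether that is advisable for a given dataset is not stated in this page.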

Arguments

mixedpop	a SingleCellExperiment object from the training mixed population.
bagging_run	an integer specifying the number of bagging runs to be computed.
subsample_proportion	a numeric specifying the proportion of the tree to be chosen in subsampling.
windows	a numeric vector specifying the ranges of each window.
remove_outlier	a vector containing the IDs of clusters to be removed; the default vector contains 0, as 0 is the cluster with singletons.
nRounds	an integer specifying the number of rounds in which to attempt to remove outliers.
PCA	a logical specifying whether PCA is used before calculating the distance matrix.
nPCs	an integer specifying the number of principal components to use.
ngenes	number of genes used for clustering calculations.
log_transform	a logical specifying whether a log transform should be applied.
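To make the roles of `bagging_run`, `subsample_proportion` and `windows` more concrete, here is a minimal base R sketch of the general bagging idea on toy data. It is not the scGPS implementation: the toy matrix, the use of Ward linkage, the interpretation of each window as a fraction of the maximum tree height, and the "longest run of identical cluster counts" stability rule are all assumptions made for illustration.

```r
# Illustrative sketch only; NOT the scGPS implementation.
set.seed(1)

# Hypothetical toy data: 48 "cells" x 5 features in two well-separated groups
toy <- rbind(matrix(rnorm(120, mean = 0), ncol = 5),
             matrix(rnorm(120, mean = 6), ncol = 5))

bagging_run <- 20
subsample_proportion <- 0.8
windows <- seq(from = 0.025, to = 1, by = 0.025)

# For each bagging run: subsample rows, build a tree, and count the
# clusters obtained when cutting at each window (assumed here to be a
# fraction of the maximum tree height).
cluster_counts <- sapply(seq_len(bagging_run), function(i) {
  idx <- sample(nrow(toy), size = floor(subsample_proportion * nrow(toy)))
  tree <- hclust(dist(toy[idx, ]), method = "ward.D2")
  sapply(windows, function(w) {
    length(unique(cutree(tree, h = w * max(tree$height))))
  })
})

# Most frequent cluster count per window across all bagging runs
modal_k <- apply(cluster_counts, 1, function(x) {
  as.integer(names(which.max(table(x))))
})

# Assumed stability rule: the cluster count that persists over the
# longest run of consecutive windows
runs <- rle(modal_k)
stable_k <- runs$values[which.max(runs$lengths)]
print(stable_k)
```

In this sketch, `cluster_counts` has one row per window and one column per bagging run, which loosely mirrors the idea of keeping the results of all iterations alongside one selected resolution.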

Value

a list with clustering results of all iterations, and a selected optimal resolution

Author

Quan Nguyen, 2018-05-11

Examples

day5 <- day_5_cardio_cell_sample
cellnames <- colnames(day5$dat5_counts)
cluster <- day5$dat5_clusters
cellnames <- data.frame('cluster' = cluster, 'cellBarcodes' = cellnames)
# day5$dat5_counts needs to be in a matrix format
mixedpop2 <- new_summarized_scGPS_object(ExpressionMatrix = day5$dat5_counts,
    GeneMetadata = day5$dat5geneInfo, CellMetadata = day5$dat5_clusters)
test <- CORE_bagging(mixedpop2, remove_outlier = c(0), PCA = FALSE,
    bagging_run = 2, subsample_proportion = 0.7)
#> Performing 1 round of filtering
#> Identifying top variable genes
#> Calculating distance matrix
#> Performing hierarchical clustering
#> Finding clustering information
#> No more outliers detected in filtering round 1
#> Identifying top variable genes
#> Calculating distance matrix
#> Performing hierarchical clustering
#> Finding clustering information
#> 500 cells left after filtering
#> Running 2 bagging runs, with 0.7 subsampling...
#> Done clustering, moving to stability calculation...
#> Done finding optimal clustering