CellBender version 0.3.0

2023-12-05 21:10:01

cellbender.h5

Loaded dataset

AnnData object with n_obs × n_vars = 8159 × 36601
    obs: 'background_fraction', 'cell_probability', 'cell_size', 'droplet_efficiency', 'n_raw', 'n_cellbender'
    var: 'ambient_expression', 'feature_type', 'genome', 'gene_id', 'cellbender_analyzed', 'n_raw', 'n_cellbender'
    uns: 'cell_size_lognormal_std', 'empty_droplet_size_lognormal_loc', 'empty_droplet_size_lognormal_scale', 'swapping_fraction_dist_params', 'estimator', 'features_analyzed_inds', 'fraction_data_used_for_testing', 'learning_curve_learning_rate_epoch', 'learning_curve_learning_rate_value', 'learning_curve_test_elbo', 'learning_curve_test_epoch', 'learning_curve_train_elbo', 'learning_curve_train_epoch', 'target_false_positive_rate'
    obsm: 'cellbender_embedding'
    layers: 'raw', 'cellbender'

Examine how many counts were removed in total

removed 428626 counts from non-empty droplets
removed 1.11% of the counts in non-empty droplets

Rough estimate of expectations based on nothing but the plot above:
roughly 54476 noise counts should be in non-empty droplets
that is approximately 0.14% of the counts in non-empty droplets
with a false positive rate [FPR] of 1.0%, we would expect to remove about 1.14% of the counts in non-empty droplets

It looks like the algorithm did a great job meeting that expectation.
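
The arithmetic behind this comparison can be reproduced from the figures printed above. The sketch below assumes that the expected removal is simply the estimated noise fraction plus the target FPR. That additive reading matches the reported 1.14%, but it is inferred from the numbers and is not a statement of CellBender's internal formula. The variable names are illustrative.

```python
# Figures copied from the report text above.
removed_counts = 428_626         # counts removed from non-empty droplets
removed_pct = 1.11               # the same quantity as a (rounded) percentage
estimated_noise_counts = 54_476  # rough noise estimate quoted in the report
target_fpr_pct = 1.0             # target false positive rate, in percent

# Back out the total counts in non-empty droplets from the rounded percentage.
total_counts = removed_counts / (removed_pct / 100)

# Estimated noise as a percentage of all counts in non-empty droplets.
noise_pct = 100 * estimated_noise_counts / total_counts

# Assumption: expected removal = noise fraction + FPR (additive reading).
expected_removed_pct = noise_pct + target_fpr_pct

print(f"noise: {noise_pct:.2f}%  expected removal: {expected_removed_pct:.2f}%")
print(f"observed removal: {removed_pct:.2f}%")
```

Because the total is reconstructed from a rounded percentage, the result is only good to about two decimal places, which is enough to confirm that the observed 1.11% sits close to the expected value.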

Assessing convergence of the algorithm

The learning curve tells us about the progress of the algorithm in inferring all the latent variables in our model. We want to see the ELBO increasing as training epochs increase. Generally it is desirable for the ELBO to converge at some high plateau, and be fairly stable.

What to watch out for:

1. Large downward spikes in the ELBO (of value more than a few hundred).
2. The test ELBO can be smaller than the train ELBO, but generally we want to see both curves increasing and reaching a stable plateau. We do not want the test ELBO to dip way back down at the end.
3. Lack of convergence, where it looks like the ELBO would change quite a bit if training went on for more epochs.

Automated assessment --------

We hope to see the test ELBO follow the training ELBO, increasing almost monotonically (though there will be deviations, and that is expected). There may be a large gap, and that is okay. However, this curve ends with a low test ELBO compared to the max test ELBO value during training. The output could be suboptimal.

Summary:

This is slightly unusual behavior, and a reduced --learning-rate might be indicated. Consider re-running with half the current learning rate to compare the results.
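
The warning above concerns the gap between the final test ELBO and the best test ELBO seen during training. A minimal sketch of such a check follows. The function name `final_elbo_drop` and the `tolerance` threshold are hypothetical and are not CellBender's actual criterion, and the curves are made-up values. In the output file, the real curve is stored under the `learning_curve_test_elbo` key listed in `uns` above.

```python
import numpy as np

def final_elbo_drop(test_elbo, tolerance=0.1):
    """Return (drop, flagged) for a sequence of test ELBO values.

    `tolerance` is a hypothetical threshold: the final value is flagged when
    it sits below the maximum by more than this fraction of the curve's range.
    """
    elbo = np.asarray(test_elbo, dtype=float)
    drop = elbo.max() - elbo[-1]       # how far the final value is below the best
    span = elbo.max() - elbo.min()     # overall range of the curve
    flagged = bool(span > 0 and drop > tolerance * span)
    return drop, flagged

# Made-up curves for illustration only.
stable = [-2000.0, -1500.0, -1300.0, -1250.0, -1245.0]
dipping = [-2000.0, -1500.0, -1300.0, -1250.0, -1600.0]

print(final_elbo_drop(stable))
print(final_elbo_drop(dipping))
```

The first curve ends at its maximum and is not flagged; the second ends well below its earlier peak, which is the pattern the report describes.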

Examine count removal per gene

Pearson correlation coefficient for the above is 0.9317

This meets expectations.
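
For reference, a Pearson correlation coefficient can be computed as in the sketch below. The plot this section refers to is not reproduced here, so the two per-gene arrays are made-up placeholders, and the helper name `pearson_r` is illustrative.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()   # center both variables
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

# Made-up per-gene values for illustration only.
expected = np.array([10.0, 200.0, 35.0, 800.0, 5.0])
removed = np.array([12.0, 180.0, 40.0, 760.0, 3.0])

r = pearson_r(expected, removed)
print(f"Pearson r = {r:.4f}")
```

The result agrees with `np.corrcoef(expected, removed)[0, 1]`, which is the more usual way to obtain it in practice.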

Table of top genes removed

Ranked by fraction removed, and excluding genes with fewer than 2171 total raw counts (90th percentile)

gene_name	ambient_expression	feature_type	genome	gene_id	cellbender_analyzed	n_raw	n_cellbender	n_removed	fraction_removed	fraction_remaining	n_raw_cells	n_cellbender_cells	n_removed_cells	fraction_removed_cells	fraction_remaining_cells
MT-ND3	0.004209	Gene Expression	GRCh38	ENSG00000198840	True	5977	4868	1109	0.185545	0.814455	5154	4868	286	0.055491	0.944509
MT-ATP6	0.006080	Gene Expression	GRCh38	ENSG00000198899	True	9017	7382	1635	0.181324	0.818676	7804	7382	422	0.054075	0.945925
MT-CYB	0.004366	Gene Expression	GRCh38	ENSG00000198727	True	6383	5241	1142	0.178913	0.821087	5544	5241	303	0.054654	0.945346
MT-CO2	0.010887	Gene Expression	GRCh38	ENSG00000198712	True	16173	13326	2847	0.176034	0.823966	14060	13326	734	0.052205	0.947795
MT-ND2	0.004485	Gene Expression	GRCh38	ENSG00000198763	True	6975	5799	1176	0.168602	0.831398	6118	5799	319	0.052141	0.947859
MT-CO1	0.011273	Gene Expression	GRCh38	ENSG00000198804	True	17899	15014	2885	0.161182	0.838818	15808	15014	794	0.050228	0.949772
MT-ND1	0.004616	Gene Expression	GRCh38	ENSG00000198888	True	7543	6334	1209	0.160281	0.839719	6666	6334	332	0.049805	0.950195
MT-CO3	0.007446	Gene Expression	GRCh38	ENSG00000198938	True	12440	10462	1978	0.159003	0.840997	10995	10462	533	0.048477	0.951523
MT-ND5	0.001510	Gene Expression	GRCh38	ENSG00000198786	True	2656	2235	421	0.158509	0.841491	2357	2235	122	0.051761	0.948239
MT-ND4	0.005869	Gene Expression	GRCh38	ENSG00000198886	True	11474	9935	1539	0.134129	0.865871	10361	9935	426	0.041116	0.958884
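
The derived columns in this table follow directly from the count columns. The sketch below recomputes them for three rows copied from the table; the tuple layout and the `derived` dictionary are illustrative, not part of CellBender.

```python
# (gene, n_raw, n_cellbender, n_raw_cells, n_cellbender_cells), copied from the table.
rows = [
    ("MT-ND3", 5977, 4868, 5154, 4868),
    ("MT-CO1", 17899, 15014, 15808, 15014),
    ("MT-ND4", 11474, 9935, 10361, 9935),
]

derived = {}
for gene, n_raw, n_cb, n_raw_cells, n_cb_cells in rows:
    n_removed = n_raw - n_cb
    n_removed_cells = n_raw_cells - n_cb_cells
    derived[gene] = {
        "n_removed": n_removed,
        "fraction_removed": n_removed / n_raw,
        "fraction_remaining": n_cb / n_raw,
        "n_removed_cells": n_removed_cells,
        "fraction_removed_cells": n_removed_cells / n_raw_cells,
        "fraction_remaining_cells": n_cb_cells / n_raw_cells,
    }

for gene, cols in derived.items():
    print(gene, cols["n_removed"], f"{cols['fraction_removed']:.6f}",
          cols["n_removed_cells"], f"{cols['fraction_removed_cells']:.6f}")
```

The ranking in the table is by `fraction_removed`, so a gene with many removed counts in absolute terms (such as MT-CO1) can still rank below a gene with fewer removed counts but a smaller raw total.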

Cell probabilities

The inferred posterior probability that each droplet is non-empty.

We sometimes write "non-empty" instead of "cell" because dead cells and other cellular debris can still lead to a "non-empty" droplet, which will have a high posterior cell probability. But these kinds of low-quality droplets should be removed during cell QC to retain only high-quality cells for downstream analyses.
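
The distinction between "non-empty" and "high-quality cell" can be sketched as two successive filters. The arrays below are made-up stand-ins for the `cell_probability` and `n_cellbender` columns of `obs`; the 0.5 probability cutoff and the minimum-count QC cutoff are assumptions chosen for illustration, and real cell QC usually involves more criteria than a single count threshold.

```python
import numpy as np

# Made-up values standing in for obs['cell_probability'] and obs['n_cellbender'].
cell_probability = np.array([0.99, 0.02, 0.97, 0.60, 0.001, 0.999])
n_counts = np.array([5200, 40, 310, 150, 12, 8700])

prob_cutoff = 0.5   # assumed cutoff for calling a droplet non-empty
min_counts = 300    # hypothetical QC cutoff, for illustration only

# Step 1: droplets the model considers non-empty.
non_empty = cell_probability > prob_cutoff

# Step 2: of those, droplets that also pass a simple downstream QC filter.
passes_qc = non_empty & (n_counts >= min_counts)

print("non-empty droplets:", int(non_empty.sum()))
print("non-empty and passing QC:", int(passes_qc.sum()))
```

The fourth droplet is called non-empty by probability but fails the count filter, which is the kind of low-quality droplet the paragraph above recommends removing during cell QC.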

Concordance of data before and after `remove-background`

The intent is to change the input data as little as possible while achieving noise removal. These plots show general summary statistics about similarity of the input and output data. We expect to see the data lying close to a straight line (gray). There may be outlier genes/features, which are often those highest-expressed in the ambient RNA.

The plots here show data for inferred cell-containing droplets, and exclude the empty droplets.
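
Summary statistics of this kind can be computed from the two count matrices. The sketch below uses tiny made-up matrices (rows are droplets, columns are genes) as stand-ins for the `raw` and `cellbender` layers listed above; which exact statistics the report plots is not stated here, so per-droplet and per-gene totals are shown as a plausible example.

```python
import numpy as np

# Made-up count matrices: rows are droplets, columns are genes.
raw = np.array([[10, 0, 3],
                [4, 2, 0],
                [25, 1, 7]])
denoised = np.array([[9, 0, 3],
                     [4, 1, 0],
                     [22, 1, 7]])

# Noise removal only subtracts counts, so the output never exceeds the input.
assert np.all(denoised <= raw)

# Totals before and after, per droplet and per gene.
per_droplet_raw = raw.sum(axis=1)
per_droplet_denoised = denoised.sum(axis=1)
per_gene_raw = raw.sum(axis=0)
per_gene_denoised = denoised.sum(axis=0)

# Fraction of each droplet's counts that survives noise removal.
fraction_retained = per_droplet_denoised / per_droplet_raw

print("per droplet:", per_droplet_raw, "->", per_droplet_denoised)
print("per gene:   ", per_gene_raw, "->", per_gene_denoised)
print("fraction retained per droplet:", np.round(fraction_retained, 3))
```

Plotting the "after" totals against the "before" totals gives points near the identity line when little is removed, with the largest departures at the features that lose the most counts.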

PCA of encoded gene expression

We are not looking for anything specific in the PCA plot of the gene expression embedding, but often we see clusters that correspond to different cell types. If you see only a single large blob, then the dataset might contain only one cell type, or perhaps there are few counts per droplet.
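
A two-component PCA of a latent embedding can be sketched with a plain SVD. The embedding below is synthetic (two well-separated groups in an 8-dimensional space) and stands in for the `cellbender_embedding` entry of `obsm`; this illustrates the technique in general and is not necessarily how the report computes its plot.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic embedding: two groups of 50 droplets in an 8-dimensional latent space.
group_a = rng.normal(0.0, 1.0, size=(50, 8))
group_b = rng.normal(0.0, 1.0, size=(50, 8)) + 6.0
embedding = np.vstack([group_a, group_b])

# PCA via SVD of the mean-centered data.
centered = embedding - embedding.mean(axis=0)
u, s, vt = np.linalg.svd(centered, full_matrices=False)

# Project onto the first two principal axes.
pcs = centered @ vt[:2].T

# Fraction of total variance explained by each component.
explained = s**2 / np.sum(s**2)

print("projected shape:", pcs.shape)
print("variance explained by PC1, PC2:", np.round(explained[:2], 3))
```

With two distinct groups, most of the variance falls on the first component and the groups land on opposite sides of it, which is the "clusters" pattern described above. A single cell type would instead give one blob.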

Summary of warnings:

Final test ELBO is much lower than the max test ELBO.

CellBender `remove-background` report
