CellBender version 0.3.0¶

2023-12-05 11:57:36

cellbender.h5¶

Loaded dataset¶

AnnData object with n_obs × n_vars = 18310 × 36601
    obs: 'background_fraction', 'cell_probability', 'cell_size', 'droplet_efficiency', 'n_raw', 'n_cellbender'
    var: 'ambient_expression', 'feature_type', 'genome', 'gene_id', 'cellbender_analyzed', 'n_raw', 'n_cellbender'
    uns: 'cell_size_lognormal_std', 'empty_droplet_size_lognormal_loc', 'empty_droplet_size_lognormal_scale', 'swapping_fraction_dist_params', 'estimator', 'features_analyzed_inds', 'fraction_data_used_for_testing', 'learning_curve_learning_rate_epoch', 'learning_curve_learning_rate_value', 'learning_curve_test_elbo', 'learning_curve_test_epoch', 'learning_curve_train_elbo', 'learning_curve_train_epoch', 'target_false_positive_rate'
    obsm: 'cellbender_embedding'
    layers: 'raw', 'cellbender'

Examine how many counts were removed in total¶

removed 1833889 counts from non-empty droplets
removed 3.44% of the counts in non-empty droplets

Rough estimate of expectations based on nothing but the plot above:
roughly 1215847 noise counts should be in non-empty droplets
that is approximately 2.28% of the counts in non-empty droplets
with a false positive rate [FPR] of 1.0%, we would expect to remove about 3.28% of the counts in non-empty droplets

It looks like the algorithm did a great job meeting that expectation.
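
The percentages above appear to be related by simple addition: the estimated noise fraction plus the target FPR gives the expected removal fraction (2.28% + 1.0% = 3.28%). The sketch below reproduces that arithmetic and compares it with the observed 3.44%. The additive rule is inferred from the numbers in this report, not taken from CellBender's source, and the function name is hypothetical.

```python
def expected_removal_fraction(noise_fraction: float, fpr: float) -> float:
    """Assumed rule: expected removal = estimated noise fraction + target FPR."""
    return noise_fraction + fpr


# Values copied from the report text above (as fractions, not percent).
noise_fraction = 0.0228  # estimated noise share of counts in non-empty droplets
fpr = 0.01               # target false positive rate
observed = 0.0344        # share of counts actually removed

expected = expected_removal_fraction(noise_fraction, fpr)
gap = observed - expected  # how far the run landed from the rough expectation

print(f"expected {expected:.2%}, observed {observed:.2%}, gap {gap:.2%}")
```

A small positive gap like this one means slightly more was removed than the rough estimate predicted; how large a gap should be tolerated is a judgment call that this sketch does not encode.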

Assessing convergence of the algorithm¶

The learning curve tells us about the progress of the algorithm in inferring all the latent variables in our model. We want to see the ELBO increasing as training epochs increase. Generally it is desirable for the ELBO to converge at some high plateau, and be fairly stable.

What to watch out for:

1. large downward spikes in the ELBO (of value more than a few hundred)
2. the test ELBO can be smaller than the train ELBO, but generally we want to see both curves increasing and reaching a stable plateau. We do not want the test ELBO to dip way back down at the end.
3. lack of convergence, where it looks like the ELBO would change quite a bit if training went on for more epochs.

Automated assessment --------

Summary:

This learning curve looks normal.

Examine count removal per gene¶

Pearson correlation coefficient for the above is 0.9763

This meets expectations.

Table of top genes removed¶

Ranked by fraction removed, and excluding genes with fewer than 2892 total raw counts (90th percentile)

gene_name	ambient_expression	feature_type	genome	gene_id	cellbender_analyzed	n_raw	n_cellbender	n_removed	fraction_removed	fraction_remaining	n_raw_cells	n_cellbender_cells	n_removed_cells	fraction_removed_cells	fraction_remaining_cells
S100A8	0.004762	Gene Expression	GRCh38	ENSG00000143546	True	33848	21860	11988	0.354172	0.645828	28784	21860	6924	0.240550	0.759450
S100A9	0.006040	Gene Expression	GRCh38	ENSG00000163220	True	48890	33856	15034	0.307507	0.692493	42397	33856	8541	0.201453	0.798547
RPS29	0.012873	Gene Expression	GRCh38	ENSG00000213741	True	113718	82469	31249	0.274794	0.725206	100510	82469	18041	0.179495	0.820505
CD14	0.000561	Gene Expression	GRCh38	ENSG00000170458	True	5404	3945	1459	0.269985	0.730015	4829	3945	884	0.183061	0.816939
RPL39	0.006768	Gene Expression	GRCh38	ENSG00000198918	True	62573	46089	16484	0.263436	0.736564	55589	46089	9500	0.170897	0.829103
ATP5F1E	0.002720	Gene Expression	GRCh38	ENSG00000124172	True	26589	19994	6595	0.248035	0.751965	23779	19994	3785	0.159174	0.840826
RPL32	0.007683	Gene Expression	GRCh38	ENSG00000144713	True	77471	58777	18694	0.241303	0.758697	69550	58777	10773	0.154896	0.845104
RPS21	0.006728	Gene Expression	GRCh38	ENSG00000171858	True	69437	52964	16473	0.237237	0.762763	62559	52964	9595	0.153375	0.846625
ATP5ME	0.000488	Gene Expression	GRCh38	ENSG00000169020	True	5008	3820	1188	0.237220	0.762780	4495	3820	675	0.150167	0.849833
COX7B	0.000453	Gene Expression	GRCh38	ENSG00000131174	True	4689	3591	1098	0.234165	0.765835	4208	3591	617	0.146625	0.853375
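
The derived columns in the table follow from the raw and post-CellBender counts: `n_removed = n_raw - n_cellbender` and `fraction_removed = n_removed / n_raw`. The sketch below recomputes them for three rows copied from the table and applies the same kind of minimum-count filter and ranking. The class and function names are hypothetical, and the 2892 cutoff is passed in by hand rather than computed as a 90th percentile, since that would need the full gene list.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GeneCounts:
    name: str
    n_raw: int         # total raw counts for the gene
    n_cellbender: int  # total counts remaining after remove-background

    @property
    def n_removed(self) -> int:
        return self.n_raw - self.n_cellbender

    @property
    def fraction_removed(self) -> float:
        return self.n_removed / self.n_raw


def top_removed(genes: List[GeneCounts], min_raw_counts: int, k: int = 10) -> List[GeneCounts]:
    """Drop genes with fewer than `min_raw_counts` raw counts, then rank by fraction removed."""
    eligible = [g for g in genes if g.n_raw >= min_raw_counts]
    return sorted(eligible, key=lambda g: g.fraction_removed, reverse=True)[:k]


# Three rows copied from the table above, deliberately out of order.
genes = [
    GeneCounts("RPS29", 113718, 82469),
    GeneCounts("S100A8", 33848, 21860),
    GeneCounts("CD14", 5404, 3945),
]

ranked = top_removed(genes, min_raw_counts=2892)
for g in ranked:
    print(f"{g.name}\t{g.n_removed}\t{g.fraction_removed:.6f}")
```

Ranking by fraction rather than by absolute counts removed is why a moderately expressed gene such as CD14 can appear above much more abundant ribosomal genes.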

Cell probabilities¶

The inferred posterior probability that each droplet is non-empty.

We sometimes write "non-empty" instead of "cell" because dead cells and other cellular debris can still lead to a "non-empty" droplet, which will have a high posterior cell probability. But these kinds of low-quality droplets should be removed during cell QC to retain only high-quality cells for downstream analyses.
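
To turn the per-droplet posterior into a list of non-empty droplets, one simple approach is to threshold `cell_probability`. The sketch below assumes a cutoff of 0.5, a hypothetical helper name, and made-up barcodes and probabilities; it only separates non-empty from empty droplets and does not replace the cell QC described above.

```python
from typing import Dict, List


def call_non_empty(cell_probability: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return barcodes whose posterior probability of being non-empty exceeds `threshold`."""
    return [bc for bc, p in cell_probability.items() if p > threshold]


# Made-up barcodes and probabilities, for illustration only.
probs = {
    "AAACCCAAGAAACCAT-1": 0.999,
    "AAACCCAAGAAACTGC-1": 0.02,
    "AAACCCAAGAAAGCGA-1": 0.61,
    "AAACCCAAGAACAAGG-1": 0.50,
}

non_empty = call_non_empty(probs)
print(len(non_empty), "of", len(probs), "droplets called non-empty")
```

Because the comparison is strict, a droplet sitting exactly at the threshold is treated as empty here; that choice is arbitrary.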

Concordance of data before and after `remove-background`¶

The intent is to change the input data as little as possible while achieving noise removal. These plots show general summary statistics about similarity of the input and output data. We expect to see the data lying close to a straight line (gray). There may be outlier genes/features, which are often those highest-expressed in the ambient RNA.

The plots here show data for inferred cell-containing droplets, and exclude the empty droplets.

PCA of encoded gene expression¶

We are not looking for anything specific in the PCA plot of the gene expression embedding, but often we see clusters that correspond to different cell types. If you see only a single large blob, then the dataset might contain only one cell type, or perhaps there are few counts per droplet.

Summary of warnings:¶

None.

CellBender `remove-background` report¶
