Normalization of single-cell RNA-seq counts by log(x + 1) or log(1 + x)

AS Booeshaghi, L Pachter - Bioinformatics, 2021 - academic.oup.com
1 Results
The ACE2 receptor, which facilitates entry of SARS-CoV-2 into cells (Zhang et al., 2020), has become one of the most studied genes in the history of genomics over the past two months. There are already hundreds of preprints about the gene (Google Scholar), and it is currently the default gene displayed on the UCSC genome browser (Lee et al., 2020). Several studies have reported on the expression of ACE2 at single-cell resolution, and papers have been rife with speculation about the implications of differential ACE2 mRNA abundance for severity of disease. As is common in single-cell RNA-seq, the expression estimates of ACE2 are derived from counts that are filtered and normalized. Figure 1a shows an analysis of ACE2 mRNA in mouse lungs (data from Angelidis et al., 2019). The expression is computed only from cells containing at least one copy of the gene. While single-cell RNA-seq expression data has been modeled with many different distributions (Dadaneh et al., 2020; Van den Berge et al., 2018), for simplicity we model the count data as a Poisson random variable X with parameter λ in order to demonstrate the implications of this restriction. Application of the filter amounts to computing
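As an illustration of what restricting to cells with at least one copy does under the Poisson model, the following sketch compares a simulated filtered average with the standard closed-form mean of a zero-truncated Poisson variable, λ / (1 − e^(−λ)). It assumes the filter corresponds to conditioning on X > 0. The value of λ, the number of cells, and all function names are hypothetical choices made for this example and do not come from the paper.

```python
import math
import random


def truncated_poisson_mean(lam: float) -> float:
    """Closed-form mean of a Poisson(lam) variable conditioned on X > 0."""
    return lam / (1.0 - math.exp(-lam))


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) sample using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def filtered_mean(lam: float, n_cells: int, seed: int = 0) -> float:
    """Simulate n_cells counts and average only the nonzero ones.

    This mimics keeping only cells with at least one copy of the gene.
    """
    rng = random.Random(seed)
    counts = [sample_poisson(lam, rng) for _ in range(n_cells)]
    kept = [c for c in counts if c > 0]
    return sum(kept) / len(kept)


if __name__ == "__main__":
    lam = 0.1  # hypothetical low expression level, chosen for illustration
    print("true mean (lambda):      ", lam)
    print("closed-form filtered mean:", truncated_poisson_mean(lam))
    print("simulated filtered mean:  ", filtered_mean(lam, n_cells=200_000))
```

Because λ / (1 − e^(−λ)) tends to 1 as λ tends to 0, the filtered average stays close to 1 however small the true rate is, so under this model the filter hides differences between lowly expressed conditions.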
Oxford University Press