Scoring
Gene set scoring via pyUCell (pipeline step 1).
score_gene_sets
score_gene_sets(
    adata: AnnData,
    gene_sets: dict[str, list[str]],
    *,
    max_rank: int = 1500,
    chunk_size: int = 1000,
    n_jobs: int = -1,
    inplace: bool = True,
    prefix: str = SCORE_PREFIX,
    suffix: str = "",
    clip_pct: float | tuple[float, float] | None = None,
    normalize: bool = False,
) -> DataFrame
Score each cell for each gene set using UCell.
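To make the rank-based idea concrete, here is a simplified single-cell sketch of a U-statistic score in the style described for the UCell method. It is not pyUCell's implementation: the function name `ucell_like_score` is hypothetical, ties are broken by sort order rather than averaged, and both low-ranked and missing genes are assumed to receive the capped rank `max_rank + 1`.

```python
def ucell_like_score(expression, gene_set, max_rank=1500):
    """Simplified UCell-style score for one cell (illustrative only).

    expression: dict mapping gene symbol -> expression value in this cell.
    gene_set:   list of gene symbols in the signature.
    """
    # Rank genes by decreasing expression: rank 1 is the most expressed gene.
    ordered = sorted(expression, key=expression.get, reverse=True)
    rank = {gene: i + 1 for i, gene in enumerate(ordered)}

    n = len(gene_set)
    # Assumption: ranks beyond max_rank, and genes absent from the cell,
    # are all set to the worst-case value max_rank + 1.
    capped = [min(rank.get(g, max_rank + 1), max_rank + 1) for g in gene_set]

    # Mann-Whitney U statistic, rescaled so that higher means stronger signal.
    u = sum(capped) - n * (n + 1) / 2
    return 1.0 - u / (n * max_rank)


cell = {"A": 5.0, "B": 3.0, "C": 0.0, "D": 1.0}
top_score = ucell_like_score(cell, ["A", "B"], max_rank=3)
missing_score = ucell_like_score(cell, ["Z"], max_rank=3)
```

In this toy cell, a signature made of the two most expressed genes reaches the maximum score, while a signature whose only gene is absent falls to the minimum. This is why missing gene symbols weaken the signal.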
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `adata` | `AnnData` | Annotated data matrix (cells x genes). | required |
| `gene_sets` | `dict[str, list[str]]` | Mapping of gene set names to lists of gene symbols. | required |
| `max_rank` | `int` | Rank cap passed to pyUCell (tune to median genes per cell). | `1500` |
| `chunk_size` | `int` | Number of cells processed per batch. | `1000` |
| `n_jobs` | `int` | Parallelism (… | `-1` |
| `inplace` | `bool` | If True (default), scores are stored in … | `True` |
| `prefix` | `str` | Column name prefix (default … | `SCORE_PREFIX` |
| `suffix` | `str` | Column name suffix (default … | `''` |
| `clip_pct` | `float \| tuple[float, float] \| None` | Percentile clipping (winsorization), applied per gene set. A single float (e.g., … | `None` |
| `normalize` | `bool` | If True, per-gene-set min-max rescaling so min → 0 and max → 1. Applied after clipping. Default False. | `False` |
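The interaction between `clip_pct` and `normalize` (clip first, then rescale) can be sketched in plain Python for a single gene set's scores. This is an illustration under assumptions, not the library's code: the helper names are hypothetical, `clip_pct` is taken as an explicit `(lower, upper)` percentile pair, and percentiles use linear interpolation.

```python
def percentile(sorted_vals, pct):
    """Linear-interpolation percentile of an already sorted list (pct in 0..100)."""
    if len(sorted_vals) == 1:
        return sorted_vals[0]
    pos = (len(sorted_vals) - 1) * pct / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def clip_then_normalize(scores, clip_pct=None, normalize=False):
    """Post-process one gene set's scores: winsorize first, then min-max rescale."""
    out = list(scores)
    if clip_pct is not None:
        lower_pct, upper_pct = clip_pct  # assumption: explicit (lower, upper) pair
        ordered = sorted(out)
        lower = percentile(ordered, lower_pct)
        upper = percentile(ordered, upper_pct)
        # Winsorize: pull values outside [lower, upper] back to the bounds.
        out = [min(max(v, lower), upper) for v in out]
    if normalize:
        lo, hi = min(out), max(out)
        span = hi - lo
        # Guard against a constant column, which would divide by zero.
        out = [(v - lo) / span if span > 0 else 0.0 for v in out]
    return out


raw = [0.0, 0.1, 0.2, 0.3, 1.0]
processed = clip_then_normalize(raw, clip_pct=(0, 75), normalize=True)
```

Because clipping runs first, the outlier `1.0` is pulled down to the 75th percentile before rescaling, so it no longer compresses the remaining scores toward zero.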
Returns:
| Type | Description |
|---|---|
| `DataFrame` | DataFrame with index `adata.obs_names` and columns `["{prefix}{name}{suffix}" for name in gene_sets]`. When `clip_pct` or `normalize` are used, the returned values reflect the post-processed scores. |
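The column naming rule can be reproduced with a one-line comprehension. In this sketch the value assigned to `SCORE_PREFIX` is a placeholder (the real constant's value is not shown on this page), and `score_column_names` is a hypothetical helper rather than part of the package.

```python
SCORE_PREFIX = "score_"  # placeholder value; the library's actual constant may differ


def score_column_names(gene_sets, prefix=SCORE_PREFIX, suffix=""):
    """Build one column name per gene set, in the dict's insertion order."""
    return [f"{prefix}{name}{suffix}" for name in gene_sets]


gene_sets = {"T_cell": ["CD3E", "CD3D"], "B_cell": ["CD79A"]}
columns = score_column_names(gene_sets, suffix="_v1")
```

A distinct `suffix` per run is one way to keep the columns from several scoring passes side by side without overwriting each other.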
Notes
**Missing genes:** Genes in a gene set that are not found in `adata.var_names` are imputed by pyUCell with worst-case rank (`max_rank`), which degrades the signal. A `UserWarning` is emitted listing the missing genes so you can verify your gene symbols.
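A check of this kind can be run before scoring to catch misspelled or differently cased symbols. The sketch below is illustrative only: the helper names are hypothetical, `var_names` stands in for `adata.var_names`, and the warning text is not the library's actual message.

```python
import warnings


def find_missing_genes(gene_sets, var_names):
    """Return {gene set name: [genes absent from var_names]} for affected sets."""
    known = set(var_names)
    missing = {}
    for name, genes in gene_sets.items():
        absent = [g for g in genes if g not in known]
        if absent:
            missing[name] = absent
    return missing


def warn_on_missing_genes(gene_sets, var_names):
    """Emit one UserWarning per gene set that has missing genes."""
    missing = find_missing_genes(gene_sets, var_names)
    for name, absent in missing.items():
        warnings.warn(
            f"Gene set '{name}': {len(absent)} gene(s) not in var_names: {absent}",
            UserWarning,
            stacklevel=2,
        )
    return missing


var_names = ["CD3E", "CD3D", "CD79A"]
gene_sets = {"T_cell": ["CD3E", "cd3d"], "B_cell": ["CD79A"]}
missing = find_missing_genes(gene_sets, var_names)
```

Here the lowercase `cd3d` is reported as missing because the lookup is case-sensitive, which is a common cause of unexpectedly weak scores.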
**Read-only arrays:** After `sc.pp.scale()` or similar operations, `adata.X` may become a read-only numpy array. This function automatically copies `adata.X` when it detects a read-only array to prevent crashes inside pyUCell.
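The read-only guard can be pictured with NumPy's `flags.writeable` attribute. This is a minimal sketch of the idea, assuming a dense `numpy.ndarray`; `ensure_writeable` is a hypothetical helper, and the library's actual detection logic may differ.

```python
import numpy as np


def ensure_writeable(x):
    """Return x unchanged if it is writeable, otherwise a writeable copy."""
    if isinstance(x, np.ndarray) and not x.flags.writeable:
        # ndarray.copy() allocates new memory, so the copy is writeable
        # and the caller's original array is left untouched.
        return x.copy()
    return x


locked = np.arange(6.0).reshape(2, 3)
locked.setflags(write=False)  # simulate a read-only matrix
safe = ensure_writeable(locked)
safe[0, 0] = 42.0  # succeeds on the copy; the same write on `locked` would raise
```

Copying only when the flag is cleared avoids doubling memory use for the common case of a writeable matrix.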
**Negative values:** UCell is rank-based and designed for raw or normalized (non-negative) counts. If `adata.X` contains negative values (e.g., after `sc.pp.scale(zero_center=True)`), a `UserWarning` is emitted. Consider using `adata.raw.to_adata()` or a layer with non-negative values for more meaningful scores.
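A pre-flight check for negative values might look like the following. It is a sketch on nested Python lists standing in for a dense matrix; the function names and the warning text are hypothetical, and sparse matrices are not handled.

```python
import warnings


def has_negative_values(matrix):
    """True if any entry of a row-major nested list is below zero."""
    return any(value < 0 for row in matrix for value in row)


def check_non_negative(matrix):
    """Warn when the matrix looks scaled or centered; return True if it is clean."""
    if has_negative_values(matrix):
        warnings.warn(
            "Expression matrix contains negative values; rank-based scores "
            "may be less meaningful. Consider raw or normalized counts.",
            UserWarning,
            stacklevel=2,
        )
        return False
    return True


counts = [[0.0, 2.0, 1.0], [3.0, 0.0, 0.0]]
scaled = [[-0.7, 1.2, 0.1], [0.7, -1.2, -0.1]]
counts_ok = has_negative_values(counts)
scaled_bad = has_negative_values(scaled)
```

For a dense array held in an AnnData object, a comparable test would be whether the matrix minimum is below zero.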
Source code in src/multiscoresplot/_scoring.py