Scoring
Gene set scoring via pyUCell (pipeline step 1).
score_gene_sets
    score_gene_sets(
        adata: AnnData,
        gene_sets: dict[str, list[str]],
        *,
        max_rank: int = 1500,
        chunk_size: int = 1000,
        n_jobs: int = -1,
        inplace: bool = True,
        prefix: str = SCORE_PREFIX,
        suffix: str = "",
    ) -> DataFrame
Score each cell for each gene set using UCell.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `adata` | `AnnData` | Annotated data matrix (cells x genes). | required |
| `gene_sets` | `dict[str, list[str]]` | Mapping of gene set names to lists of gene symbols. | required |
| `max_rank` | `int` | Rank cap passed to pyUCell (tune to the median number of genes per cell). | `1500` |
| `chunk_size` | `int` | Number of cells processed per batch. | `1000` |
| `n_jobs` | `int` | Parallelism (…) | `-1` |
| `inplace` | `bool` | If True (default), scores are stored in … | `True` |
| `prefix` | `str` | Column name prefix (default `SCORE_PREFIX`). | `SCORE_PREFIX` |
| `suffix` | `str` | Column name suffix (default `""`). | `""` |
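
The `max_rank` description suggests tuning the cap to the median number of genes per cell. The following is a minimal sketch of one way to estimate that value, assuming a dense cells x genes count matrix and counting a gene as detected when its count is non-zero. The helper name `suggest_max_rank` is hypothetical and not part of this package.

```python
import numpy as np


def suggest_max_rank(counts: np.ndarray, floor: int = 1) -> int:
    """Return the median number of detected (non-zero) genes per cell.

    Hypothetical helper: rows are cells, columns are genes.
    """
    counts = np.asarray(counts)
    # Count the genes with a non-zero value in each cell (row).
    genes_per_cell = (counts > 0).sum(axis=1)
    # Never return less than `floor`, so the cap stays positive.
    return max(floor, int(np.median(genes_per_cell)))


# Toy matrix: 4 cells x 6 genes.
toy_counts = np.array(
    [
        [1, 0, 3, 0, 0, 2],
        [0, 0, 1, 0, 0, 0],
        [5, 2, 1, 1, 0, 4],
        [0, 1, 0, 2, 1, 0],
    ]
)
suggested = suggest_max_rank(toy_counts)
print(suggested)
```

With a real `AnnData` object the matrix in `adata.X` is often sparse, so it would need to be densified or counted with a sparse-aware method first. The result could then be passed as `max_rank`.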
Returns:
| Type | Description |
|---|---|
| `DataFrame` | DataFrame with index `adata.obs_names` and columns `["{prefix}{name}{suffix}" for name in gene_sets]`. |
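
To make the column naming rule concrete, here is a small sketch that builds the expected column names and an empty frame of the documented shape. It does not call `score_gene_sets`. The helper `expected_score_columns`, the gene sets, the cell names, and the `"score_"` prefix are illustrative assumptions; the actual value of `SCORE_PREFIX` is not stated on this page.

```python
import pandas as pd


def expected_score_columns(
    gene_sets: dict[str, list[str]], prefix: str, suffix: str = ""
) -> list[str]:
    """Build column names as "{prefix}{name}{suffix}" (hypothetical helper)."""
    return [f"{prefix}{name}{suffix}" for name in gene_sets]


# Illustrative gene sets; dict order determines column order.
gene_sets = {
    "t_cell": ["CD3D", "CD3E"],
    "b_cell": ["CD79A", "MS4A1"],
}

# "score_" is a placeholder prefix, not the real SCORE_PREFIX value.
cols = expected_score_columns(gene_sets, prefix="score_", suffix="_v1")

# Empty frame shaped like the documented return value:
# one row per cell name, one column per gene set.
obs_names = ["cell_0", "cell_1", "cell_2"]
scores = pd.DataFrame(index=obs_names, columns=cols, dtype=float)
print(scores.columns.tolist())
```

Checking the returned frame's columns against such a list is a cheap way to confirm that a custom `prefix` or `suffix` was applied as intended.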