alifestd_downsample_tips_lineage_stratified_polars
- alifestd_downsample_tips_lineage_stratified_polars(phylogeny_df: polars.DataFrame, n_downsample: int | None = None, seed: int | None = None, *, criterion_delta: str | polars.Expr = 'origin_time', criterion_stratify: str | polars.Expr = 'origin_time', criterion_target: str | polars.Expr = 'origin_time', n_tips_per_stratum: int = 1, progress_wrap: typing.Callable = <function <lambda>>) -> polars.DataFrame
Retain leaves per stratified group, chosen by proximity to the lineage of a target leaf.
Selects a target leaf as the leaf with the largest criterion_target value (ties broken randomly). For each non-target leaf, the most recent common ancestor (MRCA) of that leaf and the target leaf is identified, and the “off-lineage delta” is computed as the absolute difference between that leaf’s criterion_delta value and the MRCA’s criterion_delta value.
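The off-lineage delta described above can be sketched in plain Python. This is an illustration only, not the library's polars implementation: the helper names `find_mrca` and `off_lineage_deltas` are hypothetical, a single list of values stands in for both criterion_target and criterion_delta, the tree is assumed to have one root that is its own ancestor, and ties for the target leaf are not randomized.

```python
def find_mrca(ancestor_ids, a, b):
    """Return the most recent common ancestor of nodes a and b.

    Assumes contiguous, topologically sorted ids (each ancestor id is
    <= its descendant's id) and a single root that is its own ancestor.
    """
    while a != b:
        # The node with the larger id cannot be an ancestor of the other,
        # so step it one generation toward the root.
        if a > b:
            a = ancestor_ids[a]
        else:
            b = ancestor_ids[b]
    return a


def off_lineage_deltas(ancestor_ids, values):
    """Pick the target leaf and compute each other leaf's delta."""
    internal = {anc for node, anc in enumerate(ancestor_ids) if anc != node}
    leaves = [n for n in range(len(ancestor_ids)) if n not in internal]
    # Target leaf: largest value (first one wins on ties in this sketch).
    target = max(leaves, key=lambda n: values[n])
    deltas = {
        leaf: abs(values[leaf] - values[find_mrca(ancestor_ids, leaf, target)])
        for leaf in leaves
        if leaf != target
    }
    return target, deltas


# Toy phylogeny: node i has ancestor ancestor_ids[i]; node 0 is the root.
ancestor_ids = [0, 0, 1, 1, 2, 2, 4]
origin_time = [0, 1, 2, 5, 3, 3, 6]
target, deltas = off_lineage_deltas(ancestor_ids, origin_time)
```

Here leaf 6 is the target. Leaf 3 branches off the target's lineage at node 1, giving a delta of |5 - 1| = 4, while leaf 5 branches off at node 2, giving |3 - 2| = 1.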
Leaves are grouped by their criterion_stratify value. When n_downsample is an integer, stratified values are coarsened by ranking and integer-dividing to form exactly n_downsample // n_tips_per_stratum groups. When n_downsample is None, each distinct stratified value forms its own group (without ranking). Within each group, the n_tips_per_stratum leaves with the smallest off-lineage delta are retained.
Only supports asexual phylogenies.
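The grouping and selection step can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the library's code: `select_leaves` is a hypothetical helper, ranks are ordinal (ties keep input order), and a leaf's group is taken as `rank * n_groups // n_leaves`, which is one plausible reading of "ranking and integer-dividing". The actual ranking method may differ.

```python
from collections import defaultdict


def select_leaves(stratify_values, deltas, n_downsample=None,
                  n_tips_per_stratum=1):
    """Return sorted indices of retained leaves.

    stratify_values[i] and deltas[i] describe the i-th leaf.
    Assumes n_downsample, when given, is a positive integer.
    """
    n_leaves = len(stratify_values)
    if n_downsample is None:
        # Each distinct stratify value is its own group; no ranking.
        group_of = list(stratify_values)
    else:
        if n_downsample % n_tips_per_stratum:
            raise ValueError(
                "n_tips_per_stratum must evenly divide n_downsample"
            )
        n_groups = n_downsample // n_tips_per_stratum
        # Ordinal rank of each leaf by stratify value (assumed scheme).
        order = sorted(range(n_leaves), key=lambda i: stratify_values[i])
        group_of = [None] * n_leaves
        for rank, i in enumerate(order):
            group_of[i] = rank * n_groups // n_leaves

    groups = defaultdict(list)
    for i, group in enumerate(group_of):
        groups[group].append(i)

    kept = []
    for members in groups.values():
        # Keep the leaves closest to the target lineage.
        members.sort(key=lambda i: deltas[i])
        kept.extend(members[:n_tips_per_stratum])
    return sorted(kept)


stratify_values = [1, 1, 2, 3, 5, 8]
deltas = [4, 1, 2, 2, 0, 3]
per_value = select_leaves(stratify_values, deltas)
two_groups = select_leaves(stratify_values, deltas, n_downsample=2)
two_per_group = select_leaves(
    stratify_values, deltas, n_downsample=4, n_tips_per_stratum=2
)
```

With n_downsample=2, the six leaves split into two rank-based groups of three, and the leaf with the smallest delta in each group is kept. With n_downsample=None, the two leaves sharing stratify value 1 compete with each other and every other leaf is kept.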
Parameters
- phylogeny_df : polars.DataFrame or polars.LazyFrame
The phylogeny as a dataframe in alife standard format.
Must represent an asexual phylogeny.
- n_downsample : int, optional
Desired number of retained tips. If None, every distinct criterion_stratify value forms its own group.
- seed : int, optional
Random seed for reproducible target-leaf selection when there are ties in criterion_target.
- criterion_delta : str or polars.Expr, default “origin_time”
Column name or polars expression used to compute the off-lineage delta for each leaf. The delta is the absolute difference between a leaf’s value and its MRCA’s value.
- criterion_stratify : str or polars.Expr, default “origin_time”
Column name or polars expression used to stratify leaves into groups.
- criterion_target : str or polars.Expr, default “origin_time”
Column name or polars expression used to select the target leaf. The leaf with the largest value is chosen as the target. Ties are broken by random sampling, which seed makes reproducible.
- n_tips_per_stratum : int, default 1
Number of tips to retain per stratified group. Must evenly divide n_downsample when n_downsample is not None.
- progress_wrap : Callable, optional
Pass tqdm or equivalent to display a progress bar.
Raises
- NotImplementedError
If phylogeny_df has no “ancestor_id” column or if ids are non-contiguous or not topologically sorted.
- ValueError
If criterion_delta, criterion_stratify, or criterion_target is not a column in phylogeny_df.
- ValueError
If n_downsample is not None and n_tips_per_stratum does not evenly divide n_downsample.
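As a rough illustration of the id preconditions behind the NotImplementedError above, the sketch below checks plain Python lists. The helper `check_working_format` is hypothetical and not part of the library. It assumes "contiguous" means each id equals its row position and "topologically sorted" means no ancestor id exceeds its descendant's id.

```python
def check_working_format(ids, ancestor_ids):
    """Raise NotImplementedError if ids are unsuitable (assumed rules)."""
    # Contiguous ids: row position and id coincide (0, 1, 2, ...).
    if any(node_id != pos for pos, node_id in enumerate(ids)):
        raise NotImplementedError("ids must be contiguous")
    # Topologically sorted: every ancestor appears at or before its child.
    if any(anc > node_id for node_id, anc in zip(ids, ancestor_ids)):
        raise NotImplementedError("ids must be topologically sorted")


def is_supported(ids, ancestor_ids):
    """Return True when the check passes, False otherwise."""
    try:
        check_working_format(ids, ancestor_ids)
    except NotImplementedError:
        return False
    return True


ok = is_supported([0, 1, 2], [0, 0, 1])
gap_in_ids = is_supported([0, 2, 3], [0, 0, 2])
child_before_parent = is_supported([0, 1, 2], [0, 2, 0])
```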
Returns
- polars.DataFrame
The pruned phylogeny in alife standard format.
See Also
- alifestd_downsample_tips_lineage_stratified_asexual :
Pandas-based implementation.