alifestd_downsample_tips_lineage_stratified_polars

alifestd_downsample_tips_lineage_stratified_polars(phylogeny_df: polars.DataFrame, n_downsample: int | None = None, seed: int | None = None, *, criterion_delta: str | polars.Expr = 'origin_time', criterion_stratify: str | polars.Expr = 'origin_time', criterion_target: str | polars.Expr = 'origin_time', n_tips_per_stratum: int = 1, progress_wrap: typing.Callable = <function <lambda>>) -> polars.DataFrame

Retain leaves per stratified group, chosen by proximity to the lineage of a target leaf.

Selects a target leaf as the leaf with the largest criterion_target value (ties broken randomly). For each non-target leaf, the most recent common ancestor (MRCA) of that leaf and the target leaf is identified, and the “off-lineage delta” is computed as the absolute difference between that leaf’s criterion_delta value and the MRCA’s criterion_delta value.
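The target selection and off-lineage delta described above can be illustrated with a small pure-Python sketch. It is not the library's polars implementation: the helper names (`find_leaves`, `lineage`, `pick_target`, `off_lineage_deltas`), the toy tree, and the convention that the root lists itself as its own ancestor are all assumptions made for illustration.

```python
import random

# Toy asexual phylogeny. Ids are contiguous and topologically sorted;
# the root (id 0) is its own ancestor (an assumption of this sketch).
ancestor_id = [0, 0, 0, 1, 1, 2, 2]
origin_time = [0, 1, 2, 2, 3, 5, 4]


def find_leaves(ancestor_id):
    """Return ids that never appear as another node's ancestor."""
    internal = {a for i, a in enumerate(ancestor_id) if a != i}
    return [i for i in range(len(ancestor_id)) if i not in internal]


def lineage(ancestor_id, node):
    """Return the path from node up to the root, inclusive."""
    path = [node]
    while ancestor_id[path[-1]] != path[-1]:
        path.append(ancestor_id[path[-1]])
    return path


def pick_target(leaves, values, seed=None):
    """Pick the leaf with the largest value; break ties at random."""
    best = max(values[leaf] for leaf in leaves)
    tied = [leaf for leaf in leaves if values[leaf] == best]
    return random.Random(seed).choice(tied)


def off_lineage_deltas(ancestor_id, values, target):
    """Map each non-target leaf to |value(leaf) - value(MRCA with target)|."""
    on_target_lineage = set(lineage(ancestor_id, target))
    deltas = {}
    for leaf in find_leaves(ancestor_id):
        if leaf == target:
            continue
        # The first ancestor of the leaf on the target's lineage is the MRCA.
        mrca = next(
            n for n in lineage(ancestor_id, leaf) if n in on_target_lineage
        )
        deltas[leaf] = abs(values[leaf] - values[mrca])
    return deltas


target = pick_target(find_leaves(ancestor_id), origin_time, seed=1)
deltas = off_lineage_deltas(ancestor_id, origin_time, target)
```

Here leaf 5 is the target. Leaf 6 branches from the target's lineage at node 2, so its delta is measured from node 2, while leaves 3 and 4 branch off at the root.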

Leaves are grouped by their criterion_stratify value. When n_downsample is an integer, stratified values are coarsened by ranking and integer-dividing to form exactly n_downsample // n_tips_per_stratum groups. When n_downsample is None, each distinct stratified value forms its own group (without ranking). Within each group, the n_tips_per_stratum leaves with the smallest off-lineage delta are retained.
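A minimal sketch of the grouping step follows, under one plausible reading of "ranking and integer-dividing": ordinal ranks scaled into `n_groups` bins. The exact ranking method and tie handling used by the library are not stated here, so the formula and the helper names `assign_strata` and `retain_per_stratum` are assumptions.

```python
from collections import defaultdict


def assign_strata(stratify_values, n_groups):
    """Bin leaves into n_groups strata by ordinal rank (assumed scheme).

    Assumes there are at least n_groups leaves.
    """
    order = sorted(
        stratify_values, key=lambda leaf: (stratify_values[leaf], leaf)
    )
    n = len(order)
    return {leaf: rank * n_groups // n for rank, leaf in enumerate(order)}


def retain_per_stratum(strata, deltas, n_tips_per_stratum=1):
    """Keep the n_tips_per_stratum smallest-delta leaves in each stratum."""
    groups = defaultdict(list)
    for leaf, stratum in strata.items():
        groups[stratum].append(leaf)
    kept = []
    for members in groups.values():
        members.sort(key=lambda leaf: (deltas[leaf], leaf))
        kept.extend(members[:n_tips_per_stratum])
    return sorted(kept)


# Hypothetical leaf ids mapped to stratify values and off-lineage deltas.
stratify_values = {10: 1.0, 11: 2.0, 12: 2.5, 13: 4.0, 14: 7.0, 15: 9.0}
leaf_deltas = {10: 3, 11: 1, 12: 2, 13: 0, 14: 5, 15: 4}

n_downsample, n_tips_per_stratum = 3, 1
strata = assign_strata(stratify_values, n_downsample // n_tips_per_stratum)
kept = retain_per_stratum(strata, leaf_deltas, n_tips_per_stratum)
```

With six leaves and three groups, each stratum holds two leaves adjacent in rank, and the one with the smaller delta is kept from each.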

Only supports asexual phylogenies.

Parameters

phylogeny_df : polars.DataFrame or polars.LazyFrame

The phylogeny as a dataframe in alife standard format.

Must represent an asexual phylogeny.

n_downsample : int, optional

Desired number of retained tips. If None, every distinct criterion_stratify value forms its own group.

seed : int, optional

Random seed for reproducible target-leaf selection when there are ties in criterion_target.

criterion_delta : str or polars.Expr, default “origin_time”

Column name or polars expression used to compute the off-lineage delta for each leaf. The delta is the absolute difference between a leaf’s value and the value at the MRCA of that leaf and the target leaf.

criterion_stratify : str or polars.Expr, default “origin_time”

Column name or polars expression used to stratify leaves into groups.

criterion_target : str or polars.Expr, default “origin_time”

Column name or polars expression used to select the target leaf. The leaf with the largest value is chosen as the target. Ties are broken by random sampling; pass seed to make the choice reproducible.

n_tips_per_stratum : int, default 1

Number of tips to retain per stratified group. Must evenly divide n_downsample when n_downsample is not None.

progress_wrap : Callable, optional

Pass tqdm or equivalent to display a progress bar.

Raises

NotImplementedError

If phylogeny_df has no “ancestor_id” column or if ids are non-contiguous or not topologically sorted.

ValueError

If criterion_delta, criterion_stratify, or criterion_target is not a column in phylogeny_df.

ValueError

If n_downsample is not None and n_tips_per_stratum does not evenly divide n_downsample.
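The divisibility requirement and the resulting group count can be sketched as a small check. The helper name `count_strata` and the error message are hypothetical; only the rule itself comes from the description above.

```python
def count_strata(n_downsample, n_tips_per_stratum=1):
    """Return the number of strata implied by the arguments.

    Returns None when n_downsample is None, meaning each distinct
    stratify value forms its own group.
    """
    if n_downsample is None:
        return None
    if n_downsample % n_tips_per_stratum != 0:
        raise ValueError(
            "n_tips_per_stratum must evenly divide n_downsample, "
            f"got {n_tips_per_stratum=} and {n_downsample=}"
        )
    return n_downsample // n_tips_per_stratum
```

For example, `count_strata(12, 3)` gives 4 strata of 3 tips each, while `count_strata(10, 3)` raises `ValueError`.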

Returns

polars.DataFrame

The pruned phylogeny in alife standard format.

See Also

alifestd_downsample_tips_lineage_stratified_asexual :

Pandas-based implementation.