alifestd_downsample_tips_lineage_stratified_asexual
- alifestd_downsample_tips_lineage_stratified_asexual(phylogeny_df: pandas.DataFrame, n_downsample: int | None = None, mutate: bool = False, seed: int | None = None, *, criterion_delta: str = 'origin_time', criterion_stratify: str = 'origin_time', criterion_target: str = 'origin_time', n_tips_per_stratum: int = 1, progress_wrap: typing.Callable = <function <lambda>>) -> pandas.DataFrame
Retain leaves per stratified group, chosen by proximity to the lineage of a target leaf.
Selects a target leaf as the leaf with the largest criterion_target value (ties broken randomly). For each non-target leaf, the most recent common ancestor (MRCA) of that leaf and the target leaf is identified, and the “off-lineage delta” is computed as the absolute difference between that leaf’s criterion_delta value and the MRCA’s criterion_delta value.
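The off-lineage delta described above can be illustrated with a minimal pure-Python sketch. The toy tree, the `parent` and `origin_time` dictionaries, and the helper names are hypothetical and are not part of the library. The sketch also assumes no ties in the target criterion, whereas the function itself breaks ties randomly.

```python
# Toy asexual phylogeny (hypothetical data): each node maps to its parent id,
# and the root maps to itself.
parent = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 2}
origin_time = {0: 0, 1: 1, 2: 2, 3: 3, 4: 5, 5: 4}


def lineage(node):
    """Return the nodes from `node` up to the root, inclusive."""
    path = [node]
    while parent[node] != node:
        node = parent[node]
        path.append(node)
    return path


# Leaves are the nodes that are not the parent of any other node.
internal = {p for n, p in parent.items() if p != n}
leaves = sorted(set(parent) - internal)

# Target leaf: largest criterion_target value (no ties in this toy data;
# max() would otherwise pick the first, not a random one).
target = max(leaves, key=origin_time.get)
target_lineage = set(lineage(target))


def off_lineage_delta(leaf):
    """Absolute difference between the leaf's value and that of its MRCA
    with the target leaf."""
    mrca = next(n for n in lineage(leaf) if n in target_lineage)
    return abs(origin_time[leaf] - origin_time[mrca])


deltas = {leaf: off_lineage_delta(leaf) for leaf in leaves if leaf != target}
print(target, deltas)
```

Here leaf 3 branches off the target's lineage at node 1, while leaf 5 branches off at the root, so leaf 3 has the smaller delta.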
Leaves are grouped by their criterion_stratify value. When n_downsample is an integer, stratified values are coarsened by ranking and integer-dividing to form exactly n_downsample // n_tips_per_stratum groups. When n_downsample is None, each distinct stratified value forms its own group (without ranking). Within each group, the n_tips_per_stratum leaves with the smallest off-lineage delta are retained.
Only supports asexual phylogenies.
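As a rough illustration of the grouping step, the sketch below ranks leaves by their stratify value, integer-divides the ranks into n_downsample // n_tips_per_stratum bins, and keeps the closest leaves in each bin. The leaf data are hypothetical, and the exact binning formula is an assumption; the library's implementation may coarsen ranks differently.

```python
# Hypothetical leaves: (leaf id, criterion_stratify value, off-lineage delta).
leaves = [
    ("a", 10, 3), ("b", 11, 1), ("c", 12, 4),
    ("d", 20, 2), ("e", 21, 0), ("f", 22, 5),
]
n_downsample = 4
n_tips_per_stratum = 2
n_groups = n_downsample // n_tips_per_stratum

# Rank leaves by stratify value, then integer-divide ranks into n_groups bins
# (assumed binning formula, for illustration only).
ranked = sorted(leaves, key=lambda leaf: leaf[1])
groups = {}
for rank, leaf in enumerate(ranked):
    group = rank * n_groups // len(ranked)
    groups.setdefault(group, []).append(leaf)

# Within each group, keep the n_tips_per_stratum leaves with smallest delta.
retained = []
for group in sorted(groups):
    closest = sorted(groups[group], key=lambda leaf: leaf[2])[:n_tips_per_stratum]
    retained.extend(leaf[0] for leaf in closest)

print(retained)
```

With six leaves and two groups, each group holds three leaves, and the two with the smallest delta survive in each, giving four retained tips in total.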
Parameters
- phylogeny_df : pandas.DataFrame
The phylogeny as a dataframe in alife standard format.
Must represent an asexual phylogeny.
- n_downsample : int, optional
Desired number of retained tips. If None, every distinct criterion_stratify value forms its own group.
- mutate : bool, default False
Are side effects on the input argument phylogeny_df allowed?
- seed : int, optional
Random seed for reproducible target-leaf selection when there are ties in criterion_target.
- criterion_delta : str, default “origin_time”
Column name used to compute the off-lineage delta for each leaf. The delta is the absolute difference between a leaf’s value in this column and the value of the leaf’s MRCA with the target leaf.
- criterion_stratify : str, default “origin_time”
Column name used to stratify leaves into groups.
- criterion_target : str, default “origin_time”
Column name used to select the target leaf. The leaf with the largest value in this column is chosen as the target. Ties are broken by random sampling, which seed makes reproducible.
- n_tips_per_stratum : int, default 1
Number of tips to retain per stratified group. Must evenly divide n_downsample when n_downsample is not None.
- progress_wrap : Callable, optional
Pass tqdm or equivalent to display a progress bar.
Raises
- ValueError
If criterion_delta, criterion_stratify, or criterion_target is not a column in phylogeny_df.
- ValueError
If n_downsample is not None and n_tips_per_stratum does not evenly divide n_downsample.
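The two error conditions can be mirrored in a small pre-check, sketched below. `check_args` is a hypothetical helper, not part of the library, and the error messages are illustrative only.

```python
def check_args(columns, n_downsample, n_tips_per_stratum, criteria):
    """Hypothetical pre-check mirroring the documented ValueError conditions."""
    # Every criterion must name an existing column.
    for criterion in criteria:
        if criterion not in columns:
            raise ValueError(f"criterion column {criterion!r} not found")
    # n_tips_per_stratum must evenly divide n_downsample, when one is given.
    if n_downsample is not None and n_downsample % n_tips_per_stratum != 0:
        raise ValueError("n_tips_per_stratum must evenly divide n_downsample")


columns = ["id", "ancestor_id", "origin_time"]
check_args(columns, 6, 2, ["origin_time"])  # passes silently
check_args(columns, None, 2, ["origin_time"])  # passes: no target count
```

Running such a check before calling the function lets a caller fail early, for example when n_downsample is 5 and n_tips_per_stratum is 2.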
Returns
- pandas.DataFrame
The pruned phylogeny in alife standard format.