alifestd_downsample_tips_lineage_stratified_asexual

alifestd_downsample_tips_lineage_stratified_asexual(phylogeny_df: pandas.DataFrame, n_downsample: int | None = None, mutate: bool = False, seed: int | None = None, *, criterion_delta: str = 'origin_time', criterion_stratify: str = 'origin_time', criterion_target: str = 'origin_time', n_tips_per_stratum: int = 1, progress_wrap: typing.Callable = <function <lambda>>) → pandas.DataFrame

Retain leaves per stratified group, chosen by proximity to the lineage of a target leaf.

Selects a target leaf as the leaf with the largest criterion_target value (ties broken randomly). For each non-target leaf, the most recent common ancestor (MRCA) of that leaf and the target leaf is identified, and the “off-lineage delta” is computed as the absolute difference between that leaf’s criterion_delta value and the MRCA’s criterion_delta value.
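The off-lineage delta described above can be illustrated on a toy tree. The following is a minimal sketch, not the library's implementation: it assumes an `ancestor_id` column in which the root points to itself, the helper name `off_lineage_deltas` is hypothetical, and ties for the target leaf are resolved by taking the first maximum rather than by random sampling.

```python
import pandas as pd

# Toy asexual phylogeny; the root (id 0) is assumed to list itself as ancestor.
toy_df = pd.DataFrame(
    {
        "id": [0, 1, 2, 3, 4, 5, 6],
        "ancestor_id": [0, 0, 0, 1, 1, 3, 3],
        "origin_time": [0, 1, 3, 3, 3, 4, 5],
    }
)


def off_lineage_deltas(df, criterion_delta="origin_time",
                       criterion_target="origin_time"):
    """Hypothetical helper: return (target leaf id, {leaf id: delta})."""
    parent = dict(zip(df["id"], df["ancestor_id"]))
    delta_value = dict(zip(df["id"], df[criterion_delta]))
    target_value = dict(zip(df["id"], df[criterion_target]))

    # Leaves are nodes that no other node names as its ancestor.
    internal = {anc for node, anc in parent.items() if anc != node}
    leaves = [node for node in parent if node not in internal]

    # Target leaf: largest criterion_target value (first maximum on ties here).
    target = max(leaves, key=lambda node: target_value[node])

    # Collect every node on the target's lineage, up to and including the root.
    lineage = {target}
    node = target
    while parent[node] != node:
        node = parent[node]
        lineage.add(node)

    deltas = {}
    for leaf in leaves:
        if leaf == target:
            continue
        # The first ancestor found on the target lineage is the MRCA.
        mrca = leaf
        while mrca not in lineage:
            mrca = parent[mrca]
        deltas[leaf] = abs(delta_value[leaf] - delta_value[mrca])
    return target, deltas


target, deltas = off_lineage_deltas(toy_df)
print(target, deltas)
```

In this toy tree, leaf 5 branches off the target lineage at node 3 (origin time 3), so its delta is |4 - 3| = 1, while leaf 2 branches off at the root and has delta |3 - 0| = 3.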

Leaves are grouped by their criterion_stratify value. When n_downsample is an integer, the criterion_stratify values are coarsened by ranking and integer-dividing to form exactly n_downsample // n_tips_per_stratum groups. When n_downsample is None, each distinct criterion_stratify value forms its own group (without ranking). Within each group, the n_tips_per_stratum leaves with the smallest off-lineage delta are retained.
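The grouping and retention step can be sketched as follows. This is one plausible reading of "ranking and integer-dividing", not the library's code: the ordinal ranking with `method="first"`, the `rank * n_groups // n_leaves` bucketing, the precomputed `off_lineage_delta` column, and the helper name `retain_stratified` are all assumptions made for illustration. Handling of the target leaf itself is not covered.

```python
import pandas as pd

# Hypothetical leaf table with precomputed off-lineage deltas.
leaf_df = pd.DataFrame(
    {
        "id": [10, 11, 12, 13, 14, 15],
        "origin_time": [1, 2, 3, 4, 5, 6],
        "off_lineage_delta": [3, 1, 2, 5, 0, 4],
    }
)


def retain_stratified(leaves, n_downsample, n_tips_per_stratum=1,
                      criterion_stratify="origin_time"):
    """Hypothetical helper: keep the closest leaves within each stratum."""
    if n_downsample is None:
        # Each distinct stratify value is its own group; no ranking.
        groups = leaves[criterion_stratify]
    else:
        n_groups = n_downsample // n_tips_per_stratum
        # Zero-based ordinal ranks, then integer division into n_groups bins.
        ranks = leaves[criterion_stratify].rank(method="first").astype(int) - 1
        groups = ranks * n_groups // len(leaves)

    ordered = leaves.assign(group=groups.to_numpy()).sort_values(
        ["group", "off_lineage_delta"]
    )
    # The first rows of each group are those with the smallest delta.
    return ordered.groupby("group").head(n_tips_per_stratum)


kept_one = retain_stratified(leaf_df, n_downsample=3)
kept_two = retain_stratified(leaf_df, n_downsample=4, n_tips_per_stratum=2)
kept_all = retain_stratified(leaf_df, n_downsample=None)
print(sorted(kept_one["id"]))
```

With n_downsample=3 and one tip per stratum, the six leaves fall into three rank-based bins of two, and the leaf with the smaller delta survives in each bin.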

Only supports asexual phylogenies.

Parameters

phylogeny_df : pandas.DataFrame

The phylogeny as a dataframe in alife standard format.

Must represent an asexual phylogeny.

n_downsample : int, optional

Desired number of retained tips. If None, every distinct criterion_stratify value forms its own group.

mutate : bool, default False

Whether side effects on the input argument phylogeny_df are allowed.

seed : int, optional

Random seed for reproducible target-leaf selection when there are ties in criterion_target.

criterion_delta : str, default “origin_time”

Column name used to compute the off-lineage delta for each leaf. The delta is the absolute difference between a leaf’s value and its MRCA’s value in this column.

criterion_stratify : str, default “origin_time”

Column name used to stratify leaves into groups.

criterion_target : str, default “origin_time”

Column name used to select the target leaf. The leaf with the largest value in this column is chosen as the target. Ties are broken by random sampling; pass seed to make the choice reproducible.

n_tips_per_stratum : int, default 1

Number of tips to retain per stratified group. Must evenly divide n_downsample when n_downsample is not None.

progress_wrap : Callable, optional

Pass tqdm or equivalent to display a progress bar.

Raises

ValueError

If criterion_delta, criterion_stratify, or criterion_target is not a column in phylogeny_df.

ValueError

If n_downsample is not None and n_tips_per_stratum does not evenly divide n_downsample.
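The two error conditions above amount to simple precondition checks. The sketch below is illustrative only: `check_downsample_args` is a hypothetical helper, and the error messages are invented rather than taken from the library.

```python
import pandas as pd


def check_downsample_args(phylogeny_df, n_downsample, n_tips_per_stratum,
                          criteria):
    """Hypothetical helper mirroring the documented ValueError conditions."""
    # Every criterion must name an existing column.
    for column in criteria:
        if column not in phylogeny_df.columns:
            raise ValueError(f"criterion column {column!r} not in phylogeny_df")
    # n_tips_per_stratum must evenly divide n_downsample when it is given.
    if n_downsample is not None and n_downsample % n_tips_per_stratum != 0:
        raise ValueError(
            f"n_tips_per_stratum={n_tips_per_stratum} does not evenly divide "
            f"n_downsample={n_downsample}"
        )


def raises_value_error(*args):
    """Return True if the check rejects the given arguments."""
    try:
        check_downsample_args(*args)
    except ValueError:
        return True
    return False


example_df = pd.DataFrame(
    {"id": [0, 1], "ancestor_id": [0, 0], "origin_time": [0, 1]}
)
print(raises_value_error(example_df, 5, 2, ["origin_time"]))
```

For instance, n_downsample=5 with n_tips_per_stratum=2 is rejected because 5 is not a multiple of 2, whereas n_downsample=4 passes.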

Returns

pandas.DataFrame

The pruned phylogeny in alife standard format.