alifestd_mark_sample_tips_lineage_asexual
- alifestd_mark_sample_tips_lineage_asexual(phylogeny_df: pandas.DataFrame, n_sample: int, mutate: bool = False, seed: int | None = None, *, criterion_delta: str = 'origin_time', criterion_target: str = 'origin_time', progress_wrap: typing.Callable = <function <lambda>>, mark_as: str = 'alifestd_mark_sample_tips_lineage_asexual') -> pandas.DataFrame
Mark the n_sample leaves closest to the lineage of a target leaf.
Adds a boolean column mark_as indicating retained tips.
Selects a target leaf as the leaf with the largest criterion_target value (ties broken randomly). For each leaf, the most recent common ancestor (MRCA) with the target leaf is identified, and the “off-lineage delta” is computed as the absolute difference between the leaf’s criterion_delta value and its MRCA’s criterion_delta value. The n_sample leaves with the smallest off-lineage deltas are marked.
If n_sample is greater than or equal to the number of leaves in the phylogeny, all leaves are marked. Ties in off-lineage delta are broken arbitrarily.
Only supports asexual phylogenies.
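The selection procedure described above can be sketched in plain pandas. This is not the library's implementation, only a minimal illustration under stated assumptions: the phylogeny is assumed to carry an id column and an ancestor_id column in which each root lists itself as its own ancestor, and the function name mark_sample_tips_lineage_sketch and its default mark_as value "sampled" are hypothetical. The description does not say how leaves in a separate tree from the target are handled; here they get an infinite delta, which is also an assumption.

```python
from typing import Optional

import numpy as np
import pandas as pd


def mark_sample_tips_lineage_sketch(
    df: pd.DataFrame,
    n_sample: int,
    seed: Optional[int] = None,
    criterion_delta: str = "origin_time",
    criterion_target: str = "origin_time",
    mark_as: str = "sampled",  # hypothetical default for this sketch
) -> pd.DataFrame:
    """Illustrative sketch; assumes id/ancestor_id columns, roots self-ancestral."""
    for col in (criterion_delta, criterion_target):
        if col not in df.columns:
            raise ValueError(f"{col!r} is not a column in df")
    df = df.copy()  # this sketch never mutates its input
    parent = dict(zip(df["id"], df["ancestor_id"]))

    # Leaves are ids that never appear as the ancestor of a non-root node.
    non_root = df[df["id"] != df["ancestor_id"]]
    leaves = df.loc[~df["id"].isin(non_root["ancestor_id"]), "id"].to_numpy()

    # Target leaf: largest criterion_target value, ties broken randomly.
    rng = np.random.default_rng(seed)
    by_id = df.set_index("id")
    target_vals = by_id.loc[leaves, criterion_target].to_numpy()
    target = rng.choice(leaves[target_vals == target_vals.max()])

    # The target's lineage: the target plus all of its ancestors.
    lineage = {target}
    node = target
    while parent[node] != node:
        node = parent[node]
        lineage.add(node)

    # Walk each leaf upward; the first lineage node reached is its MRCA.
    delta = by_id[criterion_delta]
    off_lineage = {}
    for leaf in leaves:
        node = leaf
        while node not in lineage and parent[node] != node:
            node = parent[node]
        if node in lineage:
            off_lineage[leaf] = abs(delta.loc[leaf] - delta.loc[node])
        else:
            # Assumption: a leaf in a different tree has no MRCA; rank it last.
            off_lineage[leaf] = np.inf

    # Keep the n_sample smallest deltas (stable sort: ties keep leaf order).
    keep = sorted(leaves, key=off_lineage.__getitem__)[:n_sample]
    df[mark_as] = df["id"].isin(keep)
    return df
```

Walking each leaf upward until it meets the target's lineage finds the MRCA directly, because the first ancestor of a leaf that also lies on the target's lineage is the deepest ancestor the two share. The approach is simple but costs time proportional to tree depth for every leaf. Slicing the sorted leaves with [:n_sample] also covers the case where n_sample is at least the number of leaves, since every leaf is then kept.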
Parameters
- phylogeny_df : pandas.DataFrame
The phylogeny as a dataframe in alife standard format.
Must represent an asexual phylogeny.
- n_sample : int
Number of tips to mark.
- mutate : bool, default False
Are side effects on the input argument phylogeny_df allowed?
- seed : int, optional
Random seed for reproducible target-leaf selection when there are ties in criterion_target.
- criterion_delta : str, default “origin_time”
Column name used to compute the off-lineage delta for each leaf.
- criterion_target : str, default “origin_time”
Column name used to select the target leaf.
- progress_wrap : Callable, optional
Pass tqdm or equivalent to display a progress bar.
- mark_as : str, default “alifestd_mark_sample_tips_lineage_asexual”
Column name for the boolean mark.
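The progress_wrap parameter is described only as “tqdm or equivalent”. A minimal sketch of a compatible callable follows, assuming the function calls progress_wrap on an iterable and then iterates over what it returns (the usual tqdm.tqdm(iterable) pattern), and assuming the <function <lambda>> default is an identity pass-through. Both identity_wrap and CountingWrap are hypothetical names used only for illustration.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def identity_wrap(iterable: Iterable[T]) -> Iterable[T]:
    # Assumed behavior of the lambda default: return the iterable unchanged.
    return iterable


class CountingWrap:
    """Hypothetical tqdm stand-in that counts items as they are consumed."""

    def __init__(self) -> None:
        self.count = 0

    def __call__(self, iterable: Iterable[T]) -> Iterator[T]:
        # Yield items lazily so counting tracks actual consumption.
        for item in iterable:
            self.count += 1
            yield item


wrap = CountingWrap()
total = sum(wrap(range(5)))  # the wrapped iterable behaves like the original
```

Any callable with this shape (iterable in, iterable out) should work under the stated assumption; tqdm.tqdm itself fits the pattern when the tqdm package is installed.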
Raises
- ValueError
If criterion_delta or criterion_target is not a column in phylogeny_df.
Returns
- pandas.DataFrame
The phylogeny with an added boolean mark column.