alifestd_mark_sample_tips_canopy_polars

alifestd_mark_sample_tips_canopy_polars(phylogeny_df: DataFrame, n_sample: int | None = None, criterion: str | Expr = 'origin_time', *, mark_as: str = 'alifestd_mark_sample_tips_canopy_polars') → DataFrame

Mark the n_sample leaves with the largest criterion values.

Adds a boolean column, named by mark_as, that is True for the marked (retained) tips.

If n_sample is None, it defaults to the number of leaves that share the maximum value of the criterion column. If n_sample is greater than or equal to the number of leaves in the phylogeny, all leaves are marked. Ties are broken arbitrarily.

Only supports asexual phylogenies.
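
The selection rule described above can be illustrated with a small pure-Python sketch. This is not the library's implementation and does not use polars; the helper name mark_canopy_tips and the list-of-dicts input are hypothetical, chosen only for illustration. It assumes the alife standard convention that a root lists its own id as its ancestor_id, and it treats a leaf as any row whose id never appears as another row's ancestor_id.

```python
def mark_canopy_tips(rows, n_sample=None, criterion="origin_time"):
    """Return one boolean per row: True for the selected leaves.

    Hypothetical helper for illustration only; `rows` is a list of dicts
    with "id", "ancestor_id", and the criterion key.
    """
    # Assumption: a root row has ancestor_id == id, so it is skipped here.
    internal_ids = {
        row["ancestor_id"] for row in rows if row["ancestor_id"] != row["id"]
    }
    leaves = [row for row in rows if row["id"] not in internal_ids]
    if not leaves:
        return [False] * len(rows)

    if n_sample is None:
        # Default: count the leaves that share the maximum criterion value.
        top = max(row[criterion] for row in leaves)
        n_sample = sum(row[criterion] == top for row in leaves)

    # Rank leaves from largest to smallest criterion value; ties fall in
    # whatever order the sort leaves them, i.e. arbitrarily.
    ranked = sorted(leaves, key=lambda row: row[criterion], reverse=True)
    marked_ids = {row["id"] for row in ranked[:n_sample]}
    return [row["id"] in marked_ids for row in rows]


# Toy phylogeny: 0 is the root, 1 is an inner node, 2, 3, and 4 are leaves.
rows = [
    {"id": 0, "ancestor_id": 0, "origin_time": 0},
    {"id": 1, "ancestor_id": 0, "origin_time": 1},
    {"id": 2, "ancestor_id": 0, "origin_time": 2},
    {"id": 3, "ancestor_id": 1, "origin_time": 3},
    {"id": 4, "ancestor_id": 1, "origin_time": 3},
]

default_marks = mark_canopy_tips(rows)            # leaves 3 and 4 tie for the maximum
all_marks = mark_canopy_tips(rows, n_sample=10)   # more than the leaf count: all leaves
one_mark = mark_canopy_tips(rows, n_sample=1)     # either leaf 3 or leaf 4
```

With n_sample=1, which of the two tied leaves is marked should not be relied on, mirroring the "ties are broken arbitrarily" behavior documented above.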

Parameters

phylogeny_df : polars.DataFrame

The phylogeny as a dataframe in alife standard format.

Must represent an asexual phylogeny.

n_sample : int, optional

Number of tips to mark. If None, defaults to the count of leaves with the maximum criterion value.

criterion : str or polars.Expr, default “origin_time”

Column name or polars expression used to rank leaves. The n_sample leaves with the largest values are marked. Ties are broken arbitrarily.

mark_as : str, default “alifestd_mark_sample_tips_canopy_polars”

Column name for the boolean mark.

Raises

NotImplementedError

If phylogeny_df has no “ancestor_id” column.

ValueError

If criterion is not a column in phylogeny_df.

Returns

polars.DataFrame

The phylogeny with an added boolean mark column.

See Also

alifestd_mark_sample_tips_canopy_asexual :

Pandas-based implementation.