legacy

Functions

alifestd_add_global_root(phylogeny_df[, ...])

Add a new global root node that all existing roots point to.

alifestd_add_global_root_polars(phylogeny_df)

Add a new global root node that all existing roots point to.

alifestd_add_inner_knuckles_asexual(phylogeny_df)

For all inner nodes, add a subtending unifurcation ("knuckle").

alifestd_add_inner_knuckles_polars(phylogeny_df)

For all inner nodes, add a subtending unifurcation ("knuckle").

alifestd_add_inner_leaves(phylogeny_df[, mutate])

Create a zero-length branch with leaf node for each inner node.

alifestd_add_inner_niblings_asexual(phylogeny_df)

For all inner nodes, add a subtending unifurcation, adding a "nibling" leaf as the child of the knuckle.

alifestd_add_inner_niblings_polars(phylogeny_df)

For all inner nodes, add a subtending unifurcation, adding a "nibling" leaf as the child of the knuckle.

alifestd_aggregate_phylogenies(phylogeny_dfs)

Concatenate independent phylogenies, reassigning organism ids to prevent collisions.

alifestd_aggregate_phylogenies_polars(...)

Concatenate independent phylogenies, reassigning organism ids to prevent collisions.

alifestd_as_newick_asexual(phylogeny_df[, ...])

Convert phylogeny dataframe to Newick format.

alifestd_as_newick_polars(phylogeny_df, *[, ...])

Convert phylogeny dataframe to Newick format.
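
As a rough illustration of the idea behind Newick conversion, the sketch below renders a parent-pointer table as a Newick string. It is not the library's implementation: the function name as_newick_sketch, the plain-list input, and the convention that a root lists itself as its own ancestor are all assumptions made for this example, and branch lengths are omitted.

```python
def as_newick_sketch(ancestor_ids, labels):
    """Render a parent-pointer table as Newick text (hypothetical helper).

    Assumes ancestor_ids[i] is the row index of node i's parent and that
    a root points to itself. Recursive, so unsuitable for very deep trees.
    """
    n = len(ancestor_ids)
    children = [[] for _ in range(n)]
    roots = []
    for node, parent in enumerate(ancestor_ids):
        if parent == node:
            roots.append(node)
        else:
            children[parent].append(node)

    def render(node):
        kids = children[node]
        if not kids:
            return labels[node]
        # Children go inside parentheses, followed by the node's own label.
        return "(" + ",".join(render(k) for k in kids) + ")" + labels[node]

    # One semicolon-terminated tree per root.
    return "\n".join(render(root) + ";" for root in roots)


newick = as_newick_sketch([0, 0, 0, 1, 1], ["a", "b", "c", "d", "e"])
```

Here node 0 is the root with children 1 and 2, and node 1 has children 3 and 4.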

alifestd_assign_contiguous_ids(phylogeny_df)

Reassign so each organism's id corresponds to its row number.

alifestd_assign_contiguous_ids_polars(...)

Reassign so each organism's id corresponds to its row number.

alifestd_assign_root_ancestor_token(...[, ...])

Set root_ancestor_token for "ancestor_list" column.

alifestd_calc_clade_lookback_n_asexual(...)

Find ancestor ids of nodes that are lookback_n nodes away in the phylogeny.

alifestd_calc_clade_lookback_origin_time_delta_asexual(...)

Find ancestor ids of nodes that precede each phylogeny node by at least lookback_origin_time_delta branch distance.

alifestd_calc_clade_trait_count_asexual(...)

Count how many nodes within each clade have a given trait.

alifestd_calc_clade_trait_frequency_asexual(...)

Calculate what fraction of nodes within each clade have a given trait.

alifestd_calc_distance_matrix_asexual(...[, ...])

Calculate pairwise distances between all taxa via their MRCAs.

alifestd_calc_distance_matrix_polars(...[, ...])

Calculate pairwise distances between all taxa via their MRCAs.

alifestd_calc_mrca_id_matrix_asexual(...[, ...])

Calculate the Most Recent Common Ancestor (MRCA) taxon id for each pair of taxa.

alifestd_calc_mrca_id_matrix_asexual_polars(...)

Calculate the Most Recent Common Ancestor (MRCA) taxon id for each pair of taxa.

alifestd_calc_mrca_id_vector_asexual(...[, ...])

Calculate the Most Recent Common Ancestor (MRCA) taxon id for target_id and each other taxon.

alifestd_calc_mrca_id_vector_asexual_polars(...)

Calculate the Most Recent Common Ancestor (MRCA) taxon id for target_id and each other taxon.

alifestd_calc_polytomic_index(phylogeny_df)

Count how many fewer inner nodes are contained in phylogeny than expected if strictly bifurcating.

alifestd_calc_polytomic_index_polars(...)

Count how many fewer inner nodes are contained in phylogeny than expected if strictly bifurcating.
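
The arithmetic behind a polytomic index can be sketched as follows, using the fact that a strictly bifurcating rooted tree with n leaves has n - 1 inner nodes. The function name polytomic_index_sketch and the plain-list input (each root listing itself as its own ancestor) are assumptions for illustration; how the library handles unifurcations, multiple roots, or clamping is not stated in this listing.

```python
def polytomic_index_sketch(ancestor_ids):
    """Expected inner nodes minus actual inner nodes (hypothetical helper).

    Assumes ancestor_ids[i] is the row index of node i's parent and that
    a root points to itself.
    """
    n = len(ancestor_ids)
    has_child = [False] * n
    for node, parent in enumerate(ancestor_ids):
        if parent != node:
            has_child[parent] = True
    num_inner = sum(has_child)
    num_leaves = n - num_inner
    # A strictly bifurcating rooted tree with k leaves has k - 1 inner nodes.
    expected_inner = max(num_leaves - 1, 0)
    return expected_inner - num_inner


star = polytomic_index_sketch([0, 0, 0, 0])  # root with three leaf children
bifurcating = polytomic_index_sketch([0, 0, 0, 1, 1])
```

In this sketch a tree containing unifurcations can yield a negative value, because unifurcating nodes add inner nodes without adding leaves.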

alifestd_categorize_triplet_asexual(...[, ...])

Assess the topological configuration of three ids in phylogeny_df.

alifestd_check_topological_sensitivity(...)

Return names of columns present in phylogeny_df that may be invalidated by topological operations such as collapsing unifurcations.

alifestd_check_topological_sensitivity_polars(...)

Return names of columns present in phylogeny_df that may be invalidated by topological operations such as collapsing unifurcations.

alifestd_chronological_sort(phylogeny_df[, ...])

Sort rows so all organisms appear in chronological order, default origin_time.

alifestd_chronological_sort_polars(phylogeny_df)

Sort rows so all organisms appear in chronological order, default origin_time.

alifestd_coarsen_dilate_asexual(phylogeny_df, *)

Coarsen a phylogeny by collapsing inner nodes within dilation windows.

alifestd_coarsen_dilate_polars(phylogeny_df, *)

Coarsen a phylogeny by collapsing inner nodes within dilation windows.

alifestd_coarsen_mask(phylogeny_df, mask[, ...])

Pare record to bypass organisms outside mask.

alifestd_coarsen_taxa_asexual(phylogeny_df)

Condense consecutive phylogeny nodes sharing identical trait values, according to values in by column(s).

alifestd_coarsen_taxa_asexual_make_agg(...)

Build per-column aggregation rules for asexual taxa coarsening.

alifestd_coerce_chronological_consistency(...)

For any taxa with origin time preceding its parent's, set origin time to parent's origin time.

alifestd_collapse_trunk_asexual(phylogeny_df)

Collapse entries masked by is_trunk column, keeping only the oldest root.

alifestd_collapse_trunk_polars(phylogeny_df)

Collapse entries masked by is_trunk column, keeping only the oldest root.

alifestd_collapse_unifurcations(phylogeny_df)

Pare record to bypass organisms with one ancestor and one descendant.

alifestd_collapse_unifurcations_polars(...)

Pare record to bypass organisms with one ancestor and one descendant.
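
A minimal sketch of the bypass operation, under stated assumptions: nodes that have an ancestor and exactly one child are dropped, and their child is re-pointed at the nearest surviving ancestor. The name collapse_unifurcations_sketch, the plain-list input, and the self-pointing root convention are hypothetical and chosen only for this example.

```python
def collapse_unifurcations_sketch(ancestor_ids):
    """Map each surviving node id to its new ancestor id (hypothetical helper).

    Assumes ancestor_ids[i] is the row index of node i's parent and that
    a root points to itself.
    """
    n = len(ancestor_ids)
    num_children = [0] * n
    for node, parent in enumerate(ancestor_ids):
        if parent != node:
            num_children[parent] += 1

    def is_bypassed(node):
        # One ancestor (not a root) and exactly one descendant.
        return ancestor_ids[node] != node and num_children[node] == 1

    kept = {}
    for node in range(n):
        if is_bypassed(node):
            continue
        parent = ancestor_ids[node]
        # Walk up past bypassed nodes; roots are never bypassed, so this ends.
        while is_bypassed(parent):
            parent = ancestor_ids[parent]
        kept[node] = parent
    return kept


collapsed = collapse_unifurcations_sketch([0, 0, 1, 2, 2])
```

In the example, node 1 sits between root 0 and node 2, so it is bypassed and node 2 is attached directly to the root. Ids are not renumbered in this sketch.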

alifestd_convert_root_ancestor_token(...[, ...])

Set root_ancestor_token for ancestor_list series.

alifestd_count_children_of_asexual(...[, mutate])

How many taxa are direct descendants of the given parent?

alifestd_count_children_of_polars(...)

How many taxa are direct descendants of the given parent?

alifestd_count_inner_nodes(phylogeny_df[, ...])

Count how many non-leaf nodes are contained in phylogeny.

alifestd_count_inner_nodes_polars(phylogeny_df)

Count how many non-leaf nodes are contained in phylogeny.

alifestd_count_leaf_nodes(phylogeny_df)

How many leaf nodes are contained in phylogeny?

alifestd_count_leaf_nodes_polars(phylogeny_df)

How many leaf nodes are contained in phylogeny?

alifestd_count_polytomies(phylogeny_df)

Count how many inner nodes have more than two descendant nodes.

alifestd_count_polytomies_polars(phylogeny_df)

Count how many inner nodes have more than two descendant nodes.

alifestd_count_root_nodes(phylogeny_df)

How many root nodes are contained in phylogeny?

alifestd_count_root_nodes_polars(phylogeny_df)

How many root nodes are contained in phylogeny?

alifestd_count_unifurcating_roots_asexual(...)

How many root nodes with one child are contained in phylogeny?

alifestd_count_unifurcating_roots_polars(...)

How many root nodes with one child are contained in phylogeny?

alifestd_count_unifurcations(phylogeny_df)

Count how many inner nodes have exactly one descendant node.

alifestd_count_unifurcations_polars(phylogeny_df)

Count how many inner nodes have exactly one descendant node.

alifestd_delete_trunk_asexual(phylogeny_df)

Delete entries masked by is_trunk column.

alifestd_delete_trunk_asexual_polars(...)

Delete entries masked by is_trunk column.

alifestd_delete_unifurcating_roots_asexual(...)

Pare record to bypass root nodes with only one descendant.

alifestd_delete_unifurcating_roots_polars(...)

Pare record to bypass root nodes with only one descendant.

alifestd_downsample_tips_asexual(...[, ...])

Create a subsample phylogeny containing n_downsample tips.

alifestd_downsample_tips_canopy_asexual(...)

Retain the n_downsample leaves with the largest criterion values and prune extinct lineages.

alifestd_downsample_tips_canopy_polars(...)

Retain the n_downsample leaves with the largest criterion values and prune extinct lineages.

alifestd_downsample_tips_clade_asexual(...)

Create a subsample phylogeny containing at most n_downsample tips, comprising a single clade within the original phylogeny.

alifestd_downsample_tips_clade_polars(...[, ...])

Create a subsample phylogeny containing at most n_downsample tips, comprising a single clade within the original phylogeny.

alifestd_downsample_tips_lineage_asexual(...)

Retain the n_downsample leaves closest to the lineage of a target leaf.

alifestd_downsample_tips_lineage_polars(...)

Retain the n_downsample leaves closest to the lineage of a target leaf.

alifestd_downsample_tips_lineage_stratified_asexual(...)

Retain leaves per stratified group, chosen by proximity to the lineage of a target leaf.

alifestd_downsample_tips_lineage_stratified_polars(...)

Retain leaves per stratified group, chosen by proximity to the lineage of a target leaf.

alifestd_downsample_tips_polars(...[, seed])

Create a subsample phylogeny containing n_downsample tips.

alifestd_downsample_tips_uniform_asexual(...)

Create a subsample phylogeny containing n_downsample tips.

alifestd_downsample_tips_uniform_polars(...)

Create a subsample phylogeny containing n_downsample tips.

alifestd_drop_topological_sensitivity(...[, ...])

Drop columns from phylogeny_df that may be invalidated by topological operations such as collapsing unifurcations.

alifestd_drop_topological_sensitivity_polars(...)

Drop columns from phylogeny_df that may be invalidated by topological operations such as collapsing unifurcations.

alifestd_estimate_triplet_distance_asexual(...)

Estimate the triplet distance between two asexual phylogenetic trees in alife standard form by sampling sets of three leaf taxa and counting the fraction whose phylogenetic connectivity mismatches between trees.

alifestd_find_chronological_inconsistency(...)

Return the id of a taxon with origin time preceding its parent's, if any are present.

alifestd_find_chronological_inconsistency_polars(...)

Return the id of a taxon with origin time preceding its parent's, if any are present.

alifestd_find_leaf_ids(phylogeny_df)

What ids are not listed in any ancestor_list?

alifestd_find_leaf_ids_polars(phylogeny_df)

What ids are ancestor to no other ids?

alifestd_find_mrca_id_asexual(phylogeny_df, ...)

Find most recent common ancestor of leaf_ids.

alifestd_find_pair_distance_asexual(...[, ...])

Find the pairwise distance between two taxa via their MRCA.

alifestd_find_pair_distance_polars(...[, ...])

Find the pairwise distance between two taxa via their MRCA.

alifestd_find_pair_mrca_id_asexual(...[, ...])

Find the Most Recent Common Ancestor of two taxa.

alifestd_find_pair_mrca_id_polars(...[, ...])

Find the Most Recent Common Ancestor of two taxa.
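
One common way to find a pairwise MRCA in a parent-pointer tree is to record the first taxon's lineage and then walk up from the second taxon until the two meet. The sketch below illustrates that approach only; the name find_pair_mrca_sketch, the plain-list input, and the self-pointing root convention are assumptions, and it is not presented as the library's algorithm.

```python
def find_pair_mrca_sketch(ancestor_ids, first, second):
    """Return the MRCA id of two nodes, or None if they share no root.

    Hypothetical helper. Assumes ancestor_ids[i] is the row index of
    node i's parent and that a root points to itself.
    """
    # Collect every node on the path from `first` up to its root.
    lineage = set()
    node = first
    while True:
        lineage.add(node)
        if ancestor_ids[node] == node:
            break
        node = ancestor_ids[node]

    # Climb from `second` until reaching a node on that path.
    node = second
    while node not in lineage:
        if ancestor_ids[node] == node:
            return None  # reached a different root: no common ancestor
        node = ancestor_ids[node]
    return node


mrca = find_pair_mrca_sketch([0, 0, 0, 1, 1], 3, 4)
```

A node counts as its own ancestor here, so the MRCA of a node and one of its descendants is the node itself.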

alifestd_find_root_ids(phylogeny_df)

What ids have an empty ancestor_list?

alifestd_find_root_ids_polars(phylogeny_df)

What ids have an empty ancestor_list?

alifestd_from_avida_spop(spop_text, *[, ...])

Convert Avida .spop population snapshot text to a phylogeny dataframe.

alifestd_from_avida_spop_polars(spop_text, *)

Convert Avida .spop population snapshot text to a phylogeny dataframe.

alifestd_from_newick(newick, *[, ...])

Convert a Newick format string to a phylogeny dataframe.

alifestd_from_newick_polars(newick, *[, ...])

Convert a Newick format string to a phylogeny dataframe.

alifestd_has_compact_ids(phylogeny_df)

Are id values between 0 and len(phylogeny_df), in any order?

alifestd_has_compact_ids_polars(phylogeny_df)

Are id values between 0 and len(phylogeny_df), in any order?

alifestd_has_contiguous_ids(phylogeny_df)

Do organisms' ids correspond to their row numbers?

alifestd_has_contiguous_ids_polars(phylogeny_df)

Do organisms' ids correspond to their row numbers?

alifestd_has_increasing_ids(phylogeny_df)

Do offspring have larger id values than ancestors?

alifestd_has_increasing_ids_polars(phylogeny_df)

Do offspring have larger id values than ancestors?

alifestd_has_multiple_roots(phylogeny_df)

Does the phylogeny have two or more root organisms?

alifestd_has_multiple_roots_polars(phylogeny_df)

Does the phylogeny have two or more root organisms?

alifestd_is_asexual(phylogeny_df)

Do all organisms in the phylogeny have one or no immediate ancestor?

alifestd_is_asexual_polars(phylogeny_df)

Do all organisms in the phylogeny have one or no immediate ancestor?

alifestd_is_chronologically_ordered(phylogeny_df)

Do all organisms have origin_time values at or after those of the members of their ancestor_list?

alifestd_is_chronologically_ordered_polars(...)

Check if all taxa have origin times at or after their ancestor's origin time.

alifestd_is_chronologically_sorted(phylogeny_df)

Do rows appear in chronological order?

alifestd_is_chronologically_sorted_polars(...)

Do rows appear in chronological order?

alifestd_is_sexual(phylogeny_df)

Do any organisms in the phylogeny have more than one immediate ancestor?

alifestd_is_sexual_polars(phylogeny_df)

Do any organisms in the phylogeny have more than one immediate ancestor?

alifestd_is_strictly_bifurcating_asexual(...)

Are all internal nodes strictly bifurcating (exactly 2 children)?

alifestd_is_strictly_bifurcating_polars(...)

Are all internal nodes strictly bifurcating (exactly 2 children)?

alifestd_is_topologically_sorted(phylogeny_df)

Are all organisms listed after members of their ancestor_list?

alifestd_is_topologically_sorted_polars(...)

Are all organisms listed after members of their ancestor_list?

alifestd_is_ultrametric(phylogeny_df[, ...])

Do all tips share the same origin_time (within atol)?

alifestd_is_ultrametric_polars(phylogeny_df, *)

Do all tips share the same origin_time (within atol)?

alifestd_is_working_format_asexual(phylogeny_df)

Test if phylogeny_df is an asexual phylogeny in working format.

alifestd_is_working_format_polars(phylogeny_df)

Test if phylogeny_df is an asexual phylogeny in working format.

alifestd_join_roots(phylogeny_df[, mutate])

Point all other roots to oldest root, measured by lowest origin_time (if available) or otherwise lowest id.

alifestd_join_roots_polars(phylogeny_df)

Point all other roots to oldest root, measured by lowest origin_time (if available) or otherwise lowest id.

alifestd_ladderize_asexual(phylogeny_df[, ...])

Reorder rows so children are sorted by number of descendant leaves, gathering children into contiguous rows.

alifestd_ladderize_polars(phylogeny_df[, ...])

Reorder rows so children are sorted by number of descendant leaves, gathering children into contiguous rows.

alifestd_make_ancestor_id_col(ids, ...)

Translate ancestor ids from a column of singleton `ancestor_list`s into a pure-integer series representation.

alifestd_make_ancestor_id_col_polars(ids, ...)

Translate ancestor ids from a column of singleton `ancestor_list`s into a pure-integer series representation.

alifestd_make_ancestor_list_col(ids, ...[, ...])

Translate a column of integer ancestor id values into alife standard ancestor_list representation.

alifestd_make_ancestor_list_col_polars(ids, ...)

Translate a column of integer ancestor id values into alife standard ancestor_list representation.

alifestd_make_balanced_bifurcating(depth)

Build a perfectly balanced bifurcating tree of given depth.

alifestd_make_balanced_bifurcating_polars(depth)

Build a perfectly balanced bifurcating tree of given depth.

alifestd_make_comb(n_leaves)

Build a comb/caterpillar tree with n_leaves leaves.

alifestd_make_comb_polars(n_leaves)

Build a comb/caterpillar tree with n_leaves leaves.

alifestd_make_edge_split(n_leaves[, seed])

Build a random bifurcating tree via edge-split (PDA) sampling.

alifestd_make_edge_split_polars(n_leaves[, seed])

Build a random bifurcating tree via edge-split (PDA) sampling.

alifestd_make_empty([ancestor_id])

Create an alife standard phylogeny dataframe with zero rows.

alifestd_make_empty_polars([ancestor_id])

Create an alife standard phylogeny dataframe with zero rows.

alifestd_make_leaf_split(n_leaves[, seed])

Build a random bifurcating tree via leaf-split (Yule) sampling.

alifestd_make_leaf_split_polars(n_leaves[, seed])

Build a random bifurcating tree via leaf-split (Yule) sampling.

alifestd_make_star(n_leaves)

Build a star tree with n_leaves leaves.

alifestd_make_star_polars(n_leaves)

Build a star tree with n_leaves leaves.

alifestd_mark_ancestor_origin_time_asexual(...)

Add column ancestor_origin_time.

alifestd_mark_ancestor_origin_time_polars(...)

Add column ancestor_origin_time.

alifestd_mark_clade_duration_asexual(...[, ...])

Add column clade_duration, containing the difference between each node's origin_time and the maximum origin_time of its descendants.

alifestd_mark_clade_duration_polars(...[, ...])

Add column clade_duration, containing the difference between each node's origin_time and the maximum origin_time of its descendants.

alifestd_mark_clade_duration_ratio_sister_asexual(...)

Add column clade_duration_ratio_sister, containing the ratio of each clade's duration to that of its sister.

alifestd_mark_clade_duration_ratio_sister_polars(...)

Add column clade_duration_ratio_sister, containing the ratio of each clade's duration to that of its sister.

alifestd_mark_clade_faithpd_asexual(phylogeny_df)

Add column clade_faithpd, containing sum branch length among descendant nodes.

alifestd_mark_clade_faithpd_polars(...[, ...])

Add column clade_faithpd, containing sum branch length among descendant nodes.

alifestd_mark_clade_fblr_growth_children_asexual(...)

Add column clade_fblr_growth_children, containing the coefficient of a fblr regression fit to origin times of the leaf descendants of each node.

alifestd_mark_clade_fblr_growth_sister_asexual(...)

Add column clade_fblr_growth_sister, containing the coefficient of a fblr regression fit to origin times of this clade's descendant leaves versus those of its sister clade.

alifestd_mark_clade_leafcount_ratio_sister_asexual(...)

Add column clade_leafcount_ratio_sister, containing the ratio of each clade's leaf count to that of its sister.

alifestd_mark_clade_leafcount_ratio_sister_polars(...)

Add column clade_leafcount_ratio_sister, containing the ratio of each clade's leaf count to that of its sister.

alifestd_mark_clade_logistic_growth_children_asexual(...)

Add column clade_logistic_growth_children, containing the coefficient of a logistic regression fit to origin times of the leaf descendants of each node.

alifestd_mark_clade_logistic_growth_sister_asexual(...)

Add column clade_logistic_growth_sister, containing the coefficient of a logistic regression fit to origin times of this clade's descendant leaves versus those of its sister clade.

alifestd_mark_clade_nodecount_ratio_sister_asexual(...)

Add column clade_nodecount_ratio_sister, containing the ratio of each clade size to that of its sister.

alifestd_mark_clade_nodecount_ratio_sister_polars(...)

Add column clade_nodecount_ratio_sister, containing the ratio of each clade size to that of its sister.

alifestd_mark_clade_subtended_duration_asexual(...)

Add column clade_subtended_duration, containing the difference between each node's ancestor's origin_time and the maximum origin_time of its descendants.

alifestd_mark_clade_subtended_duration_polars(...)

Add column clade_subtended_duration, containing the difference between each node's ancestor's origin_time and the maximum origin_time of its descendants.

alifestd_mark_clade_subtended_duration_ratio_sister_asexual(...)

Add column clade_subtended_duration_ratio_sister, containing the ratio of each clade's subtended duration to that of its sister.

alifestd_mark_clade_subtended_duration_ratio_sister_polars(...)

Add column clade_subtended_duration_ratio_sister, containing the ratio of each clade's subtended duration to that of its sister.

alifestd_mark_colless_index_asexual(phylogeny_df)

Add column colless_index with Colless imbalance index for each subtree.

alifestd_mark_colless_index_corrected_asexual(...)

Add column colless_index_corrected with the corrected Colless index for each subtree.

alifestd_mark_colless_index_corrected_polars(...)

Add column colless_index_corrected with the corrected Colless index for each subtree.

alifestd_mark_colless_index_polars(...[, ...])

Add column colless_index with Colless imbalance index for each subtree.
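
For a strictly bifurcating tree, the Colless index of a subtree is the sum, over its inner nodes, of the absolute difference between the leaf counts of the two child clades. The sketch below computes that value per subtree. The name colless_index_sketch and the input conventions (plain list, self-pointing root, every child stored at a higher row index than its parent) are assumptions for this example only.

```python
def colless_index_sketch(ancestor_ids):
    """Per-subtree Colless index for a strictly bifurcating tree.

    Hypothetical helper. Assumes ancestor_ids[i] is the row index of
    node i's parent, a root points to itself, and each child has a
    higher row index than its parent.
    """
    n = len(ancestor_ids)
    children = [[] for _ in range(n)]
    for node, parent in enumerate(ancestor_ids):
        if parent != node:
            children[parent].append(node)

    num_leaves = [0] * n
    colless = [0] * n
    # Reverse row order visits every child before its parent.
    for node in reversed(range(n)):
        kids = children[node]
        if not kids:
            num_leaves[node] = 1
            continue
        if len(kids) != 2:
            raise ValueError("sketch handles strictly bifurcating trees only")
        left, right = kids
        num_leaves[node] = num_leaves[left] + num_leaves[right]
        imbalance = abs(num_leaves[left] - num_leaves[right])
        colless[node] = colless[left] + colless[right] + imbalance
    return colless


comb = colless_index_sketch([0, 0, 0, 1, 1])
```

The entry at a root's position is the index for that whole tree; entries for leaves are zero.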

alifestd_mark_colless_like_index_mdm_asexual(...)

Add column colless_like_index_mdm with Colless-like index using mean deviation from the median (MDM) as dissimilarity.

alifestd_mark_colless_like_index_mdm_polars(...)

Add column colless_like_index_mdm with Colless-like index using mean deviation from the median (MDM) as dissimilarity.

alifestd_mark_colless_like_index_sd_asexual(...)

Add column colless_like_index_sd with Colless-like index using sample standard deviation as dissimilarity.

alifestd_mark_colless_like_index_sd_polars(...)

Add column colless_like_index_sd with Colless-like index using sample standard deviation as dissimilarity.

alifestd_mark_colless_like_index_var_asexual(...)

Add column colless_like_index_var with Colless-like index using sample variance as dissimilarity.

alifestd_mark_colless_like_index_var_polars(...)

Add column colless_like_index_var with Colless-like index using sample variance as dissimilarity.

alifestd_mark_csr_children_asexual(phylogeny_df)

Add column csr_children, a flat array of child ids grouped by parent according to CSR offsets from the csr_offsets column.

alifestd_mark_csr_children_polars(...[, mark_as])

Add column csr_children, a flat array of child ids grouped by parent according to CSR offsets from the csr_offsets column.

alifestd_mark_csr_offsets_asexual(phylogeny_df)

Add column csr_offsets, the CSR offset where each node's children begin in the corresponding csr_children array.

alifestd_mark_csr_offsets_polars(phylogeny_df, *)

Add column csr_offsets, the CSR offset where each node's children begin in the corresponding csr_children array.
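
The CSR (compressed sparse row) layout named in these entries can be sketched with plain lists: one array of offsets saying where each node's children begin, and one flat array of child ids grouped by parent. The name build_csr_sketch and the input conventions are assumptions for illustration; how the library stores or pads these arrays as dataframe columns is not stated in this listing.

```python
def build_csr_sketch(ancestor_ids):
    """Build (offsets, children) arrays in CSR style (hypothetical helper).

    Assumes ancestor_ids[i] is the row index of node i's parent and that
    a root points to itself.
    """
    n = len(ancestor_ids)
    counts = [0] * n
    for node, parent in enumerate(ancestor_ids):
        if parent != node:
            counts[parent] += 1

    # offsets[i] is where node i's children start in the flat array.
    offsets = [0] * n
    running = 0
    for node in range(n):
        offsets[node] = running
        running += counts[node]

    # Fill each parent's slice in increasing child id order.
    children = [0] * running
    cursor = list(offsets)
    for node, parent in enumerate(ancestor_ids):
        if parent != node:
            children[cursor[parent]] = node
            cursor[parent] += 1
    return offsets, children


offsets, children = build_csr_sketch([0, 0, 0, 1, 1])
```

In this sketch the children of node i occupy children[offsets[i]:offsets[i + 1]], with the final node's slice running to the end of the array.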

alifestd_mark_first_child_id_asexual(...[, ...])

Add column first_child_id, the smallest-id child of each node.

alifestd_mark_first_child_id_polars(...[, ...])

Add column first_child_id, the smallest-id child of each node.

alifestd_mark_is_left_child_asexual(phylogeny_df)

Add column is_left_child, containing for each node whether it is the smaller-id child.

alifestd_mark_is_left_child_polars(...[, ...])

Add column is_left_child, containing for each node whether it is the smaller-id child.

alifestd_mark_is_right_child_asexual(...[, ...])

Add column is_right_child, containing for each node whether it is the larger-id child.

alifestd_mark_is_right_child_polars(...[, ...])

Add column is_right_child, containing for each node whether it is the larger-id child.

alifestd_mark_leaves(phylogeny_df[, mutate, ...])

Add column is_leaf marking rows that are ancestor to no other row.

alifestd_mark_leaves_polars(phylogeny_df, *)

Add column is_leaf marking rows that are ancestor to no other row.

alifestd_mark_left_child_asexual(phylogeny_df)

Add column left_child, containing for each node its smallest-id child.

alifestd_mark_left_child_polars(phylogeny_df, *)

Add column left_child_id, containing for each node its smallest-id child.

alifestd_mark_lineage_cummax_asexual(...[, ...])

Add column with maximum of values along each lineage.

alifestd_mark_lineage_cummax_polars(...[, ...])

Add column with maximum of values along each lineage.

alifestd_mark_lineage_cummin_asexual(...[, ...])

Add column with minimum of values along each lineage.

alifestd_mark_lineage_cummin_polars(...[, ...])

Add column with minimum of values along each lineage.

alifestd_mark_lineage_cumprod_asexual(...[, ...])

Add column with cumulative product of values along each lineage.

alifestd_mark_lineage_cumprod_polars(...[, ...])

Add column with cumulative product of values along each lineage.

alifestd_mark_lineage_cumsum_asexual(...[, ...])

Add column with cumulative sum of values along each lineage.

alifestd_mark_lineage_cumsum_polars(...[, ...])

Add column with cumulative sum of values along each lineage.
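
A cumulative sum along each lineage can be computed in one pass when every ancestor appears before its descendants. The sketch below shows that pattern; the name lineage_cumsum_sketch, the plain-list inputs, the self-pointing root convention, and the choice to include the root's own value are all assumptions made for this example.

```python
def lineage_cumsum_sketch(ancestor_ids, values):
    """Sum values from the root down to each node (hypothetical helper).

    Assumes ancestor_ids[i] is the row index of node i's parent, a root
    points to itself, and each parent has a lower row index than its
    children, so a parent's total is final before its children are read.
    """
    result = list(values)
    for node, parent in enumerate(ancestor_ids):
        if parent != node:
            result[node] = result[parent] + values[node]
    return result


totals = lineage_cumsum_sketch([0, 0, 0, 1, 1], [1, 2, 3, 4, 5])
```

Swapping the addition for max, min, or multiplication gives the analogous cummax, cummin, and cumprod variants.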

alifestd_mark_max_descendant_origin_time_asexual(...)

Add column max_descendant_origin_time, excluding self.

alifestd_mark_max_descendant_origin_time_polars(...)

Add column max_descendant_origin_time, excluding self.

alifestd_mark_next_sibling_id_asexual(...[, ...])

Add column next_sibling_id, the next-highest id sharing the same parent.

alifestd_mark_next_sibling_id_polars(...[, ...])

Add column next_sibling_id, the next-highest id sharing the same parent.

alifestd_mark_node_depth_asexual(phylogeny_df)

Add column node_depth, counting the number of nodes between a node and the root.

alifestd_mark_node_depth_polars(phylogeny_df, *)

Add column node_depth, counting the number of nodes between a node and the root.

alifestd_mark_num_children_asexual(phylogeny_df)

Add column num_children, counting for each node the number of nodes it is parent to.

alifestd_mark_num_children_polars(...[, mark_as])

Add column num_children, counting for each node the number of nodes it is parent to.

alifestd_mark_num_descendants_asexual(...[, ...])

Add column num_descendants, excluding self.

alifestd_mark_num_descendants_polars(...[, ...])

Add column num_descendants, excluding self.

alifestd_mark_num_leaves_asexual(phylogeny_df)

Add column num_leaves with count of all descendant leaves, including self if a leaf.

alifestd_mark_num_leaves_polars(phylogeny_df, *)

Add column num_leaves with count of all descendant leaves, including self if a leaf.

alifestd_mark_num_leaves_sibling_asexual(...)

Mark the number of leaves descendant from each node's siblings.

alifestd_mark_num_leaves_sibling_polars(...)

Mark the number of leaves descendant from each node's siblings.

alifestd_mark_num_preceding_leaves_asexual(...)

Add column num_preceding_leaves with count of all leaves occurring before the present node in an inorder traversal.

alifestd_mark_num_preceding_leaves_polars(...)

Add column num_preceding_leaves with count of all leaves occurring before the present node in an inorder traversal.

alifestd_mark_oldest_root(phylogeny_df[, ...])

Mark the oldest root, measured by lowest origin_time (if available) or otherwise lowest id.

alifestd_mark_oldest_root_polars(phylogeny_df, *)

Mark the oldest root, measured by lowest origin_time (if available) or otherwise lowest id.

alifestd_mark_origin_time_delta_asexual(...)

Add columns origin_time_delta and ancestor_origin_time.

alifestd_mark_origin_time_delta_polars(...)

Add columns origin_time_delta and ancestor_origin_time.

alifestd_mark_ot_mrca_asexual(phylogeny_df)

Appends columns characterizing the Most Recent Common Ancestor (MRCA) of the entire extant population at each taxon's origin_time.

alifestd_mark_ot_mrca_polars(phylogeny_df, *)

Appends columns characterizing the Most Recent Common Ancestor (MRCA) of the entire extant population at each taxon's origin_time.

alifestd_mark_prev_sibling_id_asexual(...[, ...])

Add column prev_sibling_id, the next-lowest id sharing the same parent.

alifestd_mark_prev_sibling_id_polars(...[, ...])

Add column prev_sibling_id, the next-lowest id sharing the same parent.

alifestd_mark_right_child_asexual(phylogeny_df)

Add column right_child, containing for each node its largest-id child.

alifestd_mark_right_child_polars(phylogeny_df, *)

Add column right_child_id, containing for each node its largest-id child.

alifestd_mark_root_id(phylogeny_df[, ...])

Add column root_id, containing the id of entries' ultimate ancestor.

alifestd_mark_root_id_polars(phylogeny_df, *)

Add column root_id, containing the id of entries' ultimate ancestor.

alifestd_mark_roots(phylogeny_df[, mutate, ...])

Create column is_root to mark rows with no ancestor.

alifestd_mark_roots_polars(phylogeny_df, *)

Create column is_root to mark rows with no ancestor.

alifestd_mark_sackin_index_asexual(phylogeny_df)

Add column sackin_index with Sackin index for each subtree.

alifestd_mark_sackin_index_polars(...[, mark_as])

Add column sackin_index with Sackin index for each subtree.
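
The Sackin index of a subtree is the sum of the depths of its leaves, measured from the subtree's root. It can be accumulated bottom-up, since moving one level up adds one edge to the depth of every leaf below. The sketch below illustrates this; the name sackin_index_sketch and the input conventions are assumptions for this example only.

```python
def sackin_index_sketch(ancestor_ids):
    """Per-subtree Sackin index, depth counted in edges.

    Hypothetical helper. Assumes ancestor_ids[i] is the row index of
    node i's parent, a root points to itself, and each child has a
    higher row index than its parent.
    """
    n = len(ancestor_ids)
    has_child = [False] * n
    num_leaves = [0] * n
    sackin = [0] * n
    # Reverse row order visits every child before its parent.
    for node in reversed(range(n)):
        if not has_child[node]:
            num_leaves[node] = 1  # a leaf counts itself
        parent = ancestor_ids[node]
        if parent != node:
            has_child[parent] = True
            num_leaves[parent] += num_leaves[node]
            # Each leaf under `node` is one edge deeper as seen from `parent`.
            sackin[parent] += sackin[node] + num_leaves[node]
    return sackin


per_subtree = sackin_index_sketch([0, 0, 0, 1, 1])
```

In the example the leaves sit at depths 1, 2, and 2 below the root, giving a whole-tree value of 5.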

alifestd_mark_sample_tips_asexual(...[, ...])

Mark a random subsample of n_sample tips.

alifestd_mark_sample_tips_canopy_asexual(...)

Mark the n_sample leaves with the largest criterion values.

alifestd_mark_sample_tips_canopy_polars(...)

Mark the n_sample leaves with the largest criterion values.

alifestd_mark_sample_tips_clade_asexual(...)

Mark tips belonging to a randomly sampled clade of at most n_sample tips.

alifestd_mark_sample_tips_clade_polars(...)

Mark tips belonging to a randomly sampled clade of at most n_sample tips.

alifestd_mark_sample_tips_lineage_asexual(...)

Mark the n_sample leaves closest to the lineage of a target leaf.

alifestd_mark_sample_tips_lineage_polars(...)

Mark the n_sample leaves closest to the lineage of a target leaf.

alifestd_mark_sample_tips_lineage_stratified_asexual(...)

Mark leaves per stratified group, chosen by proximity to the lineage of a target leaf.

alifestd_mark_sample_tips_lineage_stratified_polars(...)

Mark leaves per stratified group, chosen by proximity to the lineage of a target leaf.

alifestd_mark_sample_tips_polars(...[, ...])

Mark a random subsample of n_sample tips.

alifestd_mark_sample_tips_uniform_asexual(...)

Mark a random subsample of n_sample tips.

alifestd_mark_sample_tips_uniform_polars(...)

Mark a random subsample of n_sample tips.

alifestd_mark_sister_asexual(phylogeny_df[, ...])

Add column sister, containing the id of each node's sibling.

alifestd_mark_sister_polars(phylogeny_df, *)

Add column sister_id, containing the id of each node's sibling.

alifestd_mask_descendants_asexual(phylogeny_df)

For given ancestor nodes, create a mask identifying those nodes and all descendants.

alifestd_mask_descendants_polars(...)

For given ancestor nodes, create a mask identifying those nodes and all descendants.

alifestd_mask_monomorphic_clades_asexual(...)

Compute a mask marking "monomorphic" clades where all members with a defined trait value share the same trait value.

alifestd_parse_ancestor_id(ancestor_list_str)

Parse at most a single ancestor id from an ancestor_list field.

alifestd_parse_ancestor_ids(ancestor_list_str)

Parse ancestor ids from an ancestor_list field.
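
To illustrate what parsing an ancestor_list field involves, the sketch below reads a bracketed, comma-separated string of ids and treats an empty list or a "none" token as a root. The function name parse_ancestor_ids_sketch and the exact set of accepted root tokens are assumptions for this example; the library's accepted tokens are not listed here.

```python
import json


def parse_ancestor_ids_sketch(ancestor_list_str):
    """Parse a string such as "[1, 2]" into a list of ints.

    Hypothetical helper. Treats "[]" and any capitalization of "[none]"
    as a root entry with no ancestors (an assumption of this sketch).
    """
    text = ancestor_list_str.strip().lower()
    if text in ("[]", "[none]"):
        return []
    return [int(item) for item in json.loads(text)]


sexual_parents = parse_ancestor_ids_sketch("[1, 2]")
root_parents = parse_ancestor_ids_sketch("[None]")
```

An asexual record would yield at most one id per entry, which is the case the single-id parser above addresses.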

alifestd_pipe_unary_ops(phylogeny_df, *unary_ops)

Pipe a phylogeny DataFrame through a sequence of unary operations.

alifestd_pipe_unary_ops_polars(phylogeny_df, ...)

Pipe a phylogeny DataFrame through a sequence of unary operations.
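
Piping a value through a sequence of unary operations amounts to a left fold over the operations. The sketch below shows the pattern on plain Python values; the name pipe_unary_ops_sketch is hypothetical and the example says nothing about any extra behavior of the library functions, such as logging or validation.

```python
from functools import reduce


def pipe_unary_ops_sketch(value, *unary_ops):
    """Apply each single-argument operation in order (hypothetical helper)."""
    return reduce(lambda acc, op: op(acc), unary_ops, value)


result = pipe_unary_ops_sketch(3, lambda x: x + 1, lambda x: x * 2)
```

With no operations supplied, the input is returned unchanged.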

alifestd_prefix_roots(phylogeny_df, *[, ...])

Add new roots to the phylogeny, prefixing existing roots.

alifestd_prefix_roots_polars(phylogeny_df, *)

Add new roots to the phylogeny, prefixing existing roots.

alifestd_prune_extinct_lineages_asexual(...)

Drop taxa without extant descendants.

alifestd_prune_extinct_lineages_polars(...)

Drop taxa without extant descendants.

alifestd_reroot_at_id_asexual(phylogeny_df, ...)

Reroot phylogeny, preserving topology.

alifestd_reroot_at_id_polars(phylogeny_df, ...)

Reroot phylogeny at specified node id, preserving topology.

alifestd_sample_triplet_comparisons_asexual(...)

Sample triplet comparisons between two asexual phylogenetic trees in alife standard form, creating a DataFrame with the triplet categorizations and comparison results as well as corresponding data from MRCA row within the first tree.

alifestd_screen_trait_defined_clades_fisher_asexual(...)

Perform a screen for trait-defined clades based on Fisher's exact test.

alifestd_screen_trait_defined_clades_fitch_asexual(...)

Perform a maximum parsimony screen for trait-defined clades using Fitch's algorithm.

alifestd_screen_trait_defined_clades_naive_asexual(...)

Perform a naive screen for trait-defined clades.

alifestd_sort_children_asexual(phylogeny_df, ...)

Reorder rows so children are sorted by the given criterion column, gathering children into contiguous rows.

alifestd_sort_children_polars(phylogeny_df, ...)

Reorder rows so children are sorted by the given criterion column, gathering children into contiguous rows.

alifestd_splay_polytomies(phylogeny_df[, mutate])

Use a simple splay strategy to resolve polytomies, converting them into bifurcations.

alifestd_splay_polytomies_polars(phylogeny_df)

Use a simple splay strategy to resolve polytomies, converting them into bifurcations.

alifestd_sum_origin_time_deltas_asexual(...)

Sum differences between taxa's origin times and their ancestors' origin times.

alifestd_sum_origin_time_deltas_polars(...)

Sum origin_time_delta values.

alifestd_test_leaves_isomorphic_asexual(df1, ...)

Test if phylogenetic relationships between leaf nodes are topologically isomorphic between two phylogenies.

alifestd_test_leaves_isomorphic_polars(df1, ...)

Test if phylogenetic relationships between leaf nodes are topologically isomorphic between two phylogenies.

alifestd_to_iplotx_pandas(phylogeny_df[, mutate])

Wrap a pandas phylogeny DataFrame for use with iplotx.

alifestd_to_iplotx_polars(phylogeny_df)

Wrap a polars phylogeny DataFrame for use with iplotx.

alifestd_to_working_format(phylogeny_df[, ...])

Re-encode phylogeny_df to facilitate efficient analysis and transformation operations.

alifestd_to_working_format_polars(phylogeny_df)

Re-encode phylogeny_df to facilitate efficient analysis and transformation operations.

alifestd_topological_sensitivity_warned(*, ...)

Decorator that emits a topological sensitivity warning before the wrapped function executes.

alifestd_topological_sensitivity_warned_polars(*, ...)

Decorator that emits a topological sensitivity warning before the wrapped function executes.

alifestd_topological_sort(phylogeny_df[, mutate])

Sort rows so all organisms follow members of their ancestor_list.

alifestd_topological_sort_polars(phylogeny_df)

Sort rows so all organisms follow members of their ancestor_id.

alifestd_try_add_ancestor_id_col(phylogeny_df)

Add an ancestor_id column to the input DataFrame if the phylogeny is asexual and the column does not already exist.

alifestd_try_add_ancestor_id_col_polars(...)

Add an ancestor_id column to the input DataFrame if the phylogeny is asexual and the column does not already exist.

alifestd_try_add_ancestor_list_col(phylogeny_df)

Add an ancestor_list column to the input DataFrame if the column does not already exist.

alifestd_try_add_ancestor_list_col_polars(...)

Add an ancestor_list column to the input DataFrame if the column does not already exist.

alifestd_ultrametricize(phylogeny_df[, ...])

Adjust tip origin_time values so all tips share the same time.

alifestd_ultrametricize_polars(phylogeny_df, *)

Adjust tip origin_time values so all tips share the same time.

alifestd_unfurl_lineage_asexual(...[, mutate])

List leaf_id and its ancestor id sequence through tree root.

alifestd_unfurl_traversal_inorder_asexual(...)

List id values in inorder traversal order, with left children visited first.

alifestd_unfurl_traversal_inorder_polars(...)

List node indices in inorder traversal order, with left children visited first.

alifestd_unfurl_traversal_levelorder_asexual(...)

List id values in levelorder (BFS) traversal order.

alifestd_unfurl_traversal_levelorder_polars(...)

List node indices in levelorder (BFS) traversal order.

alifestd_unfurl_traversal_postorder_asexual(...)

List id values in postorder traversal order.

alifestd_unfurl_traversal_postorder_contiguous_asexual(...)

List node indices in DFS postorder traversal order, with subtree contiguity.

alifestd_unfurl_traversal_postorder_contiguous_polars(...)

List node indices in DFS postorder traversal order, with subtree contiguity.

alifestd_unfurl_traversal_postorder_polars(...)

List node indices in postorder traversal order.

alifestd_unfurl_traversal_preorder_asexual(...)

List id values in DFS preorder traversal order.

alifestd_unfurl_traversal_preorder_polars(...)

List node indices in DFS preorder traversal order.

alifestd_unfurl_traversal_semiorder_asexual(...)

List id values in semiorder traversal order.

alifestd_unfurl_traversal_semiorder_polars(...)

List node indices in semiorder traversal order.

alifestd_unfurl_traversal_topological_asexual(...)

List id values in topological traversal order.

alifestd_unfurl_traversal_topological_polars(...)

List node indices in topological traversal order.

alifestd_validate(phylogeny_df[, mutate, ...])

Is the phylogeny compliant with alife data standards?

alifestd_warn_topological_sensitivity(...)

Emit a warning if phylogeny_df contains columns that may be invalidated by topological operations.

alifestd_warn_topological_sensitivity_polars(...)

Emit a warning if phylogeny_df contains columns that may be invalidated by topological operations.

Classes

AlifestdIplotxShimNumpy

Numpy-backed iplotx TreeDataProvider for alife-standard data.

AlifestdIplotxShimPandas

Iplotx TreeDataProvider for pandas alife-standard dataframes.

AlifestdIplotxShimPolars

Iplotx TreeDataProvider for polars alife-standard dataframes.

class AlifestdIplotxShimNumpy

Numpy-backed iplotx TreeDataProvider for alife-standard data.

This class assumes contiguous ids (id == row index) and topologically sorted rows (ancestors appear before descendants).

Parameters

ancestor_ids : np.ndarray

Integer array of ancestor ids; roots satisfy ancestor_ids[i] == i.

names : np.ndarray, optional

Per-node name strings.

branch_lengths : np.ndarray, optional

Per-node branch lengths (edge from parent to this node).
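
The conventions stated above (contiguous ids, topologically sorted rows, roots marked by ancestor_ids[i] == i) can be illustrated with a minimal sketch. It uses a plain Python list in place of an np.ndarray, and the helper names is_working_format and build_children are hypothetical, not part of this library.

```python
from typing import Dict, List


def is_working_format(ancestor_ids: List[int]) -> bool:
    """Check that each node's ancestor sits at or before its own row.

    With contiguous ids (id == row index), this is the topological-sort
    condition; a root is its own ancestor (ancestor_ids[i] == i).
    """
    return all(0 <= anc <= i for i, anc in enumerate(ancestor_ids))


def build_children(ancestor_ids: List[int]) -> Dict[int, List[int]]:
    """Map each node id to the ids of its direct children."""
    children: Dict[int, List[int]] = {i: [] for i in range(len(ancestor_ids))}
    for i, anc in enumerate(ancestor_ids):
        if anc != i:  # skip roots, which point to themselves
            children[anc].append(i)
    return children


# Hypothetical five-node tree: 0 is the root; 1, 2 descend from 0; 3, 4 from 1.
example_ancestor_ids = [0, 0, 0, 1, 1]
example_children = build_children(example_ancestor_ids)
```

Here example_children maps node 0 to its children 1 and 2, and node 1 to its children 3 and 4.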

__init__(ancestor_ids: ndarray, names: ndarray | None = None, branch_lengths: ndarray | None = None) None
static check_dependencies() bool
static get_branch_length(node: _AlifestdNode) float | None
get_children(node: _AlifestdNode) Sequence[_AlifestdNode]
get_leaves(node: _AlifestdNode | None = None) Sequence[_AlifestdNode]
get_root() _AlifestdNode
get_subtree(node: _AlifestdNode) AlifestdIplotxShimNumpy
is_rooted() bool
levelorder() Iterable[_AlifestdNode]
postorder() Iterable[_AlifestdNode]
preorder() Iterable[_AlifestdNode]
static tree_type() type

class AlifestdIplotxShimPandas

Iplotx TreeDataProvider for pandas alife-standard dataframes.

The dataframe must be asexual with contiguous ids and topologically sorted rows. An ancestor_id column will be derived from ancestor_list if needed.

Parameters

tree : pd.DataFrame

Pandas phylogeny dataframe in alife standard format.

mutate : bool, default False

If True, allow modification of the input dataframe.

__init__(tree: DataFrame, mutate: bool = False) None
static check_dependencies() bool
static tree_type() type

class AlifestdIplotxShimPolars

Iplotx TreeDataProvider for polars alife-standard dataframes.

The dataframe must be asexual with contiguous ids and topologically sorted rows.

Parameters

tree : polars.DataFrame

Polars phylogeny dataframe in alife standard format.

__init__(tree: DataFrame) None
static check_dependencies() bool
static tree_type() type

alifestd_add_global_root(phylogeny_df: DataFrame, mutate: bool = False, root_attrs: Mapping[str, Any] = mappingproxy({})) DataFrame

Add a new global root node that all existing roots point to.

The new root node will have columns id, ancestor_id (if applicable), ancestor_list (if applicable), and any columns specified in root_attrs. All other columns will be NaN for the new root row.

Parameters

phylogeny_df : pd.DataFrame

Phylogeny dataframe in alife standard format.

mutate : bool, default False

If True, allows mutation of the input dataframe.

root_attrs : Mapping[str, Any], default {}

Column values to set on the new global root row, e.g., {"origin_time": 0.0, "taxon_label": "root"}.

Keys "id", "ancestor_id", and "ancestor_list" are reserved and may not be specified; a ValueError is raised if any are present.

Returns

pd.DataFrame

The phylogeny dataframe with a new global root added.

Raises

ValueError

If root_attrs contains reserved keys.

Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
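
The behavior described for alifestd_add_global_root can be approximated with a minimal sketch over a list of row dicts rather than a DataFrame. The function name add_global_root_sketch is hypothetical, and the choice of the next unused integer as the new root's id is an assumption, not something this page specifies.

```python
from types import MappingProxyType
from typing import Any, Dict, List, Mapping

RESERVED_KEYS = ("id", "ancestor_id", "ancestor_list")


def add_global_root_sketch(
    rows: List[Dict[str, Any]],
    root_attrs: Mapping[str, Any] = MappingProxyType({}),
) -> List[Dict[str, Any]]:
    """Return new rows with one added root that all prior roots point to."""
    bad = [key for key in RESERVED_KEYS if key in root_attrs]
    if bad:
        raise ValueError(f"reserved keys in root_attrs: {bad}")
    # Assumption: the new root takes the next unused id.
    new_id = max((row["id"] for row in rows), default=-1) + 1
    result = []
    for row in rows:
        row = dict(row)  # copy, leaving the input rows untouched
        if row["ancestor_id"] == row["id"]:  # an existing root
            row["ancestor_id"] = new_id
        result.append(row)
    result.append({"id": new_id, "ancestor_id": new_id, **root_attrs})
    return result


# Hypothetical forest with two roots (ids 0 and 2).
forest = [
    {"id": 0, "ancestor_id": 0},
    {"id": 1, "ancestor_id": 0},
    {"id": 2, "ancestor_id": 2},
]
rooted = add_global_root_sketch(forest, {"origin_time": 0.0})
```

In this sketch both former roots end up pointing at the new node with id 3, while the input list is left unchanged.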

alifestd_add_global_root_polars(phylogeny_df: DataFrame) DataFrame

Add a new global root node that all existing roots point to.

alifestd_add_inner_knuckles_asexual(phylogeny_df: DataFrame, mutate: bool = False) DataFrame

For all inner nodes, add a subtending unifurcation (“knuckle”).

Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.

alifestd_add_inner_knuckles_polars(phylogeny_df: DataFrame) DataFrame

For all inner nodes, add a subtending unifurcation (“knuckle”).

Parameters

phylogeny_df : polars.DataFrame

The phylogeny as a dataframe in alife standard format.

Must represent an asexual phylogeny with contiguous ids and topological sort order.

Returns

polars.DataFrame

The phylogeny with knuckle nodes added for each inner node.

See Also

alifestd_add_inner_knuckles_asexual :

Pandas-based implementation.

alifestd_add_inner_leaves(phylogeny_df: DataFrame, mutate: bool = False) DataFrame

Create a zero-length branch with leaf node for each inner node.

Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.

alifestd_add_inner_niblings_asexual(phylogeny_df: DataFrame, mutate: bool = False) DataFrame

For all inner nodes, add a subtending unifurcation, adding a “nibling” leaf as the child of the knuckle.

Here, “nibling” refers to a leaf that is a niece/nephew of the inner node. If not topologically sorted, a topological sort will be applied.

Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.

alifestd_add_inner_niblings_polars(phylogeny_df: DataFrame) DataFrame

For all inner nodes, add a subtending unifurcation, adding a “nibling” leaf as the child of the knuckle.

Here, “nibling” refers to a leaf that is a niece/nephew of the inner node.

Parameters

phylogeny_df : polars.DataFrame or polars.LazyFrame

The phylogeny as a dataframe in alife standard format.

Must represent an asexual phylogeny with contiguous ids and topologically sorted rows.

Returns

polars.DataFrame

The phylogeny with inner niblings added.

See Also

alifestd_add_inner_niblings_asexual :

Pandas-based implementation.

alifestd_aggregate_phylogenies(phylogeny_dfs: List[DataFrame], mutate: bool = False) DataFrame

Concatenate independent phylogenies, reassigning organism ids to prevent collisions.

Input dataframes are not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.

alifestd_aggregate_phylogenies_polars(phylogeny_dfs: List[DataFrame]) DataFrame

Concatenate independent phylogenies, reassigning organism ids to prevent collisions.

Assumes asexual phylogenies with contiguous ids, topologically sorted, and with an ancestor_id column (not ancestor_list).

See Also

alifestd_aggregate_phylogenies :

Pandas-based implementation.

alifestd_as_newick_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, taxon_label: str | None = None, progress_wrap: Callable = <function <lambda>>) str

Convert phylogeny dataframe to Newick format.

Parameters

phylogeny_df : pd.DataFrame

Phylogeny dataframe in Alife standard format.

mutate : bool, optional

Allow in-place mutations of the input dataframe, by default False.

taxon_label : str, optional

Column to use for taxon labels, by default None.

progress_wrap : typing.Callable, optional

Pass tqdm or equivalent to display a progress bar.

alifestd_as_newick_polars(phylogeny_df: DataFrame, *, taxon_label: str | None = None, progress_wrap: Callable = <function <lambda>>) str

Convert phylogeny dataframe to Newick format.

Parameters

phylogeny_df : polars.DataFrame

Phylogeny dataframe in Alife standard format.

taxon_label : str, optional

Column to use for taxon labels, by default None.

progress_wrap : typing.Callable, optional

Pass tqdm or equivalent to display a progress bar.

See Also

alifestd_as_newick_asexual :

Pandas-based implementation.

alifestd_assign_contiguous_ids(phylogeny_df: DataFrame, mutate: bool = False) DataFrame

Reassign so each organism’s id corresponds to its row number.

Organisms retain the same row location; only id numbers change. Input dataframe is not mutated by this operation unless mutate True.

alifestd_assign_contiguous_ids_polars(phylogeny_df: DataFrame) DataFrame

Reassign so each organism’s id corresponds to its row number.

Organisms retain the same row location; only id numbers change.

alifestd_assign_root_ancestor_token(phylogeny_df: DataFrame, root_ancestor_token: str, mutate: bool = False) DataFrame

Set root_ancestor_token for “ancestor_list” column.

The option root_ancestor_token will be sandwiched in brackets to create the ancestor list entry for genesis organisms. For example, the token “None” will yield the entry “[None]” and the token “” will yield the entry “[]”.

alifestd_calc_clade_lookback_n_asexual(phylogeny_df: DataFrame, lookback_n: int, mutate: bool = False) ndarray

Find ancestor ids of nodes that are lookback_n nodes away in the phylogeny.

The root node will be returned if the lookback distance exceeds available nodes.

Returns a numpy array of the same length as the input DataFrame, with array elements as the id of the ancestor lookback_n nodes back from each node. Returned array matches row order of the input DataFrame.
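
A minimal sketch of the lookback idea, assuming contiguous ids and the convention that a root is its own ancestor (so stepping past the root stays at the root). The function lookback_n_sketch is hypothetical and works on a plain list rather than a DataFrame.

```python
from typing import List


def lookback_n_sketch(ancestor_ids: List[int], lookback_n: int) -> List[int]:
    """For each node, step lookback_n times toward the root.

    Roots are their own ancestors, so walking past the root stays there.
    """
    result = list(range(len(ancestor_ids)))
    for _ in range(lookback_n):
        result = [ancestor_ids[node] for node in result]
    return result


# Hypothetical chain with a side branch: 0 <- 1 <- 2 <- 3, and 0 <- 4.
chain_ancestor_ids = [0, 0, 1, 2, 0]
two_back = lookback_n_sketch(chain_ancestor_ids, 2)
```

In this example only node 3 is deep enough to land on a non-root ancestor (node 1); every other node resolves to the root.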

alifestd_calc_clade_lookback_origin_time_delta_asexual(phylogeny_df: DataFrame, lookback_origin_time_delta: float, mutate: bool = False) ndarray

Find ancestor ids of nodes that precede each phylogeny node by at least lookback_origin_time_delta branch distance.

The root node will be returned if the lookback distance exceeds available nodes.

Returns a numpy array of the same length as the input DataFrame, with array elements as the id of the ancestor preceding each node by at least lookback_origin_time_delta. Returned array matches row order of the input DataFrame.

alifestd_calc_clade_trait_count_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, trait_mask: ndarray) ndarray

Count how many nodes within each clade have a given trait.

Clades are defined as a node and all descendant nodes.

Returns a numpy array of the same length as the input DataFrame, with array elements as the number of nodes in the clade that have the trait. Returned array matches row order of the input DataFrame.
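
One way to compute per-clade counts is a single reverse pass over the rows. The sketch below assumes contiguous ids and topologically sorted rows, so every child is handled before its parent; clade_trait_count_sketch is a hypothetical name and the inputs are plain lists.

```python
from typing import List


def clade_trait_count_sketch(
    ancestor_ids: List[int], trait_mask: List[bool]
) -> List[int]:
    """Count trait-bearing nodes in each node's clade (node plus descendants).

    Visiting rows in reverse handles every child before its parent.
    """
    counts = [int(has_trait) for has_trait in trait_mask]
    for node in reversed(range(len(ancestor_ids))):
        parent = ancestor_ids[node]
        if parent != node:  # roots have no parent to pass counts to
            counts[parent] += counts[node]
    return counts


# Hypothetical tree: 0 is the root; 1, 2 descend from 0; 3, 4 from 1.
trait_counts = clade_trait_count_sketch(
    [0, 0, 0, 1, 1], [False, True, False, True, False]
)
```

Nodes 1 and 3 carry the trait here, so the clades rooted at 0 and 1 each count two, and the leaf 3 counts itself.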

alifestd_calc_clade_trait_frequency_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mask_trait_absent: ndarray, mask_trait_present: ndarray) ndarray

Calculate what fraction of nodes within each clade have a given trait.

Clades are defined as a node and all descendant nodes. The mask_trait_absent parameter can be used to exclude nodes from consideration, for instance &’ing with alifestd_mark_leaves can be used to only consider trait frequency among descendant leaves.

Returns a numpy array of the same length as the input DataFrame, with array elements as the fraction of nodes in the clade that have the trait. Returned array matches row order of the input DataFrame.

alifestd_calc_distance_matrix_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, criterion: str = 'origin_time', progress_wrap: Callable = <function <lambda>>) ndarray

Calculate pairwise distances between all taxa via their MRCAs.

The distance between two taxa is computed as the sum of criterion differences between each taxon and their Most Recent Common Ancestor (MRCA):

distance[i, j] = (criterion[i] - criterion[mrca])
               + (criterion[j] - criterion[mrca])

Taxa sharing no common ancestor will have distance NaN.
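
The distance rule can be worked through for a single pair with a minimal sketch. It assumes contiguous ids with roots as their own ancestors, uses -1 for "no common ancestor", and finds the MRCA by a simple lineage walk; the helpers mrca_sketch and pair_distance_sketch are hypothetical and do not reflect how the library computes the full matrix.

```python
import math
from typing import List


def mrca_sketch(ancestor_ids: List[int], i: int, j: int) -> int:
    """Return the MRCA id of nodes i and j, or -1 if they share no ancestor."""
    lineage = {i}
    while ancestor_ids[i] != i:  # collect i and all of its ancestors
        i = ancestor_ids[i]
        lineage.add(i)
    while j not in lineage:  # climb from j until the lineages meet
        if ancestor_ids[j] == j:
            return -1  # reached a different root
        j = ancestor_ids[j]
    return j


def pair_distance_sketch(
    ancestor_ids: List[int], criterion: List[float], i: int, j: int
) -> float:
    """Sum each taxon's criterion difference from the pair's MRCA."""
    mrca = mrca_sketch(ancestor_ids, i, j)
    if mrca == -1:
        return math.nan
    return (criterion[i] - criterion[mrca]) + (criterion[j] - criterion[mrca])


# Hypothetical forest: tree rooted at 0 (children 1, 2; 3 under 1), lone root 4.
anc = [0, 0, 0, 1, 4]
origin_time = [0.0, 1.0, 2.0, 3.0, 0.0]
d_32 = pair_distance_sketch(anc, origin_time, 3, 2)
```

Nodes 3 and 2 meet at the root, so their distance is (3 - 0) + (2 - 0); node 4 sits in a separate tree, so any distance involving it is NaN.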

Pass tqdm or equivalent as progress_wrap to display a progress bar.

Input dataframe is not mutated by this operation unless mutate set True.

Parameters

phylogeny_df : pd.DataFrame

Phylogeny in alife standard format.

mutate : bool, default False

If True, allows in-place modification of phylogeny_df.

criterion : str, default “origin_time”

Column name used to measure distance between taxa and their MRCA.

progress_wrap : callable, optional

Wrapper for progress display (e.g., tqdm).

Returns

np.ndarray

n x n float64 matrix of pairwise distances. Entry [i, j] is NaN when organisms i and j share no common ancestor.

See Also

alifestd_calc_mrca_id_matrix_asexual :

Computes the MRCA id matrix used internally by this function.

alifestd_find_pair_distance_asexual :

Computes distance for a single pair of taxa.

alifestd_calc_distance_matrix_polars(phylogeny_df: DataFrame, *, criterion: str | Expr = 'origin_time', progress_wrap: Callable = <function <lambda>>) ndarray

Calculate pairwise distances between all taxa via their MRCAs.

The distance between two taxa is computed as the sum of criterion differences between each taxon and their Most Recent Common Ancestor (MRCA):

distance[i, j] = (criterion[i] - criterion[mrca])
               + (criterion[j] - criterion[mrca])

Taxa sharing no common ancestor will have distance NaN.

Pass tqdm or equivalent as progress_wrap to display a progress bar.

Parameters

phylogeny_df : polars.DataFrame or polars.LazyFrame

The phylogeny as a dataframe in alife standard format.

Must represent an asexual phylogeny in working format (i.e., topologically sorted with contiguous ids and an ancestor_id column, or an ancestor_list column from which ancestor_id can be derived).

criterion : str or polars.Expr, default “origin_time”

Column name or polars expression used to measure distance between taxa and their MRCA.

progress_wrap : callable, optional

Wrapper for progress display (e.g., tqdm).

Returns

numpy.ndarray

Array of shape (n, n) with dtype float64, containing pairwise distances. Entries are NaN where organisms share no common ancestor.

See Also

alifestd_calc_distance_matrix_asexual :

Pandas-based implementation.

alifestd_calc_mrca_id_matrix_asexual_polars :

Computes the underlying MRCA id matrix.

alifestd_find_pair_distance_polars :

Computes distance for a single pair of taxa.

alifestd_calc_mrca_id_matrix_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, progress_wrap: Callable = <function <lambda>>) ndarray

Calculate the Most Recent Common Ancestor (MRCA) taxon id for each pair of taxa.

Taxa sharing no common ancestor will have MRCA id -1.

Pass tqdm or equivalent as progress_wrap to display a progress bar.

Input dataframe is not mutated by this operation unless mutate set True.

alifestd_calc_mrca_id_matrix_asexual_polars(phylogeny_df: DataFrame, *, progress_wrap: Callable = <function <lambda>>) ndarray

Calculate the Most Recent Common Ancestor (MRCA) taxon id for each pair of taxa.

Taxa sharing no common ancestor will have MRCA id -1.

Parameters

phylogeny_df : polars.DataFrame or polars.LazyFrame

The phylogeny as a dataframe in alife standard format.

Must represent an asexual phylogeny in working format (i.e., topologically sorted with contiguous ids and an ancestor_id column, or an ancestor_list column from which ancestor_id can be derived).

progress_wrap : callable, optional

Wrapper for progress display (e.g., tqdm).

Returns

numpy.ndarray

Array of shape (n, n) with dtype int64, containing MRCA ids for each pair of organisms. Entries are -1 where organisms share no common ancestor.

See Also

alifestd_calc_mrca_id_matrix_asexual :

Pandas-based implementation.

alifestd_calc_mrca_id_vector_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, target_id: int, progress_wrap: Callable = <function <lambda>>) ndarray

Calculate the Most Recent Common Ancestor (MRCA) taxon id for target_id and each other taxon.

Taxa sharing no common ancestor will have MRCA id -1.

Pass tqdm or equivalent as progress_wrap to display a progress bar.

Input dataframe is not mutated by this operation unless mutate set True.

alifestd_calc_mrca_id_vector_asexual_polars(phylogeny_df: DataFrame, *, target_id: int, progress_wrap: Callable = <function <lambda>>) ndarray

Calculate the Most Recent Common Ancestor (MRCA) taxon id for target_id and each other taxon.

Taxa sharing no common ancestor will have MRCA id -1.

Parameters

phylogeny_df : polars.DataFrame or polars.LazyFrame

The phylogeny as a dataframe in alife standard format.

Must represent an asexual phylogeny in working format (i.e., topologically sorted with contiguous ids and an ancestor_id column, or an ancestor_list column from which ancestor_id can be derived).

target_id : int

The target organism id to compute MRCA against.

progress_wrap : callable, optional

Wrapper for progress display (e.g., tqdm).

Returns

numpy.ndarray

Array of shape (n,) with dtype int64, containing MRCA ids for each organism with the target. Entries are -1 where organisms share no common ancestor with the target.

See Also

alifestd_calc_mrca_id_vector_asexual :

Pandas-based implementation.

alifestd_calc_polytomic_index(phylogeny_df: DataFrame) int

Count how many fewer inner nodes are contained in phylogeny than expected if strictly bifurcating.

Excludes unifurcations from calculation.
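
One reading of this quantity can be sketched as follows, assuming a single-rooted tree with contiguous ids: a strictly bifurcating tree with n leaves has n - 1 branching nodes, and the index is the shortfall relative to that, with one-child nodes left out of the tally. The function polytomic_index_sketch is hypothetical and may differ from the library's treatment of edge cases such as multiple roots.

```python
from collections import Counter
from typing import List


def polytomic_index_sketch(ancestor_ids: List[int]) -> int:
    """Count branching nodes "missing" relative to a bifurcating tree.

    Assumes a single root. Unifurcations (one child) are not counted as
    branching nodes.
    """
    child_counts = Counter(
        anc for node, anc in enumerate(ancestor_ids) if anc != node
    )
    num_leaves = len(ancestor_ids) - len(child_counts)
    num_branching = sum(1 for count in child_counts.values() if count >= 2)
    return max(num_leaves - 1, 0) - num_branching


# Hypothetical trees: a root with three leaf children, then a bifurcating tree.
trifurcation_index = polytomic_index_sketch([0, 0, 0, 0])
bifurcating_index = polytomic_index_sketch([0, 0, 0, 1, 1])
```

The three-way split is one inner node short of a fully resolved tree, while the bifurcating tree has no shortfall.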

alifestd_calc_polytomic_index_polars(phylogeny_df: DataFrame) int

Count how many fewer inner nodes are contained in phylogeny than expected if strictly bifurcating.

Excludes unifurcations from calculation.

alifestd_categorize_triplet_asexual(phylogeny_df: DataFrame, triplet_ids: Iterable[int], mutate: bool = False) int

Assess the topological configuration of three ids in phylogeny_df.

If polytomy, return -1. Else, return index of outgroup id.
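
The return convention can be illustrated with a minimal sketch based on pairwise MRCAs: the two sister taxa share an MRCA that differs from the one each shares with the outgroup. This assumes contiguous ids, roots as their own ancestors, and all three ids in one tree; categorize_triplet_sketch and mrca_of are hypothetical helpers, not the library's implementation.

```python
from typing import List, Sequence


def mrca_of(ancestor_ids: List[int], i: int, j: int) -> int:
    """Return the MRCA id of i and j; both must share a root."""
    lineage = {i}
    while ancestor_ids[i] != i:
        i = ancestor_ids[i]
        lineage.add(i)
    while j not in lineage:
        if ancestor_ids[j] == j:
            raise ValueError("nodes share no common ancestor")
        j = ancestor_ids[j]
    return j


def categorize_triplet_sketch(
    ancestor_ids: List[int], triplet_ids: Sequence[int]
) -> int:
    """Return the index (0, 1, or 2) of the outgroup, or -1 for a polytomy."""
    a, b, c = triplet_ids
    # Entry k holds the MRCA of the two members other than triplet_ids[k].
    pair_mrcas = [
        mrca_of(ancestor_ids, b, c),
        mrca_of(ancestor_ids, a, c),
        mrca_of(ancestor_ids, a, b),
    ]
    for k in range(3):
        others = [pair_mrcas[m] for m in range(3) if m != k]
        # Both pairs that include the outgroup share one MRCA, which
        # differs from the MRCA of the sister pair.
        if others[0] == others[1] != pair_mrcas[k]:
            return k
    return -1


# Hypothetical tree: 0 is the root; 1, 2 descend from 0; 3, 4 from 1.
outgroup_index = categorize_triplet_sketch([0, 0, 0, 1, 1], (3, 4, 2))
```

Leaves 3 and 4 are sisters under node 1, so the third entry of the triplet (id 2, index 2) is the outgroup.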

Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.

See Also

alifestd_estimate_triplet_distance_asexual

alifestd_sample_triplet_comparisons_asexual

alifestd_check_topological_sensitivity(phylogeny_df: DataFrame, *, insert: bool, delete: bool, update: bool) List[str]

Return names of columns present in phylogeny_df that may be invalidated by topological operations such as collapsing unifurcations.

If no such columns exist, returns an empty list.

Parameters

phylogeny_df : pandas.DataFrame

The phylogeny as a dataframe in alife standard format.

insert : bool

Whether the operation inserts new nodes.

delete : bool

Whether the operation deletes nodes.

update : bool

Whether the operation updates ancestor relationships.

Input dataframe is not mutated by this operation.

See Also

alifestd_check_topological_sensitivity_polars :

Polars-based implementation.

alifestd_check_topological_sensitivity_polars(phylogeny_df: DataFrame | LazyFrame, *, insert: bool, delete: bool, update: bool) List[str]

Return names of columns present in phylogeny_df that may be invalidated by topological operations such as collapsing unifurcations.

Accepts polars DataFrames and LazyFrames.

If no such columns exist, returns an empty list.

Parameters

phylogeny_df : polars.DataFrame or polars.LazyFrame

The phylogeny as a dataframe in alife standard format.

insert : bool

Whether the operation inserts new nodes.

delete : bool

Whether the operation deletes nodes.

update : bool

Whether the operation updates ancestor relationships.

See Also

alifestd_check_topological_sensitivity :

Pandas-based implementation.

alifestd_chronological_sort(phylogeny_df: DataFrame, how: str = 'origin_time', mutate: bool = False) DataFrame

Sort rows so all organisms appear in chronological order, default origin_time.

Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.

alifestd_chronological_sort_polars(phylogeny_df: DataFrame, how: str = 'origin_time') DataFrame

Sort rows so all organisms appear in chronological order, default origin_time.

alifestd_coarsen_dilate_asexual(phylogeny_df: DataFrame, *, criterion: str = 'origin_time', dilation: int = 1, mutate: bool = False) DataFrame

Coarsen a phylogeny by collapsing inner nodes within dilation windows.

All inner (non-leaf) nodes with criterion values in the half-open interval [n, n + dilation), where n % dilation == 0, are collapsed to a single inner node at n.

Tip nodes are never moved. The MRCA of two tips may only shift backward (never forward), by at most dilation units, and never across an n % dilation == 0 boundary.
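
The window arithmetic alone can be sketched as follows; this shows only how a criterion value maps to the start n of its window [n, n + dilation), not the tree restructuring itself. The helper window_start is a hypothetical name.

```python
def window_start(criterion_value: int, dilation: int) -> int:
    """Return n, the start of the window [n, n + dilation) holding the value."""
    if dilation <= 0:
        raise ValueError("dilation must be a positive integer")
    return criterion_value - criterion_value % dilation


# With dilation 5, inner-node times 5 through 9 all map to 5.
snapped = [window_start(t, 5) for t in (4, 5, 7, 9, 10)]
```

Each value moves backward by less than one window width and never crosses a multiple of dilation, which matches the bound stated above.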

Parameters

phylogeny_df : pd.DataFrame

Input phylogeny in alife standard format.

criterion : str, default “origin_time”

Column whose values define the time axis for dilation.

dilation : int

Width of the dilation window. Must be a positive integer.

mutate : bool, default False

If True, allow in-place mutation of the input dataframe.

Returns

pd.DataFrame

Coarsened phylogeny in alife standard format.

Raises

NotImplementedError

If input is not topologically sorted with contiguous ids.

ValueError

If dilation is not a positive integer, if criterion is not present in phylogeny_df, or if criterion is "id" or "ancestor_id".

alifestd_coarsen_dilate_polars(phylogeny_df: DataFrame | LazyFrame, *, criterion: str = 'origin_time', dilation: int = 1) DataFrame

Coarsen a phylogeny by collapsing inner nodes within dilation windows.

All inner (non-leaf) nodes with criterion values in the half-open interval [n, n + dilation), where n % dilation == 0, are collapsed to a single inner node at n.

Tip nodes are never moved. The MRCA of two tips may only shift backward (never forward), by at most dilation units, and never across an n % dilation == 0 boundary.

Parameters

phylogeny_df : polars.DataFrame or polars.LazyFrame

Input phylogeny in alife standard format.

criterion : str, default “origin_time”

Column whose values define the time axis for dilation.

dilation : int

Width of the dilation window. Must be a positive integer.

Returns

polars.DataFrame

Coarsened phylogeny in alife standard format.

Raises

NotImplementedError

If input is not topologically sorted with contiguous ids.

ValueError

If dilation is not a positive integer, if criterion is not present in phylogeny_df, or if criterion is "id" or "ancestor_id".

See Also

alifestd_coarsen_dilate_asexual :

Pandas-based implementation.

alifestd_coarsen_mask(phylogeny_df: DataFrame, mask: Series, mutate: bool = False, progress_wrap: Callable = <function <lambda>>) DataFrame

Pare record to bypass organisms outside mask.

The root ancestor token will be adopted from phylogeny_df.

Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.

alifestd_coarsen_taxa_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, agg: Dict[str, str] | None = None, by: str | Sequence[str]) DataFrame

Condense consecutive phylogeny nodes sharing identical trait values, according to values in by column(s).

The manner in which consecutive nodes with identical traits are condensed may be fine-tuned on a column-by-column basis through the optional agg kwarg, a dict mapping column names to a Pandas GroupBy aggregation operation (e.g., “first”, “min”, “max”, etc.).

Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.

Dataframe reindexing (e.g., df.index) may be applied.

See Also

alifestd_coarsen_taxa_asexual_make_agg :

Helper function to generate default agg dict, which may be customized before being passed to alifestd_coarsen_taxa_asexual.

alifestd_coarsen_taxa_asexual_make_agg(phylogeny_df: DataFrame, default_agg: str = 'first') Dict[str, str]

Build per-column aggregation rules for asexual taxa coarsening.

Parameters

phylogeny_df : pd.DataFrame

Input phylogeny table.

default_agg : str, default “first”

Aggregation function to apply to any column not in the hard-coded overrides.

Returns

Dict[str, str]

Mapping of column name to aggregation method. Three columns are overridden as follows:

  • “destruction_time”: “last”

  • “is_root”: “first”

  • “origin_time”: “first”

Columns named

  • “ancestor_id”

  • “ancestor_list”

  • “branch_length”

  • “edge_length”

  • “id”

  • “is_leaf”

will be excluded from the result. All other (non-excluded) columns use default_agg.
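
The rules listed above can be restated as a minimal sketch that builds the mapping from a list of column names rather than a DataFrame. The name make_agg_sketch is hypothetical, and "fitness" is an illustrative column, not one this page defines.

```python
from typing import Dict, Iterable

# Overrides and exclusions as listed in the Returns description.
OVERRIDES = {
    "destruction_time": "last",
    "is_root": "first",
    "origin_time": "first",
}
EXCLUDED = {
    "ancestor_id",
    "ancestor_list",
    "branch_length",
    "edge_length",
    "id",
    "is_leaf",
}


def make_agg_sketch(
    columns: Iterable[str], default_agg: str = "first"
) -> Dict[str, str]:
    """Map each non-excluded column to its override or to default_agg."""
    return {
        column: OVERRIDES.get(column, default_agg)
        for column in columns
        if column not in EXCLUDED
    }


agg = make_agg_sketch(
    ["id", "ancestor_id", "origin_time", "destruction_time", "fitness"],
    default_agg="max",
)
```

Here id and ancestor_id drop out, the two time columns keep their overrides, and only fitness picks up the default.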

alifestd_coerce_chronological_consistency(phylogeny_df: DataFrame, mutate: bool = False) DataFrame

For any taxa with origin time preceding its parent’s, set origin time to parent’s origin time.

If an inconsistency is detected, the corrected phylogeny will be returned sorted in topological order.
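
The correction rule can be sketched as a single forward pass, assuming contiguous ids and topologically sorted rows so that each parent is finalized before its children. The function coerce_chronology_sketch is hypothetical and operates on plain lists.

```python
from typing import List


def coerce_chronology_sketch(
    ancestor_ids: List[int], origin_times: List[float]
) -> List[float]:
    """Raise any origin time that precedes its parent's to the parent's time.

    Roots are their own ancestors, so they are left unchanged.
    """
    fixed = list(origin_times)
    for node, parent in enumerate(ancestor_ids):
        fixed[node] = max(fixed[node], fixed[parent])
    return fixed


# Hypothetical chain 0 <- 1 <- 2, where node 1 claims to predate its parent.
coerced = coerce_chronology_sketch([0, 0, 1], [5.0, 3.0, 4.0])
```

Because node 1 is raised to its parent's time, node 2 then precedes its own corrected parent and is raised as well.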

alifestd_collapse_trunk_asexual(phylogeny_df: DataFrame, mutate: bool = False) DataFrame

Collapse entries masked by is_trunk column, keeping only the oldest root.

Masked entries must be contiguous, meaning that no non-trunk entry can be an ancestor of a trunk entry.

Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.

See Also

alifestd_delete_trunk_asexual

alifestd_collapse_trunk_polars(phylogeny_df: DataFrame) DataFrame

Collapse entries masked by is_trunk column, keeping only the oldest root.

alifestd_collapse_unifurcations(phylogeny_df: DataFrame, mutate: bool = False, root_ancestor_token: str = 'none') DataFrame

Pare record to bypass organisms with one ancestor and one descendant.

May leave a root unifurcation present. See alifestd_delete_unifurcating_roots_asexual.

The option root_ancestor_token will be sandwiched in brackets to create the ancestor list entry for genesis organisms. For example, the token “None” will yield the entry “[None]” and the token “” will yield the entry “[]”. Default “none”.
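
The bracketing rule amounts to simple string formatting, sketched below; make_root_ancestor_list_entry is a hypothetical helper name.

```python
def make_root_ancestor_list_entry(root_ancestor_token: str) -> str:
    """Wrap the token in brackets to form a root's ancestor_list entry."""
    return f"[{root_ancestor_token}]"


entries = [
    make_root_ancestor_list_entry(token) for token in ("None", "none", "")
]
```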

Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.

See Also

alifestd_collapse_unifurcations_polars :

Polars-based implementation.

alifestd_collapse_unifurcations_polars(phylogeny_df: DataFrame) DataFrame

Pare record to bypass organisms with one ancestor and one descendant.

See Also

alifestd_collapse_unifurcations :

Pandas-based implementation.

alifestd_convert_root_ancestor_token(ancestor_list: Series, root_ancestor_token: str, mutate: bool = False) Series

Set root_ancestor_token for ancestor_list series.

The option root_ancestor_token will be sandwiched in brackets to create the ancestor list entry for genesis organisms. For example, the token “None” will yield the entry “[None]” and the token “” will yield the entry “[]”.

alifestd_count_children_of_asexual(phylogeny_df: DataFrame, parent: int, mutate: bool = False) int

How many taxa are direct descendants of the given parent?

Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.

alifestd_count_children_of_polars(phylogeny_df: DataFrame, parent: int) int

How many taxa are direct descendants of the given parent?

alifestd_count_inner_nodes(phylogeny_df: DataFrame, mutate: bool = False) int

Count how many non-leaf nodes are contained in phylogeny.

alifestd_count_inner_nodes_polars(phylogeny_df: DataFrame) int

Count how many non-leaf nodes are contained in phylogeny.

alifestd_count_leaf_nodes(phylogeny_df: DataFrame) int

How many leaf nodes are contained in phylogeny?

alifestd_count_leaf_nodes_polars(phylogeny_df: DataFrame) int

How many leaf nodes are contained in phylogeny?

alifestd_count_polytomies(phylogeny_df: DataFrame) int

Count how many inner nodes have more than two descendant nodes.

Only supports asexual phylogenies.

alifestd_count_polytomies_polars(phylogeny_df: DataFrame) int

Count how many inner nodes have more than two descendant nodes.

Only supports asexual phylogenies.

alifestd_count_root_nodes(phylogeny_df: DataFrame) int

How many root nodes are contained in phylogeny?

alifestd_count_root_nodes_polars(phylogeny_df: DataFrame) int

How many root nodes are contained in phylogeny?

alifestd_count_unifurcating_roots_asexual(phylogeny_df: DataFrame, mutate: bool = False) int

How many root nodes with one child are contained in phylogeny?

alifestd_count_unifurcating_roots_polars(phylogeny_df: DataFrame) int

How many root nodes with one child are contained in phylogeny?

alifestd_count_unifurcations(phylogeny_df: DataFrame) int

Count how many inner nodes have exactly one descendant node.

Only supports asexual phylogenies.

alifestd_count_unifurcations_polars(phylogeny_df: DataFrame) int

Count how many inner nodes have exactly one descendant node.

Only supports asexual phylogenies.
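The unifurcation and polytomy counts described above both reduce to tallying children per node. A minimal sketch follows, assuming a hypothetical `ancestor_of` dict (id mapped to ancestor id, roots mapped to themselves) in place of a dataframe. In this sketch every node with children is tallied, roots included; whether the library treats roots the same way is not stated in this reference.

```python
from collections import Counter


def count_children(ancestor_of):
    """Tally direct descendants per node; a root's self-link is ignored."""
    return Counter(anc for node, anc in ancestor_of.items() if anc != node)


def count_unifurcations(ancestor_of):
    """Count nodes with exactly one child."""
    return sum(1 for n in count_children(ancestor_of).values() if n == 1)


def count_polytomies(ancestor_of):
    """Count nodes with more than two children."""
    return sum(1 for n in count_children(ancestor_of).values() if n > 2)


# node 0 has one child, node 1 has three, node 2 has one
example = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1, 5: 2}
n_unifurcations = count_unifurcations(example)  # 2
n_polytomies = count_polytomies(example)  # 1
```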

alifestd_delete_trunk_asexual(phylogeny_df: DataFrame, mutate: bool = False) DataFrame

Delete entries masked by is_trunk column.

Masked entries must be contiguous, meaning that no non-trunk entry can be an ancestor of a trunk entry. Children of deleted entries will become roots.

Input dataframe is not mutated by this operation unless mutate is set True. If mutate is set True, the operation does not occur in place; still use the return value to get the transformed phylogeny dataframe.

See Also

alifestd_collapse_trunk_asexual
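To make the trunk-deletion rule concrete, here is a small sketch on plain dicts. The `ancestor_of` and `is_trunk` layouts are assumptions for illustration (roots map to themselves), and raising `ValueError` on a non-contiguous trunk is this sketch's choice, not documented library behavior.

```python
def delete_trunk(ancestor_of, is_trunk):
    """Drop trunk nodes; children of dropped nodes become roots."""
    for node, anc in ancestor_of.items():
        # contiguity: a trunk node may not descend from a non-trunk node
        if is_trunk[node] and not is_trunk[anc]:
            raise ValueError(f"trunk node {node} has non-trunk ancestor")
    return {
        # a surviving node whose ancestor was deleted points to itself
        node: node if is_trunk[anc] else anc
        for node, anc in ancestor_of.items()
        if not is_trunk[node]
    }


lineages = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2}
trunk_mask = {0: True, 1: True, 2: False, 3: False, 4: False}
remaining = delete_trunk(lineages, trunk_mask)  # {2: 2, 3: 3, 4: 2}
```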

alifestd_delete_trunk_asexual_polars(phylogeny_df: DataFrame) DataFrame

Delete entries masked by is_trunk column.

Masked entries must be contiguous, meaning that no non-trunk entry can be an ancestor of a trunk entry. Children of deleted entries will become roots.

See Also

alifestd_collapse_trunk_asexual

alifestd_delete_unifurcating_roots_asexual(phylogeny_df: DataFrame, mutate: bool = False, root_ancestor_token: str = 'none') DataFrame

Pare record to bypass root nodes with only one descendant.

Dataframe reindexing (e.g., df.index) may be applied.

See also alifestd_collapse_unifurcations.

The option root_ancestor_token will be sandwiched in brackets to create the ancestor list entry for genesis organisms. For example, the token “None” will yield the entry “[None]” and the token “” will yield the entry “[]”. Default “none”.

Input dataframe is not mutated by this operation unless mutate is set True. If mutate is set True, the operation does not occur in place; still use the return value to get the transformed phylogeny dataframe.
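The bracket convention for root_ancestor_token described above amounts to simple string formatting. The helper name below is hypothetical and only illustrates the rule.

```python
def make_root_ancestor_entry(root_ancestor_token):
    """Sandwich the token in brackets to form a genesis ancestor_list entry."""
    return f"[{root_ancestor_token}]"


entry_none = make_root_ancestor_entry("None")  # "[None]"
entry_empty = make_root_ancestor_entry("")  # "[]"
entry_default = make_root_ancestor_entry("none")  # "[none]"
```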

alifestd_delete_unifurcating_roots_polars(phylogeny_df: DataFrame) DataFrame

Pare record to bypass root nodes with only one descendant.

alifestd_downsample_tips_asexual(phylogeny_df: DataFrame, n_downsample: int, mutate: bool = False, seed: int | None = None, **kwargs) DataFrame

Create a subsample phylogeny containing n_downsample tips.

If n_downsample is greater than the number of tips in the phylogeny, the whole phylogeny is returned.

Only supports asexual phylogenies.

Deprecated since version 0.6.0: Use alifestd_downsample_tips_uniform_asexual instead.

alifestd_downsample_tips_canopy_asexual(phylogeny_df: DataFrame, n_downsample: int | None = None, mutate: bool = False, criterion: str = 'origin_time') DataFrame

Retain the n_downsample leaves with the largest criterion values and prune extinct lineages.

If n_downsample is None, it defaults to the number of leaves that share the maximum value of the criterion column. If n_downsample is greater than or equal to the number of leaves in the phylogeny, the whole phylogeny is returned. Ties are broken arbitrarily.

Only supports asexual phylogenies.

Parameters

phylogeny_df : pandas.DataFrame

The phylogeny as a dataframe in alife standard format.

Must represent an asexual phylogeny.

n_downsample : int, optional

Number of tips to retain. If None, defaults to the count of leaves with the maximum criterion value.

mutate : bool, default False

Are side effects on the input argument phylogeny_df allowed?

criterion : str, default “origin_time”

Column name used to rank leaves. The n_downsample leaves with the largest values in this column are retained. Ties are broken arbitrarily.

Raises

ValueError

If criterion is not a column in phylogeny_df.

Returns

pandas.DataFrame

The pruned phylogeny in alife standard format.
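The canopy rule (keep the leaves with the largest criterion values, then discard lineages that lead to no retained leaf) can be sketched on plain dicts. The `ancestor_of` and `criterion` layouts and the function name are assumptions for illustration, with roots mapped to themselves; ties are broken by sort order here.

```python
def downsample_tips_canopy(ancestor_of, criterion, n_downsample=None):
    """Keep the top-`n_downsample` leaves by `criterion`, plus their ancestors."""
    parents = {anc for node, anc in ancestor_of.items() if anc != node}
    leaves = [node for node in ancestor_of if node not in parents]
    if n_downsample is None:
        # default: number of leaves sharing the maximum criterion value
        top = max(criterion[leaf] for leaf in leaves)
        n_downsample = sum(criterion[leaf] == top for leaf in leaves)
    ranked = sorted(leaves, key=lambda leaf: criterion[leaf], reverse=True)
    kept = set()
    for node in ranked[:n_downsample]:
        while node not in kept:  # walk rootward until a visited node
            kept.add(node)
            node = ancestor_of[node]
    return {node: anc for node, anc in ancestor_of.items() if node in kept}


canopy_tree = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1}
origin_time = {0: 0, 1: 1, 2: 1, 3: 2, 4: 3}
top_one = downsample_tips_canopy(canopy_tree, origin_time)  # {0: 0, 1: 0, 4: 1}
top_two = downsample_tips_canopy(canopy_tree, origin_time, 2)
```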

alifestd_downsample_tips_canopy_polars(phylogeny_df: DataFrame, n_downsample: int | None = None, criterion: str | Expr = 'origin_time') DataFrame

Retain the n_downsample leaves with the largest criterion values and prune extinct lineages.

If n_downsample is None, it defaults to the number of leaves that share the maximum value of the criterion column. If n_downsample is greater than or equal to the number of leaves in the phylogeny, the whole phylogeny is returned. Ties are broken arbitrarily.

Only supports asexual phylogenies.

Parameters

phylogeny_df : polars.DataFrame

The phylogeny as a dataframe in alife standard format.

Must represent an asexual phylogeny.

n_downsample : int, optional

Number of tips to retain. If None, defaults to the count of leaves with the maximum criterion value.

criterion : str or polars.Expr, default “origin_time”

Column name or polars expression used to rank leaves. The n_downsample leaves with the largest values are retained. Ties are broken arbitrarily.

Raises

NotImplementedError

If phylogeny_df has no “ancestor_id” column.

ValueError

If criterion is not a column in phylogeny_df.

Returns

polars.DataFrame

The pruned phylogeny in alife standard format.

See Also

alifestd_downsample_tips_canopy_asexual :

Pandas-based implementation.

alifestd_downsample_tips_clade_asexual(phylogeny_df: DataFrame, n_downsample: int, mutate: bool = False, seed: int | None = None) DataFrame

Create a subsample phylogeny containing at most n_downsample tips, comprising a single clade within the original phylogeny. Candidate clades are sampled proportionally to their size.

If n_downsample is greater than the number of tips in the phylogeny, the whole phylogeny is returned.

Only supports asexual phylogenies.

alifestd_downsample_tips_clade_polars(phylogeny_df: DataFrame, n_downsample: int, seed: int | None = None) DataFrame

Create a subsample phylogeny containing at most n_downsample tips, comprising a single clade within the original phylogeny. Candidate clades are sampled proportionally to their size.

If n_downsample is greater than the number of tips in the phylogeny, the whole phylogeny is returned.

Only supports asexual phylogenies.

Parameters

phylogeny_df : polars.DataFrame

The phylogeny as a dataframe in alife standard format.

Must represent an asexual phylogeny.

n_downsample : int

Number of tips to retain.

seed : int, optional

Integer seed for deterministic behavior.

Raises

NotImplementedError

If phylogeny_df has no “ancestor_id” column or if ids are non-contiguous or not topologically sorted.

Returns

polars.DataFrame

The downsampled phylogeny in alife standard format.

See Also

alifestd_downsample_tips_clade_asexual :

Pandas-based implementation.

alifestd_downsample_tips_lineage_asexual(phylogeny_df: DataFrame, n_downsample: int, mutate: bool = False, seed: int | None = None, *, criterion_delta: str = 'origin_time', criterion_target: str = 'origin_time', progress_wrap: Callable = <function <lambda>>) DataFrame

Retain the n_downsample leaves closest to the lineage of a target leaf.

Selects a target leaf as the leaf with the largest criterion_target value (ties broken randomly). For each leaf, the most recent common ancestor (MRCA) with the target leaf is identified and the “off-lineage delta” is computed as the absolute difference between the leaf’s criterion_delta value and its MRCA’s criterion_delta value. The n_downsample leaves with the smallest off-lineage deltas are retained.

If n_downsample is greater than or equal to the number of leaves in the phylogeny, the whole phylogeny is returned. Ties in off-lineage delta are broken arbitrarily.

Only supports asexual phylogenies.

Parameters

phylogeny_df : pandas.DataFrame

The phylogeny as a dataframe in alife standard format.

Must represent an asexual phylogeny.

n_downsample : int

Number of tips to retain.

mutate : bool, default False

Are side effects on the input argument phylogeny_df allowed?

seed : int, optional

Random seed for reproducible target-leaf selection when there are ties in criterion_target.

criterion_delta : str, default “origin_time”

Column name used to compute the off-lineage delta for each leaf. The delta is the absolute difference between a leaf’s value and its MRCA’s value in this column.

criterion_target : str, default “origin_time”

Column name used to select the target leaf. The leaf with the largest value in this column is chosen as the target. Note that ties are broken by random sample, allowing a seed to be provided.

progress_wrap : Callable, optional

Pass tqdm or equivalent to display a progress bar.

Raises

ValueError

If criterion_delta or criterion_target is not a column in phylogeny_df.

Returns

pandas.DataFrame

The pruned phylogeny in alife standard format.
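The off-lineage delta described above can be worked through on a toy tree. This sketch uses plain dicts instead of a dataframe and assumes a single-rooted tree whose root maps to itself; the helper names are hypothetical.

```python
def lineage(ancestor_of, node):
    """Return `node` followed by its ancestors, ending at the root."""
    path = [node]
    while ancestor_of[node] != node:
        node = ancestor_of[node]
        path.append(node)
    return path


def off_lineage_deltas(ancestor_of, criterion, leaves, target):
    """Map each leaf to |criterion[leaf] - criterion[MRCA(leaf, target)]|."""
    on_target_lineage = set(lineage(ancestor_of, target))
    deltas = {}
    for leaf in leaves:
        # the first node on the leaf's lineage shared with the target is the MRCA
        mrca = next(
            n for n in lineage(ancestor_of, leaf) if n in on_target_lineage
        )
        deltas[leaf] = abs(criterion[leaf] - criterion[mrca])
    return deltas


toy_tree = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 2}
toy_time = {0: 0, 1: 1, 2: 1, 3: 2, 4: 5, 5: 3}
toy_leaves = [3, 4, 5]
target_leaf = max(toy_leaves, key=toy_time.get)  # leaf 4
toy_deltas = off_lineage_deltas(toy_tree, toy_time, toy_leaves, target_leaf)
# retain the two leaves closest to the target's lineage
retained = sorted(toy_deltas, key=toy_deltas.get)[:2]  # [4, 3]
```

The target leaf itself always has delta zero, since it is its own MRCA with the target.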

alifestd_downsample_tips_lineage_polars(phylogeny_df: DataFrame, n_downsample: int, seed: int | None = None, *, criterion_delta: str | Expr = 'origin_time', criterion_target: str | Expr = 'origin_time', progress_wrap: Callable = <function <lambda>>) DataFrame

Retain the n_downsample leaves closest to the lineage of a target leaf.

Selects a target leaf as the leaf with the largest criterion_target value (ties broken randomly). For each leaf, the most recent common ancestor (MRCA) with the target leaf is identified and the “off-lineage delta” is computed as the absolute difference between the leaf’s criterion_delta value and its MRCA’s criterion_delta value. The n_downsample leaves with the smallest off-lineage deltas are retained.

If n_downsample is greater than or equal to the number of leaves in the phylogeny, the whole phylogeny is returned. Ties in off-lineage delta are broken arbitrarily.

Only supports asexual phylogenies.

Parameters

phylogeny_df : polars.DataFrame or polars.LazyFrame

The phylogeny as a dataframe in alife standard format.

Must represent an asexual phylogeny.

n_downsample : int

Number of tips to retain.

seed : int, optional

Random seed for reproducible target-leaf selection when there are ties in criterion_target.

criterion_delta : str or polars.Expr, default “origin_time”

Column name or polars expression used to compute the off-lineage delta for each leaf. The delta is the absolute difference between a leaf’s value and its MRCA’s value.

criterion_target : str or polars.Expr, default “origin_time”

Column name or polars expression used to select the target leaf. The leaf with the largest value is chosen as the target. Note that ties are broken by random sample, allowing a seed to be provided.

progress_wrap : Callable, optional

Pass tqdm or equivalent to display a progress bar.

Raises

NotImplementedError

If phylogeny_df has no “ancestor_id” column or if ids are non-contiguous or not topologically sorted.

ValueError

If criterion_delta or criterion_target is not a column in phylogeny_df.

Returns

polars.DataFrame

The pruned phylogeny in alife standard format.

See Also

alifestd_downsample_tips_lineage_asexual :

Pandas-based implementation.

alifestd_downsample_tips_lineage_stratified_asexual(phylogeny_df: DataFrame, n_downsample: int | None = None, mutate: bool = False, seed: int | None = None, *, criterion_delta: str = 'origin_time', criterion_stratify: str = 'origin_time', criterion_target: str = 'origin_time', n_tips_per_stratum: int = 1, progress_wrap: Callable = <function <lambda>>) DataFrame

Retain leaves per stratified group, chosen by proximity to the lineage of a target leaf.

Selects a target leaf as the leaf with the largest criterion_target value (ties broken randomly). For each non-target leaf, the most recent common ancestor (MRCA) of that leaf and the target leaf is identified, and the “off-lineage delta” is computed as the absolute difference between that leaf’s criterion_delta value and the MRCA’s criterion_delta value.

Leaves are grouped by their criterion_stratify value. When n_downsample is an integer, stratified values are coarsened by ranking and integer-dividing to form exactly n_downsample // n_tips_per_stratum groups. When n_downsample is None, each distinct stratified value forms its own group (without ranking). Within each group, the n_tips_per_stratum leaves with the smallest off-lineage delta are retained.

Only supports asexual phylogenies.

Parameters

phylogeny_df : pandas.DataFrame

The phylogeny as a dataframe in alife standard format.

Must represent an asexual phylogeny.

n_downsample : int, optional

Desired number of retained tips. If None, every distinct criterion_stratify value forms its own group.

mutate : bool, default False

Are side effects on the input argument phylogeny_df allowed?

seed : int, optional

Random seed for reproducible target-leaf selection when there are ties in criterion_target.

criterion_delta : str, default “origin_time”

Column name used to compute the off-lineage delta for each leaf. The delta is the absolute difference between a leaf’s value and its MRCA’s value in this column.

criterion_stratify : str, default “origin_time”

Column name used to stratify leaves into groups.

criterion_target : str, default “origin_time”

Column name used to select the target leaf. The leaf with the largest value in this column is chosen as the target. Note that ties are broken by random sample, allowing a seed to be provided.

n_tips_per_stratum : int, default 1

Number of tips to retain per stratified group. Must evenly divide n_downsample when n_downsample is not None.

progress_wrap : Callable, optional

Pass tqdm or equivalent to display a progress bar.

Raises

ValueError

If criterion_delta, criterion_stratify, or criterion_target is not a column in phylogeny_df.

ValueError

If n_downsample is not None and n_tips_per_stratum does not evenly divide n_downsample.

Returns

pandas.DataFrame

The pruned phylogeny in alife standard format.
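One way to picture the "ranking and integer-dividing" step is sketched below. The exact ranking scheme the library uses is not spelled out here, so the ordinal ranking, the dict layouts, and the helper names are all assumptions of this example.

```python
def assign_strata(stratify_values, n_downsample, n_tips_per_stratum=1):
    """Coarsen per-leaf stratify values into equal-sized rank bins."""
    if n_downsample % n_tips_per_stratum:
        raise ValueError("n_tips_per_stratum must evenly divide n_downsample")
    n_groups = n_downsample // n_tips_per_stratum
    ranked = sorted(stratify_values, key=stratify_values.get)
    # ordinal rank, scaled down to a group index in [0, n_groups)
    return {
        leaf: rank * n_groups // len(ranked)
        for rank, leaf in enumerate(ranked)
    }


def pick_per_stratum(strata, deltas, n_tips_per_stratum=1):
    """Within each stratum, keep the leaves with the smallest delta."""
    groups = {}
    for leaf, stratum in strata.items():
        groups.setdefault(stratum, []).append(leaf)
    return sorted(
        leaf
        for members in groups.values()
        for leaf in sorted(members, key=deltas.get)[:n_tips_per_stratum]
    )


leaf_times = {10: 1.0, 11: 2.0, 12: 3.0, 13: 4.0}
leaf_deltas = {10: 5.0, 11: 1.0, 12: 0.0, 13: 2.0}
leaf_strata = assign_strata(leaf_times, n_downsample=2)
picked = pick_per_stratum(leaf_strata, leaf_deltas)  # [11, 12]
```

Here four leaves are binned into two strata by rank, and the leaf with the smallest off-lineage delta is taken from each.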

alifestd_downsample_tips_lineage_stratified_polars(phylogeny_df: DataFrame, n_downsample: int | None = None, seed: int | None = None, *, criterion_delta: str | Expr = 'origin_time', criterion_stratify: str | Expr = 'origin_time', criterion_target: str | Expr = 'origin_time', n_tips_per_stratum: int = 1, progress_wrap: Callable = <function <lambda>>) DataFrame

Retain leaves per stratified group, chosen by proximity to the lineage of a target leaf.

Selects a target leaf as the leaf with the largest criterion_target value (ties broken randomly). For each non-target leaf, the most recent common ancestor (MRCA) of that leaf and the target leaf is identified, and the “off-lineage delta” is computed as the absolute difference between that leaf’s criterion_delta value and the MRCA’s criterion_delta value.

Leaves are grouped by their criterion_stratify value. When n_downsample is an integer, stratified values are coarsened by ranking and integer-dividing to form exactly n_downsample // n_tips_per_stratum groups. When n_downsample is None, each distinct stratified value forms its own group (without ranking). Within each group, the n_tips_per_stratum leaves with the smallest off-lineage delta are retained.

Only supports asexual phylogenies.

Parameters

phylogeny_df : polars.DataFrame or polars.LazyFrame

The phylogeny as a dataframe in alife standard format.

Must represent an asexual phylogeny.

n_downsample : int, optional

Desired number of retained tips. If None, every distinct criterion_stratify value forms its own group.

seed : int, optional

Random seed for reproducible target-leaf selection when there are ties in criterion_target.

criterion_delta : str or polars.Expr, default “origin_time”

Column name or polars expression used to compute the off-lineage delta for each leaf. The delta is the absolute difference between a leaf’s value and its MRCA’s value.

criterion_stratify : str or polars.Expr, default “origin_time”

Column name or polars expression used to stratify leaves into groups.

criterion_target : str or polars.Expr, default “origin_time”

Column name or polars expression used to select the target leaf. The leaf with the largest value is chosen as the target. Note that ties are broken by random sample, allowing a seed to be provided.

n_tips_per_stratum : int, default 1

Number of tips to retain per stratified group. Must evenly divide n_downsample when n_downsample is not None.

progress_wrap : Callable, optional

Pass tqdm or equivalent to display a progress bar.

Raises

NotImplementedError

If phylogeny_df has no “ancestor_id” column or if ids are non-contiguous or not topologically sorted.

ValueError

If criterion_delta, criterion_stratify, or criterion_target is not a column in phylogeny_df.

ValueError

If n_downsample is not None and n_tips_per_stratum does not evenly divide n_downsample.

Returns

polars.DataFrame

The pruned phylogeny in alife standard format.

See Also

alifestd_downsample_tips_lineage_stratified_asexual :

Pandas-based implementation.

alifestd_downsample_tips_polars(phylogeny_df: DataFrame, n_downsample: int, seed: int | None = None, **kwargs) DataFrame

Create a subsample phylogeny containing n_downsample tips.

If n_downsample is greater than the number of tips in the phylogeny, the whole phylogeny is returned.

Only supports asexual phylogenies.

Parameters

phylogeny_df : polars.DataFrame

The phylogeny as a dataframe in alife standard format.

Must represent an asexual phylogeny.

n_downsample : int

Number of tips to retain.

seed : int, optional

Integer seed for deterministic behavior.

Raises

NotImplementedError

If phylogeny_df has no “ancestor_id” column.

Returns

polars.DataFrame

The downsampled phylogeny in alife standard format.

See Also

alifestd_downsample_tips_uniform_polars :

Preferred non-deprecated implementation.

alifestd_downsample_tips_asexual :

Pandas-based implementation.

Deprecated since version 0.6.0: Use alifestd_downsample_tips_uniform_polars instead.

alifestd_downsample_tips_uniform_asexual(phylogeny_df: DataFrame, n_downsample: int, mutate: bool = False, seed: int | None = None) DataFrame

Create a subsample phylogeny containing n_downsample tips.

If n_downsample is greater than the number of tips in the phylogeny, the whole phylogeny is returned.

Only supports asexual phylogenies.
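Uniform tip downsampling can be pictured as drawing leaves at random and keeping only their ancestries. The sketch below uses the standard-library `random` module and a hypothetical `ancestor_of` dict (roots mapped to themselves); the library's own sampling mechanics and seed handling may differ.

```python
import random


def downsample_tips_uniform(ancestor_of, n_downsample, seed=None):
    """Keep `n_downsample` uniformly sampled leaves and their ancestors."""
    parents = {anc for node, anc in ancestor_of.items() if anc != node}
    leaves = sorted(node for node in ancestor_of if node not in parents)
    if n_downsample >= len(leaves):
        return dict(ancestor_of)  # nothing to prune
    kept = set()
    for node in random.Random(seed).sample(leaves, n_downsample):
        while node not in kept:  # walk rootward until a visited node
            kept.add(node)
            node = ancestor_of[node]
    return {node: anc for node, anc in ancestor_of.items() if node in kept}


def count_leaves(ancestor_of):
    parents = {anc for node, anc in ancestor_of.items() if anc != node}
    return sum(node not in parents for node in ancestor_of)


full_tree = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1}
sampled = downsample_tips_uniform(full_tree, 2, seed=1)
```

Passing the same seed twice yields the same subsample, which is the determinism the seed parameter is meant to provide.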

alifestd_downsample_tips_uniform_polars(phylogeny_df: DataFrame, n_downsample: int, seed: int | None = None) DataFrame

Create a subsample phylogeny containing n_downsample tips.

If n_downsample is greater than the number of tips in the phylogeny, the whole phylogeny is returned.

Only supports asexual phylogenies.

Parameters

phylogeny_df : polars.DataFrame

The phylogeny as a dataframe in alife standard format.

Must represent an asexual phylogeny.

n_downsample : int

Number of tips to retain.

seed : int, optional

Integer seed for deterministic behavior.

Raises

NotImplementedError

If phylogeny_df has no “ancestor_id” column.

Returns

polars.DataFrame

The downsampled phylogeny in alife standard format.

See Also

alifestd_downsample_tips_uniform_asexual :

Pandas-based implementation.

alifestd_drop_topological_sensitivity(phylogeny_df: DataFrame, mutate: bool = False, *, insert: bool = True, delete: bool = True, update: bool = True) DataFrame

Drop columns from phylogeny_df that may be invalidated by topological operations such as collapsing unifurcations.

Parameters

phylogeny_df : pandas.DataFrame

The phylogeny as a dataframe in alife standard format.

mutate : bool, default False

Are side effects on the input argument allowed?

insert : bool, default True

Drop columns sensitive to node insertion.

delete : bool, default True

Drop columns sensitive to node deletion.

update : bool, default True

Drop columns sensitive to ancestor relationship updates.

Input dataframe is not mutated by this operation unless mutate is set True. If mutate is set True, the operation does not occur in place; still use the return value to get the transformed phylogeny dataframe.

See Also

alifestd_drop_topological_sensitivity_polars :

Polars-based implementation.

alifestd_drop_topological_sensitivity_polars(phylogeny_df: DataFrame, *, insert: bool = True, delete: bool = True, update: bool = True) DataFrame

Drop columns from phylogeny_df that may be invalidated by topological operations such as collapsing unifurcations.

Parameters

phylogeny_df : polars.DataFrame

The phylogeny as a dataframe in alife standard format.

insert : bool, default True

Drop columns sensitive to node insertion.

delete : bool, default True

Drop columns sensitive to node deletion.

update : bool, default True

Drop columns sensitive to ancestor relationship updates.

See Also

alifestd_drop_topological_sensitivity :

Pandas-based implementation.

alifestd_estimate_triplet_distance_asexual(first_df: DataFrame, second_df: DataFrame, taxon_label_key: str, confidence: float = 0.99, precision: float = 0.01, strict: bool | Tuple[bool, bool] = True, detail: bool = False, progress_wrap: Callable = <function <lambda>>, mutate: bool = False) float | Tuple[float, Tuple[float, float, int]]

Estimate the triplet distance between two asexual phylogenetic trees in alife standard format by sampling sets of three leaf taxa and counting the fraction whose phylogenetic connectivity mismatches between trees.

Parameters

first_df : pd.DataFrame

The DataFrame representing the first phylogenetic tree.

second_df : pd.DataFrame

The DataFrame representing the second phylogenetic tree.

taxon_label_key : str

The key in the DataFrame to identify the taxon labels.

confidence : float, default 0.99

The confidence level for the estimation.

See estimate_binomial_p for details.

precision : float, default 0.01

The precision of the estimation.

See estimate_binomial_p for details.

strict : bool or Tuple[bool, bool], default True

A flag or a tuple of flags indicating how to treat polytomies.

If False, triplets that form a polytomy in either tree are not counted as mismatching. If True, they are counted as mismatching. If a tuple is given, polytomies in the first and second trees are treated according to the first and second elements of the tuple, respectively.

detail : bool, default False

If True, returns a detailed result including the estimated distance, confidence interval, and sample size.

progress_wrap : typing.Callable, optional

Pass tqdm or equivalent to display a progress bar.

mutate : bool, default False

If True, allows mutation of input DataFrames.

Returns

float or Tuple[float, Tuple[float, float, int]]

The estimated distance between the two trees.

If detail is True, returns a tuple containing the estimated distance, the confidence interval, and the sample size.

Notes

The core comparison is done by sampling triplets of taxa, categorizing them, and comparing these categorizations across the two trees, taking into account the strict parameter for handling polytomies. See alifestd_categorize_triplet_asexual for details.

See Also

alifestd_categorize_triplet_asexual

alifestd_sample_triplet_comparisons_asexual
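The sampling idea behind the triplet distance can be sketched in a few lines of standard-library Python. Everything here is an illustrative assumption: trees are `ancestor_of` dicts with a single root that maps to itself, both trees share leaf ids, a fixed `n_samples` stands in for the confidence and precision controls, and a polytomy is simply treated as its own category rather than following the strict option.

```python
import random


def lineage(ancestor_of, node):
    """Return `node` followed by its ancestors, ending at the root."""
    path = [node]
    while ancestor_of[node] != node:
        node = ancestor_of[node]
        path.append(node)
    return path


def find_mrca(ancestor_of, first, second):
    seen = set(lineage(ancestor_of, first))
    return next(n for n in lineage(ancestor_of, second) if n in seen)


def find_outgroup(ancestor_of, triplet):
    """Return the position (0, 1, 2) of the outgroup taxon, or -1 if polytomy."""
    a, b, c = triplet
    ab = find_mrca(ancestor_of, a, b)
    ac = find_mrca(ancestor_of, a, c)
    bc = find_mrca(ancestor_of, b, c)
    if ab == ac == bc:
        return -1  # all three split at the same node
    if ac == bc:
        return 2  # a and b are closest relatives
    if ab == bc:
        return 1  # a and c are closest relatives
    return 0  # b and c are closest relatives


def estimate_triplet_distance(first, second, leaves, n_samples=100, seed=None):
    """Fraction of sampled triplets whose outgroup differs between trees."""
    rng = random.Random(seed)
    n_mismatch = 0
    for _ in range(n_samples):
        triplet = rng.sample(leaves, 3)
        n_mismatch += (
            find_outgroup(first, triplet) != find_outgroup(second, triplet)
        )
    return n_mismatch / n_samples


tree_a = {0: 0, 1: 0, 2: 1, 3: 1, 4: 0}  # leaves 2 and 3 are sisters
tree_b = {0: 0, 1: 0, 2: 1, 4: 1, 3: 0}  # leaves 2 and 4 are sisters
same = estimate_triplet_distance(tree_a, tree_a, [2, 3, 4], seed=0)  # 0.0
diff = estimate_triplet_distance(tree_a, tree_b, [2, 3, 4], seed=0)  # 1.0
```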

alifestd_find_chronological_inconsistency(phylogeny_df: DataFrame) int | None

Return the id of a taxon with origin time preceding its parent’s, if any are present.

alifestd_find_chronological_inconsistency_polars(phylogeny_df: DataFrame) int | None

Return the id of a taxon with origin time preceding its parent’s, if any are present.
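The chronological check amounts to comparing each taxon's origin time with its parent's. A minimal sketch on plain dicts follows, assuming a hypothetical `ancestor_of` mapping in which roots map to themselves.

```python
def find_chronological_inconsistency(ancestor_of, origin_time):
    """Return the id of a taxon that originates before its parent, else None."""
    for node, anc in ancestor_of.items():
        if origin_time[node] < origin_time[anc]:
            return node
    return None


family = {0: 0, 1: 0, 2: 1}
consistent = find_chronological_inconsistency(family, {0: 0, 1: 1, 2: 2})
inconsistent = find_chronological_inconsistency(family, {0: 0, 1: 5, 2: 3})
# consistent is None; inconsistent is 2 (origin time 3 precedes parent's 5)
```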

alifestd_find_leaf_ids(phylogeny_df: DataFrame) ndarray

What ids are not listed in any ancestor_list?

Input dataframe is not mutated by this operation.

alifestd_find_leaf_ids_polars(phylogeny_df: DataFrame) ndarray

What ids are ancestor to no other ids?

Parameters

phylogeny_df : polars.DataFrame or polars.LazyFrame

The phylogeny as a dataframe in alife standard format.

Must have contiguous ids and represent an asexual phylogeny.

Returns

numpy.ndarray

Array of leaf node ids.

See Also

alifestd_find_leaf_ids :

Pandas-based implementation.
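Finding leaves is a set difference between all ids and the ids that appear as someone's ancestor. The sketch below assumes a hypothetical `ancestor_of` dict in which roots map to themselves, and returns a list rather than a numpy array.

```python
def find_leaf_ids(ancestor_of):
    """Return ids that are ancestor to no other id, in input order."""
    # a root's self-link does not make it a parent
    parents = {anc for node, anc in ancestor_of.items() if anc != node}
    return [node for node in ancestor_of if node not in parents]


leaf_ids = find_leaf_ids({0: 0, 1: 0, 2: 0, 3: 1, 4: 1})  # [2, 3, 4]
lone_root = find_leaf_ids({7: 7})  # [7]
```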

alifestd_find_mrca_id_asexual(phylogeny_df: DataFrame, leaf_ids: Iterable[int], mutate: bool = False) int

Find most recent common ancestor of leaf_ids.

Input dataframe is not mutated by this operation unless mutate is set True. If mutate is set True, the operation does not occur in place; still use the return value to get the result.

alifestd_find_pair_distance_asexual(phylogeny_df: DataFrame, first: int, second: int, *, criterion: str = 'origin_time', mutate: bool = False) float | None

Find the pairwise distance between two taxa via their MRCA.

The distance is computed as the sum of criterion differences between each taxon and their Most Recent Common Ancestor (MRCA):

distance = (criterion[first] - criterion[mrca]) + (criterion[second] - criterion[mrca])

Parameters

phylogeny_df : pd.DataFrame

Phylogeny in alife standard format.

first : int

First taxon id.

second : int

Second taxon id.

criterion : str, default “origin_time”

Column name used to measure distance between taxa and their MRCA.

mutate : bool, default False

If True, allows in-place modification of phylogeny_df.

Returns

float or None

The pairwise distance between the two taxa, or None if they have no common ancestor.

See Also

alifestd_find_pair_mrca_id_asexual :

Finds the MRCA id used internally by this function.

alifestd_find_pair_distance_polars :

Polars-based implementation.
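The pairwise distance defined above is the sum of each taxon's criterion difference to the MRCA. It can be checked by hand with the following sketch, which swaps the dataframe for plain dicts. The `ancestor_of` layout (roots map to themselves) and helper names are assumptions for illustration.

```python
def lineage(ancestor_of, node):
    """Return `node` followed by its ancestors, ending at its root."""
    path = [node]
    while ancestor_of[node] != node:
        node = ancestor_of[node]
        path.append(node)
    return path


def find_pair_mrca(ancestor_of, first, second):
    """Return the most recent common ancestor, or None if there is none."""
    seen = set(lineage(ancestor_of, first))
    return next((n for n in lineage(ancestor_of, second) if n in seen), None)


def find_pair_distance(ancestor_of, criterion, first, second):
    mrca = find_pair_mrca(ancestor_of, first, second)
    if mrca is None:
        return None  # taxa sit in separate trees
    return (criterion[first] - criterion[mrca]) + (
        criterion[second] - criterion[mrca]
    )


forest = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 5}  # node 5 is a second root
times = {0: 0, 1: 1, 2: 2, 3: 3, 4: 6, 5: 0}
d_sisters = find_pair_distance(forest, times, 3, 4)  # (3 - 1) + (6 - 1) = 7
d_cousins = find_pair_distance(forest, times, 3, 2)  # (3 - 0) + (2 - 0) = 5
d_unrelated = find_pair_distance(forest, times, 3, 5)  # None
```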

alifestd_find_pair_distance_polars(phylogeny_df: DataFrame, first: int, second: int, *, criterion: str | Expr = 'origin_time') float | None

Find the pairwise distance between two taxa via their MRCA.

The distance is computed as the sum of criterion differences between each taxon and their Most Recent Common Ancestor (MRCA):

distance = (criterion[first] - criterion[mrca]) + (criterion[second] - criterion[mrca])

Parameters

phylogeny_df : polars.DataFrame or polars.LazyFrame

The phylogeny as a dataframe in alife standard format.

Must represent an asexual phylogeny with contiguous ids and topologically sorted rows.

first : int

First taxon id.

second : int

Second taxon id.

criterion : str or polars.Expr, default “origin_time”

Column name or polars expression used to measure distance between taxa and their MRCA.

Returns

float or None

The pairwise distance between the two taxa, or None if they have no common ancestor.

See Also

alifestd_find_pair_mrca_id_polars :

Finds the MRCA id used internally by this function.

alifestd_find_pair_distance_asexual :

Pandas-based implementation.

alifestd_find_pair_mrca_id_asexual(phylogeny_df: DataFrame, first: int, second: int, *, mutate: bool = False, is_topologically_sorted: bool | None = None, has_contiguous_ids: bool | None = None) int | None

Find the Most Recent Common Ancestor of two taxa.

Parameters

phylogeny_df : pd.DataFrame

Phylogeny in alife standard format.

first : int

First taxon id.

second : int

Second taxon id.

mutate : bool, default False

If True, allows in-place modification of phylogeny_df.

is_topologically_sorted : bool, optional

If provided, skips the topological sort check. If None (default), the check is performed automatically.

has_contiguous_ids : bool, optional

If provided, skips the contiguous ids check. If None (default), the check is performed automatically.

Returns

int or None

The id of the most recent common ancestor, or None if no common ancestor exists.

alifestd_find_pair_mrca_id_polars(phylogeny_df: DataFrame, first: int, second: int, *, is_topologically_sorted: bool | None = None, has_contiguous_ids: bool | None = None) int | None

Find the Most Recent Common Ancestor of two taxa.

Parameters

phylogeny_df : polars.DataFrame or polars.LazyFrame

The phylogeny as a dataframe in alife standard format.

Must represent an asexual phylogeny with contiguous ids and topologically sorted rows.

first : int

First taxon id.

second : int

Second taxon id.

is_topologically_sorted : bool, optional

If provided, skips the topological sort check. If None (default), the check is performed automatically.

has_contiguous_ids : bool, optional

If provided, skips the contiguous ids check. If None (default), the check is performed automatically.

Returns

int or None

The id of the most recent common ancestor, or None if no common ancestor exists.

See Also

alifestd_find_pair_mrca_id_asexual :

Pandas-based implementation.
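Contiguous ids and topologically sorted rows together imply that every ancestor has an id no larger than its descendants, which allows a simple MRCA search without building full lineages. The sketch below shows one such approach on a plain list; it is an assumption-laden illustration (roots are their own ancestor), not a description of how the library does it.

```python
def find_pair_mrca_sorted(ancestor_ids, first, second):
    """Find the MRCA when ancestor_ids[i] <= i holds for every id i.

    `ancestor_ids[i]` is the ancestor of id `i`; roots are their own ancestor.
    """
    while first != second:
        # the larger id cannot be an ancestor of the smaller, so step it up
        if first < second:
            first, second = second, first
        if ancestor_ids[first] == first:
            return None  # hit a root without meeting: no common ancestor
        first = ancestor_ids[first]
    return first


sorted_forest = [0, 0, 0, 1, 1, 5]  # ids 0..5; id 5 is a second root
mrca_sisters = find_pair_mrca_sorted(sorted_forest, 3, 4)  # 1
mrca_cousins = find_pair_mrca_sorted(sorted_forest, 3, 2)  # 0
mrca_missing = find_pair_mrca_sorted(sorted_forest, 3, 5)  # None
```

Each loop iteration strictly decreases the larger of the two ids, so the search terminates after at most as many steps as there are rows.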

alifestd_find_root_ids(phylogeny_df: DataFrame) ndarray

What ids have an empty ancestor_list?

Input dataframe is not mutated by this operation.

alifestd_find_root_ids_polars(phylogeny_df: DataFrame) ndarray

What ids have an empty ancestor_list?
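To illustrate what an "empty" ancestor_list looks like in practice, the sketch below treats entries such as “[]”, “[none]”, and “[None]” as empty. The parsing helper and the dict layout (id mapped to ancestor_list string) are assumptions for this example, not the library's parser.

```python
def parse_ancestor_list(entry):
    """Return the integer ancestor ids inside an entry like "[3]" or "[none]"."""
    inner = entry.strip()[1:-1]  # drop the surrounding brackets
    return [int(tok) for tok in inner.split(",") if tok.strip().isdigit()]


def find_root_ids(ancestor_lists):
    """Return ids whose ancestor_list entry names no ancestor."""
    return [
        id_
        for id_, entry in ancestor_lists.items()
        if not parse_ancestor_list(entry)
    ]


root_ids = find_root_ids({0: "[none]", 1: "[0]", 2: "[0]", 3: "[]"})  # [0, 3]
```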

alifestd_from_avida_spop(spop_text: str, *, create_ancestor_list: bool = True, dtype_id: type | None = <class 'numpy.int64'>) DataFrame

Convert Avida .spop population snapshot text to a phylogeny dataframe.

Parses the text content of an Avida .spop (structured population) file and returns a pandas DataFrame in alife standard format.

Parameters

spop_text : str

Full text content of an Avida .spop file.

create_ancestor_list : bool, default True

If True, include an ancestor_list column in the result.

dtype_id : type or None, default np.int64

Numpy dtype for the id column. If None, the smallest signed integer dtype is chosen automatically based on the number of rows in the data.

Returns

pd.DataFrame

Phylogeny dataframe in alife standard format.

See Also

alifestd_from_avida_spop_polars :

Polars-based implementation.

Raises

ValueError

If the #format header is missing from the spop text.

alifestd_from_avida_spop_polars(spop_text: str, *, create_ancestor_list: bool = True, dtype_id: DataType | None = Int64) DataFrame

Convert Avida .spop population snapshot text to a phylogeny dataframe.

Parses the text content of an Avida .spop (structured population) file and returns a polars DataFrame in alife standard format.

Parameters

spop_text : str

Full text content of an Avida .spop file.

create_ancestor_list : bool, default True

If True, include an ancestor_list column in the result.

dtype_id : pl.DataType or None, default pl.Int64

Polars dtype for the id column. If None, the smallest signed integer dtype is chosen automatically based on the number of rows in the data.

Returns

pl.DataFrame

Phylogeny dataframe in alife standard format.

See Also

alifestd_from_avida_spop :

Pandas-based implementation.

Raises

ValueError

If the #format header is missing from the spop text.

alifestd_from_newick(newick: str, *, branch_length_dtype: type = <class 'float'>, create_ancestor_list: bool = False, dtype_id: type | None = <class 'numpy.int64'>) DataFrame

Convert a Newick format string to a phylogeny dataframe.

Parses a Newick tree string and returns a pandas DataFrame in alife standard format with columns: id, ancestor_id, taxon_label, origin_time_delta, and branch_length. Optionally includes ancestor_list.

Parameters

newick : str

A phylogeny in Newick format.

branch_length_dtype : type, default float

Dtype for branch length values. Use int to get nullable integer columns (pd.Int64Dtype). Missing branch lengths will be pd.NA for integer dtypes or NaN for float dtypes.

create_ancestor_list : bool, default False

If True, include an ancestor_list column in the result.

dtype_id : type or None, default np.int64

Numpy dtype for the id and ancestor_id columns. If None, the smallest signed integer dtype is chosen automatically based on the number of commas in the Newick string.

Returns

pd.DataFrame

Phylogeny dataframe in alife standard format.

See Also

alifestd_from_newick_polars :

Polars-based implementation.

alifestd_as_newick_asexual :

Inverse conversion, from alife standard to Newick format.

alifestd_from_newick_polars(newick: str, *, branch_length_dtype: type = <class 'float'>, create_ancestor_list: bool = False, dtype_id: DataType | None = Int64) DataFrame

Convert a Newick format string to a phylogeny dataframe.

Parses a Newick tree string and returns a polars DataFrame in alife standard format with columns: id, ancestor_id, taxon_label, origin_time_delta, and branch_length. Optionally includes ancestor_list.

Parameters

newick : str

A phylogeny in Newick format.

branch_length_dtype : type, default float

Dtype for branch length values. Use int to get nullable integer columns (pl.Int64). Missing branch lengths will be null for integer dtypes or NaN for float dtypes.

create_ancestor_list : bool, default False

If True, include an ancestor_list column in the result.

dtype_id : pl.DataType or None, default pl.Int64

Polars dtype for the id and ancestor_id columns. If None, the smallest signed integer dtype is chosen automatically based on the number of commas in the Newick string.

Returns

pl.DataFrame

Phylogeny dataframe in alife standard format.

See Also

alifestd_from_newick :

Pandas-based implementation.

alifestd_as_newick_asexual :

Inverse conversion, from alife standard to Newick format.

alifestd_has_compact_ids(phylogeny_df: DataFrame) bool

Are id values between 0 and len(phylogeny_df), in any order?

Input dataframe is not mutated by this operation.

alifestd_has_compact_ids_polars(phylogeny_df: DataFrame) bool

Are id values between 0 and len(phylogeny_df), in any order?

alifestd_has_contiguous_ids(phylogeny_df: DataFrame) bool

Do organisms’ ids correspond to their row number?

Input dataframe is not mutated by this operation.

alifestd_has_contiguous_ids_polars(phylogeny_df: DataFrame) bool

Do organisms’ ids correspond to their row number?

alifestd_has_increasing_ids(phylogeny_df: DataFrame) bool

Do offspring have larger id values than ancestors?

Input dataframe is not mutated by this operation.

alifestd_has_increasing_ids_polars(phylogeny_df: DataFrame) bool

Do offspring have larger id values than ancestors?

Requires ancestor_id column.

alifestd_has_multiple_roots(phylogeny_df: DataFrame) bool

Does the phylogeny have two or more root organisms?

Input dataframe is not mutated by this operation.

alifestd_has_multiple_roots_polars(phylogeny_df: DataFrame) bool

Does the phylogeny have two or more root organisms?

alifestd_is_asexual(phylogeny_df: DataFrame) bool

Do all organisms in the phylogeny have one or no immediate ancestor?

Input dataframe is not mutated by this operation.

alifestd_is_asexual_polars(phylogeny_df: DataFrame) bool

Do all organisms in the phylogeny have one or no immediate ancestor?

alifestd_is_chronologically_ordered(phylogeny_df: DataFrame, diagnose: bool = True) bool

Do any organisms have origin_time values preceding those of members of their ancestor_list?

Input dataframe is not mutated by this operation.

alifestd_is_chronologically_ordered_polars(phylogeny_df: DataFrame) bool

Check if all taxa have origin times at or after their ancestor’s origin time.

alifestd_is_chronologically_sorted(phylogeny_df: DataFrame, how: str = 'origin_time') bool

Do rows appear in chronological order?

Defaults to origin_time. Input dataframe is not mutated by this operation.

alifestd_is_chronologically_sorted_polars(phylogeny_df: DataFrame, how: str = 'origin_time') bool

Do rows appear in chronological order?

Defaults to origin_time.

alifestd_is_sexual(phylogeny_df: DataFrame) bool

Do any organisms in the phylogeny have more than one immediate ancestor?

Input dataframe is not mutated by this operation.

alifestd_is_sexual_polars(phylogeny_df: DataFrame) bool

Do any organisms in the phylogeny have more than one immediate ancestor?

alifestd_is_strictly_bifurcating_asexual(phylogeny_df: DataFrame, mutate: bool = False) bool

Are all internal nodes strictly bifurcating (exactly 2 children)?

Input dataframe is not mutated by this operation.

alifestd_is_strictly_bifurcating_polars(phylogeny_df: DataFrame) bool

Are all internal nodes strictly bifurcating (exactly 2 children)?

alifestd_is_topologically_sorted(phylogeny_df: DataFrame) bool

Are all organisms listed after members of their ancestor_list?

Input dataframe is not mutated by this operation.

alifestd_is_topologically_sorted_polars(phylogeny_df: DataFrame) bool

Are all organisms listed after members of their ancestor_list?

alifestd_is_ultrametric(phylogeny_df: DataFrame, mutate: bool = False, *, atol: float = 0.0) bool

Do all tips share the same origin_time (within atol)?

Tests the peak-to-peak (ptp) range of origin_time among tips against atol. Returns True for empty phylogenies. Raises ValueError if any tip’s origin_time is null/NaN.

Input dataframe is not mutated by this operation unless mutate set True.
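The peak-to-peak test described above can be sketched in plain Python. This is an illustration only, not the library's implementation: the name `is_ultrametric_sketch`, the list-based inputs, and the choice of `<=` for the tolerance comparison are all assumptions.

```python
import math


def is_ultrametric_sketch(origin_times, is_leaf, atol=0.0):
    """Hypothetical helper: do all tips share an origin time within atol?"""
    tip_times = [t for t, leaf in zip(origin_times, is_leaf) if leaf]
    if not tip_times:
        return True  # an empty phylogeny counts as ultrametric
    if any(t is None or math.isnan(t) for t in tip_times):
        raise ValueError("tip origin_time is null/NaN")
    # peak-to-peak (ptp) range: latest tip minus earliest tip
    return max(tip_times) - min(tip_times) <= atol


print(is_ultrametric_sketch([0.0, 2.0, 2.0], [False, True, True]))
print(is_ultrametric_sketch([0.0, 1.0, 2.0], [False, True, True], atol=0.5))
```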

alifestd_is_ultrametric_polars(phylogeny_df: DataFrame, *, atol: float = 0.0) bool

Do all tips share the same origin_time (within atol)?

Tests the peak-to-peak (ptp) range of origin_time among tips against atol. Returns True for empty phylogenies. Raises ValueError if any tip’s origin_time is null/NaN. Must represent an asexual phylogeny (when is_leaf is not already present).

alifestd_is_working_format_asexual(phylogeny_df, mutate: bool = False) DataFrame

Test if phylogeny_df is an asexual phylogeny in working format.

The working format is a dataframe with the following properties:
  • topologically sorted (i.e., organisms appear after all ancestors),

  • contiguous ids (i.e., organisms’ ids correspond to row number), and

  • contains an integer datatype ancestor_id column.

alifestd_is_working_format_polars(phylogeny_df: DataFrame) bool

Test if phylogeny_df is an asexual phylogeny in working format.

The working format is a dataframe with the following properties:
  • contains an integer datatype ancestor_id column,

  • topologically sorted (organisms appear after all ancestors), and

  • contiguous ids (organisms’ ids correspond to row number).
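The three working-format properties listed above can be checked on plain lists as a rough sketch. The helper names are hypothetical, and the convention that a root's ancestor_id equals its own id is taken from the alifestd_make_ancestor_id_col description; the library functions themselves operate on dataframes.

```python
def has_contiguous_ids_sketch(ids):
    # contiguous ids: each organism's id equals its row number
    return all(id_ == row for row, id_ in enumerate(ids))


def is_topologically_sorted_sketch(ids, ancestor_ids):
    # every non-root organism must appear after its ancestor
    seen = set()
    for id_, ancestor_id in zip(ids, ancestor_ids):
        if ancestor_id != id_ and ancestor_id not in seen:
            return False
        seen.add(id_)
    return True


def is_working_format_sketch(ids, ancestor_ids):
    return (
        all(isinstance(a, int) for a in ancestor_ids)  # integer ancestor_id
        and has_contiguous_ids_sketch(ids)
        and is_topologically_sorted_sketch(ids, ancestor_ids)
    )


print(is_working_format_sketch([0, 1, 2], [0, 0, 1]))
```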

alifestd_join_roots(phylogeny_df: DataFrame, mutate: bool = False) DataFrame

Point all other roots to oldest root, measured by lowest origin_time (if available) or otherwise lowest id.

Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.

alifestd_join_roots_polars(phylogeny_df: DataFrame) DataFrame

Point all other roots to oldest root, measured by lowest origin_time (if available) or otherwise lowest id.

alifestd_ladderize_asexual(phylogeny_df: DataFrame, reverse: bool = False, mutate: bool = False) DataFrame

Reorder rows so children are sorted by number of descendant leaves, gathering children into contiguous rows.

By default, subtrees with fewer leaves come first (ascending). Set reverse=True to sort descending (more leaves first).

A topological sort will be applied if phylogeny_df is not topologically sorted. Dataframe reindexing (e.g., df.index) may be applied.

Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.

Note: after ladderizing, ids will no longer be contiguous with respect to row indices. Call alifestd_assign_contiguous_ids on the result to reassign contiguous ids if needed.

alifestd_ladderize_polars(phylogeny_df: DataFrame, reverse: bool = False) DataFrame

Reorder rows so children are sorted by number of descendant leaves, gathering children into contiguous rows.

By default, subtrees with fewer leaves come first (ascending). Set reverse=True to sort descending (more leaves first).

Note: after ladderizing, ids will no longer be contiguous with respect to row indices. Call alifestd_assign_contiguous_ids_polars on the result to reassign contiguous ids if needed.

Parameters

phylogeny_df : polars.DataFrame

The phylogeny as a dataframe in alife standard format.

Must represent an asexual phylogeny with contiguous ids and topologically sorted rows.

reverse : bool, default False

If True, sort descending (more leaves first).

Returns

polars.DataFrame

The phylogeny with rows reordered in ladderized order.

Raises

NotImplementedError

If ids are not contiguous or rows are not topologically sorted.

See Also

alifestd_ladderize_asexual :

Pandas-based implementation.

alifestd_make_ancestor_id_col(ids: Series, ancestor_lists: Series) Series

Translate ancestor ids from a column of singleton ancestor_list entries into a pure-integer series representation.

Each organism must have one or zero ancestors (i.e., asexual data). In the returned series, ancestor id will be assigned to own id for no-ancestor organisms.

alifestd_make_ancestor_id_col_polars(ids: Series, ancestor_lists: Series) Series

Translate ancestor ids from a column of singleton ancestor_list entries into a pure-integer series representation.

Each organism must have one or zero ancestors (i.e., asexual data). In the returned series, ancestor id will be assigned to own id for no-ancestor organisms.

alifestd_make_ancestor_list_col(ids: Series_T, ancestor_ids: Series_T, root_ancestor_token: str = 'none') Series_T

Translate a column of integer ancestor id values into alife standard ancestor_list representation.

The option root_ancestor_token will be sandwiched in brackets to create the ancestor list entry for genesis organisms. For example, the token “None” will yield the entry “[None]” and the token “” will yield the entry “[]”. Default “none”.

This function also accepts a polars.DataFrame, for which there is a separate delegated implementation.
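The bracket "sandwiching" of root_ancestor_token can be illustrated with a small list-based sketch. The function name is hypothetical, and two details are assumptions rather than confirmed library behavior: root organisms are recognized by ancestor id equal to own id, and non-root entries are rendered as a bracketed integer such as "[3]".

```python
def make_ancestor_list_col_sketch(ids, ancestor_ids, root_ancestor_token="none"):
    """Hypothetical helper: integer ancestor ids -> ancestor_list strings."""
    return [
        # roots get the bracketed token, others their bracketed ancestor id
        f"[{root_ancestor_token}]" if ancestor_id == id_ else f"[{ancestor_id}]"
        for id_, ancestor_id in zip(ids, ancestor_ids)
    ]


print(make_ancestor_list_col_sketch([0, 1, 2], [0, 0, 1]))
print(make_ancestor_list_col_sketch([0, 1], [0, 0], root_ancestor_token=""))
```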

alifestd_make_ancestor_list_col_polars(ids: Series, ancestor_ids: Series, root_ancestor_token: str = 'none') Series

Translate a column of integer ancestor id values into alife standard ancestor_list representation.

The option root_ancestor_token will be sandwiched in brackets to create the ancestor list entry for genesis organisms. For example, the token “None” will yield the entry “[None]” and the token “” will yield the entry “[]”. Default “none”.

alifestd_make_balanced_bifurcating(depth: int) DataFrame

Build a perfectly balanced bifurcating tree of given depth.

Parameters

depth : int

Depth of the tree, where depth=1 is a single root node.

  • depth=0 -> empty tree (no nodes)

  • depth=1 -> 1 node (root only)

  • depth=2 -> 3 nodes (root + 2 leaves)

  • depth=3 -> 7 nodes (4 leaves)

  • depth=4 -> 15 nodes (8 leaves)

Returns

pd.DataFrame

Alife-standard phylogeny dataframe with ‘id’ and ‘ancestor_list’ columns.

Raises

ValueError

If depth is negative.
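The node counts listed for each depth follow from a tree of depth d having 2^d - 1 nodes. The sketch below reproduces them with a heap-style id layout; the layout, the function names, and the plain-list return value are assumptions made for illustration and may differ from what the library emits.

```python
def make_balanced_bifurcating_sketch(depth):
    """Hypothetical helper: ancestor ids of a perfectly balanced binary tree."""
    if depth < 0:
        raise ValueError("depth must be non-negative")
    n_nodes = 2**depth - 1
    # heap layout: the root (id 0) is its own ancestor,
    # and node i > 0 has ancestor (i - 1) // 2
    return [max(i - 1, 0) // 2 for i in range(n_nodes)]


def count_leaves_sketch(ancestor_ids):
    # a node is internal if some other node names it as ancestor
    internal = {a for i, a in enumerate(ancestor_ids) if a != i}
    return len(ancestor_ids) - len(internal)


for depth in range(5):
    tree = make_balanced_bifurcating_sketch(depth)
    print(depth, len(tree), count_leaves_sketch(tree))
```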

alifestd_make_balanced_bifurcating_polars(depth: int) DataFrame

Build a perfectly balanced bifurcating tree of given depth.

Parameters

depth : int

Depth of the tree, where depth=1 is a single root node.

Returns

pl.DataFrame

Phylogeny dataframe with ‘id’ and ‘ancestor_id’ columns.

alifestd_make_comb(n_leaves: int) DataFrame

Build a comb/caterpillar tree with n_leaves leaves.

Structure (e.g., n_leaves=4):

  0
 / \
1   2
   / \
  3   4
     / \
    5   6

Internal nodes: 0, 2, 4, … Leaves: 1, 3, 5, …

Parameters

n_leaves : int

Number of leaf nodes in the resulting tree.

Returns

pd.DataFrame

Alife-standard phylogeny dataframe with ‘id’ and ‘ancestor_list’ columns.

Raises

ValueError

If n_leaves is negative.
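The id pattern in the diagram (even ids along the spine, odd ids as leaves, with the final even id also a leaf) can be reproduced with a short sketch. The function name and list-based output are hypothetical, and the handling of n_leaves equal to 0 or 1 is an assumption.

```python
def make_comb_sketch(n_leaves):
    """Hypothetical helper: ancestor ids of a comb/caterpillar tree."""
    if n_leaves < 0:
        raise ValueError("n_leaves must be non-negative")
    n_nodes = max(2 * n_leaves - 1, 0)
    ancestor_ids = []
    for i in range(n_nodes):
        if i == 0:
            ancestor_ids.append(0)  # root is its own ancestor
        elif i % 2:
            ancestor_ids.append(i - 1)  # odd id: leaf under the spine node before it
        else:
            ancestor_ids.append(i - 2)  # even id: next node along the spine
    return ancestor_ids


print(make_comb_sketch(4))  # [0, 0, 0, 2, 2, 4, 4]
```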

alifestd_make_comb_polars(n_leaves: int) DataFrame

Build a comb/caterpillar tree with n_leaves leaves.

Parameters

n_leaves : int

Number of leaf nodes in the resulting tree.

Returns

pl.DataFrame

Phylogeny dataframe with ‘id’ and ‘ancestor_id’ columns.

alifestd_make_edge_split(n_leaves: int, seed: int | None = None) DataFrame

Build a random bifurcating tree via edge-split (PDA) sampling.

At each step, a uniformly chosen existing edge is split by inserting a new internal node, with a new leaf attached as its sibling. This produces samples from the Proportional-to-Distinguishable-Arrangements (PDA) distribution over rooted bifurcating tree shapes.

Ids are contiguous but not topologically sorted; inserted internal nodes may have ids greater than some of their descendants. Pass the result through alifestd_topological_sort if topological id order is needed.

Parameters

n_leaves : int

Number of leaf nodes in the resulting tree.

seed : int, optional

Integer seed for deterministic behavior.

Returns

pd.DataFrame

Alife-standard phylogeny dataframe with ‘id’ and ‘ancestor_list’ columns.

Raises

ValueError

If n_leaves is negative.
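A rough sketch of the edge-split procedure follows, using only the standard library. It is not the library's implementation: the function name is hypothetical, and it assumes the root carries a "root edge" that can also be split, so each node stands for its own incoming edge. It also shows why inserted internal nodes can end up with ids greater than their descendants.

```python
import random


def make_edge_split_sketch(n_leaves, seed=None):
    """Hypothetical helper: ancestor ids of a random edge-split tree."""
    if n_leaves < 0:
        raise ValueError("n_leaves must be non-negative")
    if n_leaves == 0:
        return []
    rng = random.Random(seed)
    ancestor_ids = [0]  # start from a single root leaf
    for _ in range(n_leaves - 1):
        # pick a node uniformly; its incoming edge is the one split
        target = rng.randrange(len(ancestor_ids))
        inner = len(ancestor_ids)  # id of the new internal node
        parent = ancestor_ids[target]
        # the new internal node takes target's place (or becomes the root)
        ancestor_ids.append(inner if parent == target else parent)
        ancestor_ids.append(inner)  # new leaf, sibling of target
        ancestor_ids[target] = inner
    return ancestor_ids


print(make_edge_split_sketch(4, seed=1))
```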

alifestd_make_edge_split_polars(n_leaves: int, seed: int | None = None) DataFrame

Build a random bifurcating tree via edge-split (PDA) sampling.

At each step, a uniformly chosen existing edge is split by inserting a new internal node, with a new leaf attached as its sibling. This produces samples from the Proportional-to-Distinguishable-Arrangements (PDA) distribution over rooted bifurcating tree shapes.

Ids are contiguous but not topologically sorted; inserted internal nodes may have ids greater than some of their descendants. Pass the result through alifestd_topological_sort_polars if topological id order is needed.

Parameters

n_leaves : int

Number of leaf nodes in the resulting tree.

seed : int, optional

Integer seed for deterministic behavior.

Returns

pl.DataFrame

Phylogeny dataframe with ‘id’ and ‘ancestor_id’ columns.

alifestd_make_empty(ancestor_id: bool = False) DataFrame

Create an alife standard phylogeny dataframe with zero rows.

alifestd_make_empty_polars(ancestor_id: bool = True) DataFrame

Create an alife standard phylogeny dataframe with zero rows.

alifestd_make_leaf_split(n_leaves: int, seed: int | None = None) DataFrame

Build a random bifurcating tree via leaf-split (Yule) sampling.

At each step, a uniformly chosen leaf is replaced by an internal node with two new leaf children. This produces samples from the Yule (pure-birth) distribution over rooted bifurcating tree shapes.

Parameters

n_leaves : int

Number of leaf nodes in the resulting tree.

seed : int, optional

Integer seed for deterministic behavior.

Returns

pd.DataFrame

Alife-standard phylogeny dataframe with ‘id’ and ‘ancestor_list’ columns.

Raises

ValueError

If n_leaves is negative.
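The leaf-split step can be sketched in a few lines of standard-library Python. The function name, the list-based output, and the id assignment order are assumptions for illustration; only the sampling rule (split a uniformly chosen leaf into two new leaves) comes from the description above.

```python
import random


def make_leaf_split_sketch(n_leaves, seed=None):
    """Hypothetical helper: ancestor ids of a random leaf-split (Yule) tree."""
    if n_leaves < 0:
        raise ValueError("n_leaves must be non-negative")
    if n_leaves == 0:
        return []
    rng = random.Random(seed)
    ancestor_ids = [0]  # start from a single root leaf
    leaves = [0]
    while len(leaves) < n_leaves:
        pos = rng.randrange(len(leaves))  # uniformly chosen leaf
        parent = leaves[pos]
        first_child = len(ancestor_ids)
        ancestor_ids += [parent, parent]  # two new leaf children
        # the chosen leaf is now internal; its children replace it as leaves
        leaves[pos] = first_child
        leaves.append(first_child + 1)
    return ancestor_ids


print(make_leaf_split_sketch(4, seed=1))
```

In this sketch new children always receive ids larger than their parent, so the output is already topologically sorted.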

alifestd_make_leaf_split_polars(n_leaves: int, seed: int | None = None) DataFrame

Build a random bifurcating tree via leaf-split (Yule) sampling.

At each step, a uniformly chosen leaf is replaced by an internal node with two new leaf children. This produces samples from the Yule (pure-birth) distribution over rooted bifurcating tree shapes.

Parameters

n_leaves : int

Number of leaf nodes in the resulting tree.

seed : int, optional

Integer seed for deterministic behavior.

Returns

pl.DataFrame

Phylogeny dataframe with ‘id’ and ‘ancestor_id’ columns.

alifestd_make_star(n_leaves: int) DataFrame

Build a star tree with n_leaves leaves.

Structure (e.g., n_leaves=4):

   0
 / | \ \
1  2  3 4

The root (id 0) has every leaf as a direct child.

Parameters

n_leaves : int

Number of leaf nodes in the resulting tree.

Returns

pd.DataFrame

Alife-standard phylogeny dataframe with ‘id’ and ‘ancestor_list’ columns.

Raises

ValueError

If n_leaves is negative.

alifestd_make_star_polars(n_leaves: int) DataFrame

Build a star tree with n_leaves leaves.

Structure (e.g., n_leaves=4):

   0
 / | \ \
1  2  3 4

The root (id 0) has every leaf as a direct child.

Parameters

n_leaves : int

Number of leaf nodes in the resulting tree.

Returns

pl.DataFrame

Phylogeny dataframe with ‘id’ and ‘ancestor_id’ columns.

alifestd_mark_ancestor_origin_time_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'ancestor_origin_time') DataFrame

Add column ancestor_origin_time.

The output column name can be changed via the mark_as parameter.

Dataframe must provide column origin_time.

A topological sort will be applied if phylogeny_df is not topologically sorted. Dataframe reindexing (e.g., df.index) may be applied.

Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.

alifestd_mark_ancestor_origin_time_polars(phylogeny_df: DataFrame, *, mark_as: str = 'ancestor_origin_time') DataFrame

Add column ancestor_origin_time.

The output column name can be changed via the mark_as parameter.

Dataframe must provide column origin_time.

alifestd_mark_clade_duration_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'clade_duration') DataFrame

Add column clade_duration, containing the difference between each node’s origin_time and the maximum origin_time of its descendants.

The output column name can be changed via the mark_as parameter.

Leaf nodes will have duration 0.

Dataframe reindexing (e.g., df.index) may be applied.

Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
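As a sketch of the clade duration calculation, the snippet below works on plain lists and assumes working format (contiguous ids, topologically sorted rows, roots as their own ancestor). The function name is hypothetical, and the single reverse pass is one possible approach rather than the library's method.

```python
def mark_clade_duration_sketch(ancestor_ids, origin_times):
    """Hypothetical helper: latest descendant origin time minus own origin time."""
    latest = list(origin_times)
    # reverse pass: descendants sit at later rows, so each node's value is
    # final by the time it is pushed up to its ancestor
    for i in reversed(range(len(ancestor_ids))):
        ancestor = ancestor_ids[i]
        latest[ancestor] = max(latest[ancestor], latest[i])
    return [late - own for late, own in zip(latest, origin_times)]


# root 0 with children 1 and 2; node 1 has children 3 and 4
print(mark_clade_duration_sketch([0, 0, 0, 1, 1], [0, 1, 2, 5, 3]))
# [5, 4, 0, 0, 0]
```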

alifestd_mark_clade_duration_polars(phylogeny_df: DataFrame, *, mark_as: str = 'clade_duration') DataFrame

Add column clade_duration, containing the difference between each node’s origin_time and the maximum origin_time of its descendants.

The output column name can be changed via the mark_as parameter.

Leaf nodes will have duration 0.

alifestd_mark_clade_duration_ratio_sister_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'clade_duration_ratio_sister') DataFrame

Add column clade_duration_ratio_sister, containing the ratio of each clade’s duration to that of its sister.

The output column name can be changed via the mark_as parameter.

Root nodes will have ratio 1, unless also a leaf node. Leaf nodes and leaf-sisters may have ratio inf or NaN.

Tree must be strictly bifurcating.

Dataframe reindexing (e.g., df.index) may be applied.

Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.

alifestd_mark_clade_duration_ratio_sister_polars(phylogeny_df: DataFrame, *, mark_as: str = 'clade_duration_ratio_sister') DataFrame

Add column clade_duration_ratio_sister, containing the ratio of each clade’s duration to that of its sister.

The output column name can be changed via the mark_as parameter.

Tree must be strictly bifurcating.

alifestd_mark_clade_faithpd_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'clade_faithpd') DataFrame

Add column clade_faithpd, containing sum branch length among descendant nodes.

The output column name can be changed via the mark_as parameter.

Branch length is defined as the difference between the origin time of the node and the origin time of its ancestor.

A topological sort will be applied if phylogeny_df is not topologically sorted. Dataframe reindexing (e.g., df.index) may be applied.

Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.

alifestd_mark_clade_faithpd_polars(phylogeny_df: DataFrame, *, mark_as: str = 'clade_faithpd') DataFrame

Add column clade_faithpd, containing sum branch length among descendant nodes.

The output column name can be changed via the mark_as parameter.

alifestd_mark_clade_fblr_growth_children_asexual(phylogeny_df: ~pandas.core.frame.DataFrame, mutate: bool = False, *, mark_as: str = 'clade_fblr_growth_children', parallel_backend: str | None = None, progress_wrap: ~typing.Callable = <function <lambda>>, work_mask: ~numpy.ndarray | None = None) DataFrame

Add column clade_fblr_growth_children, containing the coefficient of a fast binary logistic regression (fblr) fit to origin times of the leaf descendants of each node.

The output column name can be changed via the mark_as parameter.

Nodes with left/right child clades with equal growth rates will have value approximately 0.0. If left child clade has greater growth rate, value will be negative. If right child clade has greater growth rate, value will be positive.

Pass “loky” to parallel_backend to use joblib with loky backend.

Leaf nodes will have value NaN. If provided, any nodes not included in work_mask will also have value NaN.

Tree must be strictly bifurcating and single-rooted.

Dataframe reindexing (e.g., df.index) may be applied.

Input phylogeny_df and work_mask are not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.

References

Bonetti Franceschi V and Volz E. Phylogenetic signatures reveal multilevel selection and fitness costs in SARS-CoV-2 [version 2; peer review: 2 approved, 1 approved with reservations]. Wellcome Open Res 2024, 9:85 (https://doi.org/10.12688/wellcomeopenres.20704.2)

Volz, E. Fitness, growth and transmissibility of SARS-CoV-2 genetic variants. Nat Rev Genet 24, 724-734 (2023). https://doi.org/10.1038/s41576-023-00610-z

Saran NA, Nar F. 2025. Fast binary logistic regression. PeerJ Computer Science 11:e2579 https://doi.org/10.7717/peerj-cs.2579

alifestd_mark_clade_fblr_growth_sister_asexual(phylogeny_df: ~pandas.core.frame.DataFrame, mutate: bool = False, *, mark_as: str = 'clade_fblr_growth_sister', parallel_backend: str | None = None, progress_wrap: ~typing.Callable = <function <lambda>>, work_mask: ~numpy.ndarray | None = None) DataFrame

Add column clade_fblr_growth_sister, containing the coefficient of a fast binary logistic regression (fblr) fit to origin times of this clade’s descendant leaves versus those of its sister clade.

The output column name can be changed via the mark_as parameter.

Clades with equal growth rate to their sister will have value approximately 0.0. Clades growing faster than their sister clade will have value greater than 0.0. Clades growing slower than their sister clade will have value less than 0.0.

Pass “loky” to parallel_backend to use joblib with loky backend.

Root nodes will have value NaN. If provided, any nodes not included in work_mask will also have value NaN.

Tree must be strictly bifurcating and single-rooted.

Dataframe reindexing (e.g., df.index) may be applied.

Input phylogeny_df and work_mask are not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.

References

Bonetti Franceschi V and Volz E. Phylogenetic signatures reveal multilevel selection and fitness costs in SARS-CoV-2 [version 2; peer review: 2 approved, 1 approved with reservations]. Wellcome Open Res 2024, 9:85 (https://doi.org/10.12688/wellcomeopenres.20704.2)

Volz, E. Fitness, growth and transmissibility of SARS-CoV-2 genetic variants. Nat Rev Genet 24, 724-734 (2023). https://doi.org/10.1038/s41576-023-00610-z

Saran NA, Nar F. 2025. Fast binary logistic regression. PeerJ Computer Science 11:e2579 https://doi.org/10.7717/peerj-cs.2579

alifestd_mark_clade_leafcount_ratio_sister_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'clade_leafcount_ratio_sister') DataFrame

Add column clade_leafcount_ratio_sister, containing the ratio of each clade’s leaf count to that of its sister.

The output column name can be changed via the mark_as parameter.

Root nodes will have ratio 1. Tree must be strictly bifurcating.

Dataframe reindexing (e.g., df.index) may be applied.

Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
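The sister leaf-count ratio can be sketched on plain lists as follows. The function name is hypothetical, and the sketch assumes working format plus a strictly bifurcating tree, so a node's sister leaf count is its parent's leaf count minus its own.

```python
def mark_leafcount_ratio_sister_sketch(ancestor_ids):
    """Hypothetical helper: own leaf count divided by sister's leaf count."""
    n = len(ancestor_ids)
    has_child = [False] * n
    for i, ancestor in enumerate(ancestor_ids):
        if ancestor != i:
            has_child[ancestor] = True
    leafcount = [0] * n
    for i in reversed(range(n)):  # accumulate leaf counts from tips upward
        if not has_child[i]:
            leafcount[i] = 1
        if ancestor_ids[i] != i:
            leafcount[ancestor_ids[i]] += leafcount[i]
    ratios = []
    for i, ancestor in enumerate(ancestor_ids):
        if ancestor == i:
            ratios.append(1.0)  # root nodes have ratio 1
        else:
            sister = leafcount[ancestor] - leafcount[i]
            ratios.append(leafcount[i] / sister)
    return ratios


# root 0 with leaf 1 and inner node 2; node 2 has leaves 3 and 4
print(mark_leafcount_ratio_sister_sketch([0, 0, 0, 2, 2]))
# [1.0, 0.5, 2.0, 1.0, 1.0]
```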

alifestd_mark_clade_leafcount_ratio_sister_polars(phylogeny_df: DataFrame, *, mark_as: str = 'clade_leafcount_ratio_sister') DataFrame

Add column clade_leafcount_ratio_sister, containing the ratio of each clade’s leaf count to that of its sister.

The output column name can be changed via the mark_as parameter.

Tree must be strictly bifurcating.

alifestd_mark_clade_logistic_growth_children_asexual(phylogeny_df: ~pandas.core.frame.DataFrame, mutate: bool = False, *, mark_as: str = 'clade_logistic_growth_children', parallel_backend: str | None = None, progress_wrap: ~typing.Callable = <function <lambda>>, work_mask: ~numpy.ndarray | None = None) DataFrame

Add column clade_logistic_growth_children, containing the coefficient of a logistic regression fit to origin times of the leaf descendants of each node.

The output column name can be changed via the mark_as parameter.

Nodes with left/right child clades with equal growth rates will have value approximately 0.0. If left child clade has greater growth rate, value will be negative. If right child clade has greater growth rate, value will be positive.

Pass “loky” to parallel_backend to use joblib with loky backend.

Leaf nodes will have value NaN. If provided, any nodes not included in work_mask will also have value NaN.

Tree must be strictly bifurcating and single-rooted.

Dataframe reindexing (e.g., df.index) may be applied.

Input phylogeny_df and work_mask are not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.

References

Bonetti Franceschi V and Volz E. Phylogenetic signatures reveal multilevel selection and fitness costs in SARS-CoV-2 [version 2; peer review: 2 approved, 1 approved with reservations]. Wellcome Open Res 2024, 9:85 (https://doi.org/10.12688/wellcomeopenres.20704.2)

Volz, E. Fitness, growth and transmissibility of SARS-CoV-2 genetic variants. Nat Rev Genet 24, 724-734 (2023). https://doi.org/10.1038/s41576-023-00610-z

alifestd_mark_clade_logistic_growth_sister_asexual(phylogeny_df: ~pandas.core.frame.DataFrame, mutate: bool = False, *, mark_as: str = 'clade_logistic_growth_sister', parallel_backend: str | None = None, progress_wrap: ~typing.Callable = <function <lambda>>, work_mask: ~numpy.ndarray | None = None) DataFrame

Add column clade_logistic_growth_sister, containing the coefficient of a logistic regression fit to origin times of this clade’s descendant leaves versus those of its sister clade.

The output column name can be changed via the mark_as parameter.

Clades with equal growth rate to their sister will have value approximately 0.0. Clades growing faster than their sister clade will have value greater than 0.0. Clades growing slower than their sister clade will have value less than 0.0.

Pass “loky” to parallel_backend to use joblib with loky backend.

Root nodes will have value NaN. If provided, any nodes not included in work_mask will also have value NaN.

Tree must be strictly bifurcating and single-rooted.

Dataframe reindexing (e.g., df.index) may be applied.

Input phylogeny_df and work_mask are not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.

References

Bonetti Franceschi V and Volz E. Phylogenetic signatures reveal multilevel selection and fitness costs in SARS-CoV-2 [version 2; peer review: 2 approved, 1 approved with reservations]. Wellcome Open Res 2024, 9:85 (https://doi.org/10.12688/wellcomeopenres.20704.2)

Volz, E. Fitness, growth and transmissibility of SARS-CoV-2 genetic variants. Nat Rev Genet 24, 724-734 (2023). https://doi.org/10.1038/s41576-023-00610-z

alifestd_mark_clade_nodecount_ratio_sister_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'clade_nodecount_ratio_sister') DataFrame

Add column clade_nodecount_ratio_sister, containing the ratio of each clade size to that of its sister.

The output column name can be changed via the mark_as parameter.

Root nodes will have ratio 1. Tree must be strictly bifurcating.

Dataframe reindexing (e.g., df.index) may be applied.

Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.

alifestd_mark_clade_nodecount_ratio_sister_polars(phylogeny_df: DataFrame, *, mark_as: str = 'clade_nodecount_ratio_sister') DataFrame

Add column clade_nodecount_ratio_sister, containing the ratio of each clade size to that of its sister.

The output column name can be changed via the mark_as parameter.

Tree must be strictly bifurcating.

alifestd_mark_clade_subtended_duration_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'clade_subtended_duration') DataFrame

Add column clade_subtended_duration, containing the difference between each node’s ancestor’s origin_time and the maximum origin_time of its descendants.

The output column name can be changed via the mark_as parameter.

Ancestor origin time for root nodes will be 0.

Dataframe reindexing (e.g., df.index) may be applied.

Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.

alifestd_mark_clade_subtended_duration_polars(phylogeny_df: DataFrame, *, mark_as: str = 'clade_subtended_duration') DataFrame

Add column clade_subtended_duration, containing the difference between each node’s ancestor’s origin_time and the maximum origin_time of its descendants.

The output column name can be changed via the mark_as parameter.

Ancestor origin time for root nodes will be 0.

alifestd_mark_clade_subtended_duration_ratio_sister_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'clade_subtended_duration_ratio_sister') DataFrame

Add column clade_subtended_duration_ratio_sister, containing the ratio of each clade’s subtended duration to that of its sister.

The output column name can be changed via the mark_as parameter.

Root nodes will have ratio 1, unless also a leaf node. Leaf nodes and leaf-sisters may have ratio inf or NaN.

Tree must be strictly bifurcating.

Dataframe reindexing (e.g., df.index) may be applied.

Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.

alifestd_mark_clade_subtended_duration_ratio_sister_polars(phylogeny_df: DataFrame, *, mark_as: str = 'clade_subtended_duration_ratio_sister') DataFrame

Add column clade_subtended_duration_ratio_sister, containing the ratio of each clade’s subtended duration to that of its sister.

The output column name can be changed via the mark_as parameter.

Tree must be strictly bifurcating.

alifestd_mark_colless_index_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'colless_index') DataFrame

Add column colless_index with Colless imbalance index for each subtree.

The output column name can be changed via the mark_as parameter.

Computes the classic Colless index for strictly bifurcating trees. For each internal node with exactly two children, the local contribution is |L - R| where L and R are leaf counts in left and right subtrees. The value at each node represents the total Colless index for the subtree rooted at that node.

Raises ValueError if the tree is not strictly bifurcating. For trees with polytomies, use alifestd_mark_colless_like_index_mdm_asexual for the Colless-like index instead.

Leaf nodes will have Colless index 0 (no imbalance in subtree of size 1). The root node contains the Colless index for the entire tree.

A topological sort will be applied if phylogeny_df is not topologically sorted. Dataframe reindexing (e.g., df.index) may be applied.

Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.

Parameters

phylogeny_df : pd.DataFrame

Alife standard DataFrame containing the phylogenetic relationships.

mutate : bool, optional

If True, the input DataFrame may be mutated. The operation is not guaranteed to occur in place, so still use the return value. Default is False.

Returns

pd.DataFrame

Phylogeny DataFrame with an additional column “colless_index” containing the Colless imbalance index for the subtree rooted at each node.

Raises

ValueError

If phylogeny_df is not strictly bifurcating.

See Also

alifestd_mark_colless_index_corrected_asexual :

Normalized Colless index (corrected for tree size).

alifestd_mark_colless_like_index_mdm_asexual :

Colless-like index (MDM) that supports polytomies.

alifestd_mark_colless_like_index_var_asexual :

Colless-like index (variance) that supports polytomies.

alifestd_mark_colless_like_index_sd_asexual :

Colless-like index (std dev) that supports polytomies.

alifestd_mark_colless_index_corrected_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'colless_index_corrected') DataFrame

Add column colless_index_corrected with the corrected Colless index for each subtree.

The output column name can be changed via the mark_as parameter.

The corrected Colless index IC(T) normalizes the Colless index by tree size. For a subtree with n leaves:

IC(T) = 0 if n <= 2
IC(T) = 2 * C(T) / ((n-1)*(n-2)) if n > 2

where C(T) is the Colless index of the subtree.
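As a worked illustration of the normalization, the piecewise definition can be written as a small function. The name `corrected_colless` is hypothetical and is not part of the library.

```python
def corrected_colless(colless, num_leaves):
    """Normalize a Colless index C(T) for a subtree with num_leaves leaves."""
    if num_leaves <= 2:
        return 0.0
    return 2 * colless / ((num_leaves - 1) * (num_leaves - 2))


# a four-leaf caterpillar tree has C(T) = 3, the maximum for n = 4
print(corrected_colless(3, 4))  # 1.0
# a balanced four-leaf tree has C(T) = 0
print(corrected_colless(0, 4))  # 0.0
```

The denominator (n-1)*(n-2)/2 is the Colless index of a caterpillar tree with n leaves, so the maximally imbalanced tree maps to 1.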

This function delegates to alifestd_mark_colless_index_asexual to compute the Colless index, and therefore requires strictly bifurcating trees.

Raises ValueError if the tree is not strictly bifurcating. For trees with polytomies, consider computing the generalized Colless index and normalizing separately.

A topological sort will be applied if phylogeny_df is not topologically sorted. Dataframe reindexing (e.g., df.index) may be applied.

Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.

Parameters

phylogeny_df : pd.DataFrame

Alife standard DataFrame containing the phylogenetic relationships.

mutate : bool, optional

If True, the input DataFrame may be modified; the transformed phylogeny should still be taken from the return value. Default is False.

Returns

pd.DataFrame

Phylogeny DataFrame with an additional column “colless_index_corrected” containing the corrected Colless imbalance index for the subtree rooted at each node.

Raises

ValueError

If phylogeny_df is not strictly bifurcating.

See Also

alifestd_mark_colless_index_asexual :

Unnormalized Colless index for strictly bifurcating trees.

alifestd_mark_colless_like_index_mdm_asexual :

Colless-like index (MDM) that supports polytomies.

alifestd_mark_colless_like_index_var_asexual :

Colless-like index (variance) that supports polytomies.

alifestd_mark_colless_like_index_sd_asexual :

Colless-like index (std dev) that supports polytomies.

alifestd_mark_colless_index_corrected_polars(phylogeny_df: DataFrame, *, mark_as: str = 'colless_index_corrected') DataFrame

Add column colless_index_corrected with the corrected Colless index for each subtree.

The output column name can be changed via the mark_as parameter.

alifestd_mark_colless_index_polars(phylogeny_df: DataFrame, *, mark_as: str = 'colless_index') DataFrame

Add column colless_index with Colless imbalance index for each subtree.

The output column name can be changed via the mark_as parameter.

Requires strictly bifurcating trees.

alifestd_mark_colless_like_index_mdm_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'colless_like_index_mdm') DataFrame

Add column colless_like_index_mdm with Colless-like index using mean deviation from the median (MDM) as dissimilarity.

The output column name can be changed via the mark_as parameter.

Computes the Colless-like balance index from Mir, Rossello, and Rotger (2018) that supports polytomies. Uses weight function f(k) = ln(k + e) and MDM dissimilarity.

For each internal node v with children v_1, …, v_k:

bal(v) = MDM(delta_f(T_v1), …, delta_f(T_vk))

where delta_f(T) is the f-size of subtree T, defined as the sum of f(deg(u)) over all nodes u in T, and

MDM(x_1, …, x_k) = (1/k) * sum |x_i - median(x)|

The Colless-like index at a node is the sum of balance values across all internal nodes in its subtree.

Leaf nodes will have Colless-like index 0. The root node contains the Colless-like index for the entire tree.
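The definitions above can be sketched in plain Python as follows. This is a minimal sketch, not the library's implementation: the function names and the list-based tree encoding (contiguous ids, topologically sorted rows, a root listed as its own ancestor) are assumptions, and deg(u) is taken to mean the number of children of u.

```python
import math
import statistics


def f(k):
    """Weight function f(k) = ln(k + e)."""
    return math.log(k + math.e)


def mdm(xs):
    """Mean deviation from the median."""
    med = statistics.median(xs)
    return sum(abs(x - med) for x in xs) / len(xs)


def colless_like_index(ancestor_ids, dissimilarity=mdm):
    """Colless-like index of the subtree rooted at each node."""
    n = len(ancestor_ids)
    children = [[] for _ in range(n)]
    for node, parent in enumerate(ancestor_ids):
        if parent != node:  # skip the root's self-reference
            children[parent].append(node)

    f_size = [0.0] * n
    index = [0.0] * n
    # visit descendants before their ancestors
    for node in reversed(range(n)):
        kids = children[node]
        # f-size: f(deg(u)) summed over every node u in the subtree
        f_size[node] = f(len(kids)) + sum(f_size[c] for c in kids)
        if kids:
            bal = dissimilarity([f_size[c] for c in kids])
            index[node] = bal + sum(index[c] for c in kids)
    return index


# root 0 has children 1, 2, 3 (a polytomy); node 1 has children 4, 5
result = colless_like_index([0, 0, 0, 0, 1, 1])
print(result[0])  # index for the entire tree
```

Because f(0) = ln(e) = 1, every leaf has f-size 1, so a star tree has index 0.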

A topological sort will be applied if phylogeny_df is not topologically sorted. Dataframe reindexing (e.g., df.index) may be applied.

Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.

Parameters

phylogeny_df : pd.DataFrame

Alife standard DataFrame containing the phylogenetic relationships.

mutate : bool, optional

If True, the input DataFrame may be modified; the transformed phylogeny should still be taken from the return value. Default is False.

Returns

pd.DataFrame

Phylogeny DataFrame with an additional column “colless_like_index_mdm” containing the Colless-like imbalance index for the subtree rooted at each node.

References

Mir, A., Rossello, F., & Rotger, L. (2018). Sound Colless-like balance indices for multifurcating trees. PLOS ONE, 13(9), e0203401. https://doi.org/10.1371/journal.pone.0203401

See Also

alifestd_mark_colless_like_index_var_asexual :

Colless-like index using variance dissimilarity.

alifestd_mark_colless_like_index_sd_asexual :

Colless-like index using standard deviation dissimilarity.

alifestd_mark_colless_index_asexual :

Classic Colless index for strictly bifurcating trees.

alifestd_mark_colless_like_index_mdm_polars(phylogeny_df: DataFrame, *, mark_as: str = 'colless_like_index_mdm') DataFrame

Add column colless_like_index_mdm with Colless-like index using mean deviation from the median (MDM) as dissimilarity.

The output column name can be changed via the mark_as parameter.

alifestd_mark_colless_like_index_sd_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'colless_like_index_sd') DataFrame

Add column colless_like_index_sd with Colless-like index using sample standard deviation as dissimilarity.

The output column name can be changed via the mark_as parameter.

Computes the Colless-like balance index from Mir, Rossello, and Rotger (2018) that supports polytomies. Uses weight function f(k) = ln(k + e) and standard deviation dissimilarity.

For each internal node v with children v_1, …, v_k:

bal(v) = sd(delta_f(T_v1), …, delta_f(T_vk))

where delta_f(T) is the f-size of subtree T, defined as the sum of f(deg(u)) over all nodes u in T, and

sd(x_1, …, x_k) = sqrt(var(x_1, …, x_k))
var(x_1, …, x_k) = (1/(k-1)) * sum (x_i - mean(x))^2

The Colless-like index at a node is the sum of balance values across all internal nodes in its subtree.

Leaf nodes will have Colless-like index 0. The root node contains the Colless-like index for the entire tree.
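As a worked example of the dissimilarity for a single internal node, the snippet below applies the sample (k-1 denominator) formulas to hypothetical child f-sizes. The values and the helper names `sample_var` and `sample_sd` are made up for illustration.

```python
import math
import statistics


def sample_var(xs):
    """Sample variance with a (k - 1) denominator."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


def sample_sd(xs):
    """Sample standard deviation, the square root of the sample variance."""
    return math.sqrt(sample_var(xs))


# hypothetical f-sizes of the three child subtrees of one node
child_f_sizes = [1.0, 2.0, 6.0]

# mean is 3.0, squared deviations are 4, 1, 9, so var = 14 / 2
print(sample_var(child_f_sizes))  # 7.0
print(sample_sd(child_f_sizes))   # sqrt(7), the node's bal(v)

# both agree with the standard library's sample statistics
assert math.isclose(sample_sd(child_f_sizes), statistics.stdev(child_f_sizes))
```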

A topological sort will be applied if phylogeny_df is not topologically sorted. Dataframe reindexing (e.g., df.index) may be applied.

Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.

Parameters

phylogeny_df : pd.DataFrame

Alife standard DataFrame containing the phylogenetic relationships.

mutate : bool, optional

If True, the input DataFrame may be modified; the transformed phylogeny should still be taken from the return value. Default is False.

Returns

pd.DataFrame

Phylogeny DataFrame with an additional column “colless_like_index_sd” containing the Colless-like imbalance index for the subtree rooted at each node.

References

Mir, A., Rossello, F., & Rotger, L. (2018). Sound Colless-like balance indices for multifurcating trees. PLOS ONE, 13(9), e0203401. https://doi.org/10.1371/journal.pone.0203401

See Also

alifestd_mark_colless_like_index_mdm_asexual :

Colless-like index using MDM dissimilarity.

alifestd_mark_colless_like_index_var_asexual :

Colless-like index using variance dissimilarity.

alifestd_mark_colless_index_asexual :

Classic Colless index for strictly bifurcating trees.

alifestd_mark_colless_like_index_sd_polars(phylogeny_df: DataFrame, *, mark_as: str = 'colless_like_index_sd') DataFrame

Add column colless_like_index_sd with Colless-like index using sample standard deviation as dissimilarity.

The output column name can be changed via the mark_as parameter.

alifestd_mark_colless_like_index_var_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'colless_like_index_var') DataFrame

Add column colless_like_index_var with Colless-like index using sample variance as dissimilarity.

The output column name can be changed via the mark_as parameter.

Computes the Colless-like balance index from Mir, Rossello, and Rotger (2018) that supports polytomies. Uses weight function f(k) = ln(k + e) and variance dissimilarity.

For each internal node v with children v_1, …, v_k:

bal(v) = var(delta_f(T_v1), …, delta_f(T_vk))

where delta_f(T) is the f-size of subtree T, defined as the sum of f(deg(u)) over all nodes u in T, and

var(x_1, …, x_k) = (1/(k-1)) * sum (x_i - mean(x))^2

The Colless-like index at a node is the sum of balance values across all internal nodes in its subtree.

Leaf nodes will have Colless-like index 0. The root node contains the Colless-like index for the entire tree.

A topological sort will be applied if phylogeny_df is not topologically sorted. Dataframe reindexing (e.g., df.index) may be applied.

Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.

Parameters

phylogeny_df : pd.DataFrame

Alife standard DataFrame containing the phylogenetic relationships.

mutate : bool, optional

If True, the input DataFrame may be modified; the transformed phylogeny should still be taken from the return value. Default is False.

Returns

pd.DataFrame

Phylogeny DataFrame with an additional column “colless_like_index_var” containing the Colless-like imbalance index for the subtree rooted at each node.

References

Mir, A., Rossello, F., & Rotger, L. (2018). Sound Colless-like balance indices for multifurcating trees. PLOS ONE, 13(9), e0203401. https://doi.org/10.1371/journal.pone.0203401

See Also

alifestd_mark_colless_like_index_mdm_asexual :

Colless-like index using MDM dissimilarity.

alifestd_mark_colless_like_index_sd_asexual :

Colless-like index using standard deviation dissimilarity.

alifestd_mark_colless_index_asexual :

Classic Colless index for strictly bifurcating trees.

alifestd_mark_colless_like_index_var_polars(phylogeny_df: DataFrame, *, mark_as: str = 'colless_like_index_var') DataFrame

Add column colless_like_index_var with Colless-like index using sample variance as dissimilarity.

The output column name can be changed via the mark_as parameter.

alifestd_mark_csr_children_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'csr_children') DataFrame

Add column csr_children, a flat array of child ids grouped by parent according to CSR offsets from the csr_offsets column.

The output column name can be changed via the mark_as parameter.

Entries are ordered so that node i’s children occupy positions csr_offsets[i] to csr_offsets[i] + num_children[i] (exclusive). Entries beyond the total number of non-root nodes are unused.
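The layout can be illustrated with a small pure-Python sketch. The helper `build_csr`, the list-based tree encoding (contiguous ids, a root listed as its own ancestor), and the padding value 0 for unused trailing entries are assumptions for the example, not the library's implementation.

```python
def build_csr(ancestor_ids):
    """Return (num_children, csr_offsets, csr_children) as lists."""
    n = len(ancestor_ids)
    num_children = [0] * n
    for node, parent in enumerate(ancestor_ids):
        if parent != node:  # skip the root's self-reference
            num_children[parent] += 1

    # exclusive prefix sum: where each node's children begin
    csr_offsets = [0] * n
    for i in range(1, n):
        csr_offsets[i] = csr_offsets[i - 1] + num_children[i - 1]

    # one slot per node; slots past the non-root count stay as padding
    csr_children = [0] * n
    cursor = list(csr_offsets)  # next free slot for each parent
    for node, parent in enumerate(ancestor_ids):
        if parent != node:
            csr_children[cursor[parent]] = node
            cursor[parent] += 1
    return num_children, csr_offsets, csr_children


num_children, csr_offsets, csr_children = build_csr([0, 0, 0, 1, 1, 3, 3])
print(csr_offsets)   # [0, 2, 4, 4, 6, 6, 6]
print(csr_children)  # [1, 2, 3, 4, 5, 6, 0]

# children of node 3 occupy one contiguous slice
start = csr_offsets[3]
print(csr_children[start:start + num_children[3]])  # [5, 6]
```

Storing children contiguously lets a traversal read each node's children as a slice, without per-node lists.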

A topological sort will be applied if phylogeny_df is not topologically sorted. Dataframe reindexing (e.g., df.index) may be applied.

Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.

alifestd_mark_csr_children_polars(phylogeny_df: DataFrame, *, mark_as: str = 'csr_children') DataFrame

Add column csr_children, a flat array of child ids grouped by parent according to CSR offsets from the csr_offsets column.

The output column name can be changed via the mark_as parameter.

alifestd_mark_csr_offsets_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'csr_offsets') DataFrame

Add column csr_offsets, the CSR offset where each node’s children begin in the corresponding csr_children array.

The output column name can be changed via the mark_as parameter.

A topological sort will be applied if phylogeny_df is not topologically sorted. Dataframe reindexing (e.g., df.index) may be applied.

Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.

alifestd_mark_csr_offsets_polars(phylogeny_df: DataFrame, *, mark_as: str = 'csr_offsets') DataFrame

Add column csr_offsets, the CSR offset where each node’s children begin in the corresponding csr_children array.

The output column name can be changed via the mark_as parameter.

alifestd_mark_first_child_id_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'first_child_id') DataFrame

Add column first_child_id, the smallest-id child of each node.

The output column name can be changed via the mark_as parameter.

If a node has no children (is a leaf), marks own id.

A topological sort will be applied if phylogeny_df is not topologically sorted. Dataframe reindexing (e.g., df.index) may be applied.

Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.

alifestd_mark_first_child_id_polars(phylogeny_df: DataFrame, *, mark_as: str = 'first_child_id') DataFrame

Add column first_child_id, the smallest-id child of each node.

The output column name can be changed via the mark_as parameter.

If a node has no children (is a leaf), marks own id.

alifestd_mark_is_left_child_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'is_left_child') DataFrame

Add column is_left_child, containing for each node whether it is the smaller-id child.

The output column name can be changed via the mark_as parameter.

Root nodes will be marked False. Tree must be strictly bifurcating.

Dataframe reindexing (e.g., df.index) may be applied.

Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.

alifestd_mark_is_left_child_polars(phylogeny_df: DataFrame, *, mark_as: str = 'is_left_child') DataFrame

Add column is_left_child, containing for each node whether it is the smaller-id child.

The output column name can be changed via the mark_as parameter.

Root nodes will be marked False.

alifestd_mark_is_right_child_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'is_right_child') DataFrame

Add column is_right_child, containing for each node whether it is the larger-id child.

The output column name can be changed via the mark_as parameter.

Root nodes will be marked False. Tree must be strictly bifurcating.

Dataframe reindexing (e.g., df.index) may be applied.

Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.

alifestd_mark_is_right_child_polars(phylogeny_df: DataFrame, *, mark_as: str = 'is_right_child') DataFrame

Add column is_right_child, containing for each node whether it is the larger-id child.

The output column name can be changed via the mark_as parameter.

Root nodes will be marked False.

alifestd_mark_leaves(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'is_leaf') DataFrame

Add column is_leaf marking rows that are ancestor to no other row.

The output column name can be changed via the mark_as parameter.

Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.

alifestd_mark_leaves_polars(phylogeny_df: DataFrame, *, mark_as: str = 'is_leaf') DataFrame

Add column is_leaf marking rows that are ancestor to no other row.

The output column name can be changed via the mark_as parameter.

Parameters

phylogeny_df : polars.DataFrame

The phylogeny as a dataframe in alife standard format.

Must represent an asexual phylogeny.

Returns

polars.DataFrame

The phylogeny with an added is_leaf boolean column.

See Also

alifestd_mark_leaves :

Pandas-based implementation.

alifestd_mark_left_child_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'left_child_id') DataFrame

Add column left_child_id, containing for each node its smallest-id child.

The output column name can be changed via the mark_as parameter.

Leaf nodes will be marked with their own id.

Dataframe reindexing (e.g., df.index) may be applied.

Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.

alifestd_mark_left_child_polars(phylogeny_df: DataFrame, *, mark_as: str = 'left_child_id') DataFrame

Add column left_child_id, containing for each node its smallest-id child.

The output column name can be changed via the mark_as parameter.

Leaf nodes will be marked with their own id.

alifestd_mark_lineage_cummax_asexual(phylogeny_df: DataFrame, values: str, mutate: bool = False, *, mark_as: str = 'lineage_cummax', reverse: bool = False, skipna: bool = True) DataFrame

Add column with maximum of values along each lineage.

With reverse=False (default), the result at each node is the maximum of values along the path from the root to that node, inclusive. With reverse=True, the result at each node is the maximum of values over the entire clade rooted at that node, inclusive.

The output column name can be changed via the mark_as parameter. NaN values are treated as -inf if skipna (default), else propagate.
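The two directions and the NaN handling can be sketched in plain Python. This is an illustrative sketch under the same preconditions the function states (asexual, topologically sorted, contiguous ids); the helper name `lineage_cummax`, the list-based encoding, and the root being listed as its own ancestor are assumptions.

```python
import math


def _nan_max(a, b):
    """max() that propagates NaN regardless of argument order."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    return max(a, b)


def lineage_cummax(ancestor_ids, values, reverse=False, skipna=True):
    n = len(ancestor_ids)
    if skipna:
        # treat NaN as -inf so it never wins a max
        result = [-math.inf if math.isnan(v) else v for v in values]
    else:
        result = list(values)
    # forward: ancestors first; reverse: descendants first
    order = reversed(range(n)) if reverse else range(n)
    for node in order:
        parent = ancestor_ids[node]
        if parent == node:  # root has nothing to combine with
            continue
        if reverse:
            result[parent] = _nan_max(result[parent], result[node])
        else:
            result[node] = _nan_max(result[parent], result[node])
    return result


ancestor_ids = [0, 0, 0, 1, 1, 3, 3]
values = [1.0, 5.0, 2.0, math.nan, 3.0, 4.0, 7.0]
print(lineage_cummax(ancestor_ids, values))
# [1.0, 5.0, 2.0, 5.0, 5.0, 5.0, 7.0]
print(lineage_cummax(ancestor_ids, values, reverse=True))
# [7.0, 7.0, 2.0, 7.0, 3.0, 4.0, 7.0]
```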

Phylogeny must be asexual, topologically sorted, and have contiguous ids; otherwise NotImplementedError is raised.

Input dataframe is not mutated by this operation unless mutate is set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.

alifestd_mark_lineage_cummax_polars(phylogeny_df: DataFrame, values: str | Expr, *, mark_as: str = 'lineage_cummax', reverse: bool = False, skipna: bool = True) DataFrame

Add column with maximum of values along each lineage.

With reverse=False (default), the result at each node is the maximum of values along the path from the root to that node, inclusive. With reverse=True, the result at each node is the maximum of values over the entire clade rooted at that node, inclusive.

See Also

alifestd_mark_lineage_cummax_asexual :

Pandas-based implementation.

alifestd_mark_lineage_cummin_asexual(phylogeny_df: DataFrame, values: str, mutate: bool = False, *, mark_as: str = 'lineage_cummin', reverse: bool = False, skipna: bool = True) DataFrame

Add column with minimum of values along each lineage.

With reverse=False (default), the result at each node is the minimum of values along the path from the root to that node, inclusive. With reverse=True, the result at each node is the minimum of values over the entire clade rooted at that node, inclusive.

The output column name can be changed via the mark_as parameter. NaN values are treated as +inf if skipna (default), else propagate.

Phylogeny must be asexual, topologically sorted, and have contiguous ids; otherwise NotImplementedError is raised.

Input dataframe is not mutated by this operation unless mutate is set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.

alifestd_mark_lineage_cummin_polars(phylogeny_df: DataFrame, values: str | Expr, *, mark_as: str = 'lineage_cummin', reverse: bool = False, skipna: bool = True) DataFrame

Add column with minimum of values along each lineage.

With reverse=False (default), the result at each node is the minimum of values along the path from the root to that node, inclusive. With reverse=True, the result at each node is the minimum of values over the entire clade rooted at that node, inclusive.

See Also

alifestd_mark_lineage_cummin_asexual :

Pandas-based implementation.

alifestd_mark_lineage_cumprod_asexual(phylogeny_df: DataFrame, values: str, mutate: bool = False, *, mark_as: str = 'lineage_cumprod', reverse: bool = False, skipna: bool = True) DataFrame

Add column with cumulative product of values along each lineage.

With reverse=False (default), the result at each node is the product of values along the path from the root to that node, inclusive. With reverse=True, the result at each node is the product of values over the entire clade rooted at that node, inclusive.

The output column name can be changed via the mark_as parameter. NaN values are treated as 1 if skipna (default), else propagate.

Phylogeny must be asexual, topologically sorted, and have contiguous ids; otherwise NotImplementedError is raised.

Input dataframe is not mutated by this operation unless mutate is set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.

alifestd_mark_lineage_cumprod_polars(phylogeny_df: DataFrame, values: str | Expr, *, mark_as: str = 'lineage_cumprod', reverse: bool = False, skipna: bool = True) DataFrame

Add column with cumulative product of values along each lineage.

With reverse=False (default), the result at each node is the product of values along the path from the root to that node, inclusive. With reverse=True, the result at each node is the product of values over the entire clade rooted at that node, inclusive.

See Also

alifestd_mark_lineage_cumprod_asexual :

Pandas-based implementation.

alifestd_mark_lineage_cumsum_asexual(phylogeny_df: DataFrame, values: str, mutate: bool = False, *, mark_as: str = 'lineage_cumsum', reverse: bool = False, skipna: bool = True) DataFrame

Add column with cumulative sum of values along each lineage.

With reverse=False (default), the result at each node is the sum of values along the path from the root to that node, inclusive. With reverse=True, the result at each node is the sum of values over the entire clade rooted at that node, inclusive.

The output column name can be changed via the mark_as parameter. NaN values are treated as 0 if skipna (default), else propagate.
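The lineage accumulations (sum, product, minimum, maximum) share one traversal pattern, differing only in the binary operation. The sketch below shows that pattern for a tree that is topologically sorted with contiguous ids; `lineage_accumulate` is a hypothetical helper, the root is assumed to list itself as its own ancestor, and NaN handling is omitted for brevity.

```python
import operator


def lineage_accumulate(ancestor_ids, values, op, reverse=False):
    """Accumulate values root-to-node (forward) or over clades (reverse)."""
    n = len(ancestor_ids)
    result = list(values)
    # forward: ancestors first; reverse: descendants first
    order = reversed(range(n)) if reverse else range(n)
    for node in order:
        parent = ancestor_ids[node]
        if parent == node:  # root has nothing to combine with
            continue
        if reverse:
            result[parent] = op(result[parent], result[node])
        else:
            result[node] = op(result[parent], result[node])
    return result


ancestor_ids = [0, 0, 0, 1, 1, 3, 3]
values = [1, 2, 3, 4, 5, 6, 7]

print(lineage_accumulate(ancestor_ids, values, operator.add))
# [1, 3, 4, 7, 8, 13, 14]
print(lineage_accumulate(ancestor_ids, values, operator.add, reverse=True))
# [28, 24, 3, 17, 5, 6, 7]
print(lineage_accumulate(ancestor_ids, values, operator.mul))
# [1, 2, 3, 8, 10, 48, 56]
```

With reverse=True the root's entry equals the total over the whole tree, here 28.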

Phylogeny must be asexual, topologically sorted, and have contiguous ids; otherwise NotImplementedError is raised.

Input dataframe is not mutated by this operation unless mutate is set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.

alifestd_mark_lineage_cumsum_polars(phylogeny_df: DataFrame, values: str | Expr, *, mark_as: str = 'lineage_cumsum', reverse: bool = False, skipna: bool = True) DataFrame

Add column with cumulative sum of values along each lineage.

With reverse=False (default), the result at each node is the sum of values along the path from the root to that node, inclusive. With reverse=True, the result at each node is the sum of values over the entire clade rooted at that node, inclusive.

Parameters

phylogeny_df : polars.DataFrame or polars.LazyFrame

The phylogeny as a dataframe in alife standard format.

Must represent an asexual phylogeny.

values : str or polars.Expr

Column name or polars expression providing per-node values.

mark_as : str, default “lineage_cumsum”

Output column name.

reverse : bool, default False

If True, aggregate over clade rooted at each node.

skipna : bool, default True

If True, NaN values are treated as identity (0); else propagate.

See Also

alifestd_mark_lineage_cumsum_asexual :

Pandas-based implementation.

alifestd_mark_max_descendant_origin_time_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'max_descendant_origin_time') DataFrame

Add column max_descendant_origin_time, excluding self.

The output column name can be changed via the mark_as parameter.

A topological sort will be applied if phylogeny_df is not topologically sorted. Dataframe reindexing (e.g., df.index) may be applied.

Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.

alifestd_mark_max_descendant_origin_time_polars(phylogeny_df: DataFrame, *, mark_as: str = 'max_descendant_origin_time') DataFrame

Add column max_descendant_origin_time, excluding self.

The output column name can be changed via the mark_as parameter.

alifestd_mark_next_sibling_id_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'next_sibling_id') DataFrame

Add column next_sibling_id, the next-highest id sharing the same parent.

The output column name can be changed via the mark_as parameter.

If no such sibling exists, marks own id.

A topological sort will be applied if phylogeny_df is not topologically sorted. Dataframe reindexing (e.g., df.index) may be applied.

Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
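Taken together, first_child_id and next_sibling_id form a first-child/next-sibling encoding that allows iterating over a node's children without per-node lists. The sketch below is an illustration in plain Python, not the library's implementation; the helper names and the list-based encoding (contiguous ids, a root listed as its own ancestor) are assumptions.

```python
def mark_first_child_and_next_sibling(ancestor_ids):
    n = len(ancestor_ids)
    first_child = list(range(n))   # leaves keep their own id
    next_sibling = list(range(n))  # last sibling keeps its own id
    last_seen_child = {}           # parent -> most recently visited child
    for node, parent in enumerate(ancestor_ids):  # ascending id order
        if parent == node:  # skip the root's self-reference
            continue
        if parent in last_seen_child:
            next_sibling[last_seen_child[parent]] = node
        else:
            first_child[parent] = node  # smallest-id child
        last_seen_child[parent] = node
    return first_child, next_sibling


def iter_children(node, first_child, next_sibling):
    """Yield the children of node in ascending id order."""
    child = first_child[node]
    if child == node:  # own id marks a leaf
        return
    while True:
        yield child
        following = next_sibling[child]
        if following == child:  # own id marks the last sibling
            return
        child = following


# root 0 has children 1, 2, 3; node 1 has children 4, 5
first_child, next_sibling = mark_first_child_and_next_sibling(
    [0, 0, 0, 0, 1, 1]
)
print(first_child)   # [1, 4, 2, 3, 4, 5]
print(next_sibling)  # [0, 2, 3, 3, 5, 5]
print(list(iter_children(0, first_child, next_sibling)))  # [1, 2, 3]
```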

alifestd_mark_next_sibling_id_polars(phylogeny_df: DataFrame, *, mark_as: str = 'next_sibling_id') DataFrame

Add column next_sibling_id, the next-highest id sharing the same parent.

The output column name can be changed via the mark_as parameter.

If no such sibling exists, marks own id.

alifestd_mark_node_depth_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'node_depth') DataFrame

Add column node_depth, counting the number of nodes between a node and the root.

The output column name can be changed via the mark_as parameter.

A topological sort will be applied if phylogeny_df is not topologically sorted. Dataframe reindexing (e.g., df.index) may be applied.

Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.

alifestd_mark_node_depth_polars(phylogeny_df: DataFrame, *, mark_as: str = 'node_depth') DataFrame

Add column node_depth, counting the number of nodes between a node and the root.

The output column name can be changed via the mark_as parameter.

Parameters

phylogeny_df : polars.DataFrame or polars.LazyFrame

The phylogeny as a dataframe in alife standard format.

Must represent an asexual phylogeny with contiguous ids and topologically sorted rows.

Returns

polars.DataFrame

The phylogeny with an added node_depth integer column.

See Also

alifestd_mark_node_depth_asexual :

Pandas-based implementation.

alifestd_mark_num_children_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'num_children') DataFrame

Add column num_children, counting for each node the number of nodes it is parent to.

The output column name can be changed via the mark_as parameter.

A topological sort will be applied if phylogeny_df is not topologically sorted. Dataframe reindexing (e.g., df.index) may be applied.

Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.

alifestd_mark_num_children_polars(phylogeny_df: DataFrame, *, mark_as: str = 'num_children') DataFrame

Add column num_children, counting for each node the number of nodes it is parent to.

The output column name can be changed via the mark_as parameter.

alifestd_mark_num_descendants_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'num_descendants') DataFrame

Add column num_descendants, excluding self.

The output column name can be changed via the mark_as parameter.

A topological sort will be applied if phylogeny_df is not topologically sorted. Dataframe reindexing (e.g., df.index) may be applied.

Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.

alifestd_mark_num_descendants_polars(phylogeny_df: DataFrame, *, mark_as: str = 'num_descendants') DataFrame

Add column num_descendants, excluding self.

The output column name can be changed via the mark_as parameter.

Parameters

phylogeny_df : polars.DataFrame

The phylogeny as a dataframe in alife standard format.

Must represent an asexual phylogeny.

Returns

polars.DataFrame

The phylogeny with an added num_descendants column.

See Also

alifestd_mark_num_descendants_asexual :

Pandas-based implementation.

alifestd_mark_num_leaves_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'num_leaves') DataFrame

Add column num_leaves with count of all descendant leaves, including self if a leaf.

The output column name can be changed via the mark_as parameter.

A topological sort will be applied if phylogeny_df is not topologically sorted. Dataframe reindexing (e.g., df.index) may be applied.

Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
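The quantities num_children, num_descendants, and num_leaves are closely related, and all three can be obtained with one pass from parents to children and one pass back. The sketch below illustrates this in plain Python; the helper `mark_counts` and the list-based encoding (contiguous ids, topologically sorted rows, a root listed as its own ancestor) are assumptions, not the library's implementation.

```python
def mark_counts(ancestor_ids):
    """Return (num_children, num_descendants, num_leaves) as lists."""
    n = len(ancestor_ids)
    num_children = [0] * n
    for node, parent in enumerate(ancestor_ids):
        if parent != node:  # skip the root's self-reference
            num_children[parent] += 1

    num_descendants = [0] * n  # excludes the node itself
    # a leaf counts itself as one leaf
    num_leaves = [1 if count == 0 else 0 for count in num_children]
    # visit descendants before their ancestors
    for node in reversed(range(n)):
        parent = ancestor_ids[node]
        if parent != node:
            num_descendants[parent] += num_descendants[node] + 1
            num_leaves[parent] += num_leaves[node]
    return num_children, num_descendants, num_leaves


num_children, num_descendants, num_leaves = mark_counts(
    [0, 0, 0, 1, 1, 3, 3]
)
print(num_children)     # [2, 2, 0, 2, 0, 0, 0]
print(num_descendants)  # [6, 4, 0, 2, 0, 0, 0]
print(num_leaves)       # [4, 3, 1, 2, 1, 1, 1]
```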

alifestd_mark_num_leaves_polars(phylogeny_df: DataFrame, *, mark_as: str = 'num_leaves') DataFrame

Add column num_leaves with count of all descendant leaves, including self if a leaf.

The output column name can be changed via the mark_as parameter.

alifestd_mark_num_leaves_sibling_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'num_leaves_sibling') DataFrame

Mark the number of leaves descendant from each node’s siblings.

The output column name can be changed via the mark_as parameter.

Nodes with no siblings (e.g., root nodes) will have value 0 marked.

Parameters

phylogeny_df : pd.DataFrame

Alife standard DataFrame containing the phylogenetic relationships.

mutate : bool, optional

If True, the input DataFrame may be modified; the transformed phylogeny should still be taken from the return value. Default is False.

Returns

pd.DataFrame

Phylogeny DataFrame with an additional column “num_leaves_sibling” containing the number of leaves descendant from each node’s siblings.

alifestd_mark_num_leaves_sibling_polars(phylogeny_df: DataFrame, *, mark_as: str = 'num_leaves_sibling') DataFrame

Mark the number of leaves descendant from each node’s siblings.

The output column name can be changed via the mark_as parameter.

Nodes with no siblings (e.g., root nodes) will have value 0 marked.

alifestd_mark_num_preceding_leaves_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'num_preceding_leaves') DataFrame

Add column num_preceding_leaves with count of all leaves occurring before the present node in an inorder traversal.

The output column name can be changed via the mark_as parameter.

For internal nodes, the number of leaf nodes prior to the traversal of first (i.e., leftmost) descendant is marked.

A topological sort will be applied if phylogeny_df is not topologically sorted. Dataframe reindexing (e.g., df.index) may be applied.

Must be a strictly bifurcating tree.

Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
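For a strictly bifurcating tree, the count can be propagated from the root downward: a left child inherits its parent's count, and a right child adds the number of leaves under its left sibling. The sketch below illustrates this in plain Python. It assumes the left child is the smaller-id child, contiguous ids, topologically sorted rows, and a root listed as its own ancestor; the helper name is hypothetical.

```python
def num_preceding_leaves(ancestor_ids):
    n = len(ancestor_ids)
    children = [[] for _ in range(n)]
    for node, parent in enumerate(ancestor_ids):  # ascending id order
        if parent != node:  # skip the root's self-reference
            children[parent].append(node)

    # leaves under each node, visiting descendants first
    num_leaves = [0] * n
    for node in reversed(range(n)):
        kids = children[node]
        num_leaves[node] = sum(num_leaves[c] for c in kids) if kids else 1

    preceding = [0] * n
    # visit ancestors before their descendants
    for node in range(n):
        kids = children[node]
        if not kids:
            continue
        if len(kids) != 2:
            raise ValueError("tree is not strictly bifurcating")
        left, right = kids  # smaller id first
        preceding[left] = preceding[node]
        preceding[right] = preceding[node] + num_leaves[left]
    return preceding


# leaves in inorder are 5, 6, 4, 2
print(num_preceding_leaves([0, 0, 0, 1, 1, 3, 3]))
# [0, 0, 3, 0, 2, 0, 1]
```

Each leaf's value is its zero-based position among the leaves, which makes the column usable as a leaf ordering.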

alifestd_mark_num_preceding_leaves_polars(phylogeny_df: DataFrame, *, mark_as: str = 'num_preceding_leaves') DataFrame

Add column num_preceding_leaves with count of all leaves occurring before the present node in an inorder traversal.

The output column name can be changed via the mark_as parameter.

alifestd_mark_oldest_root(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'is_oldest_root') DataFrame

Add column is_oldest_root, marking the oldest root, measured by lowest origin_time (if available) or otherwise lowest id.

The output column name can be changed via the mark_as parameter.

Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.

alifestd_mark_oldest_root_polars(phylogeny_df: DataFrame, *, mark_as: str = 'is_oldest_root') DataFrame

Add column is_oldest_root, marking the oldest root, measured by lowest origin_time (if available) or otherwise lowest id.

The output column name can be changed via the mark_as parameter.

alifestd_mark_origin_time_delta_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'origin_time_delta') DataFrame

Add columns origin_time_delta and ancestor_origin_time.

The output column name can be changed via the mark_as parameter.

Dataframe must provide column origin_time.

A topological sort will be applied if phylogeny_df is not topologically sorted. Dataframe reindexing (e.g., df.index) may be applied.

The input dataframe is not mutated by this operation unless mutate is set to True. Even when mutate is True, the operation does not occur in place; still use the return value to get the transformed phylogeny dataframe.
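The two added quantities can be sketched in plain Python as follows. This is an illustrative sketch under assumptions, not the library's code: ids equal row positions and roots list their own id as ancestor_id, so a root's delta comes out as zero here. The name sketch_origin_time_deltas is hypothetical.

```python
def sketch_origin_time_deltas(ancestor_ids, origin_times):
    """Return (origin_time_delta, ancestor_origin_time) as two lists."""
    # Look up each node's ancestor origin time by position.
    ancestor_origin_times = [origin_times[anc] for anc in ancestor_ids]
    # Delta is the time elapsed between ancestor and node.
    deltas = [
        own - ancestral
        for own, ancestral in zip(origin_times, ancestor_origin_times)
    ]
    return deltas, ancestor_origin_times
```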

alifestd_mark_origin_time_delta_polars(phylogeny_df: DataFrame, *, mark_as: str = 'origin_time_delta') DataFrame

Add columns origin_time_delta and ancestor_origin_time.

The output column name can be changed via the mark_as parameter.

Dataframe must provide column origin_time.

alifestd_mark_ot_mrca_asexual(phylogeny_df: ~pandas.core.frame.DataFrame, mutate: bool = False, *, mark_as: str = 'ot_mrca', progress_wrap: ~typing.Callable = <function <lambda>>) DataFrame

Appends columns characterizing the Most Recent Common Ancestor (MRCA) of the entire extant population at each taxon’s origin_time.

The output column name prefix can be changed via the mark_as parameter.

The extant population is defined in terms of active lineages: any branch of the tree existing at an origin_time which contains at least one descendant at or after that time.

New Columns

ot_mrca_id : int

The unique identifier of the MRCA for the population that was extant at this organism’s origin_time.

ot_mrca_time_of : int or float

The origin_time of that MRCA.

ot_mrca_time_since : int or float

The duration elapsed between the MRCA’s origin_time and this taxon’s origin_time.

A chronological sort will be applied if phylogeny_df is not chronologically sorted. Dataframe reindexing (e.g., df.index) may be applied.

The input dataframe is not mutated by this operation unless mutate is set to True. Even when mutate is True, the operation does not occur in place; still use the return value to get the transformed phylogeny dataframe.
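A brute-force sketch of one reading of the definition above may help fix ideas. It is not the library's algorithm and may differ from it in edge cases. It assumes a single root, ids equal to row positions, ancestors preceding descendants, roots listing their own id as ancestor_id, and that the extant population at time t can be represented by every taxon whose origin_time is at or after t. The name sketch_ot_mrca is hypothetical.

```python
def sketch_ot_mrca(ancestor_ids, origin_times):
    """Return (mrca_id, mrca_time_of, mrca_time_since) per taxon."""

    def lineage(node):
        # The node itself followed by its ancestors up to the root.
        path = [node]
        while ancestor_ids[path[-1]] != path[-1]:
            path.append(ancestor_ids[path[-1]])
        return path

    result = []
    for t in origin_times:
        # Assumption: taxa originating at or after t stand in for the
        # lineages that are active at time t.
        later = [n for n, ot in enumerate(origin_times) if ot >= t]
        common = set(lineage(later[0]))
        for node in later[1:]:
            common &= set(lineage(node))
        # Common ancestors lie on one root path; with topologically
        # sorted ids, the deepest one has the largest id.
        mrca = max(common)
        result.append((mrca, origin_times[mrca], t - origin_times[mrca]))
    return result
```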

alifestd_mark_ot_mrca_polars(phylogeny_df: DataFrame, *, mark_as: str = 'ot_mrca') DataFrame

Appends columns characterizing the Most Recent Common Ancestor (MRCA) of the entire extant population at each taxon’s origin_time.

The output column name prefix can be changed via the mark_as parameter.

The extant population is defined in terms of active lineages: any branch of the tree existing at an origin_time which contains at least one descendant at or after that time.

New Columns

ot_mrca_id : int

The unique identifier of the MRCA for the population that was extant at this organism’s origin_time.

ot_mrca_time_of : int or float

The origin_time of that MRCA.

ot_mrca_time_since : int or float

The duration elapsed between the MRCA’s origin_time and this taxon’s origin_time.

Parameters

phylogeny_df : polars.DataFrame

The phylogeny as a dataframe in alife standard format.

Must represent an asexual phylogeny with a single root.

Returns

polars.DataFrame

The phylogeny with added ot_mrca_id, ot_mrca_time_of, and ot_mrca_time_since columns.

See Also

alifestd_mark_ot_mrca_asexual :

Pandas-based implementation.

alifestd_mark_prev_sibling_id_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'prev_sibling_id') DataFrame

Add column prev_sibling_id, the next-lowest id sharing the same parent.

The output column name can be changed via the mark_as parameter.

If no such sibling exists, marks own id.

A topological sort will be applied if phylogeny_df is not topologically sorted. Dataframe reindexing (e.g., df.index) may be applied.

The input dataframe is not mutated by this operation unless mutate is set to True. Even when mutate is True, the operation does not occur in place; still use the return value to get the transformed phylogeny dataframe.
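The marking rule can be sketched in plain Python as below. This is an illustration under assumptions rather than the library's implementation: ids equal row positions, roots list their own id as ancestor_id, and roots are marked with their own id. The name sketch_prev_sibling_ids is hypothetical.

```python
def sketch_prev_sibling_ids(ancestor_ids):
    """For each node, the next-lowest id with the same parent, else own id."""
    last_seen_child = {}  # parent id -> most recently visited child id
    prev_sibling = []
    for node, anc in enumerate(ancestor_ids):
        if anc == node:
            # Assumption: roots have no siblings and mark themselves.
            prev_sibling.append(node)
            continue
        # Visiting in ascending id order, the last child seen under the
        # same parent is the next-lowest sibling id.
        prev_sibling.append(last_seen_child.get(anc, node))
        last_seen_child[anc] = node
    return prev_sibling
```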

alifestd_mark_prev_sibling_id_polars(phylogeny_df: DataFrame, *, mark_as: str = 'prev_sibling_id') DataFrame

Add column prev_sibling_id, the next-lowest id sharing the same parent.

The output column name can be changed via the mark_as parameter.

If no such sibling exists, marks own id.

alifestd_mark_right_child_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'right_child_id') DataFrame

Add column right_child_id, containing for each node its largest-id child.

The output column name can be changed via the mark_as parameter.

Leaf nodes will be marked with their own id.

Dataframe reindexing (e.g., df.index) may be applied.

The input dataframe is not mutated by this operation unless mutate is set to True. Even when mutate is True, the operation does not occur in place; still use the return value to get the transformed phylogeny dataframe.
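A minimal sketch of the rule, assuming ids equal row positions and roots list their own id as ancestor_id. It is not the library's code, and the name sketch_right_child_ids is hypothetical.

```python
def sketch_right_child_ids(ancestor_ids):
    """For each node, its largest-id child; leaves keep their own id."""
    # Start with every node marked by its own id (the leaf case).
    right_child = list(range(len(ancestor_ids)))
    for node, anc in enumerate(ancestor_ids):
        if anc != node:
            # Nodes are visited in ascending id order, so the last
            # assignment for a parent is its largest-id child.
            right_child[anc] = node
    return right_child
```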

alifestd_mark_right_child_polars(phylogeny_df: DataFrame, *, mark_as: str = 'right_child_id') DataFrame

Add column right_child_id, containing for each node its largest-id child.

The output column name can be changed via the mark_as parameter.

Leaf nodes will be marked with their own id.

alifestd_mark_root_id(phylogeny_df: ~pandas.core.frame.DataFrame, mutate: bool = False, selector: ~typing.Callable = <built-in function min>, *, mark_as: str = 'root_id') DataFrame

Add column root_id, containing the id of entries’ ultimate ancestor.

The output column name can be changed via the mark_as parameter.

For sexual data, the field root_id is chosen according to the selection of callable selector over parents’ root_id values. Note that subsets within a connected component may be marked with different root_id values. To create a component id that is consistent within connected components, a backward pass could be performed that updates ancestors’ values if they are greater than that of each descendant.

The input dataframe is not mutated by this operation unless mutate is set to True. Even when mutate is True, the operation does not occur in place; still use the return value to get the transformed phylogeny dataframe.
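For the asexual case, root ids can be propagated in a single forward pass. The sketch below is illustrative only and covers neither the sexual case nor the selector argument. It assumes ids equal row positions, ancestors precede descendants, and roots list their own id as ancestor_id. The name sketch_root_ids is hypothetical.

```python
def sketch_root_ids(ancestor_ids):
    """Return the id of each node's ultimate ancestor."""
    root_ids = []
    for node, anc in enumerate(ancestor_ids):
        # A root is its own ultimate ancestor; any other node inherits
        # the value already computed for its ancestor.
        root_ids.append(node if anc == node else root_ids[anc])
    return root_ids
```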

alifestd_mark_root_id_polars(phylogeny_df: DataFrame, *, mark_as: str = 'root_id') DataFrame

Add column root_id, containing the id of entries’ ultimate ancestor.

The output column name can be changed via the mark_as parameter.

alifestd_mark_roots(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'is_root') DataFrame

Create column is_root to mark rows with no ancestor.

The output column name can be changed via the mark_as parameter.

The input dataframe is not mutated by this operation unless mutate is set to True. Even when mutate is True, the operation does not occur in place; still use the return value to get the transformed phylogeny dataframe.

alifestd_mark_roots_polars(phylogeny_df: DataFrame, *, mark_as: str = 'is_root') DataFrame

Create column is_root to mark rows with no ancestor.

The output column name can be changed via the mark_as parameter.

alifestd_mark_sackin_index_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'sackin_index') DataFrame

Add column sackin_index with Sackin index for each subtree.

The output column name can be changed via the mark_as parameter.

Computes the Sackin imbalance index of the subtree rooted at each node: the sum, over all leaves in that subtree, of each leaf's depth measured from the subtree's root.

For a node with children c_1, c_2, …, c_k:

sackin[node] = sum_{i} (sackin[c_i] + num_leaves[c_i])

This formula naturally supports both bifurcating trees and trees with polytomies.

Leaf nodes will have Sackin index 0. The root node contains the Sackin index for the entire tree.
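The recurrence above can be evaluated in one bottom-up pass. The following is a pure-Python sketch rather than the library's implementation. It assumes ids equal row positions, ancestors precede descendants, and roots list their own id as ancestor_id. The name sketch_sackin_index is hypothetical.

```python
def sketch_sackin_index(ancestor_ids):
    """Return the Sackin index of the subtree rooted at each node."""
    n = len(ancestor_ids)
    has_child = [False] * n
    for node, anc in enumerate(ancestor_ids):
        if anc != node:
            has_child[anc] = True

    # Leaves start with one leaf each; inner nodes accumulate below.
    num_leaves = [0 if has_child[i] else 1 for i in range(n)]
    sackin = [0] * n
    # Descending id order visits every child before its parent.
    for node in reversed(range(n)):
        anc = ancestor_ids[node]
        if anc != node:
            # sackin[parent] += sackin[child] + num_leaves[child]
            sackin[anc] += sackin[node] + num_leaves[node]
            num_leaves[anc] += num_leaves[node]
    return sackin
```

With root 0, children 1 and 2, and leaves 3 and 4 under node 1, the leaf depths are 1, 2 and 2, so the root's index is 5. A root with three leaf children (a polytomy) gets index 3.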

A topological sort will be applied if phylogeny_df is not topologically sorted. Dataframe reindexing (e.g., df.index) may be applied.

The input dataframe is not mutated by this operation unless mutate is set to True. Even when mutate is True, the operation does not occur in place; still use the return value to get the transformed phylogeny dataframe.

Parameters

phylogeny_df : pd.DataFrame

Alife standard DataFrame containing the phylogenetic relationships.

mutate : bool, optional

If True, side effects on the input DataFrame are allowed; still use the return value. Default is False.

Returns

pd.DataFrame

Phylogeny DataFrame with an additional column “sackin_index” containing the Sackin imbalance index for the subtree rooted at each node.

See Also

alifestd_mark_colless_index_asexual :

Colless index for strictly bifurcating trees.

alifestd_mark_colless_like_index_mdm_asexual :

Colless-like index that supports polytomies.

alifestd_mark_sackin_index_polars(phylogeny_df: DataFrame, *, mark_as: str = 'sackin_index') DataFrame

Add column sackin_index with Sackin index for each subtree.

The output column name can be changed via the mark_as parameter.

alifestd_mark_sample_tips_asexual(phylogeny_df: DataFrame, n_sample: int, mutate: bool = False, seed: int | None = None, *, mark_as: str = 'alifestd_mark_sample_tips_asexual') DataFrame

Mark a random subsample of n_sample tips.

Adds a boolean column mark_as indicating retained tips.

If n_sample is greater than the number of tips in the phylogeny, all tips are marked.

Only supports asexual phylogenies.

Deprecated since version 0.6.0: Use alifestd_mark_sample_tips_uniform_asexual instead.

alifestd_mark_sample_tips_canopy_asexual(phylogeny_df: DataFrame, n_sample: int | None = None, mutate: bool = False, criterion: str = 'origin_time', *, mark_as: str = 'alifestd_mark_sample_tips_canopy_asexual') DataFrame

Mark the n_sample leaves with the largest criterion values.

Adds a boolean column mark_as indicating retained tips.

If n_sample is None, it defaults to the number of leaves that share the maximum value of the criterion column. If n_sample is greater than or equal to the number of leaves in the phylogeny, all leaves are marked. Ties are broken arbitrarily.

Only supports asexual phylogenies.
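The selection rule described above can be sketched in plain Python. This is an illustration under assumptions, not the library's implementation: ids equal row positions, roots list their own id as ancestor_id, and ties are broken by whatever order the sort happens to produce. The name sketch_canopy_mask is hypothetical.

```python
def sketch_canopy_mask(ancestor_ids, criterion, n_sample=None):
    """Mark the n_sample leaves with the largest criterion values."""
    n = len(ancestor_ids)
    is_leaf = [True] * n
    for node, anc in enumerate(ancestor_ids):
        if anc != node:
            is_leaf[anc] = False
    leaves = [i for i in range(n) if is_leaf[i]]

    if n_sample is None:
        # Default: as many leaves as share the maximum criterion value.
        top = max(criterion[i] for i in leaves)
        n_sample = sum(criterion[i] == top for i in leaves)

    # Slicing past the end simply keeps every leaf.
    ranked = sorted(leaves, key=lambda i: criterion[i], reverse=True)
    chosen = set(ranked[:n_sample])
    return [i in chosen for i in range(n)]
```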

Parameters

phylogeny_df : pandas.DataFrame

The phylogeny as a dataframe in alife standard format.

Must represent an asexual phylogeny.

n_sample : int, optional

Number of tips to mark. If None, defaults to the count of leaves with the maximum criterion value.

mutate : bool, default False

Are side effects on the input argument phylogeny_df allowed?

criterion : str, default “origin_time”

Column name used to rank leaves. The n_sample leaves with the largest values in this column are marked. Ties are broken arbitrarily.

mark_as : str, default “alifestd_mark_sample_tips_canopy_asexual”

Column name for the boolean mark.

Raises

ValueError

If criterion is not a column in phylogeny_df.

Returns

pandas.DataFrame

The phylogeny with an added boolean mark column.

alifestd_mark_sample_tips_canopy_polars(phylogeny_df: DataFrame, n_sample: int | None = None, criterion: str | Expr = 'origin_time', *, mark_as: str = 'alifestd_mark_sample_tips_canopy_polars') DataFrame

Mark the n_sample leaves with the largest criterion values.

Adds a boolean column mark_as indicating retained tips.

If n_sample is None, it defaults to the number of leaves that share the maximum value of the criterion column. If n_sample is greater than or equal to the number of leaves in the phylogeny, all leaves are marked. Ties are broken arbitrarily.

Only supports asexual phylogenies.

Parameters

phylogeny_df : polars.DataFrame

The phylogeny as a dataframe in alife standard format.

Must represent an asexual phylogeny.

n_sample : int, optional

Number of tips to mark. If None, defaults to the count of leaves with the maximum criterion value.

criterion : str or polars.Expr, default “origin_time”

Column name or polars expression used to rank leaves. The n_sample leaves with the largest values are marked. Ties are broken arbitrarily.

mark_as : str, default “alifestd_mark_sample_tips_canopy_polars”

Column name for the boolean mark.

Raises

NotImplementedError

If phylogeny_df has no “ancestor_id” column.

ValueError

If criterion is not a column in phylogeny_df.

Returns

polars.DataFrame

The phylogeny with an added boolean mark column.

See Also

alifestd_mark_sample_tips_canopy_asexual :

Pandas-based implementation.

alifestd_mark_sample_tips_clade_asexual(phylogeny_df: DataFrame, n_sample: int, mutate: bool = False, seed: int | None = None, *, mark_as: str = 'alifestd_mark_sample_tips_clade_asexual') DataFrame

Mark tips belonging to a randomly sampled clade of at most n_sample tips.

Adds a boolean column mark_as indicating retained tips. Candidate clades are sampled proportionally to their size.

If n_sample is greater than the number of tips in the phylogeny, all tips are marked.

Only supports asexual phylogenies.

alifestd_mark_sample_tips_clade_polars(phylogeny_df: DataFrame, n_sample: int, seed: int | None = None, *, mark_as: str = 'alifestd_mark_sample_tips_clade_polars') DataFrame

Mark tips belonging to a randomly sampled clade of at most n_sample tips.

Adds a boolean column mark_as indicating retained tips. Candidate clades are sampled proportionally to their size.

If n_sample is greater than the number of tips in the phylogeny, all tips are marked.

Only supports asexual phylogenies.

Parameters

phylogeny_df : polars.DataFrame

The phylogeny as a dataframe in alife standard format.

Must represent an asexual phylogeny.

n_sample : int

Number of tips to mark.

seed : int, optional

Integer seed for deterministic behavior.

mark_as : str, default “alifestd_mark_sample_tips_clade_polars”

Column name for the boolean mark.

Raises

NotImplementedError

If phylogeny_df has no “ancestor_id” column or if ids are non-contiguous or not topologically sorted.

Returns

polars.DataFrame

The phylogeny with an added boolean mark column.

See Also

alifestd_mark_sample_tips_clade_asexual :

Pandas-based implementation.

alifestd_mark_sample_tips_lineage_asexual(phylogeny_df: ~pandas.core.frame.DataFrame, n_sample: int, mutate: bool = False, seed: int | None = None, *, criterion_delta: str = 'origin_time', criterion_target: str = 'origin_time', progress_wrap: ~typing.Callable = <function <lambda>>, mark_as: str = 'alifestd_mark_sample_tips_lineage_asexual') DataFrame

Mark the n_sample leaves closest to the lineage of a target leaf.

Adds a boolean column mark_as indicating retained tips.

Selects a target leaf as the leaf with the largest criterion_target value (ties broken randomly). For each leaf, the most recent common ancestor (MRCA) with the target leaf is identified and the “off-lineage delta” is computed as the absolute difference between the leaf’s criterion_delta value and its MRCA’s criterion_delta value. The n_sample leaves with the smallest off-lineage deltas are marked.

If n_sample is greater than or equal to the number of leaves in the phylogeny, all leaves are marked. Ties in off-lineage delta are broken arbitrarily.

Only supports asexual phylogenies.
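The off-lineage delta described above can be sketched in plain Python. This is a sketch under assumptions, not the library's implementation: the tree has a single root, ids equal row positions, roots list their own id as ancestor_id, and criteria are passed as lists rather than column names. The name sketch_lineage_mask is hypothetical.

```python
import random


def sketch_lineage_mask(
    ancestor_ids, criterion_delta, criterion_target, n_sample, seed=None
):
    """Mark the n_sample leaves closest to the target leaf's lineage."""
    n = len(ancestor_ids)
    is_leaf = [True] * n
    for node, anc in enumerate(ancestor_ids):
        if anc != node:
            is_leaf[anc] = False
    leaves = [i for i in range(n) if is_leaf[i]]

    # Target leaf: largest criterion_target value, ties broken randomly.
    top = max(criterion_target[i] for i in leaves)
    candidates = [i for i in leaves if criterion_target[i] == top]
    target = random.Random(seed).choice(candidates)

    # The target's lineage: the target plus all of its ancestors.
    lineage = {target}
    node = target
    while ancestor_ids[node] != node:
        node = ancestor_ids[node]
        lineage.add(node)

    def off_lineage_delta(leaf):
        # Climb until reaching the lineage; that node is the MRCA.
        # Terminates because the single root is on the lineage.
        mrca = leaf
        while mrca not in lineage:
            mrca = ancestor_ids[mrca]
        return abs(criterion_delta[leaf] - criterion_delta[mrca])

    ranked = sorted(leaves, key=off_lineage_delta)
    chosen = set(ranked[:n_sample])
    return [i in chosen for i in range(n)]
```

The target leaf itself always has delta zero, so it is retained whenever n_sample is at least one.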

Parameters

phylogeny_df : pandas.DataFrame

The phylogeny as a dataframe in alife standard format.

Must represent an asexual phylogeny.

n_sample : int

Number of tips to mark.

mutate : bool, default False

Are side effects on the input argument phylogeny_df allowed?

seed : int, optional

Random seed for reproducible target-leaf selection when there are ties in criterion_target.

criterion_delta : str, default “origin_time”

Column name used to compute the off-lineage delta for each leaf.

criterion_target : str, default “origin_time”

Column name used to select the target leaf.

progress_wrap : Callable, optional

Pass tqdm or equivalent to display a progress bar.

mark_as : str, default “alifestd_mark_sample_tips_lineage_asexual”

Column name for the boolean mark.

Raises

ValueError

If criterion_delta or criterion_target is not a column in phylogeny_df.

Returns

pandas.DataFrame

The phylogeny with an added boolean mark column.

alifestd_mark_sample_tips_lineage_polars(phylogeny_df: ~polars.dataframe.frame.DataFrame, n_sample: int, seed: int | None = None, *, criterion_delta: str | ~polars.expr.expr.Expr = 'origin_time', criterion_target: str | ~polars.expr.expr.Expr = 'origin_time', progress_wrap: ~typing.Callable = <function <lambda>>, mark_as: str = 'alifestd_mark_sample_tips_lineage_polars') DataFrame

Mark the n_sample leaves closest to the lineage of a target leaf.

Adds a boolean column mark_as indicating retained tips.

Parameters

phylogeny_df : polars.DataFrame or polars.LazyFrame

The phylogeny as a dataframe in alife standard format.

Must represent an asexual phylogeny.

n_sample : int

Number of tips to mark.

seed : int, optional

Random seed for reproducible target-leaf selection.

criterion_delta : str or polars.Expr, default “origin_time”

Column name or polars expression used to compute the off-lineage delta for each leaf.

criterion_target : str or polars.Expr, default “origin_time”

Column name or polars expression used to select the target leaf.

progress_wrap : Callable, optional

Pass tqdm or equivalent to display a progress bar.

mark_as : str, default “alifestd_mark_sample_tips_lineage_polars”

Column name for the boolean mark.

Raises

NotImplementedError

If phylogeny_df has no “ancestor_id” column or if ids are non-contiguous or not topologically sorted.

ValueError

If criterion_delta or criterion_target is not a column in phylogeny_df.

Returns

polars.DataFrame

The phylogeny with an added boolean mark column.

See Also

alifestd_mark_sample_tips_lineage_asexual :

Pandas-based implementation.

alifestd_mark_sample_tips_lineage_stratified_asexual(phylogeny_df: ~pandas.core.frame.DataFrame, n_sample: int | None = None, mutate: bool = False, seed: int | None = None, *, criterion_delta: str = 'origin_time', criterion_stratify: str = 'origin_time', criterion_target: str = 'origin_time', n_tips_per_stratum: int = 1, progress_wrap: ~typing.Callable = <function <lambda>>, mark_as: str = 'alifestd_mark_sample_tips_lineage_stratified_asexual') DataFrame

Mark leaves per stratified group, chosen by proximity to the lineage of a target leaf.

Adds a boolean column mark_as indicating retained tips.

Only supports asexual phylogenies.

Parameters

phylogeny_df : pandas.DataFrame

The phylogeny as a dataframe in alife standard format.

Must represent an asexual phylogeny.

n_sample : int, optional

Desired number of retained tips. If None, every distinct criterion_stratify value forms its own group.

mutate : bool, default False

Are side effects on the input argument phylogeny_df allowed?

seed : int, optional

Random seed for reproducible target-leaf selection.

criterion_delta : str, default “origin_time”

Column name used to compute the off-lineage delta for each leaf.

criterion_stratify : str, default “origin_time”

Column name used to stratify leaves into groups.

criterion_target : str, default “origin_time”

Column name used to select the target leaf.

n_tips_per_stratum : int, default 1

Number of tips to retain per stratified group.

progress_wrap : Callable, optional

Pass tqdm or equivalent to display a progress bar.

mark_as : str, default “alifestd_mark_sample_tips_lineage_stratified_asexual”

Column name for the boolean mark.

Raises

ValueError

If criterion_delta, criterion_stratify, or criterion_target is not a column in phylogeny_df.

ValueError

If n_sample is not None and n_tips_per_stratum does not evenly divide n_sample.

Returns

pandas.DataFrame

The phylogeny with an added boolean mark column.

alifestd_mark_sample_tips_lineage_stratified_polars(phylogeny_df: ~polars.dataframe.frame.DataFrame, n_sample: int | None = None, seed: int | None = None, *, criterion_delta: str | ~polars.expr.expr.Expr = 'origin_time', criterion_stratify: str | ~polars.expr.expr.Expr = 'origin_time', criterion_target: str | ~polars.expr.expr.Expr = 'origin_time', n_tips_per_stratum: int = 1, progress_wrap: ~typing.Callable = <function <lambda>>, mark_as: str = 'alifestd_mark_sample_tips_lineage_stratified_polars') DataFrame

Mark leaves per stratified group, chosen by proximity to the lineage of a target leaf.

Adds a boolean column mark_as indicating retained tips.

Only supports asexual phylogenies.

Parameters

phylogeny_df : polars.DataFrame or polars.LazyFrame

The phylogeny as a dataframe in alife standard format.

Must represent an asexual phylogeny.

n_sample : int, optional

Desired number of retained tips. If None, every distinct criterion_stratify value forms its own group.

seed : int, optional

Random seed for reproducible target-leaf selection.

criterion_delta : str or polars.Expr, default “origin_time”

Column name or polars expression used to compute the off-lineage delta for each leaf.

criterion_stratify : str or polars.Expr, default “origin_time”

Column name or polars expression used to stratify leaves into groups.

criterion_target : str or polars.Expr, default “origin_time”

Column name or polars expression used to select the target leaf.

n_tips_per_stratum : int, default 1

Number of tips to retain per stratified group.

progress_wrap : Callable, optional

Pass tqdm or equivalent to display a progress bar.

mark_as : str, default “alifestd_mark_sample_tips_lineage_stratified_polars”

Column name for the boolean mark.

Raises

NotImplementedError

If phylogeny_df has no “ancestor_id” column or if ids are non-contiguous or not topologically sorted.

ValueError

If criterion_delta, criterion_stratify, or criterion_target is not a column in phylogeny_df.

ValueError

If n_sample is not None and n_tips_per_stratum does not evenly divide n_sample.

Returns

polars.DataFrame

The phylogeny with an added boolean mark column.

See Also

alifestd_mark_sample_tips_lineage_stratified_asexual :

Pandas-based implementation.

alifestd_mark_sample_tips_polars(phylogeny_df: DataFrame, n_sample: int, seed: int | None = None, *, mark_as: str = 'alifestd_mark_sample_tips_polars') DataFrame

Mark a random subsample of n_sample tips.

Adds a boolean column mark_as indicating retained tips.

If n_sample is greater than the number of tips in the phylogeny, all tips are marked.

Only supports asexual phylogenies.

Parameters

phylogeny_df : polars.DataFrame

The phylogeny as a dataframe in alife standard format.

Must represent an asexual phylogeny.

n_sample : int

Number of tips to mark.

seed : int, optional

Integer seed for deterministic behavior.

mark_as : str, default “alifestd_mark_sample_tips_polars”

Column name for the boolean mark.

Raises

NotImplementedError

If phylogeny_df has no “ancestor_id” column.

Returns

polars.DataFrame

The phylogeny with an added boolean mark column.

See Also

alifestd_mark_sample_tips_uniform_polars :

Preferred non-deprecated implementation.

alifestd_mark_sample_tips_asexual :

Pandas-based implementation.

Deprecated since version 0.6.0: Use alifestd_mark_sample_tips_uniform_polars instead.

alifestd_mark_sample_tips_uniform_asexual(phylogeny_df: DataFrame, n_sample: int, mutate: bool = False, seed: int | None = None, *, mark_as: str = 'alifestd_mark_sample_tips_uniform_asexual') DataFrame

Mark a random subsample of n_sample tips.

Adds a boolean column mark_as indicating retained tips.

If n_sample is greater than the number of tips in the phylogeny, all tips are marked.

Only supports asexual phylogenies.
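Uniform tip sampling amounts to drawing leaves without replacement. Below is a minimal sketch using the standard library's random module, not the library's implementation. It assumes ids equal row positions and roots list their own id as ancestor_id. The name sketch_uniform_tip_mask is hypothetical.

```python
import random


def sketch_uniform_tip_mask(ancestor_ids, n_sample, seed=None):
    """Mark a uniformly random subsample of at most n_sample tips."""
    n = len(ancestor_ids)
    is_leaf = [True] * n
    for node, anc in enumerate(ancestor_ids):
        if anc != node:
            is_leaf[anc] = False
    leaves = [i for i in range(n) if is_leaf[i]]

    # Cap the request so that oversized n_sample marks every tip.
    k = min(n_sample, len(leaves))
    chosen = set(random.Random(seed).sample(leaves, k))
    return [i in chosen for i in range(n)]
```

Passing the same seed twice yields the same mask, which is the sense in which the seed gives deterministic behavior in this sketch.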

alifestd_mark_sample_tips_uniform_polars(phylogeny_df: DataFrame, n_sample: int, seed: int | None = None, *, mark_as: str = 'alifestd_mark_sample_tips_uniform_polars') DataFrame

Mark a random subsample of n_sample tips.

Adds a boolean column mark_as indicating retained tips.

If n_sample is greater than the number of tips in the phylogeny, all tips are marked.

Only supports asexual phylogenies.

Parameters

phylogeny_df : polars.DataFrame

The phylogeny as a dataframe in alife standard format.

Must represent an asexual phylogeny.

n_sample : int

Number of tips to mark.

seed : int, optional

Integer seed for deterministic behavior.

mark_as : str, default “alifestd_mark_sample_tips_uniform_polars”

Column name for the boolean mark.

Raises

NotImplementedError

If phylogeny_df has no “ancestor_id” column.

Returns

polars.DataFrame

The phylogeny with an added boolean mark column.

See Also

alifestd_mark_sample_tips_uniform_asexual :

Pandas-based implementation.

alifestd_mark_sister_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'sister_id') DataFrame

Add column sister_id, containing the id of each node’s sibling.

The output column name can be changed via the mark_as parameter.

Root nodes will be marked with their own id. Phylogeny must be strictly bifurcating.

Dataframe reindexing (e.g., df.index) may be applied.

The input dataframe is not mutated by this operation unless mutate is set to True. Even when mutate is True, the operation does not occur in place; still use the return value to get the transformed phylogeny dataframe.
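In a strictly bifurcating tree each non-root node has exactly one sibling, so the marking reduces to swapping ids within each pair of children. The sketch below is illustrative and not the library's code. It assumes ids equal row positions and roots list their own id as ancestor_id. The name sketch_sister_ids is hypothetical.

```python
def sketch_sister_ids(ancestor_ids):
    """Return each node's sibling id; roots keep their own id."""
    children = {}
    for node, anc in enumerate(ancestor_ids):
        if anc != node:
            children.setdefault(anc, []).append(node)

    sister = list(range(len(ancestor_ids)))
    for pair in children.values():
        # Unpacking fails loudly if the tree is not strictly bifurcating.
        first, second = pair
        sister[first], sister[second] = second, first
    return sister
```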

alifestd_mark_sister_polars(phylogeny_df: DataFrame, *, mark_as: str = 'sister_id') DataFrame

Add column sister_id, containing the id of each node’s sibling.

The output column name can be changed via the mark_as parameter.

Root nodes will be marked with their own id.

alifestd_mask_descendants_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, ancestor_mask: ndarray) DataFrame

For given ancestor nodes, create a mask identifying those nodes and all descendants.

Ancestral nodes are identified by ancestor_mask corresponding to rows in phylogeny_df.

The mask is returned as a new column alifestd_mask_descendants_asexual in the output DataFrame.

A topological sort will be applied if phylogeny_df is not topologically sorted. Dataframe reindexing (e.g., df.index) may be applied.

The input dataframe is not mutated by this operation unless mutate is set to True. Even when mutate is True, the operation does not occur in place; still use the return value to get the transformed phylogeny dataframe.
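With topologically sorted rows, the mask can be propagated in one forward pass. Here is a minimal sketch, not the library's implementation. It assumes ids equal row positions, ancestors precede descendants, and roots list their own id as ancestor_id. The name sketch_mask_descendants is hypothetical.

```python
def sketch_mask_descendants(ancestor_ids, ancestor_mask):
    """Mark the flagged nodes together with all of their descendants."""
    mask = [bool(flag) for flag in ancestor_mask]
    for node, anc in enumerate(ancestor_ids):
        # The ancestor's entry is already final when the node is visited.
        mask[node] = mask[node] or mask[anc]
    return mask
```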

alifestd_mask_descendants_polars(phylogeny_df: DataFrame, *, ancestor_mask: ndarray) DataFrame

For given ancestor nodes, create a mask identifying those nodes and all descendants.

Ancestral nodes are identified by ancestor_mask corresponding to rows in phylogeny_df.

The mask is returned as a new column alifestd_mask_descendants_polars in the output DataFrame.

Only supports asexual phylogenies.

Parameters

phylogeny_df : polars.DataFrame

The phylogeny as a dataframe in alife standard format.

Must represent an asexual phylogeny with contiguous, topologically sorted ids and an ancestor_id column.

ancestor_mask : numpy.ndarray

Boolean array indicating ancestor nodes to propagate from.

Raises

NotImplementedError

If phylogeny_df has no “ancestor_id” column or if ids are non-contiguous or not topologically sorted.

Returns

polars.DataFrame

The input DataFrame with an additional boolean column alifestd_mask_descendants_polars.

alifestd_mask_monomorphic_clades_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, trait_mask: ndarray, trait_values: ndarray) DataFrame

Compute a mask marking “monomorphic” clades, where all members with a defined trait value share the same trait value.

Clades containing no members with a defined trait value are considered monomorphic. All leaf nodes are considered monomorphic.
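The monomorphic test can be sketched as a bottom-up merge of trait values. This is an illustration under assumptions, not the library's implementation: ids equal row positions, ancestors precede descendants, roots list their own id as ancestor_id, and a clade includes the node at its root. The name sketch_monomorphic_mask is hypothetical.

```python
def sketch_monomorphic_mask(ancestor_ids, trait_mask, trait_values):
    """Return True for nodes whose clade holds at most one trait value."""
    n = len(ancestor_ids)
    unset = object()  # sentinel: no defined trait value seen in the clade
    rep = [trait_values[i] if trait_mask[i] else unset for i in range(n)]
    mono = [True] * n

    # Descending id order merges every child into its parent before the
    # parent is itself merged upward.
    for node in reversed(range(n)):
        anc = ancestor_ids[node]
        if anc == node:
            continue  # roots have nothing to merge into
        if not mono[node]:
            mono[anc] = False
        elif rep[node] is not unset:
            if rep[anc] is unset:
                rep[anc] = rep[node]
            elif rep[anc] != rep[node]:
                mono[anc] = False
    return mono
```

Leaves are always marked True, and a clade with no defined trait values stays True, matching the two conventions stated above.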

Parameters

phylogeny_df : pd.DataFrame

DataFrame containing the phylogeny, including an ancestor_id column.

mutate : bool, default=False

If False, operates on a copy of phylogeny_df; if True, modifies phylogeny_df in place (but still returns it).

trait_mask : np.ndarray

Boolean array marking the nodes that have a defined trait value, aligned with phylogeny_df.index.

trait_values : np.ndarray

Array of trait values aligned with phylogeny_df.index.

Returns

pd.DataFrame

alifestd_parse_ancestor_id(ancestor_list_str: str) int | None

Parse at most a single ancestor id from an ancestor_list field.

alifestd_parse_ancestor_ids(ancestor_list_str: str) List[int]

Parse ancestor ids from an ancestor_list field.
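As a rough illustration of what such parsing involves, the sketch below handles fields written as a bracketed, comma-separated list, with a none token standing in for a root's missing ancestor. That field layout is an assumption made for this example, and the function is not the library's implementation. The name sketch_parse_ancestor_ids is hypothetical.

```python
def sketch_parse_ancestor_ids(ancestor_list_str):
    """Parse ids from a string such as "[1, 2]" or "[none]"."""
    # Drop surrounding whitespace and the enclosing brackets.
    inner = ancestor_list_str.strip().strip("[]")
    tokens = [token.strip() for token in inner.split(",")]
    # Skip empty tokens and the assumed root token "none".
    return [int(tok) for tok in tokens if tok and tok.lower() != "none"]
```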

alifestd_pipe_unary_ops(phylogeny_df: ~pandas.core.frame.DataFrame, *unary_ops: ~typing.Callable[[~pandas.core.frame.DataFrame], ~pandas.core.frame.DataFrame], progress_wrap: ~typing.Callable = <function <lambda>>) DataFrame

Pipe a phylogeny DataFrame through a sequence of unary operations.

Each operation in unary_ops is applied in order to the DataFrame.
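Conceptually this is a left fold over the operations. A minimal sketch with functools.reduce follows; it is not the library's implementation, and it works on any value rather than only DataFrames. The name sketch_pipe_unary_ops is hypothetical, and the identity default for progress_wrap is an assumption.

```python
from functools import reduce


def sketch_pipe_unary_ops(value, *unary_ops, progress_wrap=lambda ops: ops):
    """Apply each operation in order, feeding each result to the next."""
    # progress_wrap receives the iterable of operations, so a progress
    # bar wrapper can report one tick per operation.
    return reduce(lambda acc, op: op(acc), progress_wrap(unary_ops), value)
```

With no operations the input is returned unchanged, consistent with "zero or more callables".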

Parameters

phylogeny_df : pandas.DataFrame

The phylogeny as a dataframe in alife standard format.

*unary_ops : callable

Zero or more callables, each accepting and returning a DataFrame.

progress_wrap : callable, optional

Optional wrapper for unary_ops to provide progress feedback (e.g. tqdm).

Returns

pandas.DataFrame

The result of piping phylogeny_df through each operation in order.

See Also

alifestd_pipe_unary_ops_polars :

Polars-based implementation.

This function also accepts a polars.DataFrame, for which there is a separate delegated implementation.

alifestd_pipe_unary_ops_polars(phylogeny_df: ~polars.dataframe.frame.DataFrame, *unary_ops: ~typing.Callable[[~polars.dataframe.frame.DataFrame], ~polars.dataframe.frame.DataFrame], progress_wrap: ~typing.Callable = <function <lambda>>) DataFrame

Pipe a phylogeny DataFrame through a sequence of unary operations.

Each operation in unary_ops is applied in order to the DataFrame.

Parameters

phylogeny_df : polars.DataFrame

The phylogeny as a dataframe in alife standard format.

*unary_ops : callable

Zero or more callables, each accepting and returning a DataFrame.

progress_wrap : callable, optional

Optional wrapper for unary_ops to provide progress feedback (e.g. tqdm).

Returns

polars.DataFrame

The result of piping phylogeny_df through each operation in order.

See Also

alifestd_pipe_unary_ops :

Pandas-based implementation.

alifestd_prefix_roots(phylogeny_df: DataFrame, *, allow_id_reassign: bool = False, origin_time: Real | None = None, mutate: bool = False) DataFrame

Add new roots to the phylogeny, prefixing existing roots.

An origin time may be specified, in which case only roots with origin times past the specified time will be prefixed. If no origin time is specified, all roots will be prefixed.

The input dataframe is not mutated by this operation unless mutate is set to True. Even when mutate is True, the operation does not occur in place; still use the return value to get the transformed phylogeny dataframe.

alifestd_prefix_roots_polars(phylogeny_df: DataFrame, *, allow_id_reassign: bool = False, origin_time: Real | None = None) DataFrame

Add new roots to the phylogeny, prefixing existing roots.

An origin time may be specified, in which case only roots with origin times past the specified time will be prefixed. If no origin time is specified, all roots will be prefixed.

alifestd_prune_extinct_lineages_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, criterion: str = 'extant') DataFrame

Drop taxa without extant descendants.

The criterion column is used to determine extant taxa.

Fastest with records in working format. See alifestd_to_working_format.
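The pruning rule can be sketched as a bottom-up propagation of the extant flag. This is an illustrative sketch, not the library's code. It assumes ids equal row positions, ancestors precede descendants, roots list their own id as ancestor_id, and extant taxa themselves are retained. The name sketch_prune_extinct is hypothetical.

```python
def sketch_prune_extinct(ancestor_ids, extant):
    """Return the ids kept after dropping lineages with no extant taxa."""
    keep = [bool(flag) for flag in extant]
    # Descending id order reaches every descendant before its ancestor,
    # so a kept node always passes the flag upward in time.
    for node in reversed(range(len(ancestor_ids))):
        if keep[node]:
            keep[ancestor_ids[node]] = True
    return [node for node, kept in enumerate(keep) if kept]
```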

Parameters

phylogeny_df : pandas.DataFrame

The phylogeny as a dataframe in alife standard format.

Must represent an asexual phylogeny.

mutate : bool, default False

Are side effects on the input argument phylogeny_df allowed?

criterion : str, default “extant”

Column name used to determine extant taxa.

Raises

ValueError

If criterion is not a column in phylogeny_df.

Returns

pandas.DataFrame

The pruned phylogeny in alife standard format.

alifestd_prune_extinct_lineages_polars(phylogeny_df: DataFrame, *, criterion: str = 'extant') DataFrame

Drop taxa without extant descendants.

The criterion column is used to determine extant taxa.

Parameters

phylogeny_df : polars.DataFrame

The phylogeny as a dataframe in alife standard format.

Must represent an asexual phylogeny.

criterion : str, default “extant”

Column name used to determine extant taxa.

Raises

NotImplementedError

If phylogeny_df has no “ancestor_id” column.

NotImplementedError

If phylogeny_df has non-contiguous ids.

NotImplementedError

If phylogeny_df is not topologically sorted.

ValueError

If criterion is not a column in phylogeny_df.

Returns

polars.DataFrame

The pruned phylogeny in alife standard format.

See Also

alifestd_prune_extinct_lineages_asexual :

Pandas-based implementation.

alifestd_reroot_at_id_asexual(phylogeny_df: DataFrame, new_root_id: int, mutate: bool = False) DataFrame

Reroot phylogeny, preserving topology.

Reverses the descendant-to-ancestor relationships of all ancestors of the new root. Does not update branch_lengths or edge_lengths columns if present.
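The reversal touches only the path from the new root up to the old root. A minimal sketch of that step, under assumptions: ids equal row positions and roots list their own id as ancestor_id. It is not the library's implementation, and the name sketch_reroot is hypothetical.

```python
def sketch_reroot(ancestor_ids, new_root_id):
    """Return ancestor ids after rerooting at new_root_id."""
    rerooted = list(ancestor_ids)
    child = new_root_id
    node = ancestor_ids[child]
    rerooted[child] = child  # the new root points to itself
    while node != child:
        next_node = ancestor_ids[node]
        rerooted[node] = child  # former ancestor now descends from child
        if next_node == node:
            break  # reached the old root
        child, node = node, next_node
    return rerooted
```

Nodes off that path keep their ancestor. The result is generally no longer topologically sorted by id, so a follow-up sort may be needed before other operations.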

Parameters

phylogeny_df : pandas.DataFrame

The phylogeny as a dataframe in alife standard format.

Must represent an asexual phylogeny.

new_root_idint

The ID of the node to use as the new root of the phylogeny.

mutatebool, default False

Are side effects on the input argument phylogeny_df allowed?

Returns

pandas.DataFrame

The rerooted phylogeny in alife standard format.
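
The edge reversal described above can be sketched on a bare list of ancestor ids. This is an illustrative sketch rather than the library's code: `reroot_at` is a hypothetical helper, and it assumes ids equal list positions and that a root lists itself as its own ancestor.

```python
def reroot_at(ancestor_ids, new_root_id):
    """Return a copy of ancestor_ids rerooted at new_root_id."""
    result = list(ancestor_ids)
    child = new_root_id
    parent = ancestor_ids[child]
    result[child] = child  # the new root becomes its own ancestor
    # Flip each edge on the path from the new root up to the old root.
    while parent != child:
        grandparent = ancestor_ids[parent]
        result[parent] = child
        child, parent = parent, grandparent
    return result


# Chain 0 <- 1 <- 2, with an extra leaf 3 under node 1.
rerooted = reroot_at([0, 0, 1, 1], new_root_id=2)
```

Only nodes on the path between the old and new roots change ancestor; node 3 still hangs under node 1.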

alifestd_reroot_at_id_polars(phylogeny_df: DataFrame, new_root_id: int) DataFrame

Reroot phylogeny at specified node id, preserving topology.

Reverses the descendant-to-ancestor relationships of all ancestors of the new root. Does not update branch_lengths or edge_lengths columns if present.

Parameters

phylogeny_df : polars.DataFrame

The phylogeny as a dataframe in alife standard format.

Must represent an asexual phylogeny.

new_root_id : int

The ID of the node to use as the new root of the phylogeny.

Returns

polars.DataFrame

The rerooted phylogeny in alife standard format.

alifestd_sample_triplet_comparisons_asexual(first_df: DataFrame, second_df: DataFrame, taxon_label_key: str, n: int = 1000, progress_wrap: Callable = <function <lambda>>, mutate: bool = False) DataFrame

Sample triplet comparisons between two asexual phylogenetic trees in alife standard format. Returns a DataFrame with the triplet categorizations and comparison results, as well as the corresponding data from the MRCA row within the first tree.

The MRCA row corresponds to the most recent common ancestor of two of the three taxa in the triplet.

Parameters

first_df : pd.DataFrame

The DataFrame representing the first phylogenetic tree.

second_df : pd.DataFrame

The DataFrame representing the second phylogenetic tree.

taxon_label_key : str

The key in the DataFrame to identify the taxon labels.

n : int, default 1000

The number of samples to take.

Corresponds to number of rows in the returned DataFrame.

progress_wrap : typing.Callable, optional

Pass tqdm or equivalent to display a progress bar.

mutate : bool, default False

If True, allows mutation of input DataFrames.

Returns

pd.DataFrame

A DataFrame with rows corresponding to sampled triplet comparisons and the following columns:

  • “triplet code, {first,second}”: the categorization of the triplet in the first or second tree.

  • “triplet match, {lax,lax/strict,strict,strict/lax}”: whether the triplet categorizations match with differing treatment of polytomies.

  • all columns from the first tree.

Notes

The core comparison is done by sampling triplets of taxa, categorizing them, and comparing these categorizations across the two trees, taking into account the strict and lax parameters for handling polytomies. See alifestd_categorize_triplet_asexual for details.

See Also

alifestd_categorize_triplet_asexual

alifestd_estimate_triplet_distance_asexual

alifestd_screen_trait_defined_clades_fisher_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mask_trait_absent: ndarray, mask_trait_present: ndarray) ndarray

Perform a screen for trait-defined clades based on Fisher’s exact test.

This function computes a Fisher’s exact test comparing the trait frequency (number of clade members with the trait, number of clade members without the trait) in a clade with its sister clade. Returned values are one-tailed p-values for the hypothesis that the trait frequency in the clade is greater than in the sister clade.

Root clades will be compared to themselves, as they have no sister clade. As such, root clades will take on p-values > 0.5.

The mask_trait_absent parameter can be used to exclude nodes from consideration, for instance &’ing with alifestd_mark_leaves can be used to only consider trait frequency among descendant leaves.

Returns a numpy array of one-tailed p-values with the same length as the input DataFrame. Returned array matches row order of the input DataFrame.
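
To make the one-tailed test concrete, the sketch below computes the "greater" p-value for a single clade and sister pair directly from the hypergeometric distribution. It is a standalone illustration, not the library's code path; `fisher_greater` and its argument names are hypothetical.

```python
from math import comb


def fisher_greater(clade_with, clade_without, sister_with, sister_without):
    """One-tailed Fisher's exact p-value that the trait is more frequent
    in the clade than in its sister clade."""
    n_clade = clade_with + clade_without
    n_trait = clade_with + sister_with
    n_total = n_clade + sister_with + sister_without
    upper = min(n_clade, n_trait)
    # Sum probabilities of all tables at least as extreme as the observed one.
    numerator = sum(
        comb(n_trait, k) * comb(n_total - n_trait, n_clade - k)
        for k in range(clade_with, upper + 1)
    )
    return numerator / comb(n_total, n_clade)


# A root clade compared with itself: 3 members with the trait, 1 without.
p_root = fisher_greater(3, 1, 3, 1)
# A clade that holds every trait-bearing member, against a trait-free sister.
p_enriched = fisher_greater(4, 0, 0, 4)
```

Comparing a clade with itself yields a p-value above 0.5, consistent with the note on root clades, whereas the fully enriched clade yields 1/70.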

alifestd_screen_trait_defined_clades_fitch_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mask_trait_absent: ndarray, mask_trait_present: ndarray, progress_wrap: Callable = <function <lambda>>) ndarray

Perform a maximum parsimony screen for trait-defined clades using Fitch’s algorithm.

The mask_trait_absent parameter can be used to exclude nodes from consideration, for instance &’ing with alifestd_mark_leaves can be used to only consider traits on leaves.

Pass tqdm or equivalent as progress_wrap to display a progress bar.

Default root state is assumed to be False.

Returns a numpy array of bool with the same length as the input DataFrame, with array elements indicating whether the corresponding clade passes the screen. Returned array matches row order of the input DataFrame.

alifestd_screen_trait_defined_clades_naive_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mask_trait_absent: ndarray, mask_trait_present: ndarray, defining_mut_thresh: float = 0.75, defining_mut_sister_thresh: float = 0.75) ndarray

Perform a naive screen for trait-defined clades.

This function checks if the trait frequency in a clade is above a certain threshold (defining_mut_thresh), and if the trait frequency in the sister clade is below a certain threshold (defining_mut_sister_thresh). Clades are defined as a node and all descendant nodes.

The mask_trait_absent parameter can be used to exclude nodes from consideration, for instance &’ing with alifestd_mark_leaves can be used to only consider trait frequency among descendant leaves.

Returns a numpy array of bool with the same length as the input DataFrame, with array elements indicating whether the corresponding clade passes the screen. Returned array matches row order of the input DataFrame.

alifestd_sort_children_asexual(phylogeny_df: DataFrame, criterion: str, reverse: bool = False, mutate: bool = False) DataFrame

Reorder rows so children are sorted by the given criterion column, gathering children into contiguous rows.

Reorders rows so that among siblings, they appear in order of ascending criterion column values. Set reverse=True to sort descending (higher values first).

The criterion column must already be present in the dataframe (e.g., added via alifestd_mark_num_leaves_asexual).

A topological sort will be applied if phylogeny_df is not topologically sorted. Dataframe reindexing (e.g., df.index) may be applied.

Input dataframe is not mutated by this operation unless mutate is set True. If mutate is set True, the operation does not occur in place; still use the return value to get the transformed phylogeny dataframe.

Note: after sorting, ids will no longer be contiguous with respect to row indices. Call alifestd_assign_contiguous_ids on the result to reassign contiguous ids if needed.

Parameters

phylogeny_df : pandas.DataFrame

The phylogeny as a dataframe in alife standard format.

criterion : str

Name of the column to sort children by.

reverse : bool, default False

If True, sort descending (higher values first).

mutate : bool, default False

If True, allow mutation of the input dataframe.

Returns

pandas.DataFrame

The phylogeny with rows reordered by sorted children traversal.

See Also

alifestd_sort_children_polars :

Polars-based implementation.

alifestd_ladderize_asexual :

Convenience wrapper that sorts by num_leaves.

alifestd_assign_contiguous_ids :

Reassign contiguous ids after reordering.
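
One row ordering that matches the description (siblings ranked by the criterion and placed in contiguous rows) is a breadth-first walk, sketched below on plain lists. The library's actual row order may differ; `sort_children_order` is a hypothetical helper that assumes ids equal list positions and roots list themselves as their own ancestor.

```python
from collections import deque


def sort_children_order(ancestor_ids, criterion, reverse=False):
    """Return a row order in which each node's children sit contiguously,
    ranked by their criterion value."""
    children = {}
    roots = []
    for idx, anc in enumerate(ancestor_ids):
        if anc == idx:
            roots.append(idx)
        else:
            children.setdefault(anc, []).append(idx)

    def ranked(ids):
        return sorted(ids, key=lambda i: criterion[i], reverse=reverse)

    order = []
    queue = deque(ranked(roots))
    while queue:
        node = queue.popleft()
        order.append(node)
        # Siblings enter the queue together, so they stay adjacent in order.
        queue.extend(ranked(children.get(node, [])))
    return order


# Root 0 has children 1 and 2; node 1 has children 3 and 4.
ancestor_ids = [0, 0, 0, 1, 1]
num_leaves = [3, 2, 1, 1, 1]
ascending = sort_children_order(ancestor_ids, num_leaves)
descending = sort_children_order(ancestor_ids, num_leaves, reverse=True)
```

In the ascending result the single-leaf child 2 precedes its two-leaf sibling 1, so row positions no longer equal ids, which is why a contiguous id reassignment may be wanted afterwards.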

alifestd_sort_children_polars(phylogeny_df: DataFrame, criterion: str | Expr, reverse: bool = False) DataFrame

Reorder rows so children are sorted by the given criterion column, gathering children into contiguous rows.

Reorders rows so that among siblings, they appear in order of ascending criterion column values. Set reverse=True to sort descending (higher values first).

The criterion column must already be present in the dataframe.

Note: after sorting, ids will no longer be contiguous with respect to row indices. Call alifestd_assign_contiguous_ids_polars on the result to reassign contiguous ids if needed.

Parameters

phylogeny_df : polars.DataFrame

The phylogeny as a dataframe in alife standard format.

Must represent an asexual phylogeny with contiguous ids and topologically sorted rows.

criterion : str or polars.Expr

Name of the column to sort children by, or a polars expression whose values determine the sort order.

reverse : bool, default False

If True, sort descending (higher values first).

Returns

polars.DataFrame

The phylogeny with rows reordered by sorted children traversal.

Raises

NotImplementedError

If ids are not contiguous or rows are not topologically sorted.

See Also

alifestd_sort_children_asexual :

Pandas-based implementation.

alifestd_ladderize_polars :

Convenience wrapper that sorts by num_leaves.

alifestd_assign_contiguous_ids_polars :

Reassign contiguous ids after reordering.

alifestd_splay_polytomies(phylogeny_df: DataFrame, mutate: bool = False) DataFrame

Use a simple splay strategy to resolve polytomies, converting them into bifurcations.

For example,

```
    1
  / | \
 2  3  4
```

becomes

```
    1
   / \
  2   5
     / \
    3   4
```

No adjustments to any branch length columns in phylogeny_df are performed. However, origin_time (as well as all other columns) of a polytomy’s parent node are duplicated in splayed-out nodes that resolve that polytomy. So, nodes added to perform the splaying-out will have zero-length subtending branches in this regard (i.e., their origin time will match their parent’s).

Input dataframe is not mutated by this operation unless mutate is set True. If mutate is set True, the operation does not occur in place; still use the return value to get the transformed phylogeny dataframe.
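
A minimal sketch of one splay strategy consistent with the example above: the first child stays in place and the remaining children are moved under a newly added node, repeating until no node has more than two children. `splay_polytomies` here is a hypothetical list-based helper, not the library function, and it assumes ids equal list positions and roots list themselves as their own ancestor.

```python
def splay_polytomies(ancestor_ids):
    """Return ancestor ids with every polytomy resolved into bifurcations."""
    result = list(ancestor_ids)
    children = {}
    for idx, anc in enumerate(result):
        if anc != idx:
            children.setdefault(anc, []).append(idx)

    pending = list(children)
    while pending:
        parent = pending.pop()
        kids = children[parent]
        if len(kids) <= 2:
            continue
        # Keep the first child in place; hang the rest under a new node.
        new_id = len(result)
        result.append(parent)
        for kid in kids[1:]:
            result[kid] = new_id
        children[parent] = [kids[0], new_id]
        children[new_id] = kids[1:]
        pending.append(new_id)  # the new node may itself be a polytomy
    return result


# Root 0 with children 1, 2, 3 (a trifurcation).
splayed = splay_polytomies([0, 0, 0, 0])
```

Here node 4 is the added node: it takes over children 2 and 3 and attaches to the root. In a dataframe setting, the added row would copy its parent's other columns, as described above.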

alifestd_splay_polytomies_polars(phylogeny_df: DataFrame) DataFrame

Use a simple splay strategy to resolve polytomies, converting them into bifurcations.

No adjustments to any branch length columns are performed. Nodes added to perform the splaying-out will have zero-length subtending branches.

alifestd_sum_origin_time_deltas_asexual(phylogeny_df: DataFrame, mutate: bool = False) Number

Sum differences between taxa origin times and their ancestors’ origin time.

Input dataframe is not mutated by this operation unless mutate is set True.
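
The quantity being summed can be sketched on plain lists as follows. `sum_origin_time_deltas` is a hypothetical helper; it assumes ids equal list positions and that roots list themselves as their own ancestor, so roots contribute a delta of zero.

```python
def sum_origin_time_deltas(ancestor_ids, origin_times):
    """Sum each taxon's origin time minus its ancestor's origin time."""
    return sum(
        origin_times[idx] - origin_times[anc]
        for idx, anc in enumerate(ancestor_ids)
    )


# Root 0 at time 0; children 1 and 2 at times 2 and 3; node 3 under 1 at time 5.
total = sum_origin_time_deltas([0, 0, 0, 1], [0.0, 2.0, 3.0, 5.0])
```

The branch contributions are 2, 3, and 3, so the total is 8.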

alifestd_sum_origin_time_deltas_polars(phylogeny_df: DataFrame) float

Sum origin_time_delta values.

alifestd_test_leaves_isomorphic_asexual(df1: DataFrame, df2: DataFrame, taxon_label: str, mutate: bool = False, progress_wrap: callable = <function <lambda>>) bool

Test if phylogenetic relationships between leaf nodes are topologically isomorphic between two phylogenies.

alifestd_test_leaves_isomorphic_polars(df1: DataFrame, df2: DataFrame, taxon_label: str) bool

Test if phylogenetic relationships between leaf nodes are topologically isomorphic between two phylogenies.

See Also

alifestd_test_leaves_isomorphic_asexual :

Pandas-based implementation.

alifestd_to_iplotx_pandas(phylogeny_df: DataFrame, mutate: bool = False) AlifestdIplotxShimPandas

Wrap a pandas phylogeny DataFrame for use with iplotx.

Parameters

phylogeny_df : pd.DataFrame

Asexual phylogeny in alife standard format with contiguous ids and topologically sorted rows.

mutate : bool, default False

If True, allow modification of the input dataframe.

Returns

AlifestdIplotxShimPandas

An iplotx-compatible tree provider that can be passed directly to iplotx.tree().

alifestd_to_iplotx_polars(phylogeny_df: DataFrame) AlifestdIplotxShimPolars

Wrap a polars phylogeny DataFrame for use with iplotx.

Parameters

phylogeny_df : polars.DataFrame

Asexual phylogeny in alife standard format with contiguous ids and topologically sorted rows.

Returns

AlifestdIplotxShimPolars

An iplotx-compatible tree provider that can be passed directly to iplotx.tree().

alifestd_to_working_format(phylogeny_df: DataFrame, mutate: bool = False) DataFrame

Re-encode phylogeny_df to facilitate efficient analysis and transformation operations.

The returned phylogeny dataframe will

  • be topologically sorted (i.e., organisms appear after all ancestors),

  • have contiguous ids (i.e., organisms’ ids correspond to row number),

  • contain an integer datatype ancestor_id column if the phylogeny is asexual (i.e., a more performant representation of ancestor_list).

Input dataframe is not mutated by this operation unless mutate is set True. If mutate is set True, the operation does not occur in place; still use the return value to get the transformed phylogeny dataframe.
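
The working-format properties can be illustrated with a small list-based sketch that topologically sorts the records and then renumbers ids by row position. `to_working_format` below is a hypothetical stand-in, not the library function; it assumes an asexual phylogeny in which roots list themselves as their own ancestor.

```python
def to_working_format(ids, ancestor_ids):
    """Return (new_ids, new_ancestor_ids, old_ids_in_new_row_order)."""
    parent_of = dict(zip(ids, ancestor_ids))
    placed = set()
    order = []
    for start in ids:
        # Climb until reaching an already-placed ancestor or a root,
        # then emit the climbed chain ancestor-first.
        chain = []
        node = start
        while node not in placed:
            chain.append(node)
            placed.add(node)
            if parent_of[node] == node:
                break
            node = parent_of[node]
        order.extend(reversed(chain))
    new_id = {old: new for new, old in enumerate(order)}
    return (
        list(range(len(order))),
        [new_id[parent_of[old]] for old in order],
        order,
    )


# Ids 7, 3, 9 in arbitrary row order: 3 is the root, 7 descends from 3, 9 from 7.
working = to_working_format([7, 3, 9], [3, 3, 7])
```

After conversion the root occupies row 0, every ancestor precedes its descendants, and each id equals its row number.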

alifestd_to_working_format_polars(phylogeny_df: DataFrame, keep_ancestor_list: bool = False) DataFrame

Re-encode phylogeny_df to facilitate efficient analysis and transformation operations.

The returned phylogeny dataframe will

  • be topologically sorted (i.e., organisms appear after all ancestors),

  • have contiguous ids (i.e., organisms’ ids correspond to row number),

  • contain an integer datatype ancestor_id column if the phylogeny is asexual (i.e., a more performant representation of ancestor_list).

Parameters

phylogeny_df : polars.DataFrame

The phylogeny as a dataframe in alife standard format.

keep_ancestor_list : bool, default False

If True and ancestor_list was present in the input, regenerate the ancestor_list column from the (reassigned) ancestor_id column. The column is dropped during processing in all cases; it is only restored when this flag is set and the input already had it.

See Also

alifestd_to_working_format :

Pandas-based implementation.

alifestd_topological_sensitivity_warned(*, insert: bool, delete: bool, update: bool) Callable

Decorator that emits a topological sensitivity warning before the wrapped function executes.

The first positional argument of the decorated function must be the phylogeny dataframe (pandas).

The decorated function gains two additional keyword arguments:

  • ignore_topological_sensitivity (bool, default False): If True, suppress the topological sensitivity warning.

  • drop_topological_sensitivity (bool, default False): If True, drop topology-sensitive columns from the result and suppress the warning.

Parameters

insert : bool

Whether the operation inserts new nodes.

delete : bool

Whether the operation deletes nodes.

update : bool

Whether the operation updates ancestor relationships.

Returns

typing.Callable

A decorator that wraps a function with topological sensitivity warning logic.

See Also

alifestd_topological_sensitivity_warned_polars :

Polars-based implementation.

alifestd_warn_topological_sensitivity :

Underlying warning function.
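
The decorator pattern described for this entry (a keyword-only factory whose wrapper gains two extra keyword arguments) might look roughly like the sketch below. Everything here is illustrative: `sensitivity_warned`, the `SENSITIVE_COLUMNS` set, and the use of a dict of lists in place of a dataframe are assumptions, not the library's internals.

```python
import functools
import warnings

SENSITIVE_COLUMNS = {"num_leaves", "num_descendants"}  # illustrative only


def sensitivity_warned(*, insert, delete, update):
    """Decorator factory sketch; the three flags are keyword-only bools."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(
            table,
            *args,
            ignore_topological_sensitivity=False,
            drop_topological_sensitivity=False,
            **kwargs,
        ):
            present = SENSITIVE_COLUMNS & set(table)
            quiet = ignore_topological_sensitivity or drop_topological_sensitivity
            if present and not quiet:
                warnings.warn(
                    f"{func.__name__} (insert={insert}, delete={delete}, "
                    f"update={update}) may invalidate {sorted(present)}"
                )
            result = func(table, *args, **kwargs)
            if drop_topological_sensitivity:
                # Remove the sensitive columns from the returned table.
                result = {
                    key: value
                    for key, value in result.items()
                    if key not in SENSITIVE_COLUMNS
                }
            return result

        return wrapper

    return decorator


@sensitivity_warned(insert=False, delete=True, update=False)
def drop_last_row(table):
    return {key: values[:-1] for key, values in table.items()}


table = {"id": [0, 1], "num_leaves": [1, 1]}
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    drop_last_row(table)  # warns: a sensitive column is present
    dropped = drop_last_row(table, drop_topological_sensitivity=True)
```

The first call emits one warning; the second suppresses it and returns a table without the sensitive column.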

alifestd_topological_sensitivity_warned_polars(*, insert: bool, delete: bool, update: bool) Callable

Decorator that emits a topological sensitivity warning before the wrapped function executes.

The first positional argument of the decorated function must be the phylogeny dataframe (polars).

The decorated function gains two additional keyword arguments:

  • ignore_topological_sensitivity (bool, default False): If True, suppress the topological sensitivity warning.

  • drop_topological_sensitivity (bool, default False): If True, drop topology-sensitive columns from the result and suppress the warning.

Parameters

insert : bool

Whether the operation inserts new nodes.

delete : bool

Whether the operation deletes nodes.

update : bool

Whether the operation updates ancestor relationships.

Returns

typing.Callable

A decorator that wraps a function with topological sensitivity warning logic.

See Also

alifestd_topological_sensitivity_warned :

Pandas-based implementation.

alifestd_warn_topological_sensitivity_polars :

Underlying warning function.

alifestd_topological_sort(phylogeny_df: DataFrame, mutate: bool = False) DataFrame

Sort rows so all organisms follow members of their ancestor_list.

Input dataframe is not mutated by this operation unless mutate is set True. If mutate is set True, the operation does not occur in place; still use the return value to get the transformed phylogeny dataframe.

alifestd_topological_sort_polars(phylogeny_df: DataFrame) DataFrame

Sort rows so all organisms follow members of their ancestor_id.

Uses contiguous id fast path when possible.

alifestd_try_add_ancestor_id_col(phylogeny_df: DataFrame, mutate: bool = False) DataFrame

Add an ancestor_id column to the input DataFrame if the phylogeny is asexual and the column does not already exist.

Input dataframe is not mutated by this operation unless mutate is set True. If mutate is set True, the operation does not occur in place; still use the return value to get the transformed phylogeny dataframe.

alifestd_try_add_ancestor_id_col_polars(phylogeny_df: DataFrame) DataFrame

Add an ancestor_id column to the input DataFrame if the phylogeny is asexual and the column does not already exist.

alifestd_try_add_ancestor_list_col(phylogeny_df: DataFrame_T, root_ancestor_token: str = 'none', mutate: bool = False) DataFrame_T

Add an ancestor_list column to the input DataFrame if the column does not already exist.

Input dataframe is not mutated by this operation unless mutate is set True. If mutate is set True, the operation does not occur in place; still use the return value to get the transformed phylogeny dataframe.

This function also accepts a polars.DataFrame, for which there is a separate delegated implementation.

See Also

alifestd_make_ancestor_list_col

alifestd_try_add_ancestor_list_col_polars(phylogeny_df: DataFrame, root_ancestor_token: str = 'none', mutate: bool = False) DataFrame

Add an ancestor_list column to the input DataFrame if the column does not already exist.

Notes

Even if allowed by the mutate flag, no side effects occur on the input dataframe under the Polars implementation. The flag is included for API compatibility with the Pandas implementation.

See Also

alifestd_try_add_ancestor_list_col :

Pandas-based implementation.

alifestd_ultrametricize(phylogeny_df: DataFrame, mutate: bool = False, *, method: Literal['extend'] = 'extend') DataFrame

Adjust tip origin_time values so all tips share the same time.

With method="extend", each tip’s origin_time is set to the maximum origin_time across all nodes. Internal node times are not modified.

Empty phylogenies are returned unchanged.

Input dataframe is not mutated by this operation unless mutate is set True. If mutate is set True, the operation does not occur in place; still use the return value to get the transformed phylogeny dataframe.
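
The "extend" behaviour can be sketched on plain lists: tips are raised to the latest origin time and internal nodes are left alone. `ultrametricize_extend` is a hypothetical helper that assumes ids equal list positions and roots list themselves as their own ancestor.

```python
def ultrametricize_extend(ancestor_ids, origin_times):
    """Return origin times with every tip moved to the maximum origin time."""
    if not origin_times:
        return []  # empty phylogenies are returned unchanged
    has_child = [False] * len(ancestor_ids)
    for idx, anc in enumerate(ancestor_ids):
        if anc != idx:
            has_child[anc] = True
    latest = max(origin_times)
    return [
        time if has_child[idx] else latest
        for idx, time in enumerate(origin_times)
    ]


# Root 0 with children 1 and 2; node 3 descends from 1. Tips are 2 and 3.
extended = ultrametricize_extend([0, 0, 0, 1], [0.0, 1.0, 2.0, 4.0])
```

Tip 2 moves from time 2 to time 4, while internal nodes 0 and 1 keep their original times.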

alifestd_ultrametricize_polars(phylogeny_df: DataFrame, *, method: Literal['extend'] = 'extend') DataFrame

Adjust tip origin_time values so all tips share the same time.

With method="extend", each tip’s origin_time is set to the maximum origin_time across all nodes. Internal node times are not modified.

Empty phylogenies are returned unchanged. The input must represent an asexual phylogeny when an is_leaf column is not already present.

See Also

alifestd_ultrametricize :

Pandas-based implementation.

alifestd_unfurl_lineage_asexual(phylogeny_df: DataFrame, leaf_id: int, mutate: bool = False) ndarray

List leaf_id and its ancestor id sequence through tree root.

The provided dataframe must be asexual.
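
Unfurling a lineage amounts to following ancestor pointers until the root is reached. A minimal list-based sketch, in which `unfurl_lineage` is a hypothetical helper that assumes ids equal list positions and a root lists itself as its own ancestor:

```python
def unfurl_lineage(ancestor_ids, leaf_id):
    """Return leaf_id followed by its ancestors, ending at the root."""
    lineage = [leaf_id]
    # Stop once the current node is its own ancestor, i.e. the root.
    while ancestor_ids[lineage[-1]] != lineage[-1]:
        lineage.append(ancestor_ids[lineage[-1]])
    return lineage


# Chain 0 <- 1 <- 2 <- 3.
lineage = unfurl_lineage([0, 0, 1, 2], leaf_id=3)
```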

alifestd_unfurl_traversal_inorder_asexual(phylogeny_df: DataFrame, mutate: bool = False) ndarray

List id values in inorder traversal order, with left children visited first.

The provided dataframe must be asexual and strictly bifurcating.
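
For a strictly bifurcating tree, a left-first traversal of this kind can be sketched iteratively as below, treating the smaller-id child as the left child. `unfurl_inorder` is a hypothetical list-based helper; it assumes a single root, ids equal to list positions, and a root that lists itself as its own ancestor.

```python
def unfurl_inorder(ancestor_ids):
    """Return ids in left-subtree, node, right-subtree order."""
    children = {}
    root = None
    for idx, anc in enumerate(ancestor_ids):
        if anc == idx:
            root = idx
        else:
            children.setdefault(anc, []).append(idx)

    order = []
    stack = []
    node = root
    while stack or node is not None:
        while node is not None:
            stack.append(node)
            kids = children.get(node)
            node = min(kids) if kids else None  # descend into the left child
        node = stack.pop()
        order.append(node)
        kids = children.get(node)
        node = max(kids) if kids else None  # then move to the right child
    return order


# Root 0 has children 1 and 2; node 1 has children 3 and 4.
inorder = unfurl_inorder([0, 0, 0, 1, 1])
```

Each internal node appears between its two subtrees, so node 1 sits between leaves 3 and 4, and the root sits between node 1's subtree and leaf 2.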

alifestd_unfurl_traversal_inorder_polars(phylogeny_df: DataFrame) ndarray

List node indices in inorder traversal order, with left children visited first.

Parameters

phylogeny_df : polars.DataFrame or polars.LazyFrame

The phylogeny as a dataframe in alife standard format.

Must represent an asexual, strictly bifurcating phylogeny with contiguous ids and topologically sorted rows.

Returns

np.ndarray

Index array giving inorder traversal order.

See Also

alifestd_unfurl_traversal_inorder_asexual :

Pandas-based implementation.

alifestd_unfurl_traversal_levelorder_asexual(phylogeny_df: DataFrame, mutate: bool = False) ndarray

List id values in levelorder (BFS) traversal order.

The provided dataframe must be asexual.

Input dataframe is not mutated by this operation unless mutate is set True. If mutate is set True, the input may be modified as a side effect; the traversal order is given by the return value either way.
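
A levelorder traversal visits nodes one generation at a time using a queue. The sketch below is a list-based illustration with a hypothetical helper name, `unfurl_levelorder`, assuming ids equal list positions and roots list themselves as their own ancestor.

```python
from collections import deque


def unfurl_levelorder(ancestor_ids):
    """Return ids in breadth-first order, starting from the root(s)."""
    children = {}
    queue = deque()
    for idx, anc in enumerate(ancestor_ids):
        if anc == idx:
            queue.append(idx)
        else:
            children.setdefault(anc, []).append(idx)

    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        # Children join the back of the queue, behind the current level.
        queue.extend(children.get(node, []))
    return order


# Root 0 has children 1 and 3; node 1 has children 2 and 4.
levelorder = unfurl_levelorder([0, 0, 1, 0, 1])
```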

alifestd_unfurl_traversal_levelorder_polars(phylogeny_df: DataFrame) ndarray

List node indices in levelorder (BFS) traversal order.

Parameters

phylogeny_df : polars.DataFrame or polars.LazyFrame

The phylogeny as a dataframe in alife standard format.

Must represent an asexual phylogeny with contiguous ids and topologically sorted rows.

Returns

np.ndarray

Index array giving levelorder (BFS) traversal order.

See Also

alifestd_unfurl_traversal_levelorder_asexual :

Pandas-based implementation.

alifestd_unfurl_traversal_postorder_asexual(phylogeny_df: DataFrame, mutate: bool = False) ndarray

List id values in postorder traversal order.

The provided dataframe must be asexual.

alifestd_unfurl_traversal_postorder_contiguous_asexual(phylogeny_df: DataFrame, mutate: bool = False, child_order: Literal['asc', 'desc'] | None = None) ndarray

List node indices in DFS postorder traversal order, with subtree contiguity.

The provided dataframe must be asexual with contiguous ids and topologically sorted rows.

Input dataframe is not mutated by this operation unless mutate is set True. If mutate is set True, the input may be modified as a side effect; the traversal order is given by the return value either way.

Parameters

phylogeny_df : pd.DataFrame

Asexual phylogeny in alife standard format with contiguous ids and topologically sorted rows.

mutate : bool, default False

If True, allow modification of the input dataframe.

child_order : {“asc”, “desc”, None}, default None

Order in which siblings are visited when descending the tree. "asc" visits smallest-id child first, "desc" visits largest-id child first, and None uses an arbitrary (implementation-defined) order.

alifestd_unfurl_traversal_postorder_contiguous_polars(phylogeny_df: DataFrame) ndarray

List node indices in DFS postorder traversal order, with subtree contiguity.

Parameters

phylogeny_df : polars.DataFrame or polars.LazyFrame

The phylogeny as a dataframe in alife standard format.

Must represent an asexual phylogeny with contiguous ids and topologically sorted rows.

Returns

np.ndarray

Index array giving DFS postorder traversal order.

See Also

alifestd_unfurl_traversal_postorder_asexual :

Pandas-based implementation.

alifestd_unfurl_traversal_postorder_polars(phylogeny_df: DataFrame) ndarray

List node indices in postorder traversal order.

Parameters

phylogeny_df : polars.DataFrame or polars.LazyFrame

The phylogeny as a dataframe in alife standard format.

Must represent an asexual phylogeny with contiguous ids and topologically sorted rows.

Returns

np.ndarray

Index array giving postorder traversal order.

See Also

alifestd_unfurl_traversal_postorder_asexual :

Pandas-based implementation.

alifestd_unfurl_traversal_preorder_asexual(phylogeny_df: DataFrame, mutate: bool = False) ndarray

List id values in DFS preorder traversal order.

The provided dataframe must be asexual.

Input dataframe is not mutated by this operation unless mutate is set True. If mutate is set True, the input may be modified as a side effect; the traversal order is given by the return value either way.

alifestd_unfurl_traversal_preorder_polars(phylogeny_df: DataFrame) ndarray

List node indices in DFS preorder traversal order.

Parameters

phylogeny_df : polars.DataFrame or polars.LazyFrame

The phylogeny as a dataframe in alife standard format.

Must represent an asexual phylogeny with contiguous ids and topologically sorted rows.

Returns

np.ndarray

Index array giving DFS preorder traversal order.

See Also

alifestd_unfurl_traversal_preorder_asexual :

Pandas-based implementation.

alifestd_unfurl_traversal_semiorder_asexual(phylogeny_df: DataFrame, mutate: bool = False) ndarray

List id values in semiorder traversal order.

An inorder traversal where either left child (smaller id) or right child (larger id) may be visited first.

The provided dataframe must be asexual and strictly bifurcating.

alifestd_unfurl_traversal_semiorder_polars(phylogeny_df: DataFrame) ndarray

List node indices in semiorder traversal order.

An inorder traversal where either left child (smaller id) or right child (larger id) may be visited first.

Parameters

phylogeny_df : polars.DataFrame or polars.LazyFrame

The phylogeny as a dataframe in alife standard format.

Must represent an asexual, strictly bifurcating phylogeny with contiguous ids and topologically sorted rows.

Returns

np.ndarray

Index array giving semiorder traversal order.

See Also

alifestd_unfurl_traversal_semiorder_asexual :

Pandas-based implementation.

alifestd_unfurl_traversal_topological_asexual(phylogeny_df: DataFrame, mutate: bool = False) ndarray

List id values in topological traversal order.

Parents are visited before children. If the dataframe is already topologically sorted, the existing id order is returned directly. Otherwise, a topological ordering is computed.

The provided dataframe must be asexual.

Input dataframe is not mutated by this operation unless mutate is set True. If mutate is set True, the input may be modified as a side effect; the traversal order is given by the return value either way.

alifestd_unfurl_traversal_topological_polars(phylogeny_df: DataFrame) ndarray

List node indices in topological traversal order.

Parents are visited before children. If the dataframe is already topologically sorted, the existing row indices are returned directly. Otherwise, a topological ordering is computed.

Parameters

phylogeny_df : polars.DataFrame or polars.LazyFrame

The phylogeny as a dataframe in alife standard format.

Must represent an asexual phylogeny with contiguous ids.

Returns

np.ndarray

Index array giving topological traversal order.

See Also

alifestd_unfurl_traversal_topological_asexual :

Pandas-based implementation.

alifestd_validate(phylogeny_df: DataFrame, mutate: bool = False, diagnose: bool = True) bool

Is the phylogeny compliant with alife data standards?

Input dataframe is not mutated by this operation unless mutate is set True. If diagnose is set, the failing validation subcheck will emit a warning.

alifestd_warn_topological_sensitivity(phylogeny_df: DataFrame, caller: str, *, insert: bool, delete: bool, update: bool) None

Emit a warning if phylogeny_df contains columns that may be invalidated by topological operations.

Parameters

phylogeny_df : pandas.DataFrame

The phylogeny as a dataframe in alife standard format.

caller : str

Name of the calling function, included in the warning message.

insert : bool

Whether the operation inserts new nodes.

delete : bool

Whether the operation deletes nodes.

update : bool

Whether the operation updates ancestor relationships.

Input dataframe is not mutated by this operation.

See Also

alifestd_warn_topological_sensitivity_polars :

Polars-based implementation.

alifestd_warn_topological_sensitivity_polars(phylogeny_df: DataFrame | LazyFrame, caller: str, *, insert: bool, delete: bool, update: bool) None

Emit a warning if phylogeny_df contains columns that may be invalidated by topological operations.

Parameters

phylogeny_df : polars.DataFrame or polars.LazyFrame

The phylogeny as a dataframe in alife standard format.

caller : str

Name of the calling function, included in the warning message.

insert : bool

Whether the operation inserts new nodes.

delete : bool

Whether the operation deletes nodes.

update : bool

Whether the operation updates ancestor relationships.

See Also

alifestd_warn_topological_sensitivity :

Pandas-based implementation.