legacy
Functions
|
Add a new global root node that all existing roots point to. |
|
Add a new global root node that all existing roots point to. |
|
For all inner nodes, add a subtending unifurcation ("knuckle"). |
|
For all inner nodes, add a subtending unifurcation ("knuckle"). |
|
Create a zero-length branch with leaf node for each inner node. |
|
For all inner nodes, add a subtending unifurcation, adding a "nibling" leaf as the child of the knuckle. |
|
For all inner nodes, add a subtending unifurcation, adding a "nibling" leaf as the child of the knuckle. |
|
Concatenate independent phylogenies, reassigning organism ids to prevent collisions. |
Concatenate independent phylogenies, reassigning organism ids to prevent collisions. |
|
|
Convert phylogeny dataframe to Newick format. |
|
Convert phylogeny dataframe to Newick format. |
|
Reassign so each organism's id corresponds to its row number. |
Reassign so each organism's id corresponds to its row number. |
|
|
Set root_ancestor_token for "ancestor_list" column. |
Find ancestor ids of nodes that are lookback_n nodes away in the phylogeny. |
|
Find ancestor ids of nodes that precede each phylogeny node by at least lookback_origin_time_delta branch distance. |
|
Count how many nodes within each clade have a given trait. |
|
Calculate what fraction of nodes within each clade have a given trait. |
|
|
Calculate pairwise distances between all taxa via their MRCAs. |
|
Calculate pairwise distances between all taxa via their MRCAs. |
|
Calculate the Most Recent Common Ancestor (MRCA) taxon id for each pair of taxa. |
Calculate the Most Recent Common Ancestor (MRCA) taxon id for each pair of taxa. |
|
|
Calculate the Most Recent Common Ancestor (MRCA) taxon id for target_id and each other taxon. |
Calculate the Most Recent Common Ancestor (MRCA) taxon id for target_id and each other taxon. |
|
|
Count how many fewer inner nodes are contained in phylogeny than expected if strictly bifurcating. |
Count how many fewer inner nodes are contained in phylogeny than expected if strictly bifurcating. |
|
|
Assess the topological configuration of three ids in phylogeny_df. |
Return names of columns present in phylogeny_df that may be invalidated by topological operations such as collapsing unifurcations. |
|
Return names of columns present in phylogeny_df that may be invalidated by topological operations such as collapsing unifurcations. |
|
|
Sort rows so all organisms appear in chronological order, default origin_time. |
|
Sort rows so all organisms appear in chronological order, default origin_time. |
|
Coarsen a phylogeny by collapsing inner nodes within dilation windows. |
|
Coarsen a phylogeny by collapsing inner nodes within dilation windows. |
|
Pare record to bypass organisms outside mask. |
|
Condense consecutive phylogeny nodes sharing identical trait values, according to values in by column(s). |
Build per-column aggregation rules for asexual taxa coarsening. |
|
For any taxa with origin time preceding its parent's, set origin time to parent's origin time. |
|
|
Collapse entries masked by is_trunk column, keeping only the oldest root. |
|
Collapse entries masked by is_trunk column, keeping only the oldest root. |
|
Pare record to bypass organisms with one ancestor and one descendant. |
Pare record to bypass organisms with one ancestor and one descendant. |
|
|
Set root_ancestor_token for ancestor_list series. |
|
How many taxa are direct descendants of the given parent? |
How many taxa are direct descendants of the given parent? |
|
|
Count how many non-leaf nodes are contained in phylogeny. |
|
Count how many non-leaf nodes are contained in phylogeny. |
|
How many leaf nodes are contained in phylogeny? |
|
How many leaf nodes are contained in phylogeny? |
|
Count how many inner nodes have more than two descendant nodes. |
|
Count how many inner nodes have more than two descendant nodes. |
|
How many root nodes are contained in phylogeny? |
|
How many root nodes are contained in phylogeny? |
How many root nodes with one child are contained in phylogeny? |
|
How many root nodes with one child are contained in phylogeny? |
|
|
Count how many inner nodes have exactly one descendant node. |
|
Count how many inner nodes have exactly one descendant node. |
|
Delete entries masked by is_trunk column. |
Delete entries masked by is_trunk column. |
|
Pare record to bypass root nodes with only one descendant. |
|
Pare record to bypass root nodes with only one descendant. |
|
|
Create a subsample phylogeny containing n_downsample tips. |
Retain the n_downsample leaves with the largest criterion values and prune extinct lineages. |
|
Retain the n_downsample leaves with the largest criterion values and prune extinct lineages. |
|
Create a subsample phylogeny containing at most n_downsample tips, comprising a single clade within the original phylogeny. |
|
|
Create a subsample phylogeny containing at most n_downsample tips, comprising a single clade within the original phylogeny. |
Retain the n_downsample leaves closest to the lineage of a target leaf. |
|
Retain the n_downsample leaves closest to the lineage of a target leaf. |
|
Retain leaves per stratified group, chosen by proximity to the lineage of a target leaf. |
|
Retain leaves per stratified group, chosen by proximity to the lineage of a target leaf. |
|
|
Create a subsample phylogeny containing n_downsample tips. |
Create a subsample phylogeny containing n_downsample tips. |
|
Create a subsample phylogeny containing n_downsample tips. |
|
|
Drop columns from phylogeny_df that may be invalidated by topological operations such as collapsing unifurcations. |
Drop columns from phylogeny_df that may be invalidated by topological operations such as collapsing unifurcations. |
|
Estimate the triplet distance between two asexual phylogenetic trees in alife standard form, sampling sets of three leaf taxa and counting the fraction whose phylogenetic connectivity mismatches between trees. |
|
Return the id of a taxon with origin time preceding its parent's, if any are present. |
|
Return the id of a taxon with origin time preceding its parent's, if any are present. |
|
|
What ids are not listed in any ancestor_list? |
|
What ids are ancestor to no other ids? |
|
Find most recent common ancestor of leaf_ids. |
|
Find the pairwise distance between two taxa via their MRCA. |
|
Find the pairwise distance between two taxa via their MRCA. |
|
Find the Most Recent Common Ancestor of two taxa. |
|
Find the Most Recent Common Ancestor of two taxa. |
|
What ids have an empty ancestor_list? |
|
What ids have an empty ancestor_list? |
|
Convert Avida |
|
Convert Avida |
|
Convert a Newick format string to a phylogeny dataframe. |
|
Convert a Newick format string to a phylogeny dataframe. |
|
Are id values between 0 and len(phylogeny_df), in any order? |
|
Are id values between 0 and len(phylogeny_df), in any order? |
|
Do organisms' ids correspond to their row number? |
|
Do organisms' ids correspond to their row number? |
|
Do offspring have larger id values than ancestors? |
|
Do offspring have larger id values than ancestors? |
|
Does the phylogeny have two or more root organisms? |
|
Does the phylogeny have two or more root organisms? |
|
Do all organisms in the phylogeny have one or no immediate ancestor? |
|
Do all organisms in the phylogeny have one or no immediate ancestor? |
|
Do any organisms have `origin_time`s preceding members of their `ancestor_list`? |
Check if all taxa have origin times at or after their ancestor's origin time. |
|
|
Do rows appear in chronological order? |
Do rows appear in chronological order? |
|
|
Do any organisms in the phylogeny have more than one immediate ancestor? |
|
Do any organisms in the phylogeny have more than one immediate ancestor? |
Are all organisms listed after members of their ancestor_list? |
|
Are all internal nodes strictly bifurcating (exactly 2 children)? |
|
|
Are all organisms listed after members of their ancestor_list? |
Are all organisms listed after members of their ancestor_list? |
|
|
Do all tips share the same origin_time (within |
|
Do all tips share the same origin_time (within |
|
Test if phylogeny_df is an asexual phylogeny in working format. |
|
Test if phylogeny_df is an asexual phylogeny in working format. |
|
Point all other roots to oldest root, measured by lowest origin_time (if available) or otherwise lowest id. |
|
Point all other roots to oldest root, measured by lowest origin_time (if available) or otherwise lowest id. |
|
Reorder rows so children are sorted by number of descendant leaves, gathering children into contiguous rows. |
|
Reorder rows so children are sorted by number of descendant leaves, gathering children into contiguous rows. |
|
Translate ancestor ids from a column of singleton `ancestor_list`s into a pure-integer series representation. |
|
Translate ancestor ids from a column of singleton `ancestor_list`s into a pure-integer series representation. |
|
Translate a column of integer ancestor id values into alife standard ancestor_list representation. |
|
Translate a column of integer ancestor id values into alife standard ancestor_list representation. |
Build a perfectly balanced bifurcating tree of given depth. |
|
Build a perfectly balanced bifurcating tree of given depth. |
|
|
Build a comb/caterpillar tree with n_leaves leaves. |
|
Build a comb/caterpillar tree with n_leaves leaves. |
|
Build a random bifurcating tree via edge-split (PDA) sampling. |
|
Build a random bifurcating tree via edge-split (PDA) sampling. |
|
Create an alife standard phylogeny dataframe with zero rows. |
|
Create an alife standard phylogeny dataframe with zero rows. |
|
Build a random bifurcating tree via leaf-split (Yule) sampling. |
|
Build a random bifurcating tree via leaf-split (Yule) sampling. |
|
Build a star tree with n_leaves leaves. |
|
Build a star tree with n_leaves leaves. |
Add column ancestor_origin_time. |
|
Add column ancestor_origin_time. |
|
|
Add column clade_duration, containing the difference between each node's origin_time and the maximum origin_time of its descendants. |
|
Add column clade_duration, containing the difference between each node's origin_time and the maximum origin_time of its descendants. |
Add column clade_duration_ratio_sister, containing the ratio of each clade's duration to that of its sister. |
|
Add column clade_duration_ratio_sister, containing the ratio of each clade's duration to that of its sister. |
|
|
Add column clade_faithpd, containing sum branch length among descendant nodes. |
|
Add column clade_faithpd, containing sum branch length among descendant nodes. |
Add column clade_fblr_growth_children, containing the coefficient of a fblr regression fit to origin times of the leaf descendants of each node. |
|
Add column clade_fblr_growth_children, containing the coefficient of a fblr regression fit to origin times of this clade's descendant leaves versus those of its sister clade. |
|
Add column clade_leafcount_ratio_sister, containing the ratio of each clade's leaf count to that of its sister. |
|
Add column clade_leafcount_ratio_sister, containing the ratio of each clade's leaf count to that of its sister. |
|
Add column clade_logistic_growth_children, containing the coefficient of a logistic regression fit to origin times of the leaf descendants of each node. |
|
Add column clade_logistic_growth_children, containing the coefficient of a logistic regression fit to origin times of this clade's descendant leaves versus those of its sister clade. |
|
Add column clade_nodecount_ratio_sister, containing the ratio of each clade size to that of its sister. |
|
Add column clade_nodecount_ratio_sister, containing the ratio of each clade size to that of its sister. |
|
Add column clade_subtended_duration, containing the difference between each node's ancestor's origin_time and the maximum origin_time of its descendants. |
|
Add column clade_subtended_duration, containing the difference between each node's ancestor's origin_time and the maximum origin_time of its descendants. |
|
|
Add column clade_subtended_duration_ratio_sister, containing the ratio of each clade's subtended duration to that of its sister. |
|
Add column clade_subtended_duration_ratio_sister, containing the ratio of each clade's subtended duration to that of its sister. |
|
Add column colless_index with Colless imbalance index for each subtree. |
Add column colless_index_corrected with the corrected Colless index for each subtree. |
|
Add column colless_index_corrected with the corrected Colless index for each subtree. |
|
|
Add column colless_index with Colless imbalance index for each subtree. |
Add column colless_like_index_mdm with Colless-like index using mean deviation from the median (MDM) as dissimilarity. |
|
Add column colless_like_index_mdm with Colless-like index using mean deviation from the median (MDM) as dissimilarity. |
|
Add column colless_like_index_sd with Colless-like index using sample standard deviation as dissimilarity. |
|
Add column colless_like_index_sd with Colless-like index using sample standard deviation as dissimilarity. |
|
Add column colless_like_index_var with Colless-like index using sample variance as dissimilarity. |
|
Add column colless_like_index_var with Colless-like index using sample variance as dissimilarity. |
|
|
Add column csr_children, a flat array of child ids grouped by parent according to CSR offsets from the csr_offsets column. |
|
Add column csr_children, a flat array of child ids grouped by parent according to CSR offsets from the csr_offsets column. |
|
Add column csr_offsets, the CSR offset where each node's children begin in the corresponding csr_children array. |
|
Add column csr_offsets, the CSR offset where each node's children begin in the corresponding csr_children array. |
|
Add column first_child_id, the smallest-id child of each node. |
|
Add column first_child_id, the smallest-id child of each node. |
|
Add column is_left_child, containing for each node whether it is the smaller-id child. |
|
Add column is_left_child, containing for each node whether it is the smaller-id child. |
|
Add column is_right_child, containing for each node whether it is the larger-id child. |
|
Add column is_right_child, containing for each node whether it is the larger-id child. |
|
What rows are ancestor to no other row? |
|
Add column is_leaf marking rows that are ancestor to no other row. |
|
Add column left_child, containing for each node its smallest-id child. |
|
Add column left_child_id, containing for each node its smallest-id child. |
|
Add column with maximum of |
|
Add column with maximum of |
|
Add column with minimum of |
|
Add column with minimum of |
|
Add column with cumulative product of |
|
Add column with cumulative product of |
|
Add column with cumulative sum of |
|
Add column with cumulative sum of |
Add column max_descendant_origin_time, excluding self. |
|
Add column max_descendant_origin_time, excluding self. |
|
|
Add column next_sibling_id, the next-highest id sharing the same parent. |
|
Add column next_sibling_id, the next-highest id sharing the same parent. |
|
Add column node_depth, counting the number of nodes between a node and the root. |
|
Add column node_depth, counting the number of nodes between a node and the root. |
|
Add column num_children, counting for each node the number of nodes it is parent to. |
|
Add column num_children, counting for each node the number of nodes it is parent to. |
|
Add column num_descendants, excluding self. |
|
Add column num_descendants, excluding self. |
|
Add column num_leaves with count of all descendant leaves, including self if a leaf. |
|
Add column num_leaves with count of all descendant leaves, including self if a leaf. |
Mark the number of leaves descendant from each node's siblings. |
|
Mark the number of leaves descendant from each node's siblings. |
|
Add column num_preceding_leaves with count of all leaves occurring before the present node in an inorder traversal. |
|
Add column num_preceding_leaves with count of all leaves occurring before the present node in an inorder traversal. |
|
|
Point all other roots to oldest root, measured by lowest origin_time (if available) or otherwise lowest id. |
|
Point all other roots to oldest root, measured by lowest origin_time (if available) or otherwise lowest id. |
Add columns origin_time_delta and ancestor_origin_time. |
|
Add columns origin_time_delta and ancestor_origin_time. |
|
|
Appends columns characterizing the Most Recent Common Ancestor (MRCA) of the entire extant population at each taxon's origin_time. |
|
Appends columns characterizing the Most Recent Common Ancestor (MRCA) of the entire extant population at each taxon's origin_time. |
|
Add column prev_sibling_id, the next-lowest id sharing the same parent. |
|
Add column prev_sibling_id, the next-lowest id sharing the same parent. |
|
Add column right_child, containing for each node its largest-id child. |
|
Add column right_child_id, containing for each node its largest-id child. |
|
Add column root_id, containing the id of entries' ultimate ancestor. |
|
Add column root_id, containing the id of entries' ultimate ancestor. |
|
Create column is_root to mark rows with no ancestor. |
|
Create column is_root to mark rows with no ancestor. |
|
Add column sackin_index with Sackin index for each subtree. |
|
Add column sackin_index with Sackin index for each subtree. |
|
Mark a random subsample of n_sample tips. |
Mark the n_sample leaves with the largest criterion values. |
|
Mark the n_sample leaves with the largest criterion values. |
|
Mark tips belonging to a randomly sampled clade of at most n_sample tips. |
|
Mark tips belonging to a randomly sampled clade of at most n_sample tips. |
|
Mark the n_sample leaves closest to the lineage of a target leaf. |
|
Mark the n_sample leaves closest to the lineage of a target leaf. |
|
Mark leaves per stratified group, chosen by proximity to the lineage of a target leaf. |
|
Mark leaves per stratified group, chosen by proximity to the lineage of a target leaf. |
|
|
Mark a random subsample of n_sample tips. |
Mark a random subsample of n_sample tips. |
|
Mark a random subsample of n_sample tips. |
|
|
Add column sister, containing the id of each node's sibling. |
|
Add column sister_id, containing the id of each node's sibling. |
|
For given ancestor nodes, create a mask identifying those nodes and all descendants. |
For given ancestor nodes, create a mask identifying those nodes and all descendants. |
|
Compute a mask marking "monomorphic" clades where all members with a trait defined value share the same trait value. |
|
|
Parse at most a single ancestor id from an ancestor_list field. |
|
Parse ancestor ids from an ancestor_list field. |
|
Pipe a phylogeny DataFrame through a sequence of unary operations. |
|
Pipe a phylogeny DataFrame through a sequence of unary operations. |
|
Add new roots to the phylogeny, prefixing existing roots. |
|
Add new roots to the phylogeny, prefixing existing roots. |
Drop taxa without extant descendants. |
|
Drop taxa without extant descendants. |
|
|
Reroot phylogeny, preserving topology. |
|
Reroot phylogeny at specified node id, preserving topology. |
Sample triplet comparisons between two asexual phylogenetic trees in alife standard form, creating a DataFrame with the triplet categorizations and comparison results as well as corresponding data from MRCA row within the first tree. |
|
Perform a screen for trait-defined clades based on Fisher's exact test. |
|
Perform a maximum parsimony screen for trait-defined clades using Fitch's algorithm. |
|
Perform a naive screen for trait-defined clades. |
|
|
Reorder rows so children are sorted by the given criterion column, gathering children into contiguous rows. |
|
Reorder rows so children are sorted by the given criterion column, gathering children into contiguous rows. |
|
Use a simple splay strategy to resolve polytomies, converting them into bifurcations. |
|
Use a simple splay strategy to resolve polytomies, converting them into bifurcations. |
Sum differences between taxa origin times and their ancestors' origin time. |
|
Sum origin_time_delta values. |
|
|
Test if phylogenetic relationships between leaf nodes are topologically isomorphic between two phylogenies. |
|
Test if phylogenetic relationships between leaf nodes are topologically isomorphic between two phylogenies. |
|
Wrap a pandas phylogeny DataFrame for use with iplotx. |
|
Wrap a polars phylogeny DataFrame for use with iplotx. |
|
Re-encode phylogeny_df to facilitate efficient analysis and transformation operations. |
|
Re-encode phylogeny_df to facilitate efficient analysis and transformation operations. |
Decorator that emits a topological sensitivity warning before the wrapped function executes. |
|
Decorator that emits a topological sensitivity warning before the wrapped function executes. |
|
|
Sort rows so all organisms follow members of their ancestor_list. |
|
Sort rows so all organisms follow members of their ancestor_id. |
|
Add an ancestor_id column to the input DataFrame if the phylogeny is asexual and the column does not already exist. |
Add an ancestor_id column to the input DataFrame if the phylogeny is asexual and the column does not already exist. |
|
|
Add an ancestor_list column to the input DataFrame if the column does not already exist. |
Add an ancestor_list column to the input DataFrame if the column does not already exist. |
|
|
Adjust tip origin_time values so all tips share the same time. |
|
Adjust tip origin_time values so all tips share the same time. |
|
List leaf_id and its ancestor id sequence through tree root. |
List id values in semiorder traversal order, with left children visited first. |
|
List node indices in inorder traversal order, with left children visited first. |
|
List id values in levelorder (BFS) traversal order. |
|
List node indices in levelorder (BFS) traversal order. |
|
List id values in postorder traversal order. |
|
List node indices in DFS postorder traversal order, with subtree contiguity. |
|
List node indices in DFS postorder traversal order, with subtree contiguity. |
|
List node indices in postorder traversal order. |
|
List id values in DFS preorder traversal order. |
|
List node indices in DFS preorder traversal order. |
|
List id values in semiorder traversal order. |
|
List node indices in semiorder traversal order. |
|
List id values in topological traversal order. |
|
List node indices in topological traversal order. |
|
|
Is the phylogeny compliant to alife data standards? |
Emit a warning if phylogeny_df contains columns that may be invalidated by topological operations. |
|
Emit a warning if phylogeny_df contains columns that may be invalidated by topological operations. |
Classes
Numpy-backed iplotx |
|
Iplotx |
|
Iplotx |
- class AlifestdIplotxShimNumpy
Numpy-backed iplotx TreeDataProvider for alife-standard data.
This class assumes contiguous ids (id == row index) and topologically sorted rows (ancestors appear before descendants).
Parameters
- ancestor_ids : np.ndarray
Integer array of ancestor ids; roots satisfy ancestor_ids[i] == i.
- names : np.ndarray, optional
Per-node name strings.
- branch_lengths : np.ndarray, optional
Per-node branch lengths (edge from parent to this node).
- __init__(ancestor_ids: ndarray, names: ndarray | None = None, branch_lengths: ndarray | None = None) None[source]
- get_subtree(node: _AlifestdNode) AlifestdIplotxShimNumpy[source]
- class AlifestdIplotxShimPandas
Iplotx TreeDataProvider for pandas alife-standard dataframes.
The dataframe must be asexual with contiguous ids and topologically sorted rows. An ancestor_id column will be derived from ancestor_list if needed.
Parameters
- tree : pd.DataFrame
Pandas phylogeny dataframe in alife standard format.
- mutate : bool, default False
If True, allow modification of the input dataframe.
- class AlifestdIplotxShimPolars
Iplotx TreeDataProvider for polars alife-standard dataframes.
The dataframe must be asexual with contiguous ids and topologically sorted rows.
Parameters
- tree : polars.DataFrame
Polars phylogeny dataframe in alife standard format.
- alifestd_add_global_root(phylogeny_df: DataFrame, mutate: bool = False, root_attrs: Mapping[str, Any] = mappingproxy({})) DataFrame
Add a new global root node that all existing roots point to.
The new root node will have columns id, ancestor_id (if applicable), ancestor_list (if applicable), and any columns specified in root_attrs. All other columns will be NaN for the new root row.
Parameters
- phylogeny_df : pd.DataFrame
Phylogeny dataframe in alife standard format.
- mutate : bool, default False
If True, allows mutation of the input dataframe.
- root_attrs : Mapping[str, Any], default {}
Column values to set on the new global root row, e.g., {"origin_time": 0.0, "taxon_label": "root"}.
Keys "id", "ancestor_id", and "ancestor_list" are reserved and may not be specified; a ValueError is raised if any are present.
Returns
- pd.DataFrame
The phylogeny dataframe with a new global root added.
Raises
- ValueError
If root_attrs contains reserved keys.
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
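As a rough illustration of the behavior described above, the following plain-Python sketch adds a global root to a list of row dicts. It is not the library's implementation: the helper name add_global_root_sketch, the row-dict representation, and the choice of new id are assumptions made for illustration, and roots are assumed to be marked by ancestor_id == id.

```python
from types import MappingProxyType
from typing import Any, Dict, List, Mapping

RESERVED_KEYS = {"id", "ancestor_id", "ancestor_list"}


def add_global_root_sketch(
    rows: List[Dict[str, Any]],
    root_attrs: Mapping[str, Any] = MappingProxyType({}),
) -> List[Dict[str, Any]]:
    """Return new rows with one added root that all former roots point to."""
    reserved = RESERVED_KEYS & set(root_attrs)
    if reserved:
        raise ValueError(f"reserved keys in root_attrs: {sorted(reserved)}")

    # hypothetical id choice: one past the largest existing id
    new_id = max((row["id"] for row in rows), default=-1) + 1
    result = []
    for row in rows:
        row = dict(row)  # copy, so the input rows are left unmodified
        if row["ancestor_id"] == row["id"]:  # assumed root convention
            row["ancestor_id"] = new_id
        result.append(row)
    # the new root is its own ancestor; unspecified columns are simply absent
    result.append({"id": new_id, "ancestor_id": new_id, **root_attrs})
    return result


forest = [
    {"id": 0, "ancestor_id": 0, "origin_time": 1.0},
    {"id": 1, "ancestor_id": 0, "origin_time": 2.0},
    {"id": 2, "ancestor_id": 2, "origin_time": 1.5},
]
rooted = add_global_root_sketch(forest, {"origin_time": 0.0})
```

Here both former roots (ids 0 and 2) end up pointing at the new row, and passing a reserved key such as "id" in root_attrs raises ValueError.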
- alifestd_add_global_root_polars(phylogeny_df: DataFrame) DataFrame
Add a new global root node that all existing roots point to.
- alifestd_add_inner_knuckles_asexual(phylogeny_df: DataFrame, mutate: bool = False) DataFrame
For all inner nodes, add a subtending unifurcation (“knuckle”).
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
- alifestd_add_inner_knuckles_polars(phylogeny_df: DataFrame) DataFrame
For all inner nodes, add a subtending unifurcation (“knuckle”).
Parameters
- phylogeny_df : polars.DataFrame
The phylogeny as a dataframe in alife standard format.
Must represent an asexual phylogeny with contiguous ids and topological sort order.
Returns
- polars.DataFrame
The phylogeny with knuckle nodes added for each inner node.
See Also
- alifestd_add_inner_knuckles_asexual :
Pandas-based implementation.
- alifestd_add_inner_leaves(phylogeny_df: DataFrame, mutate: bool = False) DataFrame
Create a zero-length branch with leaf node for each inner node.
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
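One way to picture this operation is the plain-Python sketch below, which works on parallel lists instead of a dataframe. It is an illustration under assumptions, not the library's code: ids are taken to be contiguous, roots are marked by ancestor_ids[i] == i, "inner node" is read as any node with at least one child, and a zero-length branch is modeled by copying the parent's origin_time. The name add_inner_leaves_sketch is hypothetical.

```python
from typing import List, Tuple


def add_inner_leaves_sketch(
    ancestor_ids: List[int], origin_times: List[float]
) -> Tuple[List[int], List[float]]:
    """Append one leaf child per inner node, sharing its parent's origin_time."""
    n = len(ancestor_ids)
    has_child = [False] * n
    for node, parent in enumerate(ancestor_ids):
        if parent != node:  # skip the self-reference that marks a root
            has_child[parent] = True

    new_ancestor_ids = list(ancestor_ids)
    new_origin_times = list(origin_times)
    for node in range(n):
        if has_child[node]:
            new_ancestor_ids.append(node)  # the new leaf hangs off the inner node
            new_origin_times.append(origin_times[node])  # zero-length branch
    return new_ancestor_ids, new_origin_times


result = add_inner_leaves_sketch(
    [0, 0, 0, 1, 1], [0.0, 1.0, 1.0, 2.0, 2.0]
)
```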
- alifestd_add_inner_niblings_asexual(phylogeny_df: DataFrame, mutate: bool = False) DataFrame
For all inner nodes, add a subtending unifurcation, adding a “nibling” leaf as the child of the knuckle.
Here, “nibling” refers to a leaf that is a niece/nephew of the inner node. If not topologically sorted, a topological sort will be applied.
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
- alifestd_add_inner_niblings_polars(phylogeny_df: DataFrame) DataFrame
For all inner nodes, add a subtending unifurcation, adding a “nibling” leaf as the child of the knuckle.
Here, “nibling” refers to a leaf that is a niece/nephew of the inner node.
Parameters
- phylogeny_df : polars.DataFrame or polars.LazyFrame
The phylogeny as a dataframe in alife standard format.
Must represent an asexual phylogeny with contiguous ids and topologically sorted rows.
Returns
- polars.DataFrame
The phylogeny with inner niblings added.
See Also
- alifestd_add_inner_niblings_asexual :
Pandas-based implementation.
- alifestd_aggregate_phylogenies(phylogeny_dfs: List[DataFrame], mutate: bool = False) DataFrame
Concatenate independent phylogenies, reassigning organism ids to prevent collisions.
Input dataframes are not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
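The id reassignment can be sketched in plain Python as shifting each phylogeny's ids by the number of rows already emitted. This is only an illustration: it assumes every input uses contiguous ids (id == row index) and represents each phylogeny by its ancestor_id list alone. The name aggregate_phylogenies_sketch is hypothetical.

```python
from typing import List


def aggregate_phylogenies_sketch(ancestor_id_lists: List[List[int]]) -> List[int]:
    """Concatenate ancestor_id lists, offsetting ids so they never collide."""
    combined: List[int] = []
    for ancestor_ids in ancestor_id_lists:
        offset = len(combined)  # rows emitted so far become the id shift
        combined.extend([ancestor + offset for ancestor in ancestor_ids])
    return combined


# two independent trees: a three-node chain-ish tree and a two-node tree
combined = aggregate_phylogenies_sketch([[0, 0, 1], [0, 0]])
```

In the result, the second tree's root keeps pointing at itself, now under its shifted id 3.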
- alifestd_aggregate_phylogenies_polars(phylogeny_dfs: List[DataFrame]) DataFrame
Concatenate independent phylogenies, reassigning organism ids to prevent collisions.
Assumes asexual phylogenies with contiguous ids, topologically sorted, and with an ancestor_id column (not ancestor_list).
See Also
- alifestd_aggregate_phylogenies :
Pandas-based implementation.
- alifestd_as_newick_asexual(phylogeny_df: ~pandas.core.frame.DataFrame, mutate: bool = False, *, taxon_label: str | None = None, progress_wrap: ~typing.Callable = <function <lambda>>) str
Convert phylogeny dataframe to Newick format.
Parameters
- phylogeny_df : pd.DataFrame
Phylogeny dataframe in Alife standard format.
- mutate : bool, optional
Allow in-place mutations of the input dataframe, by default False.
- taxon_label : str, optional
Column to use for taxon labels, by default None.
- progress_wrap : typing.Callable, optional
Pass tqdm or equivalent to display a progress bar.
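For readers unfamiliar with Newick output, the sketch below shows one way such a conversion can work on a small tree. It is an illustrative stand-in, not the library's algorithm: it assumes contiguous ids, topologically sorted rows, roots marked by ancestor_ids[i] == i, and it omits branch lengths. The name as_newick_sketch and the newline-joined output for multiple roots are assumptions.

```python
from typing import List


def as_newick_sketch(ancestor_ids: List[int], labels: List[str]) -> str:
    """Build a Newick string by visiting nodes from last row to first."""
    child_texts: List[List[str]] = [[] for _ in ancestor_ids]
    roots: List[str] = []
    # reverse order guarantees every child is finished before its parent
    for node in reversed(range(len(ancestor_ids))):
        text = labels[node]
        if child_texts[node]:
            # children were collected in descending id order; restore ascending
            text = "(" + ",".join(reversed(child_texts[node])) + ")" + text
        parent = ancestor_ids[node]
        if parent == node:
            roots.append(text + ";")
        else:
            child_texts[parent].append(text)
    return "\n".join(reversed(roots))


newick = as_newick_sketch([0, 0, 0, 1, 1], ["a", "b", "c", "d", "e"])
```

For this five-node tree the sketch yields ((d,e)b,c)a; with inner node labels placed after their closing parenthesis.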
- alifestd_as_newick_polars(phylogeny_df: ~polars.dataframe.frame.DataFrame, *, taxon_label: str | None = None, progress_wrap: ~typing.Callable = <function <lambda>>) str
Convert phylogeny dataframe to Newick format.
Parameters
- phylogeny_df : polars.DataFrame
Phylogeny dataframe in Alife standard format.
- taxon_label : str, optional
Column to use for taxon labels, by default None.
- progress_wrap : typing.Callable, optional
Pass tqdm or equivalent to display a progress bar.
See Also
- alifestd_as_newick_asexual :
Pandas-based implementation.
- alifestd_assign_contiguous_ids(phylogeny_df: DataFrame, mutate: bool = False) DataFrame
Reassign so each organism’s id corresponds to its row number.
Organisms retain the same row location; only id numbers change. Input dataframe is not mutated by this operation unless mutate set True.
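The relabeling can be sketched in plain Python as a lookup from old id to row number, applied to both the id and ancestor columns. This is an illustration only; the name assign_contiguous_ids_sketch and the parallel-list representation are assumptions, and roots are assumed to list themselves as their own ancestor.

```python
from typing import List, Tuple


def assign_contiguous_ids_sketch(
    ids: List[int], ancestor_ids: List[int]
) -> Tuple[List[int], List[int]]:
    """Relabel organisms so that each id equals its row number."""
    new_id_of = {old_id: row for row, old_id in enumerate(ids)}
    new_ids = list(range(len(ids)))
    # ancestors must be translated through the same lookup to stay consistent
    new_ancestor_ids = [new_id_of[ancestor] for ancestor in ancestor_ids]
    return new_ids, new_ancestor_ids


new_ids, new_ancestor_ids = assign_contiguous_ids_sketch(
    ids=[10, 4, 7], ancestor_ids=[10, 10, 4]
)
```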
- alifestd_assign_contiguous_ids_polars(phylogeny_df: DataFrame) DataFrame
Reassign so each organism’s id corresponds to its row number.
Organisms retain the same row location; only id numbers change.
- alifestd_assign_root_ancestor_token(phylogeny_df: DataFrame, root_ancestor_token: str, mutate: bool = False) DataFrame
Set root_ancestor_token for “ancestor_list” column.
The option root_ancestor_token will be sandwiched in brackets to create the ancestor list entry for genesis organisms. For example, the token “None” will yield the entry “[None]” and the token “” will yield the entry “[]”.
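The bracketing rule can be sketched in a few lines of plain Python. The helper name assign_root_token_sketch is hypothetical, and the test used to recognize an existing root entry (an empty list or a case-insensitive "none") is an assumption made for this illustration, not a statement about the library's behavior.

```python
from typing import List


def assign_root_token_sketch(
    ancestor_lists: List[str], root_ancestor_token: str
) -> List[str]:
    """Rewrite root entries of an ancestor_list column with the given token."""
    root_entry = f"[{root_ancestor_token}]"  # token sandwiched in brackets
    return [
        # assumed root detection: "[]" or "[none]" in any letter case
        root_entry if value.strip("[]").lower() in ("", "none") else value
        for value in ancestor_lists
    ]


with_none = assign_root_token_sketch(["[]", "[0]", "[1]"], "None")
with_empty = assign_root_token_sketch(["[NONE]", "[0]"], "")
```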
- alifestd_calc_clade_lookback_n_asexual(phylogeny_df: DataFrame, lookback_n: int, mutate: bool = False) ndarray
Find ancestor ids of nodes that are lookback_n nodes away in the phylogeny.
The root node will be returned if the lookback distance exceeds available nodes.
Returns a numpy array of the same length as the input DataFrame, with each element giving the id of the lookback ancestor for the corresponding node. Returned array matches row order of the input DataFrame.
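A node-count lookback can be sketched as repeatedly stepping every node to its parent. The plain-Python version below is illustrative only; it assumes contiguous ids and roots marked by ancestor_ids[i] == i, which is what makes the lookup saturate at the root. The name lookback_n_sketch is hypothetical.

```python
from typing import List


def lookback_n_sketch(ancestor_ids: List[int], lookback_n: int) -> List[int]:
    """For each node, find the ancestor lookback_n steps toward the root."""
    result = list(range(len(ancestor_ids)))  # zero steps back: each node itself
    for _ in range(lookback_n):
        # a root is its own ancestor, so nodes that reach it stay there
        result = [ancestor_ids[node] for node in result]
    return result


# tree: 0 is root; 1 and 4 are children of 0; 2 is child of 1; 3 is child of 2
two_back = lookback_n_sketch([0, 0, 1, 2, 0], lookback_n=2)
```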
- alifestd_calc_clade_lookback_origin_time_delta_asexual(phylogeny_df: DataFrame, lookback_origin_time_delta: float, mutate: bool = False) ndarray
Find ancestor ids of nodes that precede each phylogeny node by at least lookback_origin_time_delta branch distance.
The root node will be returned if the lookback distance exceeds available nodes.
Returns a numpy array of the same length as the input DataFrame, with each element giving the id of the lookback ancestor for the corresponding node. Returned array matches row order of the input DataFrame.
- alifestd_calc_clade_trait_count_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, trait_mask: ndarray) ndarray
Count how many nodes within each clade have a given trait.
Clades are defined as a node and all descendant nodes.
Returns a numpy array of the same length as the input DataFrame, with array elements as the number of nodes in the clade that have the trait. Returned array matches row order of the input DataFrame.
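Per-clade counts of this kind can be computed in a single backward pass when rows are topologically sorted. The sketch below is a plain-Python illustration, not the library's code; it assumes contiguous ids, ancestors appearing before descendants, and roots marked by ancestor_ids[i] == i. The name clade_trait_count_sketch is hypothetical.

```python
from typing import List


def clade_trait_count_sketch(
    ancestor_ids: List[int], trait_mask: List[bool]
) -> List[int]:
    """Count trait-bearing nodes in each clade (a node plus its descendants)."""
    counts = [int(flag) for flag in trait_mask]
    # visiting rows last-to-first finishes each child before its parent
    for node in reversed(range(len(ancestor_ids))):
        parent = ancestor_ids[node]
        if parent != node:  # roots have no parent to pass their total to
            counts[parent] += counts[node]
    return counts


# tree: 0 is root; 1 and 2 are children of 0; 3 and 4 are children of 1
counts = clade_trait_count_sketch(
    [0, 0, 0, 1, 1], [False, True, False, True, False]
)
```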
- alifestd_calc_clade_trait_frequency_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mask_trait_absent: ndarray, mask_trait_present: ndarray) ndarray
Calculate what fraction of nodes within each clade have a given trait.
Clades are defined as a node and all descendant nodes. The mask_trait_absent parameter can be used to exclude nodes from consideration; for instance, &’ing with alifestd_mark_leaves can be used to only consider trait frequency among descendant leaves.
Returns a numpy array of the same length as the input DataFrame, with array elements as the fraction of nodes in the clade that have the trait. Returned array matches row order of the input DataFrame.
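One way to read the two masks is sketched below in plain Python. This is an assumption, not the documented formula: the frequency is taken to be present / (present + absent) within each clade, nodes in neither mask are ignored, and clades with no counted nodes yield NaN. The helper names are hypothetical, and the same working-format assumptions apply (ids equal row positions, roots are their own ancestor, ancestors precede descendants).

```python
import math
from typing import List


def clade_sums(ancestor_ids: List[int], values: List[int]) -> List[int]:
    """Sum values over each clade by accumulating children into parents."""
    totals = list(values)
    for node in reversed(range(len(ancestor_ids))):
        parent = ancestor_ids[node]
        if parent != node:
            totals[parent] += totals[node]
    return totals


def clade_trait_frequency(
    ancestor_ids: List[int],
    mask_trait_absent: List[bool],
    mask_trait_present: List[bool],
) -> List[float]:
    """Assumed formula: present / (present + absent) per clade, else NaN."""
    present = clade_sums(ancestor_ids, [int(m) for m in mask_trait_present])
    absent = clade_sums(ancestor_ids, [int(m) for m in mask_trait_absent])
    return [
        p / (p + a) if (p + a) else math.nan
        for p, a in zip(present, absent)
    ]


# Tree: 0 -> (1, 2), 1 -> (3, 4); only the leaves 2, 3, 4 are counted.
ancestor_ids = [0, 0, 0, 1, 1]
mask_trait_present = [False, False, False, True, False]
mask_trait_absent = [False, False, True, False, True]
freq = clade_trait_frequency(ancestor_ids, mask_trait_absent, mask_trait_present)
print(freq)
```

Here inner nodes appear in neither mask, so each clade's frequency reflects its descendant leaves only.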
- alifestd_calc_distance_matrix_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, criterion: str = 'origin_time', progress_wrap: Callable = <function <lambda>>) ndarray
Calculate pairwise distances between all taxa via their MRCAs.
The distance between two taxa is computed as the sum of criterion differences between each taxon and their Most Recent Common Ancestor (MRCA):
- distance[i, j] = (criterion[i] - criterion[mrca]) + (criterion[j] - criterion[mrca])
Taxa sharing no common ancestor will have distance NaN.
Pass tqdm or equivalent as progress_wrap to display a progress bar.
Input dataframe is not mutated by this operation unless mutate set True.
Parameters
- phylogeny_df : pd.DataFrame
Phylogeny in alife standard format.
- mutate : bool, default False
If True, allows in-place modification of phylogeny_df.
- criterion : str, default “origin_time”
Column name used to measure distance between taxa and their MRCA.
- progress_wrap : callable, optional
Wrapper for progress display (e.g., tqdm).
Returns
- np.ndarray
n x n float64 matrix of pairwise distances. Entry [i, j] is NaN when organisms i and j share no common ancestor.
See Also
- alifestd_calc_mrca_id_matrix_asexual :
Computes the MRCA id matrix used internally by this function.
- alifestd_find_pair_distance_asexual :
Computes distance for a single pair of taxa.
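The distance rule above can be sketched in plain Python as follows. This is a naive illustration under stated assumptions, not the library's algorithm: `lineage`, `find_mrca`, and `distance_matrix` are hypothetical helpers, ids equal row positions, and roots list themselves as their own ancestor.

```python
import math
from typing import List


def lineage(ancestor_ids: List[int], node: int) -> List[int]:
    """Return node and all its ancestors, from node up to its root."""
    path = [node]
    while ancestor_ids[path[-1]] != path[-1]:
        path.append(ancestor_ids[path[-1]])
    return path


def find_mrca(ancestor_ids: List[int], i: int, j: int) -> int:
    """Return the MRCA id of i and j, or -1 if they share no ancestor."""
    seen = set(lineage(ancestor_ids, i))
    for node in lineage(ancestor_ids, j):
        if node in seen:
            return node
    return -1


def distance_matrix(ancestor_ids: List[int], criterion: List[float]) -> List[List[float]]:
    """Sum of criterion differences from each taxon to the pair's MRCA."""
    n = len(ancestor_ids)
    dist = [[math.nan] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            mrca = find_mrca(ancestor_ids, i, j)
            if mrca != -1:
                dist[i][j] = (criterion[i] - criterion[mrca]) + (
                    criterion[j] - criterion[mrca]
                )
    return dist


# Two trees: 0 -> (1, 2) and a lone root 3.
ancestor_ids = [0, 0, 0, 3]
origin_time = [0.0, 2.0, 5.0, 1.0]
dist = distance_matrix(ancestor_ids, origin_time)
print(dist[1][2])  # 7.0
print(dist[1][3])  # nan
```

Taxa 1 and 2 meet at root 0, so their distance is (2 - 0) + (5 - 0); taxon 3 sits in a separate tree, so its distance to the others is NaN.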
- alifestd_calc_distance_matrix_polars(phylogeny_df: DataFrame, *, criterion: str | Expr = 'origin_time', progress_wrap: Callable = <function <lambda>>) ndarray
Calculate pairwise distances between all taxa via their MRCAs.
The distance between two taxa is computed as the sum of criterion differences between each taxon and their Most Recent Common Ancestor (MRCA):
- distance[i, j] = (criterion[i] - criterion[mrca]) + (criterion[j] - criterion[mrca])
Taxa sharing no common ancestor will have distance NaN.
Pass tqdm or equivalent as progress_wrap to display a progress bar.
Parameters
- phylogeny_df : polars.DataFrame or polars.LazyFrame
The phylogeny as a dataframe in alife standard format.
Must represent an asexual phylogeny in working format (i.e., topologically sorted with contiguous ids and an ancestor_id column, or an ancestor_list column from which ancestor_id can be derived).
- criterion : str or polars.Expr, default “origin_time”
Column name or polars expression used to measure distance between taxa and their MRCA.
- progress_wrap : callable, optional
Wrapper for progress display (e.g., tqdm).
Returns
- numpy.ndarray
Array of shape (n, n) with dtype float64, containing pairwise distances. Entries are NaN where organisms share no common ancestor.
See Also
- alifestd_calc_distance_matrix_asexual :
Pandas-based implementation.
- alifestd_calc_mrca_id_matrix_asexual_polars :
Computes the underlying MRCA id matrix.
- alifestd_find_pair_distance_polars :
Computes distance for a single pair of taxa.
- alifestd_calc_mrca_id_matrix_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, progress_wrap: Callable = <function <lambda>>) ndarray
Calculate the Most Recent Common Ancestor (MRCA) taxon id for each pair of taxa.
Taxa sharing no common ancestor will have MRCA id -1.
Pass tqdm or equivalent as progress_wrap to display a progress bar.
Input dataframe is not mutated by this operation unless mutate set True.
- alifestd_calc_mrca_id_matrix_asexual_polars(phylogeny_df: DataFrame, *, progress_wrap: Callable = <function <lambda>>) ndarray
Calculate the Most Recent Common Ancestor (MRCA) taxon id for each pair of taxa.
Taxa sharing no common ancestor will have MRCA id -1.
Parameters
- phylogeny_df : polars.DataFrame or polars.LazyFrame
The phylogeny as a dataframe in alife standard format.
Must represent an asexual phylogeny in working format (i.e., topologically sorted with contiguous ids and an ancestor_id column, or an ancestor_list column from which ancestor_id can be derived).
- progress_wrap : callable, optional
Wrapper for progress display (e.g., tqdm).
Returns
- numpy.ndarray
Array of shape (n, n) with dtype int64, containing MRCA ids for each pair of organisms. Entries are -1 where organisms share no common ancestor.
See Also
- alifestd_calc_mrca_id_matrix_asexual :
Pandas-based implementation.
- alifestd_calc_mrca_id_vector_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, target_id: int, progress_wrap: Callable = <function <lambda>>) ndarray
Calculate the Most Recent Common Ancestor (MRCA) taxon id for target_id and each other taxon.
Taxa sharing no common ancestor will have MRCA id -1.
Pass tqdm or equivalent as progress_wrap to display a progress bar.
Input dataframe is not mutated by this operation unless mutate set True.
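A plain-Python sketch of the per-target MRCA lookup is given below. It is illustrative only: `mrca_id_vector` is a hypothetical helper, ids are assumed to equal row positions, roots are assumed to list themselves as their own ancestor, and the target's MRCA with itself is assumed to be the target.

```python
from typing import List


def mrca_id_vector(ancestor_ids: List[int], target_id: int) -> List[int]:
    """Return, for every taxon, the id of its MRCA with target_id (or -1)."""
    # Collect the target and all of its ancestors.
    target_lineage = set()
    node = target_id
    while node not in target_lineage:
        target_lineage.add(node)
        node = ancestor_ids[node]  # roots point to themselves, ending the loop

    result = []
    for start in range(len(ancestor_ids)):
        node = start
        # Climb until we reach the target's lineage or run out of ancestors.
        while node not in target_lineage and ancestor_ids[node] != node:
            node = ancestor_ids[node]
        result.append(node if node in target_lineage else -1)
    return result


# Tree 0 -> (1, 2), 1 -> (3, 4), plus an unrelated lone root 5.
ancestor_ids = [0, 0, 0, 1, 1, 5]
print(mrca_id_vector(ancestor_ids, target_id=3))  # [0, 1, 0, 3, 1, -1]
```

Taxon 5 lives in a separate tree, so its entry is -1, matching the convention stated above.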
- alifestd_calc_mrca_id_vector_asexual_polars(phylogeny_df: DataFrame, *, target_id: int, progress_wrap: Callable = <function <lambda>>) ndarray
Calculate the Most Recent Common Ancestor (MRCA) taxon id for target_id and each other taxon.
Taxa sharing no common ancestor will have MRCA id -1.
Parameters
- phylogeny_df : polars.DataFrame or polars.LazyFrame
The phylogeny as a dataframe in alife standard format.
Must represent an asexual phylogeny in working format (i.e., topologically sorted with contiguous ids and an ancestor_id column, or an ancestor_list column from which ancestor_id can be derived).
- target_id : int
The target organism id to compute MRCA against.
- progress_wrap : callable, optional
Wrapper for progress display (e.g., tqdm).
Returns
- numpy.ndarray
Array of shape (n,) with dtype int64, containing MRCA ids for each organism with the target. Entries are -1 where organisms share no common ancestor with the target.
See Also
- alifestd_calc_mrca_id_vector_asexual :
Pandas-based implementation.
- alifestd_calc_polytomic_index(phylogeny_df: DataFrame) int
Count how many fewer inner nodes are contained in phylogeny than expected if strictly bifurcating.
Excludes unifurcations from calculation.
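One plausible reading of this index is sketched below in plain Python. It is an assumption rather than the library's definition: the expected number of branching nodes in a strictly bifurcating forest is taken as (leaves - roots), and only inner nodes with two or more children are counted as actual branching nodes. `polytomic_index` is a hypothetical helper, and roots are assumed to list themselves as their own ancestor.

```python
from collections import Counter
from typing import List


def polytomic_index(ancestor_ids: List[int]) -> int:
    """Expected branching nodes if strictly bifurcating, minus actual ones."""
    child_counts = Counter(
        parent for node, parent in enumerate(ancestor_ids) if parent != node
    )
    num_nodes = len(ancestor_ids)
    num_leaves = sum(1 for node in range(num_nodes) if child_counts[node] == 0)
    num_roots = sum(1 for node, parent in enumerate(ancestor_ids) if parent == node)
    # Unifurcations (exactly one child) are excluded from the actual count.
    num_branching = sum(1 for count in child_counts.values() if count >= 2)
    expected = num_leaves - num_roots
    return expected - num_branching


print(polytomic_index([0, 0, 0, 1, 1]))  # strictly bifurcating: 0
print(polytomic_index([0, 0, 0, 0, 1]))  # one trifurcation, one unifurcation: 1
```

Under this reading, each inner node with k > 2 children contributes k - 2 to the index.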
- alifestd_calc_polytomic_index_polars(phylogeny_df: DataFrame) int
Count how many fewer inner nodes are contained in phylogeny than expected if strictly bifurcating.
Excludes unifurcations from calculation.
- alifestd_categorize_triplet_asexual(phylogeny_df: DataFrame, triplet_ids: Iterable[int], mutate: bool = False) int
Assess the topological configuration of three id’s in phylogeny_df.
If polytomy, return -1. Else, return index of outgroup id.
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
See Also
alifestd_estimate_triplet_distance_asexual
alifestd_sample_triplet_comparisons_asexual
- alifestd_check_topological_sensitivity(phylogeny_df: DataFrame, *, insert: bool, delete: bool, update: bool) List[str]
Return names of columns present in phylogeny_df that may be invalidated by topological operations such as collapsing unifurcations.
If no such columns exist, returns an empty list.
Parameters
- phylogeny_df : pandas.DataFrame
The phylogeny as a dataframe in alife standard format.
- insert : bool
Whether the operation inserts new nodes.
- delete : bool
Whether the operation deletes nodes.
- update : bool
Whether the operation updates ancestor relationships.
Input dataframe is not mutated by this operation.
See Also
- alifestd_check_topological_sensitivity_polars :
Polars-based implementation.
- alifestd_check_topological_sensitivity_polars(phylogeny_df: DataFrame | LazyFrame, *, insert: bool, delete: bool, update: bool) List[str]
Return names of columns present in phylogeny_df that may be invalidated by topological operations such as collapsing unifurcations.
Accepts polars DataFrames and LazyFrames.
If no such columns exist, returns an empty list.
Parameters
- phylogeny_df : polars.DataFrame or polars.LazyFrame
The phylogeny as a dataframe in alife standard format.
- insert : bool
Whether the operation inserts new nodes.
- delete : bool
Whether the operation deletes nodes.
- update : bool
Whether the operation updates ancestor relationships.
See Also
- alifestd_check_topological_sensitivity :
Pandas-based implementation.
- alifestd_chronological_sort(phylogeny_df: DataFrame, how: str = 'origin_time', mutate: bool = False) DataFrame
Sort rows so all organisms appear in chronological order, default origin_time.
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
- alifestd_chronological_sort_polars(phylogeny_df: DataFrame, how: str = 'origin_time') DataFrame
Sort rows so all organisms appear in chronological order, default origin_time.
- alifestd_coarsen_dilate_asexual(phylogeny_df: DataFrame, *, criterion: str = 'origin_time', dilation: int = 1, mutate: bool = False) DataFrame
Coarsen a phylogeny by collapsing inner nodes within dilation windows.
All inner (non-leaf) nodes with criterion values in the half-open interval [n, n + dilation), where n % dilation == 0, are collapsed to a single inner node at n.
Tip nodes are never moved. The MRCA of two tips may only shift backward (never forward), by at most dilation units, and never across an n % dilation == 0 boundary.
Parameters
- phylogeny_df : pd.DataFrame
Input phylogeny in alife standard format.
- criterion : str, default “origin_time”
Column whose values define the time axis for dilation.
- dilation : int
Width of the dilation window. Must be a positive integer.
- mutate : bool, default False
If True, allow in-place mutation of the input dataframe.
Returns
- pd.DataFrame
Coarsened phylogeny in alife standard format.
Raises
- NotImplementedError
If input is not topologically sorted with contiguous ids.
- ValueError
If dilation is not a positive integer, if criterion is not present in phylogeny_df, or if criterion is "id" or "ancestor_id".
- alifestd_coarsen_dilate_polars(phylogeny_df: DataFrame | LazyFrame, *, criterion: str = 'origin_time', dilation: int = 1) DataFrame
Coarsen a phylogeny by collapsing inner nodes within dilation windows.
All inner (non-leaf) nodes with criterion values in the half-open interval [n, n + dilation), where n % dilation == 0, are collapsed to a single inner node at n.
Tip nodes are never moved. The MRCA of two tips may only shift backward (never forward), by at most dilation units, and never across an n % dilation == 0 boundary.
Parameters
- phylogeny_df : polars.DataFrame or polars.LazyFrame
Input phylogeny in alife standard format.
- criterion : str, default “origin_time”
Column whose values define the time axis for dilation.
- dilation : int
Width of the dilation window. Must be a positive integer.
Returns
- polars.DataFrame
Coarsened phylogeny in alife standard format.
Raises
- NotImplementedError
If input is not topologically sorted with contiguous ids.
- ValueError
If dilation is not a positive integer, if criterion is not present in phylogeny_df, or if criterion is "id" or "ancestor_id".
See Also
- alifestd_coarsen_dilate_asexual :
Pandas-based implementation.
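The dilation windows used by the two coarsening functions above can be illustrated with simple integer arithmetic. The sketch below only shows which window start n a criterion value falls under; it does not perform the collapsing itself, and `window_start` is a hypothetical helper rather than part of the library.

```python
def window_start(value: int, dilation: int) -> int:
    """Return n such that n % dilation == 0 and n <= value < n + dilation."""
    if dilation <= 0:
        raise ValueError("dilation must be a positive integer")
    return (value // dilation) * dilation


for value in (0, 9, 10, 17, 25):
    print(value, "->", window_start(value, dilation=10))
```

With dilation 10, the values 0 and 9 map to window start 0, the values 10 and 17 map to 10, and 25 maps to 20, so a mapped value never moves forward and shifts backward by less than one window width.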
- alifestd_coarsen_mask(phylogeny_df: DataFrame, mask: Series, mutate: bool = False, progress_wrap: Callable = <function <lambda>>) DataFrame
Pare record to bypass organisms outside mask.
The root ancestor token will be adopted from phylogeny_df.
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
- alifestd_coarsen_taxa_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, agg: Dict[str, str] | None = None, by: str | Sequence[str]) DataFrame
Condense consecutive phylogeny nodes sharing identical trait values, according to values in by column(s).
The manner in which consecutive nodes with identical traits are condensed may be fine-tuned on a column-by-column basis through the optional agg kwarg, a dict mapping column names to a Pandas GroupBy aggregation operation (e.g., “first”, “min”, “max”, etc.).
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
Dataframe reindexing (e.g., df.index) may be applied.
See Also
- alifestd_coarsen_taxa_asexual_make_agg :
Helper function to generate default agg dict, which may be customized before being passed to alifestd_coarsen_taxa_asexual.
- alifestd_coarsen_taxa_asexual_make_agg(phylogeny_df: DataFrame, default_agg: str = 'first') Dict[str, str]
Build per-column aggregation rules for asexual taxa coarsening.
Parameters
- phylogeny_df : pd.DataFrame
Input phylogeny table.
- default_agg : str, default “first”
Aggregation function to apply to any column not in the hard-coded overrides.
Returns
- Dict[str, str]
Mapping of column name to aggregation method. The following columns are overridden:
- “destruction_time”: “last”
- “is_root”: “first”
- “origin_time”: “first”
Columns named “ancestor_id”, “ancestor_list”, “branch_length”, “edge_length”, “id”, and “is_leaf” will be excluded from the result. All other (non-excluded) columns use default_agg.
- alifestd_coerce_chronological_consistency(phylogeny_df: DataFrame, mutate: bool = False) DataFrame
For any taxa with origin time preceding its parent’s, set origin time to parent’s origin time.
If an inconsistency is detected, the corrected phylogeny will be returned sorted in topological order.
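The correction rule can be sketched in plain Python as below. This is an illustration under assumptions, not the library's code: `coerce_origin_times` is a hypothetical helper, rows are assumed to be topologically sorted with ids equal to row positions, roots list themselves as their own ancestor, and a child is compared against its parent's already-corrected origin time.

```python
from typing import List


def coerce_origin_times(ancestor_ids: List[int], origin_times: List[float]) -> List[float]:
    """Raise any origin time that precedes the parent's to the parent's value."""
    fixed = list(origin_times)
    # Parents come before children, so each parent is final when visited.
    for node, parent in enumerate(ancestor_ids):
        if parent != node and fixed[node] < fixed[parent]:
            fixed[node] = fixed[parent]
    return fixed


# Chain 0 -> 1 -> 2, where nodes 1 and 2 claim to originate before the root.
print(coerce_origin_times([0, 0, 1], [5.0, 3.0, 4.0]))  # [5.0, 5.0, 5.0]
```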
- alifestd_collapse_trunk_asexual(phylogeny_df: DataFrame, mutate: bool = False) DataFrame
Collapse entries masked by is_trunk column, keeping only the oldest root.
Masked entries must be contiguous, meaning that no non-trunk entry can be an ancestor of a trunk entry.
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
See Also
alifestd_delete_trunk_asexual
- alifestd_collapse_trunk_polars(phylogeny_df: DataFrame) DataFrame
Collapse entries masked by is_trunk column, keeping only the oldest root.
- alifestd_collapse_unifurcations(phylogeny_df: DataFrame, mutate: bool = False, root_ancestor_token: str = 'none') DataFrame
Pare record to bypass organisms with one ancestor and one descendant.
May leave a root unifurcation present. See alifestd_delete_unifurcating_roots_asexual.
The option root_ancestor_token will be sandwiched in brackets to create the ancestor list entry for genesis organisms. For example, the token “None” will yield the entry “[None]” and the token “” will yield the entry “[]”. Default “none”.
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
See Also
- alifestd_collapse_unifurcations_polars :
Polars-based implementation.
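The bracket-sandwiching rule for root_ancestor_token described above can be illustrated in plain Python. This is a minimal sketch rather than the library's code; `make_root_entry` is a hypothetical helper name used only for illustration.

```python
def make_root_entry(root_ancestor_token: str) -> str:
    """Wrap a token in brackets to form a genesis organism's ancestor list entry."""
    return f"[{root_ancestor_token}]"


print(make_root_entry("None"))  # [None]
print(make_root_entry("none"))  # [none]
print(make_root_entry(""))      # []
```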
- alifestd_collapse_unifurcations_polars(phylogeny_df: DataFrame) DataFrame
Pare record to bypass organisms with one ancestor and one descendant.
See Also
- alifestd_collapse_unifurcations :
Pandas-based implementation.
- alifestd_convert_root_ancestor_token(ancestor_list: Series, root_ancestor_token: str, mutate: bool = False) Series
Set root_ancestor_token for ancestor_list series.
The option root_ancestor_token will be sandwiched in brackets to create the ancestor list entry for genesis organisms. For example, the token “None” will yield the entry “[None]” and the token “” will yield the entry “[]”.
- alifestd_count_children_of_asexual(phylogeny_df: DataFrame, parent: int, mutate: bool = False) int
How many taxa are direct descendants of the given parent?
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
- alifestd_count_children_of_polars(phylogeny_df: DataFrame, parent: int) int
How many taxa are direct descendants of the given parent?
- alifestd_count_inner_nodes(phylogeny_df: DataFrame, mutate: bool = False) int
Count how many non-leaf nodes are contained in phylogeny.
- alifestd_count_inner_nodes_polars(phylogeny_df: DataFrame) int
Count how many non-leaf nodes are contained in phylogeny.
- alifestd_count_leaf_nodes(phylogeny_df: DataFrame) int
How many leaf nodes are contained in phylogeny?
- alifestd_count_leaf_nodes_polars(phylogeny_df: DataFrame) int
How many leaf nodes are contained in phylogeny?
- alifestd_count_polytomies(phylogeny_df: DataFrame) int
Count how many inner nodes have more than two descendant nodes.
Only supports asexual phylogenies.
- alifestd_count_polytomies_polars(phylogeny_df: DataFrame) int
Count how many inner nodes have more than two descendant nodes.
Only supports asexual phylogenies.
- alifestd_count_root_nodes(phylogeny_df: DataFrame) int
How many root nodes are contained in phylogeny?
- alifestd_count_root_nodes_polars(phylogeny_df: DataFrame) int
How many root nodes are contained in phylogeny?
- alifestd_count_unifurcating_roots_asexual(phylogeny_df: DataFrame, mutate: bool = False) int
How many root nodes with one child are contained in phylogeny?
- alifestd_count_unifurcating_roots_polars(phylogeny_df: DataFrame) int
How many root nodes with one child are contained in phylogeny?
- alifestd_count_unifurcations(phylogeny_df: DataFrame) int
Count how many inner nodes have exactly one descendant node.
Only supports asexual phylogenies.
- alifestd_count_unifurcations_polars(phylogeny_df: DataFrame) int
Count how many inner nodes have exactly one descendant node.
Only supports asexual phylogenies.
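The polytomy and unifurcation counts above both reduce to tallying direct children per node. A plain-Python sketch follows; `count_children` is a hypothetical helper, not a library function, and it assumes roots list themselves as their own ancestor.

```python
from collections import Counter
from typing import List


def count_children(ancestor_ids: List[int]) -> Counter:
    """Map each parent id to its number of direct children."""
    return Counter(
        parent for node, parent in enumerate(ancestor_ids) if parent != node
    )


ancestor_ids = [0, 0, 0, 0, 1]  # 0 -> (1, 2, 3), 1 -> (4)
children = count_children(ancestor_ids)
num_polytomies = sum(1 for count in children.values() if count > 2)
num_unifurcations = sum(1 for count in children.values() if count == 1)
print(num_polytomies, num_unifurcations)  # 1 1
```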
- alifestd_delete_trunk_asexual(phylogeny_df: DataFrame, mutate: bool = False) DataFrame
Delete entries masked by is_trunk column.
Masked entries must be contiguous, meaning that no non-trunk entry can be an ancestor of a trunk entry. Children of deleted entries will become roots.
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
See Also
alifestd_collapse_trunk_asexual
- alifestd_delete_trunk_asexual_polars(phylogeny_df: DataFrame) DataFrame
Delete entries masked by is_trunk column.
Masked entries must be contiguous, meaning that no non-trunk entry can be an ancestor of a trunk entry. Children of deleted entries will become roots.
See Also
alifestd_collapse_trunk_asexual
- alifestd_delete_unifurcating_roots_asexual(phylogeny_df: DataFrame, mutate: bool = False, root_ancestor_token: str = 'none') DataFrame
Pare record to bypass root nodes with only one descendant.
Dataframe reindexing (e.g., df.index) may be applied.
See also alifestd_collapse_unifurcations.
The option root_ancestor_token will be sandwiched in brackets to create the ancestor list entry for genesis organisms. For example, the token “None” will yield the entry “[None]” and the token “” will yield the entry “[]”. Default “none”.
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
- alifestd_delete_unifurcating_roots_polars(phylogeny_df: DataFrame) DataFrame
Pare record to bypass root nodes with only one descendant.
- alifestd_downsample_tips_asexual(phylogeny_df: DataFrame, n_downsample: int, mutate: bool = False, seed: int | None = None, **kwargs) DataFrame
Create a subsample phylogeny containing n_downsample tips.
If n_downsample is greater than the number of tips in the phylogeny, the whole phylogeny is returned.
Only supports asexual phylogenies.
Deprecated since version 0.6.0: Use alifestd_downsample_tips_uniform_asexual instead.
- alifestd_downsample_tips_canopy_asexual(phylogeny_df: DataFrame, n_downsample: int | None = None, mutate: bool = False, criterion: str = 'origin_time') DataFrame
Retain the n_downsample leaves with the largest criterion values and prune extinct lineages.
If n_downsample is None, it defaults to the number of leaves that share the maximum value of the criterion column. If n_downsample is greater than or equal to the number of leaves in the phylogeny, the whole phylogeny is returned. Ties are broken arbitrarily.
Only supports asexual phylogenies.
Parameters
- phylogeny_df : pandas.DataFrame
The phylogeny as a dataframe in alife standard format.
Must represent an asexual phylogeny.
- n_downsample : int, optional
Number of tips to retain. If None, defaults to the count of leaves with the maximum criterion value.
- mutate : bool, default False
Are side effects on the input argument phylogeny_df allowed?
- criterion : str, default “origin_time”
Column name used to rank leaves. The n_downsample leaves with the largest values in this column are retained. Ties are broken arbitrarily.
Raises
- ValueError
If criterion is not a column in phylogeny_df.
Returns
- pandas.DataFrame
The pruned phylogeny in alife standard format.
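The canopy selection rule can be sketched in plain Python as below. It is a hedged illustration, not the library's implementation: `canopy_leaf_ids` and `retained_ids` are hypothetical helpers, ids are assumed to equal row positions, and roots are assumed to list themselves as their own ancestor.

```python
from typing import List, Optional


def canopy_leaf_ids(
    ancestor_ids: List[int],
    criterion: List[float],
    n_downsample: Optional[int] = None,
) -> List[int]:
    """Pick the n_downsample leaves with the largest criterion values."""
    parents = {p for node, p in enumerate(ancestor_ids) if p != node}
    leaves = [node for node in range(len(ancestor_ids)) if node not in parents]
    if n_downsample is None:
        # Default: every leaf that shares the maximum criterion value.
        top = max(criterion[leaf] for leaf in leaves)
        n_downsample = sum(1 for leaf in leaves if criterion[leaf] == top)
    ranked = sorted(leaves, key=lambda leaf: criterion[leaf], reverse=True)
    return sorted(ranked[:n_downsample])


def retained_ids(ancestor_ids: List[int], leaf_ids: List[int]) -> List[int]:
    """Keep the chosen leaves plus all of their ancestors."""
    keep = set()
    for leaf in leaf_ids:
        node = leaf
        while node not in keep:
            keep.add(node)
            node = ancestor_ids[node]  # a root maps to itself, ending the walk
    return sorted(keep)


# Tree: 0 -> (1, 2), 1 -> (3, 4); leaves 3 and 4 share the latest origin time.
ancestor_ids = [0, 0, 0, 1, 1]
origin_time = [0, 1, 2, 3, 3]
canopy = canopy_leaf_ids(ancestor_ids, origin_time)
print(canopy)                              # [3, 4]
print(retained_ids(ancestor_ids, canopy))  # [0, 1, 3, 4]
```

Leaf 2 has a smaller origin time than the canopy leaves, so its lineage is pruned.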
- alifestd_downsample_tips_canopy_polars(phylogeny_df: DataFrame, n_downsample: int | None = None, criterion: str | Expr = 'origin_time') DataFrame
Retain the n_downsample leaves with the largest criterion values and prune extinct lineages.
If n_downsample is None, it defaults to the number of leaves that share the maximum value of the criterion column. If n_downsample is greater than or equal to the number of leaves in the phylogeny, the whole phylogeny is returned. Ties are broken arbitrarily.
Only supports asexual phylogenies.
Parameters
- phylogeny_df : polars.DataFrame
The phylogeny as a dataframe in alife standard format.
Must represent an asexual phylogeny.
- n_downsample : int, optional
Number of tips to retain. If None, defaults to the count of leaves with the maximum criterion value.
- criterion : str or polars.Expr, default “origin_time”
Column name or polars expression used to rank leaves. The n_downsample leaves with the largest values are retained. Ties are broken arbitrarily.
Raises
- NotImplementedError
If phylogeny_df has no “ancestor_id” column.
- ValueError
If criterion is not a column in phylogeny_df.
Returns
- polars.DataFrame
The pruned phylogeny in alife standard format.
See Also
- alifestd_downsample_tips_canopy_asexual :
Pandas-based implementation.
- alifestd_downsample_tips_clade_asexual(phylogeny_df: DataFrame, n_downsample: int, mutate: bool = False, seed: int | None = None) DataFrame
Create a subsample phylogeny containing at most n_downsample tips, comprising a single clade within the original phylogeny. Candidate clades are sampled proportionally to their size.
If n_downsample is greater than the number of tips in the phylogeny, the whole phylogeny is returned.
Only supports asexual phylogenies.
- alifestd_downsample_tips_clade_polars(phylogeny_df: DataFrame, n_downsample: int, seed: int | None = None) DataFrame
Create a subsample phylogeny containing at most n_downsample tips, comprising a single clade within the original phylogeny. Candidate clades are sampled proportionally to their size.
If n_downsample is greater than the number of tips in the phylogeny, the whole phylogeny is returned.
Only supports asexual phylogenies.
Parameters
- phylogeny_df : polars.DataFrame
The phylogeny as a dataframe in alife standard format.
Must represent an asexual phylogeny.
- n_downsample : int
Number of tips to retain.
- seed : int, optional
Integer seed for deterministic behavior.
Raises
- NotImplementedError
If phylogeny_df has no “ancestor_id” column or if ids are non-contiguous or not topologically sorted.
Returns
- polars.DataFrame
The downsampled phylogeny in alife standard format.
See Also
- alifestd_downsample_tips_clade_asexual :
Pandas-based implementation.
- alifestd_downsample_tips_lineage_asexual(phylogeny_df: DataFrame, n_downsample: int, mutate: bool = False, seed: int | None = None, *, criterion_delta: str = 'origin_time', criterion_target: str = 'origin_time', progress_wrap: Callable = <function <lambda>>) DataFrame
Retain the n_downsample leaves closest to the lineage of a target leaf.
Selects a target leaf as the leaf with the largest criterion_target value (ties broken randomly). For each leaf, the most recent common ancestor (MRCA) with the target leaf is identified and the “off-lineage delta” is computed as the absolute difference between the leaf’s criterion_delta value and its MRCA’s criterion_delta value. The n_downsample leaves with the smallest off-lineage deltas are retained.
If n_downsample is greater than or equal to the number of leaves in the phylogeny, the whole phylogeny is returned. Ties in off-lineage delta are broken arbitrarily.
Only supports asexual phylogenies.
Parameters
- phylogeny_df : pandas.DataFrame
The phylogeny as a dataframe in alife standard format.
Must represent an asexual phylogeny.
- n_downsample : int
Number of tips to retain.
- mutate : bool, default False
Are side effects on the input argument phylogeny_df allowed?
- seed : int, optional
Random seed for reproducible target-leaf selection when there are ties in criterion_target.
- criterion_delta : str, default “origin_time”
Column name used to compute the off-lineage delta for each leaf. The delta is the absolute difference between a leaf’s value and its MRCA’s value in this column.
- criterion_target : str, default “origin_time”
Column name used to select the target leaf. The leaf with the largest value in this column is chosen as the target. Note that ties are broken by random sample, allowing a seed to be provided.
- progress_wrap : Callable, optional
Pass tqdm or equivalent to display a progress bar.
Raises
- ValueError
If criterion_delta or criterion_target is not a column in phylogeny_df.
Returns
- pandas.DataFrame
The pruned phylogeny in alife standard format.
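The off-lineage delta described above can be sketched in plain Python. This is an illustration under assumptions rather than the library's implementation: `leaf_ids` and `off_lineage_deltas` are hypothetical helpers, ids equal row positions, roots list themselves as their own ancestor, and leaves sharing no ancestor with the target are simply skipped.

```python
from typing import Dict, List


def leaf_ids(ancestor_ids: List[int]) -> List[int]:
    """Return ids of nodes that are nobody's parent."""
    parents = {p for node, p in enumerate(ancestor_ids) if p != node}
    return [node for node in range(len(ancestor_ids)) if node not in parents]


def off_lineage_deltas(
    ancestor_ids: List[int], criterion_delta: List[float], target_leaf: int
) -> Dict[int, float]:
    """Map each leaf to |leaf value - value at its MRCA with the target|."""
    target_lineage = set()
    node = target_leaf
    while node not in target_lineage:
        target_lineage.add(node)
        node = ancestor_ids[node]
    deltas = {}
    for leaf in leaf_ids(ancestor_ids):
        node = leaf
        # The first ancestor on the target's lineage is the MRCA.
        while node not in target_lineage and ancestor_ids[node] != node:
            node = ancestor_ids[node]
        if node in target_lineage:
            deltas[leaf] = abs(criterion_delta[leaf] - criterion_delta[node])
    return deltas


# Tree: 0 -> (1, 2), 1 -> (3, 4), 2 -> (5)
ancestor_ids = [0, 0, 0, 1, 1, 2]
origin_time = [0, 1, 2, 5, 4, 4]
target = max(leaf_ids(ancestor_ids), key=lambda leaf: origin_time[leaf])
deltas = off_lineage_deltas(ancestor_ids, origin_time, target)
kept = sorted(sorted(deltas, key=deltas.get)[:2])  # n_downsample = 2
print(target, deltas, kept)
```

Leaf 3 is the target (delta 0), leaf 4 branches off at node 1 (delta 3), and leaf 5 branches off at the root (delta 4), so retaining two leaves keeps 3 and 4.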
- alifestd_downsample_tips_lineage_polars(phylogeny_df: DataFrame, n_downsample: int, seed: int | None = None, *, criterion_delta: str | Expr = 'origin_time', criterion_target: str | Expr = 'origin_time', progress_wrap: Callable = <function <lambda>>) DataFrame
Retain the n_downsample leaves closest to the lineage of a target leaf.
Selects a target leaf as the leaf with the largest criterion_target value (ties broken randomly). For each leaf, the most recent common ancestor (MRCA) with the target leaf is identified and the “off-lineage delta” is computed as the absolute difference between the leaf’s criterion_delta value and its MRCA’s criterion_delta value. The n_downsample leaves with the smallest off-lineage deltas are retained.
If n_downsample is greater than or equal to the number of leaves in the phylogeny, the whole phylogeny is returned. Ties in off-lineage delta are broken arbitrarily.
Only supports asexual phylogenies.
Parameters
- phylogeny_df : polars.DataFrame or polars.LazyFrame
The phylogeny as a dataframe in alife standard format.
Must represent an asexual phylogeny.
- n_downsample : int
Number of tips to retain.
- seed : int, optional
Random seed for reproducible target-leaf selection when there are ties in criterion_target.
- criterion_delta : str or polars.Expr, default “origin_time”
Column name or polars expression used to compute the off-lineage delta for each leaf. The delta is the absolute difference between a leaf’s value and its MRCA’s value.
- criterion_target : str or polars.Expr, default “origin_time”
Column name or polars expression used to select the target leaf. The leaf with the largest value is chosen as the target. Note that ties are broken by random sample, allowing a seed to be provided.
- progress_wrap : Callable, optional
Pass tqdm or equivalent to display a progress bar.
Raises
- NotImplementedError
If phylogeny_df has no “ancestor_id” column or if ids are non-contiguous or not topologically sorted.
- ValueError
If criterion_delta or criterion_target is not a column in phylogeny_df.
Returns
- polars.DataFrame
The pruned phylogeny in alife standard format.
See Also
- alifestd_downsample_tips_lineage_asexual :
Pandas-based implementation.
- alifestd_downsample_tips_lineage_stratified_asexual(phylogeny_df: DataFrame, n_downsample: int | None = None, mutate: bool = False, seed: int | None = None, *, criterion_delta: str = 'origin_time', criterion_stratify: str = 'origin_time', criterion_target: str = 'origin_time', n_tips_per_stratum: int = 1, progress_wrap: Callable = <function <lambda>>) DataFrame
Retain leaves per stratified group, chosen by proximity to the lineage of a target leaf.
Selects a target leaf as the leaf with the largest criterion_target value (ties broken randomly). For each non-target leaf, the most recent common ancestor (MRCA) of that leaf and the target leaf is identified, and the “off-lineage delta” is computed as the absolute difference between that leaf’s criterion_delta value and the MRCA’s criterion_delta value.
Leaves are grouped by their criterion_stratify value. When n_downsample is an integer, stratified values are coarsened by ranking and integer-dividing to form exactly n_downsample // n_tips_per_stratum groups. When n_downsample is None, each distinct stratified value forms its own group (without ranking). Within each group, the n_tips_per_stratum leaves with the smallest off-lineage delta are retained.
Only supports asexual phylogenies.
Parameters
- phylogeny_df : pandas.DataFrame
The phylogeny as a dataframe in alife standard format.
Must represent an asexual phylogeny.
- n_downsample : int, optional
Desired number of retained tips. If None, every distinct criterion_stratify value forms its own group.
- mutate : bool, default False
Are side effects on the input argument phylogeny_df allowed?
- seed : int, optional
Random seed for reproducible target-leaf selection when there are ties in criterion_target.
- criterion_delta : str, default “origin_time”
Column name used to compute the off-lineage delta for each leaf. The delta is the absolute difference between a leaf’s value and its MRCA’s value in this column.
- criterion_stratify : str, default “origin_time”
Column name used to stratify leaves into groups.
- criterion_target : str, default “origin_time”
Column name used to select the target leaf. The leaf with the largest value in this column is chosen as the target. Note that ties are broken by random sample, allowing a seed to be provided.
- n_tips_per_stratum : int, default 1
Number of tips to retain per stratified group. Must evenly divide n_downsample when n_downsample is not None.
- progress_wrap : Callable, optional
Pass tqdm or equivalent to display a progress bar.
Raises
- ValueError
If criterion_delta, criterion_stratify, or criterion_target is not a column in phylogeny_df.
- ValueError
If n_downsample is not None and n_tips_per_stratum does not evenly divide n_downsample.
Returns
- pandas.DataFrame
The pruned phylogeny in alife standard format.
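The "ranking and integer-dividing" step can be pictured with the plain-Python sketch below. The exact formula is an assumption, not taken from the library: leaves are ranked by their stratify value and rank r is mapped to group r * n_groups // n_leaves, which gives exactly n_groups groups when there are at least that many leaves. `stratify_by_rank` and `pick_per_stratum` are hypothetical helpers, and the off-lineage deltas are supplied directly as example data.

```python
from typing import Dict, List


def stratify_by_rank(values: List[float], n_groups: int) -> List[int]:
    """Assign each leaf a group index from its rank among stratify values."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    groups = [0] * len(values)
    for rank, idx in enumerate(order):
        groups[idx] = rank * n_groups // len(values)
    return groups


def pick_per_stratum(
    groups: List[int], deltas: List[float], n_tips_per_stratum: int = 1
) -> List[int]:
    """Within each group, keep the leaves with the smallest deltas."""
    by_group: Dict[int, List[int]] = {}
    for idx, group in enumerate(groups):
        by_group.setdefault(group, []).append(idx)
    kept = []
    for members in by_group.values():
        members.sort(key=lambda idx: deltas[idx])
        kept.extend(members[:n_tips_per_stratum])
    return sorted(kept)


# Six leaves, n_downsample = 3, n_tips_per_stratum = 1, so 3 groups.
stratify_values = [10, 11, 20, 21, 30, 31]
example_deltas = [3, 1, 0, 2, 5, 4]
groups = stratify_by_rank(stratify_values, n_groups=3 // 1)
print(groups)                                    # [0, 0, 1, 1, 2, 2]
print(pick_per_stratum(groups, example_deltas))  # [1, 2, 5]
```

Each pair of adjacent stratify values lands in one group, and the leaf with the smaller off-lineage delta in each group is retained.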
- alifestd_downsample_tips_lineage_stratified_polars(phylogeny_df: DataFrame, n_downsample: int | None = None, seed: int | None = None, *, criterion_delta: str | Expr = 'origin_time', criterion_stratify: str | Expr = 'origin_time', criterion_target: str | Expr = 'origin_time', n_tips_per_stratum: int = 1, progress_wrap: Callable = <function <lambda>>) DataFrame
Retain leaves per stratified group, chosen by proximity to the lineage of a target leaf.
Selects a target leaf as the leaf with the largest criterion_target value (ties broken randomly). For each non-target leaf, the most recent common ancestor (MRCA) of that leaf and the target leaf is identified, and the “off-lineage delta” is computed as the absolute difference between that leaf’s criterion_delta value and the MRCA’s criterion_delta value.
Leaves are grouped by their criterion_stratify value. When n_downsample is an integer, stratified values are coarsened by ranking and integer-dividing to form exactly n_downsample // n_tips_per_stratum groups. When n_downsample is None, each distinct stratified value forms its own group (without ranking). Within each group, the n_tips_per_stratum leaves with the smallest off-lineage delta are retained.
Only supports asexual phylogenies.
Parameters
- phylogeny_dfpolars.DataFrame or polars.LazyFrame
The phylogeny as a dataframe in alife standard format.
Must represent an asexual phylogeny.
- n_downsampleint, optional
Desired number of retained tips. If None, every distinct criterion_stratify value forms its own group.
- seedint, optional
Random seed for reproducible target-leaf selection when there are ties in criterion_target.
- criterion_deltastr or polars.Expr, default “origin_time”
Column name or polars expression used to compute the off-lineage delta for each leaf. The delta is the absolute difference between a leaf’s value and its MRCA’s value.
- criterion_stratifystr or polars.Expr, default “origin_time”
Column name or polars expression used to stratify leaves into groups.
- criterion_targetstr or polars.Expr, default “origin_time”
Column name or polars expression used to select the target leaf. The leaf with the largest value is chosen as the target. Note that ties are broken by random sample, allowing a seed to be provided.
- n_tips_per_stratumint, default 1
Number of tips to retain per stratified group. Must evenly divide n_downsample when n_downsample is not None.
- progress_wrapCallable, optional
Pass tqdm or equivalent to display a progress bar.
Raises
- NotImplementedError
If phylogeny_df has no “ancestor_id” column or if ids are non-contiguous or not topologically sorted.
- ValueError
If criterion_delta, criterion_stratify, or criterion_target is not a column in phylogeny_df.
- ValueError
If n_downsample is not None and n_tips_per_stratum does not evenly divide n_downsample.
Returns
- polars.DataFrame
The pruned phylogeny in alife standard format.
See Also
- alifestd_downsample_tips_lineage_stratified_asexual :
Pandas-based implementation.
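Example (illustrative sketch). The selection rule described above can be traced in plain Python for the n_downsample=None case with one tip per stratum. This is not the library's implementation: the function name `downsample_lineage_stratified` is hypothetical, a single value list stands in for criterion_target and criterion_delta, roots are assumed to list themselves as their own ancestor, and ties for the target leaf are broken by lowest id rather than by random sample.

```python
def downsample_lineage_stratified(ancestor_id, value, stratum):
    """Return ids of retained leaves, one per distinct stratum value.

    ancestor_id[i] is the parent of node i (roots point to themselves).
    """
    n = len(ancestor_id)
    has_child = {a for i, a in enumerate(ancestor_id) if a != i}
    leaves = [i for i in range(n) if i not in has_child]

    # Target leaf: largest value; ties go to the lowest id in this sketch.
    target = max(leaves, key=lambda i: (value[i], -i))

    # Collect the target's lineage back to its root.
    lineage = set()
    node = target
    while True:
        lineage.add(node)
        if ancestor_id[node] == node:
            break
        node = ancestor_id[node]

    def mrca_with_target(leaf):
        # First ancestor of `leaf` that lies on the target's lineage.
        node = leaf
        while node not in lineage:
            if ancestor_id[node] == node:
                return None  # separate tree, no common ancestor
            node = ancestor_id[node]
        return node

    best = {}  # stratum value -> (off-lineage delta, leaf id)
    for leaf in leaves:
        mrca = mrca_with_target(leaf)
        if mrca is None:
            continue
        delta = abs(value[leaf] - value[mrca])  # target itself gets delta 0
        key = stratum[leaf]
        if key not in best or (delta, leaf) < best[key]:
            best[key] = (delta, leaf)
    return sorted(leaf for _, leaf in best.values())


# Root 0 has children 1 and 2; leaves 3 and 4 descend from 1, leaf 5 from 2.
ancestor_id = [0, 0, 0, 1, 1, 2]
origin_time = [0, 1, 1, 2, 3, 3]
retained = downsample_lineage_stratified(ancestor_id, origin_time, origin_time)
```

Here leaf 4 becomes the target. Leaves 4 and 5 share a stratum, and leaf 4 wins it because its off-lineage delta is smaller.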
- alifestd_downsample_tips_polars(phylogeny_df: DataFrame, n_downsample: int, seed: int | None = None, **kwargs) DataFrame
Create a subsample phylogeny containing n_downsample tips.
If n_downsample is greater than the number of tips in the phylogeny, the whole phylogeny is returned.
Only supports asexual phylogenies.
Parameters
- phylogeny_dfpolars.DataFrame
The phylogeny as a dataframe in alife standard format.
Must represent an asexual phylogeny.
- n_downsampleint
Number of tips to retain.
- seedint, optional
Integer seed for deterministic behavior.
Raises
- NotImplementedError
If phylogeny_df has no “ancestor_id” column.
Returns
- polars.DataFrame
The downsampled phylogeny in alife standard format.
See Also
- alifestd_downsample_tips_uniform_polars :
Preferred non-deprecated implementation.
- alifestd_downsample_tips_asexual :
Pandas-based implementation.
Deprecated since version 0.6.0: Use alifestd_downsample_tips_uniform_polars instead.
- alifestd_downsample_tips_uniform_asexual(phylogeny_df: DataFrame, n_downsample: int, mutate: bool = False, seed: int | None = None) DataFrame
Create a subsample phylogeny containing n_downsample tips.
If n_downsample is greater than the number of tips in the phylogeny, the whole phylogeny is returned.
Only supports asexual phylogenies.
- alifestd_downsample_tips_uniform_polars(phylogeny_df: DataFrame, n_downsample: int, seed: int | None = None) DataFrame
Create a subsample phylogeny containing n_downsample tips.
If n_downsample is greater than the number of tips in the phylogeny, the whole phylogeny is returned.
Only supports asexual phylogenies.
Parameters
- phylogeny_dfpolars.DataFrame
The phylogeny as a dataframe in alife standard format.
Must represent an asexual phylogeny.
- n_downsampleint
Number of tips to retain.
- seedint, optional
Integer seed for deterministic behavior.
Raises
- NotImplementedError
If phylogeny_df has no “ancestor_id” column.
Returns
- polars.DataFrame
The downsampled phylogeny in alife standard format.
See Also
- alifestd_downsample_tips_uniform_asexual :
Pandas-based implementation.
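Example (illustrative sketch). Uniform tip downsampling amounts to sampling leaves and keeping each sampled leaf together with its ancestors. The plain-Python sketch below makes that concrete. The function name `downsample_tips` is hypothetical, roots are assumed to list themselves as their own ancestor, and the random stream will not match the library's for the same seed.

```python
import random


def downsample_tips(ancestor_id, n_downsample, seed=None):
    """Return sorted ids of nodes kept after sampling n_downsample tips."""
    n = len(ancestor_id)
    has_child = {a for i, a in enumerate(ancestor_id) if a != i}
    leaves = [i for i in range(n) if i not in has_child]
    if n_downsample >= len(leaves):
        return list(range(n))  # the whole phylogeny is returned

    kept_leaves = random.Random(seed).sample(leaves, n_downsample)
    keep = set()
    for node in kept_leaves:
        # Walk rootward until reaching a node that is already kept.
        while node not in keep:
            keep.add(node)
            node = ancestor_id[node]
    return sorted(keep)


ancestor_id = [0, 0, 0, 1, 1, 2]  # leaves are 3, 4, and 5
kept = downsample_tips(ancestor_id, 2, seed=1)
```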
- alifestd_drop_topological_sensitivity(phylogeny_df: DataFrame, mutate: bool = False, *, insert: bool = True, delete: bool = True, update: bool = True) DataFrame
Drop columns from phylogeny_df that may be invalidated by topological operations such as collapsing unifurcations.
Parameters
- phylogeny_dfpandas.DataFrame
The phylogeny as a dataframe in alife standard format.
- mutatebool, default False
Are side effects on the input argument allowed?
- insertbool, default True
Drop columns sensitive to node insertion.
- deletebool, default True
Drop columns sensitive to node deletion.
- updatebool, default True
Drop columns sensitive to ancestor relationship updates.
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
See Also
- alifestd_drop_topological_sensitivity_polars :
Polars-based implementation.
- alifestd_drop_topological_sensitivity_polars(phylogeny_df: DataFrame, *, insert: bool = True, delete: bool = True, update: bool = True) DataFrame
Drop columns from phylogeny_df that may be invalidated by topological operations such as collapsing unifurcations.
Parameters
- phylogeny_dfpolars.DataFrame
The phylogeny as a dataframe in alife standard format.
- insertbool, default True
Drop columns sensitive to node insertion.
- deletebool, default True
Drop columns sensitive to node deletion.
- updatebool, default True
Drop columns sensitive to ancestor relationship updates.
See Also
- alifestd_drop_topological_sensitivity :
Pandas-based implementation.
- alifestd_estimate_triplet_distance_asexual(first_df: ~pandas.core.frame.DataFrame, second_df: ~pandas.core.frame.DataFrame, taxon_label_key: str, confidence: float = 0.99, precision: float = 0.01, strict: bool | ~typing.Tuple[bool, bool] = True, detail: bool = False, progress_wrap: ~typing.Callable = <function <lambda>>, mutate: bool = False) float | Tuple[float, Tuple[float, float, int]]
Estimate the triplet distance between two asexual phylogenetic trees in alife standard format by sampling sets of three leaf taxa and counting the fraction whose phylogenetic connectivity mismatches between the trees.
Parameters
- first_dfpd.DataFrame
The DataFrame representing the first phylogenetic tree.
- second_dfpd.DataFrame
The DataFrame representing the second phylogenetic tree.
- taxon_label_keystr
The key in the DataFrame to identify the taxon labels.
- confidencefloat, default 0.99
The confidence level for the estimation.
See estimate_binomial_p for details.
- precisionfloat, default 0.01
The precision of the estimation.
See estimate_binomial_p for details.
- strictbool or Tuple[bool, bool], default True
A flag or a tuple of flags indicating how to treat polytomies.
If False, triplets that form a polytomy in either tree are not counted as mismatching. If True, they are counted as mismatching. If a tuple is given, polytomies in the first and second trees are treated according to the first and second elements of the tuple, respectively.
- detailbool, default False
If True, returns a detailed result including the estimated distance, confidence interval, and sample size.
- progress_wraptyping.Callable, optional
Pass tqdm or equivalent to display a progress bar.
- mutatebool, default False
If True, allows mutation of input DataFrames.
Returns
- float or Tuple[float, Tuple[float, float, int]]
The estimated distance between the two trees.
If detail is True, returns a tuple containing the estimated distance, the confidence interval, and the sample size.
Notes
The core comparison is done by sampling triplets of taxa, categorizing them, and comparing these categorizations across the two trees, taking into account the strict parameter for handling polytomies. See alifestd_categorize_triplet_asexual for details.
See Also
- alifestd_categorize_triplet_asexual
- alifestd_sample_triplet_comparisons_asexual
- alifestd_find_chronological_inconsistency(phylogeny_df: DataFrame) int | None
Return the id of a taxon with origin time preceding its parent’s, if any are present.
- alifestd_find_chronological_inconsistency_polars(phylogeny_df: DataFrame) int | None
Return the id of a taxon with origin time preceding its parent’s, if any are present.
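Example (illustrative sketch). The check compares each taxon's origin time against its parent's. The plain-Python version below uses a hypothetical function name and assumes asexual data in which roots list themselves as their own ancestor.

```python
def find_chronological_inconsistency(ids, ancestor_ids, origin_times):
    """Return the id of a taxon that predates its parent, or None."""
    time_of = dict(zip(ids, origin_times))
    for id_, ancestor, time in zip(ids, ancestor_ids, origin_times):
        # Roots (ancestor == own id) have no parent to compare against.
        if ancestor != id_ and time < time_of[ancestor]:
            return id_
    return None


# Taxon 2 originates at time 3, before its parent (taxon 1, time 5).
offender = find_chronological_inconsistency([0, 1, 2], [0, 0, 1], [0, 5, 3])
```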
- alifestd_find_leaf_ids(phylogeny_df: DataFrame) ndarray
What ids are not listed in any ancestor_list?
Input dataframe is not mutated by this operation.
- alifestd_find_leaf_ids_polars(phylogeny_df: DataFrame) ndarray
What ids are ancestor to no other ids?
Parameters
- phylogeny_dfpolars.DataFrame or polars.LazyFrame
The phylogeny as a dataframe in alife standard format.
Must have contiguous ids and represent an asexual phylogeny.
Returns
- numpy.ndarray
Array of leaf node ids.
See Also
- alifestd_find_leaf_ids :
Pandas-based implementation.
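Example (illustrative sketch). A leaf is any id that never appears as another taxon's ancestor. The plain-Python sketch below uses a hypothetical function name and assumes roots list themselves as their own ancestor, so self-references are ignored.

```python
def find_leaf_ids(ids, ancestor_ids):
    """Return ids that are ancestor to no other id, in row order."""
    internal = {a for i, a in zip(ids, ancestor_ids) if a != i}
    return [i for i in ids if i not in internal]


leaf_ids = find_leaf_ids([0, 1, 2, 3], [0, 0, 0, 1])
```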
- alifestd_find_mrca_id_asexual(phylogeny_df: DataFrame, leaf_ids: Iterable[int], mutate: bool = False) int
Find most recent common ancestor of leaf_ids.
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
- alifestd_find_pair_distance_asexual(phylogeny_df: DataFrame, first: int, second: int, *, criterion: str = 'origin_time', mutate: bool = False) float | None
Find the pairwise distance between two taxa via their MRCA.
The distance is computed as the sum of criterion differences between each taxon and their Most Recent Common Ancestor (MRCA):
distance = (criterion[first] - criterion[mrca]) + (criterion[second] - criterion[mrca])
Parameters
- phylogeny_dfpd.DataFrame
Phylogeny in alife standard format.
- firstint
First taxon id.
- secondint
Second taxon id.
- criterionstr, default “origin_time”
Column name used to measure distance between taxa and their MRCA.
- mutatebool, default False
If True, allows in-place modification of phylogeny_df.
Returns
- float or None
The pairwise distance between the two taxa, or None if they have no common ancestor.
See Also
- alifestd_find_pair_mrca_id_asexual :
Finds the MRCA id used internally by this function.
- alifestd_find_pair_distance_polars :
Polars-based implementation.
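Example (illustrative sketch). The distance is the sum of the two taxon-to-MRCA criterion differences. The plain-Python walk-through below is not the library's implementation: the function name is hypothetical, and roots are assumed to list themselves as their own ancestor.

```python
def find_pair_distance(ancestor_id, criterion, first, second):
    """Sum of criterion differences from each taxon to their MRCA."""

    def lineage(node):
        # Path from `node` up to and including its root.
        path = [node]
        while ancestor_id[node] != node:
            node = ancestor_id[node]
            path.append(node)
        return path

    first_lineage = set(lineage(first))
    # The first node on second's rootward path shared with first is the MRCA.
    mrca = next((n for n in lineage(second) if n in first_lineage), None)
    if mrca is None:
        return None  # taxa sit in separate trees
    return (criterion[first] - criterion[mrca]) + (
        criterion[second] - criterion[mrca]
    )


ancestor_id = [0, 0, 0, 1, 1, 2]
origin_time = [0, 1, 1, 2, 3, 3]
near = find_pair_distance(ancestor_id, origin_time, 3, 4)  # MRCA is node 1
far = find_pair_distance(ancestor_id, origin_time, 3, 5)  # MRCA is node 0
```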
- alifestd_find_pair_distance_polars(phylogeny_df: DataFrame, first: int, second: int, *, criterion: str | Expr = 'origin_time') float | None
Find the pairwise distance between two taxa via their MRCA.
The distance is computed as the sum of criterion differences between each taxon and their Most Recent Common Ancestor (MRCA):
distance = (criterion[first] - criterion[mrca]) + (criterion[second] - criterion[mrca])
Parameters
- phylogeny_dfpolars.DataFrame or polars.LazyFrame
The phylogeny as a dataframe in alife standard format.
Must represent an asexual phylogeny with contiguous ids and topologically sorted rows.
- firstint
First taxon id.
- secondint
Second taxon id.
- criterionstr or polars.Expr, default “origin_time”
Column name or polars expression used to measure distance between taxa and their MRCA.
Returns
- float or None
The pairwise distance between the two taxa, or None if they have no common ancestor.
See Also
- alifestd_find_pair_mrca_id_polars :
Finds the MRCA id used internally by this function.
- alifestd_find_pair_distance_asexual :
Pandas-based implementation.
- alifestd_find_pair_mrca_id_asexual(phylogeny_df: DataFrame, first: int, second: int, *, mutate: bool = False, is_topologically_sorted: bool | None = None, has_contiguous_ids: bool | None = None) int | None
Find the Most Recent Common Ancestor of two taxa.
Parameters
- phylogeny_dfpd.DataFrame
Phylogeny in alife standard format.
- firstint
First taxon id.
- secondint
Second taxon id.
- mutatebool, default False
If True, allows in-place modification of phylogeny_df.
- is_topologically_sortedbool, optional
If provided, skips the topological sort check. If None (default), the check is performed automatically.
- has_contiguous_idsbool, optional
If provided, skips the contiguous ids check. If None (default), the check is performed automatically.
Returns
- int or None
The id of the most recent common ancestor, or None if no common ancestor exists.
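Example (illustrative sketch). When ids are contiguous and rows are topologically sorted, every ancestor has a smaller id than its descendants. In that case the MRCA can be found by repeatedly stepping the larger of the two ids to its parent. This is one possible approach, not necessarily the one the library uses. The function name is hypothetical, and roots are assumed to list themselves as their own ancestor.

```python
def find_pair_mrca_id(ancestor_id, first, second):
    """MRCA of two taxa, assuming ancestors always have smaller ids."""
    while first != second:
        if first < second:
            first, second = second, first
        # `first` holds the larger id, so it cannot be an ancestor of `second`.
        parent = ancestor_id[first]
        if parent == first:
            return None  # reached a root without meeting: separate trees
        first = parent
    return first


ancestor_id = [0, 0, 0, 1, 1, 2]
sibling_mrca = find_pair_mrca_id(ancestor_id, 3, 4)
cousin_mrca = find_pair_mrca_id(ancestor_id, 3, 5)
```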
- alifestd_find_pair_mrca_id_polars(phylogeny_df: DataFrame, first: int, second: int, *, is_topologically_sorted: bool | None = None, has_contiguous_ids: bool | None = None) int | None
Find the Most Recent Common Ancestor of two taxa.
Parameters
- phylogeny_dfpolars.DataFrame or polars.LazyFrame
The phylogeny as a dataframe in alife standard format.
Must represent an asexual phylogeny with contiguous ids and topologically sorted rows.
- firstint
First taxon id.
- secondint
Second taxon id.
- is_topologically_sortedbool, optional
If provided, skips the topological sort check. If None (default), the check is performed automatically.
- has_contiguous_idsbool, optional
If provided, skips the contiguous ids check. If None (default), the check is performed automatically.
Returns
- int or None
The id of the most recent common ancestor, or None if no common ancestor exists.
See Also
- alifestd_find_pair_mrca_id_asexual :
Pandas-based implementation.
- alifestd_find_root_ids(phylogeny_df: DataFrame) ndarray
What ids have an empty ancestor_list?
Input dataframe is not mutated by this operation.
- alifestd_find_root_ids_polars(phylogeny_df: DataFrame) ndarray
What ids have an empty ancestor_list?
- alifestd_from_avida_spop(spop_text: str, *, create_ancestor_list: bool = True, dtype_id: type | None = <class 'numpy.int64'>) DataFrame
Convert Avida .spop population snapshot text to a phylogeny dataframe.
Parses the text content of an Avida .spop (structured population) file and returns a pandas DataFrame in alife standard format.
Parameters
- spop_textstr
Full text content of an Avida .spop file.
- create_ancestor_listbool, default True
If True, include an ancestor_list column in the result.
- dtype_idtype or None, default np.int64
Numpy dtype for the id column. If None, the smallest signed integer dtype is chosen automatically based on the number of rows in the data.
Returns
- pd.DataFrame
Phylogeny dataframe in alife standard format.
See Also
- alifestd_from_avida_spop_polars :
Polars-based implementation.
Raises
- ValueError
If the #format header is missing from the spop text.
- alifestd_from_avida_spop_polars(spop_text: str, *, create_ancestor_list: bool = True, dtype_id: DataType | None = Int64) DataFrame
Convert Avida .spop population snapshot text to a phylogeny dataframe.
Parses the text content of an Avida .spop (structured population) file and returns a polars DataFrame in alife standard format.
Parameters
- spop_textstr
Full text content of an Avida .spop file.
- create_ancestor_listbool, default True
If True, include an ancestor_list column in the result.
- dtype_idpl.DataType or None, default pl.Int64
Polars dtype for the id column. If None, the smallest signed integer dtype is chosen automatically based on the number of rows in the data.
Returns
- pl.DataFrame
Phylogeny dataframe in alife standard format.
See Also
- alifestd_from_avida_spop :
Pandas-based implementation.
Raises
- ValueError
If the #format header is missing from the spop text.
- alifestd_from_newick(newick: str, *, branch_length_dtype: type = <class 'float'>, create_ancestor_list: bool = False, dtype_id: type | None = <class 'numpy.int64'>) DataFrame
Convert a Newick format string to a phylogeny dataframe.
Parses a Newick tree string and returns a pandas DataFrame in alife standard format with columns: id, ancestor_id, taxon_label, origin_time_delta, and branch_length. Optionally includes ancestor_list.
Parameters
- newickstr
A phylogeny in Newick format.
- branch_length_dtypetype, default float
Dtype for branch length values. Use int to get nullable integer columns (pd.Int64Dtype). Missing branch lengths will be pd.NA for integer dtypes or NaN for float dtypes.
- create_ancestor_listbool, default False
If True, include an ancestor_list column in the result.
- dtype_idtype or None, default np.int64
Numpy dtype for the id and ancestor_id columns. If None, the smallest signed integer dtype is chosen automatically based on the number of commas in the Newick string.
Returns
- pd.DataFrame
Phylogeny dataframe in alife standard format.
See Also
- alifestd_from_newick_polars :
Polars-based implementation.
- alifestd_as_newick_asexual :
Inverse conversion, from alife standard to Newick format.
- alifestd_from_newick_polars(newick: str, *, branch_length_dtype: type = <class 'float'>, create_ancestor_list: bool = False, dtype_id: ~polars.datatypes.classes.DataType | None = Int64) DataFrame
Convert a Newick format string to a phylogeny dataframe.
Parses a Newick tree string and returns a polars DataFrame in alife standard format with columns: id, ancestor_id, taxon_label, origin_time_delta, and branch_length. Optionally includes ancestor_list.
Parameters
- newickstr
A phylogeny in Newick format.
- branch_length_dtypetype, default float
Dtype for branch length values. Use int to get nullable integer columns (pl.Int64). Missing branch lengths will be null for integer dtypes or NaN for float dtypes.
- create_ancestor_listbool, default False
If True, include an ancestor_list column in the result.
- dtype_idpl.DataType or None, default pl.Int64
Polars dtype for the id and ancestor_id columns. If None, the smallest signed integer dtype is chosen automatically based on the number of commas in the Newick string.
Returns
- pl.DataFrame
Phylogeny dataframe in alife standard format.
See Also
- alifestd_from_newick :
Pandas-based implementation.
- alifestd_as_newick_asexual :
Inverse conversion, from alife standard to Newick format.
- alifestd_has_compact_ids(phylogeny_df: DataFrame) bool
Are id values between 0 and len(phylogeny_df), in any order?
Input dataframe is not mutated by this operation.
- alifestd_has_compact_ids_polars(phylogeny_df: DataFrame) bool
Are id values between 0 and len(phylogeny_df), in any order?
- alifestd_has_contiguous_ids(phylogeny_df: DataFrame) bool
Do organisms’ ids correspond to their row numbers?
Input dataframe is not mutated by this operation.
- alifestd_has_contiguous_ids_polars(phylogeny_df: DataFrame) bool
Do organisms’ ids correspond to their row numbers?
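Example (illustrative sketch). Compact and contiguous ids are easy to confuse. The two hypothetical helpers below show the difference on plain lists. The compact check reflects one reading of the description above (each id in the range from 0 up to, but not including, the row count appears exactly once) and is an assumption rather than the library's exact rule.

```python
def has_contiguous_ids(ids):
    """True if every id equals its row number."""
    return all(id_ == row for row, id_ in enumerate(ids))


def has_compact_ids(ids):
    """True if ids are exactly 0..len(ids)-1, in any order."""
    return sorted(ids) == list(range(len(ids)))


shuffled = [2, 0, 1]  # compact, but not contiguous
gapped = [0, 2, 5]  # neither compact nor contiguous
```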
- alifestd_has_increasing_ids(phylogeny_df: DataFrame) bool
Do offspring have larger id values than ancestors?
Input dataframe is not mutated by this operation.
- alifestd_has_increasing_ids_polars(phylogeny_df: DataFrame) bool
Do offspring have larger id values than ancestors?
Requires ancestor_id column.
- alifestd_has_multiple_roots(phylogeny_df: DataFrame) bool
Does the phylogeny have two or more root organisms?
Input dataframe is not mutated by this operation.
- alifestd_has_multiple_roots_polars(phylogeny_df: DataFrame) bool
Does the phylogeny have two or more root organisms?
- alifestd_is_asexual(phylogeny_df: DataFrame) bool
Do all organisms in the phylogeny have one or no immediate ancestor?
Input dataframe is not mutated by this operation.
- alifestd_is_asexual_polars(phylogeny_df: DataFrame) bool
Do all organisms in the phylogeny have one or no immediate ancestor?
- alifestd_is_chronologically_ordered(phylogeny_df: DataFrame, diagnose: bool = True) bool
Do all organisms have origin_times at or after those of the members of their ancestor_list?
Input dataframe is not mutated by this operation.
- alifestd_is_chronologically_ordered_polars(phylogeny_df: DataFrame) bool
Check if all taxa have origin times at or after their ancestor’s origin time.
- alifestd_is_chronologically_sorted(phylogeny_df: DataFrame, how: str = 'origin_time') bool
Do rows appear in chronological order?
Defaults to origin_time. Input dataframe is not mutated by this operation.
- alifestd_is_chronologically_sorted_polars(phylogeny_df: DataFrame, how: str = 'origin_time') bool
Do rows appear in chronological order?
Defaults to origin_time.
- alifestd_is_sexual(phylogeny_df: DataFrame) bool
Do any organisms in the phylogeny have more than one immediate ancestor?
Input dataframe is not mutated by this operation.
- alifestd_is_sexual_polars(phylogeny_df: DataFrame) bool
Do any organisms in the phylogeny have more than one immediate ancestor?
- alifestd_is_strictly_bifurcating_asexual(phylogeny_df: DataFrame, mutate: bool = False) bool
Are all internal nodes strictly bifurcating (exactly 2 children)?
Input dataframe is not mutated by this operation.
- alifestd_is_strictly_bifurcating_polars(phylogeny_df: DataFrame) bool
Are all internal nodes strictly bifurcating (exactly 2 children)?
- alifestd_is_topologically_sorted(phylogeny_df: DataFrame) bool
Are all organisms listed after members of their ancestor_list?
Input dataframe is not mutated by this operation.
- alifestd_is_topologically_sorted_polars(phylogeny_df: DataFrame) bool
Are all organisms listed after members of their ancestor_list?
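Example (illustrative sketch). Topological sortedness can be checked in one pass by confirming that each row's ancestor has already been seen. The function name is hypothetical, and asexual data is assumed, with roots listing themselves as their own ancestor.

```python
def is_topologically_sorted(ids, ancestor_ids):
    """True if every organism appears after its ancestor."""
    seen = set()
    for id_, ancestor in zip(ids, ancestor_ids):
        if ancestor != id_ and ancestor not in seen:
            return False  # ancestor has not appeared yet
        seen.add(id_)
    return True


sorted_ok = is_topologically_sorted([0, 1, 2], [0, 0, 1])
sorted_bad = is_topologically_sorted([2, 1, 0], [1, 0, 0])
```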
- alifestd_is_ultrametric(phylogeny_df: DataFrame, mutate: bool = False, *, atol: float = 0.0) bool
Do all tips share the same origin_time (within atol)?
Tests the peak-to-peak (ptp) range of origin_time among tips against atol. Returns True for empty phylogenies. Raises ValueError if any tip’s origin_time is null/NaN.
Input dataframe is not mutated by this operation unless mutate set True.
- alifestd_is_ultrametric_polars(phylogeny_df: DataFrame, *, atol: float = 0.0) bool
Do all tips share the same origin_time (within atol)?
Tests the peak-to-peak (ptp) range of origin_time among tips against atol. Returns True for empty phylogenies. Raises ValueError if any tip’s origin_time is null/NaN. Must represent an asexual phylogeny (when is_leaf is not already present).
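Example (illustrative sketch). The peak-to-peak test reduces to comparing the spread of tip origin times against atol. The helper below is hypothetical and takes the tip origin times directly rather than a phylogeny dataframe. The use of `<=` for the comparison is an assumption, chosen so that identical times pass with the default atol of 0.0.

```python
import math


def is_ultrametric(tip_origin_times, atol=0.0):
    """True if max - min of the tip origin times is within atol."""
    times = list(tip_origin_times)
    if any(t is None or math.isnan(t) for t in times):
        raise ValueError("tip origin_time is null/NaN")
    if not times:
        return True  # empty phylogenies count as ultrametric
    return max(times) - min(times) <= atol


exact = is_ultrametric([3.0, 3.0, 3.0])
loose = is_ultrametric([3.0, 3.5], atol=0.5)
```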
- alifestd_is_working_format_asexual(phylogeny_df, mutate: bool = False) DataFrame
Test if phylogeny_df is an asexual phylogeny in working format.
- The working format is a dataframe with the following properties:
topologically sorted (i.e., organisms appear after all ancestors),
contiguous ids (i.e., organisms’ ids correspond to row number), and
contains an integer datatype ancestor_id column.
- alifestd_is_working_format_polars(phylogeny_df: DataFrame) bool
Test if phylogeny_df is an asexual phylogeny in working format.
- The working format is a dataframe with the following properties:
contains an integer datatype ancestor_id column,
topologically sorted (organisms appear after all ancestors), and
contiguous ids (organisms’ ids correspond to row number).
- alifestd_join_roots(phylogeny_df: DataFrame, mutate: bool = False) DataFrame
Point all other roots to oldest root, measured by lowest origin_time (if available) or otherwise lowest id.
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
- alifestd_join_roots_polars(phylogeny_df: DataFrame) DataFrame
Point all other roots to oldest root, measured by lowest origin_time (if available) or otherwise lowest id.
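Example (illustrative sketch). Joining roots means choosing the oldest root and repointing every other root to it. The hypothetical helper below works on plain lists and assumes roots list themselves as their own ancestor. Breaking origin_time ties by lowest id is an assumption.

```python
def join_roots(ancestor_id, origin_time=None):
    """Return ancestor ids with all roots repointed to the oldest root."""
    roots = [i for i, a in enumerate(ancestor_id) if a == i]
    if not roots:
        return list(ancestor_id)
    if origin_time is not None:
        oldest = min(roots, key=lambda i: (origin_time[i], i))
    else:
        oldest = min(roots)  # fall back to lowest id
    # The oldest root keeps pointing to itself; other roots point to it.
    return [oldest if a == i else a for i, a in enumerate(ancestor_id)]


by_time = join_roots([0, 1, 0, 1], origin_time=[5, 2, 6, 3])
by_id = join_roots([0, 1, 0, 1])
```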
- alifestd_ladderize_asexual(phylogeny_df: DataFrame, reverse: bool = False, mutate: bool = False) DataFrame
Reorder rows so children are sorted by number of descendant leaves, gathering children into contiguous rows.
By default, subtrees with fewer leaves come first (ascending). Set reverse=True to sort descending (more leaves first).
A topological sort will be applied if phylogeny_df is not topologically sorted. Dataframe reindexing (e.g., df.index) may be applied.
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
Note: after ladderizing, ids will no longer be contiguous with respect to row indices. Call alifestd_assign_contiguous_ids on the result to reassign contiguous ids if needed.
- alifestd_ladderize_polars(phylogeny_df: DataFrame, reverse: bool = False) DataFrame
Reorder rows so children are sorted by number of descendant leaves, gathering children into contiguous rows.
By default, subtrees with fewer leaves come first (ascending). Set reverse=True to sort descending (more leaves first).
Note: after ladderizing, ids will no longer be contiguous with respect to row indices. Call alifestd_assign_contiguous_ids_polars on the result to reassign contiguous ids if needed.
Parameters
- phylogeny_dfpolars.DataFrame
The phylogeny as a dataframe in alife standard format.
Must represent an asexual phylogeny with contiguous ids and topologically sorted rows.
- reversebool, default False
If True, sort descending (more leaves first).
Returns
- polars.DataFrame
The phylogeny with rows reordered in ladderized order.
Raises
- NotImplementedError
If ids are not contiguous or rows are not topologically sorted.
See Also
- alifestd_ladderize_asexual :
Pandas-based implementation.
- alifestd_make_ancestor_id_col(ids: Series, ancestor_lists: Series) Series
Translate ancestor ids from a column of singleton ancestor_list entries into a pure-integer series representation.
Each organism must have one or zero ancestors (i.e., asexual data). In the returned series, ancestor id will be assigned to own id for no-ancestor organisms.
- alifestd_make_ancestor_id_col_polars(ids: Series, ancestor_lists: Series) Series
Translate ancestor ids from a column of singleton ancestor_list entries into a pure-integer series representation.
Each organism must have one or zero ancestors (i.e., asexual data). In the returned series, ancestor id will be assigned to own id for no-ancestor organisms.
- alifestd_make_ancestor_list_col(ids: Series_T, ancestor_ids: Series_T, root_ancestor_token: str = 'none') Series_T
Translate a column of integer ancestor id values into alife standard ancestor_list representation.
The option root_ancestor_token will be sandwiched in brackets to create the ancestor list entry for genesis organisms. For example, the token “None” will yield the entry “[None]” and the token “” will yield the entry “[]”. Default “none”.
This function also accepts a polars.DataFrame, for which there is a separate delegated implementation.
- alifestd_make_ancestor_list_col_polars(ids: Series, ancestor_ids: Series, root_ancestor_token: str = 'none') Series
Translate a column of integer ancestor id values into alife standard ancestor_list representation.
The option root_ancestor_token will be sandwiched in brackets to create the ancestor list entry for genesis organisms. For example, the token “None” will yield the entry “[None]” and the token “” will yield the entry “[]”. Default “none”.
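Example (illustrative sketch). The translation wraps each ancestor id in brackets and substitutes the root token for organisms with no ancestor. The hypothetical helper below works on plain lists instead of pandas or polars series and assumes roots list themselves as their own ancestor.

```python
def make_ancestor_list_col(ids, ancestor_ids, root_ancestor_token="none"):
    """Return ancestor_list strings such as "[0]" or "[none]"."""
    return [
        f"[{root_ancestor_token}]" if ancestor == id_ else f"[{ancestor}]"
        for id_, ancestor in zip(ids, ancestor_ids)
    ]


default_token = make_ancestor_list_col([0, 1, 2], [0, 0, 1])
empty_token = make_ancestor_list_col([0, 1], [0, 0], root_ancestor_token="")
```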
- alifestd_make_balanced_bifurcating(depth: int) DataFrame
Build a perfectly balanced bifurcating tree of given depth.
Parameters
- depthint
Depth of the tree, where depth=1 is a single root node.
depth=0 -> empty tree (no nodes)
depth=1 -> 1 node (root only)
depth=2 -> 3 nodes (root + 2 leaves)
depth=3 -> 7 nodes (4 leaves)
depth=4 -> 15 nodes (8 leaves)
Returns
- pd.DataFrame
Alife-standard phylogeny dataframe with ‘id’ and ‘ancestor_list’ columns.
Raises
- ValueError
If depth is negative.
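Example (illustrative sketch). The node counts listed above follow 2**depth - 1. One simple way to build such a tree is a heap-style layout in which node i's parent is (i - 1) // 2. That id layout is an assumption made for illustration and may differ from the ids the library assigns. The helper name is hypothetical, and it returns plain ancestor ids with the root pointing to itself.

```python
def make_balanced_bifurcating(depth):
    """Return ancestor ids for a perfectly balanced bifurcating tree."""
    if depth < 0:
        raise ValueError("depth must be non-negative")
    n_nodes = 2**depth - 1
    # Heap layout: children of node i are 2i + 1 and 2i + 2.
    return [max((i - 1) // 2, 0) for i in range(n_nodes)]


tree = make_balanced_bifurcating(3)  # 7 nodes, of which 3..6 are leaves
```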
- alifestd_make_balanced_bifurcating_polars(depth: int) DataFrame
Build a perfectly balanced bifurcating tree of given depth.
Parameters
- depthint
Depth of the tree, where depth=1 is a single root node.
Returns
- pl.DataFrame
Phylogeny dataframe with ‘id’ and ‘ancestor_id’ columns.
- alifestd_make_comb(n_leaves: int) DataFrame
Build a comb/caterpillar tree with n_leaves leaves.
Structure (e.g., n_leaves=4):
      0
     / \
    1   2
       / \
      3   4
         / \
        5   6
Internal nodes: 0, 2, 4, … Leaves: 1, 3, 5, …
Parameters
- n_leavesint
Number of leaf nodes in the resulting tree.
Returns
- pd.DataFrame
Alife-standard phylogeny dataframe with ‘id’ and ‘ancestor_list’ columns.
Raises
- ValueError
If n_leaves is negative.
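Example (illustrative sketch). The id pattern in the structure shown above (internal nodes on even ids, each with an odd-id leaf child and an even-id child continuing the spine) can be generated directly. The helper name is hypothetical, and it returns plain ancestor ids with the root pointing to itself. Returning an empty list for n_leaves=0 is an assumption.

```python
def make_comb(n_leaves):
    """Return ancestor ids for a comb (caterpillar) tree."""
    if n_leaves < 0:
        raise ValueError("n_leaves must be non-negative")
    n_nodes = max(2 * n_leaves - 1, 0)
    ancestor_id = []
    for i in range(n_nodes):
        if i == 0:
            ancestor_id.append(0)  # root points to itself
        elif i % 2:
            ancestor_id.append(i - 1)  # odd id: leaf hanging off the spine
        else:
            ancestor_id.append(i - 2)  # even id: next node along the spine
    return ancestor_id


comb = make_comb(4)
```

With n_leaves=4, the leaves are ids 1, 3, 5, and 6: the last spine position holds a leaf rather than an internal node.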
- alifestd_make_comb_polars(n_leaves: int) DataFrame
Build a comb/caterpillar tree with n_leaves leaves.
Parameters
- n_leavesint
Number of leaf nodes in the resulting tree.
Returns
- pl.DataFrame
Phylogeny dataframe with ‘id’ and ‘ancestor_id’ columns.
- alifestd_make_edge_split(n_leaves: int, seed: int | None = None) DataFrame
Build a random bifurcating tree via edge-split (PDA) sampling.
At each step, a uniformly chosen existing edge is split by inserting a new internal node, with a new leaf attached as its sibling. This produces samples from the Proportional-to-Distinguishable-Arrangements (PDA) distribution over rooted bifurcating tree shapes.
Ids are contiguous but not topologically sorted; inserted internal nodes may have ids greater than some of their descendants. Pass the result through alifestd_topological_sort if topological id order is needed.
Parameters
- n_leavesint
Number of leaf nodes in the resulting tree.
- seedint, optional
Integer seed for deterministic behavior.
Returns
- pd.DataFrame
Alife-standard phylogeny dataframe with ‘id’ and ‘ancestor_list’ columns.
Raises
- ValueError
If n_leaves is negative.
- alifestd_make_edge_split_polars(n_leaves: int, seed: int | None = None) DataFrame
Build a random bifurcating tree via edge-split (PDA) sampling.
At each step, a uniformly chosen existing edge is split by inserting a new internal node, with a new leaf attached as its sibling. This produces samples from the Proportional-to-Distinguishable-Arrangements (PDA) distribution over rooted bifurcating tree shapes.
Ids are contiguous but not topologically sorted; inserted internal nodes may have ids greater than some of their descendants. Pass the result through alifestd_topological_sort_polars if topological id order is needed.
Parameters
- n_leavesint
Number of leaf nodes in the resulting tree.
- seedint, optional
Integer seed for deterministic behavior.
Returns
- pl.DataFrame
Phylogeny dataframe with ‘id’ and ‘ancestor_id’ columns.
- alifestd_make_empty(ancestor_id: bool = False) DataFrame
Create an alife standard phylogeny dataframe with zero rows.
- alifestd_make_empty_polars(ancestor_id: bool = True) DataFrame
Create an alife standard phylogeny dataframe with zero rows.
- alifestd_make_leaf_split(n_leaves: int, seed: int | None = None) DataFrame
Build a random bifurcating tree via leaf-split (Yule) sampling.
At each step, a uniformly chosen leaf is replaced by an internal node with two new leaf children. This produces samples from the Yule (pure-birth) distribution over rooted bifurcating tree shapes.
Parameters
- n_leavesint
Number of leaf nodes in the resulting tree.
- seedint, optional
Integer seed for deterministic behavior.
Returns
- pd.DataFrame
Alife-standard phylogeny dataframe with ‘id’ and ‘ancestor_list’ columns.
Raises
- ValueError
If n_leaves is negative.
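Example (illustrative sketch). The leaf-split procedure can be written in a few lines of plain Python. The helper below is a hypothetical stand-in, not the library's implementation: it returns ancestor ids with the root pointing to itself, and its random stream will not match the library's for the same seed.

```python
import random


def make_leaf_split(n_leaves, seed=None):
    """Return ancestor ids for a tree grown by repeated leaf splitting."""
    if n_leaves < 0:
        raise ValueError("n_leaves must be non-negative")
    if n_leaves == 0:
        return []
    rng = random.Random(seed)
    ancestor_id = [0]  # start from a single root, which is also a leaf
    leaves = [0]
    while len(leaves) < n_leaves:
        # A uniformly chosen leaf becomes an internal node...
        parent = leaves.pop(rng.randrange(len(leaves)))
        # ...with two new leaf children.
        for _ in range(2):
            leaves.append(len(ancestor_id))
            ancestor_id.append(parent)
    return ancestor_id


tree = make_leaf_split(5, seed=0)
```

Each split adds two nodes and one net leaf, so a tree with n leaves has 2n - 1 nodes.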
- alifestd_make_leaf_split_polars(n_leaves: int, seed: int | None = None) DataFrame
Build a random bifurcating tree via leaf-split (Yule) sampling.
At each step, a uniformly chosen leaf is replaced by an internal node with two new leaf children. This produces samples from the Yule (pure-birth) distribution over rooted bifurcating tree shapes.
Parameters
- n_leavesint
Number of leaf nodes in the resulting tree.
- seedint, optional
Integer seed for deterministic behavior.
Returns
- pl.DataFrame
Phylogeny dataframe with ‘id’ and ‘ancestor_id’ columns.
- alifestd_make_star(n_leaves: int) DataFrame
Build a star tree with n_leaves leaves.
Structure (e.g., n_leaves=4):
      0
   / | \ \
  1  2  3  4
The root (id 0) has every leaf as a direct child.
Parameters
- n_leavesint
Number of leaf nodes in the resulting tree.
Returns
- pd.DataFrame
Alife-standard phylogeny dataframe with ‘id’ and ‘ancestor_list’ columns.
Raises
- ValueError
If n_leaves is negative.
- alifestd_make_star_polars(n_leaves: int) DataFrame
Build a star tree with n_leaves leaves.
Structure (e.g., n_leaves=4):
      0
   / | \ \
  1  2  3  4
The root (id 0) has every leaf as a direct child.
Parameters
- n_leavesint
Number of leaf nodes in the resulting tree.
Returns
- pl.DataFrame
Phylogeny dataframe with ‘id’ and ‘ancestor_id’ columns.
- alifestd_mark_ancestor_origin_time_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'ancestor_origin_time') DataFrame
Add column ancestor_origin_time.
The output column name can be changed via the mark_as parameter.
Dataframe must provide column origin_time.
A topological sort will be applied if phylogeny_df is not topologically sorted. Dataframe reindexing (e.g., df.index) may be applied.
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
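As a rough illustration of what this column holds, the lookup can be sketched with plain lists. The toy data and the convention that a root lists itself as its own ancestor are assumptions of this sketch, not confirmed library behavior.

```python
# Hypothetical toy phylogeny: node i's parent is ancestor_ids[i]; the root
# lists itself as its own ancestor (an assumption of this sketch).
ancestor_ids = [0, 0, 0, 1, 1]
origin_times = [0.0, 1.0, 2.0, 3.0, 5.0]

# Look up each node's ancestor and copy that ancestor's origin time.
ancestor_origin_times = [origin_times[a] for a in ancestor_ids]

# A branch length is then the node's origin time minus its ancestor's.
branch_lengths = [
    t - at for t, at in zip(origin_times, ancestor_origin_times)
]
```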
- alifestd_mark_ancestor_origin_time_polars(phylogeny_df: DataFrame, *, mark_as: str = 'ancestor_origin_time') DataFrame
Add column ancestor_origin_time.
The output column name can be changed via the mark_as parameter.
Dataframe must provide column origin_time.
- alifestd_mark_clade_duration_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'clade_duration') DataFrame
Add column clade_duration, containing the difference between each node’s origin_time and the maximum origin_time of its descendants.
The output column name can be changed via the mark_as parameter.
Leaf nodes will have duration 0.
Dataframe reindexing (e.g., df.index) may be applied.
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
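A minimal sketch of the clade duration calculation follows. It assumes contiguous ids, a topologically sorted phylogeny (each ancestor id is no larger than its descendant's id), roots that list themselves as ancestor, and that the duration is reported as a non-negative value; the helper name `clade_durations` is hypothetical.

```python
def clade_durations(ancestor_ids, origin_times):
    """Hypothetical helper: latest descendant origin time minus own."""
    latest = list(origin_times)  # latest origin time seen within each clade
    # Visit children before parents, pushing the latest time up the tree.
    for i in reversed(range(len(ancestor_ids))):
        a = ancestor_ids[i]
        if a != i:  # skip roots, which point to themselves in this sketch
            latest[a] = max(latest[a], latest[i])
    return [late - own for late, own in zip(latest, origin_times)]


durations = clade_durations([0, 0, 0, 1, 1], [0.0, 1.0, 2.0, 3.0, 5.0])
```

In this toy tree the root's clade spans from time 0 to time 5, node 1's clade spans from 1 to 5, and the three leaves have duration 0.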
- alifestd_mark_clade_duration_polars(phylogeny_df: DataFrame, *, mark_as: str = 'clade_duration') DataFrame
Add column clade_duration, containing the difference between each node’s origin_time and the maximum origin_time of its descendants.
The output column name can be changed via the mark_as parameter.
Leaf nodes will have duration 0.
- alifestd_mark_clade_duration_ratio_sister_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'clade_duration_ratio_sister') DataFrame
Add column clade_duration_ratio_sister, containing the ratio of each clade’s duration to that of its sister.
The output column name can be changed via the mark_as parameter.
Root nodes will have ratio 1, unless also a leaf node. Leaf nodes and leaf-sisters may have ratio inf or NaN.
Tree must be strictly bifurcating.
Dataframe reindexing (e.g., df.index) may be applied.
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
- alifestd_mark_clade_duration_ratio_sister_polars(phylogeny_df: DataFrame, *, mark_as: str = 'clade_duration_ratio_sister') DataFrame
Add column clade_duration_ratio_sister, containing the ratio of each clade’s duration to that of its sister.
The output column name can be changed via the mark_as parameter.
Tree must be strictly bifurcating.
- alifestd_mark_clade_faithpd_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'clade_faithpd') DataFrame
Add column clade_faithpd, containing the sum of branch lengths among descendant nodes.
The output column name can be changed via the mark_as parameter.
Branch length is defined as the difference between the origin time of the node and the origin time of its ancestor.
A topological sort will be applied if phylogeny_df is not topologically sorted. Dataframe reindexing (e.g., df.index) may be applied.
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
- alifestd_mark_clade_faithpd_polars(phylogeny_df: DataFrame, *, mark_as: str = 'clade_faithpd') DataFrame
Add column clade_faithpd, containing the sum of branch lengths among descendant nodes.
The output column name can be changed via the mark_as parameter.
- alifestd_mark_clade_fblr_growth_children_asexual(phylogeny_df: ~pandas.core.frame.DataFrame, mutate: bool = False, *, mark_as: str = 'clade_fblr_growth_children', parallel_backend: str | None = None, progress_wrap: ~typing.Callable = <function <lambda>>, work_mask: ~numpy.ndarray | None = None) DataFrame
Add column clade_fblr_growth_children, containing the coefficient of a fast binary logistic regression (fblr) fit to origin times of the leaf descendants of each node.
The output column name can be changed via the mark_as parameter.
Nodes whose left and right child clades have equal growth rates will have value approximately 0.0. If the left child clade has the greater growth rate, value will be negative. If the right child clade has the greater growth rate, value will be positive.
Pass “loky” to parallel_backend to use joblib with loky backend.
Leaf nodes will have value NaN. If provided, any nodes not included in work_mask will also have value NaN.
Tree must be strictly bifurcating and single-rooted.
Dataframe reindexing (e.g., df.index) may be applied.
Input phylogeny_df and work_mask are not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
References
- Bonetti Franceschi V and Volz E. Phylogenetic signatures reveal
multilevel selection and fitness costs in SARS-CoV-2 [version 2; peer review: 2 approved, 1 approved with reservations]. Wellcome Open Res 2024, 9:85 (https://doi.org/10.12688/wellcomeopenres.20704.2)
- Volz, E. Fitness, growth and transmissibility of SARS-CoV-2 genetic
variants. Nat Rev Genet 24, 724-734 (2023). https://doi.org/10.1038/s41576-023-00610-z
- Saran NA, Nar F. 2025. Fast binary logistic regression. PeerJ Computer
Science 11:e2579 https://doi.org/10.7717/peerj-cs.2579
- alifestd_mark_clade_fblr_growth_sister_asexual(phylogeny_df: ~pandas.core.frame.DataFrame, mutate: bool = False, *, mark_as: str = 'clade_fblr_growth_sister', parallel_backend: str | None = None, progress_wrap: ~typing.Callable = <function <lambda>>, work_mask: ~numpy.ndarray | None = None) DataFrame
Add column clade_fblr_growth_sister, containing the coefficient of a fast binary logistic regression (fblr) fit to origin times of this clade’s descendant leaves versus those of its sister clade.
The output column name can be changed via the mark_as parameter.
Clades with equal growth rate to their sister will have value approximately 0.0. Clades growing faster than their sister clade will have value greater than 0.0. Clades growing slower than their sister clade will have value less than 0.0.
Pass “loky” to parallel_backend to use joblib with loky backend.
Root nodes will have value NaN. If provided, any nodes not included in work_mask will also have value NaN.
Tree must be strictly bifurcating and single-rooted.
Dataframe reindexing (e.g., df.index) may be applied.
Input phylogeny_df and work_mask are not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
References
- Bonetti Franceschi V and Volz E. Phylogenetic signatures reveal
multilevel selection and fitness costs in SARS-CoV-2 [version 2; peer review: 2 approved, 1 approved with reservations]. Wellcome Open Res 2024, 9:85 (https://doi.org/10.12688/wellcomeopenres.20704.2)
- Volz, E. Fitness, growth and transmissibility of SARS-CoV-2 genetic
variants. Nat Rev Genet 24, 724-734 (2023). https://doi.org/10.1038/s41576-023-00610-z
- Saran NA, Nar F. 2025. Fast binary logistic regression. PeerJ Computer
Science 11:e2579 https://doi.org/10.7717/peerj-cs.2579
- alifestd_mark_clade_leafcount_ratio_sister_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'clade_leafcount_ratio_sister') DataFrame
Add column clade_leafcount_ratio_sister, containing the ratio of each clade’s leaf count to that of its sister.
The output column name can be changed via the mark_as parameter.
Root nodes will have ratio 1. Tree must be strictly bifurcating.
Dataframe reindexing (e.g., df.index) may be applied.
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
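The sister ratio can be illustrated with a short sketch. It assumes contiguous ids, a topologically sorted and strictly bifurcating tree, and roots that list themselves as ancestor; the helper name `leafcount_ratio_sister` is hypothetical.

```python
def leafcount_ratio_sister(ancestor_ids):
    """Hypothetical helper: each clade's leaf count over its sister's."""
    n = len(ancestor_ids)
    children = [[] for _ in range(n)]
    for i, a in enumerate(ancestor_ids):
        if a != i:
            children[a].append(i)
    # Count leaves per clade, visiting children before parents.
    leaf_counts = [0] * n
    for i in reversed(range(n)):
        kids = children[i]
        leaf_counts[i] = sum(leaf_counts[c] for c in kids) if kids else 1
    ratios = [1.0] * n  # roots keep ratio 1
    for kids in children:
        if kids:
            left, right = kids  # strictly bifurcating: exactly two children
            ratios[left] = leaf_counts[left] / leaf_counts[right]
            ratios[right] = leaf_counts[right] / leaf_counts[left]
    return ratios


ratios = leafcount_ratio_sister([0, 0, 0, 1, 1])
```

Here node 1 subtends two leaves while its sister, node 2, is a single leaf, so they receive ratios 2 and 0.5 respectively.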
- alifestd_mark_clade_leafcount_ratio_sister_polars(phylogeny_df: DataFrame, *, mark_as: str = 'clade_leafcount_ratio_sister') DataFrame
Add column clade_leafcount_ratio_sister, containing the ratio of each clade’s leaf count to that of its sister.
The output column name can be changed via the mark_as parameter.
Tree must be strictly bifurcating.
- alifestd_mark_clade_logistic_growth_children_asexual(phylogeny_df: ~pandas.core.frame.DataFrame, mutate: bool = False, *, mark_as: str = 'clade_logistic_growth_children', parallel_backend: str | None = None, progress_wrap: ~typing.Callable = <function <lambda>>, work_mask: ~numpy.ndarray | None = None) DataFrame
Add column clade_logistic_growth_children, containing the coefficient of a logistic regression fit to origin times of the leaf descendants of each node.
The output column name can be changed via the mark_as parameter.
Nodes whose left and right child clades have equal growth rates will have value approximately 0.0. If the left child clade has the greater growth rate, value will be negative. If the right child clade has the greater growth rate, value will be positive.
Pass “loky” to parallel_backend to use joblib with loky backend.
Leaf nodes will have value NaN. If provided, any nodes not included in work_mask will also have value NaN.
Tree must be strictly bifurcating and single-rooted.
Dataframe reindexing (e.g., df.index) may be applied.
Input phylogeny_df and work_mask are not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
References
- Bonetti Franceschi V and Volz E. Phylogenetic signatures reveal
multilevel selection and fitness costs in SARS-CoV-2 [version 2; peer review: 2 approved, 1 approved with reservations]. Wellcome Open Res 2024, 9:85 (https://doi.org/10.12688/wellcomeopenres.20704.2)
- Volz, E. Fitness, growth and transmissibility of SARS-CoV-2 genetic
variants. Nat Rev Genet 24, 724-734 (2023). https://doi.org/10.1038/s41576-023-00610-z
- alifestd_mark_clade_logistic_growth_sister_asexual(phylogeny_df: ~pandas.core.frame.DataFrame, mutate: bool = False, *, mark_as: str = 'clade_logistic_growth_sister', parallel_backend: str | None = None, progress_wrap: ~typing.Callable = <function <lambda>>, work_mask: ~numpy.ndarray | None = None) DataFrame
Add column clade_logistic_growth_sister, containing the coefficient of a logistic regression fit to origin times of this clade’s descendant leaves versus those of its sister clade.
The output column name can be changed via the mark_as parameter.
Clades with equal growth rate to their sister will have value approximately 0.0. Clades growing faster than their sister clade will have value greater than 0.0. Clades growing slower than their sister clade will have value less than 0.0.
Pass “loky” to parallel_backend to use joblib with loky backend.
Root nodes will have value NaN. If provided, any nodes not included in work_mask will also have value NaN.
Tree must be strictly bifurcating and single-rooted.
Dataframe reindexing (e.g., df.index) may be applied.
Input phylogeny_df and work_mask are not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
References
- Bonetti Franceschi V and Volz E. Phylogenetic signatures reveal
multilevel selection and fitness costs in SARS-CoV-2 [version 2; peer review: 2 approved, 1 approved with reservations]. Wellcome Open Res 2024, 9:85 (https://doi.org/10.12688/wellcomeopenres.20704.2)
- Volz, E. Fitness, growth and transmissibility of SARS-CoV-2 genetic
variants. Nat Rev Genet 24, 724-734 (2023). https://doi.org/10.1038/s41576-023-00610-z
- alifestd_mark_clade_nodecount_ratio_sister_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'clade_nodecount_ratio_sister') DataFrame
Add column clade_nodecount_ratio_sister, containing the ratio of each clade size to that of its sister.
The output column name can be changed via the mark_as parameter.
Root nodes will have ratio 1. Tree must be strictly bifurcating.
Dataframe reindexing (e.g., df.index) may be applied.
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
- alifestd_mark_clade_nodecount_ratio_sister_polars(phylogeny_df: DataFrame, *, mark_as: str = 'clade_nodecount_ratio_sister') DataFrame
Add column clade_nodecount_ratio_sister, containing the ratio of each clade size to that of its sister.
The output column name can be changed via the mark_as parameter.
Tree must be strictly bifurcating.
- alifestd_mark_clade_subtended_duration_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'clade_subtended_duration') DataFrame
Add column clade_subtended_duration, containing the difference between each node’s ancestor’s origin_time and the maximum origin_time of its descendants.
The output column name can be changed via the mark_as parameter.
Ancestor origin time for root nodes will be 0.
Dataframe reindexing (e.g., df.index) may be applied.
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
- alifestd_mark_clade_subtended_duration_polars(phylogeny_df: DataFrame, *, mark_as: str = 'clade_subtended_duration') DataFrame
Add column clade_subtended_duration, containing the difference between each node’s ancestor’s origin_time and the maximum origin_time of its descendants.
The output column name can be changed via the mark_as parameter.
Ancestor origin time for root nodes will be 0.
- alifestd_mark_clade_subtended_duration_ratio_sister_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'clade_subtended_duration_ratio_sister') DataFrame
Add column clade_subtended_duration_ratio_sister, containing the ratio of each clade’s subtended duration to that of its sister.
The output column name can be changed via the mark_as parameter.
Root nodes will have ratio 1, unless also a leaf node. Leaf nodes and leaf-sisters may have ratio inf or NaN.
Tree must be strictly bifurcating.
Dataframe reindexing (e.g., df.index) may be applied.
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
- alifestd_mark_clade_subtended_duration_ratio_sister_polars(phylogeny_df: DataFrame, *, mark_as: str = 'clade_subtended_duration_ratio_sister') DataFrame
Add column clade_subtended_duration_ratio_sister, containing the ratio of each clade’s subtended duration to that of its sister.
The output column name can be changed via the mark_as parameter.
Tree must be strictly bifurcating.
- alifestd_mark_colless_index_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'colless_index') DataFrame
Add column colless_index with Colless imbalance index for each subtree.
The output column name can be changed via the mark_as parameter.
Computes the classic Colless index for strictly bifurcating trees. For each internal node with exactly two children, the local contribution is |L - R| where L and R are leaf counts in left and right subtrees. The value at each node represents the total Colless index for the subtree rooted at that node.
Raises ValueError if the tree is not strictly bifurcating. For trees with polytomies, use alifestd_mark_colless_like_index_mdm_asexual for the Colless-like index instead.
Leaf nodes will have Colless index 0 (no imbalance in subtree of size 1). The root node contains the Colless index for the entire tree.
A topological sort will be applied if phylogeny_df is not topologically sorted. Dataframe reindexing (e.g., df.index) may be applied.
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
Parameters
- phylogeny_df : pd.DataFrame
Alife standard DataFrame containing the phylogenetic relationships.
- mutate : bool, optional
If True, the input DataFrame may be mutated; the returned DataFrame should still be used. Default is False.
Returns
- pd.DataFrame
Phylogeny DataFrame with an additional column “colless_index” containing the Colless imbalance index for the subtree rooted at each node.
Raises
- ValueError
If phylogeny_df is not strictly bifurcating.
See Also
- alifestd_mark_colless_index_corrected_asexual :
Normalized Colless index (corrected for tree size).
- alifestd_mark_colless_like_index_mdm_asexual :
Colless-like index (MDM) that supports polytomies.
- alifestd_mark_colless_like_index_var_asexual :
Colless-like index (variance) that supports polytomies.
- alifestd_mark_colless_like_index_sd_asexual :
Colless-like index (std dev) that supports polytomies.
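The per-subtree Colless computation described for alifestd_mark_colless_index_asexual can be sketched as a single bottom-up pass. This is a sketch under stated assumptions (contiguous ids, topological sort, roots listing themselves as ancestor), and the helper name `colless_index` is hypothetical.

```python
def colless_index(ancestor_ids):
    """Hypothetical helper: Colless index of the subtree at each node."""
    n = len(ancestor_ids)
    children = [[] for _ in range(n)]
    for i, a in enumerate(ancestor_ids):
        if a != i:
            children[a].append(i)
    if any(len(kids) not in (0, 2) for kids in children):
        raise ValueError("tree must be strictly bifurcating")
    leaf_counts = [0] * n
    colless = [0] * n
    # Children have larger ids than parents, so reverse order is bottom-up.
    for i in reversed(range(n)):
        kids = children[i]
        if not kids:
            leaf_counts[i] = 1  # a leaf has no imbalance
            continue
        left, right = kids
        leaf_counts[i] = leaf_counts[left] + leaf_counts[right]
        local = abs(leaf_counts[left] - leaf_counts[right])
        colless[i] = local + colless[left] + colless[right]
    return colless


# Caterpillar tree with four leaves (ids 1, 3, 5, 6).
caterpillar = colless_index([0, 0, 0, 2, 2, 4, 4])
```

For the four-leaf caterpillar, the root value is |1 - 3| + |1 - 2| + 0 = 3, the maximum possible for four leaves.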
- alifestd_mark_colless_index_corrected_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'colless_index_corrected') DataFrame
Add column colless_index_corrected with the corrected Colless index for each subtree.
The output column name can be changed via the mark_as parameter.
The corrected Colless index IC(T) normalizes the Colless index by tree size. For a subtree with n leaves:
IC(T) = 0 if n <= 2
IC(T) = 2 * C(T) / ((n-1)*(n-2)) if n > 2
where C(T) is the Colless index of the subtree.
This function delegates to alifestd_mark_colless_index_asexual to compute the Colless index, and therefore requires strictly bifurcating trees.
Raises ValueError if the tree is not strictly bifurcating. For trees with polytomies, consider computing the generalized Colless index and normalizing separately.
A topological sort will be applied if phylogeny_df is not topologically sorted. Dataframe reindexing (e.g., df.index) may be applied.
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
Parameters
- phylogeny_df : pd.DataFrame
Alife standard DataFrame containing the phylogenetic relationships.
- mutate : bool, optional
If True, the input DataFrame may be mutated; the returned DataFrame should still be used. Default is False.
Returns
- pd.DataFrame
Phylogeny DataFrame with an additional column “colless_index_corrected” containing the corrected Colless imbalance index for the subtree rooted at each node.
Raises
- ValueError
If phylogeny_df is not strictly bifurcating.
See Also
- alifestd_mark_colless_index_asexual :
Unnormalized Colless index for strictly bifurcating trees.
- alifestd_mark_colless_like_index_mdm_asexual :
Colless-like index (MDM) that supports polytomies.
- alifestd_mark_colless_like_index_var_asexual :
Colless-like index (variance) that supports polytomies.
- alifestd_mark_colless_like_index_sd_asexual :
Colless-like index (std dev) that supports polytomies.
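The normalization used by alifestd_mark_colless_index_corrected_asexual is small enough to work through directly. The function below is a hypothetical stand-alone helper that applies the IC(T) formula to an already-computed Colless index and leaf count.

```python
def colless_corrected(colless, n_leaves):
    """Hypothetical helper: apply IC(T) = 2 * C(T) / ((n-1) * (n-2))."""
    if n_leaves <= 2:
        return 0.0
    return 2 * colless / ((n_leaves - 1) * (n_leaves - 2))


# A four-leaf caterpillar has C(T) = 3, the maximum for n = 4.
ic_caterpillar = colless_corrected(3, 4)
# A perfectly balanced four-leaf tree has C(T) = 0.
ic_balanced = colless_corrected(0, 4)
```

Because the maximum Colless index for n leaves is (n-1)(n-2)/2, the corrected value falls between 0 (fully balanced) and 1 (caterpillar).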
- alifestd_mark_colless_index_corrected_polars(phylogeny_df: DataFrame, *, mark_as: str = 'colless_index_corrected') DataFrame
Add column colless_index_corrected with the corrected Colless index for each subtree.
The output column name can be changed via the mark_as parameter.
- alifestd_mark_colless_index_polars(phylogeny_df: DataFrame, *, mark_as: str = 'colless_index') DataFrame
Add column colless_index with Colless imbalance index for each subtree.
The output column name can be changed via the mark_as parameter.
Requires strictly bifurcating trees.
- alifestd_mark_colless_like_index_mdm_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'colless_like_index_mdm') DataFrame
Add column colless_like_index_mdm with Colless-like index using mean deviation from the median (MDM) as dissimilarity.
The output column name can be changed via the mark_as parameter.
Computes the Colless-like balance index from Mir, Rossello, and Rotger (2018) that supports polytomies. Uses weight function f(k) = ln(k + e) and MDM dissimilarity.
- For each internal node v with children v_1, …, v_k:
bal(v) = MDM(delta_f(T_v1), …, delta_f(T_vk))
where delta_f(T) is the f-size of subtree T, defined as the sum of f(deg(u)) over all nodes u in T, and
MDM(x_1, …, x_k) = (1/k) * sum |x_i - median(x)|
The Colless-like index at a node is the sum of balance values across all internal nodes in its subtree.
Leaf nodes will have Colless-like index 0. The root node contains the Colless-like index for the entire tree.
A topological sort will be applied if phylogeny_df is not topologically sorted. Dataframe reindexing (e.g., df.index) may be applied.
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
Parameters
- phylogeny_df : pd.DataFrame
Alife standard DataFrame containing the phylogenetic relationships.
- mutate : bool, optional
If True, the input DataFrame may be mutated; the returned DataFrame should still be used. Default is False.
Returns
- pd.DataFrame
Phylogeny DataFrame with an additional column “colless_like_index_mdm” containing the Colless-like imbalance index for the subtree rooted at each node.
References
Mir, A., Rossello, F., & Rotger, L. (2018). Sound Colless-like balance indices for multifurcating trees. PLOS ONE, 13(9), e0203401. https://doi.org/10.1371/journal.pone.0203401
See Also
- alifestd_mark_colless_like_index_var_asexual :
Colless-like index using variance dissimilarity.
- alifestd_mark_colless_like_index_sd_asexual :
Colless-like index using standard deviation dissimilarity.
- alifestd_mark_colless_index_asexual :
Classic Colless index for strictly bifurcating trees.
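The MDM variant defined for alifestd_mark_colless_like_index_mdm_asexual can be sketched directly from its formulas. The sketch assumes deg(u) means the number of children of u, contiguous ids, topological sort, and roots that list themselves as ancestor; the helper name `colless_like_mdm` is hypothetical.

```python
import math
import statistics


def colless_like_mdm(ancestor_ids):
    """Hypothetical helper: Colless-like index (MDM) at each node."""
    n = len(ancestor_ids)
    children = [[] for _ in range(n)]
    for i, a in enumerate(ancestor_ids):
        if a != i:
            children[a].append(i)
    f_size = [0.0] * n  # delta_f: sum of ln(deg + e) over the subtree
    index = [0.0] * n
    for i in reversed(range(n)):  # bottom-up: children before parents
        kids = children[i]
        f_size[i] = math.log(len(kids) + math.e) + sum(
            f_size[c] for c in kids
        )
        if kids:
            sizes = [f_size[c] for c in kids]
            median = statistics.median(sizes)
            # Mean absolute deviation of child f-sizes from their median.
            balance = sum(abs(x - median) for x in sizes) / len(sizes)
            index[i] = balance + sum(index[c] for c in kids)
    return index


star = colless_like_mdm([0, 0, 0, 0])  # three-leaf polytomy
lopsided = colless_like_mdm([0, 0, 0, 1, 1])
```

A star tree is perfectly balanced, so its index is 0 even though the root is a polytomy. In the lopsided tree the root's children have f-sizes ln(2 + e) + 2 and 1, which gives a root value of (ln(2 + e) + 1) / 2.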
- alifestd_mark_colless_like_index_mdm_polars(phylogeny_df: DataFrame, *, mark_as: str = 'colless_like_index_mdm') DataFrame
Add column colless_like_index_mdm with Colless-like index using mean deviation from the median (MDM) as dissimilarity.
The output column name can be changed via the mark_as parameter.
- alifestd_mark_colless_like_index_sd_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'colless_like_index_sd') DataFrame
Add column colless_like_index_sd with Colless-like index using sample standard deviation as dissimilarity.
The output column name can be changed via the mark_as parameter.
Computes the Colless-like balance index from Mir, Rossello, and Rotger (2018) that supports polytomies. Uses weight function f(k) = ln(k + e) and standard deviation dissimilarity.
- For each internal node v with children v_1, …, v_k:
bal(v) = sd(delta_f(T_v1), …, delta_f(T_vk))
where delta_f(T) is the f-size of subtree T, defined as the sum of f(deg(u)) over all nodes u in T, and
sd(x_1, …, x_k) = sqrt(var(x_1, …, x_k))
var(x_1, …, x_k) = (1/(k-1)) * sum (x_i - mean(x))^2
The Colless-like index at a node is the sum of balance values across all internal nodes in its subtree.
Leaf nodes will have Colless-like index 0. The root node contains the Colless-like index for the entire tree.
A topological sort will be applied if phylogeny_df is not topologically sorted. Dataframe reindexing (e.g., df.index) may be applied.
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
Parameters
- phylogeny_df : pd.DataFrame
Alife standard DataFrame containing the phylogenetic relationships.
- mutate : bool, optional
If True, the input DataFrame may be mutated; the returned DataFrame should still be used. Default is False.
Returns
- pd.DataFrame
Phylogeny DataFrame with an additional column “colless_like_index_sd” containing the Colless-like imbalance index for the subtree rooted at each node.
References
Mir, A., Rossello, F., & Rotger, L. (2018). Sound Colless-like balance indices for multifurcating trees. PLOS ONE, 13(9), e0203401. https://doi.org/10.1371/journal.pone.0203401
See Also
- alifestd_mark_colless_like_index_mdm_asexual :
Colless-like index using MDM dissimilarity.
- alifestd_mark_colless_like_index_var_asexual :
Colless-like index using variance dissimilarity.
- alifestd_mark_colless_index_asexual :
Classic Colless index for strictly bifurcating trees.
- alifestd_mark_colless_like_index_sd_polars(phylogeny_df: DataFrame, *, mark_as: str = 'colless_like_index_sd') DataFrame
Add column colless_like_index_sd with Colless-like index using sample standard deviation as dissimilarity.
The output column name can be changed via the mark_as parameter.
- alifestd_mark_colless_like_index_var_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'colless_like_index_var') DataFrame
Add column colless_like_index_var with Colless-like index using sample variance as dissimilarity.
The output column name can be changed via the mark_as parameter.
Computes the Colless-like balance index from Mir, Rossello, and Rotger (2018) that supports polytomies. Uses weight function f(k) = ln(k + e) and variance dissimilarity.
- For each internal node v with children v_1, …, v_k:
bal(v) = var(delta_f(T_v1), …, delta_f(T_vk))
where delta_f(T) is the f-size of subtree T, defined as the sum of f(deg(u)) over all nodes u in T, and
var(x_1, …, x_k) = (1/(k-1)) * sum (x_i - mean(x))^2
The Colless-like index at a node is the sum of balance values across all internal nodes in its subtree.
Leaf nodes will have Colless-like index 0. The root node contains the Colless-like index for the entire tree.
A topological sort will be applied if phylogeny_df is not topologically sorted. Dataframe reindexing (e.g., df.index) may be applied.
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
Parameters
- phylogeny_df : pd.DataFrame
Alife standard DataFrame containing the phylogenetic relationships.
- mutate : bool, optional
If True, the input DataFrame may be mutated; the returned DataFrame should still be used. Default is False.
Returns
- pd.DataFrame
Phylogeny DataFrame with an additional column “colless_like_index_var” containing the Colless-like imbalance index for the subtree rooted at each node.
References
Mir, A., Rossello, F., & Rotger, L. (2018). Sound Colless-like balance indices for multifurcating trees. PLOS ONE, 13(9), e0203401. https://doi.org/10.1371/journal.pone.0203401
See Also
- alifestd_mark_colless_like_index_mdm_asexual :
Colless-like index using MDM dissimilarity.
- alifestd_mark_colless_like_index_sd_asexual :
Colless-like index using standard deviation dissimilarity.
- alifestd_mark_colless_index_asexual :
Classic Colless index for strictly bifurcating trees.
- alifestd_mark_colless_like_index_var_polars(phylogeny_df: DataFrame, *, mark_as: str = 'colless_like_index_var') DataFrame
Add column colless_like_index_var with Colless-like index using sample variance as dissimilarity.
The output column name can be changed via the mark_as parameter.
- alifestd_mark_csr_children_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'csr_children') DataFrame
Add column csr_children, a flat array of child ids grouped by parent according to CSR offsets from the csr_offsets column.
The output column name can be changed via the mark_as parameter.
Entries are ordered so that node i’s children occupy positions csr_offsets[i] to csr_offsets[i] + num_children[i] (exclusive). Entries beyond the total number of non-root nodes are unused.
A topological sort will be applied if phylogeny_df is not topologically sorted. Dataframe reindexing (e.g., df.index) may be applied.
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
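The CSR layout described for csr_children and csr_offsets can be illustrated with plain lists. This sketch assumes contiguous ids and roots that list themselves as ancestor; the helper name `csr_layout` and the padding value 0 for unused trailing slots are assumptions, not confirmed library behavior.

```python
def csr_layout(ancestor_ids):
    """Hypothetical helper: build CSR offsets and a flat children array."""
    n = len(ancestor_ids)
    num_children = [0] * n
    for i, a in enumerate(ancestor_ids):
        if a != i:
            num_children[a] += 1
    # Offsets are the exclusive running total of child counts.
    offsets = [0] * n
    for i in range(1, n):
        offsets[i] = offsets[i - 1] + num_children[i - 1]
    csr_children = [0] * n  # one slot per row; trailing slots go unused
    cursor = list(offsets)  # next free slot within each parent's segment
    for i, a in enumerate(ancestor_ids):
        if a != i:
            csr_children[cursor[a]] = i
            cursor[a] += 1
    return offsets, num_children, csr_children


offsets, num_children, csr_children = csr_layout([0, 0, 0, 1, 1])
# Children of node 1 occupy one contiguous slice of the flat array.
kids_of_1 = csr_children[offsets[1] : offsets[1] + num_children[1]]
```

Storing children this way allows each node's child list to be read as a contiguous slice without building per-node containers.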
- alifestd_mark_csr_children_polars(phylogeny_df: DataFrame, *, mark_as: str = 'csr_children') DataFrame
Add column csr_children, a flat array of child ids grouped by parent according to CSR offsets from the csr_offsets column.
The output column name can be changed via the mark_as parameter.
- alifestd_mark_csr_offsets_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'csr_offsets') DataFrame
Add column csr_offsets, the CSR offset where each node’s children begin in the corresponding csr_children array.
The output column name can be changed via the mark_as parameter.
A topological sort will be applied if phylogeny_df is not topologically sorted. Dataframe reindexing (e.g., df.index) may be applied.
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
- alifestd_mark_csr_offsets_polars(phylogeny_df: DataFrame, *, mark_as: str = 'csr_offsets') DataFrame
Add column csr_offsets, the CSR offset where each node’s children begin in the corresponding csr_children array.
The output column name can be changed via the mark_as parameter.
- alifestd_mark_first_child_id_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'first_child_id') DataFrame
Add column first_child_id, the smallest-id child of each node.
The output column name can be changed via the mark_as parameter.
If a node has no children (is a leaf), marks own id.
A topological sort will be applied if phylogeny_df is not topologically sorted. Dataframe reindexing (e.g., df.index) may be applied.
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
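A minimal sketch of the first-child rule follows, assuming contiguous ids and roots that list themselves as ancestor. The helper name `first_child_ids` is hypothetical.

```python
def first_child_ids(ancestor_ids):
    """Hypothetical helper: smallest-id child per node, own id for leaves."""
    first = list(range(len(ancestor_ids)))  # leaves keep their own id
    # Scan ids in descending order so the smallest child is written last.
    for i in reversed(range(len(ancestor_ids))):
        a = ancestor_ids[i]
        if a != i:
            first[a] = i
    return first


first = first_child_ids([0, 0, 0, 1, 1])
```

In this toy tree node 0 has children 1 and 2, node 1 has children 3 and 4, and nodes 2, 3, and 4 are leaves that mark their own ids.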
- alifestd_mark_first_child_id_polars(phylogeny_df: DataFrame, *, mark_as: str = 'first_child_id') DataFrame
Add column first_child_id, the smallest-id child of each node.
The output column name can be changed via the mark_as parameter.
If a node has no children (is a leaf), marks own id.
- alifestd_mark_is_left_child_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'is_left_child') DataFrame
Add column is_left_child, containing for each node whether it is the smaller-id child.
The output column name can be changed via the mark_as parameter.
Root nodes will be marked False. Tree must be strictly bifurcating.
Dataframe reindexing (e.g., df.index) may be applied.
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
- alifestd_mark_is_left_child_polars(phylogeny_df: DataFrame, *, mark_as: str = 'is_left_child') DataFrame
Add column is_left_child, containing for each node whether it is the smaller-id child.
The output column name can be changed via the mark_as parameter.
Root nodes will be marked False.
- alifestd_mark_is_right_child_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'is_right_child') DataFrame
Add column is_right_child, containing for each node whether it is the larger-id child.
The output column name can be changed via the mark_as parameter.
Root nodes will be marked False. Tree must be strictly bifurcating.
Dataframe reindexing (e.g., df.index) may be applied.
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
- alifestd_mark_is_right_child_polars(phylogeny_df: DataFrame, *, mark_as: str = 'is_right_child') DataFrame
Add column is_right_child, containing for each node whether it is the larger-id child.
The output column name can be changed via the mark_as parameter.
Root nodes will be marked False.
- alifestd_mark_leaves(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'is_leaf') DataFrame
Add column is_leaf marking rows that are ancestor to no other row.
The output column name can be changed via the mark_as parameter.
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
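The leaf test amounts to checking which ids never appear as another row's ancestor. The sketch below assumes an asexual phylogeny given as parallel id and ancestor id lists, with roots listing themselves as ancestor; the helper name `mark_leaves_sketch` is hypothetical.

```python
def mark_leaves_sketch(ids, ancestor_ids):
    """Hypothetical helper: True where a row is ancestor to no other row."""
    # Collect every id that serves as the ancestor of a different row.
    internal = {a for i, a in zip(ids, ancestor_ids) if a != i}
    return [i not in internal for i in ids]


is_leaf = mark_leaves_sketch([0, 1, 2, 3, 4], [0, 0, 0, 1, 1])
```

Because only membership is checked, this sketch does not require ids to be contiguous or rows to be sorted.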
- alifestd_mark_leaves_polars(phylogeny_df: DataFrame, *, mark_as: str = 'is_leaf') DataFrame
Add column is_leaf marking rows that are ancestor to no other row.
The output column name can be changed via the mark_as parameter.
Parameters
- phylogeny_df : polars.DataFrame
The phylogeny as a dataframe in alife standard format.
Must represent an asexual phylogeny.
Returns
- polars.DataFrame
The phylogeny with an added is_leaf boolean column.
See Also
- alifestd_mark_leaves :
Pandas-based implementation.
- alifestd_mark_left_child_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'left_child_id') DataFrame
Add column left_child_id, containing for each node its smallest-id child.
The output column name can be changed via the mark_as parameter.
Leaf nodes will be marked with their own id.
Dataframe reindexing (e.g., df.index) may be applied.
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
- alifestd_mark_left_child_polars(phylogeny_df: DataFrame, *, mark_as: str = 'left_child_id') DataFrame
Add column left_child_id, containing for each node its smallest-id child.
The output column name can be changed via the mark_as parameter.
Leaf nodes will be marked with their own id.
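The smallest-id and largest-id child conventions can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: `left_right_child_ids` and the `ancestor_ids` list (index i holds the ancestor id of node i, roots point to themselves) are hypothetical names used only for illustration.

```python
def left_right_child_ids(ancestor_ids):
    """Return (left, right): smallest- and largest-id child of each node.

    Assumption: ancestor_ids[i] is the parent id of node i, and a root
    lists itself as its own ancestor.
    """
    n = len(ancestor_ids)
    # Leaves keep their own id, matching the documented convention.
    left = list(range(n))
    right = list(range(n))
    seen_child = [False] * n
    for node, parent in enumerate(ancestor_ids):
        if parent == node:  # root row, not anyone's child
            continue
        if not seen_child[parent]:
            left[parent] = right[parent] = node
            seen_child[parent] = True
        else:
            left[parent] = min(left[parent], node)
            right[parent] = max(right[parent], node)
    return left, right


left, right = left_right_child_ids([0, 0, 0, 1, 1])
print(left)   # [1, 3, 2, 3, 4]
print(right)  # [2, 4, 2, 3, 4]
```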
- alifestd_mark_lineage_cummax_asexual(phylogeny_df: DataFrame, values: str, mutate: bool = False, *, mark_as: str = 'lineage_cummax', reverse: bool = False, skipna: bool = True) DataFrame
Add column with maximum of values along each lineage.
With reverse=False (default), the result at each node is the maximum of values along the path from the root to that node, inclusive. With reverse=True, the result at each node is the maximum of values over the entire clade rooted at that node, inclusive.
The output column name can be changed via the mark_as parameter. NaN values are treated as -inf if skipna (default), else propagate.
Phylogeny must be asexual, topologically sorted, and have contiguous ids; otherwise NotImplementedError is raised.
Input dataframe is not mutated by this operation unless mutate is set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
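The two directions of the lineage aggregation can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the library's code: `lineage_cummax` and `ancestor_ids` are hypothetical names, ids are assumed contiguous and topologically sorted (every parent id is smaller than its children's ids), and roots list themselves as ancestor.

```python
import math


def _max_propagating_nan(a, b):
    """max() that returns NaN if either operand is NaN."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    return max(a, b)


def lineage_cummax(ancestor_ids, values, reverse=False, skipna=True):
    """Sketch of a running maximum along lineages (or over clades)."""
    n = len(ancestor_ids)
    # With skipna, NaN is replaced by the identity element of max (-inf).
    result = [
        -math.inf if (skipna and math.isnan(v)) else float(v) for v in values
    ]
    # Forward pass visits parents first; reverse pass visits children first.
    order = range(n - 1, -1, -1) if reverse else range(n)
    for node in order:
        parent = ancestor_ids[node]
        if parent == node:  # root: nothing to combine with
            continue
        if reverse:
            # Push this clade's maximum up into the parent.
            result[parent] = _max_propagating_nan(result[parent], result[node])
        else:
            # Pull the root-to-parent maximum down into this node.
            result[node] = _max_propagating_nan(result[parent], result[node])
    return result


tree = [0, 0, 0, 1, 1]  # node 0 is the root; 3 and 4 are children of 1
vals = [1, 5, 2, 3, 7]
print(lineage_cummax(tree, vals))                # [1.0, 5.0, 2.0, 5.0, 7.0]
print(lineage_cummax(tree, vals, reverse=True))  # [7.0, 7.0, 2.0, 3.0, 7.0]
```

Swapping the combining function and its identity element (min with +inf, product with 1, sum with 0) gives the analogous sketch for the cummin, cumprod, and cumsum variants.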
- alifestd_mark_lineage_cummax_polars(phylogeny_df: DataFrame, values: str | Expr, *, mark_as: str = 'lineage_cummax', reverse: bool = False, skipna: bool = True) DataFrame
Add column with maximum of values along each lineage.
With reverse=False (default), the result at each node is the maximum of values along the path from the root to that node, inclusive. With reverse=True, the result at each node is the maximum of values over the entire clade rooted at that node, inclusive.
See Also
- alifestd_mark_lineage_cummax_asexual :
Pandas-based implementation.
- alifestd_mark_lineage_cummin_asexual(phylogeny_df: DataFrame, values: str, mutate: bool = False, *, mark_as: str = 'lineage_cummin', reverse: bool = False, skipna: bool = True) DataFrame
Add column with minimum of values along each lineage.
With reverse=False (default), the result at each node is the minimum of values along the path from the root to that node, inclusive. With reverse=True, the result at each node is the minimum of values over the entire clade rooted at that node, inclusive.
The output column name can be changed via the mark_as parameter. NaN values are treated as +inf if skipna (default), else propagate.
Phylogeny must be asexual, topologically sorted, and have contiguous ids; otherwise NotImplementedError is raised.
Input dataframe is not mutated by this operation unless mutate is set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
- alifestd_mark_lineage_cummin_polars(phylogeny_df: DataFrame, values: str | Expr, *, mark_as: str = 'lineage_cummin', reverse: bool = False, skipna: bool = True) DataFrame
Add column with minimum of values along each lineage.
With reverse=False (default), the result at each node is the minimum of values along the path from the root to that node, inclusive. With reverse=True, the result at each node is the minimum of values over the entire clade rooted at that node, inclusive.
See Also
- alifestd_mark_lineage_cummin_asexual :
Pandas-based implementation.
- alifestd_mark_lineage_cumprod_asexual(phylogeny_df: DataFrame, values: str, mutate: bool = False, *, mark_as: str = 'lineage_cumprod', reverse: bool = False, skipna: bool = True) DataFrame
Add column with cumulative product of values along each lineage.
With reverse=False (default), the result at each node is the product of values along the path from the root to that node, inclusive. With reverse=True, the result at each node is the product of values over the entire clade rooted at that node, inclusive.
The output column name can be changed via the mark_as parameter. NaN values are treated as 1 if skipna (default), else propagate.
Phylogeny must be asexual, topologically sorted, and have contiguous ids; otherwise NotImplementedError is raised.
Input dataframe is not mutated by this operation unless mutate is set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
- alifestd_mark_lineage_cumprod_polars(phylogeny_df: DataFrame, values: str | Expr, *, mark_as: str = 'lineage_cumprod', reverse: bool = False, skipna: bool = True) DataFrame
Add column with cumulative product of values along each lineage.
With reverse=False (default), the result at each node is the product of values along the path from the root to that node, inclusive. With reverse=True, the result at each node is the product of values over the entire clade rooted at that node, inclusive.
See Also
- alifestd_mark_lineage_cumprod_asexual :
Pandas-based implementation.
- alifestd_mark_lineage_cumsum_asexual(phylogeny_df: DataFrame, values: str, mutate: bool = False, *, mark_as: str = 'lineage_cumsum', reverse: bool = False, skipna: bool = True) DataFrame
Add column with cumulative sum of values along each lineage.
With reverse=False (default), the result at each node is the sum of values along the path from the root to that node, inclusive. With reverse=True, the result at each node is the sum of values over the entire clade rooted at that node, inclusive.
The output column name can be changed via the mark_as parameter. NaN values are treated as 0 if skipna (default), else propagate.
Phylogeny must be asexual, topologically sorted, and have contiguous ids; otherwise NotImplementedError is raised.
Input dataframe is not mutated by this operation unless mutate is set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
- alifestd_mark_lineage_cumsum_polars(phylogeny_df: DataFrame, values: str | Expr, *, mark_as: str = 'lineage_cumsum', reverse: bool = False, skipna: bool = True) DataFrame
Add column with cumulative sum of values along each lineage.
With reverse=False (default), the result at each node is the sum of values along the path from the root to that node, inclusive. With reverse=True, the result at each node is the sum of values over the entire clade rooted at that node, inclusive.
Parameters
- phylogeny_df : polars.DataFrame or polars.LazyFrame
The phylogeny as a dataframe in alife standard format.
Must represent an asexual phylogeny.
- values : str or polars.Expr
Column name or polars expression providing per-node values.
- mark_as : str, default “lineage_cumsum”
Output column name.
- reverse : bool, default False
If True, aggregate over clade rooted at each node.
- skipna : bool, default True
If True, NaN values are treated as identity (0); else propagate.
See Also
- alifestd_mark_lineage_cumsum_asexual :
Pandas-based implementation.
- alifestd_mark_max_descendant_origin_time_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'max_descendant_origin_time') DataFrame
Add column max_descendant_origin_time, excluding self.
The output column name can be changed via the mark_as parameter.
A topological sort will be applied if phylogeny_df is not topologically sorted. Dataframe reindexing (e.g., df.index) may be applied.
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
- alifestd_mark_max_descendant_origin_time_polars(phylogeny_df: DataFrame, *, mark_as: str = 'max_descendant_origin_time') DataFrame
Add column max_descendant_origin_time, excluding self.
The output column name can be changed via the mark_as parameter.
- alifestd_mark_next_sibling_id_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'next_sibling_id') DataFrame
Add column next_sibling_id, the next-highest id sharing the same parent.
The output column name can be changed via the mark_as parameter.
If no such sibling exists, marks own id.
A topological sort will be applied if phylogeny_df is not topologically sorted. Dataframe reindexing (e.g., df.index) may be applied.
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
- alifestd_mark_next_sibling_id_polars(phylogeny_df: DataFrame, *, mark_as: str = 'next_sibling_id') DataFrame
Add column next_sibling_id, the next-highest id sharing the same parent.
The output column name can be changed via the mark_as parameter.
If no such sibling exists, marks own id.
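The "next-highest id sharing the same parent" rule can be sketched as follows. This is a hedged illustration rather than the library's implementation; `next_sibling_ids` and `ancestor_ids` (index i holds the ancestor id of node i, roots point to themselves) are hypothetical names, and the sketch treats separate roots as having no siblings.

```python
from collections import defaultdict


def next_sibling_ids(ancestor_ids):
    """For each node, the next-highest id with the same parent, else own id."""
    children = defaultdict(list)
    for node, parent in enumerate(ancestor_ids):
        if parent != node:  # skip roots, which are nobody's child
            children[parent].append(node)

    result = list(range(len(ancestor_ids)))  # default: own id
    for siblings in children.values():
        siblings.sort()
        # Pair each sibling with its successor in id order.
        for current, following in zip(siblings, siblings[1:]):
            result[current] = following
    return result


# Node 0 has children 1, 2, 5; node 1 has children 3, 4.
print(next_sibling_ids([0, 0, 0, 1, 1, 0]))  # [0, 2, 5, 4, 4, 5]
```

Pairing in the opposite direction (each sibling with its predecessor) gives the corresponding sketch for prev_sibling_id.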
- alifestd_mark_node_depth_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'node_depth') DataFrame
Add column node_depth, counting the number of nodes between a node and the root.
The output column name can be changed via the mark_as parameter.
A topological sort will be applied if phylogeny_df is not topologically sorted. Dataframe reindexing (e.g., df.index) may be applied.
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
- alifestd_mark_node_depth_polars(phylogeny_df: DataFrame, *, mark_as: str = 'node_depth') DataFrame
Add column node_depth, counting the number of nodes between a node and the root.
The output column name can be changed via the mark_as parameter.
Parameters
- phylogeny_df : polars.DataFrame or polars.LazyFrame
The phylogeny as a dataframe in alife standard format.
Must represent an asexual phylogeny with contiguous ids and topologically sorted rows.
Returns
- polars.DataFrame
The phylogeny with an added node_depth integer column.
See Also
- alifestd_mark_node_depth_asexual :
Pandas-based implementation.
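The requirement of contiguous ids and topologically sorted rows makes node depth computable in a single forward pass. The sketch below is illustrative only: `node_depths` and `ancestor_ids` are hypothetical names, roots are assumed to list themselves as ancestor, and assigning depth 0 to roots is an assumption of this sketch rather than something the text above states.

```python
def node_depths(ancestor_ids):
    """Depth of each node, assuming every parent id precedes its children."""
    depths = [0] * len(ancestor_ids)
    for node, parent in enumerate(ancestor_ids):
        if parent != node:
            # The parent's depth is already final because parent < node.
            depths[node] = depths[parent] + 1
    return depths


print(node_depths([0, 0, 0, 1, 1]))  # [0, 1, 1, 2, 2]
```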
- alifestd_mark_num_children_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'num_children') DataFrame
Add column num_children, counting for each node the number of nodes it is parent to.
The output column name can be changed via the mark_as parameter.
A topological sort will be applied if phylogeny_df is not topologically sorted. Dataframe reindexing (e.g., df.index) may be applied.
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
- alifestd_mark_num_children_polars(phylogeny_df: DataFrame, *, mark_as: str = 'num_children') DataFrame
Add column num_children, counting for each node the number of nodes it is parent to.
The output column name can be changed via the mark_as parameter.
- alifestd_mark_num_descendants_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'num_descendants') DataFrame
Add column num_descendants, excluding self.
The output column name can be changed via the mark_as parameter.
A topological sort will be applied if phylogeny_df is not topologically sorted. Dataframe reindexing (e.g., df.index) may be applied.
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
- alifestd_mark_num_descendants_polars(phylogeny_df: DataFrame, *, mark_as: str = 'num_descendants') DataFrame
Add column num_descendants, excluding self.
The output column name can be changed via the mark_as parameter.
Parameters
- phylogeny_df : polars.DataFrame
The phylogeny as a dataframe in alife standard format.
Must represent an asexual phylogeny.
Returns
- polars.DataFrame
The phylogeny with an added num_descendants column.
See Also
- alifestd_mark_num_descendants_asexual :
Pandas-based implementation.
- alifestd_mark_num_leaves_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'num_leaves') DataFrame
Add column num_leaves with count of all descendant leaves, including self if a leaf.
The output column name can be changed via the mark_as parameter.
A topological sort will be applied if phylogeny_df is not topologically sorted. Dataframe reindexing (e.g., df.index) may be applied.
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
- alifestd_mark_num_leaves_polars(phylogeny_df: DataFrame, *, mark_as: str = 'num_leaves') DataFrame
Add column num_leaves with count of all descendant leaves, including self if a leaf.
The output column name can be changed via the mark_as parameter.
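Counting descendant leaves, with a leaf counting itself, amounts to one backward pass over a topologically sorted phylogeny. The following is a minimal sketch under assumptions, not the library's code: `count_leaves` and `ancestor_ids` are hypothetical names, ids are contiguous with parents preceding children, and roots list themselves as ancestor.

```python
def count_leaves(ancestor_ids):
    """Number of leaves in the clade rooted at each node (leaves count as 1)."""
    n = len(ancestor_ids)
    has_child = [False] * n
    for node, parent in enumerate(ancestor_ids):
        if parent != node:
            has_child[parent] = True

    counts = [0 if has_child[i] else 1 for i in range(n)]
    # Visit children before parents so each count is final when passed up.
    for node in range(n - 1, -1, -1):
        parent = ancestor_ids[node]
        if parent != node:
            counts[parent] += counts[node]
    return counts


print(count_leaves([0, 0, 0, 1, 1]))  # [3, 2, 1, 1, 1]
```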
- alifestd_mark_num_leaves_sibling_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'num_leaves_sibling') DataFrame
Mark the number of leaves descendant from each node’s siblings.
The output column name can be changed via the mark_as parameter.
Nodes with no siblings (e.g., root nodes) will have value 0 marked.
Parameters
- phylogeny_df : pd.DataFrame
Alife standard DataFrame containing the phylogenetic relationships.
- mutate : bool, optional
If True, modify the input DataFrame in place. Default is False.
Returns
- pd.DataFrame
Phylogeny DataFrame with an additional column “num_leaves_sibling”.
- alifestd_mark_num_leaves_sibling_polars(phylogeny_df: DataFrame, *, mark_as: str = 'num_leaves_sibling') DataFrame
Mark the number of leaves descendant from each node’s siblings.
The output column name can be changed via the mark_as parameter.
Nodes with no siblings (e.g., root nodes) will have value 0 marked.
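One way to read the sibling leaf count: a non-root node's siblings jointly hold all of the parent's leaves except the node's own. The sketch below assumes that reading; `sibling_leaf_counts`, `ancestor_ids`, and the precomputed `num_leaves` list are hypothetical inputs for illustration, with roots listing themselves as ancestor.

```python
def sibling_leaf_counts(ancestor_ids, num_leaves):
    """Leaves descending from each node's siblings; 0 for roots."""
    result = []
    for node, parent in enumerate(ancestor_ids):
        if parent == node:
            result.append(0)  # roots have no siblings in this sketch
        else:
            # The parent's leaves minus this node's own leaves.
            result.append(num_leaves[parent] - num_leaves[node])
    return result


tree = [0, 0, 0, 1, 1]
leaves_per_clade = [3, 2, 1, 1, 1]  # as would be marked in num_leaves
print(sibling_leaf_counts(tree, leaves_per_clade))  # [0, 1, 2, 1, 1]
```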
- alifestd_mark_num_preceding_leaves_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'num_preceding_leaves') DataFrame
Add column num_preceding_leaves with count of all leaves occurring before the present node in an inorder traversal.
The output column name can be changed via the mark_as parameter.
For internal nodes, the number of leaf nodes prior to the traversal of first (i.e., leftmost) descendant is marked.
A topological sort will be applied if phylogeny_df is not topologically sorted. Dataframe reindexing (e.g., df.index) may be applied.
Must be a strictly bifurcating tree.
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
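For a strictly bifurcating tree, the preceding-leaf count can be derived from leaf counts: a left child inherits its parent's value, and a right child adds the leaves under its left sibling. The sketch below is an assumption-laden illustration, not the library's implementation: `preceding_leaf_counts` and `ancestor_ids` are hypothetical names, the left child is taken to be the smaller-id child, ids are contiguous and topologically sorted, and there is a single root that lists itself as ancestor.

```python
def preceding_leaf_counts(ancestor_ids):
    """Leaves visited before each node in an inorder traversal."""
    n = len(ancestor_ids)
    children = [[] for _ in range(n)]
    for node, parent in enumerate(ancestor_ids):
        if parent != node:
            children[parent].append(node)  # appended in ascending id order

    # Backward pass: leaves under each node.
    num_leaves = [0 if children[i] else 1 for i in range(n)]
    for node in range(n - 1, -1, -1):
        parent = ancestor_ids[node]
        if parent != node:
            num_leaves[parent] += num_leaves[node]

    # Forward pass: hand each parent's count down to its two children.
    preceding = [0] * n
    for node in range(n):
        if not children[node]:
            continue
        if len(children[node]) != 2:
            raise ValueError("tree must be strictly bifurcating")
        left, right = children[node]
        preceding[left] = preceding[node]
        preceding[right] = preceding[node] + num_leaves[left]
    return preceding


# Inorder leaf sequence here is 3, 4, 2.
print(preceding_leaf_counts([0, 0, 0, 1, 1]))  # [0, 0, 2, 0, 1]
```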
- alifestd_mark_num_preceding_leaves_polars(phylogeny_df: DataFrame, *, mark_as: str = 'num_preceding_leaves') DataFrame
Add column num_preceding_leaves with count of all leaves occurring before the present node in an inorder traversal.
The output column name can be changed via the mark_as parameter.
- alifestd_mark_oldest_root(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'is_oldest_root') DataFrame
Add column is_oldest_root, marking the oldest root, measured by lowest origin_time (if available) or otherwise lowest id.
The output column name can be changed via the mark_as parameter.
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
- alifestd_mark_oldest_root_polars(phylogeny_df: DataFrame, *, mark_as: str = 'is_oldest_root') DataFrame
Add column is_oldest_root, marking the oldest root, measured by lowest origin_time (if available) or otherwise lowest id.
The output column name can be changed via the mark_as parameter.
- alifestd_mark_origin_time_delta_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'origin_time_delta') DataFrame
Add columns origin_time_delta and ancestor_origin_time.
The output column name can be changed via the mark_as parameter.
Dataframe must provide column origin_time.
A topological sort will be applied if phylogeny_df is not topologically sorted. Dataframe reindexing (e.g., df.index) may be applied.
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
- alifestd_mark_origin_time_delta_polars(phylogeny_df: DataFrame, *, mark_as: str = 'origin_time_delta') DataFrame
Add columns origin_time_delta and ancestor_origin_time.
The output column name can be changed via the mark_as parameter.
Dataframe must provide column origin_time.
- alifestd_mark_ot_mrca_asexual(phylogeny_df: ~pandas.core.frame.DataFrame, mutate: bool = False, *, mark_as: str = 'ot_mrca', progress_wrap: ~typing.Callable = <function <lambda>>) DataFrame
Appends columns characterizing the Most Recent Common Ancestor (MRCA) of the entire extant population at each taxon’s origin_time.
The output column name prefix can be changed via the mark_as parameter.
The extant population is defined in terms of active lineages: any branch of the tree existing at an origin_time which contains at least one descendant at or after that time.
New Columns
- ot_mrca_id : int
The unique identifier of the MRCA for the population that was extant at this organism’s origin_time.
- ot_mrca_time_of : int or float
The origin_time of that MRCA.
- ot_mrca_time_since : int or float
The duration elapsed between the MRCA’s origin_time and this taxon’s origin_time.
A chronological sort will be applied if phylogeny_df is not chronologically sorted. Dataframe reindexing (e.g., df.index) may be applied.
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
- alifestd_mark_ot_mrca_polars(phylogeny_df: DataFrame, *, mark_as: str = 'ot_mrca') DataFrame
Appends columns characterizing the Most Recent Common Ancestor (MRCA) of the entire extant population at each taxon’s origin_time.
The output column name prefix can be changed via the mark_as parameter.
The extant population is defined in terms of active lineages: any branch of the tree existing at an origin_time which contains at least one descendant at or after that time.
New Columns
- ot_mrca_id : int
The unique identifier of the MRCA for the population that was extant at this organism’s origin_time.
- ot_mrca_time_of : int or float
The origin_time of that MRCA.
- ot_mrca_time_since : int or float
The duration elapsed between the MRCA’s origin_time and this taxon’s origin_time.
Parameters
- phylogeny_df : polars.DataFrame
The phylogeny as a dataframe in alife standard format.
Must represent an asexual phylogeny with a single root.
Returns
- polars.DataFrame
The phylogeny with added ot_mrca_id, ot_mrca_time_of, and ot_mrca_time_since columns.
See Also
- alifestd_mark_ot_mrca_asexual :
Pandas-based implementation.
- alifestd_mark_prev_sibling_id_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'prev_sibling_id') DataFrame
Add column prev_sibling_id, the next-lowest id sharing the same parent.
The output column name can be changed via the mark_as parameter.
If no such sibling exists, marks own id.
A topological sort will be applied if phylogeny_df is not topologically sorted. Dataframe reindexing (e.g., df.index) may be applied.
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
- alifestd_mark_prev_sibling_id_polars(phylogeny_df: DataFrame, *, mark_as: str = 'prev_sibling_id') DataFrame
Add column prev_sibling_id, the next-lowest id sharing the same parent.
The output column name can be changed via the mark_as parameter.
If no such sibling exists, marks own id.
- alifestd_mark_right_child_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'right_child_id') DataFrame
Add column right_child_id, containing for each node its largest-id child.
The output column name can be changed via the mark_as parameter.
Leaf nodes will be marked with their own id.
Dataframe reindexing (e.g., df.index) may be applied.
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
- alifestd_mark_right_child_polars(phylogeny_df: DataFrame, *, mark_as: str = 'right_child_id') DataFrame
Add column right_child_id, containing for each node its largest-id child.
The output column name can be changed via the mark_as parameter.
Leaf nodes will be marked with their own id.
- alifestd_mark_root_id(phylogeny_df: ~pandas.core.frame.DataFrame, mutate: bool = False, selector: ~typing.Callable = <built-in function min>, *, mark_as: str = 'root_id') DataFrame
Add column root_id, containing the id of entries’ ultimate ancestor.
The output column name can be changed via the mark_as parameter.
For sexual data, the field root_id is chosen according to the selection of callable selector over parents’ root_id values. Note that subsets within a connected component may be marked with different root_id values. To create a component id that is consistent within connected components, a backward pass could be performed that updates ancestors’ values if they are greater than that of each descendant.
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
- alifestd_mark_root_id_polars(phylogeny_df: DataFrame, *, mark_as: str = 'root_id') DataFrame
Add column root_id, containing the id of entries’ ultimate ancestor.
The output column name can be changed via the mark_as parameter.
- alifestd_mark_roots(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'is_root') DataFrame
Create column is_root to mark rows with no ancestor.
The output column name can be changed via the mark_as parameter.
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
- alifestd_mark_roots_polars(phylogeny_df: DataFrame, *, mark_as: str = 'is_root') DataFrame
Create column is_root to mark rows with no ancestor.
The output column name can be changed via the mark_as parameter.
- alifestd_mark_sackin_index_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'sackin_index') DataFrame
Add column sackin_index with Sackin index for each subtree.
The output column name can be changed via the mark_as parameter.
Computes the Sackin imbalance index, which is the sum of the depths of all leaves in the subtree. For each internal node, the contribution is the sum of leaf depths in its subtree.
- For a node with children c_1, c_2, …, c_k:
sackin[node] = sum_{i} (sackin[c_i] + num_leaves[c_i])
This formula naturally supports both bifurcating trees and trees with polytomies.
Leaf nodes will have Sackin index 0. The root node contains the Sackin index for the entire tree.
A topological sort will be applied if phylogeny_df is not topologically sorted. Dataframe reindexing (e.g., df.index) may be applied.
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
Parameters
- phylogeny_df : pd.DataFrame
Alife standard DataFrame containing the phylogenetic relationships.
- mutate : bool, optional
If True, modify the input DataFrame in place. Default is False.
Returns
- pd.DataFrame
Phylogeny DataFrame with an additional column “sackin_index” containing the Sackin imbalance index for the subtree rooted at each node.
See Also
- alifestd_mark_colless_index_asexual :
Colless index for strictly bifurcating trees.
- alifestd_mark_colless_like_index_mdm_asexual :
Colless-like index that supports polytomies.
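The recurrence sackin[node] = sum over children of (sackin[child] + num_leaves[child]) can be evaluated in one backward pass. The sketch below is a minimal illustration under assumptions, not the library's implementation: `sackin_indices` and `ancestor_ids` are hypothetical names, ids are contiguous with parents preceding children, and roots list themselves as ancestor.

```python
def sackin_indices(ancestor_ids):
    """Sackin index of the subtree rooted at each node."""
    n = len(ancestor_ids)
    num_children = [0] * n
    for node, parent in enumerate(ancestor_ids):
        if parent != node:
            num_children[parent] += 1

    num_leaves = [1 if c == 0 else 0 for c in num_children]
    sackin = [0] * n  # leaves keep index 0
    # Children have larger ids, so their values are final when visited.
    for node in range(n - 1, -1, -1):
        parent = ancestor_ids[node]
        if parent != node:
            # Each leaf below `node` sits one edge deeper under `parent`.
            sackin[parent] += sackin[node] + num_leaves[node]
            num_leaves[parent] += num_leaves[node]
    return sackin


# Leaves 3 and 4 are at depth 2, leaf 2 at depth 1: 2 + 2 + 1 = 5 at the root.
print(sackin_indices([0, 0, 0, 1, 1]))  # [5, 2, 0, 0, 0]
# A polytomy with three leaves under the root: 1 + 1 + 1 = 3.
print(sackin_indices([0, 0, 0, 0]))     # [3, 0, 0, 0]
```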
- alifestd_mark_sackin_index_polars(phylogeny_df: DataFrame, *, mark_as: str = 'sackin_index') DataFrame
Add column sackin_index with Sackin index for each subtree.
The output column name can be changed via the mark_as parameter.
- alifestd_mark_sample_tips_asexual(phylogeny_df: DataFrame, n_sample: int, mutate: bool = False, seed: int | None = None, *, mark_as: str = 'alifestd_mark_sample_tips_asexual') DataFrame
Mark a random subsample of n_sample tips.
Adds a boolean column mark_as indicating retained tips.
If n_sample is greater than the number of tips in the phylogeny, all tips are marked.
Only supports asexual phylogenies.
Deprecated since version 0.6.0: Use alifestd_mark_sample_tips_uniform_asexual instead.
- alifestd_mark_sample_tips_canopy_asexual(phylogeny_df: DataFrame, n_sample: int | None = None, mutate: bool = False, criterion: str = 'origin_time', *, mark_as: str = 'alifestd_mark_sample_tips_canopy_asexual') DataFrame
Mark the n_sample leaves with the largest criterion values.
Adds a boolean column mark_as indicating retained tips.
If n_sample is None, it defaults to the number of leaves that share the maximum value of the criterion column. If n_sample is greater than or equal to the number of leaves in the phylogeny, all leaves are marked. Ties are broken arbitrarily.
Only supports asexual phylogenies.
Parameters
- phylogeny_df : pandas.DataFrame
The phylogeny as a dataframe in alife standard format.
Must represent an asexual phylogeny.
- n_sample : int, optional
Number of tips to mark. If None, defaults to the count of leaves with the maximum criterion value.
- mutate : bool, default False
Are side effects on the input argument phylogeny_df allowed?
- criterion : str, default “origin_time”
Column name used to rank leaves. The n_sample leaves with the largest values in this column are marked. Ties are broken arbitrarily.
- mark_as : str, default “alifestd_mark_sample_tips_canopy_asexual”
Column name for the boolean mark.
Raises
- ValueError
If criterion is not a column in phylogeny_df.
Returns
- pandas.DataFrame
The phylogeny with an added boolean mark column.
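The canopy selection rule, including the n_sample=None default, can be sketched in plain Python. This is a hedged illustration and not the library's code: `mark_canopy_tips`, `ancestor_ids`, and the `criterion` list are hypothetical names, and roots are assumed to list themselves as ancestor.

```python
def mark_canopy_tips(ancestor_ids, criterion, n_sample=None):
    """Boolean mask marking the n_sample leaves with largest criterion."""
    n = len(ancestor_ids)
    is_leaf = [True] * n
    for node, parent in enumerate(ancestor_ids):
        if parent != node:
            is_leaf[parent] = False
    leaves = [i for i in range(n) if is_leaf[i]]
    if not leaves:
        return [False] * n

    if n_sample is None:
        # Default: every leaf that shares the maximum criterion value.
        top = max(criterion[i] for i in leaves)
        n_sample = sum(criterion[i] == top for i in leaves)

    # Ties are resolved by sort order here, i.e. arbitrarily.
    ranked = sorted(leaves, key=lambda i: criterion[i], reverse=True)
    keep = set(ranked[:n_sample])
    return [i in keep for i in range(n)]


tree = [0, 0, 0, 1, 1]         # leaves are 2, 3, 4
origin_time = [0, 1, 2, 3, 3]  # leaves 3 and 4 share the maximum
print(mark_canopy_tips(tree, origin_time))
# [False, False, False, True, True]
print(mark_canopy_tips(tree, origin_time, n_sample=10))
# [False, False, True, True, True]
```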
- alifestd_mark_sample_tips_canopy_polars(phylogeny_df: DataFrame, n_sample: int | None = None, criterion: str | Expr = 'origin_time', *, mark_as: str = 'alifestd_mark_sample_tips_canopy_polars') DataFrame
Mark the n_sample leaves with the largest criterion values.
Adds a boolean column mark_as indicating retained tips.
If n_sample is None, it defaults to the number of leaves that share the maximum value of the criterion column. If n_sample is greater than or equal to the number of leaves in the phylogeny, all leaves are marked. Ties are broken arbitrarily.
Only supports asexual phylogenies.
Parameters
- phylogeny_df : polars.DataFrame
The phylogeny as a dataframe in alife standard format.
Must represent an asexual phylogeny.
- n_sample : int, optional
Number of tips to mark. If None, defaults to the count of leaves with the maximum criterion value.
- criterion : str or polars.Expr, default “origin_time”
Column name or polars expression used to rank leaves. The n_sample leaves with the largest values are marked. Ties are broken arbitrarily.
- mark_as : str, default “alifestd_mark_sample_tips_canopy_polars”
Column name for the boolean mark.
Raises
- NotImplementedError
If phylogeny_df has no “ancestor_id” column.
- ValueError
If criterion is not a column in phylogeny_df.
Returns
- polars.DataFrame
The phylogeny with an added boolean mark column.
See Also
- alifestd_mark_sample_tips_canopy_asexual :
Pandas-based implementation.
- alifestd_mark_sample_tips_clade_asexual(phylogeny_df: DataFrame, n_sample: int, mutate: bool = False, seed: int | None = None, *, mark_as: str = 'alifestd_mark_sample_tips_clade_asexual') DataFrame
Mark tips belonging to a randomly sampled clade of at most n_sample tips.
Adds a boolean column mark_as indicating retained tips. Candidate clades are sampled proportionally to their size.
If n_sample is greater than the number of tips in the phylogeny, all tips are marked.
Only supports asexual phylogenies.
- alifestd_mark_sample_tips_clade_polars(phylogeny_df: DataFrame, n_sample: int, seed: int | None = None, *, mark_as: str = 'alifestd_mark_sample_tips_clade_polars') DataFrame
Mark tips belonging to a randomly sampled clade of at most n_sample tips.
Adds a boolean column mark_as indicating retained tips. Candidate clades are sampled proportionally to their size.
If n_sample is greater than the number of tips in the phylogeny, all tips are marked.
Only supports asexual phylogenies.
Parameters
- phylogeny_df : polars.DataFrame
The phylogeny as a dataframe in alife standard format.
Must represent an asexual phylogeny.
- n_sample : int
Number of tips to mark.
- seed : int, optional
Integer seed for deterministic behavior.
- mark_as : str, default “alifestd_mark_sample_tips_clade_polars”
Column name for the boolean mark.
Raises
- NotImplementedError
If phylogeny_df has no “ancestor_id” column or if ids are non-contiguous or not topologically sorted.
Returns
- polars.DataFrame
The phylogeny with an added boolean mark column.
See Also
- alifestd_mark_sample_tips_clade_asexual :
Pandas-based implementation.
- alifestd_mark_sample_tips_lineage_asexual(phylogeny_df: ~pandas.core.frame.DataFrame, n_sample: int, mutate: bool = False, seed: int | None = None, *, criterion_delta: str = 'origin_time', criterion_target: str = 'origin_time', progress_wrap: ~typing.Callable = <function <lambda>>, mark_as: str = 'alifestd_mark_sample_tips_lineage_asexual') DataFrame
Mark the n_sample leaves closest to the lineage of a target leaf.
Adds a boolean column mark_as indicating retained tips.
Selects a target leaf as the leaf with the largest criterion_target value (ties broken randomly). For each leaf, the most recent common ancestor (MRCA) with the target leaf is identified and the “off-lineage delta” is computed as the absolute difference between the leaf’s criterion_delta value and its MRCA’s criterion_delta value. The n_sample leaves with the smallest off-lineage deltas are marked.
If n_sample is greater than or equal to the number of leaves in the phylogeny, all leaves are marked. Ties in off-lineage delta are broken arbitrarily.
Only supports asexual phylogenies.
Parameters
- phylogeny_df : pandas.DataFrame
The phylogeny as a dataframe in alife standard format.
Must represent an asexual phylogeny.
- n_sample : int
Number of tips to mark.
- mutate : bool, default False
Are side effects on the input argument phylogeny_df allowed?
- seed : int, optional
Random seed for reproducible target-leaf selection when there are ties in criterion_target.
- criterion_delta : str, default “origin_time”
Column name used to compute the off-lineage delta for each leaf.
- criterion_target : str, default “origin_time”
Column name used to select the target leaf.
- progress_wrap : Callable, optional
Pass tqdm or equivalent to display a progress bar.
- mark_as : str, default “alifestd_mark_sample_tips_lineage_asexual”
Column name for the boolean mark.
Raises
- ValueError
If criterion_delta or criterion_target is not a column in phylogeny_df.
Returns
- pandas.DataFrame
The phylogeny with an added boolean mark column.
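The off-lineage delta described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the library's implementation: `mark_lineage_tips`, `ancestor_ids`, and `origin_times` are hypothetical names; a single criterion serves as both criterion_delta and criterion_target; the phylogeny is assumed non-empty with one root that lists itself as ancestor; and ties for the target leaf go to the first candidate instead of being broken randomly.

```python
def mark_lineage_tips(ancestor_ids, origin_times, n_sample):
    """Boolean mask marking the n_sample leaves nearest the target lineage."""
    n = len(ancestor_ids)
    is_leaf = [True] * n
    for node, parent in enumerate(ancestor_ids):
        if parent != node:
            is_leaf[parent] = False
    leaves = [i for i in range(n) if is_leaf[i]]

    # Target leaf: largest criterion value (first candidate on ties).
    target = max(leaves, key=lambda i: origin_times[i])

    # Collect every node on the target's lineage, root included.
    lineage = set()
    node = target
    while node not in lineage:
        lineage.add(node)
        node = ancestor_ids[node]

    def off_lineage_delta(leaf):
        mrca = leaf
        while mrca not in lineage:  # climb until reaching the target lineage
            mrca = ancestor_ids[mrca]
        return abs(origin_times[leaf] - origin_times[mrca])

    ranked = sorted(leaves, key=off_lineage_delta)
    keep = set(ranked[:n_sample])
    return [i in keep for i in range(n)]


tree = [0, 0, 0, 1, 1, 2]   # leaves are 3, 4, 5
times = [0, 1, 1, 5, 2, 4]  # target leaf is 3; deltas are 0, 1, 4
print(mark_lineage_tips(tree, times, n_sample=2))
# [False, False, False, True, True, False]
```

The target leaf always has delta 0, since it is its own MRCA with itself, so it is among the marked leaves whenever n_sample is at least 1 and no other leaf ties at 0 ahead of it in sort order.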
- alifestd_mark_sample_tips_lineage_polars(phylogeny_df: ~polars.dataframe.frame.DataFrame, n_sample: int, seed: int | None = None, *, criterion_delta: str | ~polars.expr.expr.Expr = 'origin_time', criterion_target: str | ~polars.expr.expr.Expr = 'origin_time', progress_wrap: ~typing.Callable = <function <lambda>>, mark_as: str = 'alifestd_mark_sample_tips_lineage_polars') DataFrame
Mark the n_sample leaves closest to the lineage of a target leaf.
Adds a boolean column mark_as indicating retained tips.
Parameters
- phylogeny_df : polars.DataFrame or polars.LazyFrame
The phylogeny as a dataframe in alife standard format.
Must represent an asexual phylogeny.
- n_sample : int
Number of tips to mark.
- seed : int, optional
Random seed for reproducible target-leaf selection.
- criterion_delta : str or polars.Expr, default “origin_time”
Column name or polars expression used to compute the off-lineage delta for each leaf.
- criterion_target : str or polars.Expr, default “origin_time”
Column name or polars expression used to select the target leaf.
- progress_wrap : Callable, optional
Pass tqdm or equivalent to display a progress bar.
- mark_as : str, default “alifestd_mark_sample_tips_lineage_polars”
Column name for the boolean mark.
Raises
- NotImplementedError
If phylogeny_df has no “ancestor_id” column or if ids are non-contiguous or not topologically sorted.
- ValueError
If criterion_delta or criterion_target is not a column in phylogeny_df.
Returns
- polars.DataFrame
The phylogeny with an added boolean mark column.
See Also
- alifestd_mark_sample_tips_lineage_asexual :
Pandas-based implementation.
- alifestd_mark_sample_tips_lineage_stratified_asexual(phylogeny_df: ~pandas.core.frame.DataFrame, n_sample: int | None = None, mutate: bool = False, seed: int | None = None, *, criterion_delta: str = 'origin_time', criterion_stratify: str = 'origin_time', criterion_target: str = 'origin_time', n_tips_per_stratum: int = 1, progress_wrap: ~typing.Callable = <function <lambda>>, mark_as: str = 'alifestd_mark_sample_tips_lineage_stratified_asexual') DataFrame
Mark leaves per stratified group, chosen by proximity to the lineage of a target leaf.
Adds a boolean column mark_as indicating retained tips.
Only supports asexual phylogenies.
Parameters
- phylogeny_df : pandas.DataFrame
The phylogeny as a dataframe in alife standard format.
Must represent an asexual phylogeny.
- n_sample : int, optional
Desired number of retained tips. If None, every distinct criterion_stratify value forms its own group.
- mutate : bool, default False
Are side effects on the input argument phylogeny_df allowed?
- seed : int, optional
Random seed for reproducible target-leaf selection.
- criterion_delta : str, default “origin_time”
Column name used to compute the off-lineage delta for each leaf.
- criterion_stratify : str, default “origin_time”
Column name used to stratify leaves into groups.
- criterion_target : str, default “origin_time”
Column name used to select the target leaf.
- n_tips_per_stratum : int, default 1
Number of tips to retain per stratified group.
- progress_wrap : Callable, optional
Pass tqdm or equivalent to display a progress bar.
- mark_as : str, default “alifestd_mark_sample_tips_lineage_stratified_asexual”
Column name for the boolean mark.
Raises
- ValueError
If criterion_delta, criterion_stratify, or criterion_target is not a column in phylogeny_df.
- ValueError
If n_sample is not None and n_tips_per_stratum does not evenly divide n_sample.
Returns
- pandas.DataFrame
The phylogeny with an added boolean mark column.
- alifestd_mark_sample_tips_lineage_stratified_polars(phylogeny_df: ~polars.dataframe.frame.DataFrame, n_sample: int | None = None, seed: int | None = None, *, criterion_delta: str | ~polars.expr.expr.Expr = 'origin_time', criterion_stratify: str | ~polars.expr.expr.Expr = 'origin_time', criterion_target: str | ~polars.expr.expr.Expr = 'origin_time', n_tips_per_stratum: int = 1, progress_wrap: ~typing.Callable = <function <lambda>>, mark_as: str = 'alifestd_mark_sample_tips_lineage_stratified_polars') DataFrame
Mark leaves per stratified group, chosen by proximity to the lineage of a target leaf.
Adds a boolean column mark_as indicating retained tips.
Only supports asexual phylogenies.
Parameters
- phylogeny_df : polars.DataFrame or polars.LazyFrame
The phylogeny as a dataframe in alife standard format.
Must represent an asexual phylogeny.
- n_sample : int, optional
Desired number of retained tips. If None, every distinct criterion_stratify value forms its own group.
- seed : int, optional
Random seed for reproducible target-leaf selection.
- criterion_delta : str or polars.Expr, default “origin_time”
Column name or polars expression used to compute the off-lineage delta for each leaf.
- criterion_stratify : str or polars.Expr, default “origin_time”
Column name or polars expression used to stratify leaves into groups.
- criterion_target : str or polars.Expr, default “origin_time”
Column name or polars expression used to select the target leaf.
- n_tips_per_stratum : int, default 1
Number of tips to retain per stratified group.
- progress_wrap : Callable, optional
Pass tqdm or equivalent to display a progress bar.
- mark_as : str, default “alifestd_mark_sample_tips_lineage_stratified_polars”
Column name for the boolean mark.
Raises
- NotImplementedError
If phylogeny_df has no “ancestor_id” column or if ids are non-contiguous or not topologically sorted.
- ValueError
If criterion_delta, criterion_stratify, or criterion_target is not a column in phylogeny_df.
- ValueError
If n_sample is not None and n_tips_per_stratum does not evenly divide n_sample.
Returns
- polars.DataFrame
The phylogeny with an added boolean mark column.
See Also
- alifestd_mark_sample_tips_lineage_stratified_asexual :
Pandas-based implementation.
- alifestd_mark_sample_tips_polars(phylogeny_df: DataFrame, n_sample: int, seed: int | None = None, *, mark_as: str = 'alifestd_mark_sample_tips_polars') DataFrame
Mark a random subsample of n_sample tips.
Adds a boolean column mark_as indicating retained tips.
If n_sample is greater than the number of tips in the phylogeny, all tips are marked.
Only supports asexual phylogenies.
Parameters
- phylogeny_df : polars.DataFrame
The phylogeny as a dataframe in alife standard format.
Must represent an asexual phylogeny.
- n_sample : int
Number of tips to mark.
- seed : int, optional
Integer seed for deterministic behavior.
- mark_as : str, default “alifestd_mark_sample_tips_polars”
Column name for the boolean mark.
Raises
- NotImplementedError
If phylogeny_df has no “ancestor_id” column.
Returns
- polars.DataFrame
The phylogeny with an added boolean mark column.
See Also
- alifestd_mark_sample_tips_uniform_polars :
Preferred non-deprecated implementation.
- alifestd_mark_sample_tips_asexual :
Pandas-based implementation.
Deprecated since version 0.6.0: Use alifestd_mark_sample_tips_uniform_polars instead.
- alifestd_mark_sample_tips_uniform_asexual(phylogeny_df: DataFrame, n_sample: int, mutate: bool = False, seed: int | None = None, *, mark_as: str = 'alifestd_mark_sample_tips_uniform_asexual') DataFrame
Mark a random subsample of n_sample tips.
Adds a boolean column mark_as indicating retained tips.
If n_sample is greater than the number of tips in the phylogeny, all tips are marked.
Only supports asexual phylogenies.
- alifestd_mark_sample_tips_uniform_polars(phylogeny_df: DataFrame, n_sample: int, seed: int | None = None, *, mark_as: str = 'alifestd_mark_sample_tips_uniform_polars') DataFrame
Mark a random subsample of n_sample tips.
Adds a boolean column mark_as indicating retained tips.
If n_sample is greater than the number of tips in the phylogeny, all tips are marked.
Only supports asexual phylogenies.
Parameters
- phylogeny_df : polars.DataFrame
The phylogeny as a dataframe in alife standard format.
Must represent an asexual phylogeny.
- n_sample : int
Number of tips to mark.
- seed : int, optional
Integer seed for deterministic behavior.
- mark_as : str, default “alifestd_mark_sample_tips_uniform_polars”
Column name for the boolean mark.
Raises
- NotImplementedError
If phylogeny_df has no “ancestor_id” column.
Returns
- polars.DataFrame
The phylogeny with an added boolean mark column.
See Also
- alifestd_mark_sample_tips_uniform_asexual :
Pandas-based implementation.
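As a rough illustration of uniform tip subsampling, the sketch below marks a random subset of leaves. The helper name `mark_sample_tips` is hypothetical, and the ancestor-id list layout (row number equals id, roots are their own ancestor) is an assumption made for the example, not a statement about the library's internals.

```python
import random


def mark_sample_tips(ancestor_id, n_sample, seed=None):
    """Return one boolean per node, True for randomly retained tips."""
    n = len(ancestor_id)
    has_child = [False] * n
    for node, parent in enumerate(ancestor_id):
        if parent != node:  # assumption: a root is its own ancestor
            has_child[parent] = True
    tips = [i for i in range(n) if not has_child[i]]

    # If more tips are requested than exist, keep them all.
    k = min(n_sample, len(tips))
    keep = set(random.Random(seed).sample(tips, k))
    return [i in keep for i in range(n)]


marks = mark_sample_tips([0, 0, 0, 1, 1], n_sample=2, seed=1)
```

Passing the same seed reproduces the same subsample; inner nodes are never marked.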
- alifestd_mark_sister_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mark_as: str = 'sister_id') DataFrame
Add column sister_id, containing the id of each node’s sibling.
The output column name can be changed via the mark_as parameter.
Root nodes will be marked with their own id. Phylogeny must be strictly bifurcating.
Dataframe reindexing (e.g., df.index) may be applied.
Input dataframe is not mutated by this operation unless mutate is set to True. Even if mutate is set to True, the operation does not occur in place; still use the return value to get the transformed phylogeny dataframe.
- alifestd_mark_sister_polars(phylogeny_df: DataFrame, *, mark_as: str = 'sister_id') DataFrame
Add column sister_id, containing the id of each node’s sibling.
The output column name can be changed via the mark_as parameter.
Root nodes will be marked with their own id.
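A minimal sketch of sibling marking, assuming a strictly bifurcating tree stored as an ancestor-id list in which row number equals id and roots are their own ancestor. The helper name `mark_sister` is hypothetical and only illustrates the idea.

```python
def mark_sister(ancestor_id):
    """Return, for each node, the id of its sibling (roots map to themselves)."""
    children = {}
    for node, parent in enumerate(ancestor_id):
        if parent != node:  # assumption: a root is its own ancestor
            children.setdefault(parent, []).append(node)

    sister = list(range(len(ancestor_id)))  # roots keep their own id
    for kids in children.values():
        if len(kids) != 2:
            raise ValueError("phylogeny must be strictly bifurcating")
        first, second = kids
        sister[first], sister[second] = second, first
    return sister


sister_id = mark_sister([0, 0, 0, 1, 1])
```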
- alifestd_mask_descendants_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, ancestor_mask: ndarray) DataFrame
For given ancestor nodes, create a mask identifying those nodes and all descendants.
Ancestral nodes are identified by ancestor_mask corresponding to rows in phylogeny_df.
The mask is returned as a new column alifestd_mask_descendants_asexual in the output DataFrame.
A topological sort will be applied if phylogeny_df is not topologically sorted. Dataframe reindexing (e.g., df.index) may be applied.
Input dataframe is not mutated by this operation unless mutate is set to True. Even if mutate is set to True, the operation does not occur in place; still use the return value to get the transformed phylogeny dataframe.
- alifestd_mask_descendants_polars(phylogeny_df: DataFrame, *, ancestor_mask: ndarray) DataFrame
For given ancestor nodes, create a mask identifying those nodes and all descendants.
Ancestral nodes are identified by ancestor_mask corresponding to rows in phylogeny_df.
The mask is returned as a new column alifestd_mask_descendants_polars in the output DataFrame.
Only supports asexual phylogenies.
Parameters
- phylogeny_df : polars.DataFrame
The phylogeny as a dataframe in alife standard format.
Must represent an asexual phylogeny with contiguous, topologically sorted ids and an ancestor_id column.
- ancestor_mask : numpy.ndarray
Boolean array indicating ancestor nodes to propagate from.
Raises
- NotImplementedError
If phylogeny_df has no “ancestor_id” column or if ids are non-contiguous or not topologically sorted.
Returns
- polars.DataFrame
The input DataFrame with an additional boolean column alifestd_mask_descendants_polars.
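With contiguous, topologically sorted ids, propagating a mask to descendants takes a single forward pass, because every parent is finalized before its children are visited. The sketch below shows that idea with a hypothetical helper `mask_descendants` operating on plain lists; roots are assumed to be their own ancestor.

```python
def mask_descendants(ancestor_id, ancestor_mask):
    """Return a mask covering the flagged nodes and all of their descendants."""
    mask = list(ancestor_mask)
    for node, parent in enumerate(ancestor_id):
        # Parents precede children, so mask[parent] is already final here.
        mask[node] = mask[node] or mask[parent]
    return mask


descendant_mask = mask_descendants(
    [0, 0, 0, 1, 1],
    [False, True, False, False, False],
)
```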
- alifestd_mask_monomorphic_clades_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, trait_mask: ndarray, trait_values: ndarray) DataFrame
Compute a mask marking “monomorphic” clades, where all members with a defined trait value share the same trait value.
Clades containing no members with a defined trait value are considered monomorphic. All leaf nodes are considered monomorphic.
Parameters
- phylogeny_df : pd.DataFrame
DataFrame containing the phylogeny, including an ancestor_id column.
- mutate : bool, default=False
If False, operates on a copy of phylogeny_df; if True, modifies phylogeny_df in place (but still returns it).
- trait_mask : np.ndarray
Boolean array marking the nodes that have a defined trait value, aligned with phylogeny_df.index.
- trait_values : np.ndarray
Array of trait values aligned with phylogeny_df.index.
Returns
- pd.DataFrame
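One way to picture the monomorphic-clade rule is to collect, for each clade, the set of defined trait values it contains and check that the set has at most one element. The sketch below does this with a reverse pass over a topologically sorted ancestor-id list; the helper name `mask_monomorphic_clades` and the list-based layout are assumptions for illustration only.

```python
def mask_monomorphic_clades(ancestor_id, trait_mask, trait_values):
    """Return True for nodes whose clade holds at most one defined trait value."""
    n = len(ancestor_id)
    seen = [{trait_values[i]} if trait_mask[i] else set() for i in range(n)]
    # Visit children before parents, merging value sets upward.
    for node in reversed(range(n)):
        parent = ancestor_id[node]
        if parent != node:  # assumption: a root is its own ancestor
            seen[parent] |= seen[node]
    return [len(values) <= 1 for values in seen]


monomorphic = mask_monomorphic_clades(
    [0, 0, 0, 1, 1],
    [False, False, True, True, True],
    [0, 0, 5, 7, 7],
)
```

The clade under node 1 contains only the value 7, so it is monomorphic; the root clade mixes 5 and 7, so it is not.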
- alifestd_parse_ancestor_id(ancestor_list_str: str) int | None
Parse at most a single ancestor id from an ancestor_list field.
- alifestd_parse_ancestor_ids(ancestor_list_str: str) List[int]
Parse ancestor ids from an ancestor_list field.
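For orientation, the sketch below parses ancestor_list strings under the assumption that they look like "[none]", "[7]", or "[1, 2]". The helper names are hypothetical, and raising on multiple ids in the single-id variant is an assumption; the library's actual behavior may differ.

```python
def sketch_parse_ancestor_ids(ancestor_list_str):
    """Parse all ancestor ids from a string such as "[1, 2]" or "[none]"."""
    body = ancestor_list_str.strip()
    if body.startswith("[") and body.endswith("]"):
        body = body[1:-1]
    tokens = [tok.strip() for tok in body.split(",")]
    # Empty tokens and the root token "none" contribute no ids.
    return [int(tok) for tok in tokens if tok and tok.lower() != "none"]


def sketch_parse_ancestor_id(ancestor_list_str):
    """Parse zero or one ancestor id; None signals a root."""
    ids = sketch_parse_ancestor_ids(ancestor_list_str)
    if len(ids) > 1:  # assumption: multiple ancestors are rejected
        raise ValueError("more than one ancestor id")
    return ids[0] if ids else None
```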
- alifestd_pipe_unary_ops(phylogeny_df: ~pandas.core.frame.DataFrame, *unary_ops: ~typing.Callable[[~pandas.core.frame.DataFrame], ~pandas.core.frame.DataFrame], progress_wrap: ~typing.Callable = <function <lambda>>) DataFrame
Pipe a phylogeny DataFrame through a sequence of unary operations.
Each operation in unary_ops is applied in order to the DataFrame.
Parameters
- phylogeny_df : pandas.DataFrame
The phylogeny as a dataframe in alife standard format.
- *unary_ops : callable
Zero or more callables, each accepting and returning a DataFrame.
- progress_wrap : callable, optional
Optional wrapper for unary_ops to provide progress feedback (e.g. tqdm).
Returns
- pandas.DataFrame
The result of piping phylogeny_df through each operation in order.
See Also
- alifestd_pipe_unary_ops_polars :
Polars-based implementation.
This function also accepts a polars.DataFrame, for which there is a separate delegated implementation.
- alifestd_pipe_unary_ops_polars(phylogeny_df: ~polars.dataframe.frame.DataFrame, *unary_ops: ~typing.Callable[[~polars.dataframe.frame.DataFrame], ~polars.dataframe.frame.DataFrame], progress_wrap: ~typing.Callable = <function <lambda>>) DataFrame
Pipe a phylogeny DataFrame through a sequence of unary operations.
Each operation in unary_ops is applied in order to the DataFrame.
Parameters
- phylogeny_df : polars.DataFrame
The phylogeny as a dataframe in alife standard format.
- *unary_ops : callable
Zero or more callables, each accepting and returning a DataFrame.
- progress_wrap : callable, optional
Optional wrapper for unary_ops to provide progress feedback (e.g. tqdm).
Returns
- polars.DataFrame
The result of piping phylogeny_df through each operation in order.
See Also
- alifestd_pipe_unary_ops :
Pandas-based implementation.
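Piping a value through a sequence of unary operations is a left fold. The sketch below illustrates the pattern with a hypothetical helper `pipe_unary_ops_sketch`, using plain lists in place of DataFrames so that it runs without third-party packages.

```python
from functools import reduce


def pipe_unary_ops_sketch(value, *unary_ops, progress_wrap=lambda ops: ops):
    """Apply each operation in order, feeding each result to the next."""
    return reduce(lambda acc, op: op(acc), progress_wrap(unary_ops), value)


def drop_first(rows):
    return rows[1:]


result = pipe_unary_ops_sketch([3, 1, 2], sorted, drop_first)
```

With zero operations the input is returned unchanged, which matches the "zero or more callables" wording above.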
- alifestd_prefix_roots(phylogeny_df: DataFrame, *, allow_id_reassign: bool = False, origin_time: Real | None = None, mutate: bool = False) DataFrame
Add new roots to the phylogeny, prefixing existing roots.
An origin time may be specified, in which case only roots with origin times past the specified time will be prefixed. If no origin time is specified, all roots will be prefixed.
Input dataframe is not mutated by this operation unless mutate is set to True. Even if mutate is set to True, the operation does not occur in place; still use the return value to get the transformed phylogeny dataframe.
- alifestd_prefix_roots_polars(phylogeny_df: DataFrame, *, allow_id_reassign: bool = False, origin_time: Real | None = None) DataFrame
Add new roots to the phylogeny, prefixing existing roots.
An origin time may be specified, in which case only roots with origin times past the specified time will be prefixed. If no origin time is specified, all roots will be prefixed.
- alifestd_prune_extinct_lineages_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, criterion: str = 'extant') DataFrame
Drop taxa without extant descendants.
The criterion column is used to determine extant taxa.
Fastest with records in working format. See alifestd_to_working_format.
Parameters
- phylogeny_df : pandas.DataFrame
The phylogeny as a dataframe in alife standard format.
Must represent an asexual phylogeny.
- mutate : bool, default False
Are side effects on the input argument phylogeny_df allowed?
- criterion : str, default “extant”
Column name used to determine extant taxa.
Raises
- ValueError
If criterion is not a column in phylogeny_df.
Returns
- pandas.DataFrame
The pruned phylogeny in alife standard format.
- alifestd_prune_extinct_lineages_polars(phylogeny_df: DataFrame, *, criterion: str = 'extant') DataFrame
Drop taxa without extant descendants.
The criterion column is used to determine extant taxa.
Parameters
- phylogeny_df : polars.DataFrame
The phylogeny as a dataframe in alife standard format.
Must represent an asexual phylogeny.
- criterion : str, default “extant”
Column name used to determine extant taxa.
Raises
- NotImplementedError
If phylogeny_df has no “ancestor_id” column.
- NotImplementedError
If phylogeny_df has non-contiguous ids.
- NotImplementedError
If phylogeny_df is not topologically sorted.
- ValueError
If criterion is not a column in phylogeny_df.
Returns
- polars.DataFrame
The pruned phylogeny in alife standard format.
See Also
- alifestd_prune_extinct_lineages_asexual :
Pandas-based implementation.
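When ids are contiguous and topologically sorted, pruning extinct lineages reduces to one reverse pass that pushes "has an extant descendant" from children up to parents. The sketch below uses a hypothetical helper `prune_extinct` on plain lists and returns the ids that survive; roots are assumed to be their own ancestor.

```python
def prune_extinct(ancestor_id, extant):
    """Return ids of taxa that are extant or have an extant descendant."""
    keep = list(extant)
    # Reverse order visits children before their parents.
    for node in reversed(range(len(ancestor_id))):
        if keep[node]:
            keep[ancestor_id[node]] = True
    return [node for node, kept in enumerate(keep) if kept]


survivors = prune_extinct(
    [0, 0, 0, 1, 1],
    [False, False, False, False, True],
)
```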
- alifestd_reroot_at_id_asexual(phylogeny_df: DataFrame, new_root_id: int, mutate: bool = False) DataFrame
Reroot phylogeny, preserving topology.
Reverses the descendant-to-ancestor relationships of all ancestors of the new root. Does not update branch_lengths or edge_lengths columns if present.
Parameters
- phylogeny_df : pandas.DataFrame
The phylogeny as a dataframe in alife standard format.
Must represent an asexual phylogeny.
- new_root_id : int
The ID of the node to use as the new root of the phylogeny.
- mutate : bool, default False
Are side effects on the input argument phylogeny_df allowed?
Returns
- pandas.DataFrame
The rerooted phylogeny in alife standard format.
- alifestd_reroot_at_id_polars(phylogeny_df: DataFrame, new_root_id: int) DataFrame
Reroot phylogeny at specified node id, preserving topology.
Reverses the descendant-to-ancestor relationships of all ancestors of the new root. Does not update branch_lengths or edge_lengths columns if present.
Parameters
- phylogeny_df : polars.DataFrame
The phylogeny as a dataframe in alife standard format.
Must represent an asexual phylogeny.
- new_root_id : int
The ID of the node to use as the new root of the phylogeny.
Returns
- polars.DataFrame
The rerooted phylogeny in alife standard format.
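Rerooting only touches the path from the new root up to the old root: each edge on that path is reversed, and all other ancestor relationships stay as they were. A minimal sketch, assuming an ancestor-id list where roots are their own ancestor (the helper name `reroot_at_id` is hypothetical):

```python
def reroot_at_id(ancestor_id, new_root_id):
    """Return ancestor ids after making new_root_id the root."""
    result = list(ancestor_id)

    # Path from the new root up to the old root.
    path = [new_root_id]
    while ancestor_id[path[-1]] != path[-1]:
        path.append(ancestor_id[path[-1]])

    # Reverse each edge: every former ancestor now descends from its former child.
    for child, parent in zip(path, path[1:]):
        result[parent] = child
    result[new_root_id] = new_root_id  # the new root is its own ancestor
    return result


rerooted = reroot_at_id([0, 0, 0, 1, 1], new_root_id=3)
```

Note that the result is generally no longer topologically sorted by id, so a follow-up sort may be needed before using sorted-order algorithms.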
- alifestd_sample_triplet_comparisons_asexual(first_df: ~pandas.core.frame.DataFrame, second_df: ~pandas.core.frame.DataFrame, taxon_label_key: str, n: int = 1000, progress_wrap: ~typing.Callable = <function <lambda>>, mutate: bool = False) DataFrame
Sample triplet comparisons between two asexual phylogenetic trees in alife standard form. Creates a DataFrame with the triplet categorizations and comparison results, as well as the corresponding data from the MRCA row within the first tree.
The MRCA row corresponds to the most recent common ancestor of two of the three taxa in the triplet.
Parameters
- first_df : pd.DataFrame
The DataFrame representing the first phylogenetic tree.
- second_df : pd.DataFrame
The DataFrame representing the second phylogenetic tree.
- taxon_label_key : str
The key in the DataFrame to identify the taxon labels.
- n : int, default 1000
The number of samples to take.
Corresponds to number of rows in the returned DataFrame.
- progress_wrap : typing.Callable, optional
Pass tqdm or equivalent to display a progress bar.
- mutate : bool, default False
If True, allows mutation of input DataFrames.
Returns
- pd.DataFrame
A DataFrame with rows corresponding to sampled triplet comparisons and the following columns:
- “triplet code, {first,second}”: the categorization of the triplet in the first or second tree.
- “triplet match, {lax,lax/strict,strict,strict/lax}”: whether the triplet categorizations match with differing treatment of polytomies.
- all columns from the first tree.
Notes
The core comparison is done by sampling triplets of taxa, categorizing them, and comparing these categorizations across the two trees, taking into account the strict and lax parameters for handling polytomies. See alifestd_categorize_triplet_asexual for details.
See Also
- alifestd_categorize_triplet_asexual
- alifestd_estimate_triplet_distance_asexual
- alifestd_screen_trait_defined_clades_fisher_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mask_trait_absent: ndarray, mask_trait_present: ndarray) ndarray
Perform a screen for trait-defined clades based on Fisher’s exact test.
This function computes a Fisher’s exact test comparing the trait frequency (number of clade members with the trait, number of clade members without the trait) in a clade with its sister clade. Returned values are one-tailed p-values for the hypothesis that the trait frequency in the clade is greater than in the sister clade.
Root clades will be compared to themselves, as they have no sister clade. As such, root clades will take on p-values > 0.5.
The mask_trait_absent parameter can be used to exclude nodes from consideration; for instance, combining it via & with the leaf mask from alifestd_mark_leaves restricts the test to trait frequency among descendant leaves.
Returns a numpy array with the same length as the input DataFrame, holding the one-tailed p-value for each node’s clade. Returned array matches row order of the input DataFrame.
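The one-tailed comparison between a clade and its sister can be illustrated with a direct hypergeometric sum. This is a sketch of Fisher's exact test itself, not of the library's code; the helper name `fisher_greater` and its four count arguments are assumptions for the example.

```python
from math import comb


def fisher_greater(clade_with, clade_without, sister_with, sister_without):
    """One-tailed p-value that the trait is more frequent in the clade."""
    total = clade_with + clade_without + sister_with + sister_without
    n_with = clade_with + sister_with
    n_clade = clade_with + clade_without
    upper = min(n_with, n_clade)
    # Sum the probabilities of tables at least as extreme as the observed one.
    tail = sum(
        comb(n_with, k) * comb(total - n_with, n_clade - k)
        for k in range(clade_with, upper + 1)
    )
    return tail / comb(total, n_clade)


p_distinct = fisher_greater(3, 0, 0, 3)  # trait only in the clade
p_self = fisher_greater(2, 1, 2, 1)  # a clade compared with itself
```

Comparing a clade with itself, as happens for root clades, always yields a p-value above 0.5 because the observed table sits at the center of the distribution and is counted in the tail.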
- alifestd_screen_trait_defined_clades_fitch_asexual(phylogeny_df: ~pandas.core.frame.DataFrame, mutate: bool = False, *, mask_trait_absent: ~numpy.ndarray, mask_trait_present: ~numpy.ndarray, progress_wrap: ~typing.Callable = <function <lambda>>) ndarray
Perform a maximum parsimony screen for trait-defined clades using Fitch’s algorithm.
The mask_trait_absent parameter can be used to exclude nodes from consideration; for instance, combining it via & with the leaf mask from alifestd_mark_leaves restricts the screen to traits on leaves.
Pass tqdm or equivalent as progress_wrap to display a progress bar.
Default root state is assumed to be False.
Returns a numpy array of bool with the same length as the input DataFrame, indicating which nodes’ clades the screen identifies as trait-defined. Returned array matches row order of the input DataFrame.
- alifestd_screen_trait_defined_clades_naive_asexual(phylogeny_df: DataFrame, mutate: bool = False, *, mask_trait_absent: ndarray, mask_trait_present: ndarray, defining_mut_thresh: float = 0.75, defining_mut_sister_thresh: float = 0.75) ndarray
Perform a naive screen for trait-defined clades.
This function checks if the trait frequency in a clade is above a certain threshold (defining_mut_thresh), and if the trait frequency in the sister clade is below a certain threshold (defining_mut_sister_thresh). Clades are defined as a node and all descendant nodes.
The mask_trait_absent parameter can be used to exclude nodes from consideration; for instance, combining it via & with the leaf mask from alifestd_mark_leaves restricts the screen to trait frequency among descendant leaves.
Returns a numpy array of bool with the same length as the input DataFrame, indicating which nodes’ clades the screen identifies as trait-defined. Returned array matches row order of the input DataFrame.
- alifestd_sort_children_asexual(phylogeny_df: DataFrame, criterion: str, reverse: bool = False, mutate: bool = False) DataFrame
Reorder rows so children are sorted by the given criterion column, gathering children into contiguous rows.
Reorders rows so that siblings appear in order of ascending criterion column values. Set reverse=True to sort descending (higher values first).
The criterion column must already be present in the dataframe (e.g., added via alifestd_mark_num_leaves_asexual).
A topological sort will be applied if phylogeny_df is not topologically sorted. Dataframe reindexing (e.g., df.index) may be applied.
Input dataframe is not mutated by this operation unless mutate is set to True. Even if mutate is set to True, the operation does not occur in place; still use the return value to get the transformed phylogeny dataframe.
Note: after sorting, ids will no longer be contiguous with respect to row indices. Call alifestd_assign_contiguous_ids on the result to reassign contiguous ids if needed.
Parameters
- phylogeny_df : pandas.DataFrame
The phylogeny as a dataframe in alife standard format.
- criterion : str
Name of the column to sort children by.
- reverse : bool, default False
If True, sort descending (higher values first).
- mutate : bool, default False
If True, allow mutation of the input dataframe.
Returns
- pandas.DataFrame
The phylogeny with rows reordered by sorted children traversal.
See Also
- alifestd_sort_children_polars :
Polars-based implementation.
- alifestd_ladderize_asexual :
Convenience wrapper that sorts by num_leaves.
- alifestd_assign_contiguous_ids :
Reassign contiguous ids after reordering.
- alifestd_sort_children_polars(phylogeny_df: DataFrame, criterion: str | Expr, reverse: bool = False) DataFrame
Reorder rows so children are sorted by the given criterion column, gathering children into contiguous rows.
Reorders rows so that siblings appear in order of ascending criterion column values. Set reverse=True to sort descending (higher values first).
The criterion column must already be present in the dataframe.
Note: after sorting, ids will no longer be contiguous with respect to row indices. Call alifestd_assign_contiguous_ids_polars on the result to reassign contiguous ids if needed.
Parameters
- phylogeny_df : polars.DataFrame
The phylogeny as a dataframe in alife standard format.
Must represent an asexual phylogeny with contiguous ids and topologically sorted rows.
- criterion : str or polars.Expr
Name of the column to sort children by, or a polars expression whose values determine the sort order.
- reverse : bool, default False
If True, sort descending (higher values first).
Returns
- polars.DataFrame
The phylogeny with rows reordered by sorted children traversal.
Raises
- NotImplementedError
If ids are not contiguous or rows are not topologically sorted.
See Also
- alifestd_sort_children_asexual :
Pandas-based implementation.
- alifestd_ladderize_polars :
Convenience wrapper that sorts by num_leaves.
- alifestd_assign_contiguous_ids_polars :
Reassign contiguous ids after reordering.
- alifestd_splay_polytomies(phylogeny_df: DataFrame, mutate: bool = False) DataFrame
Use a simple splay strategy to resolve polytomies, converting them into bifurcations.
    1
  / | \
 2  3  4
becomes
    1
   / \
  2   5
     / \
    3   4
No adjustments to any branch length columns in phylogeny_df are performed. However, origin_time (as well as all other columns) of a polytomy’s parent node are duplicated in splayed-out nodes that resolve that polytomy. So, nodes added to perform the splaying-out will have zero-length subtending branches in this regard (i.e., their origin time will match their parent’s).
Input dataframe is not mutated by this operation unless mutate is set to True. Even if mutate is set to True, the operation does not occur in place; still use the return value to get the transformed phylogeny dataframe.
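The splay strategy can be sketched as repeatedly peeling one child off a polytomy and pushing the remaining children under a newly added node. The helper name `splay_polytomies_sketch` is hypothetical, new nodes are appended at the end of the ancestor-id list, and roots are assumed to be their own ancestor; the library may order or number new nodes differently.

```python
def splay_polytomies_sketch(ancestor_id):
    """Return ancestor ids with every polytomy resolved into bifurcations."""
    result = list(ancestor_id)
    children = {}
    for node, parent in enumerate(ancestor_id):
        if parent != node:  # assumption: a root is its own ancestor
            children.setdefault(parent, []).append(node)

    for parent, kids in children.items():
        attach = parent
        while len(kids) > 2:
            first, kids = kids[0], kids[1:]
            new_node = len(result)
            result.append(attach)  # new node hangs under the attach point
            result[first] = attach  # one child stays beside the new node
            for kid in kids:
                result[kid] = new_node  # the rest move under the new node
            attach = new_node
    return result


splayed = splay_polytomies_sketch([0, 0, 0, 0])
```

With zero-based ids, the root's three children become one child plus a new node 4 that holds the other two, mirroring the diagram above.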
- alifestd_splay_polytomies_polars(phylogeny_df: DataFrame) DataFrame
Use a simple splay strategy to resolve polytomies, converting them into bifurcations.
No adjustments to any branch length columns are performed. Nodes added to perform the splaying-out will have zero-length subtending branches.
- alifestd_sum_origin_time_deltas_asexual(phylogeny_df: DataFrame, mutate: bool = False) Number
Sum differences between taxa origin times and their ancestors’ origin time.
Input dataframe is not mutated by this operation unless mutate is set to True.
- alifestd_sum_origin_time_deltas_polars(phylogeny_df: DataFrame) float
Sum origin_time_delta values.
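As a small worked illustration, the sum of origin time deltas is the total of each taxon's origin time minus its ancestor's origin time. The helper below is hypothetical and assumes roots are their own ancestor, so they contribute zero.

```python
def sum_origin_time_deltas(ancestor_id, origin_time):
    """Sum each node's origin time minus its ancestor's origin time."""
    return sum(
        origin_time[node] - origin_time[parent]
        for node, parent in enumerate(ancestor_id)
    )


total_delta = sum_origin_time_deltas([0, 0, 0, 1, 1], [0, 1, 2, 2, 3])
```

The four non-root branches here have lengths 1, 2, 1, and 2.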
- alifestd_test_leaves_isomorphic_asexual(df1: ~pandas.core.frame.DataFrame, df2: ~pandas.core.frame.DataFrame, taxon_label: str, mutate: bool = False, progress_wrap: callable = <function <lambda>>) bool
Test if phylogenetic relationships between leaf nodes are topologically isomorphic between two phylogenies.
- alifestd_test_leaves_isomorphic_polars(df1: DataFrame, df2: DataFrame, taxon_label: str) bool
Test if phylogenetic relationships between leaf nodes are topologically isomorphic between two phylogenies.
See Also
- alifestd_test_leaves_isomorphic_asexual :
Pandas-based implementation.
- alifestd_to_iplotx_pandas(phylogeny_df: DataFrame, mutate: bool = False) AlifestdIplotxShimPandas
Wrap a pandas phylogeny DataFrame for use with iplotx.
Parameters
- phylogeny_df : pd.DataFrame
Asexual phylogeny in alife standard format with contiguous ids and topologically sorted rows.
- mutate : bool, default False
If True, allow modification of the input dataframe.
Returns
- AlifestdIplotxShimPandas
An iplotx-compatible tree provider that can be passed directly to iplotx.tree().
- alifestd_to_iplotx_polars(phylogeny_df: DataFrame) AlifestdIplotxShimPolars
Wrap a polars phylogeny DataFrame for use with iplotx.
Parameters
- phylogeny_df : polars.DataFrame
Asexual phylogeny in alife standard format with contiguous ids and topologically sorted rows.
Returns
- AlifestdIplotxShimPolars
An iplotx-compatible tree provider that can be passed directly to iplotx.tree().
- alifestd_to_working_format(phylogeny_df: DataFrame, mutate: bool = False) DataFrame
Re-encode phylogeny_df to facilitate efficient analysis and transformation operations.
The returned phylogeny dataframe will
- be topologically sorted (i.e., organisms appear after all ancestors),
- have contiguous ids (i.e., organisms’ ids correspond to row number), and
- contain an integer datatype ancestor_id column if the phylogeny is asexual (i.e., a more performant representation of ancestor_list).
Input dataframe is not mutated by this operation unless mutate is set to True. Even if mutate is set to True, the operation does not occur in place; still use the return value to get the transformed phylogeny dataframe.
- alifestd_to_working_format_polars(phylogeny_df: DataFrame, keep_ancestor_list: bool = False) DataFrame
Re-encode phylogeny_df to facilitate efficient analysis and transformation operations.
The returned phylogeny dataframe will
- be topologically sorted (i.e., organisms appear after all ancestors),
- have contiguous ids (i.e., organisms’ ids correspond to row number), and
- contain an integer datatype ancestor_id column if the phylogeny is asexual (i.e., a more performant representation of ancestor_list).
Parameters
- phylogeny_df : polars.DataFrame
The phylogeny as a dataframe in alife standard format.
- keep_ancestor_list : bool, default False
If True and ancestor_list was present in the input, regenerate the ancestor_list column from the (reassigned) ancestor_id column. The column is dropped during processing in all cases; it is only restored when this flag is set and the input already had it.
See Also
- alifestd_to_working_format :
Pandas-based implementation.
- alifestd_topological_sensitivity_warned(*, insert: bool, delete: bool, update: bool) Callable
Decorator that emits a topological sensitivity warning before the wrapped function executes.
The first positional argument of the decorated function must be the phylogeny dataframe (pandas).
The decorated function gains two additional keyword arguments:
- ignore_topological_sensitivity (bool, default False): If True, suppress the topological sensitivity warning.
- drop_topological_sensitivity (bool, default False): If True, drop topology-sensitive columns from the result and suppress the warning.
Parameters
- insert : bool
Whether the operation inserts new nodes.
- delete : bool
Whether the operation deletes nodes.
- update : bool
Whether the operation updates ancestor relationships.
Returns
- typing.Callable
A decorator that wraps a function with topological sensitivity warning logic.
See Also
- alifestd_topological_sensitivity_warned_polars :
Polars-based implementation.
- alifestd_warn_topological_sensitivity :
Underlying warning function.
- alifestd_topological_sensitivity_warned_polars(*, insert: bool, delete: bool, update: bool) Callable
Decorator that emits a topological sensitivity warning before the wrapped function executes.
The first positional argument of the decorated function must be the phylogeny dataframe (polars).
The decorated function gains two additional keyword arguments:
- ignore_topological_sensitivity (bool, default False): If True, suppress the topological sensitivity warning.
- drop_topological_sensitivity (bool, default False): If True, drop topology-sensitive columns from the result and suppress the warning.
Parameters
- insert : bool
Whether the operation inserts new nodes.
- delete : bool
Whether the operation deletes nodes.
- update : bool
Whether the operation updates ancestor relationships.
Returns
- typing.Callable
A decorator that wraps a function with topological sensitivity warning logic.
See Also
- alifestd_topological_sensitivity_warned :
Pandas-based implementation.
- alifestd_warn_topological_sensitivity_polars :
Underlying warning function.
- alifestd_topological_sort(phylogeny_df: DataFrame, mutate: bool = False) DataFrame
Sort rows so all organisms follow members of their ancestor_list.
Input dataframe is not mutated by this operation unless mutate is set to True. Even if mutate is set to True, the operation does not occur in place; still use the return value to get the transformed phylogeny dataframe.
- alifestd_topological_sort_polars(phylogeny_df: DataFrame) DataFrame
Sort rows so all organisms follow the ancestor named by their ancestor_id.
Uses contiguous id fast path when possible.
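A topological sort for an asexual phylogeny can be sketched by walking each unplaced node's ancestor chain and emitting the chain root-first. The helper name `topological_order` is hypothetical; it takes parallel lists of ids and ancestor ids, with roots assumed to be their own ancestor, and returns ids in an order where every ancestor precedes its descendants.

```python
def topological_order(ids, ancestor_ids):
    """Return ids ordered so that each ancestor precedes its descendants."""
    parent_of = dict(zip(ids, ancestor_ids))
    placed, order = set(), []
    for node in ids:
        chain = []
        # Climb until reaching a node that has already been placed.
        while node not in placed:
            chain.append(node)
            placed.add(node)
            node = parent_of[node]
        order.extend(reversed(chain))  # emit the oldest ancestor first
    return order


order = topological_order([3, 1, 0, 4, 2], [1, 0, 0, 1, 0])
```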
- alifestd_try_add_ancestor_id_col(phylogeny_df: DataFrame, mutate: bool = False) DataFrame
Add an ancestor_id column to the input DataFrame if the phylogeny is asexual and the column does not already exist.
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
- alifestd_try_add_ancestor_id_col_polars(phylogeny_df: DataFrame) DataFrame
Add an ancestor_id column to the input DataFrame if the phylogeny is asexual and the column does not already exist.
- alifestd_try_add_ancestor_list_col(phylogeny_df: DataFrame_T, root_ancestor_token: str = 'none', mutate: bool = False) DataFrame_T
Add an ancestor_list column to the input DataFrame if the column does not already exist.
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
See Also
- alifestd_make_ancestor_list_col
This function also accepts a polars.DataFrame, for which there is a separate delegated implementation.
- alifestd_try_add_ancestor_list_col_polars(phylogeny_df: DataFrame, root_ancestor_token: str = 'none', mutate: bool = False) DataFrame
Add an ancestor_list column to the input DataFrame if the column does not already exist.
Notes
Even if permitted by the mutate flag, no side effects occur on the input dataframe under the Polars implementation. The flag is included for API compatibility with the Pandas implementation.
See Also
- alifestd_try_add_ancestor_list_col :
Pandas-based implementation.
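As a rough illustration of the relationship between the two columns, the sketch below derives ancestor_list strings from ancestor ids. It assumes an asexual phylogeny where roots are marked by `ancestor_id == id` and where each ancestor_list entry is a bracketed string such as `[none]` or `[0]`; the function name is hypothetical.

```python
def sketch_make_ancestor_list(ids, ancestor_ids, root_ancestor_token="none"):
    """Build ancestor_list strings from parallel id and ancestor_id lists."""
    return [
        # Roots (assumed: ancestor_id == id) get the root token instead of an id.
        f"[{root_ancestor_token}]" if ancestor == id_ else f"[{ancestor}]"
        for id_, ancestor in zip(ids, ancestor_ids)
    ]


column = sketch_make_ancestor_list(ids=[0, 1, 2], ancestor_ids=[0, 0, 1])
# column == ["[none]", "[0]", "[1]"]
```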
- alifestd_ultrametricize(phylogeny_df: DataFrame, mutate: bool = False, *, method: Literal['extend'] = 'extend') DataFrame
Adjust tip origin_time values so all tips share the same time.
With method="extend", each tip’s origin_time is set to the maximum origin_time across all nodes. Internal node times are not modified.
Empty phylogenies are returned unchanged.
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
- alifestd_ultrametricize_polars(phylogeny_df: DataFrame, *, method: Literal['extend'] = 'extend') DataFrame
Adjust tip origin_time values so all tips share the same time.
With method="extend", each tip’s origin_time is set to the maximum origin_time across all nodes. Internal node times are not modified.
Empty phylogenies are returned unchanged. Must represent an asexual phylogeny (when is_leaf is not already present).
See Also
- alifestd_ultrametricize :
Pandas-based implementation.
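The "extend" method can be sketched in a few lines of plain Python. The sketch assumes contiguous ids (a node's id is its list index) and roots marked by `ancestor_id == id`; the function name is hypothetical.

```python
def sketch_ultrametricize_extend(ancestor_ids, origin_times):
    """Set every tip's origin_time to the maximum origin_time over all nodes."""
    if not ancestor_ids:
        return []  # empty phylogenies are returned unchanged
    has_child = [False] * len(ancestor_ids)
    for node, ancestor in enumerate(ancestor_ids):
        if ancestor != node:  # skip the root's self-reference
            has_child[ancestor] = True
    latest = max(origin_times)
    # Internal nodes keep their time; tips are extended to the latest time.
    return [
        time if has_child[node] else latest
        for node, time in enumerate(origin_times)
    ]


times = sketch_ultrametricize_extend(
    ancestor_ids=[0, 0, 0, 1], origin_times=[0, 1, 2, 3]
)
# times == [0, 1, 3, 3]: tips 2 and 3 now share the latest time
```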
- alifestd_unfurl_lineage_asexual(phylogeny_df: DataFrame, leaf_id: int, mutate: bool = False) ndarray
List leaf_id and its ancestor id sequence through tree root.
The provided dataframe must be asexual.
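A minimal sketch of unfurling a lineage, assuming contiguous ids (a node's id is its list index) and a root marked by `ancestor_id == id`. The function name is hypothetical.

```python
def sketch_unfurl_lineage(ancestor_ids, leaf_id):
    """List leaf_id followed by its ancestors, ending at the tree root."""
    lineage = [leaf_id]
    # Follow ancestor pointers until reaching a node that is its own ancestor.
    while ancestor_ids[lineage[-1]] != lineage[-1]:
        lineage.append(ancestor_ids[lineage[-1]])
    return lineage


lineage = sketch_unfurl_lineage(ancestor_ids=[0, 0, 1, 2], leaf_id=3)
# lineage == [3, 2, 1, 0]
```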
- alifestd_unfurl_traversal_inorder_asexual(phylogeny_df: DataFrame, mutate: bool = False) ndarray
List id values in inorder traversal order, with left children visited first.
The provided dataframe must be asexual and strictly bifurcating.
- alifestd_unfurl_traversal_inorder_polars(phylogeny_df: DataFrame) ndarray
List node indices in inorder traversal order, with left children visited first.
Parameters
- phylogeny_df : polars.DataFrame or polars.LazyFrame
The phylogeny as a dataframe in alife standard format.
Must represent an asexual, strictly bifurcating phylogeny with contiguous ids and topologically sorted rows.
Returns
- np.ndarray
Index array giving inorder traversal order.
See Also
- alifestd_unfurl_traversal_inorder_asexual :
Pandas-based implementation.
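For a strictly bifurcating tree, inorder traversal visits the left subtree, then the node itself, then the right subtree. The sketch below assumes contiguous ids, a single root marked by `ancestor_id == id`, and that the "left" child is the one with the smaller id. The function name is hypothetical.

```python
def sketch_inorder(ancestor_ids):
    """Inorder traversal of a strictly bifurcating tree, left child first."""
    children = [[] for _ in ancestor_ids]
    root = None
    for node, ancestor in enumerate(ancestor_ids):
        if ancestor == node:
            root = node
        else:
            # Scanning ids in ascending order keeps each child list sorted.
            children[ancestor].append(node)
    order = []
    stack = []
    cur = root
    while stack or cur is not None:
        # Descend along left children as far as possible.
        while cur is not None:
            stack.append(cur)
            cur = children[cur][0] if children[cur] else None
        node = stack.pop()
        order.append(node)
        # Then continue with the right child, if any.
        cur = children[node][1] if children[node] else None
    return order


order = sketch_inorder([0, 0, 0, 1, 1])
# order == [3, 1, 4, 0, 2]
```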
- alifestd_unfurl_traversal_levelorder_asexual(phylogeny_df: DataFrame, mutate: bool = False) ndarray
List id values in levelorder (BFS) traversal order.
The provided dataframe must be asexual.
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
- alifestd_unfurl_traversal_levelorder_polars(phylogeny_df: DataFrame) ndarray
List node indices in levelorder (BFS) traversal order.
Parameters
- phylogeny_df : polars.DataFrame or polars.LazyFrame
The phylogeny as a dataframe in alife standard format.
Must represent an asexual phylogeny with contiguous ids and topologically sorted rows.
Returns
- np.ndarray
Index array giving levelorder (BFS) traversal order.
See Also
- alifestd_unfurl_traversal_levelorder_asexual :
Pandas-based implementation.
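Levelorder traversal is a breadth-first search from the root. The sketch below assumes contiguous ids and roots marked by `ancestor_id == id`. The function name is hypothetical, and the order of siblings within a level (ascending id here) is a choice made for the sketch, not a documented guarantee.

```python
from collections import deque


def sketch_levelorder(ancestor_ids):
    """Breadth-first traversal: each level is listed before the next."""
    children = [[] for _ in ancestor_ids]
    queue = deque()
    for node, ancestor in enumerate(ancestor_ids):
        if ancestor == node:
            queue.append(node)  # roots seed the queue
        else:
            children[ancestor].append(node)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(children[node])  # children wait behind the current level
    return order


order = sketch_levelorder([0, 0, 1, 0])
# order == [0, 1, 3, 2]: node 2 is one level deeper than nodes 1 and 3
```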
- alifestd_unfurl_traversal_postorder_asexual(phylogeny_df: DataFrame, mutate: bool = False) ndarray
List id values in postorder traversal order.
The provided dataframe must be asexual.
- alifestd_unfurl_traversal_postorder_contiguous_asexual(phylogeny_df: DataFrame, mutate: bool = False, child_order: Literal['asc', 'desc'] | None = None) ndarray
List node indices in DFS postorder traversal order, with subtree contiguity.
The provided dataframe must be asexual with contiguous ids and topologically sorted rows.
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
Parameters
- phylogeny_df : pd.DataFrame
Asexual phylogeny in alife standard format with contiguous ids and topologically sorted rows.
- mutate : bool, default False
If True, allow modification of the input dataframe.
- child_order : {“asc”, “desc”, None}, default None
Order in which siblings are visited when descending the tree.
"asc" visits smallest-id child first, "desc" visits largest-id child first, and None uses an arbitrary (implementation-defined) order.
- alifestd_unfurl_traversal_postorder_contiguous_polars(phylogeny_df: DataFrame, child_order: Literal['asc', 'desc'] | None = None) ndarray
List node indices in DFS postorder traversal order, with subtree contiguity.
Parameters
- phylogeny_df : polars.DataFrame or polars.LazyFrame
The phylogeny as a dataframe in alife standard format.
Must represent an asexual phylogeny with contiguous ids and topologically sorted rows.
- child_order : {“asc”, “desc”, None}, default None
Order in which siblings are visited when descending the tree.
"asc" visits smallest-id child first, "desc" visits largest-id child first, and None uses an arbitrary (implementation-defined) order.
Returns
- np.ndarray
Index array giving DFS postorder traversal order.
See Also
- alifestd_unfurl_traversal_postorder_asexual :
Pandas-based implementation.
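The effect of `child_order` on a DFS postorder can be sketched as follows. The sketch assumes contiguous ids and roots marked by `ancestor_id == id`. The function name is hypothetical, and treating `None` the same as `"asc"` is just one admissible choice for the implementation-defined case.

```python
def sketch_postorder(ancestor_ids, child_order=None):
    """DFS postorder: every node is listed after all of its descendants."""
    children = [[] for _ in ancestor_ids]
    roots = []
    for node, ancestor in enumerate(ancestor_ids):
        (roots if ancestor == node else children[ancestor]).append(node)
    descending = child_order == "desc"
    order = []
    # Stack entries are (node, expanded). The stack is LIFO, so siblings are
    # pushed in reverse of the order in which they should be visited.
    stack = [(root, False) for root in sorted(roots, reverse=not descending)]
    while stack:
        node, expanded = stack.pop()
        if expanded:
            order.append(node)  # all descendants were already emitted
            continue
        stack.append((node, True))
        stack.extend(
            (child, False)
            for child in sorted(children[node], reverse=not descending)
        )
    return order


asc = sketch_postorder([0, 0, 0, 1, 1], child_order="asc")
desc = sketch_postorder([0, 0, 0, 1, 1], child_order="desc")
# asc == [3, 4, 1, 2, 0] and desc == [2, 4, 3, 1, 0]
```

In both results, each subtree occupies one contiguous run of the output. For example, the subtree rooted at node 1 appears as `3, 4, 1` or `4, 3, 1`.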
- alifestd_unfurl_traversal_postorder_polars(phylogeny_df: DataFrame) ndarray
List node indices in postorder traversal order.
Parameters
- phylogeny_df : polars.DataFrame or polars.LazyFrame
The phylogeny as a dataframe in alife standard format.
Must represent an asexual phylogeny with contiguous ids and topologically sorted rows.
Returns
- np.ndarray
Index array giving postorder traversal order.
See Also
- alifestd_unfurl_traversal_postorder_asexual :
Pandas-based implementation.
- alifestd_unfurl_traversal_preorder_asexual(phylogeny_df: DataFrame, mutate: bool = False) ndarray
List id values in DFS preorder traversal order.
The provided dataframe must be asexual.
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
- alifestd_unfurl_traversal_preorder_polars(phylogeny_df: DataFrame) ndarray
List node indices in DFS preorder traversal order.
Parameters
- phylogeny_df : polars.DataFrame or polars.LazyFrame
The phylogeny as a dataframe in alife standard format.
Must represent an asexual phylogeny with contiguous ids and topologically sorted rows.
Returns
- np.ndarray
Index array giving DFS preorder traversal order.
See Also
- alifestd_unfurl_traversal_preorder_asexual :
Pandas-based implementation.
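DFS preorder lists each node before any of its descendants. The sketch below assumes contiguous ids and roots marked by `ancestor_id == id`. The function name is hypothetical, and visiting siblings in ascending id order is a choice made for the sketch.

```python
def sketch_preorder(ancestor_ids):
    """DFS preorder: every node is listed before its descendants."""
    children = [[] for _ in ancestor_ids]
    roots = []
    for node, ancestor in enumerate(ancestor_ids):
        (roots if ancestor == node else children[ancestor]).append(node)
    order = []
    stack = roots[::-1]  # reversed so the smallest-id root is popped first
    while stack:
        node = stack.pop()
        order.append(node)
        # Push children in reverse so the smallest-id child is visited next.
        stack.extend(reversed(children[node]))
    return order


order = sketch_preorder([0, 0, 0, 1, 1])
# order == [0, 1, 3, 4, 2]
```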
- alifestd_unfurl_traversal_semiorder_asexual(phylogeny_df: DataFrame, mutate: bool = False) ndarray
List id values in semiorder traversal order.
An inorder traversal where either left child (smaller id) or right child (larger id) may be visited first.
The provided dataframe must be asexual and strictly bifurcating.
- alifestd_unfurl_traversal_semiorder_polars(phylogeny_df: DataFrame) ndarray
List node indices in semiorder traversal order.
An inorder traversal where either left child (smaller id) or right child (larger id) may be visited first.
Parameters
- phylogeny_df : polars.DataFrame or polars.LazyFrame
The phylogeny as a dataframe in alife standard format.
Must represent an asexual, strictly bifurcating phylogeny with contiguous ids and topologically sorted rows.
Returns
- np.ndarray
Index array giving semiorder traversal order.
See Also
- alifestd_unfurl_traversal_semiorder_asexual :
Pandas-based implementation.
- alifestd_unfurl_traversal_topological_asexual(phylogeny_df: DataFrame, mutate: bool = False) ndarray
List id values in topological traversal order.
Parents are visited before children. If the dataframe is already topologically sorted, the existing id order is returned directly. Otherwise, a topological ordering is computed.
The provided dataframe must be asexual.
Input dataframe is not mutated by this operation unless mutate set True. If mutate set True, operation does not occur in place; still use return value to get transformed phylogeny dataframe.
- alifestd_unfurl_traversal_topological_polars(phylogeny_df: DataFrame) ndarray
List node indices in topological traversal order.
Parents are visited before children. If the dataframe is already topologically sorted, the existing row indices are returned directly. Otherwise, a topological ordering is computed.
Parameters
- phylogeny_df : polars.DataFrame or polars.LazyFrame
The phylogeny as a dataframe in alife standard format.
Must represent an asexual phylogeny with contiguous ids.
Returns
- np.ndarray
Index array giving topological traversal order.
See Also
- alifestd_unfurl_traversal_topological_asexual :
Pandas-based implementation.
- alifestd_validate(phylogeny_df: DataFrame, mutate: bool = False, diagnose: bool = True) bool
Is the phylogeny compliant with alife data standards?
Input dataframe is not mutated by this operation unless mutate set True. If diagnose is set, the failing validation subcheck will warn.
- alifestd_warn_topological_sensitivity(phylogeny_df: DataFrame, caller: str, *, insert: bool, delete: bool, update: bool) None
Emit a warning if phylogeny_df contains columns that may be invalidated by topological operations.
Parameters
- phylogeny_df : pandas.DataFrame
The phylogeny as a dataframe in alife standard format.
- caller : str
Name of the calling function, included in the warning message.
- insert : bool
Whether the operation inserts new nodes.
- delete : bool
Whether the operation deletes nodes.
- update : bool
Whether the operation updates ancestor relationships.
Input dataframe is not mutated by this operation.
See Also
- alifestd_warn_topological_sensitivity_polars :
Polars-based implementation.
- alifestd_warn_topological_sensitivity_polars(phylogeny_df: DataFrame | LazyFrame, caller: str, *, insert: bool, delete: bool, update: bool) None
Emit a warning if phylogeny_df contains columns that may be invalidated by topological operations.
Parameters
- phylogeny_df : polars.DataFrame or polars.LazyFrame
The phylogeny as a dataframe in alife standard format.
- caller : str
Name of the calling function, included in the warning message.
- insert : bool
Whether the operation inserts new nodes.
- delete : bool
Whether the operation deletes nodes.
- update : bool
Whether the operation updates ancestor relationships.
See Also
- alifestd_warn_topological_sensitivity :
Pandas-based implementation.