Input and Output (Legacy)
Note
This documentation covers the legacy API (from phyloframe import legacy). The legacy API is stable and will continue to be maintained for backward compatibility. A redesigned API will accompany phyloframe v1.0.0.
This guide covers reading, writing, and converting phylogenetic data in phyloframe.
Newick Format
Parsing Newick Strings
from phyloframe import legacy as pfl
# Basic topology
df = pfl.alifestd_from_newick("((A,B),(C,D));")
# With branch lengths
df = pfl.alifestd_from_newick("((A:1.0,B:2.5):3.0,(C:4.0,D:5.0):6.0);")
# Columns created: id, ancestor_id, taxon_label,
# origin_time_delta, branch_length
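To make the resulting table layout concrete, here is a minimal pure-Python sketch of how a Newick string can be flattened into id/ancestor_id rows. It is not phyloframe's implementation. The function name newick_to_rows, the depth-first id numbering, and the convention that the root lists itself as its own ancestor are assumptions made for illustration. Quoted labels and comments are not handled.

```python
def newick_to_rows(newick: str) -> list:
    """Flatten a simple Newick string into one dict per node (illustrative only)."""
    text = newick.strip().rstrip(";")
    rows = []
    pos = 0  # cursor into text, shared with the nested parser

    def parse_node(ancestor_id):
        nonlocal pos
        node_id = len(rows)  # ids assigned in depth-first visit order (assumption)
        row = {
            "id": node_id,
            # assumption: a root is marked by ancestor_id == id
            "ancestor_id": node_id if ancestor_id is None else ancestor_id,
            "taxon_label": "",
            "branch_length": None,
        }
        rows.append(row)

        # Internal node: parse comma-separated children inside parentheses.
        if pos < len(text) and text[pos] == "(":
            pos += 1
            while True:
                parse_node(node_id)
                if pos < len(text) and text[pos] == ",":
                    pos += 1
                    continue
                break
            if pos >= len(text) or text[pos] != ")":
                raise ValueError("unbalanced parentheses in Newick string")
            pos += 1

        # Optional label, then optional ":length".
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        row["taxon_label"] = text[start:pos]
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            row["branch_length"] = float(text[start:pos])

    parse_node(None)
    return rows


rows = newick_to_rows("((A:1.0,B:2.5):3.0,(C:4.0,D:5.0):6.0);")
for row in rows:
    print(row)
```

Each row points at its parent through ancestor_id, so the whole tree fits in a flat table that Pandas or Polars can store and filter like any other DataFrame.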
Newick Parsing Options
# Integer branch lengths (uses nullable integer dtype)
df = pfl.alifestd_from_newick(
"((A:1,B:2):3,(C:4,D:5):6);",
branch_length_dtype=int,
)
# Include ancestor_list column for compatibility
df = pfl.alifestd_from_newick(
"((A,B),(C,D));",
create_ancestor_list=True,
)
# Polars version
df_polars = pfl.alifestd_from_newick_polars("((A,B),(C,D));")
Exporting to Newick
# Export to Newick string
newick_str = pfl.alifestd_as_newick_asexual(df)
# Include taxon labels from a column
newick_str = pfl.alifestd_as_newick_asexual(
df, taxon_label="taxon_label",
)
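For intuition about what the export step involves, the sketch below rebuilds a Newick string from id/ancestor_id rows in plain Python. It is a hedged illustration rather than the library's algorithm. The helper name rows_to_newick is hypothetical, a single root marked by ancestor_id == id is assumed, and the recursion is only suitable for small trees.

```python
from collections import defaultdict


def rows_to_newick(rows) -> str:
    """Serialize id/ancestor_id rows to a Newick string (illustrative only)."""
    by_id = {row["id"]: row for row in rows}
    children = defaultdict(list)
    root_id = None
    for row in rows:
        if row["ancestor_id"] == row["id"]:  # assumption: root is its own ancestor
            root_id = row["id"]
        else:
            children[row["ancestor_id"]].append(row["id"])
    if root_id is None:
        raise ValueError("no root found")

    def render(node_id):
        row = by_id[node_id]
        # Children are emitted in the order they appear in the rows.
        inner = ",".join(render(child) for child in children.get(node_id, []))
        out = f"({inner})" if inner else ""
        out += row.get("taxon_label") or ""
        if row.get("branch_length") is not None:
            out += f":{row['branch_length']}"
        return out

    return render(root_id) + ";"


example_rows = [
    {"id": 0, "ancestor_id": 0, "taxon_label": ""},
    {"id": 1, "ancestor_id": 0, "taxon_label": ""},
    {"id": 2, "ancestor_id": 1, "taxon_label": "A"},
    {"id": 3, "ancestor_id": 1, "taxon_label": "B"},
    {"id": 4, "ancestor_id": 0, "taxon_label": ""},
    {"id": 5, "ancestor_id": 4, "taxon_label": "C"},
    {"id": 6, "ancestor_id": 4, "taxon_label": "D"},
]
print(rows_to_newick(example_rows))
```

Labels are written only where the label field is non-empty, which mirrors the idea of choosing a label column at export time.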
Tabular File Formats (CSV, Parquet)
Use standard Pandas and Polars I/O utilities for reading and writing phylogeny DataFrames. Parquet is recommended for large phylogenies due to columnar compression, selective column loading, explicit typing, and efficient enum-based categorical string storage.
import pandas as pd
import polars as pl
# CSV --- Pandas
df.to_csv("phylogeny.csv", index=False)
df = pd.read_csv("phylogeny.csv")
# Parquet --- Pandas
df.to_parquet("phylogeny.pqt")
df = pd.read_parquet("phylogeny.pqt")
# Parquet --- Polars (selective column loading)
df_polars.write_parquet("phylogeny.pqt")
# Requested columns must exist in the file
df_polars = pl.read_parquet(
"phylogeny.pqt", columns=["id", "ancestor_id", "origin_time_delta"],
)
Selective column deserialization is particularly advantageous with Polars streaming operations. See the Pandas I/O docs and Polars I/O docs for full details.
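The explicit-typing point can be illustrated with the standard library alone. CSV stores only text, so every value comes back as a string and types must be re-inferred or re-declared on load, whereas Parquet records column types in the file. The sketch below uses hypothetical example rows and a hand-written schema mapping. Pandas automates the inference step, but it remains a guess based on the text.

```python
import csv
import io

# Hypothetical phylogeny rows with integer and float columns.
rows = [
    {"id": 0, "ancestor_id": 0, "origin_time": 0.0},
    {"id": 1, "ancestor_id": 0, "origin_time": 1.5},
]

# Write to an in-memory CSV "file".
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "ancestor_id", "origin_time"])
writer.writeheader()
writer.writerows(rows)

# Read it back: the type information is gone, every field is a str.
buffer.seek(0)
raw = list(csv.DictReader(buffer))
print(raw[1])

# Types have to be restored by an explicit schema supplied by the reader.
schema = {"id": int, "ancestor_id": int, "origin_time": float}
typed = [{key: schema[key](value) for key, value in row.items()} for row in raw]
print(typed[1])
```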
Remote and Cloud Sources
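Pandas and Polars readers accept URLs and cloud storage paths directly. With Pandas, s3:// and gs:// paths additionally require the optional s3fs and gcsfs packages, and credentials are taken from the environment: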
import pandas as pd
# From URL
df = pd.read_csv("https://example.com/data/phylogeny.csv")
# From S3
df = pd.read_parquet("s3://bucket/phylogeny.pqt")
# From Google Cloud Storage
df = pd.read_parquet("gs://bucket/phylogeny.pqt")
import polars as pl
# Polars also reads remote sources directly
df_polars = pl.read_parquet("s3://bucket/phylogeny.pqt")