Input and Output (Legacy)

Note

This documentation covers the legacy API (from phyloframe import legacy). The legacy API is stable and will continue to be maintained for backward compatibility. A redesigned API will accompany phyloframe v1.0.0.

This guide covers reading, writing, and converting phylogenetic data in phyloframe.

Newick Format

Parsing Newick Strings

from phyloframe import legacy as pfl

# Basic topology
df = pfl.alifestd_from_newick("((A,B),(C,D));")

# With branch lengths
df = pfl.alifestd_from_newick("((A:1.0,B:2.5):3.0,(C:4.0,D:5.0):6.0);")

# Columns created: id, ancestor_id, taxon_label,
#                  origin_time_delta, branch_length

Newick Parsing Options

# Integer branch lengths (uses nullable integer dtype)
df = pfl.alifestd_from_newick(
    "((A:1,B:2):3,(C:4,D:5):6);",
    branch_length_dtype=int,
)

# Include ancestor_list column for compatibility
df = pfl.alifestd_from_newick(
    "((A,B),(C,D));",
    create_ancestor_list=True,
)

# Polars version
df_polars = pfl.alifestd_from_newick_polars("((A,B),(C,D));")

Exporting to Newick

# Export to Newick string
newick_str = pfl.alifestd_as_newick_asexual(df)

# Include taxon labels from a column
newick_str = pfl.alifestd_as_newick_asexual(
    df, taxon_label="taxon_label",
)

Tabular File Formats (CSV, Parquet)

Use standard Pandas and Polars I/O utilities for reading and writing phylogeny DataFrames. Parquet is recommended for large phylogenies due to columnar compression, selective column loading, explicit typing, and efficient enum-based categorical string storage.

import pandas as pd
import polars as pl

# CSV --- Pandas
df.to_csv("phylogeny.csv", index=False)
df = pd.read_csv("phylogeny.csv")

# Parquet --- Pandas
df.to_parquet("phylogeny.pqt")
df = pd.read_parquet("phylogeny.pqt")

# Parquet --- Polars (selective column loading)
df_polars.write_parquet("phylogeny.pqt")
df_polars = pl.read_parquet(
    "phylogeny.pqt", columns=["id", "ancestor_id", "origin_time"],
)

Selective column deserialization is particularly advantageous with Polars streaming operations. See the Pandas I/O docs and Polars I/O docs for full details.

Remote and Cloud Sources

DataFrame libraries transparently handle URLs and cloud storage:

import pandas as pd

# From URL
df = pd.read_csv("https://example.com/data/phylogeny.csv")

# From S3
df = pd.read_parquet("s3://bucket/phylogeny.pqt")

# From Google Cloud Storage
df = pd.read_parquet("gs://bucket/phylogeny.pqt")

import polars as pl

# Polars also supports remote sources
df = pl.read_parquet("s3://bucket/phylogeny.pqt")