=========================
Input and Output (Legacy)
=========================

.. note::

   This documentation covers the **legacy** API
   (``from phyloframe import legacy``). The legacy API is stable and will
   continue to be maintained for backward compatibility. A redesigned API
   will accompany phyloframe v1.0.0.

This guide covers reading, writing, and converting phylogenetic data in
phyloframe.

Newick Format
=============

Parsing Newick Strings
----------------------

.. code-block:: python

    from phyloframe import legacy as pfl

    # Basic topology
    df = pfl.alifestd_from_newick("((A,B),(C,D));")

    # With branch lengths
    df = pfl.alifestd_from_newick("((A:1.0,B:2.5):3.0,(C:4.0,D:5.0):6.0);")

    # Columns created: id, ancestor_id, taxon_label,
    # origin_time_delta, branch_length

Newick Parsing Options
----------------------

.. code-block:: python

    # Integer branch lengths (uses nullable integer dtype)
    df = pfl.alifestd_from_newick(
        "((A:1,B:2):3,(C:4,D:5):6);",
        branch_length_dtype=int,
    )

    # Include ancestor_list column for compatibility
    df = pfl.alifestd_from_newick(
        "((A,B),(C,D));",
        create_ancestor_list=True,
    )

    # Polars version
    df_polars = pfl.alifestd_from_newick_polars("((A,B),(C,D));")

Exporting to Newick
--------------------

.. code-block:: python

    # Export to Newick string
    newick_str = pfl.alifestd_as_newick_asexual(df)

    # Include taxon labels from a column
    newick_str = pfl.alifestd_as_newick_asexual(
        df,
        taxon_label="taxon_label",
    )

Tabular File Formats (CSV, Parquet)
====================================

Use standard Pandas and Polars I/O utilities for reading and writing
phylogeny DataFrames. Parquet is recommended for large phylogenies due to
columnar compression, selective column loading, explicit typing, and
efficient enum-based categorical string storage.

.. code-block:: python

    import pandas as pd
    import polars as pl

    # CSV --- Pandas
    df.to_csv("phylogeny.csv", index=False)
    df = pd.read_csv("phylogeny.csv")

    # Parquet --- Pandas
    df.to_parquet("phylogeny.pqt")
    df = pd.read_parquet("phylogeny.pqt")

    # Parquet --- Polars (selective column loading)
    df_polars.write_parquet("phylogeny.pqt")
    df_polars = pl.read_parquet(
        "phylogeny.pqt",
        columns=["id", "ancestor_id", "origin_time"],
    )

Selective column deserialization is particularly advantageous with Polars
streaming operations. See the Pandas I/O docs and Polars I/O docs for full
details.

Remote and Cloud Sources
========================

DataFrame libraries transparently handle URLs and cloud storage:

.. code-block:: python

    import pandas as pd

    # From URL
    df = pd.read_csv("https://example.com/data/phylogeny.csv")

    # From S3
    df = pd.read_parquet("s3://bucket/phylogeny.pqt")

    # From Google Cloud Storage
    df = pd.read_parquet("gs://bucket/phylogeny.pqt")

.. code-block:: python

    import polars as pl

    # Polars also supports remote sources
    df = pl.read_parquet("s3://bucket/phylogeny.pqt")
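
The "explicit typing" point made for Parquet under Tabular File Formats is easiest to see by what CSV loses. The sketch below is a minimal illustration using only Pandas and an in-memory buffer in place of a file. The three-row table is written by hand rather than produced by phyloframe, and the convention that a root's ``ancestor_id`` equals its own ``id`` is an assumption made for the example. It shows a nullable integer ``branch_length`` column coming back from CSV as ``float64`` unless the dtype is restated on read.

.. code-block:: python

    import io

    import pandas as pd

    # Hand-written stand-in for a small phylogeny table (root plus two leaves).
    # Assumption: the root row points at itself via ancestor_id.
    df = pd.DataFrame(
        {
            "id": [0, 1, 2],
            "ancestor_id": [0, 0, 0],
            "taxon_label": [None, "A", "B"],
            # Nullable integer dtype; the root has no branch length.
            "branch_length": pd.array([pd.NA, 1, 2], dtype="Int64"),
        }
    )

    # Write to an in-memory CSV buffer instead of a file on disk.
    buffer = io.StringIO()
    df.to_csv(buffer, index=False)

    # Naive read: the missing value forces the column to float64.
    buffer.seek(0)
    naive = pd.read_csv(buffer)

    # Typed read: restating the dtype recovers the nullable integer column.
    buffer.seek(0)
    typed = pd.read_csv(buffer, dtype={"branch_length": "Int64"})

    print(naive["branch_length"].dtype)
    print(typed["branch_length"].dtype)

Parquet stores column types alongside the data, so this kind of dtype bookkeeping is generally only needed on the CSV path.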