Skip to content

PRISTINE Framework Overview

PRISTINE is a flexible and differentiable inference engine for phylogenetic and evolutionary modeling. It combines maximum likelihood estimation with robust uncertainty quantification and supports a wide range of biological and statistical analyses.

This overview outlines the types of questions and tasks you can perform with PRISTINE, along with links to the technical documentation for each component.


Key Questions You Can Address

  • How have sequences evolved over a phylogenetic tree?
  • Use the Felsenstein Pruning Algorithm to compute the likelihood of observed sequences given a substitution model and tree.

  • What substitution process best fits my data?

  • Use the GTR model to estimate transition rates between nucleotides or amino acids.

  • When did evolutionary divergences occur?

  • Fit branch times using molecular clock models, including both strict and relaxed clocks.

  • How do traits or latent states influence speciation and sampling rates?

  • Fit state-dependent diversification processes using birth-death-sampling models, including trait-based linear models.

  • How confident am I in the inferred parameters?

  • Estimate parameter uncertainty with Laplace approximation or robust likelihood profiling.

  • Can I visualize non-identifiability or parameter sloppiness?

  • Use curvature diagnostics in Laplace estimation to assess identifiability.

  • How do I fit models robustly in complex loss landscapes?

  • Use gradient-based optimization with backtracking to ensure convergence even when gradients are unstable.

Modules and Their Roles

Component Functionality
Felsenstein Algorithm Efficient likelihood computation over a phylogenetic tree
GTR Model Generalized time-reversible substitution process
Molecular Clock Models Models that link substitutions to branch time
Birth-Death-Sampling Models Diversification modeling over time and states
Laplace Estimation Gaussian approximation of posterior uncertainty
Likelihood Profiling Profile-based confidence intervals
Optimization Stable training via adaptive learning and backtracking

Inference Modes Supported

  • Maximum likelihood estimation
  • Posterior variance approximation
  • Trait-dependent diversification
  • Empirical vs simulated likelihood comparison
  • Parallel profiling and batched optimization

Ideal Use Cases

  • Phylogenetic inference from DNA, RNA, or protein sequences
  • Molecular dating of trees with uncertain divergence times
  • Parameter sensitivity analysis and identifiability diagnostics
  • Fitting state-dependent speciation models
  • Simulation of evolutionary processes for benchmarking

For in-depth technical details, see the linked module documentation.