PRISTINE Tutorial: End-to-End Phylogenetic Inference
This tutorial introduces the PRISTINE framework through worked examples covering simulation, model construction, and parameter inference in a range of phylogenetic settings.
Sequence Evolution and Substitution Model Inference
Learn how to simulate sequences along a phylogenetic tree and recover GTR substitution parameters:
Key steps:
- Simulate DNA sequences along a tree using a known GTR model
- Build a likelihood function using the Felsenstein pruning algorithm
- Optimize the GTR parameters to recover stationary frequencies and exchange rates
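The likelihood step above can be illustrated with a minimal, standard-library sketch of a GTR rate matrix and Felsenstein pruning for a single site. It is not taken from the PRISTINE source: the function names, the nested-list tree encoding, and the Taylor-series matrix exponential are all assumptions chosen to keep the example self-contained.

```python
import math

BASES = "ACGT"

def gtr_rate_matrix(pi, rates):
    """Normalised GTR rate matrix (hypothetical helper, not the PRISTINE API).

    pi    : stationary frequencies for A, C, G, T (must sum to 1)
    rates : six exchangeabilities in the order AC, AG, AT, CG, CT, GT
    """
    pairs = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
    Q = [[0.0] * 4 for _ in range(4)]
    for (i, j), r in zip(pairs, rates):
        Q[i][j] = r * pi[j]
        Q[j][i] = r * pi[i]
    for i in range(4):
        Q[i][i] = -sum(Q[i])          # diagonal is still 0 when summed
    # Rescale so the expected substitution rate at stationarity is 1.
    scale = -sum(pi[i] * Q[i][i] for i in range(4))
    return [[q / scale for q in row] for row in Q]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transition_matrix(Q, t, squarings=10, terms=12):
    """P(t) = exp(Q t) via scaling-and-squaring of a truncated Taylor series."""
    n = len(Q)
    step = t / (2 ** squarings)
    A = [[q * step for q in row] for row in Q]
    P = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in P]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in matmul(term, A)]
        P = [[P[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        P = matmul(P, P)
    return P

def partial_likelihood(node, site, Q):
    """Pruning recursion: L[x] = P(tips below node | state x at node)."""
    if isinstance(node, str):                      # tip with an observed base
        return [1.0 if b == site[node] else 0.0 for b in BASES]
    L = [1.0] * 4
    for child, length in node:                     # internal node
        P = transition_matrix(Q, length)
        Lc = partial_likelihood(child, site, Q)
        for x in range(4):
            L[x] *= sum(P[x][y] * Lc[y] for y in range(4))
    return L

def site_log_likelihood(tree, site, pi, Q):
    L = partial_likelihood(tree, site, Q)
    return math.log(sum(pi[x] * L[x] for x in range(4)))

# Illustrative values only: ((t1:0.1, t2:0.1):0.05, t3:0.15)
pi = [0.3, 0.2, 0.2, 0.3]
rates = [1.0, 2.0, 1.0, 1.0, 2.0, 1.0]
Q = gtr_rate_matrix(pi, rates)
tree = [([("t1", 0.1), ("t2", 0.1)], 0.05), ("t3", 0.15)]
site = {"t1": "A", "t2": "A", "t3": "G"}
loglik = site_log_likelihood(tree, site, pi, Q)
```

Summing such per-site log-likelihoods over an alignment gives the objective that an optimizer would maximize with respect to `pi` and `rates`.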
Molecular Clock Estimation
Estimate divergence times and substitution rates under different clock models:
Continuous Additive Relaxed Clock (cARC)
example_01_carc.py
- Simulate distances using a relaxed molecular clock
- Fit node dates and evolutionary rate using maximum likelihood
- Compare estimated vs true node ages
Conditional Error Clock (JC69-based)
- Use a binomial model based on JC69 substitution probabilities
- Fit branch durations from simulated distances
- Suitable for shorter sequences or simpler models
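As a rough illustration of the binomial idea, the sketch below assumes the standard JC69 result that two sequences separated by time t at rate mu differ at a site with probability p = 3/4 (1 - exp(-4 mu t / 3)), and that the number of differing sites among n is binomial. The function names and numeric values are hypothetical and do not reflect the PRISTINE implementation.

```python
import math
import random

def jc69_p(mu, t):
    """Per-site probability of an observed difference under JC69."""
    return 0.75 * (1.0 - math.exp(-4.0 * mu * t / 3.0))

def binom_loglik(k, n, p):
    """Binomial log-likelihood, dropping the constant log C(n, k)."""
    return k * math.log(p) + (n - k) * math.log(1.0 - p)

def mle_duration(k, n, mu):
    """Closed-form ML duration from k differences in n sites, rate mu known."""
    p_hat = k / n
    if p_hat >= 0.75:
        return math.inf          # saturated: duration is not estimable
    return -0.75 * math.log(1.0 - 4.0 * p_hat / 3.0) / mu

# Simulate one branch (illustrative values) and recover its duration.
rng = random.Random(1)
mu, t_true, n_sites = 1e-3, 50.0, 2000
p_true = jc69_p(mu, t_true)
k = sum(rng.random() < p_true for _ in range(n_sites))
t_hat = mle_duration(k, n_sites, mu)
```

Because only the product mu * t enters `jc69_p`, the rate must be fixed or constrained elsewhere for a duration to be identifiable from a single branch.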
Joint Estimation of Phylogeny and Divergence
Recover both substitution dynamics and divergence times:
- Simulate sequences with a known GTR model
- Fit GTR parameters and internal node dates simultaneously
- Illustrates parameter entanglement and numerical optimization
Diversification Inference: Birth-Death-Sampling (BDS)
Constant-Rate BDS
example_05_bds_constant.py
- Simulate trees under a fixed birth and sampling process
- Estimate log-likelihood using analytic formulas
- Fit birth and sampling rates from tree shape
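To give a feel for rate estimation under a constant birth and sampling process, here is a deliberately simplified sketch. It assumes the whole process is observed (every birth and every sampling event, with sampling removing the lineage), in which case the log-likelihood is B log(birth) + S log(sampling) - (birth + sampling) T for B births, S samplings, and total lineage time T. This is not the reconstructed-tree likelihood that a tree-based analysis would use, and all names are hypothetical.

```python
import math
import random

def simulate_birth_sampling(birth, sampling, t_max, rng):
    """Gillespie simulation from one lineage; sampling removes the lineage.

    Returns (n_births, n_samples, total_lineage_time).
    """
    n, t = 1, 0.0
    births = samples = 0
    lineage_time = 0.0
    while n > 0:
        wait = rng.expovariate(n * (birth + sampling))
        if t + wait > t_max:                 # censor at the end of the window
            lineage_time += n * (t_max - t)
            break
        t += wait
        lineage_time += n * wait
        if rng.random() < birth / (birth + sampling):
            n += 1
            births += 1
        else:
            n -= 1
            samples += 1
    return births, samples, lineage_time

def log_likelihood(birth, sampling, births, samples, lineage_time):
    """Complete-observation log-likelihood (simplifying assumption)."""
    return (births * math.log(birth) + samples * math.log(sampling)
            - (birth + sampling) * lineage_time)

# Pool several replicates, then apply the closed-form estimates.
rng = random.Random(0)
B = S = 0
T = 0.0
for _ in range(50):
    b, s, lt = simulate_birth_sampling(1.0, 0.3, 6.0, rng)
    B, S, T = B + b, S + s, T + lt

birth_hat = B / T        # events per unit of lineage time
sampling_hat = S / T
```

With only the reconstructed tree available, the same rates have to be inferred from branching and sampling times alone, which is why an analytic tree likelihood and numerical optimization are needed in practice.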
State-Dependent BDS
example_06_bds_multistate.py
- Simulate sequences that encode hidden states
- Assign state-dependent birth rates
- Use ancestral state probabilities to compute likelihoods
Linear Trait-Dependent BDS
- Simulate sequences under a 2-state GTR model
- Let birth rate depend linearly on hidden traits
- Estimate trait effects via maximum likelihood
Optimization and Inference Tools
All examples rely on the robust Adam optimizer with backtracking:
- Gradient-based optimization
- Learning rate adaptation
- Safe fallback for numerical instability
The optimizer is defined in optimize.py.
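For readers unfamiliar with the idea, the following is a generic sketch of Adam combined with step halving. It is not the code in optimize.py: the function name `adam_backtracking`, its parameters, and the acceptance rule (accept a step only if the objective is finite and does not increase) are assumptions made for illustration.

```python
import math

def safe_eval(f, x):
    """Evaluate f, mapping numerical failures to +inf (the 'safe fallback')."""
    try:
        value = f(x)
    except (ValueError, OverflowError, ZeroDivisionError):
        return math.inf
    return value if math.isfinite(value) else math.inf

def adam_backtracking(f, grad, x0, lr=0.1, steps=500,
                      beta1=0.9, beta2=0.999, eps=1e-8, max_halvings=20):
    """Minimise f with Adam; halve the step until f no longer increases."""
    x = list(x0)
    m = [0.0] * len(x)            # first-moment (mean) estimate
    v = [0.0] * len(x)            # second-moment estimate
    fx = safe_eval(f, x)
    for t in range(1, steps + 1):
        g = grad(x)
        m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
        v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, g)]
        m_hat = [mi / (1 - beta1 ** t) for mi in m]      # bias correction
        v_hat = [vi / (1 - beta2 ** t) for vi in v]
        direction = [mh / (math.sqrt(vh) + eps)
                     for mh, vh in zip(m_hat, v_hat)]
        step = lr
        for _ in range(max_halvings):
            cand = [xi - step * di for xi, di in zip(x, direction)]
            f_cand = safe_eval(f, cand)
            if f_cand <= fx:
                x, fx = cand, f_cand
                break
            step *= 0.5
        # If no step was accepted, x stays put and the moments keep adapting.
    return x, fx

# Usage on a simple ill-conditioned quadratic with minimum at (3, -1).
def f(x):
    return (x[0] - 3.0) ** 2 + 10.0 * (x[1] + 1.0) ** 2

def grad(x):
    return [2.0 * (x[0] - 3.0), 20.0 * (x[1] + 1.0)]

x_min, f_min = adam_backtracking(f, grad, [0.0, 0.0], steps=1000)
```

The backtracking loop trades some of Adam's momentum for a monotone decrease in the objective, which is useful when a likelihood becomes undefined outside a valid parameter region.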
To assess uncertainty and parameter identifiability:
- Use Laplace approximation for posterior variance
- Use likelihood profiling for non-quadratic confidence intervals
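The difference between the two approaches can be seen on a one-parameter toy problem. The sketch below assumes an exponential-rate likelihood, a finite-difference second derivative, and a bisection root finder. None of these names come from PRISTINE, and the 3.841 cutoff is the 95% quantile of a chi-square distribution with one degree of freedom.

```python
import math

def loglik(lam, n, total):
    """Log-likelihood of rate lam for n exponential waits summing to total."""
    return n * math.log(lam) - lam * total

def second_derivative(f, x, h=1e-4):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

def bisect(g, lo, hi, iters=200):
    """Root of g on [lo, hi], assuming g changes sign on the interval."""
    g_lo = g(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

n, total = 20, 40.0                 # illustrative data summary
lam_hat = n / total                 # closed-form MLE

def f(lam):
    return loglik(lam, n, total)

# Laplace approximation: variance = -1 / (curvature at the optimum).
var_laplace = -1.0 / second_derivative(f, lam_hat)
se = math.sqrt(var_laplace)
wald_interval = (lam_hat - 1.96 * se, lam_hat + 1.96 * se)

# Profile likelihood: all lam whose deviance stays below the cutoff.
CHI2_95 = 3.841

def excess(lam):
    return 2.0 * (f(lam_hat) - f(lam)) - CHI2_95

profile_interval = (bisect(excess, 1e-6, lam_hat),
                    bisect(excess, lam_hat, 10.0 * lam_hat))
```

The Laplace interval is symmetric by construction, while the profile interval extends further above the estimate than below it, reflecting the skew of the likelihood surface.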
Next Steps
Explore the overview page for conceptual organization and links to each module's documentation.
All examples are designed to be runnable and modifiable. To explore further:
- Change the number of tips or states
- Add noise via clock dispersion
- Enable curvature diagnostics to test identifiability