DESCRIPTION:
-----------
ASTRAL is a tool for estimating an unrooted species tree given a set of unrooted gene trees.
ASTRAL is statistically consistent under the multi-species coalescent model (and thus is useful for handling incomplete lineage sorting, i.e., ILS).
ASTRAL finds the species tree that has the maximum number of shared induced quartet trees with the set of gene trees, subject to the constraint that the set of bipartitions in the species tree comes from a predefined set of bipartitions. This predefined set is empirically decided by ASTRAL (but see tutorial on how to expand it). The current code corresponds to **ASTRAL-III** (see below for the publication).
The algorithm was originally designed by Tandy Warnow and Siavash Mirarab. ASTRAL-III incorporates many ideas by Chao Zhang and Maryam Rabiee.
[Code developers](https://github.com/smirarab/ASTRAL/graphs/contributors) are mainly Siavash Mirarab, Chao Zhang, Maryam Rabiee, and Erfan Sayyari.
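
To make the quartet-based objective described above concrete, here is a minimal brute-force sketch in Python. It is not ASTRAL's algorithm (ASTRAL uses dynamic programming over a constrained set of bipartitions) and it only scales to toy inputs. The nested-tuple tree encoding and the function names `clades`, `quartet_topologies`, and `quartet_score` are assumptions made for this illustration and are not part of the ASTRAL code base.

```python
from itertools import combinations


def clades(tree):
    """Return (leaf set, list of clade leaf sets) for a nested-tuple tree.

    A leaf is a string; an internal node is a tuple of subtrees.
    (Hypothetical encoding chosen for this sketch, not ASTRAL's input format.)
    """
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves = frozenset()
    found = []
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found.extend(child_clades)
    found.append(leaves)
    return leaves, found


def quartet_topologies(tree):
    """Map each resolved 4-leaf subset to its induced split, e.g. {A,B} | {C,D}."""
    leaves, clade_list = clades(tree)
    result = {}
    for quartet in combinations(sorted(leaves), 4):
        q = frozenset(quartet)
        for clade in clade_list:
            inside = q & clade
            # A clade holding exactly two of the four leaves resolves the quartet.
            if len(inside) == 2:
                result[q] = frozenset([inside, q - inside])
                break
    return result


def quartet_score(species_tree, gene_trees):
    """Count induced quartet trees shared between the species tree and gene trees."""
    species_q = quartet_topologies(species_tree)
    score = 0
    for gene_tree in gene_trees:
        gene_q = quartet_topologies(gene_tree)
        score += sum(1 for q, topo in gene_q.items() if species_q.get(q) == topo)
    return score


# Toy example: one gene tree agrees with the candidate species tree, one conflicts.
species = (("A", "B"), "C", ("D", "E"))
gene_trees = [
    (("A", "B"), "C", ("D", "E")),
    (("A", "C"), "B", ("D", "E")),
]
print(quartet_score(species, gene_trees))  # 8
```

In this toy example the first gene tree shares all 5 of its quartets with the candidate species tree and the second shares 3, giving a score of 8. Conceptually, ASTRAL searches for the species tree that maximizes this kind of score, but without enumerating quartets explicitly.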
### Bug Reports:
Contact ``astral-users@googlegroups.com`` or post on the [ASTRAL issues page](https://github.com/smirarab/ASTRAL/issues).
### Other branches
**NOTE**:
Several new features of ASTRAL are not merged in this branch and are available in other branches or git pages.
Please use those branches if you find these features useful.
* **ASTRAL-Pro (Astral for paralogs)**: This new tool, which builds on ASTRAL, can handle multiple copy genes. You can find it here: https://github.com/chaoszhang/A-pro
* **ASTRAL-MP (Multi-threaded ASTRAL)**: A multi-threaded version of ASTRAL is available on [this branch](https://github.com/smirarab/ASTRAL/tree/MP)
* Note: the link https://github.com/smirarab/ASTRAL/tree/MP-similarity should be replaced with https://github.com/smirarab/ASTRAL/tree/M.
* **Astral with user constraints**: A version of ASTRAL that can satisfy user constraints is available [here](https://github.com/maryamrabiee/Constrained-search)
* **Tree updates**: An ASTRAL-based algorithm called INSTRAL, which enables inserting new species onto an existing ASTRAL tree, is available [here](https://github.com/maryamrabiee/INSTRAL)
## Publications
#### Papers on the current version:
- Since version 5.1.1, the code corresponds to **ASTRAL-III**, described in:
* Zhang, Chao, Maryam Rabiee, Erfan Sayyari, and Siavash Mirarab. 2018. “ASTRAL-III: Polynomial Time Species Tree Reconstruction from Partially Resolved Gene Trees.” BMC Bioinformatics 19 (S6): 153. [doi:10.1186/s12859-018-2129-y](https://doi.org/10.1186/s12859-018-2129-y).
- For **multi-individual** datasets, the relevant paper to cite is:
* Rabiee, Maryam, Erfan Sayyari, and Siavash Mirarab. 2019. “Multi-Allele Species Reconstruction Using ASTRAL.” Molecular Phylogenetics and Evolution 130 (January): 286–96. [doi:10.1016/j.ympev.2018.10.033](https://doi.org/10.1016/j.ympev.2018.10.033).
- Since version 4.10.0, ASTRAL can also compute branch length (in coalescent units) and a measure of support called **local posterior probability**, described here:
* Sayyari, Erfan, and Siavash Mirarab. 2016. “Fast Coalescent-Based Computation of Local Branch Support from Quartet Frequencies.” Molecular Biology and Evolution 33 (7): 1654–68. [doi:10.1093/molbev/msw079](http://mbe.oxfordjournals.org/content/early/2016/05/12/molbev.msw079.short?rss=1)
- ASTRAL can also perform a **polytomy test** (`-t 10` option):
* Sayyari, Erfan, and Siavash Mirarab. 2018. “Testing for Polytomies in Phylogenetic Species Trees Using Quartet Frequencies.” Genes 9 (3): 132. [doi:10.3390/genes9030132](http://www.mdpi.com/2073-4425/9/3/132).
- For practical tips on using ASTRAL see [this preprint book chapter](https://arxiv.org/pdf/1904.03826.pdf).
#### Papers on older versions:
- The original algorithm (ASTRAL-I) is described in:
- Mirarab, Siavash, Rezwana Reaz, Md. Shamsuzzoha Bayzid, Théo Zimmermann, M. S. Swenson, and Tandy Warnow. 2014. “ASTRAL: Genome-Scale Coalescent-Based Species Tree Estimation.” Bioinformatics 30 (17): i541–48. [doi:10.1093/bioinformatics/btu462](https://doi.org/10.1093/bioinformatics/btu462).
- All the versions between 4.7.4 and 5.1.0 correspond to ASTRAL-II, described in:
* Mirarab, Siavash, and Tandy Warnow. 2015. “ASTRAL-II: Coalescent-Based Species Tree Estimation with Many Hundreds of Taxa and Thousands of Genes.” Bioinformatics 31 (12): i44–52. [doi:10.1093/bioinformatics/btv234](http://bioinformatics.oxfordjournals.org/content/31/12/i44)
#### Papers with relevance to ASTRAL:
These papers do not describe features in ASTRAL, but they are relevant and we encourage you to read them:
- **ASTRAL-Pro**: This paper extends the ASTRAL methodology to multiple copy genes.
- Zhang, Chao, Celine Scornavacca, Erin K Molloy, and Siavash Mirarab. “ASTRAL-Pro: Quartet-Based Species-Tree Inference despite Paralogy.” Edited by Jeffrey Thorne. Molecular Biology and Evolution, September 4, 2020, msaa139. https://doi.org/10.1093/molbev/msaa139.
- **ASTRAL-constrained**: This paper shows how to impose user-defined constraints on ASTRAL.
- Rabiee, Maryam, and Siavash Mirarab. “Forcing External Constraints on Tree Inference Using ASTRAL.” BMC Genomics 21, no. S2 (April 16, 2020): 218. https://doi.org/10.1186/s12864-020-6607-z.
- **DiscoVista**: This paper shows how quartet scores (more broadly, genome discordance) can be visualized in interpretable ways. The visualization of quartet scores, in particular, is closely tied to the ASTRAL method.
- Sayyari, Erfan, James B. Whitfield, and Siavash Mirarab. 2018. “DiscoVista: Interpretable Visualizations of Gene Tree Discordance.” Molecular Phylogenetics and Evolution 122 (May): 110–15. [doi:10.1016/j.ympev.2018.01.019](https://doi.org/10.1016/j.ympev.2018.01.019).
- **Fragmentary data**: The following paper made the case that, before inferring gene trees, fragmentary data (e.g., sequences that have uncharacteristically large numbers of gaps) should be removed. It also showed that RAxML gene trees are preferable to FastTree trees.
- Sayyari, Erfan, James B Whitfield, and Siavash Mirarab. 2017. “Fragmentary Gene Sequences Negatively Impact Gene Tree and Species Tree Reconstruction.” Molecular Biology and Evolution 34 (12): 3279–91. [doi:10.1093/molbev/msx261](https://doi.org/10.1093/molbev/msx261).
- **Missing data**: The following paper showed that excluding genes because they have missing data is often detrimental to accuracy.
- Molloy, Erin K., and Tandy Warnow. 2018. “To Include or Not to Include: The Impact of Gene Filtering on Species Tree Estimation Methods.” Systematic Biology 67 (2): 285–303. [doi:10.1093/sysbio/syx077](https://doi.org/10.1093/sysbio/syx077).
- **TreeShrink**: This paper introduced a method for removing very long branches from gene trees in a statistically motivated way. These branches make gene trees less accurate.
- Mai, Uyen, and Siavash Mirarab. 2018. “TreeShrink: Fast and Accurate Detection of Outlier Long Branches in Collections of Phylogenetic Trees.” BMC Genomics 19 (S5): 272. [doi:10.1186/s12864-018-4620-2](https://doi.org/10.1186/s12864-018-4620-2).
- **Sample Complexity**: This paper established the theoretical sample complexity (i.e., number of required genes) for ASTRAL.
- Shekhar, Shubhanshu, Sebastien Roch, and Siavash Mirarab. 2018. “Species Tree Estimation Using ASTRAL: How Many Genes Are Enough?” IEEE/ACM Transactions on Computational Biology and Bioinformatics 15 (5): 1738–47. [doi:10.1109/TCBB.2017.2757930](https://doi.org/10.1109/TCBB.2017.2757930).
- **INSTRAL**: This paper introduces an ASTRAL-based algorithm for adding new species onto an existing species tree; that is, the phylogenetic placement problem applied to species trees.
- Rabiee, Maryam, and Siavash Mirarab. 2018. “INSTRAL: Discordance-Aware Phylogenetic Placement Using Quartet Scores.” BioRxiv 432906. [doi:10.1101/432906](https://doi.org/10.1101/432906).
- **BestML:** This paper was published before AST