Testing for treeness: lateral gene transfer, phylogenetic inference, and model selection

Biology & Philosophy

Abstract

A phylogeny that allows for lateral gene transfer (LGT) can be thought of as a strictly branching tree (all of whose branches are vertical) to which lateral branches have been added. Given that the goal of phylogenetics is to depict evolutionary history, we should look for the best supported phylogenetic network and not restrict ourselves to considering trees. However, the obvious extensions of popular tree-based methods such as maximum parsimony and maximum likelihood face a serious problem—if we judge networks by fit to data alone, networks that have lateral branches will always fit the data at least as well as any network that restricts itself to vertical branches. This is analogous to the well-studied problem of overfitting data in the curve-fitting problem. Analogous problems often have analogous solutions and we propose to treat network inference as a case of model selection and use the Akaike Information Criterion (AIC). Strictly tree-like networks are more parsimonious than those that postulate lateral as well as vertical branches. This leads to the conclusion that we should not always infer LGT events whenever it would improve our fit-to-data, but should do so only when the improved fit is larger than the penalty for adding extra lateral branches.
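The comparison described in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It scores a tree and a network that adds one lateral branch using AIC = -2 ln L + 2k, where a lower score is better. The log-likelihood values, the parameter counts, and the assumption that one lateral branch adds exactly one adjustable parameter are all hypothetical, as are the function names `aic` and `prefer_network`.

```python
def aic(log_likelihood: float, num_params: int) -> float:
    """Akaike Information Criterion: -2 ln L + 2k (lower is better)."""
    return -2.0 * log_likelihood + 2.0 * num_params


def prefer_network(tree_loglik: float, tree_params: int,
                   net_loglik: float, net_params: int) -> bool:
    """True when the network's improved fit outweighs its extra parameters."""
    return aic(net_loglik, net_params) < aic(tree_loglik, tree_params)


# Hypothetical maximized log-likelihoods and parameter counts.
TREE_LOGLIK, TREE_PARAMS = -1250.0, 10

# One added lateral branch, assumed here to cost one extra parameter.
SMALL_GAIN_LOGLIK = -1249.5   # fit improves by 0.5 log-likelihood units
LARGE_GAIN_LOGLIK = -1245.0   # fit improves by 5.0 log-likelihood units
NET_PARAMS = TREE_PARAMS + 1

small_gain_accepted = prefer_network(TREE_LOGLIK, TREE_PARAMS,
                                     SMALL_GAIN_LOGLIK, NET_PARAMS)
large_gain_accepted = prefer_network(TREE_LOGLIK, TREE_PARAMS,
                                     LARGE_GAIN_LOGLIK, NET_PARAMS)

print("small gain justifies the lateral branch:", small_gain_accepted)
print("large gain justifies the lateral branch:", large_gain_accepted)
```

Under these made-up numbers the network always fits at least as well as the tree, yet only the larger improvement is enough to offset the penalty for the added branch.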

Notes

  1. It may not be possible to recover the true network. The data might be structured so that a false lateral branch will reduce the parsimony score more than one or more of the real lateral branches. In this case, no matter what the threshold, parsimony cannot recover the true network.
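The situation in Note 1 can be sketched with a toy example. The branch names and score reductions below are hypothetical, and the sketch makes the simplifying assumption that each candidate lateral branch reduces the parsimony score by a fixed amount regardless of which other branches are present. Under that assumption, a threshold rule accepts every branch whose reduction meets the threshold, so a false branch that outperforms a real one cannot be excluded without also excluding the real one.

```python
from __future__ import annotations


def accepted_branches(score_reductions: dict[str, int], threshold: int) -> set[str]:
    """Accept each candidate lateral branch whose parsimony-score
    reduction is at least the threshold."""
    return {name for name, reduction in score_reductions.items()
            if reduction >= threshold}


# Hypothetical reductions in parsimony score for three candidate branches.
# The false branch helps more than one of the real branches.
reductions = {"real_A": 9, "real_B": 3, "false_C": 5}
true_network = {"real_A", "real_B"}

# Try every threshold from 0 up to just above the largest reduction.
recoverable = any(
    accepted_branches(reductions, t) == true_network
    for t in range(0, max(reductions.values()) + 2)
)

print("some threshold recovers the true network:", recoverable)
```

Any threshold low enough to admit `real_B` also admits `false_C`, and any threshold high enough to reject `false_C` also rejects `real_B`.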

  2. To derive the AIC, a few background assumptions are needed. For example, there are certain regularity conditions that have to hold for the likelihood function to be asymptotically normal and there has to be enough data to ensure that the likelihood function will approximate its asymptotic properties. See Forster and Sober (1994) and Burnham and Anderson (2002) for more details.
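For reference, the criterion whose derivation Note 2 describes is usually written as follows, in the form given by Burnham and Anderson (2002). Here $\hat{L}$ is the maximized likelihood of the model, $k$ is its number of adjustable parameters, and $n$ is the sample size. The second line is the small-sample correction that Burnham and Anderson recommend when $n$ is not large relative to $k$; whether it suits phylogenetic network data is not something this page addresses.

```latex
\begin{align}
  \mathrm{AIC}   &= -2 \ln \hat{L} + 2k \\
  \mathrm{AIC}_c &= \mathrm{AIC} + \frac{2k(k+1)}{n - k - 1}
\end{align}
```

As $n$ grows with $k$ fixed, the correction term tends to zero and the two scores agree.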

References

  • Akaike H (1973) Information theory and an extension of the maximum likelihood principle. In: Second international symposium on information theory. Akademiai Kiado, Budapest, pp 267–281

  • Akaike H (1974) A new look at the statistical model identification. IEEE Trans Aut Control 19:716–723

  • Appleby CA, Tjepkema JD, Trinick MJ (1983) Hemoglobin in a nonleguminous plant Parasponia: possible genetic origin and function in nitrogen fixation. Science 220:951–953

  • Burnham KP, Anderson DR (2002) Model selection and multimodel inference: a practical information-theoretic approach, 2nd edn. Springer, New York

  • Chan CX, Darling AE, Beiko RG, Ragan MA (2009) Are protein domains modules of lateral genetic transfer? PLoS One 4(2):e4524

  • Davis CC, Wurdack KJ (2004) Host-to-parasite gene transfer in flowering plants: phylogenetic evidence from Malpighiales. Science 305:676

  • Farris JS (1983) The logical basis of phylogenetic analysis. In: Platnick NI, Funk VA (eds), Advances in cladistics II. Columbia University Press, New York, pp 7–36. Reprinted in Sober E. (1994), Conceptual Issues in Evolutionary Biology, MIT Press, Cambridge, pp 333–362

  • Felsenstein J (1973) Maximum likelihood and minimum-step methods for estimating evolutionary trees from data on discrete characters. Syst Zool 22:240–249

  • Forster M, Sober E (1994) How to tell when simpler, more unified, or less ad hoc theories will provide more accurate predictions. Br J Philos Sci 45:1–35

  • Hein J (1990) Reconstructing evolution of sequences subject to recombination using parsimony. Math Biosci 98:185–200

  • Hein J (1993) A heuristic method to reconstruct the history of sequences subject to recombination. J Mol Evol 36:396–405

  • Jeffreys AJ (1982) Evolution of globin genes. In: Dover GA, Flavell RB (eds) Genome evolution. Academic Press, New York, pp 157–176

  • Jin G, Nakhleh L, Snir S, Tuller T (2006) Maximum likelihood of phylogenetic networks. Bioinformatics 22(21):2604–2611

  • Jin G, Nakhleh L, Snir S, Tuller T (2007) Inferring phylogenetic networks by the maximum parsimony criterion: a case study. Mol Biol Evol 24(1):324–337

  • Jukes TH, Cantor CR (1969) Evolution of protein molecules. In: Munro HN (ed) Mammalian protein metabolism. Academic Press, New York, pp 21–132

  • Kimura M (1980) A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J Mol Evol 16:111–120

  • Kluge AG (2005) What is the rationale for ‘Ockham’s Razor’ (a.k.a. Parsimony) in phylogenetic inference? In: Albert V (ed) Parsimony, phylogeny and genomics. Oxford University Press, Oxford, pp 15–42

  • Kubo H (1939) Über hämoprotein aus den wurzelknöllchen von leguminosen. Acta Phytochim (Tokyo) 11:195–200

  • Makarenkov V, Legendre P (2004) From a phylogenetic tree to a reticulated network. J Comput Biol 11(1):195–212

  • Moret BME, Nakhleh L, Warnow T, Linder CR, Tholse A, Padolina A, Sun J, Timme R (2004) Phylogenetic networks: Modeling, reconstructibility, and accuracy. IEEE/ACM Trans Comput Biol Bioinform 1(1):13–23

  • Nakhleh L, Sun J, Warnow T, Linder CR, Moret BME, Tholse A (2003) Towards the development of computational tools for evaluating phylogenetic network reconstruction methods. In: Proceedings of the PSB03. Kauai, Hawaii

  • Nakhleh L, Jin G, Zhao F, Mellor-Crummey J (2005) Reconstructing phylogenetic networks using maximum parsimony. In: Markstein V (ed). Proceedings of the 2005 IEEE computational systems bioinformatics conference (CSB2005); August. pp 93–102

  • Park HJ, Jin G, Nakhleh L (2010) Bootstrap-based support of HGT inferred by maximum parsimony. BMC Evol Biol (forthcoming)

  • Posada D, Buckley TR (2004) Model selection and model averaging in phylogenetics: advantages of the AIC and Bayesian approaches over likelihood ratio tests. Syst Biol 53:793–808

  • Posada D, Crandall KA (1998) Modeltest: testing the model of DNA substitution. Bioinformatics 14(9):817–818

  • Sober E (2004) The contest between likelihood and parsimony. Syst Biol 53:6–16

  • Sober E (2008) Evidence and evolution—the logic behind the science. Cambridge University Press, Cambridge

  • Than C, Ruths D, Nakhleh L (2008) PhyloNet: a software package for analyzing and reconstructing reticulate evolutionary relationships. BMC Bioinformatics 9:322

  • Tuffley C, Steel M (1997) Links between maximum-likelihood and maximum parsimony under a simple model of site substitution. Bull Math Biol 59:581–607

  • Vinogradov SN, Hoogewijs D, Bailly X, Arredondo-Peter R, Gough J, Dewilde S, Moens L, Vanfleteren JR (2006) A phylogenomic profile of globins. BMC Evol Biol 6:31–47

  • Wiley E (1981) Phylogenetics: the theory and practice of phylogenetic systematics. Wiley-Interscience, NY

Acknowledgments

We thank David Baum, Rob Beiko, Matt Haber, Ehud Lamm, Bret Larget, Luay Nakhleh, Mike Steel, and an anonymous referee for helpful discussion. This paper was first presented at the workshop, Perspectives on the Tree of Life, sponsored by the Leverhulme Trust and held in Halifax, Nova Scotia, July, 2009.

Author information

Correspondence to Joel D. Velasco.

About this article

Cite this article

Velasco, J.D., Sober, E. Testing for treeness: lateral gene transfer, phylogenetic inference, and model selection. Biol Philos 25, 675–687 (2010). https://doi.org/10.1007/s10539-010-9222-6

  • DOI: https://doi.org/10.1007/s10539-010-9222-6
