A first draft of the “tree of life” for the roughly 2.3 million named species of animals, plants, fungi and microbes has been released, and two University of Michigan biologists played a key role in its creation.
A collaborative effort among 11 institutions, the tree depicts the relationships among living things as they diverged from one another over time, tracing back to the beginning of life on Earth more than 3.5 billion years ago.
The current version of the “tree of life” — along with the underlying data and source code — is available to browse and download.
Tens of thousands of smaller trees have been published over the years for select branches of the tree of life — some containing upwards of 100,000 species — but this is the first time those results have been combined into a single tree that encompasses all of life. The end result is a digital resource that is available free online for anyone to use or edit, much like a “Wikipedia” for evolutionary trees.
Understanding how the millions of species on Earth are related to one another helps scientists discover new drugs, increase crop and livestock yields, and trace the origins and spread of infectious diseases such as HIV, Ebola and influenza.
“This is the first real attempt to connect the dots and put it all together,” said principal investigator Karen Cranston of Duke University. “Think of it as Version 1.0.”
A paper summarizing the findings was published online in Proceedings of the National Academy of Sciences on Sept. 18.
Stephen Smith, an assistant professor of ecology and evolutionary biology at U-M, heads the group that tackled the nitty-gritty details of piecing together all the existing branches, stems and twigs of life’s tree into a single diagram.
Cody Hinchliff, formerly a postdoctoral researcher in Smith’s lab and now at the University of Idaho, did much of the heavy lifting on the project and shares first-author credit with Smith on the PNAS paper.
Rather than build the tree of life from scratch, the researchers pieced it together by compiling thousands of smaller chunks that had already been published online and merging them into a gigantic “supertree” that encompasses all named species.
“Many participants on the project contributed hundreds of hours tracking down and cleaning up thousands of trees from the literature, then selecting 484 of them that were used to generate the draft tree of life,” Hinchliff said.
Combining the 484 trees was a painstaking process that took three years to complete, Smith said.
Smith and Hinchliff brought computer savvy and knowledge of evolutionary biology to the project, which required them to write tens of thousands of lines of computer code and to create several new software packages.
“In addition to the process of combining existing trees, much of what was done at the University of Michigan was the development of tools and techniques and the analysis of the tree itself,” Smith said. “To complete this project, we had to code our own solutions. There was nothing out of the box that we could use.”
The aim was to create software tools and algorithms that balanced performance with efficiency when combining large numbers of trees, Hinchliff said.
“Our software, which is called ‘treemachine,’ took a few days to generate the current draft tree of life on a moderately outfitted desktop workstation in Stephen’s office,” he said. “For comparison, other state-of-the-art methods we tried would have taken hundreds of years to finish on that kind of hardware.”
Another challenge faced by the team: the vast majority of evolutionary trees are published only as PDFs or image files, formats that cannot be entered into a database or merged with other trees.
“There’s a pretty big gap between the sum of what scientists know about how living things are related, and what’s actually available digitally,” Cranston said.
As a result, the relationships depicted in some parts of the tree, such as the branches representing the pea and sunflower families, don’t always agree with expert opinion.
Other parts of the tree, particularly insects and microbes, remain elusive.
That’s because even the most popular online archive of raw genetic sequences — from which many evolutionary trees are built — contains DNA data for less than 5 percent of the tens of millions of species estimated to exist on Earth.
“As important as showing what we do know about relationships, this first tree of life is also important in revealing what we don’t know,” said co-author Douglas Soltis of the University of Florida.
To help fill in the gaps, the team is also developing software that will let researchers log on to update and revise the tree as new data come in for the millions of species still being named or discovered.
“This is just the beginning,” Smith said. “While the tree of life is interesting in its own right, our database of thousands of curated trees is an even more useful resource. We hope that this publication will encourage other researchers to contribute their own studies or to enter information from previously published sources.”
“Twenty-five years ago, people said this goal of huge trees was impossible,” Soltis said. “The Open Tree of Life is an important starting point that other investigators can now refine and improve for decades to come.”
This research was supported by a three-year, $5.76 million grant from the National Science Foundation, including $900,000 to the University of Michigan.
Other study co-authors include James Allman of Interrobang Corp.; Gordon Burleigh, Ruchi Chaudhary, Jiabin Deng and Christopher Owen of the University of Florida; Lyndon Coghill, Peter Midford and Richard Ree of the Field Museum of Natural History; Keith Crandall of George Washington University; Bryan Drew of the University of Nebraska-Kearney; Romina Gazis and David Hibbett of Clark University; Karl Gude of Michigan State University; Laura Katz and H. Dail Laughinghouse IV of Smith College; Emily Jane McTavish of the University of Kansas; Jonathan Rees of the National Evolutionary Synthesis Center; and Tiffani Williams of Texas A&M University.