Taxonomy and Orchids

[This is a work in progress and will be expanded as I gather more data...]

This is a huge topic, but I will try to make it clearer by breaking it up into more digestible bits. As often as possible, I will also use "common language" rather than "scientific terminology", since that should convey the message more easily to a wider audience. For instance, instead of "phenotypically similar" I will use "look the same", since there is no fundamental difference between these phrases.

First and foremost, this effort is meant to explain why we are NOT adopting recent changes to orchid names based on DNA analysis until we are confident in the methodology behind them. The fact that Laelia purpurata has been moved between 3 genera (Laelia -> Sophronitis -> Cattleya) in the space of a year proves that there is some fundamental problem with the data and/or processes that led to these conclusions. We are not alone in this opinion, as evidenced by the following statement by Dr. Ruben P. Sauleda (the complete text is available at this link):

"As a taxonomist with a Ph. D. in orchid taxonomy from the University of South Florida, I wish to note that I am very aware of the changes being made to orchid nomenclature based on DNA studies. Many of the changes are not new and the genera being used were actually established in the 1800s but were ignored until they conveniently fit the DNA data. Some of the changes I agree with but, some I totally disagree with. Placing Laelia tenebrosa in the genus Sophronitis demonstrates that something is seriously wrong with the data that led to this conclusion. In addition, the orchid family is a horticulturally important group and this should be taken into consideration before making drastic nomenclatural changes. It suffices to recognize that sections are distinct groups without the need to raise them to generic level. I feel that many of these changes are not necessary and just lead to confusion among orchid hobbyists. Therefore, unless they are generic names that have been in use, I do not plan to use them on my web site or on my labels at this time."

Another renowned taxonomist is Francisco Miranda of Brazil. He is intimately familiar with the Cattleya Alliance, especially within Brazil. Read his well-organized thoughts at this link. If you're in a hurry, jump to the last third of the article, which focuses on DNA analysis and his take on that aspect of modern taxonomy.

Now if at this stage you're saying "Claudio's just lazy and doesn't like change", you are sadly mistaken. I have no issues with change at all, and in most cases I welcome it. However, I DO have major issues with "change for the sake of change". Any change, and in particular any major change, should always be accompanied by strong, supportive and irrefutable data. In my humble opinion, this is not the case with most of the recently proposed changes to genera within the orchid family.

Before I proceed, I'll give you a bit of my background and explain why I feel justified in stating my perspective. In university, I studied taxonomy, botany and genetics, but specialized in mathematics (specifically statistics) and computer science. My current field is computers, primarily supercomputers such as those used in weather forecasting, space exploration, gene mapping, and a host of other scientific and medical applications. I feel that this gives me a unique skill set for commenting on current DNA analysis efforts, which require a solid understanding of statistics and sampling; statistics are all too often misapplied in the real world to support invalid conclusions. I don't pretend to be more knowledgeable than the taxonomic scientists doing the DNA analysis, but I am familiar enough with the underlying methods to comment on them.

I am compelled to ask why this recent preliminary work has led to the quick acceptance of an entire overhaul of the Cattleya Alliance. It took nearly 20 years for Encyclia to be accepted as a viable separation from Epidendrum, yet within 8 years it was widely accepted that very few species in the Cattleya Alliance were correctly grouped. In fact, more effort appears to have been spent on which naming convention to use for reassignment than on ensuring that this was the right course of action. Even more important, everyone still appears to assume that taxonomy should be accepted as an exact science, which it really is not. Entities in nature are NOT static, especially in alliances that are still in the process of speciation. Does DNA analysis make any of this more exact? Quite the contrary. DNA analysis is merely another means of trying to quantify what separates one species from the next. If the gene pool of a species is still in flux, then DNA analysis would tend to introduce more confusion into the interpretation of the data and increase the chances of arriving at wrong conclusions.

Next, I will list a few references, since I will refer to the concepts they describe in my commentary on the Cattleya Alliance analysis:

  1. "Humans share 99.4 percent of their DNA with chimpanzees, 85 percent with dogs, and 70 percent with slugs." How is this possible, you ask? See this link for some enlightenment on transposons, horizontal transfer, and how "The mapping of the human genome shows that about half of our genetic code is derived from transposable elements." (Point CAA-1)
  2. Notice that the sharing of DNA is not limited to animals: humans "share 60% of our DNA with a banana", as referenced at this link.
  3. Since the ultimate goal is to separate species into related groups, one cannot ignore introgression, which is defined here. This phenomenon has already been used (in the literature) to account for characteristics of plants within the Cattleya alliance. In a nutshell, suppose a chance hybridization occurred between species A and B, and the progeny then interbred with one of the parent groups, such as A. Over centuries, some of the genes from this hybridization event would be present in every member of species A. How do you now distinguish between species A and B? DNA analysis would lead you to surmise that these two species share a recent common ancestor, which is a false conclusion. The analysis could further lead to a separate sub-branch for these species to support their "close relationship", thus further obscuring the truth. (Point CAA-3)
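
To put rough numbers on the backcrossing step in Point CAA-3, here is a minimal Python sketch. It assumes the idealized textbook model (no selection, no linkage, no drift): an A x B hybrid carries half of species B's genome, and each further backcross to species A halves that share on average. The function name is hypothetical, and the values are expectations for a single lineage, not measurements from any orchid population.

```python
from fractions import Fraction


def expected_donor_fraction(backcrosses: int) -> Fraction:
    """Expected share of species B's genome in a lineage that starts as an
    A x B hybrid and is then backcrossed to species A `backcrosses` times.

    Idealized model (an assumption): the hybrid carries 1/2 of B's genome,
    and each backcross to A halves the remaining B share on average.
    """
    if backcrosses < 0:
        raise ValueError("backcrosses must be non-negative")
    return Fraction(1, 2) ** (backcrosses + 1)


if __name__ == "__main__":
    for n in range(6):
        share = expected_donor_fraction(n)
        print(f"after {n} backcross(es): {share} of B's genome ({float(share):.2%})")
```

The share shrinks quickly but never reaches zero, which is why small pieces of B's DNA can remain in A's lineage long after the hybridization event. How those pieces then spread through the rest of species A is a population-level question that this sketch deliberately leaves out.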

Now for a few rules for using and applying statistics:

  1. A methodology applied to data from a given sample leads to a set of conclusions. You cannot choose which conclusions you wish to document as supported by the data. It is an "all or none" proposition, since selectively excluding conclusions from the final outcome would ignore the fact that the "erroneous" data was involved in arriving at ALL of the final conclusions.
  2. If any conclusion is deemed to be in error, then the entire process needs to be repeated after correcting/addressing the error.
  3. For statistical conclusions to be meaningful, the sample must be truly representative of the population being analysed. (Point STAT-3)
  4. The lower the level of diversity in a sample, the greater the need for stringency, and the higher the risk of false outcomes. (Point STAT-4)
  5. When selecting a sample, every measure must be taken to ensure that none of the enforced criteria skew the selection. (Point STAT-5)
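
Points STAT-3 and STAT-5 can be illustrated with a toy calculation. The Python sketch below uses invented presence/absence scores for two hypothetical populations ("north" and "south") of one species; they are not data from PAL or from any orchid. It compares the species-wide frequency of a DNA variant with the estimate from a sample drawn from only one population, and with a sample spread across both. Picking the first few individuals of each list is an assumption made purely to keep the example deterministic.

```python
# Hypothetical presence/absence scores (1 = carries a given DNA variant) for
# two made-up wild populations of the same species. Not real orchid data.
POPULATIONS = {
    "north": [1, 1, 1, 1, 0, 1, 1, 1],
    "south": [0, 0, 1, 0, 0, 0, 0, 0],
}


def frequency(values):
    """Share of individuals in `values` that carry the variant."""
    return sum(values) / len(values)


def whole_species_frequency(populations):
    """What we actually want to know: the frequency across the whole species."""
    return frequency([x for values in populations.values() for x in values])


def sample_one_population(populations, name, size):
    """Skewed scheme: every sampled individual comes from a single population."""
    return populations[name][:size]


def sample_every_population(populations, per_population):
    """Spread the sample across all known populations (here simply the first
    few individuals of each, so the example is deterministic)."""
    picked = []
    for values in populations.values():
        picked.extend(values[:per_population])
    return picked


if __name__ == "__main__":
    print("whole species:", whole_species_frequency(POPULATIONS))
    print("4 from north only:", frequency(sample_one_population(POPULATIONS, "north", 4)))
    print("2 from each population:", frequency(sample_every_population(POPULATIONS, 2)))
```

With these made-up numbers, the sample of four from "north" reports the variant in every individual, while the same number of individuals split across both populations matches the species-wide value. The exact match is a property of the invented data, but the one-population sample misses badly no matter how carefully each individual is sequenced.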

The document outlining the research that led to the rearrangement of genera and species within the Cattleya alliance is "A PHYLOGENETIC ANALYSIS OF LAELIINAE (ORCHIDACEAE) BASED ON SEQUENCE DATA FROM INTERNAL TRANSCRIBED SPACERS (ITS) OF NUCLEAR RIBOSOMAL DNA" (published in Lindleyana 15(2): 96-114, 2000), which I will refer to as PAL for brevity. I will group my comments under the two headings that follow.

Issues with sampling

  1. Is Point STAT-3 above actually being satisfied? The sampling records no provenance for the source material relative to its distribution in the wild. For instance, there are two distinct wild populations of Epi. ciliare (Colombian and Caribbean), yet only 1 sample was used in the PAL analysis. This raises the question of which population was actually sampled.
  2. The habitat of Laelia/Sophronitis/Cattleya purpurata extends some 750 miles, but it is seldom more than a few miles wide along that range. This particular species was one of the "problematic" ones and kept landing in the middle of Sophronitis. That would not be surprising at all if the single DNA sample used came from one of the areas where purpurata overlaps the distribution of Sophronitis (see Points CAA-1 and CAA-3). At the very least, in cases such as this, I would expect a minimum of 3 wild samples to be used rather than one: samples from both extremes of the distribution and one from the middle. The samples may turn out to be identical for these 3 locations, but simply assuming they are, without verification, is a potentially dangerous oversight.
  3. Another "problematic" species is C. maxima, for which there were two samples taken. But there are two distinct forms of C. maxima, a highland form and a lowland form. So then, was one sample from each of these populations, or were both samples from the same population ?
  4. I am assuming that every sample came from a wild population, since samples from captive populations could be unpredictable without a full pedigree that is traceable back to wild ancestors and their collection locations. (Point STAT-3)
  5. The number of trees the software was asked to generate was 10,000. This is an artificial limit imposed by the user, but is it appropriate? If the search was partway through a pass over the data when that limit was reached, the data have already been unintentionally skewed. It would make more sense to determine how many trees result from complete passes over the data, and then choose the number of generated trees based on those completed passes, rather than impose an artificial limit from the outset. (Point STAT-5)
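
To judge whether a fixed cap such as 10,000 trees is appropriate, it helps to know how large the space of possible trees is. A standard combinatorial result is that n taxa can be arranged into (2n - 5)!! = 1 x 3 x 5 x ... x (2n - 5) distinct unrooted, fully bifurcating trees. The Python sketch below computes that count and the fraction of it that a 10,000-tree cap represents. The taxon counts are arbitrary examples, not the number of taxa in PAL, and a cap on the number of trees kept is not the same as the number of trees a search examines, so this only gives a sense of scale.

```python
CAP = 10_000  # the tree limit quoted in the text


def unrooted_binary_trees(taxa: int) -> int:
    """Number of distinct unrooted, fully bifurcating tree topologies for
    `taxa` labelled tips: (2n - 5)!! = 1 * 3 * 5 * ... * (2n - 5), n >= 3."""
    if taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    for odd in range(3, 2 * taxa - 4, 2):  # odd numbers 3 .. 2n - 5
        count *= odd
    return count


if __name__ == "__main__":
    for taxa in (5, 10, 20, 50):
        total = unrooted_binary_trees(taxa)
        share = min(CAP, total) / total
        print(f"{taxa} taxa: {total:.3e} possible trees; "
              f"a {CAP:,}-tree cap covers {share:.2e} of them")
```

Already at 10 taxa there are 2,027,025 possible unrooted topologies, and the count grows faster than exponentially as taxa are added.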

Issues with outgroup selection

The key factor in selecting an outgroup is how much it will help to filter out erroneous trees. It would be wise to track this metric (as a percentage) and have it reflect the confidence level behind the final outcome. Aside from sampling, the selection of an outgroup is probably the single most important exercise in DNA analysis. In this particular case (PAL), the outgroup chosen consisted exclusively of Old World Epidendroideae (Thunia alba, Pleione chunii, Calanthe tricarinata, Earina autumnalis, and Polystachya galeata). These species come from SE Asia, China, SE Asia, New Zealand, and W Africa, respectively.

  1. There is no indication or measure of how effective this outgroup selection was. My gut tells me it is likely less than 5%. But because Point STAT-4 applies here, we should probably want at least 50% exclusion. The key concept to embrace is that the more erroneous data slips past the filtering, the greater the likelihood of false outcomes and the lower the confidence level.
  2. IMHO, this outgroup is more of a "far outgroup", since I suspect this selection offers little relevance. Their distributions have no recent associations (in evolutionary terms) with those of the group being studied. I would even venture to propose that an outgroup of 5 Gesneriad species would lead to results similar to those achieved by the outgroup used in PAL.
  3. There are several options for more meaningful outgroups that could be used in addition to the one selected. Since the Laeliinae are a fairly young alliance in evolutionary terms, it would make more sense to have an outgroup that is clearly distinct from Laeliinae but shares the same wild habitats. Two groups that satisfy this requirement are the Catasetinae and Oncidiinae. This would at least help address the issues behind Points CAA-1 and CAA-3.
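
Point 1 above proposes tracking, as a percentage, how many candidate trees the outgroup actually helps filter out. One possible way to compute such a metric is sketched below in Python, assuming that "filtered" means a candidate tree in which the outgroup taxa do not form a single group separated from the ingroup by one branch. The nested-tuple tree encoding, the function names and the taxa A, B, C, X and Y are hypothetical and chosen for illustration; this is not a procedure described in PAL.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple, Union

# A tree is a taxon name or a tuple of subtrees (an arbitrary rooting).
Tree = Union[str, Tuple["Tree", ...]]


def clade_leaf_sets(tree: Tree, found: List[FrozenSet[str]]) -> FrozenSet[str]:
    """Record the leaf set of every subtree in `found`; return the leaves under `tree`."""
    if isinstance(tree, str):
        leaves = frozenset([tree])
    else:
        leaves = frozenset().union(*(clade_leaf_sets(child, found) for child in tree))
    found.append(leaves)
    return leaves


def outgroup_kept_together(tree: Tree, outgroup: Set[str]) -> bool:
    """True if some branch of the (unrooted) tree separates exactly the
    outgroup taxa from all of the other taxa."""
    found: List[FrozenSet[str]] = []
    all_leaves = clade_leaf_sets(tree, found)
    target = frozenset(outgroup)
    ingroup = all_leaves - target
    # Either side of a branch may appear as a subtree, depending on the rooting.
    return any(s == target or s == ingroup for s in found)


def exclusion_percentage(trees: Iterable[Tree], outgroup: Set[str]) -> float:
    """Percentage of candidate trees rejected by the outgroup check."""
    trees = list(trees)
    if not trees:
        raise ValueError("no candidate trees given")
    rejected = sum(not outgroup_kept_together(t, outgroup) for t in trees)
    return 100.0 * rejected / len(trees)


if __name__ == "__main__":
    outgroup = {"X", "Y"}  # hypothetical outgroup taxa
    candidates = [
        ((("A", "B"), "C"), ("X", "Y")),  # outgroup together -> kept
        ((("A", "X"), "B"), ("C", "Y")),  # outgroup split -> rejected
        (("A", "B"), ("C", ("X", "Y"))),  # outgroup together -> kept
        (("X", "A"), ("Y", ("B", "C"))),  # outgroup split -> rejected
    ]
    print(f"{exclusion_percentage(candidates, outgroup):.1f}% of candidate trees rejected")
```

For the four made-up candidates, two keep the outgroup together and two split it, so the metric comes out at 50%. Checking both sides of every branch makes the result independent of where each tuple happens to be rooted.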

I could keep going ad nauseam, but will stop here for now. As soon as time permits, I will take a look at the DNA analyses that have thrown the Pleurothallidinae and Oncidiinae into a state of turmoil. I am truly curious whether the same problems stated above also apply to those scientific results. In closing, I'll offer you a quote from the author of PAL:

"Further work is needed to clarify the relationships of Laeliinae both at the generic and species levels, although most of the outgroup relationships have been well resolved with ITS data alone."

The first part of the sentence is quite telling, since it represents a sort of "disclaimer" by the author on his results. So who made the decision to accept this work as gospel? What proportion of taxonomists agree with this work? Is DNA analysis really at the point where solid conclusions can be reached, or will it only lead to invalid conclusions that will be overturned in the next 10 years? The second part of the quoted sentence is merely an opinion, and doesn't mean a great deal given the issues surrounding the outgroup selection. How surprised would you be if I unequivocally stated that I have proven that none of the Cattleya Alliance is closely related to a Petunia?

To be continued ...