Journal archives for September 2021

September 1, 2021

An examination of Research Grade Observations in the genus Callirhoe

Introduction

Callirhoe is a genus of showy flowering plants in the family Malvaceae, native to North America, with mostly reddish and sometimes white flowers. The most recent study of the genus is by Dorr, A Revision of the North American Genus Callirhoe (Malvaceae), published in 1990 and revised for the Flora of North America (FNA) in 2015.

In the spring of 2021, I began informally examining different types of observations in the Callirhoe dataset for various reasons: sometimes to annotate lesser-known species, sometimes to verify identifications, and sometimes to add identifications. The occurrence of some observations of Callirhoe leiocarpa in the Dallas area (outside its known range) in 2020 had sparked my interest in the genus, and I returned to it in 2021. Over the course of a couple of months I noticed a non-trivial number of incorrect identifications among Research Grade (RG) observations. At some point, my focus switched to looking for incorrect identifications, and I began looking through Callirhoe observations in Texas chronologically, spot-checking images in the grid view of the iNaturalist explore window, but I did not track how many were incorrect. After looking through numerous observations and getting partway through 2018, I decided that it might be interesting to know what sort of error rate was occurring. Given the relatively large number of Callirhoe observations (over 10,000), I initially planned to examine a small subset, though I eventually examined all RG observations from Texas for 2018 and 2019, including some which were likely RG before I made an identification. I adjusted the set under examination to exclude RG observations where an identification I had made affected the status and to include Needs ID observations where my identification had changed the status from RG, for an adjusted total of 1521. The error rate for the two years combined was about 19.1%.

Species

There are nine species in the genus, of which the following six are found in Texas: C. alcaeoides, C. involucrata, C. leiocarpa, C. papaver, C. pedata, and C. scabriscula, the last of which has not yet been observed on iNaturalist due to its rarity. C. involucrata is the most common and widespread Callirhoe species in North America. It ranges throughout most of Texas, encompassing the ranges of all the other species in Texas except for parts of the range of C. papaver in east Texas. Two additional species, C. bushii and C. digitata, have historically been found close to or along the Texas-Oklahoma border, but are not thought to occur in Texas (Kartesz 2015).

Selection of observations

While I did examine all RG observations from Texas from 2018 (594) and 2019 (937), I was primarily interested in observations which were Research Grade without my interaction. This required some manipulation of the dataset: removing RG observations where my identification was critical to the observation achieving RG status, and adding observations where my identification had changed the status from RG to Needs ID. RG observations where I had simply agreed with earlier identifications that had already made the observation RG were left in the totals, since my identification had no effect on achieving RG status.
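The adjustment rule described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the procedure actually used (the adjustment was done by hand); the `Observation` record, its field names, and the `my_id_effect` labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    obs_id: int
    current_grade: str  # "research" or "needs_id" (assumed labels)
    my_id_effect: str   # hypothetical: "none", "made_rg", or "removed_rg"


def in_adjusted_set(obs: Observation) -> bool:
    """Keep observations that were Research Grade without my interaction."""
    if obs.current_grade == "research":
        # Drop observations that are RG only because of my identification.
        return obs.my_id_effect != "made_rg"
    # Add back Needs ID observations that were RG until my identification.
    return obs.my_id_effect == "removed_rg"


# Invented sample records, one per case.
sample = [
    Observation(1, "research", "none"),        # RG on its own (or I merely agreed): keep
    Observation(2, "research", "made_rg"),     # my ID was critical to RG: remove
    Observation(3, "needs_id", "removed_rg"),  # my ID moved it out of RG: add
    Observation(4, "needs_id", "none"),        # was not RG before my ID: not included
]

adjusted_ids = [o.obs_id for o in sample if in_adjusted_set(o)]
print(adjusted_ids)
```

The case where I simply agreed with an existing RG identification is treated the same as having no effect, which matches the reasoning for leaving those observations in the totals.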

Limiting the observations to 2018 and 2019 in Texas was mostly a matter of convenience. I started with 2019, which was attractive because it was a reasonably large dataset and also the year for which I had made the lowest percentage of identifications. I initially intended to evaluate only a small subset of the 2019 observations in order to get a rough figure for the error rate, but I eventually examined all of them and added observations from 2018 for comparison. Limiting the observations to Texas was based mostly on my own familiarity with Texas species and comfort with identifying them. That said, most Callirhoe observations are currently from Texas, and the state contains six of the nine known species, five of which have observations on iNaturalist.

Observation numbers were downloaded as CSV files from iNaturalist by filtering by the genus Callirhoe in Texas for the years 2019 and 2018, accessed on June 3, 2021 and July 9, 2021, respectively. Observations were viewed through the iNaturalist web interface. I examined the observations in random order, first from 2019 and then from 2018, but made no attempt to mask other data such as the observer's or identifiers' names, location, and so on.
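As a rough sketch of how a random review order could be produced from an exported CSV, the Python below shuffles the rows with a fixed seed so the order can be reproduced. The column names and sample rows are placeholders rather than data from the actual export, and the seed value is arbitrary.

```python
import csv
import io
import random

# Stand-in for an exported file; column names and rows are invented placeholders.
SAMPLE_CSV = """id,observed_on,quality_grade,scientific_name
101,2019-04-02,research,Callirhoe involucrata
102,2019-04-15,research,Callirhoe pedata
103,2019-05-01,research,Callirhoe involucrata
104,2019-05-20,research,Callirhoe leiocarpa
"""


def review_order(csv_text: str, seed: int) -> list[str]:
    """Return observation ids in a shuffled but reproducible order."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rng = random.Random(seed)  # a fixed seed makes the order repeatable
    rng.shuffle(rows)
    return [row["id"] for row in rows]


order = review_order(SAMPLE_CSV, seed=2021)
print(order)
```

With a real export, `io.StringIO(csv_text)` would be replaced by an opened file. Shuffling the order does not blind the review, since observer and identifier names remain visible in the web interface.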

Standard for correctness

I decided to track whether an identification was incorrect rather than correct. The vast majority of observations, even Research Grade ones, do not show the features one would normally use for identification in a taxonomic key, and so it is often not possible to prove what species a specimen is. However, one can sometimes see a feature which disproves the identification. Most often this was something like the presence or absence of an involucel, but a number of features can be used. For species descriptions and identifying characteristics, I referred to the works of Dorr (1990, 2015) and Diggs et al. (1999). The reader can refer to my previous post "A Short Guide to Callirhoe in Texas" for methods of identification. On rare occasions I would defer primarily to location, typically only in cases where an observation was identified as C. alcaeoides well outside of its known range.

Given the approach of tracking only incorrect identifications, the results here likely represent a lower bound for the error rate for this data set as there are numerous Research Grade observations which likely don't contain enough information to actually prove or disprove a species level identification. Additionally, even though it is sometimes possible to tell if an identification is incorrect, it is not always possible to make a species level identification in these cases. A common example would be an observation identified as C. involucrata, which upon examination shows a lack of an involucel or the presence of a valvate bud. This would disprove the identification of C. involucrata, but may not offer enough proof to distinguish between C. leiocarpa and C. pedata.

There is a larger discussion to be had about what should constitute enough information for an identification, and this discussion unfortunately can devolve into a broader discussion about the purpose and structure of iNaturalist itself, which is beyond the scope of this post. As mentioned, most Callirhoe observations do not show the morphological features one would normally use for identification. To be rigorous, one would likely need to identify those observations only to genus level, and this would remove a significant percentage of Callirhoe observations from RG status. For the time being, for this endeavor at least, I have chosen to track and offer identifications on only those which are fairly clearly incorrect (or, in the rare case, correct) and leave the rest alone. This unfortunately leaves a fair number of observations at RG status which may or may not be correct, which is not a very satisfying solution given that RG status has implications such as affecting the iNaturalist Computer Vision model and observations being exported to external services such as GBIF. Ironically, for Callirhoe, due to the sheer abundance of C. involucrata compared to the other species, there is probably a high likelihood that something identified as C. involucrata actually is C. involucrata, even if it does not show the features one would need for a positive identification.

Results

The error rate for the adjusted dataset for the two years combined was 19.1%. There was a noticeable difference between the two years, at 15.6% for 2018 and 21.4% for 2019.

Incorrect Research Grade Identifications
Year | RG Observations | Adjusted RG Observations | Incorrect | Percent Incorrect
2018 | 594 | 617 | 96 | 15.6
2019 | 937 | 904 | 194 | 21.4
2018+2019 | 1531 | 1521 | 290 | 19.1
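The percentages in the table can be recomputed from the counts of incorrect identifications and the adjusted totals. The short Python sketch below does so; the variable names are my own. Note that the combined figure is a pooled rate (290 of 1521), not the average of the two yearly rates, and that 194 of 904 works out to about 21.46%, which appears in the table as 21.4.

```python
# Counts taken from the table above.
adjusted = {"2018": 617, "2019": 904}
incorrect = {"2018": 96, "2019": 194}


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage."""
    return 100 * part / whole


# Per-year error rates.
rates = {year: percent(incorrect[year], adjusted[year]) for year in adjusted}

# Pooled rate: all incorrect identifications over all adjusted observations.
combined = percent(sum(incorrect.values()), sum(adjusted.values()))

for year, rate in rates.items():
    print(f"{year}: {rate:.2f}%")
print(f"combined: {combined:.2f}%")
```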

The identifications which were not deemed incorrect (approximately 80.9%, or 1231) were not necessarily correct. They ranged from merely possible, as in cases where only the non-distinguishing features of a flower were shown, all the way to correct, where all of the distinguishing features required for an identification were clearly shown, though the latter appeared to be only a small minority. With most of these observations I found it difficult to decide how to separate those which should be identified only to genus from those which could be identified to species based on what morphological features were shown, and I ultimately could not come up with a good solution for the problem. As an initial assessment, I arrived at the following rough categorization of the non-incorrect identifications, but I found even my own application of it to be inconsistent, and so this is a rough estimate only, possibly to be revisited in the future.

Categories of Possibly Correct Identifications
Category | Number | Percent | Description
Plausible to Likely | 647 | 42.5 | Observation showed one or more features supporting the identification which ruled out some but not all other possibilities found in or near the state. Examples could range from partially visible hairs on the sepals to presence of an involucel.
Possible | 372 | 24.5 | Observation showed no or only very weakly distinguishing features. Examples were those that showed only the corolla petals or maybe corolla petals appearing close to the ground.
Correct | 141 | 9.3 | Observation showed enough features to rule out other possibilities found in the state.
Uncertain | 50 | 3.3 | Observation had some quality, sometimes unclear, which warranted raising the identification to genus level, though which did not make it clearly incorrect.
Rosette | 17 | 1.1 | Observation showed only leaves.

Trends

A closer examination of the results revealed at least two trends. First, the majority of erroneous identifications were identified as C. involucrata. This trend is unsurprising. C. involucrata is the most common and widespread species of Callirhoe in North America, and so it is not surprising that observers and identifiers might erroneously identify a specimen as C. involucrata. Also, C. involucrata would most likely have been included in the earliest Computer Vision models due to the number of observations, and it and C. pedata may have been the only two Callirhoe species suggested by the iNaturalist system.

The second trend is that the species most commonly misidentified was C. leiocarpa. It is difficult to determine the ultimate cause of why C. leiocarpa was misidentified so often, but at least two reasons seem possible. First, though I have not examined the data in detail, the range of C. leiocarpa appears to be expanding, and numerous observations of it occurred around major metropolitan areas like Dallas, Fort Worth, and Houston, all areas where it was thought not to occur previously (Dorr 1990). It seems plausible that naturalists in those areas were not familiar with C. leiocarpa and were thus identifying it as other species such as C. involucrata and C. pedata. Second, it is likely that C. leiocarpa was not included in the earliest versions of the Computer Vision model on iNaturalist, so it would not have appeared as a suggestion by the iNaturalist system, and observers and identifiers may not have even known it was a possibility. More research might reveal the answer.

Most commonly assigned species in incorrect identifications
Species | Times used as incorrect ID
C. involucrata | 247
C. pedata | 35
C. alcaeoides | 6
C. leiocarpa | 2
C. papaver | 0
Total | 290

Most Common Misidentifications where species could be identified
Actual Species | Identified as | Occurrences
C. leiocarpa | C. involucrata | 95
C. leiocarpa | C. pedata | 18
C. pedata | C. involucrata | 18
C. papaver | C. involucrata | 10
C. involucrata | C. alcaeoides | 3
C. alcaeoides | C. pedata | 2
C. involucrata | C. pedata | 2
C. pedata | C. leiocarpa | 2
C. papaver | C. pedata | 1
Total | 151
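To make the second trend concrete, the pairs in the table above can be tallied by actual species and by assigned species. The Python below is a small sketch of that tally using the counts as listed; the variable names are my own.

```python
from collections import Counter

# (actual species, identified as, occurrences), copied from the table above.
misidentifications = [
    ("C. leiocarpa", "C. involucrata", 95),
    ("C. leiocarpa", "C. pedata", 18),
    ("C. pedata", "C. involucrata", 18),
    ("C. papaver", "C. involucrata", 10),
    ("C. involucrata", "C. alcaeoides", 3),
    ("C. alcaeoides", "C. pedata", 2),
    ("C. involucrata", "C. pedata", 2),
    ("C. pedata", "C. leiocarpa", 2),
    ("C. papaver", "C. pedata", 1),
]

by_actual = Counter()    # how often each species was the one misidentified
by_assigned = Counter()  # how often each species was the wrong name applied
for actual, assigned, count in misidentifications:
    by_actual[actual] += count
    by_assigned[assigned] += count

total = sum(by_actual.values())
for species, count in by_actual.most_common():
    print(f"{species}: {count} ({100 * count / total:.1f}%)")
```

By this tally, C. leiocarpa accounts for 113 of the 151 misidentifications that could be resolved to species, roughly three quarters, and C. involucrata was the name applied in 123 of them.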

As stated above, in many cases there is enough information to disprove the identification but not enough to prove which other species the specimen is. The most common case of this occurring is when a specimen is identified as C. involucrata but it has a valvate bud or lacks an involucel. I suspect the vast majority of these cases are C. leiocarpa, with a minority of them being C. pedata and a few being C. papaver, as is the case with those which can be identified to species, but I have not attempted a rigorous analysis of this set of observations.

Most commonly assigned species in incorrect identifications where species could not be identified
Species | Times used as incorrect ID
C. involucrata | 124
C. pedata | 11
C. alcaeoides | 3
C. leiocarpa | 1
C. papaver | 0
Total | 139

Discussion

There does appear to be a significant error rate for the data examined, and it seems plausible that a similar error rate would have continued without some intervention. The difference between the two years could be attributable to any number of factors, such as differences in the dataset itself and differences in the iNaturalist observer and identifier community. A significant portion of these errors seems to be attributable to unfamiliarity with C. leiocarpa, and perhaps this could at least partially be rectified by simply correcting the erroneous identifications, which I plan to do. There could, however, be a significant number of erroneous identifications in the data for other years, which I have not examined rigorously. I have also created a guide, "A short Guide to Callirhoe in Texas", as a starting point to help the iNaturalist community better identify Callirhoe specimens in the future. Hopefully, some difference will be made by identifications that have already been corrected. Additionally, updates to the iNaturalist Computer Vision system may already be helping with the issue. Four of the six Texas species (including C. leiocarpa) are now included in the iNaturalist Computer Vision model, though one somewhat common species, C. papaver, is still excluded. It will likely be included in future models, as there are almost enough observations to meet the previous criteria for inclusion.

The question of what to do with the large number of Research Grade observations which do not appear to have enough evidence to support a species level identification remains unanswered. There may be incorrect identifications among them which might raise the error rate, but I suspect the majority of them probably are in fact C. involucrata. However, it would be nice to have some middle ground between Needs ID and Research Grade to indicate that they may be likely to be C. involucrata but that there is not enough proof to fully support that claim. This is an issue with iNaturalist itself though and there does not appear to be any solution on the horizon as far as I can tell.

References

Diggs, G. M., Lipscomb, B. L., O'Kennon, B., Mahler, W. F., & Shinners, L. H. (1999). Shinners & Mahler's Illustrated Flora of North Central Texas. Botanical Research Institute of Texas.
Dorr, L. J. 1990. A Revision of the North American genus Callirhoe (Malvaceae). Mem. New York Bot. Gard. 56: 1–75.
Dorr, L. J. 2015. Callirhoe. In: Flora of North America Editorial Committee, eds. 1993+. Flora of North America North of Mexico. 20+ vols. New York and Oxford. Vol. 6. http://www.efloras.org/florataxon.aspx?flora_id=1&taxon_id=105128
Enquist, Marshall. 1987. Wildflowers of the Texas Hill Country. Lone Star Botanical, Austin, Texas.
iNaturalist. Available from https://www.inaturalist.org. Accessed [2021].
Kartesz, J.T., The Biota of North America Program (BONAP). 2015. North American Plant Atlas. (http://bonap.net/napa). Chapel Hill, N.C. [maps generated from Kartesz, J.T. 2015. Floristic Synthesis of North America, Version 1.0. Biota of North America Program (BONAP). (in press)].
McDaniel, R.T. “A short Guide to Callirhoe in Texas.” iNaturalist, https://www.inaturalist.org/journal/rymcdaniel/54356-a-short-guide-to-callirhoe-in-texas. Accessed August 2021.

Revisions

1.0 - September 1, 2021 - Original revision.

Posted on September 1, 2021 10:40 PM by rymcdaniel | 5 comments