Friday, September 10, 2010

open access fauna

The fauna from the site of Chogha Mish, Iran, has been published on Open Context. It was released by Levent Atici, Justin Lev-Tov, and Sarah Kansa in late August. The original analysis was done by Jane Wheeler in the 1960s. Honestly, I'm not certain I understand the purpose of Open Context. Getting the data out there, presumably, but what else has been done, or is planned, with this information? Who can use it, and for what? That information may be on the webpage; I haven't looked yet.

I'm of two minds about publishing raw faunal data on the web. My major concern is that archaeologists who are not themselves zooarchaeologists (or people who are zoologists, ecologists, etc.) may use and interpret the data naively. For example, when I compare faunal remains from archaeological sites, I routinely compensate for problems in published analyses. An inexperienced analyst may identify multiple species of gerbils, for example, but I know there is no way those species could be told apart by skeletal material alone. I would automatically re-classify those bones as just "gerbils, in general", especially if I knew the person who did the original analysis was not particularly experienced. Nobody likes to admit this, but an unfortunate number of CRM faunal reports are done by unqualified analysts. I've done a lot of CRM myself, and I'm not knocking it as a profession, but there are problems with some reports. Unfortunately, an archaeologist without specialization in zooarchaeology, who was trying to reconstruct the paleoclimate by using the habitat tolerances of those specific species of gerbils, may not realize that the data are flawed. (Quick note - I am NOT saying that the original analysis at Chogha Mish was flawed; I'm just giving an example of potential problems with online data dumps.)
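
As a rough illustration of that kind of re-classification, here is a minimal Python sketch that pools species-level counts into a single genus-level category before any comparison is made. The function name `collapse_to_genus`, the taxon labels, and all of the specimen counts are made up for the example; none of it comes from Chogha Mish or any real report, and a real dataset would need its own rules for which identifications to trust.

```python
from collections import Counter


def collapse_to_genus(nisp_by_taxon, genera_to_collapse):
    """Pool specimen counts for species in the listed genera under 'Genus sp.'.

    nisp_by_taxon: dict mapping a taxon label (e.g. "Gerbillus nanus") to a count.
    genera_to_collapse: set of genus names whose species-level IDs are not trusted.
    """
    pooled = Counter()
    for taxon, count in nisp_by_taxon.items():
        genus = taxon.split()[0]  # assumes the label starts with the genus name
        if genus in genera_to_collapse:
            pooled[f"{genus} sp."] += count
        else:
            pooled[taxon] += count
    return dict(pooled)


# Hypothetical counts for illustration only; not from any real assemblage.
reported = {
    "Gerbillus nanus": 4,
    "Gerbillus dasyurus": 7,
    "Ovis aries": 120,
}

cleaned = collapse_to_genus(reported, {"Gerbillus"})
print(cleaned)  # {'Gerbillus sp.': 11, 'Ovis aries': 120}
```

The point of doing this before the analysis, rather than after, is that any habitat or climate inference is then forced to work from "gerbils, in general" instead of from species identifications the bones cannot support.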

Another potential problem is that non-zooarchaeologists often don't understand taphonomic processes, cultural filters, and other factors that skew faunal assemblages. In a previous post, I mentioned that artiodactyl remains increased through time in Ventana Cave, Arizona, due to changes in the site function (Bayham 1982). Long ago, I had a short but interesting discussion with Paul Martin (the geologist from Arizona) about the meaning of increasing artiodactyls in the archaeological record in the American West. Dr. Martin is certainly not naive about faunal data, but he was surprised by my contention that more artiodactyls in archaeological sites do not necessarily represent more artiodactyls in the surrounding landscape. Rather, I argued, the change represented a cultural filter. This is not a concept most natural scientists have been trained to consider, yet these same natural scientists may be very interested in our data.

One final complaint about the Open Context publication of Chogha Mish: I find the interface incredibly difficult to use. I hope it is possible to download all of this data into a more functional format (like an Excel spreadsheet); otherwise it's unusable. Again, I haven't taken the time to look at the site in detail, so maybe there is an easy way to do so.


  1. [Sorry if this is here twice, I got an error when I submitted my comment, so I'm trying again]


    Thanks for the discussion about Chogha Mish in Open Context.

    I very much agree with your point about the difficulties associated with data sharing. Data sharing is a complex and theoretically challenging undertaking. However, the problem of misuse and misinterpretation is not unique to datasets. Journal papers can be, and are, misused both by novices and even by domain specialists who fail to give a paper a careful read. Despite these problems and the potential for misuse, we still publish papers because the benefits outweigh the risks. Similarly, I think we should still publish researcher datasets, because such data can improve the transparency and analytic rigor of analysis.

    One of the points of posting the Chogha Mish data was that it helped illustrate some useful points about how to go about data sharing in a better way. If you see the ICAZ Poster associated with the project, there are many recommendations regarding the need to contextualize data (including editorial oversight of data publication). Ideally, data publication should accompany print/narrative publication, since the two forms of communication can enhance each other. Most of the data in Open Context comes from projects with active publication efforts, and as these publications become available, Open Context and the publications will link back and forth.

    Regarding why we published these data, the point is to make them available, free of charge and free of copyright hindrance, for anyone to reuse. They can be used in a class to teach analytic methods (one can ask a class to interpret the kill-off patterns, or ask them to critique the data and probe its ambiguities and limits). They can be used with other datasets for some larger project. The "About Section" of Open Context explains more.

    Last, thanks for identifying an interface flaw. You helped find a bug where downloadable tables associated with projects weren't showing up. The bug is fixed, and when you look at the Chogha Mish Overview, you'll find a link to a table you can download and use in Excel or similar applications. Feedback like this is really important for us; otherwise we won't know how to improve Open Context. So this is much appreciated!

    (Open Context Lead Developer)

  2. Thanks for your comment! Sorry it wasn't posted immediately. For some reason, my spam filter caught it.

    Thanks for the links - they greatly improved my understanding of the project and its context. Thanks also for fixing the downloading feature. I like the idea of using this data in a class. I've used my own data in the past, but there's something to be said for using more than one assemblage, or an assemblage that the professor has no vested interest in.

    It is definitely true that journal articles can be misused, but those articles do have analysis in them, which we can hope people will read. Also, published data has often been "cleaned up" - for example, by taking out suspected intrusive specimens. On the one hand, that kind of data manipulation is not very transparent. On the other hand, we're using our education and experience to analyze the raw data, which is exactly what we're supposed to do. That's why they pay us the big bucks! (Well, in my case, they're pretty small bucks. I seem to have a foraging efficiency problem.)

    I may be somewhat over-sensitive to the issue, since I've personally run into difficulties both with misinterpretation of my data by ecologists and with premature sharing of raw data before I had fully published. Thanks for your efforts in this field, though. Clearly, this type of data sharing is in our future, and we need to figure out how it can be done well, while protecting everyone's interests.