Specimen Data Best Practices

Table of contents

Overview
Recommendations
- Specimen Data (DD1, DD2)
Authors
Contributors
References
Citation
References
Licence
Document Control

Overview

Specimen data should provide information about why data fields have been left incomplete, and should tell the user if Optical Character Recognition (OCR) was applied. More information on why this is important can be found in the Manual Transcription Guidance.

Recommendations

Specimen Data (DD1, DD2)

Level: Advanced

Use Case: As a researcher, I want to know whether data is reliable and complete so that I can decide whether to include it in my research.

Recommendation:

DD1: When data is extracted from the digitisation platform to the Collection Management System (CMS), make sure information is available about why a data field is missing: (1) the field was marked empty/missing by the digitisation operator, or (2) the field was not databased at all by the operator.

DD2: If Optical Character Recognition (OCR) is applied during the Extract, Transform, Load (ETL) process, the CMS should support marking the data field as "automatically filled", and the ETL process should make sure this marking is set.
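
A minimal sketch of how an ETL step might carry both pieces of information into the CMS. It assumes a hypothetical export in which a field that the operator never addressed is absent from the record, a field the operator marked empty has the value None, and OCR-derived fields are listed separately. The function name annotate_record, its parameters, and the status labels are illustrative assumptions, not part of any particular CMS or digitisation platform.

```python
from typing import Any

# Hypothetical status labels; a real CMS would define its own vocabulary.
NOT_DATABASED = "not_databased"                # DD1 (2): operator never databased the field
MARKED_MISSING = "marked_missing"              # DD1 (1): operator marked the field empty/missing
AUTOMATICALLY_FILLED = "automatically_filled"  # DD2: value produced by OCR
ENTERED = "entered"                            # value entered by a human operator


def annotate_record(
    record: dict[str, Any],
    expected_fields: list[str],
    ocr_fields: set[str],
) -> dict[str, dict[str, Any]]:
    """Attach a status to every expected field of one extracted record."""
    annotated = {}
    for field in expected_fields:
        if field not in record:
            # The key is absent: the operator did not database this field.
            status, value = NOT_DATABASED, None
        elif record[field] is None:
            # The key is present but empty: the operator marked it missing.
            status, value = MARKED_MISSING, None
        elif field in ocr_fields:
            # The value came from OCR and has not been verified by a human.
            status, value = AUTOMATICALLY_FILLED, record[field]
        else:
            status, value = ENTERED, record[field]
        annotated[field] = {"value": value, "status": status}
    return annotated


# Example with made-up data: "habitat" was never databased,
# "locality" was marked empty, and "date" was read by OCR.
example = annotate_record(
    record={"collector": "A. Collector", "locality": None, "date": "1902-05-14"},
    expected_fields=["collector", "locality", "date", "habitat"],
    ocr_fields={"date"},
)
```

Keeping the status next to the value, rather than encoding it in the value itself, lets the CMS distinguish the two kinds of missing field without overloading empty strings.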

Discussion

A data field value can have one of the following statuses:

Absent: the information was not documented at the time of the collection event and cannot be resolved later
Unknown: the information is documented but is not yet databased
Unknown:missing: the information could have been databased but is absent
Unknown:indecipherable: the information appears to be present but could not be captured
Automatically filled: the information has been databased using automated methods (OCR) but not yet cleaned/verified by a human
Default: the information is present and has no known problems
Erroneous: the information is present but contains errors or has been marked as unreliable by a human
Unknown:withheld: the information is databased but has been withheld by the provider (Note: not a factor for ETL processes; this is a data publishing problem)
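
The statuses above could be represented as a controlled vocabulary so that a researcher can quickly see how complete a record is. The sketch below is only an illustration: the class name FieldStatus, the string labels, and the summarise helper are assumptions made for this example, not a published standard or an existing CMS interface.

```python
from collections import Counter
from enum import Enum


class FieldStatus(Enum):
    """Hypothetical encoding of the data field statuses listed above."""
    ABSENT = "absent"
    UNKNOWN = "unknown"
    UNKNOWN_MISSING = "unknown:missing"
    UNKNOWN_INDECIPHERABLE = "unknown:indecipherable"
    UNKNOWN_WITHHELD = "unknown:withheld"
    AUTOMATICALLY_FILLED = "automatically filled"
    DEFAULT = "default"
    ERRONEOUS = "erroneous"


def summarise(record_statuses: dict[str, str]) -> Counter:
    """Count how many fields of a record carry each status.

    Raises ValueError if a label is not part of the vocabulary.
    """
    return Counter(FieldStatus(label) for label in record_statuses.values())


# Example with made-up field names and statuses.
summary = summarise({
    "collector": "default",
    "locality": "unknown:indecipherable",
    "date": "automatically filled",
    "habitat": "absent",
    "elevation": "default",
})
```

Rejecting labels outside the vocabulary means a misspelt status is caught when the record is loaded instead of silently passing into the CMS.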

Implementation

See Manual Transcription Guidance for more information

References

Dillen M, Groom Q, & Hardisty A. (2019). Interoperability of Collection Management Systems. Zenodo. https://doi.org/10.5281/zenodo.3361598 (p5 recommendation #8)

Groom Q et al. (2019). Improved standardization of transcribed digital specimen data. Database, 2019, baz129. https://doi.org/10.1093/database/baz129 (table 2)

Authors

Zhengzhe Wu and Esko Piirainen
Finnish Museum of Natural History (Luomus)

Contributors

Lisa French, Laurence Livermore

