Software
Software is needed at various stages of the digitisation process. In many instances free software is available as well as paid software. This page lists examples of software actively used by institutions; it is not an exhaustive list.
Pre-Digitisation Curation
Barcode Label Creation
Barcode labels can be used to encode information in a machine-readable format, thus enabling automation. They can be generated in-house or using external companies, and can be permanent or temporary, for example:
- Permanent - encoding specimen/object unique identifiers
- Temporary (for digitisation purposes) - encoding information relating to the specimen/object in the collection, such as location in collection, taxonomy, type status etc. See Slides - Mass Digitisation for a workflow example.
Software | Cost | Barcode Label Creation | Notes |
---|---|---|---|
BarTender | paid | X | |
Collection Management System* | free & paid | X | Most robust way of generating unique identifiers as it ensures a 1:1 match with the database |
EntomoLabels | free | X | Can be used for all collection and preservation label types |
Label Designer | free | X | Tool used in Kotka and Notebook. Guide for Label Designer available here |
* various collection management systems are available
Camera Control
Mass digitisation of specimens/objects is generally carried out using a camera or flat-bed scanner, while specialised imaging systems are used for high resolution/diagnostic imaging. More information about cameras, imaging setups and requirements can be found in Image Capture.
Tethered Camera Control
Tethered photography for specimen/object image capture has many benefits, such as live view on a monitor, controlling camera settings from the connected device and remote trigger.
Software | Cost | Single Camera Control | Multiple Camera Control | Basic Metadata Writing | Image Processing |
---|---|---|---|---|---|
EOS Utility (Canon only) | free | X | X | ||
Helicon Remote | paid | X | X | ||
LUMIX Tether (Panasonic only) | free | X | X | ||
Capture One | paid | X | X | X | |
LeafCapture* (Leaf only) | free | X | X | X | |
digiCamControl | free | X | X | X | |
Breeze Multi-Camera Array (DSLR Remote Pro Multi-Camera) | paid | X | X |
* available through third party websites
Focus Stacking (Extended Depth of Field Imaging)
Certain specimens/objects or imaging approaches may require a deeper depth of field than a single image can provide. This can be achieved using an automated focus rail that moves the camera/specimen, or by altering the focus of the lens. Some cameras can do in-camera stacking and/or focus bracketing.
Software | Cost | Lens/Focus Rail Control |
---|---|---|
digiCamControl | free | X |
Helicon Remote | paid | X |
Image Processing
Images may require some form of processing, such as renaming, basic/advanced editing, stacking, photogrammetry etc. The use of hot folders and automated actions can enable activities, such as image capture and processing, to happen consecutively, significantly reducing post-processing time. More information can be found in Automating Image Processing and a workflow example can be found in Slides - Mass Digitisation.
File Renaming
Batch and automated file renaming can increase the efficiency of digitisation workflows as well as reduce the potential for human error. One example is to use software to rename image files by extracting information from barcode(s) present in the images.
Software | Cost | Automated Renaming | Batch Renaming* | Notes |
---|---|---|---|---|
BardecodeFiler | paid | X | X | Reads multiple barcodes in image and renames file. Guidance available here |
Gouda (decodes_barcodes.exe) | free | X | X | Reads single barcode in image and renames file |
Bulk Rename Utility | free | X | ||
PowerRename (Microsoft PowerToys) | free | X |
* scripts can also be used for batch renaming
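As a minimal sketch of the scripted approach mentioned in the footnote above, the Python below renames image files from a filename-to-barcode mapping. It assumes the barcode values have already been decoded by another tool; the function name `rename_from_barcodes`, the mapping and the example identifiers are hypothetical and not taken from any of the listed software.

```python
from pathlib import Path


def rename_from_barcodes(folder, barcodes):
    """Rename files in `folder` using a {filename: barcode} mapping.

    Files without an entry in the mapping are left untouched, and an
    existing target name is never overwritten. Returns the new paths.
    """
    folder = Path(folder)
    renamed = []
    for old_name, barcode in sorted(barcodes.items()):
        source = folder / old_name
        if not source.is_file():
            continue  # listed in the mapping but not present on disk
        # Keep the original extension, lower-cased for consistency.
        target = source.with_name(barcode + source.suffix.lower())
        if target.exists():
            continue  # skip rather than overwrite an earlier image
        source.rename(target)
        renamed.append(target)
    return renamed
```

For example, `rename_from_barcodes("captures", {"IMG_0001.JPG": "ABC123"})` would rename `captures/IMG_0001.JPG` to `captures/ABC123.jpg`. Skipping existing targets is a cautious design choice: two images of the same specimen need a suffix or manual review rather than a silent overwrite.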
Post-Processing
Post-processing of image files can be carried out either in bulk using batch processing or consecutively using hot folders.
Software | Cost | Basic Editing | Advanced Editing | Focus Stacking* | Notes |
---|---|---|---|---|---|
XnConvert | free | X | |||
Adobe Lightroom | paid | X | X | See guidance | |
Adobe Photoshop | paid | X | X | ||
GIMP | free | X | X | ||
ImageMagick | free | X | X | Stitch multiple images ** | |
ImageJ | free | X | X | X | Focus stacking available as a plugin (Stack Focuser) |
Helicon Focus | paid | X | |||
Zerene Stacker | paid | X | |||
Inselect | free | Automated image segmentation |
* some cameras can do in-camera stacking and/or focus bracketing
** scripts can also be used for bulk stitching of images
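The hot folder approach described in this section can be sketched in Python as a simple polling loop. This is an illustration under stated assumptions, not how any of the listed tools work internally: the function names, the folder layout and the `action` callable (standing in for a rename, conversion or stacking step) are all hypothetical.

```python
import shutil
import time
from pathlib import Path


def process_new_files(hot_folder, done_folder, action):
    """Apply `action` to each file in `hot_folder`, then move it to `done_folder`.

    Returns the names of the files handled in this pass.
    """
    hot_folder, done_folder = Path(hot_folder), Path(done_folder)
    done_folder.mkdir(parents=True, exist_ok=True)
    handled = []
    for path in sorted(hot_folder.iterdir()):
        if not path.is_file():
            continue  # ignore sub-folders
        action(path)  # hypothetical processing step supplied by the caller
        shutil.move(str(path), str(done_folder / path.name))
        handled.append(path.name)
    return handled


def watch(hot_folder, done_folder, action, interval_seconds=5.0):
    """Poll the hot folder repeatedly until interrupted (Ctrl+C)."""
    while True:
        process_new_files(hot_folder, done_folder, action)
        time.sleep(interval_seconds)
```

A real workflow would also need to make sure a file has finished being written by the capture software before it is processed; that check is left out of this sketch.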
Electronic Data Capture
Information about the specimen/object on handwritten or typed labels can be captured manually or using optical character recognition (OCR). More information about transcription project planning, impact versus effort, and the challenges can be found in Manual Transcription. The use of data standards such as Darwin Core (DwC), which is primarily based on taxa and their occurrence, will provide a stable, standard format for sharing information (Wieczorek et al., 2012).
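To illustrate what sharing transcribed data in a standard format can look like, the sketch below writes records to CSV using Darwin Core term names as column headers. The terms on the right of the mapping (`catalogNumber`, `scientificName`, `recordedBy`, `eventDate`, `country`, `locality`) are Darwin Core terms; the in-house field names on the left, the function name and the sample record are hypothetical.

```python
import csv
import io

# Hypothetical mapping from in-house transcription fields to Darwin Core terms.
FIELD_TO_DWC = {
    "barcode": "catalogNumber",
    "name": "scientificName",
    "collector": "recordedBy",
    "date": "eventDate",
    "country": "country",
    "locality": "locality",
}


def to_darwin_core_csv(records):
    """Return CSV text whose header row uses Darwin Core term names."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(FIELD_TO_DWC.values()), lineterminator="\n"
    )
    writer.writeheader()
    for record in records:
        # Fields missing from a transcription are written as empty cells.
        writer.writerow(
            {dwc: record.get(field, "") for field, dwc in FIELD_TO_DWC.items()}
        )
    return buffer.getvalue()


sample = [{"barcode": "ABC123", "name": "Bellis perennis", "date": "1901-05-12"}]
print(to_darwin_core_csv(sample))
```

Keeping the mapping in one dictionary means the institution's internal field names can change without affecting the shared output format.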
Data Capture and Extraction
Software | Cost | Manual Data Capture | OCR | Notes |
---|---|---|---|---|
Crowdsourcing Platforms* | free & paid | X | Web applications | |
ABBYY FineReader Server (ABBYY Recognition Server) | paid | X | ||
Amazon Textract | free & paid | X | Cloud service | |
Azure OCR (Microsoft) | paid | X | Cloud service | |
Cloud Vision API (Google) | free & paid | X | Cloud service | |
Tesseract OCR | free | X | NYU libraries ‘Tesseract OCR Software Tutorial’ | |
Text Extractor (Microsoft PowerToys) | free | X | ||
Voyant Tools | free | X | ‘Reading and analysis environment for digital texts’. Web application | |
ChatGPT | free | Parsing OCR output text into Darwin Core fields |
* various online platforms are available
Georeferencing
Georeferencing is a key process that enables specimens to be used in a variety of ways from geospatial analyses to understanding the history of collections. More information about the importance, pitfalls and recommendations can be found in Georeferencing Checklist.
Software | Cost | Georeferencing | Notes |
---|---|---|---|
Ali-Bey | free | X | ‘An open collaborative georeferencing web application’ (Marcer et al., 2022). Web application, API and dockerized version |
GEOLocate | free | X | ‘Platform for georeferencing Natural History Collections data’. Web application. |
GeoPick | free | X | ‘Online companion tool for easy georeferencing following best practices’. Web application - Marcer A, Escobar E, Uribe F, Chapman AD, Wieczorek JR (in development). |
Georeferencing Calculator | free | X | Web or desktop application - Wieczorek C, Wieczorek J (2021). Georeferencing calculator manual (Bloom et al., 2020). |
Data Quality
As there are many methods and resources (from established to new/evolving) for identifying errors and cleaning data, we will only highlight a few examples here. More information about data quality and methods of data cleaning can be found in the following GBIF reports: ‘Principles of Data Quality’ (Chapman, 2005a) and ‘Principles and Methods of Data Cleaning’ (Chapman, 2005b).
Data Cleaning and Quality Checks
Software* | Cost | Data Cleaning | Data Quality Checks | Notes |
---|---|---|---|---|
OpenRefine | free | X | Desktop application. Cleaning data, transformation of data, parsing data, reconciliation etc. | |
Global Names Verifier | free | X | Verifies scientific names against biodiversity data-sources. Web application, command line or through an API (Application Programming Interface). |
* scripts can be used alongside manual data checks and data cleaning processes.
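As one hedged example of the kind of script the footnote above refers to, the Python below flags out-of-range coordinates and unparseable dates in a single record. It uses the Darwin Core terms `decimalLatitude`, `decimalLongitude` and `eventDate`; the function name `check_record` is hypothetical, and treating `eventDate` as a single ISO 8601 calendar date is a simplifying assumption (date ranges and partial dates would be flagged).

```python
from datetime import date


def check_record(record):
    """Return a list of problems found in one Darwin Core style record."""
    problems = []
    for term, limit in (("decimalLatitude", 90.0), ("decimalLongitude", 180.0)):
        value = record.get(term, "")
        if value == "":
            continue  # missing coordinates are not flagged in this sketch
        try:
            number = float(value)
        except (TypeError, ValueError):
            problems.append(f"{term} is not a number: {value!r}")
            continue
        if not -limit <= number <= limit:
            problems.append(f"{term} out of range: {number}")
    event_date = record.get("eventDate", "")
    if event_date:
        try:
            date.fromisoformat(event_date)
        except ValueError:
            problems.append(f"eventDate is not a valid ISO 8601 date: {event_date!r}")
    return problems
```

Returning a list of messages rather than raising an error lets flagged records be collected for manual review, which matches the idea of scripts supporting rather than replacing manual checks.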
References and Further Reading
Bloom DA, Wieczorek JR, Zermoglio PF (2020). Georeferencing Calculator Manual. Copenhagen: GBIF Secretariat. https://docs.gbif.org/georeferencing-calculator-manual/1.0/en/
Chapman AD (2005a). Principles of Data Quality, version 1.0. Report for the Global Biodiversity Information Facility, Copenhagen. https://assets.ctfassets.net/uo17ejk9rkwj/2gupj7dJIw62UeOUYiqSsm/0a4bb732bd7fd8cf28f7703dc20a43ba/Data_Quality_-_ENGLISH.pdf
Chapman AD (2005b). Principles and Methods of Data Cleaning – Primary Species and SpeciesOccurrence Data, version 1.0. Report for the Global Biodiversity Information Facility, Copenhagen. https://assets.ctfassets.net/uo17ejk9rkwj/46SfGRfOesU0IagMMAOIkk/1c03ea3e21fcd9025cc800d786890e72/Principles_20and_20Methods_20of_20Data_20Cleaning_20-_20ENGLISH.pdf
Marcer A, Escobar A, Garcia-Font V, Uribe F (2022). Ali-Bey - an open collaborative georeferencing web application. Biodiversity Data Journal 10: e81282. https://doi.org/10.3897/bdj.10.e81282
Wieczorek J, Bloom D, Guralnick R, Blum S, Döring M, Giovanni R, Robertson T, Vieglais D (2012). Darwin Core: An Evolving Community-Developed Biodiversity Data Standard. PLoS ONE 7(1): e29715. https://doi.org/10.1371/journal.pone.0029715
Software Citations
Wieczorek C, Wieczorek J (2021). Georeferencing Calculator. Available: http://georeferencing.org/georefcalculator/gc.html. Accessed [2023-01-13].
Marcer A, Escobar E, Uribe F, Chapman AD and Wieczorek JR (v.1.0.2023-Beta, in development). GeoPick: an online companion tool for easy georeferencing following best practices [Web application].
Authors
Louise Allan
Contributors
Anne Koivunen, Laurence Livermore, Arnald Marcer, Deb Paul
Licence
CC-BY-4.0
Citation
Allan, L. (2022) DiSSCo Digitisation Guide: Software. Version 1.0 Available at: https://dissco.github.io/DataManagement/Software/Software.html
Document Control
Version: 1.0
Changes since last version: N/A
Last Updated: 20 March 2023