The ESIP community and others in machine learning regularly lament the lack of labeled datasets, needed for certain classes of training algorithms. Generating accurate, useful labels is a hard problem, with no general automated solution in sight. Thus, labeling generally involves human effort, which is challenging because the volume of data needed for training can be very large.
Tools exist to help in labeling data. This session will demonstrate three labeling tools and associated processes:
- Image Labeler, a fast, scalable, cloud-based tool that facilitates the rapid development of Earth science event databases to aid automated ML-based image classification (Rahul Ramachandran)
- LabelImg, an open-source graphical image annotation tool, https://github.com/tzutalin/labelImg (Ziheng Sun)
- Bokeh, a Python-based plotting and annotation tool set for building arbitrary labeling workflows, https://bokeh.org/ (Jim Bednar)
Time permitting, the session will conclude with a short discussion of thoughts and tradeoffs about the tools.
This session is followed by a hands-on workshop on using LabelImg and Bokeh. If you are interested in participating, please see the session abstract for "Hands on Labeling Workshop" for information on how to prepare.
Presentations
https://doi.org/10.6084/m9.figshare.11629110.v1
https://doi.org/10.6084/m9.figshare.11591739.v1
View Recording: https://youtu.be/3ufBOoD3M1E
Takeaways
- Machine learning based classification applications require high-quality labeled data sets for both model training and evaluation. There are many existing tools for labeling images (including Earth science data), but labeling tasks are very labor- and time-intensive.
- If the pre-built labeling tools don’t work for your problem, Anaconda provides a general-purpose labeler-building toolkit based on Bokeh for Python users; see https://examples.pyviz.org/ml_annotators/ml_annotators.html
- There is an opportunity in combining partly automated and partly human labeling, automating the easy cases while leaving the final call to a person. Currently there is little tool support or established good practice for this, and such workflows are hard to integrate.
The art of avoiding extra work!
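The last takeaway, combining automated and human labeling, can be illustrated with a minimal sketch. It is not taken from any of the tools presented in the session. It assumes a model that returns a proposed label and a confidence score per item, and the names `Prediction` and `triage`, the 0.9 threshold, and the sample data are all hypothetical. High-confidence proposals are set aside as pre-labeled items that a person can confirm quickly, and everything else is queued for full manual labeling.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Prediction:
    """A model's proposed label for one item (hypothetical structure)."""
    item_id: str
    label: str
    confidence: float  # model score for `label`, assumed to lie in [0, 1]


def triage(
    predictions: Sequence[Prediction], threshold: float = 0.9
) -> Tuple[List[Prediction], List[Prediction]]:
    """Split predictions into (pre_labeled, needs_manual_labeling).

    Items at or above `threshold` are treated as easy cases whose proposed
    label only needs a quick human confirmation; the rest go to a person
    for labeling from scratch. The threshold is an assumption to be tuned.
    """
    pre_labeled: List[Prediction] = []
    needs_manual: List[Prediction] = []
    for p in predictions:
        if p.confidence >= threshold:
            pre_labeled.append(p)
        else:
            needs_manual.append(p)
    return pre_labeled, needs_manual


# Made-up example data, for illustration only.
preds = [
    Prediction("img_001", "smoke", 0.98),
    Prediction("img_002", "dust", 0.55),
    Prediction("img_003", "smoke", 0.91),
]
pre_labeled, needs_manual = triage(preds, threshold=0.9)
print("confirm quickly:", [p.item_id for p in pre_labeled])
print("label by hand:  ", [p.item_id for p in needs_manual])
```

In practice the threshold would be chosen by checking how often high-confidence proposals turn out to be wrong on a sample that a person has already labeled.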