

Wednesday, January 8
 

11:00am EST

Software Sustainability, Discovery and Accreditation
It is commonly understood that software is essential to research, in data collection, curation, analysis, and understanding, and it is also a critical element within any research infrastructure. This session will address two related software issues: 1) sustainability, and 2) discovery and accreditation.

Scientific software is an instance of a software stack containing problem-specific software, discipline-specific tools, general tools and middleware, and infrastructural software. Changes within the stack can cause the overall software to collapse and stop working, and as time goes on, more and more work is needed to compensate for these problems. We refer to this ongoing effort as sustainability. Issues in which we are interested include incentives that encourage sustainability activities, business models for sustainability (including public-private partnership), software design that can reduce the sustainability burden, and metrics to measure sustainability (perhaps tied to the ongoing process of defining FAIR software).

The second issue, discovery and accreditation, asks how we can enable users to discover and access trustworthy, fit-for-purpose software to undertake science processing on the compute infrastructures to which they have access. It also asks how we can ensure that publications cite the exact version of the software that was used and that the responsible authors are properly credited.

This session will include a number of short talks, and at least two breakouts in parallel, one about the sustainability of software, and a second about discovery of sustainable and viable solutions.

Potential speakers who want to talk about an aspect of software sustainability, discovery, or accreditation should contact the session organizers.

Agenda/slides:
Presentations: See above

View Recording:
https://youtu.be/nsxjOC04JxQ

Key takeaways:

1. Funding agencies spend a large amount of money on software, but don't always know this because it's not something that they track.

Open source software is growing very quickly:
  • 2001: 208K SourceForge users
  • 2017: 20M GitHub users
  • 2019: 37M GitHub users
Software, like data, is a “first class citizen” in the ecosystem of tools and resources for scientific research, and our community is accelerating its attention to software as it has for FAIR data.


2. Ideas for changing our culture to better support and reward contributions to sustainable software:
  • Citation (ESIP guidelines) and/or Software Heritage IDs for credit and usage metrics and to meet publisher requirements (e.g. AGU)
  • Prizes
  • Incentives in hiring and promotion
  • Promote FAIR principles and/or Technology Readiness Levels for software
  • Increased use of common software to make science more efficient
  • Publish best practice materials in other languages, e.g. Mandarin, as software comes from a global community


3. A checklist of topics to consider for your community sustained software:
  • Repository with “cookie cutter” templates and sketches for forking
  • Licensing
  • Contributors Guide
  • Code of Conduct and Governance
  • Use of “Self-Documentation” features and standards
  • Easy steps for trying out the software
  • Continuous Integration builds
  • Unit tests
  • Good set of “known first issues” for new users trying out the software
  • Gitter or Slack Channel for feedback and communication, beyond a simple repo issues queue
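
Several items on this checklist leave a visible trace in a repository, so a rough first pass over them can be automated. The sketch below is only an illustration: the file and directory names in `CHECKLIST` are common conventions chosen here as assumptions, not names given in the session, and the helpers `check_repo` and `missing_topics` are hypothetical. Items such as a chat channel or a good set of first issues cannot be detected this way.

```python
from pathlib import Path

# Assumed mapping from checklist topics to file or directory names that
# commonly signal them. These names are conventions, not session output.
CHECKLIST = {
    "Licensing": ["LICENSE", "LICENSE.md", "LICENSE.txt", "COPYING"],
    "Contributors Guide": ["CONTRIBUTING.md", "CONTRIBUTING.rst", "CONTRIBUTING"],
    "Code of Conduct": ["CODE_OF_CONDUCT.md"],
    "Governance": ["GOVERNANCE.md"],
    "Continuous Integration builds": [".github/workflows", ".travis.yml", ".gitlab-ci.yml"],
    "Unit tests": ["tests", "test"],
}


def check_repo(root):
    """Return {topic: True/False} depending on whether any candidate path exists."""
    root = Path(root)
    return {
        topic: any((root / name).exists() for name in names)
        for topic, names in CHECKLIST.items()
    }


def missing_topics(root):
    """List the checklist topics for which no candidate path was found."""
    return [topic for topic, found in check_repo(root).items() if not found]
```

Calling `missing_topics(".")` from the top of a checked-out repository returns the topics that still need attention under these assumptions. A passing check only shows that a file exists, not that its content is adequate.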


Detailed notes:
The group divided into two breakout sessions (Sustainability; Discovery and Accreditation), with notes as follows.

Notes from Sustainability breakout (by Daniel S. Katz):

What we think should be done:
  • Build a cookiecutter recipe for new projects, based on Ben’s slides? What part of ESIP would be interested in this, and would build and support it?
  • Define governance as part of this? How do we store governance?
  • What is required, what is optional (maybe with different answers at different tiers)
  • Define types of projects (individual developer, community code, …)
  • Define for different languages – tooling needs to match needs
  • Is this specific to ESIP? Who could it be done with? The Carpentries?  SSI?

Other discussion:
  • What do we mean by sustainability – for how long?  Up to 50 years?  How do we run the system?
  • What’s the purpose of the software (use case) – transparency to see the software, actual reuse?
  • What about research objects that contain both software and data? How do we archive them? How do we cite them?
  • We have some overlap with the Research Object Citation cluster


Notes from Discovery and Accreditation breakout (by Shelley Stall):

Use Cases - Discovery
  1. Have a science question and are looking for software to support it.
  2. Have some data output from a software process and need access to the software to better understand the data.

Example of work happening: Data and Software Preservation - NSF Funded
  • promote linked data to other research products
  • similar project in Australia - want to gain access to the chain of events that resulted in the data and/or software - the scientific drivers that resulted in this product
  • Provenance information is part of this concept.

A deeper look at discovery, once software is found, is to better understand how the software came into being. It is important to know the undocumented elements of a process that affected the chain of events behind a particular piece of software.
How do we discover existing packages?
Dependency management helps to discover new elements that support software.
Concern was expressed that packaged solutions for creating an environment, like “AWS/AMI”, are not recognized as good enough, that an editor requested a d

Speakers

Daniel S. Katz

Chief Scientist, NCSA, University of Illinois at Urbana-Champaign
Dan is Chief Scientist at the National Center for Supercomputing Applications (NCSA) and Research Associate Professor in Computer Science, Electrical and Computer Engineering, and the School of Information Sciences (iSchool), at the University of Illinois Urbana-Champaign. In past...

Lesley Wyborn

Honorary Professor, Australian National University


Wednesday January 8, 2020 11:00am - 12:30pm EST
Forest Glen
  Forest Glen, Working Session

2:00pm EST

FAIR Laboratory Instrumentation, Analytical Procedures, and Data Quality
Acquisition and analysis of data in the laboratory are pervasive in the Earth, environmental, and planetary sciences. Analytical and experimental laboratory data, often acquired with sophisticated and expensive instrumentation, are fundamental for understanding past, present, and future processes in natural systems, from the interior of the Earth to its surface environments on land, in the oceans, and in the air, to the entire solar system. Despite the importance of provenance information for analytical data including, for example, sample preparation or experimental set up, instrument type and configuration, calibration, data reduction, and analytical uncertainties, there are no consistent community-endorsed best practices and protocols for describing, identifying, and citing laboratory instrumentation and analytical procedures, and documenting data quality. This session is intended as a kick-off working session to engage researchers, data managers, and system engineers, to contribute ideas on how to move forward with and accelerate the development of global standard protocols and the promulgation of best practices for analytical laboratory data.

Presentations:

View Recording:
https://youtu.be/LOfb_4r7DBA

Takeaways
  • Analytical and experimental data are collected widely in both field and laboratory settings across the Earth, environmental, and planetary sciences, spanning a variety of disciplines. FAIR use of such data depends on data provenance.
  • Community exchange of such data is needed, bearing in mind that use of the data is broader than its original use within the domain. This raises the question of interoperability. Networks of these data need to be plugged into evolving cyberinfrastructure (CI) systems. In seismology, a common data standard implemented by early visionaries was a massive boon to the field.
  • Documenting how analytical data were generated is time consuming for data curators, providers, and others. Standards and protocols for data exchange are urgently required for emerging global data networks. OneGeochemistry is an example use case of an international research group working to establish a global network for discoverable geochemical data.


Speakers

Lesley Wyborn

Honorary Professor, Australian National University

Kerstin Lehnert

President, IGSN e.V.
Kerstin Lehnert is Doherty Senior Research Scientist at the Lamont-Doherty Earth Observatory of Columbia University and Director of the Interdisciplinary Earth Data Alliance that operates EarthChem, the System for Earth Sample Registration, and the Astromaterials Data System. Kerstin...


Wednesday January 8, 2020 2:00pm - 3:30pm EST
Forest Glen
  Forest Glen, Working Session