It is commonly understood that software is essential to research, in data collection, curation, analysis, and understanding, and it is also a critical element within any research infrastructure. This session will address two related software issues: 1) sustainability, and 2) discovery and accreditation.
Scientific software sits in a stack containing problem-specific software, discipline-specific tools, general tools and middleware, and infrastructural software. Changes anywhere within the stack can cause the overall software to stop working, and as time goes on, more and more work is needed to compensate for these problems. We refer to this ongoing work as sustainability. Issues in which we are interested include incentives that encourage sustainability activities, business models for sustainability (including public-private partnership), software design that can reduce the sustainability burden, and metrics to measure sustainability (perhaps tied to the ongoing process of defining FAIR software).
The second issue, discovery and accreditation, raises two questions. How do we enable users to discover and access trustworthy, fit-for-purpose software to undertake science processing on the compute infrastructures to which they have access? And how do we ensure that publications cite the exact version of the software that was used and properly credit the responsible authors?
This session will include a number of short talks, and at least two breakouts in parallel, one about the sustainability of software, and a second about discovery of sustainable and viable solutions.
Potential speakers who want to talk about an aspect of software sustainability, discovery, or accreditation should contact the session organizers.
Agenda/slides:
Presentations: See above
View Recording: https://youtu.be/nsxjOC04JxQ
Key takeaways:
1. Funding agencies spend a large amount of money on software, but don't always know this because it's not something that they track.
Open source software is growing very quickly:
- 2001: 208K SourceForge users
- 2017: 20M GitHub users
- 2019: 37M GitHub users
Software, like data, is a “first class citizen” in the ecosystem of tools and resources for scientific research, and our community is accelerating its attention to software as it has for FAIR data.
2. Ideas for changing our culture to better support and reward contributions to sustainable software:
- Citation (ESIP guidelines) and/or software heritage IDs for credit and usage metrics and to meet publisher requirements (e.g. AGU)
- Prizes
- Incentives in hiring and promotion
- Promote FAIR principles and/or Technology Readiness Levels for software
- Increase the use of common software to make science more efficient
- Publish best practice materials in other languages, e.g. Mandarin, as software comes from a global community
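The citation idea in the list above can be sketched in a few lines. This is a minimal illustration only, not the format required by the ESIP guidelines or by any publisher: the `SoftwareCitation` class, the `format_citation` function, the output layout, and every example value are assumptions made for this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SoftwareCitation:
    """Minimal metadata needed to cite one exact software release."""

    authors: tuple[str, ...]
    title: str
    version: str
    year: int
    identifier: str  # e.g. a DOI or archive ID that resolves to this version


def format_citation(c: SoftwareCitation) -> str:
    """Render a plain-text citation naming the authors and the exact version."""
    if not c.version:
        # Without a version, readers cannot tell which release was used.
        raise ValueError("a software citation should pin an exact version")
    authors = ", ".join(c.authors)
    return (
        f"{authors} ({c.year}). {c.title} "
        f"(Version {c.version}) [Software]. {c.identifier}"
    )


# Hypothetical example values, for illustration only.
example = SoftwareCitation(
    authors=("Doe, J.", "Roe, R."),
    title="example-tool",
    version="1.2.3",
    year=2020,
    identifier="doi:10.0000/example",
)
print(format_citation(example))
```

Making the version a required field is the point of the sketch: a citation that cannot say which release was used gives neither reproducibility nor precise credit.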
3. A checklist of topics to consider for your community-sustained software:
- Repository with “cookie cutter” templates and sketches for forking
- Licensing
- Contributors Guide
- Code of Conduct and Governance
- Use of “Self-Documentation” features and standards
- Easy steps for trying out the software
- Continuous Integration builds
- Unit tests
- Good set of “known first issues” for new users trying out the software
- Gitter or Slack Channel for feedback and communication, beyond a simple repo issues queue
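Several items in the checklist above leave a visible trace in a repository, so a rough automated check is possible. The sketch below is one hedged way to do it: the `CHECKLIST` mapping, the file and directory names in it, and both function names are assumptions chosen for illustration, and real projects may use different names. Items such as governance or a chat channel cannot be detected this way.

```python
from pathlib import Path

# Hypothetical mapping from checklist items to paths that commonly signal
# them. These names are conventions, not requirements.
CHECKLIST = {
    "Licensing": ["LICENSE", "LICENSE.md", "LICENSE.txt"],
    "Contributors Guide": ["CONTRIBUTING.md", "CONTRIBUTING.rst"],
    "Code of Conduct": ["CODE_OF_CONDUCT.md"],
    "Continuous Integration": [".github/workflows", ".travis.yml", ".gitlab-ci.yml"],
    "Unit tests": ["tests", "test"],
}


def check_repository(root):
    """Return {checklist item: True/False} for a local repository checkout."""
    root = Path(root)
    return {
        item: any((root / name).exists() for name in candidates)
        for item, candidates in CHECKLIST.items()
    }


def missing_items(root):
    """List the checklist items for which no candidate path was found."""
    return [item for item, found in check_repository(root).items() if not found]


if __name__ == "__main__":
    # Report on the current directory as a simple text checklist.
    for item, found in check_repository(".").items():
        print(f"[{'x' if found else ' '}] {item}")
```

A check like this only confirms that a file exists, not that its content is adequate, so it is better suited to a template or onboarding script than to any form of accreditation.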
Detailed notes:
The group then divided into two breakout sessions (Sustainability; Discovery and Accreditation), with notes as follows.
Notes from Sustainability breakout (by Daniel S. Katz):
What we think should be done:
- Build a cookiecutter recipe for new projects, based on Ben’s slides? Which part of ESIP would be interested in this, and would build and support it?
- Define governance as part of this? How do we store governance?
- What is required, what is optional (maybe with different answers at different tiers)
- Define types of projects (individual developer, community code, …)
- Define for different languages – tooling needs to match needs
- Is this specific to ESIP? Who could it be done with? The Carpentries? SSI?
Other discussion:
- What do we mean by sustainability – for how long? Up to 50 years? How do we run the system?
- What’s the purpose of the software (use case) – transparency to see the software, actual reuse?
- What about research objects that contain both software and data? How do we archive them? How do we cite them?
- We have some overlap with research object citation cluster
Notes from Discovery and Accreditation breakout (by Shelley Stall):
Use Cases - Discovery
- have a science question, looking for software to support it
- have some data output from a software process, need to gain access to the software to better understand the data.
Example of work happening: Data and Software Preservation - NSF Funded
- promote linked data to other research products
- similar project in Australia - want to gain access to the chain of events that resulted in the data and/or software - the scientific drivers that resulted in this product
- Provenance information is part of this concept.
Once software is found, a deeper aspect of discovery is understanding how the software came into being. It is important to know the undocumented elements of the process that affected the chain of events behind a particular piece of software, since they help in understanding it.
How do we discover existing packages?
Dependency management helps to discover new elements that support software.
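As a rough illustration of how declared dependencies support discovery, the sketch below walks a dependency map to list everything a package relies on, directly or indirectly. The `DECLARED` map, all package names in it, and the `discover_dependencies` function are hypothetical and were not discussed in the breakout. A real tool would read this information from package metadata.

```python
from collections import deque

# Hypothetical declared dependencies, loosely mirroring the layers of a
# scientific software stack. All package names are made up.
DECLARED = {
    "problem-specific-model": ["discipline-toolkit", "plotting-lib"],
    "discipline-toolkit": ["array-lib", "io-middleware"],
    "plotting-lib": ["array-lib"],
    "io-middleware": [],
    "array-lib": [],
}


def discover_dependencies(package, declared):
    """Breadth-first walk returning every direct and transitive dependency."""
    visited = {package}  # never report the package as its own dependency
    found = []
    queue = deque(declared.get(package, []))
    while queue:
        name = queue.popleft()
        if name in visited:
            continue  # already reported, or part of a cycle
        visited.add(name)
        found.append(name)
        queue.extend(declared.get(name, []))
    return found


print(discover_dependencies("problem-specific-model", DECLARED))
```

Starting from one problem-specific package, the walk surfaces the general-purpose libraries underneath it, which a user might not otherwise find.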
A concern was expressed that packaged solutions for creating an environment, like “AWS/AMI”, are not recognized as good enough, that an editor requested a d