Join the 2020 ESIP Winter Meeting Highlights Webinar on Feb. 5th at 3 pm ET for a fast-paced overview of what took place at the meeting. More info here.


Data Stewardship
Monday, January 6
 

4:00pm EST

Council of Data Facilities General Assembly Meeting
The Council of Data Facilities (CDF) is committed to working with relevant agencies, professional associations, initiatives, and other complementary efforts to enable transformational science, innovative education, and informed public policy through increased coordination, collaboration, and innovation in the acquisition, curation, preservation, and dissemination of geoscience data, tools, models, and services. Existing and emerging geoscience data facilities – through the Council – are committed to serving as an effective foundation for EarthCube. The General Assembly meeting is open to the official representatives from all member data facilities, additional member organization personnel as desired by the members, as well as observers.

Agenda:
400-415 Welcome/introductions/sign-in - Danie
415-430 High level Summary of OKN workshop - TBA
430-435 Updates on shared infrastructure - Kerstin, Danie
435-445 Update on COPDESS - Kerstin, Shelley
445-515 Update and next steps on P419 - Doug, Adam
515-530 Progress on EC supplements for CCHDO and MagIC related to P418/P419 (GeoCODES) - Steve
530-550 Update from tech team EarthCube Office - Kenton McHenry
550-600 Summer topics - Danie
      • Suggested Charter changes (to be voted on in July 2020)
      • Announce CDF exec elections in July 2020 - 2 co-chair and 3 at-large positions


Speakers

Jessica Hausman

Data Engineer, PO.DAAC JPL


Monday January 6, 2020 4:00pm - 6:00pm EST
Glen Echo
  Glen Echo, Business Meeting
 
Tuesday, January 7
 

11:00am EST

FAIR Metadata Recommendations
We will discuss the FAIR metadata recommendations that were introduced at the ESIP Summer Meeting. How to Prepare for this Session: Use git repository: Issues

Links:
Glossary
Use git repository: 
Issues

View Recording: https://youtu.be/5hwZOLQ1p9M

Takeaways
  • NCEAS is continuing to work on pinning down the fundamental characteristics of FAIR data. They have a suite of checks (e.g. is a title present); 54 are currently implemented, and they are working toward a community-defined 1.0 check suite. This is a good tool for data curators but has the potential to be misunderstood or misused, so a public FAIR metric is needed. The public FAIR metric is high level and simple and includes only items that everyone agrees upon.
  • Future plans include creating community-specific custom FAIR check suites to handle the variability in how metadata is hosted. The team is continually evaluating whether checks are helping or hurting data curators. Work is needed on the user interface: how do we ensure that metadata evaluation is a positive experience regardless of the score?
  • Reusability is typically low throughout the data repositories. Accessibility needs a greater focus as it’s hindered by broken/missing links. “When you decide what fields are mandatory (vs optional) you decide what metadata you get.”
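
As a rough illustration of the kind of check suite described in the takeaways above (for example, "is a title present"), the sketch below scores a metadata record against a few presence checks. The field names, the list of checks, and the `run_checks` and `score` helpers are all hypothetical; none of them are taken from the NCEAS suite.

```python
from typing import Callable, Dict, List, Tuple

# A metadata record is modeled as a plain dict; the field names are hypothetical.
Record = Dict[str, object]
Check = Tuple[str, Callable[[Record], bool]]


def has_text(field: str) -> Callable[[Record], bool]:
    """Build a check that passes when `field` holds a non-empty string."""
    def check(record: Record) -> bool:
        value = record.get(field)
        return isinstance(value, str) and value.strip() != ""
    return check


# Illustrative checks only; a real suite would be community-defined.
CHECKS: List[Check] = [
    ("title present", has_text("title")),
    ("abstract present", has_text("abstract")),
    ("identifier present", has_text("identifier")),
    ("license present", has_text("license")),
]


def run_checks(record: Record) -> Dict[str, bool]:
    """Run every check and return a mapping of check name to pass/fail."""
    return {name: check(record) for name, check in CHECKS}


def score(results: Dict[str, bool]) -> float:
    """Fraction of checks passed, between 0.0 and 1.0."""
    return sum(results.values()) / len(results) if results else 0.0


if __name__ == "__main__":
    record = {
        "title": "Sea surface temperature",
        "abstract": "",
        "identifier": "doi:10.0000/example",  # made-up identifier
    }
    results = run_checks(record)
    for name, passed in results.items():
        print(f"{'PASS' if passed else 'FAIL'}  {name}")
    print(f"score: {score(results):.2f}")
```

Reporting the per-check results alongside the single score is one way to keep the evaluation useful to curators regardless of the number itself.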


Speakers

Ted Habermann

Chief Game Changer, Metadata Game Changers
I am interested in all facets of metadata needed to discover, access, use, and understand data of any kind. Also evaluation and improvement of metadata collections, translation proofing. Ask me about the Metadata Game.

Matt Jones

Director, DataONE Program, DataONE, UC Santa Barbara
DataONE | Arctic Data Center | Open Science | Provenance and Semantics | Scientific Synthesis


Tuesday January 7, 2020 11:00am - 12:30pm EST
Forest Glen
  Forest Glen, Breakout

11:00am EST

Creating a Data at Risk Commons at DataAtRisk.org
Several professional organizations have become increasingly concerned about the loss of reusable data from primary sources such as individual researchers, projects, and agencies. DataAtRisk.org aims to connect people with data in need, to data expertise, and is a response to the clear need for a community building application. This “Data at Risk” commons will allow individuals to submit and request help with threatened datasets and connect these datasets to experts who can provide resources and skills to help rescue data through a secure, professional mechanism to facilitate self-identification and discovery.

This session will provide an overview of the current status of the DataAtRisk.org project, and aims to expand the network of individuals involved in the development and implementation of DataAtRisk.org.

How to Prepare for this Session: Please check out https://dataatrisk.org/ for some background on the activities.

Presentations: http://bit.ly/303gig7, https://doi.org/10.6084/m9.figshare.11536317.v1
Link to use case / user scenario: https://tinyurl.com/yh4rnk7b

View Recording: https://youtu.be/96NMQwx_EtI

Takeaways
  • Perfection is the enemy of getting stuff done
  • Something is better than nothing
  • Triage will be necessary at several places in the process



Speakers

Denise Hills

Director, Energy Investigations, Geological Survey of Alabama
Long tail data, data preservation, connecting physical samples to digital information, geoscience policy, science communication


Tuesday January 7, 2020 11:00am - 12:30pm EST
Linden Oak
  Linden Oak, Working Session

11:00am EST

Interoperability of geospatial data with STAC
The SpatioTemporal Asset Catalog (STAC) is an emerging specification of a common metadata model for geospatial data, and a way to make data catalogs indexable and searchable. We have already seen STAC being adopted for both public data and commercial data. Catalogs exist for several AWS Public Datasets, Landsat Collection 2 data will be published along with STAC metadata, and communities like Pangeo are using STAC to organize data repositories in a scalable way. Commercial companies like Planet and DigitalGlobe are starting to publish STAC metadata for some of their catalogs. Session talks may cover overviews of STAC, software projects utilizing STAC, and use cases of STAC in organizations. How to Prepare for this Session: See https://stacspec.org/.
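
To make the idea of a common, searchable metadata model more concrete, the sketch below builds two STAC-style Items as plain Python dictionaries and filters them by bounding box. It is a minimal illustration under stated assumptions: the item IDs, asset URLs, and the `make_item` and `search` helpers are made up, the `stac_version` value is an assumption (the version current at the time of the session was earlier), and the filter is a simple in-memory loop rather than the STAC API.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, List

BBox = List[float]  # [west, south, east, north]


def make_item(item_id: str, bbox: BBox, when: datetime, href: str) -> Dict[str, Any]:
    """Build a minimal STAC-style Item: a GeoJSON Feature plus STAC fields."""
    west, south, east, north = bbox
    return {
        "type": "Feature",
        "stac_version": "1.0.0",  # assumed version, for illustration only
        "id": item_id,
        "bbox": bbox,
        "geometry": {
            "type": "Polygon",
            "coordinates": [[
                [west, south], [east, south], [east, north],
                [west, north], [west, south],
            ]],
        },
        "properties": {"datetime": when.isoformat()},
        "links": [],
        "assets": {"data": {"href": href}},
    }


def intersects(a: BBox, b: BBox) -> bool:
    """True when two [west, south, east, north] boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(items: List[Dict[str, Any]], bbox: BBox) -> List[Dict[str, Any]]:
    """Return the items whose bounding box overlaps the query box."""
    return [item for item in items if intersects(item["bbox"], bbox)]


if __name__ == "__main__":
    when = datetime(2020, 1, 7, 16, 0, tzinfo=timezone.utc)
    items = [
        make_item("scene-a", [-78.0, 38.0, -76.0, 40.0], when, "https://example.org/a.tif"),
        make_item("scene-b", [10.0, 50.0, 12.0, 52.0], when, "https://example.org/b.tif"),
    ]
    hits = search(items, [-77.5, 38.5, -77.0, 39.0])
    print(json.dumps([item["id"] for item in hits]))
```

Because every Item carries the same core fields (`id`, `bbox`, `geometry`, `properties.datetime`, `assets`), the same filter works across catalogs from different providers.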

View Recording: https://youtu.be/BdZbJLQSNFE

Takeaways


Speakers

Dan Pilone

Chief Technologist, Element 84
Dan Pilone is CEO/CTO of Element 84 and oversees the architecture, design, and development of Element 84's projects including supporting NASA, the USGS, Stanford University School of Medicine, and commercial clients. He has supported NASA's Earth Observing System for nearly 13 years...

Aimee Barciauskas

Data engineer, Development Seed

Matthew Hanson

Element 84
STAC


Tuesday January 7, 2020 11:00am - 12:30pm EST
White Flint
  White Flint, Breakout

2:00pm EST

Making a Good First Impression: Metadata Quality Metrics for Earth Observation Data and Information
Metadata is often the first information that a user interacts with when looking for data. Understanding that there is typically only one chance to make a good impression, data and information repositories have placed an emphasis on metadata quality as a way of increasing the likelihood that a user will have a favorable first impression. This session will explore quality metrics, badging or scoring, and metadata quality assessment approaches within the Earth observation community. Discussion questions include:
● Does your organization implement metadata quality metrics and/or scores?
○ What are the key metrics that the scores are based on?
○ What priorities are driving your metadata quality metrics? For example, different repositories have different priorities. These priorities can include an emphasis on discoverability, accessibility, usability, provenance, etc...
● Does your organization make metadata quality scores publicly viewable? What are the pros and cons of making the scores publicly accessible?
How to Prepare for this Session:

Presentations:
https://doi.org/10.6084/m9.figshare.11553606.v1
https://doi.org/10.6084/m9.figshare.11551182.v1

View Recording: https://youtu.be/lbza3gEHmtQ

Takeaways
  • Visualizations of the metadata quality metrics need to be easily understood or well documented to be effective
  • There are diverse ideas and current metrics that are being rolled out soon (U.S. Global Change Research Program & NCA)
  • Ensuring that metrics interact with existing standards such as FAIR is also important

Speakers

Amrutha Elamparuthy

GCIS Data Manager, U.S. Global Change Research Program


Tuesday January 7, 2020 2:00pm - 3:30pm EST
Forest Glen
  Forest Glen, Breakout

2:00pm EST

Data Skills & Competencies Requirements for Data Stewards: Views from the ESIP Community & Beyond
At the 2019 ESIP Summer Meeting, many ESIP community members offered their feedback on the range and importance of skills and competencies for data specialists whose job responsibilities focus upon offering data "advice" (e.g., from data curators) and data "service providers" (e.g., from data librarians). By means of an interactive poster, participants were asked to choose whether a competency was of high, medium, low or no importance from a subset of competencies identified by a European Open Science Cloud (EOSC) project. In this session, session leaders will present the results of the ESIP community feedback within the context of the full list of EOSC competencies, and visualized from both a poster synthesis and a research data lifecycle point of view. Session leaders are hoping to have the audience participate by providing feedback and engaging in discussion on the data and views presented. One outcome of this work will be a "Career Compass" to be published by the American Geosciences Institute for students interested in becoming data stewards. How to Prepare for this Session:

Presentations:

View Recording: https://youtu.be/1s1L3Jter8w

Takeaways



Speakers

Karl Benedict

Director of Research Data Services & Information Technology, University of New Mexico
Since 1986 I have had parallel careers in Information Technology, Data Management and Analysis, and Archaeology. Since 1993 when I arrived at UNM I have worked as a Graduate Student in Anthropology, Research Scientist, Research Faculty, Applied Research Center Director, and currently...


Tuesday January 7, 2020 2:00pm - 3:30pm EST
Linden Oak
  Linden Oak, Breakout

2:00pm EST

Current Data that are available on the Cloud
NASA, NOAA and USGS are in the process of moving data onto the cloud. While they have discussed what types of services are available and future plans of what data can be found, it is not completely clear what datasets users can currently access. This session will go over what datasets are currently up in the cloud and what data to expect in the near future. This way as users are transitioning to the cloud for their compute, they can also know what data are available to them on the cloud as well. There will also be presentations from AWS.
Speakers:
Katie Baynes - NASA/EOSDIS
Jon O'Neil - NOAA
Jeff de La Beaujardiere - NCAR
Kristi Kliene - USGS/EROS
Joe Flasher - AWS

Presentations: See attached.

View Recording: https://youtu.be/yssgXB7iaxw

Takeaways
  • Petabyte scale data is being moved into the cloud. This is concentrated in AWS, Google Cloud and Microsoft depending on the agency and dataset
  • Some concern around partnerships with companies (AWS most discussed) in terms of long term relationships, moving data etc. and how those things might impact access or data use
  • Need to make clear the authoritative source of the data, who is stewarding it, and any modifications done when copying to cloud. Users should exercise due diligence in selecting and using data.



Speakers

Jon O'Neil

Director, NOAA Big Data Program, NOAA

Joe Flasher

Open Geospatial Data Lead, Amazon Web Services
Joe Flasher is the Open Geospatial Data Lead at Amazon Web Services helping organizations most effectively make data available for analysis in the cloud. The AWS open data program has democratized access to petabytes of data, including satellite imagery, genomic data, and data used...

Christopher Lynnes

Systems Architect, NASA/EOSDIS, NASA/GSFC
Christopher Lynnes is currently System Architect for NASA’s Earth Observing System Data and Information System, known as EOSDIS. He has been working on EOSDIS since 1992, over which time he has worked multiple generations of data archive systems, search engines and interfaces, science...

Jessica Hausman

Data Engineer, PO.DAAC JPL

Jeff de La Beaujardière

Director, Information Systems Division, NCAR
Big data, cloud computing, object storage, data management.

Dave Meyer

GES DISC manager, NASA


Tuesday January 7, 2020 2:00pm - 3:30pm EST
White Flint
  White Flint, Breakout

4:00pm EST

Bringing Science Data Uncertainty Down to Earth - Sub-orbital, In Situ, and Beyond
In the Fall of 2019, the Information Quality Cluster (IQC) published a white paper entitled “Understanding the Various Perspectives of Earth Science Observational Data Uncertainty”. The intention of this paper is to provide a diversely sampled exposition of both prolific and unique policies and practices, applicable in an international context of diverse policies and working groups, made toward quantifying, characterizing, communicating and making use of uncertainty information throughout the diverse, cross-disciplinary Earth science data landscape; to these ends, the IQC addressed uncertainty information from the following four perspectives: Mathematical, Programmatic, User, and Observational. These perspectives affect policies and practices in a diverse international context, which in turn influence how uncertainty is quantified, characterized, communicated and utilized. The IQC is now in a scoping exercise to produce a follow-on paper that is intended to provide a set of recommendations and best practices regarding uncertainty information. It is our hope that we can consider and examine additional areas of opportunity with regard to the cross-domain and cross-disciplinary aspects of Earth science data. For instance, the existing white paper covers uncertainty information from the perspective of satellite-based remote sensing well, but does not adequately address the in situ or airborne (i.e., sub-orbital) perspective. This session intends to explore such opportunities to expand the scope of the IQC’s awareness of what is being done with regard to uncertainty information, while also providing participants and observers with an opportunity to weigh in on how best to move forward with the follow-on paper. How to Prepare for this Session:

Agenda:
  1. "IQC Uncertainty White Paper Status Summary and Next Steps" - Presented by: David Moroni (15 minutes)
  2. "Uncertainty quantification for in situ ocean data: The S-MODE sub-orbital campaign" - Presented by: Fred Bingham (15 minutes)
  3. "Uncertainty Quantification for Spatio-Temporal Mapping of Argo Float Data" - Presented by Mikael Kuusela (20 minutes)
  4. Panel Discussion (35 minutes)
  5. Closing Comments (5 minutes)
Notes Page: https://docs.google.com/document/d/1vfYBK_DLTAt535kMZusTPVCBAjDqptvT0AA5D6oWrEc/edit?usp=sharing

Presentations:
https://doi.org/10.6084/m9.figshare.11553681.v1

View Recording: https://youtu.be/vC2O8FRgvck

Takeaways

Speakers

David Moroni

Data Stewardship and User Services Team Lead, Jet Propulsion Laboratory, Physical Oceanography Distributed Active Archive Center
I am a Senior Science Data Systems Engineer at the Jet Propulsion Laboratory and Data Stewardship and User Services Team Lead for the PO.DAAC Project, which provides users with data stewardship services including discovery, access, sub-setting, visualization, extraction, documentation...

Ge Peng

Research Scholar, CISESS/NCEI
Dataset-centric scientific data stewardship, data quality management

Fred Bingham

University of North Carolina at Wilmington

Mikael Kuusela

Carnegie Mellon University


Tuesday January 7, 2020 4:00pm - 5:30pm EST
Forest Glen

4:00pm EST

Defining the Bull's Eye of Sample Metadata
In recent years, the integration of physical collections and samples into digital data infrastructure has received increased attention in the context of Open Science and FAIR research results. In order to support open, transparent, and reproducible science, physical samples need to be uniquely identified, findable in online catalogues, well documented, and linked to related data, publications, people, and other relevant digital information. Substantial progress has been made through widespread implementation of the IGSN as a persistent unique identifier. What is missing is the development and implementation of protocols and best practices for sample metadata. Efforts to do this have shown that it is impossible to develop a common vocabulary that describes all samples collected: one size does not fit all, and each domain (e.g. soil scientists, volcanologists, cosmochemists, paleoclimate scientists, and granite researchers, to name a few examples) has its own vocabulary. Yet there is a minimum set of attributes that are common to all samples, the ‘Bull’s Eye of sample metadata’. This session invites participants from all walks of Earth and environmental science to help define the minimum set of attributes needed to describe the physical samples that are at the heart of much of Earth and environmental research.

How to Prepare for this Session:
Participants should come with a list of the minimum metadata requirements for their institutions or domains. They should be prepared to give a brief introduction to their needs.

Session Agenda:
  1. Introduction to the issue
  2. Review of existing examples and discussion of the limitations
  3. Discuss minimal requirements; propose changes/addition
  4. Summarize outcomes and discuss next steps
Google doc with the current metadata list and proposed changes

Presentations:

View Recording: https://youtu.be/bxhTmrNqkCA

Takeaways

Speakers

Lesley Wyborn

Adjunct Fellow, Australian National University

Kerstin Lehnert

President, IGSN e.V.
Kerstin Lehnert is Senior Research Scientist at the Lamont-Doherty Earth Observatory of Columbia University and Director of EarthChem, the System for Earth Sample Registration, and the Astromaterials Data System. Kerstin holds a Ph.D in Petrology from the University of Freiburg in...


Tuesday January 7, 2020 4:00pm - 5:30pm EST
Linden Oak
  Linden Oak, Working Session
 
Wednesday, January 8
 

2:00pm EST

FAIR Laboratory Instrumentation, Analytical Procedures, and Data Quality
Acquisition and analysis of data in the laboratory are pervasive in the Earth, environmental, and planetary sciences. Analytical and experimental laboratory data, often acquired with sophisticated and expensive instrumentation, are fundamental for understanding past, present, and future processes in natural systems, from the interior of the Earth to its surface environments on land, in the oceans, and in the air, to the entire solar system. Despite the importance of provenance information for analytical data including, for example, sample preparation or experimental set up, instrument type and configuration, calibration, data reduction, and analytical uncertainties, there are no consistent community-endorsed best practices and protocols for describing, identifying, and citing laboratory instrumentation and analytical procedures, and documenting data quality. This session is intended as a kick-off working session to engage researchers, data managers, and system engineers to contribute ideas on how to move forward with and accelerate the development of global standard protocols and the promulgation of best practices for analytical laboratory data. How to Prepare for this Session:

Presentations:

View Recording:
https://youtu.be/LOfb_4r7DBA

Takeaways
  • Analytical and experimental data are collected widely in both field and laboratory settings across the Earth, environmental, and planetary sciences, spanning a variety of disciplines. FAIR use of such data is dependent on data provenance.
  • Community exchange of such data is needed, given that use of the data is broader than its original use within the domain. This brings to mind the interoperability of such data. Networks of these data need to be plugged into evolving CI systems. In seismology, a common data standard implemented by early visionaries was a massive boon to the field.
  • Documenting how analytical data were generated is time consuming for data curators, providers, etc. Standards and protocols for data exchange are urgently required for emerging global data networks. OneGeochemistry is an example use case of an international research group working to establish a global network for discoverable geochemical data.


Speakers

Lesley Wyborn

Adjunct Fellow, Australian National University

Kerstin Lehnert

President, IGSN e.V.
Kerstin Lehnert is Senior Research Scientist at the Lamont-Doherty Earth Observatory of Columbia University and Director of EarthChem, the System for Earth Sample Registration, and the Astromaterials Data System. Kerstin holds a Ph.D in Petrology from the University of Freiburg in...


Wednesday January 8, 2020 2:00pm - 3:30pm EST
Forest Glen
  Forest Glen, Working Session

2:00pm EST

Citizen Science Data and Information Quality
The ESIP Information Quality Cluster (IQC) has formally defined information quality as a combination of the following four aspects of quality, spanning the full life cycle of data products: scientific quality, product quality, stewardship quality, and service quality. The focus of the IQC has been the quality of Earth science data captured by scientists/experts. For example, the white paper “Understanding the Various Perspectives of Earth Science Observational Data Uncertainty”, published by the IQC in the fall of 2019, mainly addresses uncertainty information from the perspective of satellite-based remote sensing. With the advance of mobile computing technologies, including smart phones, Citizen Science (CS) data have become increasingly important sources for Earth science research. CS data have their own unique challenges regarding data quality, compared with data captured through traditional scientific approaches. The purpose of this session is to broaden the scope of IQC efforts, present the community with the state of the art of research on CS data quality, and foster a collaborative interchange of technical information intended to help advance the assessment, improvement, capturing, conveying, and use of quality information associated with CS data. This session will summarize the scope of what we mean by CS data (including examples of platforms/sensors commonly used in collecting CS data) and include presentations from both past and current CS projects focusing on topics such as challenges with CS data quality; strategies to assess, ensure, and improve CS data quality; approaches to capturing CS data quality information and conveying it to users; and use of CS data quality information for scientific discovery.

Agenda (Click titles to view presentations)
  1. Introduction - Yaxing Wei - 5 mins
  2. Citizen Science Data Quality: The GLOBE Program – Helen M. Amos (NASA GSFC) – 18 (15+3) mins.
  3. Can we trust the power of the crowd? A look at citizen science data quality from NOAA case studies - Laura Oremland (NOAA) – 18 (15+3) mins.
  4. Turning Citizen Science into Community Science - Stephen C. Diggs (Scripps Institution of Oceanography / UCSD) and Andrea Thomer (University of Michigan)  – 18 (15+3) mins.
  5. Earth Challenge 2020: Understanding and Designing for Data Quality at Scale - Anne Bowser (Wilson Center) – 18 (15+3) mins.
  6. Discussion and Key Takeaways – All – 13 mins.

View Recording: https://youtu.be/xaTLP4wqwe8

Takeaways

Notes Page:
https://docs.google.com/document/d/1lRp19SF9U727ureKjY38PHOF3EGUgE-BixYDs2KlmII/edit?usp=sharing

Presentation Abstracts

  • Citizen Science Data Quality: The GLOBE Program - Helen M. Amos (NASA GSFC)
The Global Learning and Observations to Benefit the Environment (GLOBE) Program is an international program that provides a way for students and the public to contribute Earth system observations. Currently 122 countries, more than 40,000 schools, and 200,000 citizen scientists are participating in GLOBE. Since 1995, participants have contributed 195 million observations. Modes of data collection and data entry have evolved with technology over the lifetime of the program, including the launch of the GLOBE Observer mobile app in 2016 to broaden access and public participation in data collection. GLOBE must meet the data needs of a diverse range of stakeholders, from elementary school classrooms to scientists across the globe, including NASA scientists. Operational quality assurance measures include participant training, adherence to standardized data collection protocols, range and logic checks, and an approval process for photos submitted with an observation. In this presentation, we will discuss the current state of operational data QA/QC, as well as additional QA/QC processes recently explored and future directions. 
  • Can we trust the power of the crowd? A look at citizen science data quality from NOAA case studies - Laura Oremland (NOAA)
NOAA has a rich history in citizen science dating back hundreds of years.  Today NOAA’s citizen science covers a wide range of topics such as weather, oceans, and fisheries with volunteers contributing over 500,000 hours annually to these projects. The data are used to enhance NOAA’s science and monitoring programs.   But how do we know we can trust these volunteer-based efforts to provide data that reflect the high standards of NOAA’s scientific enterprise? This talk will provide an overview of NOAA’s citizen science, describe the data quality assurance and quality control processes applied to different programs, and summarize common themes and recommendations for collecting high quality citizen science data. 
  • Earth Challenge 2020: Understanding and Designing for Data Quality at Scale - Anne Bowser (Wilson Center)
April 22nd, 2020 marks the 50th anniversary of Earth Day. In recognition of this milestone, Earth Day Network, the Woodrow Wilson International Center for Scholars, and the U.S. Department of State are launching Earth Challenge 2020 as the world’s largest coordinated citizen science campaign. For 2020, the project focuses on six priority areas: air quality, water quality, insect populations, plastics pollution, food security, and climate change. For each of these six areas, one work stream will focus on collaborating with existing citizen science projects to increase the amount of open and findable, accessible, interoperable, and reusable (FAIR) data. A second work stream will focus on designing tools to support both existing and new citizen science activities, including a mobile application for data collection; an open, API-enabled data integration platform; data visualization tools; and a metadata repository and data journal.
A primary value of Earth Challenge 2020 is recognizing, and elevating, ongoing citizen science activities.  Our approach seeks first to document a range of data quality practices that citizen science projects are already using to help the global research and public policy community understand these practices and assess fitness-for-use.  This information will be captured primarily through the metadata repository and data journal.  In addition, we are leveraging a range of data quality solutions for the Earth Challenge 2020 mobile app, including designing automated data quality checks and leveraging a crowdsourcing platform for expert-based data validation that will help train machine learning (ML) support.  Many of the processes designed for Earth Challenge 2020 app data can also be applied to other citizen science data sets, so maintaining information on processing level, readiness level, and provenance is a critical concern.  The goal of this presentation is to offer an overview of key Earth Challenge 2020 data documentation and data quality practices before inviting the ESIP community to offer concrete feedback and support for future work.
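
The GLOBE abstract above lists range and logic checks among its operational quality assurance measures, and the Earth Challenge 2020 abstract mentions automated data quality checks. A minimal sketch of what such checks can look like follows. The `Observation` fields, the numeric limits, and the `qa_flags` helper are hypothetical and do not describe either program's actual implementation.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Observation:
    """A hypothetical volunteer-submitted observation."""
    air_temp_c: float
    cloud_cover_pct: float
    observed_at: datetime
    submitted_at: datetime


def qa_flags(obs: Observation) -> List[str]:
    """Return human-readable flags; an empty list means all checks passed."""
    flags: List[str] = []

    # Range checks: values must fall inside plausible bounds (assumed limits).
    if not -90.0 <= obs.air_temp_c <= 60.0:
        flags.append("air_temp_c out of range")
    if not 0.0 <= obs.cloud_cover_pct <= 100.0:
        flags.append("cloud_cover_pct out of range")

    # Logic check: an observation cannot be made after it was submitted.
    if obs.observed_at > obs.submitted_at:
        flags.append("observed_at is later than submitted_at")

    return flags


if __name__ == "__main__":
    obs = Observation(
        air_temp_c=140.0,  # likely entered in Fahrenheit by mistake
        cloud_cover_pct=40.0,
        observed_at=datetime(2020, 1, 8, 14, 0),
        submitted_at=datetime(2020, 1, 8, 14, 5),
    )
    for flag in qa_flags(obs):
        print(flag)
```

Returning flags rather than rejecting a record outright keeps the original observation available for later expert review.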

Speakers

David Moroni

Data Stewardship and User Services Team Lead, Jet Propulsion Laboratory, Physical Oceanography Distributed Active Archive Center
I am a Senior Science Data Systems Engineer at the Jet Propulsion Laboratory and Data Stewardship and User Services Team Lead for the PO.DAAC Project, which provides users with data stewardship services including discovery, access, sub-setting, visualization, extraction, documentation...

Ge Peng

Research Scholar, CISESS/NCEI
Dataset-centric scientific data stewardship, data quality management

Yaxing Wei

Scientist, Oak Ridge National Laboratory


Wednesday January 8, 2020 2:00pm - 3:30pm EST
Linden Oak
  Linden Oak, Breakout

4:00pm EST

Developing, Using and Testing Tools to Assess Learning Resources from two Perspectives: the Teacher and the Learner
Session leaders will describe tools being developed to assess the learning resources in ESIP's Data Management Training Clearinghouse (DMTC) from the perspectives of both instructors and students. The feedback collected through these tools will aid in identifying and choosing resources appropriate for their needs. First efforts have been focused on using DataONE's EEVA tool to identify and adapt questions. Feedback will be requested from participants to help guide the content, look and feel of the tool. How to Prepare for this Session: Visiting ESIP's Data Management Training Clearinghouse (https://dmtclearinghouse.esipfed.org) would be helpful but not required for productive participation in the session.

Presentations:

View Recording: https://youtu.be/uc4tbjyePpI

Takeaways


Speakers

Karl Benedict

Director of Research Data Services & Information Technology, University of New Mexico
Since 1986 I have had parallel careers in Information Technology, Data Management and Analysis, and Archaeology. Since 1993 when I arrived at UNM I have worked as a Graduate Student in Anthropology, Research Scientist, Research Faculty, Applied Research Center Director, and currently...


Wednesday January 8, 2020 4:00pm - 5:30pm EST
Glen Echo
  Glen Echo, Working Session

4:00pm EST

Citizen Science Data in Earth Science: Challenges and Opportunities
Citizen science is scientific data collection and research performed primarily or in part by non-professional and amateur scientists. Citizen science data has been used in a variety of the physical sciences, including physics, ecology, biology, and water quality. As volunteer-contributed datasets continue to grow, they represent a unique opportunity to collect and analyze earth-science data on spatial and temporal scales impossible to achieve by individual researchers. This session will explore the ways open citizen science data sets can be used in earth science research and some of the associated challenges and opportunities for the ESIP community to use and partner with citizen science organizations.

Speakers:

View Recording: https://youtu.be/jTNgWZI6Cik

Takeaways


How to Prepare for this Session: https://www.nationalgeographic.org/encyclopedia/citizen-science/
http://www.earthsciweek.org/citizen-science

Speakers

Alexis Garretson

Community Fellow, ESIP

Kelsey Breseman

Archiving Program Lead, Environmental Data & Governance Initiative
Governmental accountability around public data & the environment. Decentralized web. Intersection of tech & ethics & civics.


Wednesday January 8, 2020 4:00pm - 5:30pm EST
Linden Oak
  Linden Oak, Breakout

4:00pm EST

Structured data web and coverages integration working session
This working session will follow on the "Advancing Data Integration approaches of the structured data web" session and the Coverage Analytics sprint as an opportunity for those interested in building linked data information products that integrate spatial features, coverage data, and more. As such, inspiration will be drawn from projects like science on schema.org, the Environmental Linked Features Interoperability Experiment, the Australian Location Index, and those that session attendees take part in. Participants will self-organize into use-case or technology-focused groups to discuss and synthesize the outcomes of the sprint and structured data web session. Session outcomes could take a number of forms: linked data and web page mock-ups, ideas and issues for OGC, W3C, or ESIP groups to consider, example data or use cases for relevant software development projects to consider, or work plans and proposals for future ESIP work. The session format is expected to be fluid, with an ideation and group formation exercise followed by structured discussion to explore a set of ideas and then narrow in on a focused, valuable outcome. Participants will be encouraged to work together prior to the meeting to design and plan the session structure. Outcomes of the session will be reported at an Information Technology and Interoperability webinar in early 2020. How to Prepare for this Session: Attend the coverage sprint and the "Advancing Data Integration approaches of the structured data web" session.

Shared document for session here.

Full Notes: https://doi.org/10.6084/m9.figshare.11559087.v1

Presentations:

View Recording: https://youtu.be/u2x3I0cr46A

Takeaways
  • Breakout session information interoperability committee and webinar series. See notes: https://docs.google.com/document/d/1LpcTMwP0mAD4G4Gb8mStI5uSDV61_qWPUkQ9nI1x1cI/edit?usp=sharing
  • Foster cross-project consistency via breakouts, such as dealing with the science on schema.org issue of links to “in-band” linked (meta)data and “out-of-band” linked data. Content negotiation and in-band and out-of-band links: use blank nodes with link properties for RDF elements that are URIs for out-of-band content. Identify in-band links with sdo @id and out-of-band links with sdo:URL.
  • Incorporating Spatial Coverages in Knowledge Graphs; Next Steps? Need to explore more on tessellations as an intermediate index. Will carry forward some of these ideas at the EDR SWG. Will represent some of these ideas to the OGC-API Coverages SWG. Will mention these ideas to the UFOKN. Role of ‘spatial’ knowledge graphs: will spatial data analysis and transformation tools grow to adopt/support RDF as an underlying data structure for spatial information, or will RDF continue to be a ‘view’ of existing (legacy) spatial data in GI systems?
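
The in-band versus out-of-band distinction in the takeaways above can be sketched in a few lines of Python. This is only one possible reading of the session notes, not the group's agreed pattern: the example URLs, the choice of the `variableMeasured` and `subjectOf` properties, and the `classify_links` helper are all assumptions made for illustration. An in-band link is modeled as a reference by `@id` to a node described in the same document; an out-of-band link is modeled as a blank node (no `@id`) carrying a schema.org `url` property.

```python
import json

# Hypothetical JSON-LD document; none of these URLs come from the session notes.
dataset = {
    "@context": {"@vocab": "https://schema.org/"},
    "@graph": [
        {
            "@id": "https://example.org/dataset/1",
            "@type": "Dataset",
            "name": "Example dataset",
            # In-band link: refers by @id to a node described in this same document.
            "variableMeasured": {"@id": "https://example.org/dataset/1#temperature"},
            # Out-of-band link: a blank node (no @id) whose url property
            # points at content that lives outside this document.
            "subjectOf": {
                "@type": "DataDownload",
                "url": "https://example.org/dataset/1/metadata.xml",
            },
        },
        {
            "@id": "https://example.org/dataset/1#temperature",
            "@type": "PropertyValue",
            "name": "temperature",
        },
    ],
}


def classify_links(doc):
    """Split object-valued properties into in-band and out-of-band links."""
    # Every @id that is actually described somewhere in the document.
    described = {node["@id"] for node in doc["@graph"] if "@id" in node}
    in_band, out_of_band = [], []
    for node in doc["@graph"]:
        for key, value in node.items():
            if not isinstance(value, dict):
                continue
            if "@id" in value and value["@id"] in described:
                in_band.append((key, value["@id"]))
            elif "@id" not in value and "url" in value:
                out_of_band.append((key, value["url"]))
    return in_band, out_of_band


in_band, out_of_band = classify_links(dataset)
print(json.dumps({"in_band": in_band, "out_of_band": out_of_band}, indent=2))
```

A harvester following this convention could resolve in-band references from the document it already holds and fetch only the out-of-band URLs.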


Speakers

Adam Shepherd

Technical Director, Co-PI, BCO-DMO
schema.org | Data Containerization | Linked Data | Semantic Web | Knowledge Representation | Ontologies

Irina Bastrakova

Director, Spatial Data Architecture, Geoscience Australia
I have been actively involved with international and national geoinformatics communities for more than 19 years. I am the Chair of the Australian and New Zealand Metadata Working Group. My particular interest is in developing and practical application of geoscientific and geospatial...

William Francis

Geoscience Australia

Jonathan Yu

Research data scientist/architect, CSIRO
Jonathan is a data scientist/architect with the Environmental Informatics group in CSIRO. He has expertise in information and web architectures, data integration (particularly Linked Data), data analytics and visualisation. Dr Yu is currently the technical lead for the Loc-I project...

Doug Fils

Consortium for Ocean Leadership

David Blodgett

U.S. Geological Survey


Wednesday January 8, 2020 4:00pm - 5:30pm EST
White Flint
 
Thursday, January 9
 

10:15am EST

Working Group for the Data Stewardship Committee
This session is a working group for the 2020-2021 year for the Data Stewardship Committee. We will discuss priorities for the next year and potential collaborative outputs, and review the work in progress from the last year.

Notes Document: https://docs.google.com/document/d/1B_0K5jGnFgH72U3P2-oGr5vEqHOGU8CWU-IkZ6pjXbM/edit?ts=5e174588

Presentations

View Recording: https://youtu.be/am-ZLfHgM4w

Takeaways
  • Wow, the members of the Committee really are active! Practically everyone has their own cluster or two!
  • Six activities proposed for the upcoming year have champions who will lead the effort to define the outputs of their selected activity.


Speakers

Alexis Garretson

Community Fellow, ESIP

Kelsey Breseman

Archiving Program Lead, Environmental Data & Governance Initiative
Governmental accountability around public data & the environment. Decentralized web. Intersection of tech & ethics & civics.


Thursday January 9, 2020 10:15am - 11:45am EST
Forest Glen
  Forest Glen, Business Meeting

10:15am EST

Identifying ESIP
Permanent Identifiers (PIDs) make connections across the scholarly community possible. We are familiar with DOIs for data, but how about ORCIDs for people or RORs for organizations? How is the ESIP community using identifiers and how can we benefit from that usage?
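
As a rough illustration of how the three identifier types differ in shape, the Python sketch below classifies a string as a DOI, an ORCID, or a ROR ID. The regular expressions are assumptions based on the publicly documented forms of each identifier, the `classify_pid` and `orcid_checksum_ok` helpers are hypothetical names, and the sample ROR string is used only as a syntactically valid example. The ORCID check digit uses the ISO 7064 MOD 11-2 scheme; the ROR checksum is not verified here.

```python
import re

# Assumed patterns for the bare (prefix-free) form of each identifier.
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")
ORCID_RE = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")
ROR_RE = re.compile(r"^0[a-hj-km-np-tv-z0-9]{6}\d{2}$")

# Resolver prefixes that are stripped before matching.
PREFIXES = ("https://doi.org/", "https://orcid.org/", "https://ror.org/")


def orcid_checksum_ok(orcid):
    """Verify the final ORCID character with ISO 7064 MOD 11-2."""
    digits = orcid.replace("-", "")
    total = 0
    for ch in digits[:-1]:
        total = (total + int(ch)) * 2
    check = (12 - total % 11) % 11
    return digits[-1] == ("X" if check == 10 else str(check))


def classify_pid(value):
    """Return 'DOI', 'ORCID', 'ROR', or 'unknown' for an identifier string."""
    bare = value.strip()
    for prefix in PREFIXES:
        if bare.startswith(prefix):
            bare = bare[len(prefix):]
            break
    if DOI_RE.match(bare):
        return "DOI"
    if ORCID_RE.match(bare) and orcid_checksum_ok(bare):
        return "ORCID"
    if ROR_RE.match(bare):
        return "ROR"
    return "unknown"


for sample in (
    "https://doi.org/10.6084/m9.figshare.11794182.v1",  # DOI listed on this page
    "https://orcid.org/0000-0002-1825-0097",            # ORCID's fictitious test record
    "https://ror.org/03yrm5c26",                        # illustrative ROR-shaped ID
):
    print(sample, "->", classify_pid(sample))
```

Pattern matching like this only checks shape; confirming that an identifier actually resolves to a person or organization would require a lookup against the relevant registry.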

This is the first report from the Identifying ESIP Connections Funding Friday Project that started last summer. The focus so far has been on identifying organizations associated with ESIP using the Research Organization Registry. During this session we will introduce identifiers at four levels: U.S. Federal Agencies and Departments, ESIP Sponsors, ESIP Members, and ESIP Participants. Information on all of these levels is available on the ESIP Wiki.
  1. Maria Gould, the ROR Project lead at the California Digital Library, will fill us in on ROR and answer questions about RORs. (Presentation)
  2. Ted Habermann, the PI of Identifying ESIP Connections, will discuss this work and lead a working discussion of RORs.

Click here to participate: http://wiki.esipfed.org/index.php/Category:Identifying_ESIP_Connections


Presentations
https://doi.org/10.6084/m9.figshare.11794182.v1

View Recording: https://youtu.be/iUYmTaDdJGQ

Takeaways
  • Generally positive attitude about using identifiers for organizations, but not all organizations in ESIP may end up with RORs...
  • The granularity of RORs is an ongoing challenge and spans many issues: multi-organization projects and changes as a function of time.
  • How are research organizations defined? Do repositories have RORs? Wiki pages were a good way to share information.



Speakers

Ted Habermann

Chief Game Changer, Metadata Game Changers
I am interested in all facets of metadata needed to discover, access, use, and understand data of any kind. Also evaluation and improvement of metadata collections, translation proofing. Ask me about the Metadata Game.


Thursday January 9, 2020 10:15am - 11:45am EST
Linden Oak
  Linden Oak, Breakout

10:15am EST

Mapping Data & Operational Readiness Levels (ORLs) to Community Lifelines
Approach: The Disaster Lifecycle Cluster has seen great success in its efforts to put Federated arms around “trusted data for decision makers” as a way to accelerate situational awareness and decision-making by identifying trust levels for data. This session will build upon the Summer meeting and align with the overall ESIP theme of Data to Action: Increasing the Use and Value of Earth Science Data and Information.

The ESIP Disaster Lifecycle Cluster has evolved into one of the most operationally active clusters in the Federation, with a thirst for applying datasets to decision-making environments while building trust levels that manifest themselves as ORLs. Duke Energy, All Hazards Consortium’s Sensitive Information Sharing Environment (SISE), DHS, and FEMA are all increasing their interest in ORLs with their sights set on implementing them in the near future. Data is available everywhere and more of it is on the way. Trusted data is available in some places and can help decision makers such as utilities make 30-second decisions that can save lives and property and get the lights back on sooner, saving millions of dollars.

This session will provide the venue to discuss emerging projects from NASA’s Applied Sciences Division (A.37), Initiatives at JPL and Federal Agency data portal access that can accelerate decision making today and in the future. We will also discuss drone data and European satellite data that is available for access and use when disasters threaten. Come and join us, the data you have may just save a life.

Agenda:
  1. Greg McShane, DHS CISA - The Critical Nature of the Public-Private Trusted Information Sharing Paradigm (10 min) Presented by Tom Moran, All Hazards Consortium Executive Director
  2. Dave Jones, StormCenter/GeoCollaborate - The status of ORLs, where we are, ESIP Announcement at GEO in Australia, AHC SISE, Next Steps (10 min)
  3. Maggi Glassco, NASA Disasters Program, JPL - New Applied Sciences Disasters Projects, Possible Lifeline Support Information Sources in the Future (10 min)
  4. Bob Chen/Bob Downs, Columbia Univ./SEDAC/CIESIN - Specific Global and Local Population Data for Community Lifeline Decision Making (10 min)
  5. Discussion/Q&A Period (40 min)

Presentations

View Recording: https://youtu.be/gJ93R6SlMkM

Key Takeaways for this Session: 
  1. Through the All Hazards Consortium, a new research institute will begin to help bring candidate research products into operations. An imagery committee, consisting of private and research members under SISE, will identify and evaluate use-case driven candidate imagery data within the ORL context using Geo-Collaborate.
  2. NASA grant opportunities within the disasters program require co-funding by end user partners to guide usage needs and adoption (using ARL success criteria). This should increase adoption of NASA funded ASP project data and/or services. The cluster would like to work with NASA ASP as a testbed for funded projects to connect to additional user communities.
  3. We discussed the need / value of population data (current and predictions on affected populations) for preparedness activities and emergency response. We would like to leverage additional data services from SEDAC to test with operational decision makers. 


Speakers

Dave Jones

StormCenter Communications, StormCenter Communications
Real-time data access, sharing and collaboration across multiple platforms. Collaborative Common Operating Pictures, Decision Making, Situational Awareness, connecting disparate mapping systems to share data, cross-product data sharing and collaboration. SBIR Phase III status with...

Karen Moe

NASA Goddard Emeritus
ESIP Disasters Lifecycle cluster co-chair with Dave Jones/StormCenter Inc. Managing an air quality monitoring project for my town just outside of Washington DC and looking for free software!! Enjoying citizen science roles in environmental monitoring and sustainable practices in my...


Thursday January 9, 2020 10:15am - 11:45am EST
Salon A-C
  Salon A-C, Breakout

12:00pm EST

License Up! What license works for you and your downstream repositories?
Many repositories are seeing an increase in the use and diversity of licenses and other intellectual property management (IPM) tools applied to externally-created data submissions and software developed by staff. However, adding a license to data files may have unexpected or unintended consequences in the downstream use or redistribution of those data. Who “owns” the intellectual property rights to data collected by university researchers using Federal and State (i.e., public) funding that must be deposited at a Federal repository? What license is appropriate for those data and what — exactly — does that license allow and disallow? What kind of license or other IPM instrument is appropriate for software written by a team of Federal and Cooperative Institute software engineers? Is there a significant difference between Creative Commons, GNU, and other ‘open source licenses’?

We have invited a panel of legal advisors from Federal and other organizations to discuss the implications of these questions for data stewards and the software teams that work collaboratively with those stewards. We may also discuss the latest information about Federal data licenses as it applies to the OPEN Government Data Act of 2019. How to Prepare for this Session: Consider what, if any, licenses, copyright, or other intellectual property rights management you apply or think applies to your work. Also consider Federal requirements such as the OPEN Government Data Act of 2019 and Section 508 of the Rehabilitation Act of 1973.

Speakers:
Dr. Robert J. Hanisch is the Director of the Office of Data and Informatics, Material Measurement Laboratory, at the National Institute of Standards and Technology in Gaithersburg, Maryland. He is responsible for improving data management and analysis practices and helping to assure compliance with national directives on open data access. Prior to coming to NIST in 2014, Dr. Hanisch was a Senior Scientist at the Space Telescope Science Institute, Baltimore, Maryland, and was the Director of the US Virtual Astronomical Observatory. For more than twenty-five years Dr. Hanisch led efforts in the astronomy community to improve the accessibility and interoperability of data archives and catalogs.
Henry Wixon is Chief Counsel for the National Institute of Standards and Technology (NIST) of the U.S. Department of Commerce. His office provides programmatic legal guidance to NIST, as well as intellectual property counsel and representation to the Department of Commerce and other Department bureaus. In this role, it interacts with principal developers and users of research, including private and public laboratories, universities, corporations and governments. Responsibilities of Mr. Wixon’s office include review of NIST Cooperative Research and Development Agreements (CRADAs), licenses, Non-Disclosure Agreements (NDAs) and Material Transfer Agreements (MTAs), and the preparation and prosecution of the agency’s patent applications. As Chief Counsel, Mr. Wixon is active in standing Interagency Working Groups on Technology Transfer, on Bayh-Dole, and on Research Misconduct, as well as in the Federal Laboratory Consortium. He is a Certified Licensing Professional and a Past Chair of the Maryland Chapter of the Licensing Executives Society, USA and Canada (LES), and is a member of the Board of Visitors of the College of Computer, Mathematical and Natural Sciences of the University of Maryland, College Park.

Presentations
See attached

View Recording: https://youtu.be/5Ng5FDW1LXk

Takeaways



Speakers

Donald Collins

Oceanographer, NESDIS/NCEI Archive Branch
Send2NCEI, NCEI archival processes, records management


Thursday January 9, 2020 12:00pm - 1:30pm EST
Forest Glen
  Forest Glen, Panel

12:00pm EST

Research Object Citation Cluster Working Session
ESIP has published guidelines for citing data and for citing software and services. These have been important and influential ESIP products. Now a new cluster is working to address the issues of “research object” citation writ large. The cluster has been working to identify the various types of research objects that could or should be cited such as samples, instruments, annotations, and other artifacts. We have also been examining the various concerns that may be addressed in citing the objects such as access, credit or attribution, and scientific reproducibility. We find that citation of different types of objects may need to address different concerns and that different approaches may be necessary for different concerns and objects. We have, therefore, been working through a matrix that attempts to map all the various objects and citation concerns.

In this working session, we will provide a brief overview of the cluster's work to date on determining when different research objects get IDs. We will then work in small groups to determine when different research objects need to be identified to ensure reproducibility or validity of a result. For this purpose, we define reproducibility as the ability to independently recreate or confirm a result (not the data). A result could be a finding in a scientific paper, a legal brief, a policy recommendation, a model output or derived product — essentially any formal, testable assertion. This is essentially a provenance use case. It is very broad, but distinct from the credit and even the access concerns of citation. This is primarily about unambiguous reference. When does an object become a first-class research object?
To approach the problem, we will break up into 4-5 groups to define and give examples of different clusters of research objects and then work to answer: when, or under what circumstances, is it necessary to identify an object to enable reproducibility?
Potential groups include:
  1. Literature and related objects (not to be discussed)
  2. Software and related objects — Dan Katz
  3. Data and related objects — Mark Parsons
  4. Samples — Sarah Ramdeen
  5. Ontologies and vocabularies — Ruth Duerr
  6. Complex research objects (esp. but not exclusively learning resources) — Nancy Hoebelheinrich
  7. Instruments and facilities — Mike Daniels
  8. Organizations 
  9. Activities
We encourage everyone to start drafting definitions and examples in the spreadsheet now: https://docs.google.com/spreadsheets/d/1VEYPLgTsCR_zbMUbThonBrqaYqBiMT4e525NzFi7ql8/edit#gid=1494916301
Our goal is to have a draft recommendation or complete matrix by the end of the meeting as well as potential follow-on activities for the cluster.

How to Prepare for this Session: Participants should be familiar with existing ESIP citation guidelines and have reviewed the minutes of the last several meetings, especially the "Objects and Concerns Matrix". See http://wiki.esipfed.org/index.php/Research_Object_Citation

Presentations

View Recording:
https://youtu.be/5MXzBLu7hjg (abbreviated due to breakout group emphasis of session).

Takeaways
  • What is a ‘thing’? ‘Research object’ is a defined term in other communities. Our conception is broader. Perhaps we need a new term, but much of the issue is defining when something becomes a ‘thing’ that is named and located.
  • The particular citation use case matters a lot. Reproducibility demands different considerations than credit. The cluster will consider more use cases.
  • There appear to be classes of things that can be treated similarly, but we haven’t sorted that out yet.



Speakers

Jessica Hausman

Data Engineer, PO.DAAC JPL

Mark Parsons

Editor in Chief, Data Science Journal


Thursday January 9, 2020 12:00pm - 1:30pm EST
Linden Oak
  Linden Oak, Working Session