Join the 2020 ESIP Winter Meeting Highlights Webinar on Feb. 5th at 3 pm ET for a fast-paced overview of what took place at the meeting. More info here.


Linden Oak
Tuesday, January 7
 

2:00pm EST

Data Skills & Competencies Requirements for Data Stewards: Views from the ESIP Community & Beyond
At the ESIP Summer Meeting 2019, many ESIP community members offered feedback on the range and importance of skills and competencies for data specialists whose job responsibilities focus on offering data "advice" (e.g., data curators) and acting as data "service providers" (e.g., data librarians). By means of an interactive poster, participants were asked to rate each competency as being of high, medium, low, or no importance, choosing from a subset of competencies identified by a European Open Science Cloud (EOSC) project. In this session, session leaders will present the results of the ESIP community feedback within the context of the full list of EOSC competencies, visualized from both a poster synthesis and a research data lifecycle point of view. Session leaders hope the audience will participate by providing feedback and engaging in discussion on the data and views presented. One outcome of this work will be a "Career Compass" to be published by the American Geosciences Institute for students interested in becoming data stewards.

How to Prepare for this Session:

Presentations:

View Recording: https://youtu.be/1s1L3Jter8w

Takeaways



Speakers
Karl Benedict

Director of Research Data Services & Information Technology, University of New Mexico
Since 1986 I have had parallel careers in Information Technology, Data Management and Analysis, and Archaeology. Since 1993 when I arrived at UNM I have worked as a Graduate Student in Anthropology, Research Scientist, Research Faculty, Applied Research Center Director, and currently...


Tuesday January 7, 2020 2:00pm - 3:30pm EST
Linden Oak
  Linden Oak, Breakout
 
Wednesday, January 8
 

11:00am EST

Pangeo in Action
The NSF-funded Pangeo project (http://pangeo.io/) is a community-driven architectural framework for big data geoscience. A typical Pangeo software stack leverages open-development Python libraries, including Jupyter Notebooks for interactive data analysis, Intake catalogs to provide a higher level of abstraction, Dask for scalable, parallelized data access, and Xarray for working with labeled multi-dimensional arrays of data. It can support data formats including NetCDF as well as the cloud-optimized Zarr format for chunked, compressed, N-dimensional arrays.
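To make the phrase "chunked, compressed, N-dimensional arrays" concrete, here is a minimal sketch of the idea using only the Python standard library. It is not the Zarr API or any Pangeo component; the function names, the chunk size, and the "row.col" key scheme are assumptions made for illustration. The point it shows is that a reader can fetch and decompress only the chunk it needs, which is what makes such layouts convenient on object storage.

```python
import zlib
from array import array

CHUNK = 4  # hypothetical chunk edge length; real stores tune this per dataset


def write_chunks(grid, chunk=CHUNK):
    """Split a 2-D list of floats into compressed chunks keyed 'row.col'."""
    store = {}
    n_rows, n_cols = len(grid), len(grid[0])
    for r0 in range(0, n_rows, chunk):
        for c0 in range(0, n_cols, chunk):
            block = array("d")
            # Flatten this chunk row by row before compressing it.
            for row in grid[r0:r0 + chunk]:
                block.extend(row[c0:c0 + chunk])
            key = f"{r0 // chunk}.{c0 // chunk}"
            store[key] = zlib.compress(block.tobytes())
    return store


def read_value(store, r, c, shape, chunk=CHUNK):
    """Read one cell, decompressing only the chunk that holds it."""
    _, n_cols = shape
    key = f"{r // chunk}.{c // chunk}"
    block = array("d")
    block.frombytes(zlib.decompress(store[key]))
    # Chunks on the right edge may be narrower than `chunk`.
    width = min(chunk, n_cols - (c // chunk) * chunk)
    return block[(r % chunk) * width + (c % chunk)]


# A 6 x 10 grid where each cell holds row * 10 + column.
grid = [[float(r * 10 + c) for c in range(10)] for r in range(6)]
store = write_chunks(grid)
corner = read_value(store, 5, 9, shape=(6, 10))
```

In a real Pangeo stack, Xarray and Dask would handle this chunk bookkeeping and parallelize reads across chunks; the dictionary here merely stands in for an object store.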

This session includes presentations describing implementations, results, or lessons learned from using these tools, as well as some time for open discussion. We encourage attendance by people interested in knowing more about Pangeo.

Draft schedule:
Dr. Amanda Tan, U. Washington: Pangeo overview and lessons learned
Dr. Rich Signell, USGS: The USGS EarthMap Pangeo: Success Stories and Lessons Learned
Dr. Jeff de La Beaujardière, NCAR: Climate model outputs on AWS using Pangeo framework
Dr. Karl Benedict, UNM: Pangeo as a platform for workshops
Open discussion

How to Prepare for this Session:

Presentations:
https://doi.org/10.6084/m9.figshare.11559174.v1

View Recording: https://youtu.be/VNfpGIIjL3E

Takeaways
  • Pangeo is a community platform for big data geoscience: a cohesive ecosystem of open community, open source software, and open ecosystem. Three core Python packages: Jupyter, Xarray, Dask.
  • Deploying Pangeo in the cloud faces challenges:
    • Cloud costs
    • Cloud skills
    • The need for cloud-optimized data
    • Choosing the best Pangeo deployment strategy as cloud service platforms change
  • Pangeo can leverage Jupyter notebooks and other resources for different levels of data users (NCAR: scientists new to cloud computing platforms; University of New Mexico: a workshop platform; etc.)

Speakers
Karl Benedict

Director of Research Data Services & Information Technology, University of New Mexico
Since 1986 I have had parallel careers in Information Technology, Data Management and Analysis, and Archaeology. Since 1993 when I arrived at UNM I have worked as a Graduate Student in Anthropology, Research Scientist, Research Faculty, Applied Research Center Director, and currently...
Rich Signell

Oceanographer, USGS
Ocean Modeling, Python, NetCDF, THREDDS, ERDDAP, UGRID, SGRID, CF-Conventions, Jupyter, JupyterHub, CSW, TerriaJS
Amanda Tan

Data Scientist, University of Washington
Cloud computing, distributed systems
Jeff de La Beaujardière

Director, Information Systems Division, NCAR
Big data, cloud computing, object storage, data management.


Wednesday January 8, 2020 11:00am - 12:30pm EST
Linden Oak
  Linden Oak, Breakout

2:00pm EST

Citizen Science Data and Information Quality
The ESIP Information Quality Cluster (IQC) has formally defined information quality as a combination of four aspects of quality spanning the full life cycle of data products: scientific quality, product quality, stewardship quality, and service quality. The focus of the IQC has been the quality of Earth science data captured by scientists and experts. For example, the whitepaper “Understanding the Various Perspectives of Earth Science Observational Data Uncertainty”, published by the IQC in the fall of 2019, mainly addresses uncertainty information from the perspective of satellite-based remote sensing. With the advance of mobile computing technologies, including smartphones, Citizen Science (CS) data have become increasingly important sources for Earth science research. Compared with data captured through traditional scientific approaches, CS data pose unique data quality challenges. The purpose of this session is to broaden the scope of IQC efforts, present the community with the state of the art in research on CS data quality, and foster a collaborative interchange of technical information intended to help advance the assessment, improvement, capturing, conveying, and use of quality information associated with CS data. This session will summarize the scope of what we mean by CS data (including examples of platforms and sensors commonly used in collecting CS data) and include presentations from both past and current CS projects on topics such as challenges with CS data quality; strategies to assess, ensure, and improve CS data quality; approaches to capturing CS data quality information and conveying it to users; and use of CS data quality information for scientific discovery.

Agenda
  1. Introduction - Yaxing Wei - 5 mins
  2. Citizen Science Data Quality: The GLOBE Program – Helen M. Amos (NASA GSFC) – 18 (15+3) mins.
  3. Can we trust the power of the crowd? A look at citizen science data quality from NOAA case studies - Laura Oremland (NOAA) – 18 (15+3) mins.
  4. Turning Citizen Science into Community Science - Stephen C. Diggs (Scripps Institution of Oceanography / UCSD) and Andrea Thomer (University of Michigan)  – 18 (15+3) mins.
  5. Earth Challenge 2020: Understanding and Designing for Data Quality at Scale - Anne Bowser (Wilson Center) – 18 (15+3) mins.
  6. Discussion and Key Takeaways – All – 13 mins.

View Recording: https://youtu.be/xaTLP4wqwe8

Takeaways

Notes Page:
https://docs.google.com/document/d/1lRp19SF9U727ureKjY38PHOF3EGUgE-BixYDs2KlmII/edit?usp=sharing

Presentation Abstracts

  • Citizen Science Data Quality: The GLOBE Program - Helen M. Amos (NASA GSFC)
The Global Learning and Observations to Benefit the Environment (GLOBE) Program is an international program that provides a way for students and the public to contribute Earth system observations. Currently 122 countries, more than 40,000 schools, and 200,000 citizen scientists are participating in GLOBE. Since 1995, participants have contributed 195 million observations. Modes of data collection and data entry have evolved with technology over the lifetime of the program, including the launch of the GLOBE Observer mobile app in 2016 to broaden access and public participation in data collection. GLOBE must meet the data needs of a diverse range of stakeholders, from elementary school classrooms to scientists across the globe, including NASA scientists. Operational quality assurance measures include participant training, adherence to standardized data collection protocols, range and logic checks, and an approval process for photos submitted with an observation. In this presentation, we will discuss the current state of operational data QA/QC, as well as additional QA/QC processes recently explored and future directions. 
  • Can we trust the power of the crowd? A look at citizen science data quality from NOAA case studies - Laura Oremland (NOAA)
NOAA has a rich history in citizen science dating back hundreds of years. Today NOAA’s citizen science covers a wide range of topics such as weather, oceans, and fisheries, with volunteers contributing over 500,000 hours annually to these projects. The data are used to enhance NOAA’s science and monitoring programs. But how do we know we can trust these volunteer-based efforts to provide data that reflect the high standards of NOAA’s scientific enterprise? This talk will provide an overview of NOAA’s citizen science, describe the data quality assurance and quality control processes applied to different programs, and summarize common themes and recommendations for collecting high quality citizen science data.
  • Earth Challenge 2020: Understanding and Designing for Data Quality at Scale - Anne Bowser (Wilson Center)
April 22nd, 2020 marks the 50th anniversary of Earth Day. In recognition of this milestone, Earth Day Network, the Woodrow Wilson International Center for Scholars, and the U.S. Department of State are launching Earth Challenge 2020 as the world’s largest coordinated citizen science campaign. For 2020, the project focuses on six priority areas: air quality, water quality, insect populations, plastics pollution, food security, and climate change. For each of these six areas, one work stream will focus on collaborating with existing citizen science projects to increase the amount of open and findable, accessible, interoperable, and reusable (FAIR) data. A second work stream will focus on designing tools to support both existing and new citizen science activities, including a mobile application for data collection; an open, API-enabled data integration platform; data visualization tools; and a metadata repository and data journal.
A primary value of Earth Challenge 2020 is recognizing, and elevating, ongoing citizen science activities. Our approach seeks first to document a range of data quality practices that citizen science projects are already using to help the global research and public policy community understand these practices and assess fitness-for-use. This information will be captured primarily through the metadata repository and data journal. In addition, we are leveraging a range of data quality solutions for the Earth Challenge 2020 mobile app, including designing automated data quality checks and leveraging a crowdsourcing platform for expert-based data validation that will help train machine learning (ML) support. Many of the processes designed for Earth Challenge 2020 app data can also be applied to other citizen science data sets, so maintaining information on processing level, readiness level, and provenance is a critical concern. The goal of this presentation is to offer an overview of key Earth Challenge 2020 data documentation and data quality practices before inviting the ESIP community to offer concrete feedback and support for future work.
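The abstracts above mention range and logic checks (GLOBE) and automated data quality checks (Earth Challenge 2020) without describing how they are implemented. The following is a minimal sketch of what such checks might look like, not either program's actual procedure. The record fields, the plausible temperature range, and the flag names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TempObservation:
    # Hypothetical fields for a volunteer-submitted temperature record.
    current_c: float
    daily_min_c: float
    daily_max_c: float


# Assumed plausible range for surface air temperature in degrees Celsius.
PLAUSIBLE_C = (-90.0, 60.0)


def qc_flags(obs):
    """Return a list of quality flags; an empty list means all checks passed."""
    flags = []
    low, high = PLAUSIBLE_C
    # Range check: each value must fall inside the plausible interval.
    for name in ("current_c", "daily_min_c", "daily_max_c"):
        if not low <= getattr(obs, name) <= high:
            flags.append(f"range:{name}")
    # Logic check: the values must be consistent with one another.
    if not obs.daily_min_c <= obs.current_c <= obs.daily_max_c:
        flags.append("logic:min<=current<=max")
    return flags


clean = qc_flags(TempObservation(20.0, 12.0, 25.0))
inconsistent = qc_flags(TempObservation(20.0, 22.0, 25.0))
implausible = qc_flags(TempObservation(75.0, 12.0, 25.0))
```

Returning flags rather than rejecting a record outright keeps the observation available, so quality information can be conveyed to users alongside the data.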

Speakers
David Moroni

Data Stewardship and User Services Team Lead, Jet Propulsion Laboratory, Physical Oceanography Distributed Active Archive Center
I am a Senior Science Data Systems Engineer at the Jet Propulsion Laboratory and Data Stewardship and User Services Team Lead for the PO.DAAC Project, which provides users with data stewardship services including discovery, access, sub-setting, visualization, extraction, documentation...
Ge Peng

Research Scholar, CISESS/NCEI
Dataset-centric scientific data stewardship, data quality management
Yaxing Wei

Scientist, Oak Ridge National Laboratory


Wednesday January 8, 2020 2:00pm - 3:30pm EST
Linden Oak
  Linden Oak, Breakout

4:00pm EST

Citizen Science Data in Earth Science: Challenges and Opportunities
Citizen science is scientific data collection and research performed primarily or in part by non-professional and amateur scientists. Citizen science data has been used in a variety of scientific fields, including physics, ecology, biology, and water quality. As volunteer-contributed datasets continue to grow, they represent a unique opportunity to collect and analyze Earth science data on spatial and temporal scales impossible for individual researchers to achieve. This session will explore the ways open citizen science data sets can be used in Earth science research, along with the associated challenges and the opportunities for the ESIP community to use citizen science data and partner with citizen science organizations.

View Recording: https://youtu.be/jTNgWZI6Cik

Takeaways


How to Prepare for this Session: https://www.nationalgeographic.org/encyclopedia/citizen-science/
http://www.earthsciweek.org/citizen-science

Speakers
Alexis Garretson

Community Fellow, ESIP
Kelsey Breseman

Archiving Program Lead, Environmental Data & Governance Initiative
Governmental accountability around public data & the environment. Decentralized web. Intersection of tech & ethics & civics.


Wednesday January 8, 2020 4:00pm - 5:30pm EST
Linden Oak
  Linden Oak, Breakout
 
Thursday, January 9
 

10:15am EST

Identifying ESIP
Permanent Identifiers (PIDs) make connections across the scholarly community possible. We are familiar with DOIs for data, but how about ORCIDs for people or RORs for organizations? How is the ESIP community using identifiers and how can we benefit from that usage?

This is the first report from the Identifying ESIP Connections Funding Friday Project that started last summer. The focus so far has been on identifying organizations associated with ESIP using the Research Organization Registry. During this session we will introduce identifiers at four levels: U.S. Federal Agencies and Departments, ESIP Sponsors, ESIP Members, and ESIP Participants. Information on all of these levels is available on the ESIP Wiki.
  1. Maria Gould, the ROR Project lead at the California Digital Library, will fill us in on ROR and answer questions about RORs. (Presentation)
  2. Ted Habermann, the PI of Identifying ESIP Connections, will discuss this work and lead a working discussion of RORs.
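For readers who want to sanity-check identifiers before recording them, here is a minimal sketch, not part of the session materials. The ORCID check uses the ISO 7064 MOD 11-2 checksum that ORCID documents for its final character. The ROR pattern reflects the published ID shape as best understood here (a leading 0, six base32 characters, two digits) and should be treated as an assumption; it does not verify the ROR checksum or confirm that an ID is registered. Function names are hypothetical.

```python
import re


def orcid_check_char(base15):
    """Compute the ISO 7064 MOD 11-2 check character for 15 ORCID digits."""
    total = 0
    for ch in base15:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Check the format and checksum of an ORCID such as 0000-0002-1825-0097."""
    compact = orcid.replace("-", "")
    if not re.fullmatch(r"\d{15}[\dX]", compact):
        return False
    return orcid_check_char(compact[:15]) == compact[15]


# Assumed ROR ID shape: '0', six base32 characters (no i, l, o, u), two digits.
ROR_SHAPE = re.compile(r"0[a-hj-km-np-tv-z0-9]{6}[0-9]{2}")


def looks_like_ror(ror):
    """Shape check only: accepts a bare ID or a https://ror.org/ URL."""
    ident = ror.rsplit("/", 1)[-1]
    return ROR_SHAPE.fullmatch(ident) is not None


orcid_ok = is_valid_orcid("0000-0002-1825-0097")  # ORCID's documentation example
orcid_bad = is_valid_orcid("0000-0002-1825-0098")  # last character altered
ror_ok = looks_like_ror("https://ror.org/03yrm5c26")
ror_bad = looks_like_ror("https://ror.org/13yrm5c26")  # does not start with 0
```

A shape check like this only catches typing errors; resolving the identifier against the registry itself is the only way to confirm it points to the intended person or organization.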

Click here to participate: http://wiki.esipfed.org/index.php/Category:Identifying_ESIP_Connections


Presentations
https://doi.org/10.6084/m9.figshare.11794182.v1

View Recording: https://youtu.be/iUYmTaDdJGQ

Takeaways
  • Generally positive attitude about using identifiers for organizations, but not all organizations in ESIP may end up with RORs...
  • The granularity of RORs is an ongoing challenge that spans many issues: multi-organization projects, changes as a function of time.
  • How are research organizations defined? Do repositories have RORs? Wiki pages were a good way to share information.



Speakers
Ted Habermann

Chief Game Changer, Metadata Game Changers
I am interested in all facets of metadata needed to discover, access, use, and understand data of any kind. Also evaluation and improvement of metadata collections, translation proofing. Ask me about the Metadata Game.


Thursday January 9, 2020 10:15am - 11:45am EST
Linden Oak
  Linden Oak, Breakout