Join the 2020 ESIP Winter Meeting Highlights Webinar on Feb. 5th at 3 pm ET for a fast-paced overview of what took place at the meeting. More info here.


Skim the Surface
Tuesday, January 7
 

11:00am EST

Analytic Centers for Air Quality
The Analytic Center Framework (ACF) is a concept to support scientific investigations with a harmonized collection of data from a wide range of sources and vantage points, tools, and computational resources. Four recent NASA AIST competitive awards focus on either ACFs or components that could feed into Air Quality (AQ) ACFs. Previous projects have developed tools and improved the accessibility and usability of data for Air Quality analysis, and have tried to address issues related to inconsistent metadata, uncertainty quantification, interoperability among tools and computing resources, and visualization to aid scientific investigation or applications. The format for this meeting will be a series of brief presentations by invited speakers followed by a discussion, generally following the panel model.

How to Prepare for this Session: A link to a set of pre-read materials will be provided.

View Recording: https://youtu.be/fy4eoOfSbpo.

Takeaways
  • Is there enough interest to start an Air Quality cluster? Yes!
  • Technologists and scientists should both be involved in the cluster to ensure usability through stakeholder engagement


Speakers

Mike Little

ESTO, NASA
Computational Technology to support scientific investigations


Tuesday January 7, 2020 11:00am - 12:30pm EST
Glen Echo
  Glen Echo, Working Session

11:00am EST

Creating a Data at Risk Commons at DataAtRisk.org
Several professional organizations have become increasingly concerned about the loss of reusable data from primary sources such as individual researchers, projects, and agencies. DataAtRisk.org aims to connect people who hold data in need with data expertise, and responds to the clear need for a community-building application. This “Data at Risk” commons will allow individuals to submit threatened datasets and request help with them, and will connect these datasets to experts who can provide resources and skills to help rescue data through a secure, professional mechanism that facilitates self-identification and discovery.

This session will provide an overview of the current status of the DataAtRisk.org project, and aims to expand the network of individuals involved in the development and implementation of DataAtRisk.org.

How to Prepare for this Session: Please check out https://dataatrisk.org/ for some background on the activities.

Presentations: http://bit.ly/303gig7, https://doi.org/10.6084/m9.figshare.11536317.v1
Link to use case / user scenario: https://tinyurl.com/yh4rnk7b

View Recording: https://youtu.be/96NMQwx_EtI

Takeaways
  • Perfection is the enemy of getting stuff done
  • Something is better than nothing
  • Triage will be necessary at several places in the process



Speakers

Denise Hills

Director, Energy Investigations, Geological Survey of Alabama
Long tail data, data preservation, connecting physical samples to digital information, geoscience policy, science communication


Tuesday January 7, 2020 11:00am - 12:30pm EST
Linden Oak
  Linden Oak, Working Session

11:00am EST

Interoperability of geospatial data with STAC
The SpatioTemporal Asset Catalog (STAC) is an emerging specification of a common metadata model for geospatial data, and a way to make data catalogs indexable and searchable. We have already seen STAC being adopted for both public data and commercial data. Catalogs exist for several AWS Public Datasets, Landsat Collection 2 data will be published along with STAC metadata, and communities like Pangeo are using STAC to organize data repositories in a scalable way. Commercial companies like Planet and DigitalGlobe are starting to publish STAC metadata for some of their catalogs. Session talks may cover overviews of STAC, software projects utilizing STAC, and use cases of STAC in organizations.

How to Prepare for this Session: See https://stacspec.org/.

View Recording: https://youtu.be/BdZbJLQSNFE

Takeaways


Speakers

Dan Pilone

Chief Technologist, Element 84
Dan Pilone is CEO/CTO of Element 84 and oversees the architecture, design, and development of Element 84's projects including supporting NASA, the USGS, Stanford University School of Medicine, and commercial clients. He has supported NASA's Earth Observing System for nearly 13 years...

Aimee Barciauskas

Data engineer, Development Seed

Matthew Hanson

Element 84
STAC


Tuesday January 7, 2020 11:00am - 12:30pm EST
White Flint
  White Flint, Breakout

2:00pm EST

Current Data that are available on the Cloud
NASA, NOAA and USGS are in the process of moving data onto the cloud. While they have discussed what types of services are available and future plans of what data can be found, it is not completely clear what datasets users can currently access. This session will go over what datasets are currently up in the cloud and what data to expect in the near future. This way, as users transition to the cloud for their compute, they can also know what data are available to them there. There will also be presentations from AWS.

Speakers:
Katie Baynes - NASA/EOSDIS
Jon O'Neil - NOAA
Jeff de La Beaujardiere - NCAR
Kristi Kliene - USGS/EROS
Joe Flasher - AWS

Presentations: See attached.

View Recording: https://youtu.be/yssgXB7iaxw

Takeaways
  • Petabyte scale data is being moved into the cloud. This is concentrated in AWS, Google Cloud and Microsoft depending on the agency and dataset
  • Some concern around partnerships with companies (AWS most discussed) in terms of long term relationships, moving data etc. and how those things might impact access or data use
  • Need to make clear the authoritative source of the data, who is stewarding it, and any modifications done when copying to cloud. Users should exercise due diligence in selecting and using data.



Speakers

Jon O'Neil

Director, NOAA Big Data Program, NOAA

Joe Flasher

Open Geospatial Data Lead, Amazon Web Services
Joe Flasher is the Open Geospatial Data Lead at Amazon Web Services helping organizations most effectively make data available for analysis in the cloud. The AWS open data program has democratized access to petabytes of data, including satellite imagery, genomic data, and data used...

Christopher Lynnes

Systems Architect, NASA/EOSDIS, NASA/GSFC
Christopher Lynnes is currently System Architect for NASA’s Earth Observing System Data and Information System, known as EOSDIS. He has been working on EOSDIS since 1992, over which time he has worked multiple generations of data archive systems, search engines and interfaces, science...

Jessica Hausman

Data Engineer, PO.DAAC JPL

Jeff de La Beaujardière

Director, Information Systems Division, NCAR
Big data, cloud computing, object storage, data management.

Dave Meyer

GES DISC manager, NASA


Tuesday January 7, 2020 2:00pm - 3:30pm EST
White Flint
  White Flint, Breakout

4:00pm EST

Bringing Science Data Uncertainty Down to Earth - Sub-orbital, In Situ, and Beyond
In the Fall of 2019, the Information Quality Cluster (IQC) published a white paper entitled “Understanding the Various Perspectives of Earth Science Observational Data Uncertainty”. The intention of this paper is to provide a diversely sampled exposition of both prolific and unique policies and practices, applicable in an international context of diverse policies and working groups, made toward quantifying, characterizing, communicating and making use of uncertainty information throughout the diverse, cross-disciplinary Earth science data landscape; to these ends, the IQC addressed uncertainty information from the following four perspectives: Mathematical, Programmatic, User, and Observational. These perspectives affect policies and practices in a diverse international context, which in turn influence how uncertainty is quantified, characterized, communicated and utilized.

The IQC is now in a scoping exercise to produce a follow-on paper that is intended to provide a set of recommendations and best practices regarding uncertainty information. It is our hope that we can consider and examine additional areas of opportunity with regard to the cross-domain and cross-disciplinary aspects of Earth science data. For instance, the existing white paper covers uncertainty information from the perspective of satellite-based remote sensing well, but does not adequately address the in situ or airborne (i.e., sub-orbital) perspective. This session intends to explore such opportunities to expand the scope of the IQC’s awareness of what is being done with regard to uncertainty information, while also providing participants and observers with an opportunity to weigh in on how best to move forward with the follow-on paper.

How to Prepare for this Session:
Agenda:
  1. "IQC Uncertainty White Paper Status Summary and Next Steps" - Presented by: David Moroni (15 minutes)
  2. "Uncertainty quantification for in situ ocean data: The S-MODE sub-orbital campaign" - Presented by: Fred Bingham (15 minutes)
  3. "Uncertainty Quantification for Spatio-Temporal Mapping of Argo Float Data" - Presented by: Mikael Kuusela (20 minutes)
  4. Panel Discussion (35 minutes)
  5. Closing Comments (5 minutes)
Notes Page: https://docs.google.com/document/d/1vfYBK_DLTAt535kMZusTPVCBAjDqptvT0AA5D6oWrEc/edit?usp=sharing

Presentations:
https://doi.org/10.6084/m9.figshare.11553681.v1

View Recording: https://youtu.be/vC2O8FRgvck

Takeaways

Speakers

David Moroni

Data Stewardship and User Services Team Lead, Jet Propulsion Laboratory, Physical Oceanography Distributed Active Archive Center
I am a Senior Science Data Systems Engineer at the Jet Propulsion Laboratory and Data Stewardship and User Services Team Lead for the PO.DAAC Project, which provides users with data stewardship services including discovery, access, sub-setting, visualization, extraction, documentation...

Ge Peng

Research Scholar, CISESS/NCEI
Dataset-centric scientific data stewardship, data quality management

Fred Bingham

University of North Carolina at Wilmington

Mikael Kuusela

Carnegie Mellon University


Tuesday January 7, 2020 4:00pm - 5:30pm EST
Forest Glen
 
Wednesday, January 8
 

11:00am EST

Software Sustainability, Discovery and Accreditation
It is commonly understood that software is essential to research, in data collection, curation, analysis, and understanding, and it is also a critical element within any research infrastructure. This session will address two related software issues: 1) sustainability, and 2) discovery and accreditation.

Scientific software is an instance of a software stack containing problem-specific software, discipline-specific tools, general tools and middleware, and infrastructural software. Changes within the stack can cause the overall software to collapse and stop working, and as time goes on, more and more work is needed to compensate for these problems. We refer to this ongoing work as sustainability. Issues in which we are interested include incentives that encourage sustainability activities, business models for sustainability (including public-private partnership), software design that can reduce the sustainability burden, and metrics to measure sustainability (perhaps tied to the ongoing process of defining FAIR software).

The second issue, discovery and accreditation, asks how we enable users to discover and access trustworthy and fit-for-purpose software to undertake science processing on the compute infrastructures to which they have access. And how do we ensure that publications cite the exact version of the software that was used and properly credit the responsible authors?

This session will include a number of short talks, and at least two breakouts in parallel, one about the sustainability of software, and a second about discovery of sustainable and viable solutions.

Potential speakers who want to talk about an aspect of software sustainability, discovery, or accreditation should contact the session organizers.

Agenda/slides:
Presentations: See above

View Recording:
https://youtu.be/nsxjOC04JxQ

Key takeaways:

1. Funding agencies spend a large amount of money on software, but don't always know this because it's not something that they track.

Open source software is growing very quickly:
  • 2001: 208K SourceForge users
  • 2017: 20M GitHub users
  • 2019: 37M GitHub users
Software, like data, is a “first class citizen” in the ecosystem of tools and resources for scientific research, and our community is accelerating its attention to this as it has for FAIR data.


2. Ideas for changing our culture to better support and reward contributions to sustainable software:
  • Citation (ESIP guidelines) and/or software heritage IDs for credit and usage metrics and to meet publisher requirements (e.g. AGU)
  • Prizes
  • Incentives in hiring and promotion
  • Promote FAIR principles and/or Technical Readiness Levels for software
  • Increased use to make science more efficient through common software
  • Publish best practice materials in other languages, e.g. Mandarin, as software comes from a global community


3. A checklist of topics to consider for your community sustained software:
  • Repository with “cookie cutter” templates and sketches for forking
  • Licensing
  • Contributors Guide
  • Code of Conduct and Governance
  • Use of “Self-Documentation” features and standards
  • Easy steps for trying out software
  • Continuous Integration builds
  • Unit tests
  • Good set of “known first issues” for new users trying out the software
  • Gitter or Slack Channel for feedback and communication, beyond a simple repo issues queue


Detailed notes:
The group then divided into 2 breakout sessions (Sustainability; Discovery and Accreditation), with notes as follows.

Notes from Sustainability breakout (by Daniel S. Katz):

What we think should be done:
  • Build a cookiecutter recipe for new projects, based on Ben’s slides?  What part of ESIP would be interested in this? And would do it, and support it?
  • Define governance as part of this? How do we store governance?
  • What is required, what is optional (maybe with different answers at different tiers)
  • Define types of projects (individual developer, community code, …)
  • Define for different languages – tooling needs to match needs
  • Is this specific to ESIP? Who could it be done with? The Carpentries?  SSI?

Other discussion:
  • What do we mean by sustainability – for how long?  Up to 50 years?  How do we run the system?
  • What’s the purpose of the software (use case) – transparency to see the software, actual reuse?
  • What about research objects that contain both software and data? How do we archive them? How do we cite them?
  • We have some overlap with research object citation cluster


Notes from Discovery and Accreditation breakout (by Shelley Stall):

Use Cases - Discovery
  1. science question- looking for software to support
  2. have some data output from a software process, need to gain access to the software to better understand the data.   

Example of work happening: Data and Software Preservation - NSF Funded
  • promote linked data to other research products
  • similar project in Australia - want to gain access to the chain of events that resulted in the data and/or software - the scientific drivers that resulted in this product
  • Provenance information is part of this concept.

A deeper look at discovery, once software is found, is to better understand how the software came into being. It is important to know the undocumented elements of a process that affected the chain of events behind a particular piece of software, since they are useful for understanding it.
How do we discover existing packages?
Dependency management helps to discover new elements that support software.
Concern was expressed that packaged solutions for creating an environment, like “AWS/AMI”, are not recognized as good enough, that an editor requested a d

Speakers

Daniel S. Katz

Assistant Dir. for Scientific Software & Applications, NCSA; Research Assoc. Prof., CS, ECE, iSchool, University of Illinois

Lesley Wyborn

Adjunct Fellow, Australian National University


Wednesday January 8, 2020 11:00am - 12:30pm EST
Forest Glen
  Forest Glen, Working Session

11:00am EST

FAIRtool.org, Serverless workflows for cubesats, Geoweaver ML workflow management, 3D printed weather stations
Come hear what ESIP Lab PIs have built over the past year. Speakers include:

Abdullah Alowairdhi: FAIRTool Project Update
Ziheng Sun: Geoweaver Project
Amanda Tan: Serverless Workflow Project
Agbeli Ameko: 3D-Printed Weather Stations

Presentations:
https://doi.org/10.6084/m9.figshare.11626284.v1

View Recording: https://youtu.be/vrRwEQRAIZ4

Takeaways



Speakers

Amanda Tan

Data Scientist, University of Washington
Cloud computing, distributed systems

Abdullah Alowairdhi

PhD Candidate, U of Idaho

Ziheng Sun

Research Assistant Professor, George Mason University
My research interests are mainly on geospatial cyberinfrastructure and agricultural remote sensing.

Annie Burgess

ESIP Lab Director, ESIP


Wednesday January 8, 2020 11:00am - 12:30pm EST
Salon A-C
  Salon A-C, Breakout
  • Skill Level Skim the Surface, Jump In
  • Keywords Cloud Computing, Machine Learning
  • Collaboration Area Tags Science Software

2:00pm EST

Citizen Science Data and Information Quality
The ESIP Information Quality Cluster (IQC) has formally defined information quality as a combination of the following four aspects of quality, spanning the full life cycle of data products: scientific quality, product quality, stewardship quality, and service quality. The focus of the IQC has been the quality of Earth science data captured by scientists/experts. For example, the white paper “Understanding the Various Perspectives of Earth Science Observational Data Uncertainty”, published by IQC in the fall of 2019, mainly addresses uncertainty information from the perspective of satellite-based remote sensing. With the advance of mobile computing technologies, including smart phones, Citizen Science (CS) data have become increasingly important sources for Earth science research. CS data have their own unique challenges regarding data quality, compared with data captured through traditional scientific approaches. The purpose of this session is to broaden the scope of IQC efforts, present the community with the state of the art of research on CS data quality, and foster a collaborative interchange of technical information intended to help advance the assessment, improvement, capturing, conveying, and use of quality information associated with CS data. This session will summarize the scope of what we mean by CS data (including examples of platforms/sensors commonly used in collecting CS data) and include presentations from both past and current CS projects focusing on topics such as challenges with CS data quality; strategies to assess, ensure, and improve CS data quality; approaches to capturing CS data quality information and conveying it to users; and use of CS data quality information for scientific discovery.

Agenda
  1. Introduction - Yaxing Wei - 5 mins
  2. Citizen Science Data Quality: The GLOBE Program – Helen M. Amos (NASA GSFC) – 18 (15+3) mins.
  3. Can we trust the power of the crowd? A look at citizen science data quality from NOAA case studies - Laura Oremland (NOAA) – 18 (15+3) mins.
  4. Turning Citizen Science into Community Science - Stephen C. Diggs (Scripps Institution of Oceanography / UCSD) and Andrea Thomer (University of Michigan)  – 18 (15+3) mins.
  5. Earth Challenge 2020: Understanding and Designing for Data Quality at Scale - Anne Bowser (Wilson Center) – 18 (15+3) mins.
  6. Discussion and Key Takeaways – All – 13 mins.

View Recording: https://youtu.be/xaTLP4wqwe8

Takeaways

Notes Page:
https://docs.google.com/document/d/1lRp19SF9U727ureKjY38PHOF3EGUgE-BixYDs2KlmII/edit?usp=sharing

Presentation Abstracts

  • Citizen Science Data Quality: The GLOBE Program - Helen M. Amos (NASA GSFC)
The Global Learning and Observations to Benefit the Environment (GLOBE) Program is an international program that provides a way for students and the public to contribute Earth system observations. Currently 122 countries, more than 40,000 schools, and 200,000 citizen scientists are participating in GLOBE. Since 1995, participants have contributed 195 million observations. Modes of data collection and data entry have evolved with technology over the lifetime of the program, including the launch of the GLOBE Observer mobile app in 2016 to broaden access and public participation in data collection. GLOBE must meet the data needs of a diverse range of stakeholders, from elementary school classrooms to scientists across the globe, including NASA scientists. Operational quality assurance measures include participant training, adherence to standardized data collection protocols, range and logic checks, and an approval process for photos submitted with an observation. In this presentation, we will discuss the current state of operational data QA/QC, as well as additional QA/QC processes recently explored and future directions. 
  • Can we trust the power of the crowd? A look at citizen science data quality from NOAA case studies - Laura Oremland (NOAA)
NOAA has a rich history in citizen science dating back hundreds of years.  Today NOAA’s citizen science covers a wide range of topics such as weather, oceans, and fisheries with volunteers contributing over 500,000 hours annually to these projects. The data are used to enhance NOAA’s science and monitoring programs.   But how do we know we can trust these volunteer-based efforts to provide data that reflect the high standards of NOAA’s scientific enterprise? This talk will provide an overview of NOAA’s citizen science, describe the data quality assurance and quality control processes applied to different programs, and summarize common themes and recommendations for collecting high quality citizen science data. 
  • Earth Challenge 2020: Understanding and Designing for Data Quality at Scale - Anne Bowser (Wilson Center)
April 22nd, 2020 marks the 50th anniversary of Earth Day. In recognition of this milestone, Earth Day Network, the Woodrow Wilson International Center for Scholars, and the U.S. Department of State are launching Earth Challenge 2020 as the world’s largest coordinated citizen science campaign. For 2020, the project focuses on six priority areas: air quality, water quality, insect populations, plastics pollution, food security, and climate change. For each of these six areas, one work stream will focus on collaborating with existing citizen science projects to increase the amount of open and findable, accessible, interoperable, and reusable (FAIR) data. A second work stream will focus on designing tools to support both existing and new citizen science activities, including a mobile application for data collection; an open, API-enabled data integration platform; data visualization tools; and a metadata repository and data journal.
A primary value of Earth Challenge 2020 is recognizing, and elevating, ongoing citizen science activities.  Our approach seeks first to document a range of data quality practices that citizen science projects are already using to help the global research and public policy community understand these practices and assess fitness-for-use.  This information will be captured primarily through the metadata repository and data journal.  In addition, we are leveraging a range of data quality solutions for the Earth Challenge 2020 mobile app, including designing automated data quality checks and leveraging a crowdsourcing platform for expert-based data validation that will help train machine learning (ML) support.  Many of the processes designed for Earth Challenge 2020 app data can also be applied to other citizen science data sets, so maintaining information on processing level, readiness level, and provenance is a critical concern.  The goal of this presentation is to offer an overview of key Earth Challenge 2020 data documentation and data quality practices before inviting the ESIP community to offer concrete feedback and support for future work.

Speakers

David Moroni

Data Stewardship and User Services Team Lead, Jet Propulsion Laboratory, Physical Oceanography Distributed Active Archive Center
I am a Senior Science Data Systems Engineer at the Jet Propulsion Laboratory and Data Stewardship and User Services Team Lead for the PO.DAAC Project, which provides users with data stewardship services including discovery, access, sub-setting, visualization, extraction, documentation...

Ge Peng

Research Scholar, CISESS/NCEI
Dataset-centric scientific data stewardship, data quality management

Yaxing Wei

Scientist, Oak Ridge National Laboratory


Wednesday January 8, 2020 2:00pm - 3:30pm EST
Linden Oak
  Linden Oak, Breakout

2:00pm EST

AI for Augmenting Geospatial Information Discovery
Thanks to rapid developments in hardware and computer science, we have seen many exciting breakthroughs in self-driving vehicles, voice recognition, street view recognition, cancer detection, check deposit, etc. Sooner or later, AI will catch on in the Earth science field as well. Scientists need high-level automation to discover timely, accurate geospatial information from large volumes of Earth observations, but few existing algorithms can fully solve the sophisticated problems that such automation involves. Nevertheless, the transition from manual to automatic is under way, bit by bit. Many early-adopter researchers have started to transplant AI theory and algorithms from computer science to GIScience, and a number of promising results have been achieved. In this session, we will invite speakers to talk about their experiences of using AI in geospatial information (GI) discovery. We will discuss all aspects of "AI for GI", such as the algorithms, technical frameworks, tools and libraries used, and model evaluation in various individual use case scenarios.

How to Prepare for this Session:
https://esip.figshare.com/articles/Geoweaver_for_Better_Deep_Learning_A_Review_of_Cyberinfrastructure/9037091
https://esip.figshare.com/articles/Some_Basics_of_Deep_Learning_in_Agriculture/7631615

Presentations:
https://doi.org/10.6084/m9.figshare.11626299.v1

View Recording: https://youtu.be/W0q8WiMw9Hs

Takeaways
  • There has been a significant uptake of machine learning/artificial intelligence for earth science applications in the past decade
  • The challenges of machine learning applications for the earth science domain include:
    • The quality and availability of training data sets
    • The need for a team with diverse skill backgrounds to implement the application
    • The need for a better understanding of the underlying mechanisms of ML/AI models
  • There are many promising applications and developments in streamlining the process and application of machine learning for different sectors of society (weather monitoring, emergency response, social good)



Speakers

Yuhan (Douglas) Rao

Postdoctoral Research Scholar, CISESS/NCICS/NCSU

Aimee Barciauskas

Data engineer, Development Seed

Annie Burgess

ESIP Lab Director, ESIP

Rahul Ramachandran

Project Manager, Sr. Research Scientist, NASA

Ziheng Sun

Research Assistant Professor, George Mason University
My research interests are mainly on geospatial cyberinfrastructure and agricultural remote sensing.


Wednesday January 8, 2020 2:00pm - 3:30pm EST
Salon A-C
  Salon A-C, Breakout

4:00pm EST

Emerging EnviroSensing Topics: Long-range, Low-power, Non-contact, Open-source Sensor Networks
Led by the ESIP EnviroSensing Cluster, this session is open to scientists, information managers, and technologists interested in the general topic of environmental sensing for science and management.

Rapid advances and decreasing costs in technology, as applied to environmental sensing systems, are promoting a shift from sparsely-distributed, single-mission observations toward employing affordable, high-fidelity, ecosystem monitoring networks driven by a need to forecast outcomes across timescales. In this session we will hear talks on new approaches to standing up long-range, low-power monitoring networks; the value(s) added by non-contact sensing (local-remote to satellite based sensing); as well as innovative sensor developments, including open-source approaches, that promote connectivity. The session will conclude with a 20-minute topical discussion open to all in attendance.

List of speakers and presentation titles for this session:
  • Jacqueline Le Moigne: NASA
    Future Earth Science Measurements Using New Observing Strategies
  • David Coyle: USGS
    USGS NGWOS LPWAN Experiment: Leveraging LoRaWAN Sensor Platform Technologies
  • James Gallagher: OPeNDAP
    Sensors in Snowy Alpine Environments: Sensor Networks with LoRa, Progress Report
    View Slides: https://doi.org/10.6084/m9.figshare.11555784.v1 
  • Daniel Fuka: Va Tech
    Making Drones Interesting Again
    View Slides: https://doi.org/10.6084/m9.figshare.11663718.v1
  • Joseph Bell: USGS
    Deep-dive discussion after presentations. A topic of interest is documenting test efforts and the publication of peer-reviewed Test Reports

View Recording: https://youtu.be/dXTLqt-5Ai8

Takeaways
  • As monitoring expands across agencies and from point measures on the surface of the earth to monitoring using networks of satellites in space (internet of space), there is a growing need to increase communication among agencies and instrumentation alike.
  • Inexpensive monitoring equipment is becoming readily available with large gains being made in the areas of function, reliability, and resolution/accuracy.
    • Market disruption
    • Edge computing (is this the current form of SDI-12-style monitoring?): local processing and storage, transmission of small/tiny data payloads
  • There appears to be a need across disciplines and agencies for peer-reviewed test reports
    • Not resource intensive to publish
    • Available to all users (FAIR)
    • Provides details on test plan and provides test data whenever applicable.


Speakers

Joseph Bell

Hydrologist, USGS


Wednesday January 8, 2020 4:00pm - 5:30pm EST
Forest Glen
  Forest Glen, Breakout

4:00pm EST

Citizen Science Data in Earth Science: Challenges and Opportunities
Citizen science is scientific data collection and research performed primarily or in part by non-professional and amateur scientists. Citizen science data has been used in a variety of the physical sciences, including physics, ecology, biology, and water quality. As volunteer-contributed datasets continue to grow, they represent a unique opportunity to collect and analyze earth-science data on spatial and temporal scales impossible to achieve by individual researchers. This session will explore the ways open citizen science data sets can be used in earth science research and some of the associated challenges and opportunities for the ESIP community to use and partner with citizen science organizations.

View Recording: https://youtu.be/jTNgWZI6Cik

Takeaways


How to Prepare for this Session: https://www.nationalgeographic.org/encyclopedia/citizen-science/
http://www.earthsciweek.org/citizen-science

Speakers

Alexis Garretson

Community Fellow, ESIP

Kelsey Breseman

Archiving Program Lead, Environmental Data & Governance Initiative
Governmental accountability around public data & the environment. Decentralized web. Intersection of tech & ethics & civics.


Wednesday January 8, 2020 4:00pm - 5:30pm EST
Linden Oak
  Linden Oak, Breakout

4:00pm EST

Planning for new Agriculture and Climate Cluster focus area on automated agriculture with AI
The Agriculture and Climate (ACC) Cluster will host a planning session for a new focus area on automated agriculture and AI ("Agro-AI"). Some initial ideas on possible activities in this space were presented at the ACC October 2019 telecon, including those related to the “Data-to-Decisions” ESIP Lab project (https://www.esipfed.org/wp-content/uploads/2018/07/Wee.pdf). Currently, there are many initiatives and funding opportunities for automated agriculture with AI. For example, the National Science Foundation recently announced a program aimed at significantly advancing research in AI (https://www.nsf.gov/news/news_summ.jsp?cntn_id=299329&org=NSF&from=news), including, in its initial set of high-priority areas, “AI-Driven Innovation in Agriculture and the Food System.”
Among the topics for discussion in this planning session will be related proposal opportunities and sponsoring an ACC breakout session on agriculture and AI at the ESIP 2020 Summer Meeting. How to Prepare for this Session: TBD. There will be an introductory presentation before the group discussion; it may be made available ahead of the meeting on the scheduled session page.

Presentations:

View Recording: https://youtu.be/GhnSINRFNBg

Takeaways
  • Next step 1: Conduct a survey of available dashboards, existing data, ML use cases, existing APIs
  • Next step 2: Decide on an example question for a use case
  • Next step 3: Define and survey potential users



Speakers

Arif Albayrak

Senior Software Engineer, ADNET (GESDISC)

Bill Teng

NASA GES DISC (ADNET)


Wednesday January 8, 2020 4:00pm - 5:30pm EST
Salon A-C
  Salon A-C, Business Meeting
 
Thursday, January 9
 

10:15am EST

Do you have a labeling problem? Three tools for labeling data
The ESIP community and others in machine learning regularly lament the lack of labeled datasets, needed for certain classes of training algorithms. Generating accurate, useful labels is a hard problem, with no general automated solution in sight. Thus, labeling generally involves human effort, which is challenging because the volume of data needed for training can be very large.

Tools exist to help in labeling data. This session will demonstrate three labeling tools and associated processes:
  • Image Labeler, a fast, scalable cloud-based tool to facilitate the rapid development of Earth science event databases, to aid in automated ML-based image classification, Rahul Ramachandran
  • LabelImg, an open-source graphical image annotation tool, https://github.com/tzutalin/labelImg, Ziheng Sun
  • Bokeh, a Python-based plotting and annotation tool set for building arbitrary labeling workflows, https://bokeh.org/, Jim Bednar
Time permitting, the session will conclude with a short discussion of thoughts and tradeoffs about the tools.

This session is followed by a hands-on workshop on using LabelImg and Bokeh. If you are interested in participating, please see the session abstract for "Hands on Labeling Workshop" for information on preparing for that workshop.

Presentations
https://doi.org/10.6084/m9.figshare.11629110.v1
https://doi.org/10.6084/m9.figshare.11591739.v1

View Recording: https://youtu.be/3ufBOoD3M1E

Takeaways
  • Machine-learning-based classification applications require high-quality labeled data sets for both model training and evaluation. There are many existing tools for labeling images (including Earth science data), but labeling tasks are very labor- and time-intensive.
  • If the pre-built labeling tools don’t work for your problem, Anaconda provides a general-purpose labeler-building toolkit based on Bokeh for Python users; see https://examples.pyviz.org/ml_annotators/ml_annotators.html
  • There is opportunity in combining partly automated, partly human labeling, to automate the easy cases while leaving the final call to a person. Currently there is not much tool support or established good practice, and the two are hard to integrate. The art of avoiding extra work!
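
The partly automated, partly human approach in the last takeaway can be sketched as a simple triage step. This is only an illustration, not a tool shown in the session: the `Prediction` record, the `triage` function, the item names, and the 0.9 threshold are all hypothetical. It assumes a model already produces a label and a confidence score per item. High-confidence items are pre-labeled so a person only has to confirm them, and everything else goes to full manual labeling.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    item_id: str       # hypothetical identifier of the image or granule
    label: str         # label proposed by the model
    confidence: float  # model score in [0, 1]


def triage(predictions, threshold=0.9):
    """Split predictions into two queues.

    prelabeled: confident proposals that a person only needs to confirm.
    manual:     uncertain items that a person labels from scratch.
    The threshold is an assumed value and would need tuning per data set.
    """
    prelabeled, manual = [], []
    for p in predictions:
        if p.confidence >= threshold:
            prelabeled.append(p)
        else:
            manual.append(p)
    return prelabeled, manual


# Made-up example scores, for illustration only.
preds = [
    Prediction("img_001", "smoke", 0.97),
    Prediction("img_002", "cloud", 0.55),
    Prediction("img_003", "smoke", 0.91),
]
prelabeled, manual = triage(preds)
print([p.item_id for p in prelabeled], [p.item_id for p in manual])
```

In both queues the final call stays with a person; the split only changes how much work each item needs.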

Speakers

Ziheng Sun

Research Assistant Professor, George Mason University
My research interests are mainly on geospatial cyberinfrastructure and agricultural remote sensing.

Anne Wilson

Senior Software Engineer, Laboratory for Atmospheric and Space Physics

Yuhan (Douglas) Rao

Postdoctoral Research Scholar, CISESS/NCICS/NCSU


Thursday January 9, 2020 10:15am - 11:45am EST
Glen Echo
  Glen Echo, Breakout

12:00pm EST

Fire effects on soil morphology across time scales: Data needs for near- and long-term land and hazard management
Fire impacts soil hydrology and biogeochemistry at both near (hours to days) and long (decades to centuries) time scales. Burns, especially in soils with high organic carbon stocks like peatlands, induce a loss of absolute soil carbon stock. Additionally, fire can alter the chemical makeup of the organic matter, potentially making it more resistant to decomposition. On the shorter time scales, fire can also change the water-repellent properties (hydrophobicity) of the soil, leading to an increased risk of debris flows and floods.

In this session, we will focus on the varying data needs for assessing the effects of burns across time scales, from informing emergency response managers in the immediate post-burn days, to monitoring post-burn recovery, to managing carbon in a landscape decades out.

Speaker abstracts (in order of presentation):

James MacKinnon (NASA GSFC)
Machine learning methods for detecting wildfires 

This talk shows the innovative use of deep neural networks, a type of machine learning, to detect wildfires in MODIS multispectral data. This effort attained a very high classification accuracy, showing that neural networks could be useful in a scientific context, especially when dealing with sparse events such as fire anomalies. Furthermore, we laid the groundwork to continue beyond binary fire classification towards being able to detect the "state," or intensity, of the fire, eventually allowing for more accurate fire modeling. With this knowledge, we developed software to enable neural networks to run even on the typically compute-limited spaceflight-rated computers, and tested it by building a drone payload equipped with a flight computer analog and flying it over controlled burns to prove its efficacy.

Kathe Todd-Brown (U. FL Gainesville)
An overview of effects of fire on ecosystems

Fire is a defining characteristic of many ecosystems worldwide, and, as the climate warms, both fire frequency and severity are expected to increase. In addition to the effects of smoke on the climate and human health, there are less apparent effects of fire on the terrestrial ecosystem. From alterations in the local soil properties to changes in the carbon budget as organic carbon is combusted into CO2 and pyrogenic carbon, fire is deeply impactful to the local landscape. The long-term climate implication of fire on the terrestrial carbon budget is a tension between carbon lost to the atmosphere as carbon dioxide and carbon sequestered in the soil as recalcitrant pyrogenic carbon. Here we present a new model to simulate the interaction between ecosystem growth, decomposition, and fire on carbon dynamics. We find that the carbon lost as carbon dioxide through burning will always be recovered if the fires generate any recalcitrant pyrogenic carbon. The time scale of this recovery, however, is highly variable and often not relevant to land managers. This model highlights key data gaps at the annual and decadal time scales. Quantifying and predicting the loss of soil, litter, and vegetation carbon in an individual fire event is a key unknown. Relatedly, the amount of pyrogenic carbon generated by fire events is another near-term data need to better constrain this model. Finally, on the longer time scales, the degree of recalcitrance of pyrogenic carbon is a critical unknown.

Daniel Fuka (VA Tech)
Rapidly improving the spatial representation of soil properties using topographically derived initialization with a proposed workflow for new data integration

Topography exerts critical controls on many hydrologic, geomorphologic, biophysical, and forest fire processes. However, in modeling these systems, the current use of topographic data neglects opportunities to account for topographic controls on processes such as soil genesis, soil moisture distributions, and hydrological response, all factors that significantly characterize the post-fire effects and potential risks of the new landscape. In this presentation, we demonstrate a workflow that takes advantage of data brokering to combine the most recent topographic data and best available soil maps to increase the resolution and representational accuracy of spatial soil morphologic and hydrologic attributes: texture, depth, saturated conductivity, bulk density, porosity, and the water capacities at field and wilting point tensions. We show several proofs of concept and initial performance tests that compare the values of the topographically adjusted soil parameters against those from the NRCS SSURGO (Soil Survey Geographic) database. Finally, we propose that a quickly configurable open-source data brokering system (NSF BALTO) could be used to make available the most recently updated topographic and soil characteristics, so that this workflow can rapidly re-characterize and increase the resolution of post-fire landscapes.

Dalia Kirschbaum (NASA GSFC)
Towards characterization of global post-fire debris flow hazard

Post-fire debris flows commonly occur in the western United States, but the extent of this hazard is little known in other regions. These events occur when rain falls on the ground with little vegetative cover and hydrophobic soils—two common side effects of wildfire. The storms that trigger post-fire debris flows are typically high-intensity, short-duration events. Thus, a first step towards global modeling of this hazard is to evaluate the ability of GPM IMERG and other global precipitation data to detect these storms. The second step is to determine the effectiveness of MCD64 and other globally available predictors in identifying locations susceptible to debris flows. Finally, rainfall and other variables can be combined into a single global model of post-fire debris flow occurrence. This research can show both where post-fire debris flows are currently most probable, as well as where the historical impact has been greatest.

How to Prepare for this Session:

Presentations

View Recording: https://youtu.be/I89om-kBYB0

Takeaways
  • Modeling and detection of fires and fire impacts are changing (e.g., neural networks, carbon modeling) and need to continue to improve
  • Many data needs must be met to operationalize post-fire debris flow and soil modeling
  • Fires severely change ecosystems and soils, and we do not yet understand the exact changes; more research is needed in this area


Speakers

Kathe Todd-Brown

University of Florida Gainesville

Dan Fuka

Virginia Tech

Bill Teng

NASA GES DISC (ADNET)


Thursday January 9, 2020 12:00pm - 1:30pm EST
Salon A-C
  Salon A-C, Breakout

12:00pm EST

Datacubes for Analysis-Ready Data: Standards & State of the Art
This workshop session will follow up on the OGC Coverage Analytics sprint, focusing specifically on advanced services for spatio-temporal datacubes. In the Earth sciences, datacubes are accepted as an enabling paradigm for offering massive spatio-temporal Earth data in analysis-ready form and, more generally, for easing access, extraction, analysis, and fusion. Datacubes also homogenize APIs across dimensions, allowing unified wrangling of 1-D sensor data, 2-D imagery, 3-D x/y/t image timeseries and x/y/z geophysics voxel data, and 4-D x/y/z/t climate and weather data.
Based on the OGC datacube reference implementation, we introduce datacube concepts, the state of standardization, and real-life 2D, 3D, and 4D examples utilizing services from three continents. Ample time will be available for discussion, and Internet-connected participants will be able to replay and modify many of the examples shown. Further, the session will relate this work to key datacube activities worldwide, within and beyond the Earth sciences.
Session outcomes could take a number of forms: ideas and issues for OGC, ISO, or ESIP to consider; example use cases; challenges not yet addressed sufficiently, and entirely novel use cases; work and collaboration plans for future ESIP work. Outcomes of the session will be reported at the next OGC TC meeting's Big Data and Coverage sessions. How to Prepare for this Session: Introductory and advanced material is available from http://myogc.org/go/coveragesDWG

Presentations
https://doi.org/10.6084/m9.figshare.11562552.v1

View Recording: https://youtu.be/82WG7soc5bk

Takeaways
  • The abstract coverage model defines the base, which is filled in by a coverage implementation schema. This matters because implementations previously were not interoperable across different servers and clients.
  • The coordinate system retrieved from sensors reporting in real time has been embedded into the XML schema so that the sensor data can be integrated into the broader system. Data can be delivered not only as GML but also as JSON and RDF, which could be used to link into semantic web technologies.
  • The principle is to send an HTTP URL-encoded query to the server and get back results extracted from the datacube, e.g., sourced from many hyperspectral images.
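
The query principle in the last takeaway can be illustrated with a short sketch, assuming an OGC WCS 2.0-style key-value GetCoverage request. The endpoint `https://example.org/ows`, the coverage name `HypotheticalCube`, and the axis names `Lat` and `Long` are placeholders chosen for illustration, not services or datasets from the session. Real axis names depend on the coverage's coordinate reference system. The code only builds the URL and makes no network call.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_getcoverage_url(endpoint, coverage_id, subsets, fmt="image/tiff"):
    """Build a URL-encoded WCS 2.0-style GetCoverage request.

    subsets maps an axis name to a (low, high) trim interval; each entry
    becomes one repeated "subset" parameter, e.g. subset=Lat(40,50).
    """
    params = [
        ("service", "WCS"),
        ("version", "2.0.1"),
        ("request", "GetCoverage"),
        ("coverageId", coverage_id),
    ]
    for axis, (low, high) in subsets.items():
        params.append(("subset", f"{axis}({low},{high})"))
    params.append(("format", fmt))
    # urlencode percent-encodes parentheses and commas for safe transport.
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical endpoint and coverage name, for illustration only.
url = build_getcoverage_url(
    "https://example.org/ows",
    "HypotheticalCube",
    {"Lat": (40, 50), "Long": (10, 20)},
)
print(url)
```

A client would send this URL with an ordinary HTTP GET and receive the extracted subset in the requested format; the server does the slicing so that only the result travels over the network.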

Speakers

Thursday January 9, 2020 12:00pm - 1:30pm EST
White Flint