Join the 2020 ESIP Winter Meeting Highlights Webinar on Feb. 5th at 3 pm ET for a fast-paced overview of what took place at the meeting. More info here.


Monday, January 6
 

9:00am

ESIP and OGC Coverage Processing and Analysis Sprint Day 1
The Earth Science Information Partners (ESIP) and the Open Geospatial Consortium (OGC) are convening an agile development sprint to advance APIs for analytics on coverages, arrays, and gridded data. This will be a key event in the development of OGC APIs for geospatial resources and building blocks for community APIs. The event will be co-located with the 2020 ESIP Winter Meeting, which draws Earth Science data and information professionals from across the public, private, and academic sectors. A previous OGC API Hackathon in June 2019 advanced common elements across OGC APIs for Features, Coverages, Map Tiles, Processing and Catalogs. The next sprints are advancing specific elements of the individual APIs.
Please note this is a 2-day event (1/6/20-1/7/20).

Slide Deck | Gitter Channel for Chat | Remote Participation Details Below
________
AGENDA
Monday, January 6
  • 7:30 - 9 am: Breakfast & Coffee
  • 9 - 10:30 am: Opening Session 
    • Welcome (George & Annie)
    • API Approach (Chuck)
      • Common (Chuck), Coverages (Stephan & Peter), Processes (Benjamin & Ziheng)
    • Use Cases 
      • Multipoint (David), Raster (Ethan), Analytics (Chris)
  • 10:30 - 11 am: Coffee Break
  • 11 - 12 pm: Development Group Formation and Discussion
    • Groups based on the discussion in the opening session
  • 12 - 1 pm: Lunch  
  • 1 - 4 pm: Group development and discussions
  • 4 - 5 pm: End-of-day discussion; updates on the day's results.

Tuesday, January 7
  • 7:30 - 8:30 am: Breakfast & Coffee
  • 8:30 - 9 am: ESIP Meeting Overview
  • 9 - 9:30 am: Morning coordination: plan for the day
  • 9:30 - 10:30 am: Interoperability Testing
  • 10:30 - 11 am: Coffee Break
  • 11 - 12 pm: Interoperability Testing
  • 12 - 1 pm: Lunch
  • 1 - 3:30 pm: Populate slides for the full report-out
  • 3:30 - 4 pm: Coffee Break
  • 4 - 5:30 pm: Concluding Session: Demos and reports from use case development; API Specification Updates; Next Steps

Outside of the Sprint
  • Participate in ESIP Meeting.
  • Open spaces in hotel for some continued Sprint Discussions
  • ESIP sessions have been proposed on related topics; encourage coordination. 
How to Prepare: https://github.com/opengeospatial/CoverageProcessAnalytics/blob/master/README.md
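The sprint works on OGC APIs, where resources are addressed through predictable URL paths rather than a single service endpoint. The following is a minimal sketch of how a client might compose those paths, assuming a hypothetical server at https://example.org/ogcapi and hypothetical collection and process identifiers ("sst", "ndvi"). The /conformance, /collections, and /processes paths follow the OGC API building blocks; the /coverage path and the bbox parameter follow OGC API - Coverages drafts and may differ between draft versions and server implementations.

```python
from urllib.parse import quote, urlencode

# Hypothetical server root used only for illustration.
BASE_URL = "https://example.org/ogcapi"


def _root(base: str) -> str:
    """Normalize the server root so paths join with exactly one slash."""
    return base.rstrip("/")


def conformance_url(base: str = BASE_URL) -> str:
    """Conformance declaration: lists the standards/parts the server implements."""
    return f"{_root(base)}/conformance"


def collection_url(collection_id: str, base: str = BASE_URL) -> str:
    """Metadata for one collection (e.g. a gridded dataset); the id is percent-encoded."""
    return f"{_root(base)}/collections/{quote(collection_id, safe='')}"


def coverage_url(collection_id: str, base: str = BASE_URL, **params: str) -> str:
    """Coverage data for a collection (path and parameters assumed from Coverages drafts)."""
    url = f"{collection_url(collection_id, base)}/coverage"
    return f"{url}?{urlencode(params)}" if params else url


def process_url(process_id: str, base: str = BASE_URL) -> str:
    """Description of one process offered by an OGC API - Processes server."""
    return f"{_root(base)}/processes/{quote(process_id, safe='')}"


print(coverage_url("sst", bbox="-10,40,0,50"))
# https://example.org/ogcapi/collections/sst/coverage?bbox=-10%2C40%2C0%2C50
```

Keeping path construction in small helpers makes it easy to point the same client code at different sprint servers during interoperability testing by changing only the base URL.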

Speakers

Ingo Simonis

Director Innovation Programs & Science, OGC
Dr. Ingo Simonis is director of interoperability programs and science at the Open Geospatial Consortium (OGC), an international consortium of more than 525 companies, government agencies, research organizations, and universities participating in a consensus process to develop publicly...

George Percivall

CTO, Chief Engineer, Open Geospatial Consortium
As CTO and Chief Engineer of the Open Geospatial Consortium (OGC), George Percivall is responsible for the OGC Interoperability Program and the OGC Compliance Program. His roles include articulating OGC standards as a coherent architecture, as well as addressing implications of technology...


Monday January 6, 2020 9:00am - 5:00pm
Brookside A

4:00pm

Council of Data Facilities General Assembly Meeting
The Council of Data Facilities (CDF) is committed to working with relevant agencies, professional associations, initiatives, and other complementary efforts to enable transformational science, innovative education, and informed public policy through increased coordination, collaboration, and innovation in the acquisition, curation, preservation, and dissemination of geoscience data, tools, models, and services. Existing and emerging geoscience data facilities, through the Council, are committed to serving as an effective foundation for EarthCube. The General Assembly meeting is open to the official representatives from all member data facilities, additional member organization personnel as desired by the members, and observers.

Agenda:
4:00-4:15 Welcome/introductions/sign-in - Danie
4:15-4:30 High-level summary of OKN workshop - TBA
4:30-4:35 Updates on shared infrastructure - Kerstin, Danie
4:35-4:45 Update on COPDESS - Kerstin, Shelley
4:45-5:15 Update and next steps on P419 - Doug, Adam
5:15-5:30 Progress on EC supplements for CCHDO and MagIC related to P418/P419 (GeoCODES) - Steve
5:30-5:50 Update from the EarthCube Office tech team - Kenton McHenry
5:50-6:00 Summer topics - Danie
      • Suggested charter changes (to be voted on in July 2020)
      • Announce CDF exec elections in July 2020: 2 co-chair and 3 at-large positions


Speakers

Jessica Hausman

Data Engineer, PO.DAAC JPL


Monday January 6, 2020 4:00pm - 6:00pm
Glen Echo
 
Tuesday, January 7
 

8:30am

Meeting Overview & Plenary
View Live Stream here: ESIP 2020 Winter Meeting - Day 1 Plenary

  • 8:30 am - Welcome & Overview (Erin Robinson & Karl Benedict)
  • 9:00 am - Nadine Alameh: "Putting data to work: Insights from the Earth Science domain"
    We live in exciting times: explosive availability of data about nearly every aspect of human activity; an increasing variety of data sources, from mobile devices to remote sensing to the Internet of Things; revolutionary advances in computing technologies and big data analytics; and the maturation of artificial intelligence and machine learning. Let’s also not forget a whole new generation of “digital natives” who think, plan, collaborate, and execute differently than older generations.

    It’s no wonder that many domains are struggling with how to evolve in such exciting times, addressing not only the technical aspects of data (and interoperability) but also the business implications. Take, for instance, the aviation domain scaling to accommodate millions of UAVs vs. the thousands of air-traffic-controlled airplanes; or how National Mapping Agencies and Transportation Departments worldwide are exploring how best to incorporate HD maps generated by the autonomous vehicles market; or how businesses now have an increasing choice of weather data from several commercial weather data providers as opposed to traditional government National Weather Services.

    These cases are no different from how the Earth Science community has embraced and leveraged the small satellite market toward improved science results and decision making. I would argue that, in many ways, the Earth Science community is ahead of other domains in evolving with the times, mostly because we have always relied on “big data”, have always required integrating data from multiple sources, have always emphasized the value of metadata for determining fitness for purpose, and, most of all, have always appreciated and encouraged partnerships for the greater good.

    In short, the Earth Science community is an expert at putting data to work, transforming data to information to knowledge to value by leveraging innovation and partnerships! This talk is an opportunity to bubble up some insights gained from our community to share with the world, as well as to ground us as we continue to evolve.
  • 9:30 am - Paco Nathan: "Rich Context: providing support for cross-agency data stewardship, and measuring dataset impact on public policy"
    This talk explores the Rich Context project based in the Coleridge Initiative at NYU Wagner, a public-private partnership, which leverages advanced machine learning to support cross-agency data stewardship and measure dataset impact on public policy. In particular we'll focus on perspectives from industry, such as open source projects based in Silicon Valley that are finding close corollaries and applications in government data management. The project also hosts a public machine learning competition that has engaged top AI research teams worldwide to address semantic harmonization problems in scientific communications.

    Coleridge Initiative produces the ADRF platform, currently used by 15 federal, state, and local agencies in the US, to provide a FedRAMP-compliant environment on GovCloud for data analytics. On the one hand, this helps analysts use sensitive data without having to work within an air-gapped data facility. On the other hand, it helps data stewards at agencies monitor data usage and provide support to their customers. The team partners with Deutsche Bundesbank on similar cross-agency work in the EU, where they have pioneered a "data impact factor" metric for use with economic datasets (banking microdata) associated with the German central bank.

    The Rich Context project intakes metadata from the agencies involved with ADRF to build a knowledge graph of metadata about dataset usage. Our focus for 2020 is working with NOAA to apply this knowledge graph work for the agency. Specifically, this focuses on coastal communities which use NOAA data for resiliency planning. We leverage machine learning to identify linkages between Earth science data and socioeconomic policy impact within local communities. The collaboration with NOAA is intended as a case study for other agencies to reuse, in support of the Federal Data Strategy and its Year-1 Action Plan.
    View Slides here: https://doi.org/10.6084/m9.figshare.11573343.v1.
  • 10:00 am - Karl Benedict, Crista Straub, Carl Shapiro & others: Public-Private Partnerships Panel
    Maximizing the value and impact of Earth Observation data requires active participation throughout the complete value chain: from initial acquisition and processing, through publication and sharing via archives and repositories, to ultimate use in decision-making and other applications. ESIP has a long history of providing a venue for the development of partnerships along the full EO data value chain.

    This plenary panel discussion will provide a set of short descriptions of public-private partnership experiences from organizations with strong experience in developing and maintaining partnerships focused on maximizing the value of Earth Observation and geospatial data. Following the short presentations from the panelists, we will have a brief Q&A with the panel and follow up with a breakout session focused on identifying thematic areas with opportunities for developing new partnerships and discussing the nature and characteristics of those potential partnerships.
The Panelists for this plenary session will include:
  • Jeff Donze - ESRI
  • Ana Pinheiro Privette - Amazon
  • Timothy Stryker - USGS
  • Ajay Mehta - NOAA NESDIS
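The Rich Context talk in this plenary describes building a knowledge graph of metadata about dataset usage, linking publications to the agency datasets they use. A minimal sketch of that idea, assuming plain (subject, predicate, object) triples held in memory; every identifier and both predicates (uses_dataset, published_by) are hypothetical, and the usage count shown is a simple link count, not the "data impact factor" metric mentioned in the abstract. A production system would more likely use an RDF store or graph database.

```python
from collections import Counter

# Hypothetical (subject, predicate, object) triples; identifiers are illustrative only.
TRIPLES = [
    ("pub:flood-resilience-plan", "uses_dataset", "ds:coastal-flood-maps"),
    ("pub:sea-level-study", "uses_dataset", "ds:coastal-flood-maps"),
    ("pub:sea-level-study", "uses_dataset", "ds:tide-gauge-records"),
    ("pub:income-report", "uses_dataset", "ds:household-survey"),
    ("ds:coastal-flood-maps", "published_by", "agency:NOAA"),
    ("ds:tide-gauge-records", "published_by", "agency:NOAA"),
    ("ds:household-survey", "published_by", "agency:other"),
]


def agency_datasets(agency, triples=TRIPLES):
    """Datasets linked to `agency` by the published_by predicate."""
    return {s for s, p, o in triples if p == "published_by" and o == agency}


def usage_counts(agency, triples=TRIPLES):
    """Number of publications linked to each of the agency's datasets."""
    datasets = agency_datasets(agency, triples)
    return Counter(o for s, p, o in triples if p == "uses_dataset" and o in datasets)


def publications_using(agency, triples=TRIPLES):
    """Publications that use at least one of the agency's datasets."""
    datasets = agency_datasets(agency, triples)
    return {s for s, p, o in triples if p == "uses_dataset" and o in datasets}
```

For example, usage_counts("agency:NOAA") counts two publications for the flood maps and one for the tide-gauge records; following links one hop further (publication to policy document) is what would connect Earth science data to local policy impact.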

Speakers

Karl Benedict

Director of Research Data Services & Information Technology, University of New Mexico
For over 33 years Karl Benedict has had parallel careers in Information Technology, Data Management and Analysis, and Archaeology. Since 1993 when he arrived at UNM he has worked as a Graduate Student in Anthropology, Research Scientist, Research Faculty, Applied Research Center Director...

Erin Robinson

Executive Director, ESIP
I work at the intersection of community informatics, Earth science and non-profit management. Over more than 10 years, I’ve honed an eclectic skill set both technical and managerial, creating communities and programs with lasting impact around science, data, and technology.

Paco Nathan

Managing Partner, Derwen, Inc.
Known as a "player/coach", with core expertise in data science, natural language, machine learning, cloud computing; 35+ years tech industry experience, ranging from Bell Labs to early-stage start-ups. Co-chair for Rev conference, former co-chair for JupyterCon. Advisor for NYU Coleridge...

Nadine Alameh

CEO, Open Geospatial Consortium
Dr. Nadine Alameh is the recently appointed CEO of the Open Geospatial Consortium (OGC), an international organization dedicated to making Location information Findable, Accessible, Interoperable and Reusable (FAIR) via a process that combines consensus-based standards, collaborative...


Tuesday January 7, 2020 8:30am - 10:30am
Salon A-C
  • Remote Participation Link: https://global.gotomeeting.com/join/195545333
  • Remote Participation Phone #: (571) 317-3129
  • Remote Participation Access Code: 195-545-333
  • Additional Phone #s:
    • Australia: +61 2 8355 1050
    • Austria: +43 7 2081 5427
    • Belgium: +32 28 93 7018
    • Canada: +1 (647) 497-9391
    • Denmark: +45 32 72 03 82
    • Finland: +358 923 17 0568
    • France: +33 170 950 594
    • Germany: +49 692 5736 7317
    • Ireland: +353 15 360 728
    • Italy: +39 0 230 57 81 42
    • Netherlands: +31 207 941 377
    • New Zealand: +64 9 280 6302
    • Norway: +47 21 93 37 51
    • Spain: +34 912 71 8491
    • Sweden: +46 853 527 836
    • Switzerland: +41 225 4599 78
    • United Kingdom: +44 330 221 0088

9:00am

ESIP and OGC Coverage Processing and Analysis Sprint Day 2
AGENDA
Tuesday, January 7
  • 7:30 - 8:30 am: Breakfast & Coffee
  • 8:30 - 9 am: ESIP Meeting Overview
  • 9 - 9:30 am: Morning coordination: plan for the day
  • 9:30 - 10:30 am: Interoperability Testing
  • 10:30 - 11 am: Coffee Break
  • 11 - 12 pm: Interoperability Testing
  • 12 - 1 pm: Lunch
  • 1 - 3:30 pm: Populate slides for the full report-out
  • 3:30 - 4 pm: Coffee Break
  • 4 - 5:30 pm: Concluding Session: Demos and reports from use case development; API Specification Updates; Next Steps


Speakers

George Percivall

CTO, Chief Engineer, Open Geospatial Consortium
As CTO and Chief Engineer of the Open Geospatial Consortium (OGC), George Percivall is responsible for the OGC Interoperability Program and the OGC Compliance Program. His roles include articulating OGC standards as a coherent architecture, as well as addressing implications of technology...


Tuesday January 7, 2020 9:00am - 3:30pm
Brookside A

10:30am

Networking Break
Tuesday January 7, 2020 10:30am - 11:00am
Salon A-C Foyer

11:00am

FAIR Metadata Recommendations
We will discuss the FAIR metadata recommendations that were introduced at the ESIP Summer Meeting.
How to Prepare for this Session: Use the git repository Issues.

Links:
Glossary
Use git repository: Issues

View Recording: https://youtu.be/5hwZOLQ1p9M

Takeaways
  • NCEAS is continuing to pin down the fundamental characteristics of FAIR data. It has a suite of checks (e.g., is a title present?); 54 are currently implemented, and the team is working toward a community-defined 1.0 check suite. This is a good tool for data curators but has the potential to be misunderstood or misused, so a public FAIR metric is needed. The public FAIR metric is high-level and simple and includes only items that everyone agrees upon.
  • There are future plans to create community-specific custom FAIR check suites to handle variability in how metadata is hosted. The checks are continually evaluated to see whether they help or hurt data curators. Work is needed on the user interface: how do we ensure that metadata evaluation is a positive experience regardless of the score?
  • Reusability is typically low across data repositories. Accessibility needs greater focus, as it is hindered by broken or missing links. “When you decide what fields are mandatory (vs. optional), you decide what metadata you get.”
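The takeaways describe a suite of individual metadata checks (e.g., "is a title present?") alongside a smaller public metric restricted to items everyone agrees on. A minimal sketch of that structure, assuming metadata arrives as a flat Python dict; the check names, field names, FAIR principle assignments, and the core flag marking the public subset are all hypothetical and are not NCEAS's actual 54-check suite.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Check:
    name: str
    principle: str  # "F", "A", "I", or "R"
    core: bool  # hypothetical flag: part of a small, widely agreed public subset
    test: Callable[[dict], bool]


def _present(field: str) -> Callable[[dict], bool]:
    """Build a check that passes when `field` holds a non-blank value."""
    return lambda m: bool(str(m.get(field) or "").strip())


CHECKS: List[Check] = [
    Check("title_present", "F", True, _present("title")),
    Check("identifier_present", "F", True, _present("identifier")),
    # Syntax only: detecting broken links would require an HTTP request.
    Check("access_url_is_http", "A", True,
          lambda m: str(m.get("url") or "").startswith(("http://", "https://"))),
    Check("format_declared", "I", False, _present("format")),
    Check("license_present", "R", True, _present("license")),
]


def run_suite(metadata: dict, checks: List[Check] = CHECKS,
              core_only: bool = False) -> Tuple[Dict[str, bool], float]:
    """Run the selected checks; return per-check results and the pass fraction."""
    selected = [c for c in checks if c.core or not core_only]
    results = {c.name: c.test(metadata) for c in selected}
    score = sum(results.values()) / len(results) if results else 0.0
    return results, score


record = {"title": "Sea surface temperature", "identifier": "hypothetical-id-001",
          "url": "https://example.org/data/sst"}
print(run_suite(record)[1])  # 0.6 (3 of 5 checks pass)
print(run_suite(record, core_only=True)[1])  # 0.75 (3 of 4 core checks pass)
```

Keeping each check as a separate named object makes it straightforward to assemble community-specific suites from the same pool, and to report per-check results rather than only a single score.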


Speakers

Ted Habermann

Chief Game Changer, Metadata Game Changers
I am interested in all facets of metadata needed to discover, access, use, and understand data of any kind. Also evaluation and improvement of metadata collections, translation proofing. Ask me about the Metadata Game.

Matt Jones

Director, DataONE Program, DataONE, UC Santa Barbara
DataONE | Arctic Data Center | Open Science | Provenance and Semantics | Scientific Synthesis


Tuesday January 7, 2020 11:00am - 12:30pm
Forest Glen

11:00am

Analytic Centers for Air Quality
The Analytic Center Framework (ACF) is a concept to support scientific investigations with a harmonized collection of data from a wide range of sources and vantage points, tools, and computational resources. Four recent NASA AIST competitive awards are focused on either ACFs or components that could feed into AQ ACFs. Previous projects have developed tools and improved the accessibility and usability of data for Air Quality analysis, and have tried to address issues related to inconsistent metadata, uncertainty quantification, interoperability among tools and computing resources, and visualization to aid scientific investigation or applications. The format for this meeting will be a series of brief presentations by invited speakers followed by a discussion. This generally follows the panel model.
How to Prepare for this Session: A link to a set of pre-read materials will be provided.

View Recording: https://youtu.be/fy4eoOfSbpo.

Takeaways
  • Is there enough interest to start an Air Quality cluster? Yes!
  • Technologists and scientists should both be involved in the cluster to ensure usability through stakeholder engagement


Speakers

Mike Little

ESTO, NASA
Computational Technology to support scientific investigations



Tuesday January 7, 2020 11:00am - 12:30pm
Glen Echo

11:00am

Creating a Data at Risk Commons at DataAtRisk.org
Several professional organizations have become increasingly concerned about the loss of reusable data from primary sources such as individual researchers, projects, and agencies. DataAtRisk.org aims to connect people who have data in need of help with data expertise, and is a response to the clear need for a community-building application. This “Data at Risk” commons will allow individuals to submit and request help with threatened datasets and connect these datasets to experts who can provide resources and skills to help rescue data, through a secure, professional mechanism that facilitates self-identification and discovery.

This session will provide an overview of the current status of the DataAtRisk.org project and aims to expand the network of individuals involved in the development and implementation of DataAtRisk.org.

How to Prepare for this Session: Please check out https://dataatrisk.org/ for some background on the activities.

Presentations: http://bit.ly/303gig7, https://doi.org/10.6084/m9.figshare.11536317.v1
Link to use case / user scenario: https://tinyurl.com/yh4rnk7b

View Recording: https://youtu.be/96NMQwx_EtI

Takeaways
  • Perfection is the enemy of getting stuff done
  • Something is better than nothing
  • Triage will be necessary at several places in the process



Speakers

Denise Hills

Director, Energy Investigations, Geological Survey of Alabama
Long tail data, data preservation, connecting physical samples to digital information, geoscience policy, science communication


Tuesday January 7, 2020 11:00am - 12:30pm
Linden Oak

11:00am

Public-Private Partnerships for Earth Observations
The USGS Earth Observation Community is interested in investigating public-private partnerships including "how might these partnerships work;" "how would the data be used;" and “what are the potential benefits of the partnerships.”  For example, a motivating question is: "would a public-private partnership allow a sufficient business case for Landsat and what would that look like?". This question is supported by the recent Landsat Advisory Group (LAG) report (https://www.fgdc.gov/ngac/meetings/june-2019/ngac-paper-evaluation-of-a-range-of-landsat-data.pdf) that indicated "LAG recommends further research on the viability of a PPP model for Landsat. Such research should include dialogue with industry as early as possible to make sure its concerns are considered".

This breakout session will build on the discussion started in the plenary session earlier this morning in which the diverse experiences of public- and private-sector participants in public-private partnerships were highlighted. At the beginning of the session, attendees will be asked to select 3-4 questions like those posed above to be the topic of tabletop discussions in which the following questions will be posed:


  • Who are the interested or potential partners in a partnership related to the topic?
  • What are the potential mechanisms for establishing the partnership?
  • What are the added-value benefits of the partnership vs. the status quo of the participants working on their own? That is, what are the needs of the private- and public-sector participants in a potential partnership?
  • How can the ESIP community facilitate the identification and development of new public-private partnerships that will increase the value and impact of EO data?
We will then finish the breakout session with a 20-minute report-out and discussion of the outcomes of the discussions in the individual groups.
How to Prepare for this Session:

Come to the session prepared to discuss potential partnership ideas, your experience in building and supporting partnerships, and a willingness to see where new collaborations can take us.

We hope to produce from this session a summary document capturing the insights and steps forward from the tabletop discussions, including action items related to following up on emerging partnerships and identifying concrete actions that ESIP can take to support community participation in developing partnerships.

Takeaways
  • ESIP can continue to serve as a place to convene and build different types of public-private partnerships, including identifying collaboration areas, offering guidance on concerns, or providing a common structure.
  • Do we want to form a cluster to continue this? The private sector says yes. If interested, see www.surveymonkey.com/r/P3_collab. ESIP could house knowledge tools on the efficacy of different partnerships, along with stories from the community.
  • Public-private partnerships can expand the use of data at different levels, making it available to both private and public groups.




Speakers

Karl Benedict

Director of Research Data Services & Information Technology, University of New Mexico
For over 33 years Karl Benedict has had parallel careers in Information Technology, Data Management and Analysis, and Archaeology. Since 1993 when he arrived at UNM he has worked as a Graduate Student in Anthropology, Research Scientist, Research Faculty, Applied Research Center Director...


Tuesday January 7, 2020 11:00am - 12:30pm
Salon A-C

11:00am

Interoperability of geospatial data with STAC
SpatioTemporal Asset Catalog (STAC) is an emerging specification of a common metadata model for geospatial data, and a way to make data catalogs indexable and searchable. We have already seen STAC being adopted for both public and commercial data. Catalogs exist for several AWS Public Datasets, Landsat Collection 2 data will be published along with STAC metadata, and communities like Pangeo are using STAC to organize data repositories in a scalable way. Commercial companies like Planet and DigitalGlobe are starting to publish STAC metadata for some of their catalogs. Session talks may cover overviews of STAC, software projects utilizing STAC, and use cases of STAC in organizations.
How to Prepare for this Session: See https://stacspec.org/.
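At the core of the common metadata model is the STAC Item: a GeoJSON Feature describing one spatiotemporal unit (such as a scene) plus links to its data files ("assets"). A minimal sketch that builds and sanity-checks such an Item, assuming the field list of the later STAC 1.0.0 release (the 0.x releases current at this meeting differed slightly); the scene id, bounding box, timestamp, and asset URL are hypothetical. Real projects typically use a library such as PySTAC and validate against the official STAC JSON Schemas rather than a hand-written field list.

```python
import json
from typing import Tuple

# Top-level fields of a STAC Item (per STAC 1.0.0).
REQUIRED_ITEM_FIELDS = ("type", "stac_version", "id", "geometry", "bbox",
                        "properties", "links", "assets")


def make_item(item_id: str, bbox: Tuple[float, float, float, float],
              datetime_utc: str, asset_href: str) -> dict:
    """Build a minimal STAC Item (a GeoJSON Feature) for one scene with one asset."""
    west, south, east, north = bbox
    # Closed, counterclockwise exterior ring, as GeoJSON (RFC 7946) expects.
    ring = [[west, south], [east, south], [east, north], [west, north], [west, south]]
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "bbox": [west, south, east, north],
        "properties": {"datetime": datetime_utc},
        "links": [],  # a real catalog would add self/parent/collection links here
        "assets": {
            "data": {"href": asset_href, "type": "image/tiff; application=geotiff"}
        },
    }


def missing_fields(item: dict) -> list:
    """Names of required top-level fields absent from `item`."""
    return [f for f in REQUIRED_ITEM_FIELDS if f not in item]


item = make_item("example-scene", (-105.0, 39.0, -104.0, 40.0),
                 "2020-01-07T00:00:00Z", "https://example.org/example-scene.tif")
print(json.dumps(item, indent=2))
```

Because every Item shares this shape, a search service can index geometry, bbox, and datetime across catalogs from different providers without provider-specific parsing.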

View Recording: https://youtu.be/BdZbJLQSNFE

Takeaways


Speakers

Dan Pilone

Chief Technologist, Element 84
Dan Pilone is CEO/CTO of Element 84 and oversees the architecture, design, and development of Element 84's projects including supporting NASA, the USGS, Stanford University School of Medicine, and commercial clients. He has supported NASA's Earth Observing System for nearly 13 years...

Aimee Barciauskas

Data engineer, Development Seed

Matthew Hanson

Element 84
STAC


Tuesday January 7, 2020 11:00am - 12:30pm
White Flint

12:30pm

Lunch
Tuesday January 7, 2020 12:30pm - 2:00pm
Salon D

1:00pm

ESIP 101
A quick primer on all things ESIP to help you navigate the meeting and the community! Come meet other new and returning ESIP members and ESIP leadership.

Tuesday January 7, 2020 1:00pm - 1:30pm
Salon A-C

2:00pm

Making a Good First Impression: Metadata Quality Metrics for Earth Observation Data and Information
Metadata is often the first information that a user interacts with when looking for data. Understanding that there is typically only one chance to make a good impression, data and information repositories have placed an emphasis on metadata quality as a way of increasing the likelihood that a user will have a favorable first impression. This session will explore quality metrics, badging or scoring, and metadata quality assessment approaches within the Earth observation community. Discussion questions include:
● Does your organization implement metadata quality metrics and/or scores?
○ What are the key metrics that the scores are based on?
○ What priorities are driving your metadata quality metrics? For example, different repositories have different priorities. These priorities can include an emphasis on discoverability, accessibility, usability, provenance, etc.
● Does your organization make metadata quality scores publicly viewable? What are the pros and cons of making the scores publicly accessible?
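The discussion questions note that repositories weight different priorities (discoverability, accessibility, usability, provenance), so the same record can score differently in different places. A minimal sketch of one way to express that: per-category pass rates combined with repository-specific weights. The check names, their category assignments, and both weight sets are hypothetical and do not represent any particular organization's metric.

```python
from collections import Counter
from typing import Dict, Set

# Hypothetical mapping of metadata checks to the priority each one serves.
CHECK_CATEGORY: Dict[str, str] = {
    "abstract": "discoverability",
    "keywords": "discoverability",
    "access_url": "accessibility",
    "format": "usability",
    "units": "usability",
    "lineage": "provenance",
}


def category_scores(passed: Set[str]) -> Dict[str, float]:
    """Fraction of checks passed within each category."""
    totals = Counter(CHECK_CATEGORY.values())
    hits = Counter(cat for check, cat in CHECK_CATEGORY.items() if check in passed)
    return {cat: hits[cat] / totals[cat] for cat in totals}


def weighted_score(passed: Set[str], weights: Dict[str, float]) -> float:
    """Combine category scores using a repository's own priority weights."""
    scores = category_scores(passed)
    return sum(scores[c] * w for c, w in weights.items()) / sum(weights.values())


passed = {"abstract", "keywords", "access_url"}
discovery_first = {"discoverability": 3, "accessibility": 1, "usability": 1, "provenance": 1}
reuse_first = {"discoverability": 1, "accessibility": 1, "usability": 2, "provenance": 2}
print(round(weighted_score(passed, discovery_first), 3))  # 0.667
print(round(weighted_score(passed, reuse_first), 3))  # 0.333
```

Publishing the per-category scores alongside the weighted total is one way to address the takeaway that score visualizations need to be easy to understand: a reader can see why two repositories rate the same record differently.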
How to Prepare for this Session:

Presentations:
https://doi.org/10.6084/m9.figshare.11553606.v1
https://doi.org/10.6084/m9.figshare.11551182.v1

View Recording: https://youtu.be/lbza3gEHmtQ

Takeaways
  • Visualizations of the metadata quality metrics need to be easily understood or well documented to be effective
  • There are diverse ideas, and new metrics are being rolled out soon (U.S. Global Change Research Program & NCA)
  • Ensuring that metrics interact with existing standards such as FAIR is also important

Speakers

Amrutha Elamparuthy

GCIS Data Manager, U.S. Global Change Research Program


Tuesday January 7, 2020 2:00pm - 3:30pm
Forest Glen

2:00pm

ESIP Geoscience Community Ontology Engineering Workshop (GCOEW)
"Brains! Brains! Give us your brains!"
- Friendly neighbourhood machine minds
The collective knowledge in the ESIP community is immense and invaluable. During this session, we'd like to make sure that this knowledge drives the semantic technology (ontologies) being developed to move data with machine-readable knowledge in Earth and planetary science.
What we'll do:

In the first half hour of this session, we'll a) sketch out how and why we build ontologies and b) show you how to request that your knowledge gets added to ontologies (with nanocrediting).
We'll then have a 30-minute crowdsourcing jam session, during which participants can share their geoscience knowledge on the SWEET issue tracker. With a simple post, you can shape how the semantic layer will behave, making sure it does your field justice! Request content and share knowledge here: https://github.com/ESIPFed/sweet/issues
In the last 30 minutes, we'll take one request and demonstrate how we go about "ontologising" it in ENVO and how we link that to SWEET to create interoperable ontologies across the Earth and life sciences.

Come join us and help us shape the future of Geo-semantics!

Stuff you'll need:
A GitHub account available at https://github.com/
An ORCID (for nanocrediting your contributions) available at https://orcid.org
How to Prepare for this Session:

Presentations:

View Recording: https://youtu.be/tr0coi5ZQvM

Takeaways
  • Working toward a future (5-10 year goal) of making an open Earth & Space Science Foundry (from SWEET) similar to the OBO (Open Biological and Biomedical Ontology) Foundry. “Humans write queries”. Class definitions need to be machine-readable for interoperability, but must remain human-readable for authoring queries, ontology reuse, etc.
  • Please feel free to add phenomena of interest to the SWEET https://github.com/ESIPFed/sweet/issues/ or ENVO https://github.com/EnvironmentOntology/envo/issues/ issue trackers. 
  • At AGU, they added a class-level annotation convention for changes to ontologies. Textual definitions for SWEET terms can now be obtained from DBpedia. See https://github.com/ESIPFed/sweet/wiki/SWEET-Class-Annotation-Convention


Speakers

Lewis McGibbney

Chair, ESIP Semantic Technologies Committee, NASA, JPL
My name is Lewis John McGibbney, I am currently a Data Scientist at the NASA Jet Propulsion Laboratory in Pasadena, California where I work in Computer Science and Data Intensive Applications. I enjoy floating up and down the tide of technologies @ The Apache Software Foundation having...


Tuesday January 7, 2020 2:00pm - 3:30pm
Glen Echo

2:00pm

Data Skills & Competencies Requirements for Data Stewards: Views from the ESIP Community & Beyond
At the ESIP Summer 2019 Meeting, many ESIP community members offered feedback on the range and importance of skills and competencies for data specialists whose job responsibilities focus on offering data "advice" (e.g., data curators) or acting as data "service providers" (e.g., data librarians). By means of an interactive poster, participants were asked to rate each competency in a subset identified by a European Open Science Cloud (EOSC) project as being of high, medium, low, or no importance. In this session, session leaders will present the results of the ESIP community feedback within the context of the full list of EOSC competencies, visualized from both a poster-synthesis and a research data lifecycle point of view. Session leaders hope the audience will participate by providing feedback and engaging in discussion on the data and views presented. One outcome of this work will be a "Career Compass" to be published by the American Geoscience Institute for students interested in becoming data stewards.
How to Prepare for this Session:

Presentations:

View Recording: https://youtu.be/1s1L3Jter8w

Takeaways



Speakers

Karl Benedict

Director of Research Data Services & Information Technology, University of New Mexico
For over 33 years Karl Benedict has had parallel careers in Information Technology, Data Management and Analysis, and Archaeology. Since 1993 when he arrived at UNM he has worked as a Graduate Student in Anthropology, Research Scientist, Research Faculty, Applied Research Center Director...


Tuesday January 7, 2020 2:00pm - 3:30pm
Linden Oak

2:00pm

COPDESS: Facilitating a FAIR Publishing Workflow Ecosystem
COPDESS, the Coalition for Publishing Data in the Earth and Space Sciences (https://copdess.org/), was established in October 2014 as a platform for Earth and Space Science publishers and data repositories to jointly define, implement, and promote common policies and procedures for the publication and citation of data and other research results (e.g., samples, software, etc.) across Earth Science journals. In late 2018, COPDESS became a cluster of ESIP to give the initiative the needed sustainability to support a long-term FAIR publishing workflow ecosystem and be a springboard to pursue future enhancements of it.

In 2017, with funding from the Arnold Foundation, the ‘Enabling FAIR Data Project’ (https://copdess.org/enabling-fair-data-project/) moved mountains towards implementing the policies and standards that connect researchers, publishers, and data repositories in their desire to accelerate scientific discovery through open and FAIR data. Implementation of the new FAIR policies has advanced rapidly across Earth, Space, and Environmental journals, but supporting infrastructure, guidelines, and training for researchers, publishers, and data repositories has yet to catch up. The primary challenges are:
  • Repositories struggle to keep up with the demands of researchers, who want to deposit data and obtain a DOI instantly, without considering the data quality/data ingest requirements and review procedures of individual repositories. As a result, data publication is inconsistent in quality and content.
  • Many publishers who have signed the Commitment Statement for FAIR Data (https://copdess.org/enabling-fair-data-project/commitment-statement-in-the-earth-space-and-environmental-sciences/) agree with it at a high, conceptual level. However, many journal editors and reviewers lack clarity on how to validate that datasets, which underpin scholarly publications, conform with the Commitment Statement.
  • Researchers experience confusion, and in some cases barriers to publishing their papers, while they try to meet the requirements of the Commitment Statement. Clear requirements, timelines, and criteria for selecting repositories are needed to minimize the barriers to the joint publication of papers and associated data.

Funders have a role to play, in that they need to allow for the time and resources required to curate data and ensure compliance, particularly with respect to the assignment of valid DOIs. Funders can also begin to reward researchers who make the effort to properly manage their data and make it available, in a similar way to how they reward scholarly publications and the citation of those publications.
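One small, concrete part of checking for "valid DOIs" can be automated before a data citation enters the publishing workflow. The sketch below is a hypothetical helper (`looks_like_doi` is illustrative and not part of any COPDESS, publisher, or repository tool). It uses a regular expression based on a pattern Crossref has published for matching most modern DOIs, and it checks syntax only: a DOI that passes may still be unregistered or fail to resolve.

```python
import re

# Based on a pattern Crossref has suggested for matching modern DOIs:
# "10." prefix, a 4-9 digit registrant code, a slash, then the suffix.
# Syntax check only: it does not confirm the DOI is registered or resolves.
MODERN_DOI = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)


def looks_like_doi(candidate: str) -> bool:
    """Return True if the string is syntactically a modern DOI."""
    value = candidate.strip()
    # Accept DOIs written as resolver URLs or "doi:" strings by stripping the prefix.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if value.lower().startswith(prefix):
            value = value[len(prefix):]
            break
    return bool(MODERN_DOI.match(value))


print(looks_like_doi("https://doi.org/10.6084/m9.figshare.11553681.v1"))  # True
print(looks_like_doi("10.6084"))  # False: no suffix
```

A repository or journal system could run a check like this at submission time and follow it with an actual resolution request, since only resolution confirms that the DOI exists.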

The goal of this session is to start a conversation on developing a publishing workflow ecosystem that seamlessly integrates researchers, repositories, publishers, and funders. Perspectives from all viewpoints will be presented.

Notes document: https://docs.google.com/document/d/12M0F6mcUZSn2GdBN-Id__smXhYxbLzKDrAViPAgnH6w/edit?usp=sharing

Presentations:

View Recording: https://youtu.be/x6a1QRNbifQ

Takeaways
  • COPDESS has moved to ESIP as a cluster to ensure the sustainability of the project to address the publishing & citation of research data



Speakers

Karl Benedict

Director of Research Data Services & Information Technology, University of New Mexico
For over 33 years Karl Benedict has had parallel careers in Information Technology, Data Management and Analysis, and Archaeology. Since 1993 when he arrived at UNM he has worked as a Graduate Student in Anthropology, Research Scientist, Research Faculty, Applied Research Center Director...

Kerstin Lehnert

President, IGSN e.V.
Kerstin Lehnert is Senior Research Scientist at the Lamont-Doherty Earth Observatory of Columbia University and Director of the NSF-funded data facility IEDA (Interdisciplinary Earth Data Alliance). Kerstin holds a Ph.D in Petrology from the University of Freiburg in Germany. Over...

Lesley Wyborn

Australian National University


Tuesday January 7, 2020 2:00pm - 3:30pm
Salon A-C

2:00pm

Current Data that are available on the Cloud
NASA, NOAA and USGS are in the process of moving data onto the cloud. While they have discussed what types of services are available and their future plans for which data will be found there, it is not completely clear which datasets users can access today. This session will go over which datasets are currently in the cloud and what data to expect in the near future, so that as users move their computing to the cloud, they also know which data are available to them there. There will also be presentations from AWS.

Speakers:
Katie Baynes - NASA/EOSDIS
Jon O'Neil - NOAA
Jeff de La Beaujardiere - NCAR
Kristi Kliene - USGS/EROS
Joe Flasher - AWS

Presentations: See attached.

View Recording: https://youtu.be/yssgXB7iaxw

Takeaways
  • Petabyte-scale data are being moved into the cloud, concentrated in AWS, Google Cloud, and Microsoft, depending on the agency and dataset.
  • There is some concern about partnerships with companies (AWS was most discussed) in terms of long-term relationships, moving data, etc., and how these might affect access or data use.
  • The authoritative source of the data, who is stewarding it, and any modifications made when copying it to the cloud need to be made clear. Users should exercise due diligence in selecting and using data.



Speakers

Jon O'Neil

Director, NOAA Big Data Project, NOAA OCIO

Joe Flasher

Open Geospatial Data Lead, Amazon Web Services
Joe Flasher is the Open Geospatial Data Lead at Amazon Web Services helping organizations most effectively make data available for analysis in the cloud. The AWS open data program has democratized access to petabytes of data, including satellite imagery, genomic data, and data used...

Chris Lynnes

System Architect, NASA/GSFC

Jessica Hausman

Data Engineer, PO.DAAC JPL

Jeff de La Beaujardière

Director, Information Systems Division, NCAR
Big data, cloud computing, object storage, data management.

Dave Meyer

GES DISC manager, NASA



Tuesday January 7, 2020 2:00pm - 3:30pm
White Flint

3:30pm

Networking Break
Tuesday January 7, 2020 3:30pm - 4:00pm
Salon A-C Foyer

4:00pm

Bringing Science Data Uncertainty Down to Earth - Sub-orbital, In Situ, and Beyond
In the fall of 2019, the Information Quality Cluster (IQC) published a white paper entitled “Understanding the Various Perspectives of Earth Science Observational Data Uncertainty”. The paper surveys a diverse sample of both widespread and unique policies and practices, drawn from international policy contexts and working groups, for quantifying, characterizing, communicating, and using uncertainty information throughout the cross-disciplinary Earth science data landscape. To this end, the IQC addressed uncertainty information from four perspectives: Mathematical, Programmatic, User, and Observational. These perspectives shape policies and practices internationally, which in turn influence how uncertainty is quantified, characterized, communicated, and used. The IQC is now scoping a follow-on paper intended to provide recommendations and best practices regarding uncertainty information. We hope to examine additional areas of opportunity in the cross-domain and cross-disciplinary aspects of Earth science data. For instance, the existing white paper covers uncertainty information from the perspective of satellite-based remote sensing well, but does not adequately address the in situ or airborne (i.e., sub-orbital) perspective. This session will explore such opportunities to broaden the IQC’s awareness of current work on uncertainty information, while giving participants and observers an opportunity to weigh in on how best to move forward with the follow-on paper.

How to Prepare for this Session:

Agenda:
  1. "IQC Uncertainty White Paper Status Summary and Next Steps" - Presented by: David Moroni (15 minutes)
  2. "Uncertainty quantification for in situ ocean data: The S-MODE sub-orbital campaign" - Presented by: Fred Bingham (15 minutes)
  3. "Uncertainty Quantification for Spatio-Temporal Mapping of Argo Float Data" - Presented by Mikael Kuusela (20 minutes)
  4. Panel Discussion (35 minutes)
  5. Closing Comments (5 minutes)
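The Mathematical perspective can be illustrated in its simplest form by combining independent measurements of one quantity with inverse-variance weighting: each value x_i with standard uncertainty s_i gets weight 1/s_i^2, and the combined uncertainty is 1/sqrt(sum of weights), so it shrinks as measurements are added. This is a generic textbook technique assuming uncorrelated errors, not the method presented in the talks; spatio-temporal mapping of Argo or campaign data needs models that account for correlation in space and time. The sensor values below are hypothetical.

```python
import math
from typing import Sequence, Tuple


def inverse_variance_mean(values: Sequence[float], sigmas: Sequence[float]) -> Tuple[float, float]:
    """Combine independent measurements of one quantity, weighting each by 1/sigma**2.

    Returns (combined value, combined standard uncertainty). Assumes uncorrelated
    errors; correlated errors would need a covariance-aware estimator instead.
    """
    if not values or len(values) != len(sigmas):
        raise ValueError("need non-empty inputs of equal length")
    weights = [1.0 / s ** 2 for s in sigmas]
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    return mean, math.sqrt(1.0 / total)


# Hypothetical example: two sensors measure the same water temperature (deg C).
mean, sigma = inverse_variance_mean([15.0, 15.4], [0.1, 0.2])
print(f"{mean:.2f} +/- {sigma:.3f}")
```

The more precise sensor dominates the combined value, and the combined uncertainty is smaller than either input, which only holds if the errors really are independent.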
Notes Page: https://docs.google.com/document/d/1vfYBK_DLTAt535kMZusTPVCBAjDqptvT0AA5D6oWrEc/edit?usp=sharing

Presentations:
https://doi.org/10.6084/m9.figshare.11553681.v1

View Recording: https://youtu.be/vC2O8FRgvck

Takeaways

Speakers

David Moroni

Data Stewardship and User Services Team Lead, Jet Propulsion Laboratory, Physical Oceanography Distributed Active Archive Center
I am a Senior Science Data Systems Engineer at the Jet Propulsion Laboratory and Data Stewardship and User Services Team Lead for the PO.DAAC Project, which provides users with data stewardship services including discovery, access, sub-setting, visualization, extraction, documentation...

Ge Peng

Research Scholar, CICS-NC/NCEI
Dataset-centric scientific data stewardship, data quality management

Fred Bingham

University of North Carolina at Wilmington

Mikael Kuusela

Carnegie Mellon University


Tuesday January 7, 2020 4:00pm - 5:30pm
Forest Glen

4:00pm

ESIP/OGC Coverage Processing and Analysis Sprint Report-Out
Learn what came out of two days of sprinting on how to advance APIs for analytics on coverages, arrays, and gridded data.

Join the conversation on the Gitter Channel or check out issues on GitHub.

Presentations:

View Recording:
https://youtu.be/blWnKTlrgKY

Takeaways
  • Overview of OGC API Sprint: What is the distinction between Features and Coverages? The answer can be self-referential. Feedback is being gathered at https://github.com/opengeospatial/ogc_api_coverages
  • The future API will be a view of the data: it will support a feature view or a coverage view as layers, each a functional rendering of the underlying data.
  • The idea is to provide a single, holistic API that lets you daisy-chain any level of complexity by combining modular sub-APIs into workflow ‘processes’.
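The daisy-chaining idea in the last takeaway can be pictured as plain function composition. The sketch below is purely illustrative: the `Grid` type, the three process functions, and the `chain` helper are hypothetical names, not OGC API endpoints or parameters, and a real coverage process would run server-side on far larger arrays. It only shows how modular steps that each take and return gridded data combine into a workflow of any length.

```python
from functools import reduce
from typing import Callable, List

Grid = List[List[float]]          # hypothetical stand-in for a coverage's values
Process = Callable[[Grid], Grid]  # every "process" maps a grid to a grid


def subset_rows(start: int, stop: int) -> Process:
    """Hypothetical subsetting process: keep rows start..stop-1."""
    return lambda grid: grid[start:stop]


def scale(factor: float) -> Process:
    """Hypothetical unit-conversion process: multiply every cell by factor."""
    return lambda grid: [[factor * v for v in row] for row in grid]


def row_means() -> Process:
    """Hypothetical aggregation process: reduce each row to a one-cell row of its mean."""
    return lambda grid: [[sum(row) / len(row)] for row in grid]


def chain(*steps: Process) -> Process:
    """Compose processes left to right: each step's output feeds the next step."""
    return lambda grid: reduce(lambda acc, step: step(acc), steps, grid)


workflow = chain(subset_rows(1, 3), scale(2.0), row_means())
print(workflow([[1, 2], [3, 4], [5, 6], [7, 8]]))  # [[7.0], [11.0]]
```

Because every step has the same input and output type, steps can be reordered or reused in other chains, which is the property a modular sub-API design aims for.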


Speakers

Ingo Simonis

Director Innovation Programs & Science, OGC
Dr. Ingo Simonis is director of interoperability programs and science at the Open Geospatial Consortium (OGC), an international consortium of more than 525 companies, government agencies, research organizations, and universities participating in a consensus process to develop publicly...

George Percivall

CTO, Chief Engineer, Open Geospatial Consortium
As CTO and Chief Engineer of the Open Geospatial Consortium (OGC), George Percivall is responsible for the OGC Interoperability Program and the OGC Compliance Program. His roles include articulating OGC standards as a coherent architecture, as well as addressing implications of technology...


Tuesday January 7, 2020 4:00pm - 5:30pm
Glen Echo

4:00pm

Defining the Bull's Eye of Sample Metadata
In recent years, the integration of physical collections and samples into digital data infrastructure has received increased attention in the context of Open Science and FAIR research results. To support open, transparent, and reproducible science, physical samples need to be uniquely identified, findable in online catalogues, well documented, and linked to related data, publications, people, and other relevant digital information. Substantial progress has been made through widespread implementation of the IGSN as a persistent unique identifier. What is missing is the development and implementation of protocols and best practices for sample metadata. Efforts to do this have shown that it is impossible to develop a common vocabulary that describes all samples collected: one size does not fit all, and each domain (e.g., soil scientists, volcanologists, cosmochemists, paleoclimate scientists, and granite researchers, to name a few) has its own vocabulary. Yet there is a minimum set of attributes common to all samples: the ‘Bull’s Eye of sample metadata’. This session invites participants from all walks of Earth and environmental science to help define the minimum set of attributes needed to describe the physical samples at the heart of much of Earth and environmental research.

How to Prepare for this Session:
Participants should come with a list of the minimum metadata requirements for their institutions or domains. They should be prepared to give a brief introduction to their needs.

Session Agenda:
  1. Introduction to the issue
  2. Review of existing examples and discussion of the limitations
  3. Discuss minimal requirements; propose changes/additions
  4. Summarize outcomes and discuss next steps
Google doc with the current metadata list and proposed changes
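The "bull's eye" idea (a small core required of every sample, plus domain-specific rings) maps naturally onto a layered validation check. In the sketch below, the field names in `CORE_FIELDS` and `DOMAIN_FIELDS` are hypothetical placeholders, not the list under discussion in this session or any IGSN requirement; only the structure is the point.

```python
from typing import Dict, List, Set

# Hypothetical core ("bull's eye"): attributes every sample record must carry.
CORE_FIELDS: Set[str] = {"identifier", "name", "material", "collector",
                         "collection_date", "location"}

# Hypothetical domain rings: extra attributes required only within one community.
DOMAIN_FIELDS: Dict[str, Set[str]] = {
    "soil": {"depth_top_cm", "depth_bottom_cm"},
    "volcanology": {"eruption_name"},
}


def missing_fields(record: Dict[str, object], domain: str = "") -> List[str]:
    """Return required attributes that are absent or empty, sorted by name."""
    required = CORE_FIELDS | DOMAIN_FIELDS.get(domain, set())
    return sorted(f for f in required if record.get(f) in (None, ""))


sample = {  # hypothetical record
    "identifier": "hypothetical-0001", "name": "Core A-1", "material": "soil",
    "collector": "J. Doe", "collection_date": "2019-06-01", "location": "35.08,-106.62",
}
print(missing_fields(sample))          # checks the core only
print(missing_fields(sample, "soil"))  # checks the core plus the soil ring
```

Keeping the core small and the domain rings separate lets each community extend the requirements without changing what every other community must supply.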

Presentations:

View Recording: https://youtu.be/bxhTmrNqkCA

Takeaways

Speakers

Lesley Wyborn

Australian National University

Kerstin Lehnert

President, IGSN e.V.
Kerstin Lehnert is Senior Research Scientist at the Lamont-Doherty Earth Observatory of Columbia University and Director of the NSF-funded data facility IEDA (Interdisciplinary Earth Data Alliance). Kerstin holds a Ph.D in Petrology from the University of Freiburg in Germany. Over...


Tuesday January 7, 2020 4:00pm - 5:30pm
Linden Oak

4:00pm

Schema.org - Developing a Plan to Govern science-on-schema.org
This session will walk through the ESIP GitHub repository at https://github.com/ESIPFed/science-on-schema.org
Discussion:
* How do we govern as a cluster?
* Monitoring updates to schema.org?
* Strategies for proposing changes to core schema.org?
* Extensions at geoschemas.org

How to Prepare for this Session: Review the contents of https://github.com/ESIPFed/science-on-schema.org

Notes: https://doi.org/10.6084/m9.figshare.11542068.v1

View Recording: https://youtu.be/jPeuyOeIKzg

Takeaways
  • Governance issue for the cluster: is using GitHub issues to manage development suitable, or do non-technical users need GitHub issues/JSON blobs translated into plain English?
  • Potential TODO item for COR: create and/or use an existing CC license ontology to reference license URLs properly and unambiguously.
  • Decision: update the guidance documents to recommend an appropriate ontology (e.g., for CC licenses) and, if none exists, use the specification URL. License vocabulary: https://spdx.org/licenses/ (text or URL)
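The licensing decision above amounts to putting an unambiguous URL in the schema.org `license` property, which accepts either a `CreativeWork` or a `URL`. A minimal sketch, assuming a hypothetical dataset name and description; the exact SPDX URL form (for example, with or without a trailing `.html`) should be checked against the current science-on-schema.org guidance before use.

```python
import json

# Minimal schema.org Dataset record; name and description are hypothetical.
dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example gridded dataset",
    "description": "Hypothetical record illustrating the license property.",
    # Reference the license by a URL from the SPDX License List rather than
    # free text such as "CC BY", so harvesters can resolve it unambiguously.
    "license": "https://spdx.org/licenses/CC-BY-4.0",
}

jsonld = json.dumps(dataset, indent=2)
print(jsonld)
```

The same record would typically be embedded in a landing page inside a `<script type="application/ld+json">` element so that search engines and harvesters can read it.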



Speakers

Adam Shepherd

Technical Director, Co-PI, WHOI
schema.org | Data Containerization | Linked Data | Semantic Web | Knowledge Representation | Ontologies


Tuesday January 7, 2020 4:00pm - 5:30pm
Salon A-C
  • Skill Level: Jump In, Deep Dive
  • Keywords: Semantics
  • Collaboration Area Tags: Schema.org
  • Remote Participation Link: https://global.gotomeeting.com/join/195545333
  • Remote Participation Phone #: (571) 317-3129
  • Remote Participation Access Code 195-545-333
  • Additional Phone #'s: Australia: +61 2 8355 1050 Austria: +43 7 2081 5427 Belgium: +32 28 93 7018 Canada: +1 (647) 497-9391 Denmark: +45 32 72 03 82 Finland: +358 923 17 0568 France: +33 170 950 594 Germany: +49 692 5736 7317 Ireland: +353 15 360 728 Italy: +39 0 230 57 81 42 Netherlands: +31 207 941 377 New Zealand: +64 9 280 6302 Norway: +47 21 93 37 51 Spain: +34 912 71 8491 Sweden: +46 853 527 836 Switzerland: +41 225 4599 78 United Kingdom: +44 330 221 0088

4:00pm

Experiences Migrating Mission Scale Data in the Cloud
We will describe our project to upload a 2.4 PB dataset, packaged as ~80K fused files from the five instruments on the Terra satellite, into NASA AWS S3.
We will share the bottlenecks and lessons learned during this process, and we hope to exchange experiences with similar projects in order to identify best practices and collect guidelines for future projects adopting cloud solutions for their data needs.

We'll discuss data volumes, data integrity strategies for migration, S3 bucket organization, metadata curation, transfer rates, transfer pipelines, etc. We will also discuss and share data access patterns, costs, and architectures, and how we can construct guidelines for accessing these datasets efficiently.
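Two pieces of such a migration can be sketched with simple arithmetic and the Python standard library. First, 2.4 PB across ~80K files averages about 30 GB per file, which rules out single PUT requests (limited to 5 GB) and forces multipart uploads (at most 10,000 parts each). Second, one common integrity check compares a locally computed value against the ETag S3 reports for a multipart object: the MD5 of the concatenated per-part MD5 digests, suffixed with the part count. This is widely observed behavior rather than a documented guarantee, it does not apply to objects encrypted with SSE-KMS or SSE-C, and the 64 MiB part size is an assumption, not the project's setting. Since 2022, S3 also supports additional checksum algorithms (e.g., SHA-256) that may be preferable.

```python
import hashlib
import math
from typing import List

PART_SIZE = 64 * 1024 * 1024  # assumed 64 MiB part size; the project's setting is not stated
MAX_PARTS = 10_000            # S3's limit on parts per multipart upload


def split_parts(data: bytes, part_size: int) -> List[bytes]:
    """Split an object into fixed-size parts (the last part may be shorter)."""
    return [data[i:i + part_size] for i in range(0, len(data), part_size)]


def multipart_etag(parts: List[bytes]) -> str:
    """Expected ETag for an unencrypted multipart upload (observed convention).

    MD5 of the concatenated binary MD5 digests of the parts, plus "-<part count>".
    The result depends on the part size, so it must match the size used for upload.
    """
    digests = b"".join(hashlib.md5(p).digest() for p in parts)
    return f"{hashlib.md5(digests).hexdigest()}-{len(parts)}"


# Back-of-envelope sizing from the session description: 2.4 PB across ~80K files.
avg_file_bytes = 2.4e15 / 80_000                        # 3e10 bytes, about 30 GB per file
parts_per_file = math.ceil(avg_file_bytes / PART_SIZE)  # well under MAX_PARTS
print(avg_file_bytes, parts_per_file)
```

In practice the ETag would be computed while streaming each file from local storage rather than from an in-memory `bytes` object, and compared with the value returned after the upload completes.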

We encourage discussion among projects that have faced similar processes or are looking to migrate their datasets into the cloud.

https://drive.google.com/file/d/1fts06XDM2dbZxxljBTpplCEMSiTqfp6t/view?usp=sharing

Presentations:
https://doi.org/10.6084/m9.figshare.11553147.v1

View Recording: https://youtu.be/1xVJghJI4Gg

Takeaways
  • The project required a combination of NSF, NASA, and AWS resources. There was interesting discussion of AWS or other cloud services as a stand-in for, or follow-on to, limited-term NSF assets.
  • There was also discussion of tailoring the dataset to appropriate end users: a wide range of potential users brings a wide range of requirements, including access guidelines, user capabilities, etc.
  • The project aimed to make a paradigm shift from understanding/observing physical processes to a full climate-observing objective.



Speakers

Ben Galewsky

Research Programmer, National Center for Supercomputing Applications


Tuesday January 7, 2020 4:00pm - 5:30pm
White Flint
 
Wednesday, January 8
 

8:30am

State of ESIP
Wednesday January 8, 2020 8:30am - 10:30am
Salon A-C

10:30am

Networking Break
Wednesday January 8, 2020 10:30am - 11:00am
Salon A-C Foyer

11:00am

Accelerating convergence of earth and space data in teaching and learning through participatory design.
Bringing remote sensing and astronomical data to life for students is a challenge for earth and space science educators. This session will engage teachers and scientists in a participatory design process that will demonstrate the power of data science, identify challenges in teaching and learning, and seek pathways to develop next generation tools and curricula to close the gap between science practice and education. This workshop extends an NSF convergence accelerator for earth and space data and will also help inform an upcoming NSF-funded workshop titled: Data Science for High School Computer Science: Identifying Needs, Gaps and Resources.
We are proposing a working session that works directly with teachers on tool development using a participatory design approach. The ESIP Education Committee is working to identify DC-area schools to partner with over the long term, and this session could be a good first step in that relationship. For this workshop, at least three DC-area teachers will work with ESIP Education Committee members and facilitators.

How to Prepare for this Session:

Presentations: https://doi.org/10.6084/m9.figshare.11591211.v1

View Recording: https://youtu.be/xSjLF_TbV30

Takeaways
  • Many tools already exist, but they need to be more easily connected to the curriculum.
  • Schools are constrained in which tools they can use, because tools cannot have blogging features or present other security risks. Schools also have limited technology available.



Speakers

Shelly Olds

Science Education Specialist, UNAVCO
Data visualization tools, Earth science education, human dimensions of natural hazards, disaster risk reduction (DRR), resilience building.

Becky Reid

Science Educator, Learners Without Walls
I discovered ESIP in the summer of 2009 when I was teaching science in Santa Barbara and attended the Summer meeting there. Ever since then, I have been volunteering with the ESIP Education Committee in various capacities, serving as Chair in 2013, 2019, and now, 2020! I currently...


Wednesday January 8, 2020 11:00am - 12:30pm
Brookside A

11:00am

Software Sustainability, Discovery and Accreditation
It is commonly understood that software is essential to research, in data collection, curation, analysis, and understanding, and it is also a critical element within any research infrastructure. This session will address two related software issues: 1) sustainability, and 2) discovery and accreditation.

Because scientific software is an instance of a software stack containing problem-specific software, discipline-specific tools, general tools and middleware, and infrastructural software, changes anywhere in the stack can cause the overall software to stop working. As time goes on, increasing work is needed to compensate for these problems; we refer to this work as sustainability. Issues in which we are interested include incentives that encourage sustainability activities, business models for sustainability (including public-private partnership), software design that can reduce the sustainability burden, and metrics to measure sustainability (perhaps tied to the on-going process of defining FAIR software).

The second issue, discovery and accreditation, asks: how do we enable users to discover and access trustworthy, fit-for-purpose software to undertake science processing on the compute infrastructures to which they have access? And how do we ensure that publications cite the exact version of the software that was used and properly credit its authors?

This session will include a number of short talks, and at least two breakouts in parallel, one about the sustainability of software, and a second about discovery of sustainable and viable solutions.

Potential speakers who want to talk about an aspect of software sustainability, discovery, or accreditation should contact the session organizers.

Agenda/slides:
Presentations: See above

View Recording:
https://youtu.be/nsxjOC04JxQ

Key takeaways:

1. Funding agencies spend a large amount of money on software, but don't always know this because it's not something that they track.

Open source software is growing very quickly:
  • 2001: 208K SourceForge users
  • 2017: 20M GitHub users
  • 2019: 37M GitHub users
Software, like data, is a “first class citizen” in the ecosystem of tools and resources for scientific research, and our community is increasing its attention to software, as it has for FAIR data.
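The GitHub figures above imply a compound annual growth rate that a one-line calculation makes concrete. The 2001 figure counts SourceForge users, so comparing it with the GitHub counts would mix platforms; the sketch below uses only the two GitHub numbers, and the function name is illustrative.

```python
def annual_growth(start: float, end: float, years: float) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end / start) ** (1.0 / years) - 1.0


# GitHub users, 2017 -> 2019, from the figures listed in the takeaways.
github_rate = annual_growth(20e6, 37e6, 2)
print(f"{github_rate:.0%} per year")
```

A growth of 20M to 37M users in two years corresponds to roughly 36% per year, compounded.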


2. Ideas for changing our culture to better support and reward contributions to sustainable software:
  • Citation (ESIP guidelines) and/or software heritage IDs for credit and usage metrics and to meet publisher requirements (e.g. AGU)
  • Prizes
  • Incentives in hiring and promotion
  • Promote FAIR principles and/or Technical Readiness Levels for software
  • Increased use of common software to make science more efficient
  • Publish best practice materials in other languages, e.g. Mandarin, as software comes from a global community


3. A checklist of topics to consider for your community sustained software:
  • Repository with “cookie cutter” templates and sketches for forking
  • Licensing
  • Contributors Guide
  • Code of Conduct and Governance
  • Use of “Self-Documentation” features and standards
  • An easy first step for trying out the software
  • Continuous Integration builds
  • Unit tests
  • Good set of “known first issues” for new users trying out the software
  • Gitter or Slack Channel for feedback and communication, beyond a simple repo issues queue
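Several items in this checklist can be audited automatically. The sketch below maps checklist items to conventional file names; the mapping (`LICENSE`, `CONTRIBUTING.md`, `CODE_OF_CONDUCT.md`, a `tests` directory, common CI configuration paths) is an assumption based on widespread conventions, not an ESIP requirement, and a project may legitimately use other names. Items such as governance or a chat channel cannot be checked this way.

```python
import tempfile
from pathlib import Path
from typing import Dict, List

# Hypothetical mapping from checklist items to paths a repository might contain.
CHECKLIST: Dict[str, List[str]] = {
    "Licensing": ["LICENSE", "LICENSE.md", "LICENSE.txt"],
    "Contributors Guide": ["CONTRIBUTING.md"],
    "Code of Conduct": ["CODE_OF_CONDUCT.md"],
    "Unit tests": ["tests", "test"],
    "Continuous Integration builds": [".github/workflows", ".travis.yml", ".gitlab-ci.yml"],
}


def missing_items(repo: Path) -> List[str]:
    """Return checklist items for which none of the candidate paths exist."""
    return [item for item, candidates in CHECKLIST.items()
            if not any((repo / c).exists() for c in candidates)]


# Usage: a throwaway repository with only a license file and a tests directory.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    (repo / "LICENSE").write_text("placeholder license text\n")
    (repo / "tests").mkdir()
    report = missing_items(repo)

print(report)
```

A check like this could run as one step in a project's continuous integration build, or be shipped with a "cookie cutter" template so new projects start with every file in place.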


Detailed notes:
The group then divided into 2 breakout sessions (Sustainability; Discovery and Accreditation), with notes as follows.

Notes from Sustainability breakout (by Daniel S. Katz):

What we think should be done:
  • Build a cookiecutter recipe for new projects, based on Ben’s slides? What part of ESIP would be interested in this, and would build and support it?
  • Define governance as part of this? How do we store governance?
  • What is required, what is optional (maybe with different answers at different tiers)
  • Define types of projects (individual developer, community code, …)
  • Define for different languages – tooling needs to match needs
  • Is this specific to ESIP? Who could it be done with? The Carpentries?  SSI?

Other discussion:
  • What do we mean by sustainability – for how long?  Up to 50 years?  How do we run the system?
  • What’s the purpose of the software (use case) – transparency to see the software, actual reuse?
  • What about research objects that contain both software and data? How do we archive them? How do we cite them?
  • We have some overlap with research object citation cluster


Notes from Discovery and Accreditation breakout (by Shelley Stall):

Use Cases - Discovery
  1. Science question: looking for software to support it
  2. Have some data output from a software process and need access to the software to better understand the data.

Example of work happening: Data and Software Preservation - NSF Funded
  • promote linked data to other research products
  • similar project in Australia - want to gain access to the chain of events that resulted in the data and/or software - the scientific drivers that resulted in this product
  • Provenance information is part of this concept.

A deeper look at discovery, once software is found, is to better understand how the software came into being. It is important to know the undocumented elements of a process that affected the chain of events, since they are useful for understanding a particular piece of software.
How do we discover existing packages?
Dependency management helps to discover new elements that support software.
Concern was expressed that packaged solutions for creating an environment, like “AWS/AMI”, are not recognized as good enough, that an editor requested a d

Speakers

Daniel S. Katz

Assistant Dir. for Scientific Software & Applications, NCSA; Research Assoc. Prof., CS, ECE, iSchool, University of Illinois at Urbana-Champaign

Lesley Wyborn

Australian National University


Wednesday January 8, 2020 11:00am - 12:30pm
Forest Glen

11:00am

Pangeo in Action
The NSF-funded Pangeo project (http://pangeo.io/) is a community-driven architectural framework for big data geoscience. A typical Pangeo software stack leverages open-source Python libraries, including Jupyter Notebooks for interactive data analysis, Intake catalogs to provide a higher level of abstraction, Dask for scalable, parallelized data access, and Xarray for working with labeled multi-dimensional arrays of data, and it can support data formats including NetCDF as well as the cloud-optimized Zarr format for chunked, compressed, N-dimensional arrays.
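The core idea behind Dask-backed arrays (split the data into chunks, compute on each chunk independently, then combine the partial results) can be shown without any Pangeo dependencies. The standard-library sketch below is not Dask: it runs eagerly, has no task graph or scheduler, and uses threads only to illustrate independent per-chunk tasks (pure-Python arithmetic gains little from threads). The function names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


def chunk(values: Sequence[float], size: int) -> List[Sequence[float]]:
    """Split a 1-D sequence into contiguous chunks of at most `size` elements."""
    return [values[i:i + size] for i in range(0, len(values), size)]


def partial_sum(block: Sequence[float]) -> Tuple[float, int]:
    """Per-chunk task: return (sum, count) so partial results can be combined later."""
    return sum(block), len(block)


def chunked_mean(values: Sequence[float], size: int, workers: int = 4) -> float:
    """Mean computed chunk by chunk in parallel tasks, then combined (map, then reduce)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, chunk(values, size)))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


print(chunked_mean(list(range(1, 101)), size=7))
```

Each chunk returns a (sum, count) pair rather than its own mean, so chunks of unequal size (here the last chunk of the 100 values holds only 2) still combine into the correct overall mean.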

This session includes presentations describing implementations, results, or lessons learned from using these tools, as well as some time for open discussion. We encourage attendance by people interested in knowing more about Pangeo.

Draft schedule:
Dr. Amanda Tan, U. Washington: Pangeo overview and lessons learned
Dr. Rich Signell, USGS: The USGS EarthMap Pangeo: Success Stories and Lessons Learned
Dr. Jeff de La Beaujardière, NCAR: Climate model outputs on AWS using Pangeo framework
Dr. Karl Benedict, UNM: Pangeo as a platform for workshops
Open discussion

How to Prepare for this Session:

Presentations:
https://doi.org/10.6084/m9.figshare.11559174.v1

View Recording: https://youtu.be/VNfpGIIjL3E

Takeaways
  • Pangeo is a community platform for big data geoscience: a cohesive ecosystem of open community, open-source software, and open ecosystem, built on three core Python packages: Jupyter, Xarray, and Dask
  • Deploying Pangeo on the cloud faces challenges:
    • Cloud costs
    • Cloud skills
    • Need of cloud-optimized data
    • Finding the best Pangeo deployment strategy as cloud service platforms change
  • Pangeo can leverage Jupyter notebooks and other resources for data users at different levels (NCAR: scientists new to cloud computing platforms; University of New Mexico: a workshop platform; etc.)

Speakers

Karl Benedict

Director of Research Data Services & Information Technology, University of New Mexico
For over 33 years Karl Benedict has had parallel careers in Information Technology, Data Management and Analysis, and Archaeology. Since 1993 when he arrived at UNM he has worked as a Graduate Student in Anthropology, Research Scientist, Research Faculty, Applied Research Center Director...

Rich Signell

Oceanographer, USGS
Ocean Modeling, Python, NetCDF, THREDDS, ERDDAP, UGRID, SGRID, CF-Conventions, Jupyter, JupyterHub, CSW, TerriaJS

Amanda Tan

Data Scientist, University of Washington
Cloud computing, distributed systems

Jeff de La Beaujardière

Director, Information Systems Division, NCAR
Big data, cloud computing, object storage, data management.



Wednesday January 8, 2020 11:00am - 12:30pm
Linden Oak

11:00am

FAIRtool.org, Serverless workflows for cubesats, Geoweaver ML workflow management, 3D printed weather stations
Come hear what ESIP Lab PIs have built over the past year. Speakers include:

Abdullah Alowairdhi: FAIRTool Project Update
Ziheng Sun: Geoweaver Project
Amanda Tan: Serverless Workflow Project
Agbeli Ameko: 3D-Printed Weather Stations

Presentations:
https://doi.org/10.6084/m9.figshare.11626284.v1

View Recording: https://youtu.be/vrRwEQRAIZ4

Takeaways



Speakers

Amanda Tan

Data Scientist, University of Washington
Cloud computing, distributed systems

Abdullah Alowairdhi

PhD Candidate, U of Idaho

Ziheng Sun

Research Assistant Professor, George Mason University
My research interests are mainly on geospatial cyberinfrastructure and agricultural remote sensing.

Annie Burgess

ESIP Lab Director, ESIP


Wednesday January 8, 2020 11:00am - 12:30pm
Salon A-C
  • Skill Level: Skim the Surface, Jump In
  • Keywords: Cloud Computing, Machine Learning
  • Collaboration Area Tags: Science Software
  • Remote Participation Link: https://global.gotomeeting.com/join/195545333
  • Remote Participation Phone #: (571) 317-3129
  • Remote Participation Access Code 195-545-333
  • Additional Phone #'s: Australia: +61 2 8355 1050 Austria: +43 7 2081 5427 Belgium: +32 28 93 7018 Canada: +1 (647) 497-9391 Denmark: +45 32 72 03 82 Finland: +358 923 17 0568 France: +33 170 950 594 Germany: +49 692 5736 7317 Ireland: +353 15 360 728 Italy: +39 0 230 57 81 42 Netherlands: +31 207 941 377 New Zealand: +64 9 280 6302 Norway: +47 21 93 37 51 Spain: +34 912 71 8491 Sweden: +46 853 527 836 Switzerland: +41 225 4599 78 United Kingdom: +44 330 221 0088

11:00am

Earth Observation Process and Application Discovery, Machine Learning, and Federated Cloud Analytics: Putting data to work using OGC Standards
This session provides an overview of the results from the recent OGC Research & Development initiative Testbed-15. The nine-month, USD 5M initiative addressed six topics: Earth Observation Process and Application Discovery, Machine Learning, Federated Cloud Analytics, Open Portrayal Framework, Delta Updates, and Data Centric Security. This session focuses on the results produced by the first three.

Earth Observation Process and Application Discovery developed draft specifications and models for the discovery of cloud-provided processes and applications. This was achieved by extending existing standards with process- and application-specific extensions. Now, data processing software can be made available as a service, discovered using catalog interfaces, and executed on demand by customers. This allows processes to be executed physically close to the data, reducing data transport overhead.

The Machine Learning research developed models in the areas of earth observation data processing, image classification, feature extraction and segmentation, vector attribution, discovery and cataloguing, forest inventory management & optimization, and semantic web-link building and triple generation. Both model discovery and access took place through standardized interfaces.

The Federated Cloud Analytics research analysed how data and processing capacities provided by individual cloud environments can be handled transparently for the user. The research included how federated membership, resource, and access policy management can be provided within a security environment, while also providing portability and interoperability to all stakeholders. Additionally, the initiative studied the application of Distributed Ledger Technologies (DLTs), and more specifically blockchains, to managing provenance information in federated clouds.
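Hash-linked records suit provenance because each record stores the hash of its predecessor, so altering any earlier record invalidates every later link. The sketch below is a single-party hash chain with hypothetical entries and illustrative function names; it has none of the replication or consensus that make a distributed ledger, and it is not the design studied in Testbed-15.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def record_hash(entry: Dict[str, str], prev_hash: str) -> str:
    """SHA-256 over a canonical JSON encoding of the entry plus the previous hash."""
    payload = json.dumps({"entry": entry, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_record(ledger: List[Dict], entry: Dict[str, str]) -> None:
    """Append a provenance entry linked to the hash of the entry before it."""
    prev = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({"entry": entry, "prev": prev, "hash": record_hash(entry, prev)})


def verify(ledger: List[Dict]) -> bool:
    """Recompute every link; altering any earlier entry invalidates the chain."""
    prev = GENESIS
    for block in ledger:
        if block["prev"] != prev or block["hash"] != record_hash(block["entry"], prev):
            return False
        prev = block["hash"]
    return True


ledger: List[Dict] = []
append_record(ledger, {"step": "ingest", "site": "cloud-A"})   # hypothetical entries
append_record(ledger, {"step": "regrid", "site": "cloud-B"})
print(verify(ledger))
```

Sorting keys in the JSON encoding keeps the hash stable regardless of dictionary insertion order; in a federated setting, the remaining challenge is agreeing across providers on which chain is authoritative.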

The other three topics will also be briefly introduced. The Open Portrayal Framework provides a fully interoperable portrayal and styling suite of standards. Here, the initiative developed new OGC APIs for styles, maps, images, and tiles. Delta Updates explored incremental updates to reduce communication payloads between clients and servers, whereas the Data Centric Security thread examined the use of encrypted container formats on standard metadata bindings. How to Prepare for this Session: All results will be made available as public Engineering Reports that provide full details. These are becoming available step by step at http://docs.opengeospatial.org/per/

Presentations:
https://doi.org/10.6084/m9.figshare.11551563.v1

View Recording: https://youtu.be/ojMrcIE-SgE

Takeaways
  • OGC Innovation Program: tests the fitness for purpose of geospatial community initiatives. Testbed-15 concluded last November; results will soon be available from the document repository. End-to-end cloud pipeline for data processing and analytics. Call for Testbed-16 due Feb 9th, 2020! 1.6M in funding available. Three major threads: Earth observation clouds, data integration and analytics, and modeling and packaging.
  • A way to create synergy between the needs of user communities and competing and collaborating projects, contributing to a more interoperable world. Provides applications, processes, and catalogues for data processing.
  • Testbeds center around an exploitation/processing platform (for data with relevant applications) like an application market with cloud services. Having some trouble finding application developers. Finding web services with relevant data can be problematic.



Speakers
Ingo Simonis

Director Innovation Programs & Science, OGC
Dr. Ingo Simonis is director of interoperability programs and science at the Open Geospatial Consortium (OGC), an international consortium of more than 525 companies, government agencies, research organizations, and universities participating in a consensus process to develop publicly...


Wednesday January 8, 2020 11:00am - 12:30pm
White Flint

12:30pm

Lunch | Peer Recognition Ceremony
Wednesday January 8, 2020 12:30pm - 2:00pm
Salon D

2:00pm

Participatory design and evaluation of a 3D-Printed Automatic Weather Station to explore hardware, software and data needs for community-driven decision making
The development of low-cost, 3D-printed weather stations aims to revolutionize the way communities collect long-term data about local weather phenomena, as well as develop climate resilience strategies to adapt to the impacts of increasingly uncertain climate trends. This session will engage teachers and scientists in the evaluation and participatory design of the IoTwx 3D-printed weather station that is designed to be constructed and extended by students in middle and high school. We aim to explore the full spectrum of the station from construction (from pre-printed parts), to data collection and development of learning activities, to analysis of scientific phenomena within the data. The stations also represent a unique opportunity to develop community-based strategies to extend the capabilities of the platform, and in the session we encourage full discussion of data collection and sensing technologies of specific relevance to communities adopting the stations.

In this working session, we will work directly with teachers on evaluation and development using a participatory design approach to stimulate and encourage relationships between ESIP Education Committee members and teachers.

Preparing for this Session: TBD

Presentations:

View Recording: https://youtu.be/AfvWhZBkQd8

Takeaways
  • Very valuable for the schools and community. It is an opportunity to include multiple departments within the school system (engineering, computer science, maths, earth science, etc.)
  • Need to understand the constraints that school systems may present: security, wifi, processing power, cloud access, only required for part of the year



Speakers
Shelly Olds

Science Education Specialist, UNAVCO
Data visualization tools, Earth science education, human dimensions of natural hazards, disaster risk reduction (DRR), resilience building.
Becky Reid

Science Educator, Learners Without Walls
I discovered ESIP in the summer of 2009 when I was teaching science in Santa Barbara and attended the Summer meeting there. Ever since then, I have been volunteering with the ESIP Education Committee in various capacities, serving as Chair in 2013, 2019, and now, 2020! I currently...


Wednesday January 8, 2020 2:00pm - 3:30pm
Brookside A

2:00pm

FAIR Laboratory Instrumentation, Analytical Procedures, and Data Quality
Acquisition and analysis of data in the laboratory are pervasive in the Earth, environmental, and planetary sciences. Analytical and experimental laboratory data, often acquired with sophisticated and expensive instrumentation, are fundamental for understanding past, present, and future processes in natural systems, from the interior of the Earth to its surface environments on land, in the oceans, and in the air, to the entire solar system. Despite the importance of provenance information for analytical data including, for example, sample preparation or experimental set up, instrument type and configuration, calibration, data reduction, and analytical uncertainties, there are no consistent community-endorsed best practices and protocols for describing, identifying, and citing laboratory instrumentation and analytical procedures, and documenting data quality. This session is intended as a kick-off working session to engage researchers, data managers, and system engineers in contributing ideas on how to move forward with and accelerate the development of global standard protocols and the promulgation of best practices for analytical laboratory data. How to Prepare for this Session:

Presentations:

View Recording: https://youtu.be/LOfb_4r7DBA

Takeaways
  • Analytical and experimental data are collected widely in both field and laboratory settings across the Earth, environmental, and planetary sciences, spanning many disciplines. FAIR use of such data depends on data provenance.
  • Community exchange of such data is needed, because the data are used more broadly than their original use in the domain; this raises the question of interoperability. Networks of these data need to be plugged into evolving CI systems. In seismology, a common data standard implemented by early visionaries was a massive boon to the field.
  • Documenting how analytical data were generated is time-consuming for data curators, providers, etc. Standards and protocols for data exchange are urgently required for emerging global data networks. OneGeochemistry is an example use case: an international research group working to establish a global network for discoverable geochemical data.


Speakers
Lesley Wyborn

Australian National University
Kerstin Lehnert

President, IGSN e.V.
Kerstin Lehnert is Senior Research Scientist at the Lamont-Doherty Earth Observatory of Columbia University and Director of the NSF-funded data facility IEDA (Interdisciplinary Earth Data Alliance). Kerstin holds a Ph.D. in Petrology from the University of Freiburg in Germany. Over...


Wednesday January 8, 2020 2:00pm - 3:30pm
Forest Glen

2:00pm

Citizen Science Data and Information Quality
The ESIP Information Quality Cluster (IQC) has formally defined information quality as a combination of the following four aspects of quality, spanning the full life cycle of data products: scientific quality, product quality, stewardship quality, and service quality. The focus of the IQC has been the quality of Earth science data captured by scientists and experts. For example, the whitepaper “Understanding the Various Perspectives of Earth Science Observational Data Uncertainty”, published by the IQC in the fall of 2019, mainly addresses uncertainty information from the perspective of satellite-based remote sensing. With the advance of mobile computing technologies, including smartphones, Citizen Science (CS) data have become increasingly important sources for Earth science research. CS data pose their own unique data quality challenges compared with data captured through traditional scientific approaches. The purpose of this session is to broaden the scope of IQC efforts, present the community with the state of the art of research on CS data quality, and foster a collaborative interchange of technical information intended to help advance the assessment, improvement, capturing, conveying, and use of quality information associated with CS data. This session will summarize the scope of what we mean by CS data (including examples of platforms and sensors commonly used in collecting CS data) and include presentations from both past and current CS projects focusing on topics such as challenges with CS data quality; strategies to assess, ensure, and improve CS data quality; approaches to capturing CS data quality information and conveying it to users; and use of CS data quality information for scientific discovery.

Agenda
  1. Introduction - Yaxing Wei - 5 mins
  2. Citizen Science Data Quality: The GLOBE Program – Helen M. Amos (NASA GSFC) – 18 (15+3) mins.
  3. Can we trust the power of the crowd? A look at citizen science data quality from NOAA case studies - Laura Oremland (NOAA) – 18 (15+3) mins.
  4. Turning Citizen Science into Community Science - Stephen C. Diggs (Scripps Institution of Oceanography / UCSD) and Andrea Thomer (University of Michigan)  – 18 (15+3) mins.
  5. Earth Challenge 2020: Understanding and Designing for Data Quality at Scale - Anne Bowser (Wilson Center) – 18 (15+3) mins.
  6. Discussion and Key Takeaways – All – 13 mins.

View Recording: https://youtu.be/xaTLP4wqwe8

Takeaways

Notes Page:
https://docs.google.com/document/d/1lRp19SF9U727ureKjY38PHOF3EGUgE-BixYDs2KlmII/edit?usp=sharing

Presentation Abstracts

  • Citizen Science Data Quality: The GLOBE Program - Helen M. Amos (NASA GSFC)
The Global Learning and Observations to Benefit the Environment (GLOBE) Program is an international program that provides a way for students and the public to contribute Earth system observations. Currently 122 countries, more than 40,000 schools, and 200,000 citizen scientists are participating in GLOBE. Since 1995, participants have contributed 195 million observations. Modes of data collection and data entry have evolved with technology over the lifetime of the program, including the launch of the GLOBE Observer mobile app in 2016 to broaden access and public participation in data collection. GLOBE must meet the data needs of a diverse range of stakeholders, from elementary school classrooms to scientists across the globe, including NASA scientists. Operational quality assurance measures include participant training, adherence to standardized data collection protocols, range and logic checks, and an approval process for photos submitted with an observation. In this presentation, we will discuss the current state of operational data QA/QC, as well as additional QA/QC processes recently explored and future directions. 
  • Can we trust the power of the crowd? A look at citizen science data quality from NOAA case studies - Laura Oremland (NOAA)
NOAA has a rich history in citizen science dating back hundreds of years.  Today NOAA’s citizen science covers a wide range of topics such as weather, oceans, and fisheries with volunteers contributing over 500,000 hours annually to these projects. The data are used to enhance NOAA’s science and monitoring programs.   But how do we know we can trust these volunteer-based efforts to provide data that reflect the high standards of NOAA’s scientific enterprise? This talk will provide an overview of NOAA’s citizen science, describe the data quality assurance and quality control processes applied to different programs, and summarize common themes and recommendations for collecting high quality citizen science data. 
  • Earth Challenge 2020: Understanding and Designing for Data Quality at Scale - Anne Bowser (Wilson Center)
April 22nd, 2020 marks the 50th anniversary of Earth Day. In recognition of this milestone, Earth Day Network, the Woodrow Wilson International Center for Scholars, and the U.S. Department of State are launching Earth Challenge 2020 as the world’s largest coordinated citizen science campaign. For 2020, the project focuses on six priority areas: air quality, water quality, insect populations, plastics pollution, food security, and climate change. For each of these six areas, one work stream will focus on collaborating with existing citizen science projects to increase the amount of open and findable, accessible, interoperable, and reusable (FAIR) data. A second work stream will focus on designing tools to support both existing and new citizen science activities, including a mobile application for data collection; an open, API-enabled data integration platform; data visualization tools; and a metadata repository and data journal.
A primary value of Earth Challenge 2020 is recognizing, and elevating, ongoing citizen science activities.  Our approach seeks first to document a range of data quality practices that citizen science projects are already using to help the global research and public policy community understand these practices and assess fitness-for-use.  This information will be captured primarily through the metadata repository and data journal.  In addition, we are leveraging a range of data quality solutions for the Earth Challenge 2020 mobile app, including designing automated data quality checks and leveraging a crowdsourcing platform for expert-based data validation that will help train machine learning (ML) support.  Many of the processes designed for Earth Challenge 2020 app data can also be applied to other citizen science data sets, so maintaining information on processing level, readiness level, and provenance is a critical concern.  The goal of this presentation is to offer an overview of key Earth Challenge 2020 data documentation and data quality practices before inviting the ESIP community to offer concrete feedback and support for future work.

Speakers
David Moroni

Data Stewardship and User Services Team Lead, Jet Propulsion Laboratory, Physical Oceanography Distributed Active Archive Center
I am a Senior Science Data Systems Engineer at the Jet Propulsion Laboratory and Data Stewardship and User Services Team Lead for the PO.DAAC Project, which provides users with data stewardship services including discovery, access, sub-setting, visualization, extraction, documentation...
Ge Peng

Research Scholar, CICS-NC/NCEI
Dataset-centric scientific data stewardship, data quality management
Yaxing Wei

Scientist, Oak Ridge National Laboratory


Wednesday January 8, 2020 2:00pm - 3:30pm
Linden Oak

2:00pm

AI for Augmenting Geospatial Information Discovery
Thanks to rapid developments in hardware and computer science, we have seen many exciting breakthroughs in self-driving cars, voice recognition, street view recognition, cancer detection, check deposit, etc. Sooner or later AI will spread to the Earth science field. Scientists need a high level of automation to discover timely, accurate geospatial information from large volumes of Earth observations, but few existing algorithms can solve these sophisticated problems automatically. However, the transition from manual to automatic processing is gradually under way. Many early adopters have started to transfer AI theory and algorithms from computer science to GIScience, and a number of promising results have been achieved. In this session, we will invite speakers to talk about their experiences of using AI in geospatial information (GI) discovery. We will discuss all aspects of "AI for GI" such as the algorithms, technical frameworks, tools and libraries used, and model evaluation in various individual use case scenarios. How to Prepare for this Session: https://esip.figshare.com/articles/Geoweaver_for_Better_Deep_Learning_A_Review_of_Cyberinfrastructure/9037091
https://esip.figshare.com/articles/Some_Basics_of_Deep_Learning_in_Agriculture/7631615

Presentations:
https://doi.org/10.6084/m9.figshare.11626299.v1

View Recording: https://youtu.be/W0q8WiMw9Hs

Takeaways
  • There has been significant uptake of machine learning/artificial intelligence for Earth science applications in the past decade.
  • Challenges for machine learning applications in the Earth science domain include:
    • The quality and availability of training data sets
    • The need for a team with diverse skill backgrounds to implement the application
    • The need for a better understanding of the underlying mechanisms of ML/AI models
  • There are many promising developments in streamlining the process of building and applying machine learning for different sectors of society (weather monitoring, emergency response, social good).



Speakers
Yuhan Rao

Ph.D., North Carolina Institute for Climate Studies
Aimee Barciauskas

Data engineer, Development Seed
Annie Burgess

ESIP Lab Director, ESIP
Rahul Ramachandran

Project Manager, Sr. Research Scientist, NASA
Ziheng Sun

Research Assistant Professor, George Mason University
My research interests are mainly on geospatial cyberinfrastructure and agricultural remote sensing.


Wednesday January 8, 2020 2:00pm - 3:30pm
Salon A-C

2:00pm

Advancing Data Integration approaches of the structured data web
Political, economic, social or scientific decision making is often based on integrated data from multiple sources across potentially many disciplines. To be useful, data need to be easy to discover and integrate.
This session will feature presentations highlighting recent breakthroughs and lessons learned from experimentation with and implementation of open knowledge graphs, linked data concepts, and Discrete Global Grid Systems. Practicality and adoptability will be the emphasis, focusing on incremental opportunities that enable transformational capabilities using existing technologies. Best practices from the W3C Spatial Data on the Web Working Group, the OGC Environmental Linked Features Interoperability Experiment, and ESIP Science on Schema.org, as well as implementation examples from Geoscience Australia, Ocean Leadership Consortium, USGS, and other organisations, will be featured across the entire session.
This session will highlight how existing technologies and best practices can be combined to address important and common use cases that have been difficult if not impossible until recent developments. A follow up session will be used to seed future collaborative development through co-development, github issue creation, and open documentation generation.

How to Prepare for this Session: Review: https://opengeospatial.github.io/ELFIE/, https://github.com/ESIPFed/science-on-schema.org, https://www.w3.org/TR/sdw-bp/, and http://locationindex.org/.

Notes, links, and attendee contact info here.

View Recording: https://youtu.be/-raMt2Y1CdM

Session Agenda:
1. 2:00-2:10, Sylvain Grellet, Abdelfettah Feliachi, BRGM, France
'Linked data' the glue within interoperable information systems
“Our Environmental Information Systems have been exposing environmental features, their monitoring systems, and the observations they generate in an interoperable way (technically and semantically) for years. In Europe, there is even a legal obligation to do so via the INSPIRE directive. However, the practice of having data providers set up services in a "Discovery > View > Download data" pattern hides data behind the services. This hinders data discovery and reuse. The Linked Data on the Web Best Practices turn this stack upside down, and data are now back in the front line. This completely revamps the design and capabilities of our Information Systems. We'll highlight the new data frontiers opened by such practices, taking examples from the French National Groundwater Information Network.”
View Slides: https://doi.org/10.6084/m9.figshare.11550570.v1

2. 2:10-2:20, Adam Leadbetter, Rob Thomas, Marine Institute, Ireland
Using RDF Data Cubes for data visualization: an Irish pilot study for publishing environmental data to the semantic web
The Irish Wave and Weather Buoy Networks return metocean data at 5-60 minute intervals from 9 locations in the seas around Ireland. Outside the Earth Sciences, an example use case for these data is supporting Blue Economy development and growth (e.g. renewable energy device development). The Marine Institute, as the operator of the buoy platforms, in partnership with the EU H2020-funded Open Government Intelligence project, has published daily summary data from these buoys using the RDF Data Cube model[1]. These daily statistics are available as Linked Data via a SPARQL endpoint, making the data semantically interoperable and machine readable. This API underpins a pilot dashboard for data exploration and visualization. The dashboard lets the user explore the data and derive plots of the historic summary data, while interactively subsetting the full-resolution data behind the statistics. Publishing environmental data with these technologies makes them accessible to developers outside the Earth Sciences and effectively lowers the entry bar for those familiar with Linked Data technologies.
View Slides: https://doi.org/10.6084/m9.figshare.11550570.v1

3. 2:20-2:30, Boyan Brodaric, Eric Boisvert, Geological Survey of Canada, Canada; David Blodgett, USGS, USA
Toward a Linked Water Data Infrastructure for North America
We will describe progress on a pilot project using Linked Data approaches to connect a wide variety of water-related information within Canada and the US, as well as across the shared border.
View Slides: https://doi.org/10.6084/m9.figshare.11541984.v1

4. 2:30-2:40, Dalia Varanka, E. Lynn Usery, USGS, USA
The Map as Knowledge Base; Integrating Linked Open Topographic Data from The National Map of the U.S. Geological Survey
This presentation describes the objectives, models, and approaches for a prototype system for cross-thematic topographic data integration based on semantic technology. The system framework offers new perspectives on conceptual, logical, and physical system integration in contrast to widely used geographic information systems (GIS).
View Slides: https://doi.org/10.6084/m9.figshare.11541615.v1

5. 2:40-2:50, Alistair Ritchie, Landcare, New Zealand
ELFIE at Landcare Research, New Zealand
Landcare Research, a New Zealand Government research institute, creates, manages and publishes a large set of observational and modelling data describing New Zealand’s land, soil, terrestrial biodiversity and invasive species. We are planning to use the findings of the ELFIE initiatives to guide the preparation of a default view of the data to help discovery (by Google), use (by web developers) and integration (into the large environmental data commons managed by other agencies). This integration will not only link data about the environment together, but will also expose more advanced data services. Initial work is focused on soil observation data, and the related scientific vocabularies, but we anticipate near universal application across our data holdings.
View Slides: https://doi.org/10.6084/m9.figshare.11550369.v1

6. 2:50-3:00, Irina Bastrakova, Geoscience Australia, Australia
Location Index Project (Loc-I) – integration of data on people, business & the environment
Location Index (Loc-I) is a framework that provides a consistent way to seamlessly integrate data on people, business, and the environment.
Location Index aims to extend the characteristics of foundation spatial data by taking geospatial data (multiple geographies) that are essential to support public safety and wellbeing, or critical for national or government decision making that contributes significantly to economic, social, and environmental sustainability, and linking them with observational data. Through providing the infrastructure to suppo

Speakers
Jonathan Yu

Research data scientist/architect, CSIRO
Jonathan is a data scientist/architect with the Environmental Informatics group in CSIRO. He has expertise in information and web architectures, data integration (particularly Linked Data), data analytics and visualisation. Dr Yu is currently the technical lead for the Loc-I project...
Dalia Varanka

Research Physical Scientist, U.S. Geological Survey
Principal Investigator and Project Lead, The Map as Knowledge Base
Alastair Richie

Landcare Research NZ
Adam Leadbetter

Marine Institute
Rob Thomas

Marine Institute
Boyan Brodaric

Natural Resources Canada
Eric Boisvert

Natural Resources Canada
Irina Bastrakova

Director, Spatial Data Architecture, Geoscience Australia
I have been actively involved with international and national geoinformatics communities for more than 19 years. I am the Chair of the Australian and New Zealand Metadata Working Group. My particular interest is in developing and practical application of geoscientific and geospatial...



Wednesday January 8, 2020 2:00pm - 3:30pm
White Flint

3:30pm

Networking Break
Wednesday January 8, 2020 3:30pm - 4:00pm
Salon A-C Foyer

4:00pm

Emerging EnviroSensing Topics: Long-range, Low-power, Non-contact, Open-source Sensor Networks
Led by the ESIP EnviroSensing Cluster, this session is open to scientists, information managers, and technologists interested in the general topic of environmental sensing for science and management.

Rapid advances and decreasing costs in technology, as applied to environmental sensing systems, are promoting a shift from sparsely-distributed, single-mission observations toward employing affordable, high-fidelity, ecosystem monitoring networks driven by a need to forecast outcomes across timescales. In this session we will hear talks on new approaches to standing up long-range, low-power monitoring networks; the value(s) added by non-contact sensing (local-remote to satellite based sensing); as well as innovative sensor developments, including open-source approaches, that promote connectivity. The session will conclude with a 20-minute topical discussion open to all in attendance. How to Prepare for this Session:

List of speakers and presentation titles for this session:
  • Jacqueline Le Moigne: NASA
    Future Earth Science Measurements Using New Observing Strategies
  • David Coyle: USGS
    USGS NGWOS LPWAN Experiment: Leveraging LoRaWAN Sensor Platform Technologies
  • James Gallagher: OPeNDAP
    Sensors in Snowy Alpine Environments: Sensor Networks with LoRa, Progress Report
    View Slides: https://doi.org/10.6084/m9.figshare.11555784.v1 
  • Daniel Fuka: Va Tech
    Making Drones Interesting Again
    View Slides: https://doi.org/10.6084/m9.figshare.11663718.v1
  • Joseph Bell: USGS
    Deep-dive discussion after presentations. A topic of interest is documenting test efforts and the publication of peer-reviewed Test Reports

View Recording: https://youtu.be/dXTLqt-5Ai8

Takeaways
  • As monitoring expands across agencies and from point measures on the surface of the earth to monitoring using networks of satellites in space (internet of space) there is a growing need to increase communication among agencies and instrumentation alike
  • Inexpensive monitoring equipment is becoming readily available with large gains being made in the areas of function, reliability, and resolution/accuracy.
    • Market disruption
    • Edge computing (is this the current form of SDI-12-style monitoring?): local processing and storage, transmission of small/tiny data payloads
  • There appears to be a need across disciplines and agencies for peer-reviewed test reports
    • Not resource intensive to publish
    • Available to all users (FAIR)
    • Provides details on test plan and provides test data whenever applicable.


Speakers
Joseph Bell

Hydrologist, USGS


Wednesday January 8, 2020 4:00pm - 5:30pm
Forest Glen

4:00pm

Developing, Using and Testing Tools to Assess Learning Resources from two Perspectives: the Teacher and the Learner
Session leaders will describe tools being developed to assess the learning resources in ESIP's Data Management Training Clearinghouse (DMTC) from the perspectives of both instructors and students. The feedback collected through these tools will help users identify and choose resources appropriate for their needs. First efforts have focused on using DataONE's EEVA tool to identify and adapt questions. Feedback will be requested from participants to help guide the content, look, and feel of the tool. How to Prepare for this Session: Visiting ESIP's Data Management Training Clearinghouse (https://dmtclearinghouse.esipfed.org) would be helpful but is not required for productive participation in the session.

Presentations:

View Recording: https://youtu.be/uc4tbjyePpI

Takeaways


Speakers
Karl Benedict

Director of Research Data Services & Information Technology, University of New Mexico
For over 33 years Karl Benedict has had parallel careers in Information Technology, Data Management and Analysis, and Archaeology. Since 1993 when he arrived at UNM he has worked as a Graduate Student in Anthropology, Research Scientist, Research Faculty, Applied Research Center Director...



Wednesday January 8, 2020 4:00pm - 5:30pm
Glen Echo

4:00pm

Citizen Science Data in Earth Science: Challenges and Opportunities
Citizen science is scientific data collection and research performed primarily or in part by non-professional and amateur scientists. Citizen science data have been used in a variety of sciences, including physics, ecology, biology, and water quality. As volunteer-contributed datasets continue to grow, they represent a unique opportunity to collect and analyze Earth science data on spatial and temporal scales impossible to achieve by individual researchers. This session will explore the ways open citizen science data sets can be used in Earth science research and some of the associated challenges and opportunities for the ESIP community to use and partner with citizen science organizations.

View Recording: https://youtu.be/jTNgWZI6Cik

Takeaways


How to Prepare for this Session: https://www.nationalgeographic.org/encyclopedia/citizen-science/
http://www.earthsciweek.org/citizen-science

Speakers
Alexis Garretson

Community Fellow, ESIP
Kelsey Breseman

Archiving Program Lead, Environmental Data & Governance Initiative
Governmental accountability around public data & the environment. Decentralized web. Intersection of tech & ethics & civics.


Wednesday January 8, 2020 4:00pm - 5:30pm
Linden Oak

4:00pm

Planning for new Agriculture and Climate Cluster focus area on automated agriculture with AI
The Agriculture and Climate (ACC) Cluster will host a planning session for a new focus area on automated agriculture and AI ("Agro-AI"). Some initial ideas on possible activities in this space were presented at the ACC October 2019 telecon, including those related to the “Data-to-Decisions” ESIP Lab project (https://www.esipfed.org/wp-content/uploads/2018/07/Wee.pdf). Currently, there are many initiatives and funding opportunities for automated agriculture with AI. The National Science Foundation, e.g., recently announced a program aimed at significantly advancing research in AI (https://www.nsf.gov/news/news_summ.jsp?cntn_id=299329&org=NSF&from=news), including, in its initial set of high-priority areas, “AI-Driven Innovation in Agriculture and the Food System.”
Among the topics for discussion in this planning session will be related proposal opportunities and sponsoring an ACC breakout session on agriculture and AI at the ESIP 2020 Summer Meeting. How to Prepare for this Session: TBD; there will be an intro presentation prior to the group discussion. This presentation may be made available ahead of the meeting on the scheduled session page.

Presentations:

View Recording: https://youtu.be/GhnSINRFNBg

Takeaways
  • Next step 1: Conduct a survey of available dashboards, existing data, ML use cases, existing APIs
  • Next step 2: Decide on an example question for a use case
  • Next step 3: Define and survey potential users



Speakers
Arif Albayrak

Senior Software Engineer, ADNET (GESDISC)
Bill Teng

NASA GES DISC (ADNET)



Wednesday January 8, 2020 4:00pm - 5:30pm
Salon A-C

4:00pm

Structured data web and coverages integration working session
This working session will follow on the "Advancing Data Integration approaches of the structured data web" session and the Coverage Analytics sprint as an opportunity for those interested in building linked data information products that integrate spatial features, coverage data, and more. As such, inspiration will be drawn from projects like science on schema.org, the Environmental Linked Features Interoperability Experiment, the Australian Location Index, and those that session attendees take part in. Participants will self-organize into use-case or technology focused groups to discuss and synthesize the outcomes of the sprint and structured data web session. Session outcomes could take a number of forms: linked data and web page mock-ups, ideas and issues for OGC, W3C, or ESIP groups to consider, example data or use cases for relevant software development projects to consider, or work plans and proposals for future ESIP work. The session format is expected to be fluid, with an ideation and group formation exercise followed by structured discussion to explore a set of ideas and then narrow in on a focused, valuable outcome. Participants will be encouraged to work together prior to the meeting to design and plan the session structure. Outcomes of the session will be reported at an Information Technology and Interoperability webinar in early 2020. How to Prepare for this Session: Attend the coverage sprint and the "Advancing Data Integration approaches of the structured data web" session.

Shared document for session here.

Full Notes: https://doi.org/10.6084/m9.figshare.11559087.v1

Presentations:

View Recording: https://youtu.be/u2x3I0cr46A

Takeaways
  • Breakout session, information interoperability committee, and webinar series. See notes: https://docs.google.com/document/d/1LpcTMwP0mAD4G4Gb8mStI5uSDV61_qWPUkQ9nI1x1cI/edit?usp=sharing
  • Foster cross-project consistency via breakouts, such as dealing with the science on schema.org issue of links to “in-band” linked (meta)data versus “out-of-band” linked data. On content negotiation and in-band versus out-of-band links: use blank nodes with link properties for RDF elements that are URIs for out-of-band content. Identify in-band links with sdo @id and out-of-band links with sdo:URL.
  • Incorporating spatial coverages in knowledge graphs: next steps? We need to explore tessellations further as an intermediate index. Some of these ideas will be carried forward at the EDR SWG, represented to the OGC-API Coverages SWG, and mentioned to the UFOKN. On the role of ‘spatial’ knowledge graphs: will spatial data analysis and transformation tools grow to adopt/support RDF as an underlying data structure for spatial information, or will RDF continue to be a ‘view’ of existing (legacy) spatial data in GI systems?
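The in-band versus out-of-band link distinction in the takeaways above can be illustrated with a small JSON-LD document. This is a minimal sketch, not science on schema.org guidance: the dataset, all URLs, and the `link_kinds` helper are hypothetical, and it assumes out-of-band links are carried by the schema.org `url` property on a blank node (a node with no `@id`), while in-band links use `@id`.

```python
import json

# Hypothetical JSON-LD description of a dataset; every URL here is a placeholder.
dataset = {
    "@context": {"@vocab": "https://schema.org/"},
    "@type": "Dataset",
    "@id": "https://example.org/dataset/123",  # in-band: identifier of this node
    "name": "Example dataset",
    "creator": {
        # In-band link: @id points at another node described as linked (meta)data.
        "@id": "https://example.org/person/jdoe",
    },
    "distribution": {
        # Blank node (no @id) whose url property points at out-of-band content.
        "@type": "DataDownload",
        "url": "https://example.org/dataset/123/data.nc",
        "encodingFormat": "application/x-netcdf",
    },
}


def link_kinds(node, path=""):
    """Walk a JSON-LD tree and classify links as in-band (@id) or out-of-band (url)."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "@id":
                found.append(("in-band", path, value))
            elif key == "url":
                found.append(("out-of-band", path, value))
            else:
                found.extend(link_kinds(value, f"{path}/{key}"))
    elif isinstance(node, list):
        for item in node:
            found.extend(link_kinds(item, path))
    return found


print(json.dumps(dataset, indent=2))
for kind, where, target in link_kinds(dataset):
    print(kind, where or "/", target)
```

A harvester following this convention could dereference `@id` values as further linked data and treat `url` values as content to fetch or content-negotiate separately.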


Speakers
Adam Shepherd

Technical Director, Co-PI, WHOI
schema.org | Data Containerization | Linked Data | Semantic Web | Knowledge Representation | Ontologies
Irina Bastrakova

Director, Spatial Data Architecture, Geoscience Australia
I have been actively involved with international and national geoinformatics communities for more than 19 years. I am the Chair of the Australian and New Zealand Metadata Working Group. My particular interest is in developing and practical application of geoscientific and geospatial...
William Francis

Geoscience Australia
Jonathan Yu

Research data scientist/architect, CSIRO
Jonathan is a data scientist/architect with the Environmental Informatics group in CSIRO. He has expertise in information and web architectures, data integration (particularly Linked Data), data analytics and visualisation. Dr Yu is currently the technical lead for the Loc-I project...
Doug Fils

Consortium for Ocean Leadership


Wednesday January 8, 2020 4:00pm - 5:30pm
White Flint

6:00pm

Poster & Demo Session
Wednesday January 8, 2020 6:00pm - 8:00pm
Salon A-C Foyer
 
Thursday, January 9
 

8:30am

Plenary Talks
View Live Stream here: ESIP 2020 Winter Meeting - Day 3 Plenary

  • 8:30 am - Samantha Snell: Preparedness and Response in Collections Emergencies (PRICE)
  • 9:00 am - Dan Pilone: Looking over the edge: Bridging the gaps between geospatial data, cloud computing, and local disaster response organizations

Speakers
Dan Pilone

Chief Technologist, Element 84
Dan Pilone is CEO/CTO of Element 84 and oversees the architecture, design, and development of Element 84's projects including supporting NASA, the USGS, Stanford University School of Medicine, and commercial clients. He has supported NASA's Earth Observing System for nearly 13 years...
Samantha Snell

Collections Management Specialist, Smithsonian Institution
Samantha is the Collections Management Specialist for the Smithsonian Institution’s National Collections Program (NCP). In this position, she works to improve Smithsonian-wide collections emergency management and collections management professional development training as well...


Thursday January 9, 2020 8:30am - 10:00am
Salon A-C
  • Remote Participation Link: https://global.gotomeeting.com/join/195545333
  • Remote Participation Phone #: (571) 317-3129
  • Remote Participation Access Code 195-545-333
  • Additional Phone #'s: Australia: +61 2 8355 1050 Austria: +43 7 2081 5427 Belgium: +32 28 93 7018 Canada: +1 (647) 497-9391 Denmark: +45 32 72 03 82 Finland: +358 923 17 0568 France: +33 170 950 594 Germany: +49 692 5736 7317 Ireland: +353 15 360 728 Italy: +39 0 230 57 81 42 Netherlands: +31 207 941 377 New Zealand: +64 9 280 6302 Norway: +47 21 93 37 51 Spain: +34 912 71 8491 Sweden: +46 853 527 836 Switzerland: +41 225 4599 78 United Kingdom: +44 330 221 0088

10:00am

Networking Break
Thursday January 9, 2020 10:00am - 10:15am
Salon A-C Foyer

10:15am

Working Group for the Data Stewardship Committee
This session is a working group for the 2020-2021 year for the Data Stewardship committee. We will discuss priorities for the next year, potential collaborative outputs, and review the work in progress from the last year. 

Notes Document: https://docs.google.com/document/d/1B_0K5jGnFgH72U3P2-oGr5vEqHOGU8CWU-IkZ6pjXbM/edit?ts=5e174588

Presentations

View Recording: https://youtu.be/am-ZLfHgM4w

Takeaways
  • Wow, the members of the Committee really are active! Practically everyone has their own cluster or two!
  • Six activities proposed for the upcoming year have champions who will lead the effort to define the outputs of their selected activity.


Speakers
Alexis Garretson

Community Fellow, ESIP
Kelsey Breseman

Archiving Program Lead, Environmental Data & Governance Initiative
Governmental accountability around public data & the environment. Decentralized web. Intersection of tech & ethics & civics.


Thursday January 9, 2020 10:15am - 11:45am
Forest Glen

10:15am

Do you have a labeling problem? Three tools for labeling data
The ESIP community and others in machine learning regularly lament the lack of labeled datasets, needed for certain classes of training algorithms. Generating accurate, useful labels is a hard problem, with no general automated solution in sight. Thus, labeling generally involves human effort, which is challenging because the volume of data needed for training can be very large.

Tools exist to help in labeling data. This session will demonstrate three labeling tools and associated processes:
  • Image Labeler, a fast, scalable cloud-based tool to facilitate the rapid development of Earth science event databases, to aid in automated ML-based image classification, Rahul Ramachandran
  • Labelimg, an open source graphical image annotation tool, https://github.com/tzutalin/labelImg, Ziheng Sun
  • Bokeh, a Python based plotting and annotation tool set for building arbitrary labeling workflows, https://bokeh.org/, Jim Bednar
Time permitting, the session will conclude with a short discussion of thoughts and tradeoffs about the tools.

This session is followed by a hands-on workshop for using Labelimg and Bokeh. Please see the session abstract for "Hands on Labeling Workshop" for information on preparing for that workshop if you are interested in participating.

Presentations
https://doi.org/10.6084/m9.figshare.11629110.v1
https://doi.org/10.6084/m9.figshare.11591739.v1

View Recording: https://youtu.be/3ufBOoD3M1E

Takeaways
  • Machine learning based classification applications require high-quality labelled data sets for both model training and evaluation. There are many existing tools for labeling images (including earth science data), but labeling tasks are very labor and time intensive.
  • If the pre-built labeling tools don’t work for your problem, Anaconda provides a general-purpose labeler-building toolkit based on Bokeh for Python users; see https://examples.pyviz.org/ml_annotators/ml_annotators.html
  • There is opportunity in combining partly automated, partly human labeling, to automate the easy cases while leaving the final call to a person. Currently there is little tool support and few established good practices, and such workflows are hard to integrate. The art of avoiding extra work!
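One common way to combine automated and human labeling, as the last takeaway suggests, is confidence-based triage: accept a model's high-confidence labels and route everything else to a person. This is a minimal sketch under stated assumptions; the `Prediction` record, the `triage` function, and the 0.9 default threshold are hypothetical and are not features of Image Labeler, Labelimg, or Bokeh.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    item_id: str
    label: str
    confidence: float  # model's probability for its top label, in [0, 1]


def triage(predictions, threshold=0.9):
    """Split predictions into auto-accepted labels and a human review queue.

    Predictions at or above the threshold are accepted automatically;
    the rest go to a person, who makes the final call.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be in [0, 1]")
    auto, review = [], []
    for p in predictions:
        (auto if p.confidence >= threshold else review).append(p)
    # Show the least certain items to the human labeler first.
    review.sort(key=lambda p: p.confidence)
    return auto, review


preds = [
    Prediction("img1", "fire", 0.97),
    Prediction("img2", "cloud", 0.55),
    Prediction("img3", "fire", 0.72),
]
auto, review = triage(preds)
print("auto-accepted:", [p.item_id for p in auto])
print("needs a human:", [p.item_id for p in review])
```

The threshold trades labeling effort against label quality: lowering it sends fewer items to people but accepts more model mistakes, so it would typically be tuned against a held-out set of human-checked labels.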

Speakers
Ziheng Sun

Research Assistant Professor, George Mason University
My research interests are mainly on geospatial cyberinfrastructure and agricultural remote sensing.
Anne Wilson

Senior Software Engineer, Laboratory for Atmospheric and Space Physics
Yuhan Rao

Ph.D., North Carolina Institute for Climate Studies


Thursday January 9, 2020 10:15am - 11:45am
Glen Echo

10:15am

Identifying ESIP
Persistent Identifiers (PIDs) make connections across the scholarly community possible. We are familiar with DOIs for data, but what about ORCIDs for people or RORs for organizations? How is the ESIP community using identifiers, and how can we benefit from that usage?

This is the first report from the Identifying ESIP Connections Funding Friday Project that started last summer. The focus so far has been on identifying organizations associated with ESIP using the Research Organization Registry. During this session we will introduce identifiers at four levels: U.S. Federal Agencies and Departments, ESIP Sponsors, ESIP Members, and ESIP Participants. Information on all of these levels is available on the ESIP Wiki.
  1. Maria Gould, the ROR Project lead at the California Digital Library, will fill us in on ROR and answer questions about RORs. (Presentation)
  2. Ted Habermann, the PI of Identifying ESIP Connections, will discuss this work and lead a working discussion of RORs

Click here to participate: http://wiki.esipfed.org/index.php/Category:Identifying_ESIP_Connections


Presentations
https://doi.org/10.6084/m9.figshare.11794182.v1

View Recording: https://youtu.be/iUYmTaDdJGQ

Takeaways
  • Generally positive attitude about using identifiers for organizations, but not all organizations in ESIP may end up with RORs...
  • The granularity of RORs is an ongoing challenge that spans many issues: multi-organization projects and changes over time.
  • How are research organizations defined? Do repositories have RORs? Wiki pages were good way to share information.



Speakers
Ted Habermann

Chief Game Changer, Metadata Game Changers
I am interested in all facets of metadata needed to discover, access, use, and understand data of any kind. Also evaluation and improvement of metadata collections, translation proofing. Ask me about the Metadata Game.



Thursday January 9, 2020 10:15am - 11:45am
Linden Oak

10:15am

Mapping Data & Operational Readiness Levels (ORLs) to Community Lifelines
Approach: The Disaster Lifecycle Cluster has seen great success in its efforts to put Federated arms around “trusted data for decision makers” as a way to accelerate situational awareness and decision-making by identifying trust levels for data. This session will build upon the Summer Meeting and align perfectly with the overall ESIP theme of: Data to Action: Increasing the Use and Value of Earth Science Data and Information.

The ESIP Disaster Lifecycle Cluster has evolved into one of the most operationally active clusters in the Federation, with a thirst for applying datasets to decision-making environments while building trust levels that manifest themselves as ORLs. Duke Energy, All Hazards Consortium’s Sensitive Information Sharing environment (SISE), DHS, and FEMA are all increasing their interest in ORLs, with their sights set on implementing them in the near future. Data is available everywhere and more of it is on the way. Trusted data is available in some places and can help decision makers such as utilities make 30-second decisions that can save lives and property, get the lights back on sooner, and save millions of dollars.

This session will provide the venue to discuss emerging projects from NASA’s Applied Sciences Division (A.37), initiatives at JPL, and Federal Agency data portal access that can accelerate decision making today and in the future. We will also discuss drone data and European satellite data that are available for access and use when disasters threaten. Come and join us; the data you have may just save a life.

Agenda:
  1. Greg McShane, DHS CISA - The Critical Nature of the Public-Private Trusted Information Sharing Paradigm (10 min) Presented by Tom Moran, All Hazards Consortium Executive Director
  2. Dave Jones, StormCenter/GeoCollaborate - The status of ORLs, where we are, ESIP Announcement at GEO in Australia, AHC SISE, Next Steps (10 min)
  3. Maggi Glassco, NASA Disasters Program, JPL - New Applied Sciences Disasters Projects, Possible Lifeline Support Information Sources in the Future (10 min)
  4. Bob Chen/Bob Downs, Columbia Univ./SEDAC/CIESIN - Specific Global and Local Population Data for Community Lifeline Decision Making (10 min)
  5. Discussion/Q&A Period (40 min)

Presentations

View Recording: https://youtu.be/gJ93R6SlMkM

Key Takeaways for this Session: 
  1. Through the All Hazards Consortium, a new research institute will begin to help bring candidate research products into operations. An imagery committee, consisting of private and research members under SISE, will identify and evaluate use-case driven candidate imagery data within the ORL context using Geo-Collaborate.
  2. NASA grant opportunities within the disasters program require co-funding by end-user partners to guide usage needs and adoption (using ARL success criteria). This should increase adoption of NASA-funded ASP project data and/or services. The cluster would like to work with NASA ASP as a testbed for funded projects to connect to additional user communities.
  3. We discussed the need / value of population data (current and predictions on affected populations) for preparedness activities and emergency response. We would like to leverage additional data services from SEDAC to test with operational decision makers. 


Speakers
Dave Jones

StormCenter Communications
Real-time data access, sharing and collaboration across multiple platforms. Collaborative Common Operating Pictures, Decision Making, Situational Awareness, connecting disparate mapping systems to share data, cross-product data sharing and collaboration. SBIR Phase III status with...
Karen Moe

NASA Goddard Emeritus



Thursday January 9, 2020 10:15am - 11:45am
Salon A-C

10:15am

Connecting Data with Data Usage: a Graph Approach
We will investigate graph-based methods of connecting data with the uses made and the knowledge gained from those data, from science research to applications to strategic planning. We will examine the diverse capabilities enabled by connecting uses with data for a variety of stakeholders, and explore how to connect existing knowledge graphs together to scale out across the ESIP federation and related communities toward an inter-connected mega-graph.

0-5 min: Chris Lynnes (NASA): Documenting how data matters...
5-15 min: Doug Newman (NASA): EOSDIS Knowledge Graph
https://doi.org/10.6084/m9.figshare.11561805.v1
15-25 min: Reid Sherman (GCIS): Global Change Information System
https://doi.org/10.6084/m9.figshare.11560011.v1
25-35 min: Dave Blodgett (USGS): SELFIE
https://doi.org/10.6084/m9.figshare.11559093.v1
35-45 min: Joe Conran (NOAA): Interagency Coordination of Satellite Needs
https://doi.org/10.6084/m9.figshare.11561946.v1
45-55 min: Wil Doane (IDA): Assessing the Impact of Land Imaging
https://doi.org/10.6084/m9.figshare.11561913.v1
55-90 min: The Way Forward:
1 - Got Use Case?
2 - ESIP Cluster? https://www.esipfed.org/get-involved/collaborate
3 - Who's In?

Session Notes

View Recording:
https://youtu.be/yi05crW6Ya0

Takeaways
  • How to connect data with the uses of that data = documenting how data matter.
    Federating knowledge bases is a daunting task, but possible.
  • Connect research and data to place (but there is a gap around using place identifiers in linked data).
    Discussion of potentially making a new cluster or using another one. Decision to recharter/repurpose/rename the data discovery cluster.
  • The sin of computer science is giving people the impression that things are mostly one-to-one relationships, but more accurately, life and the universe are full of many-to-many relationships, i.e., graph databases > RDBMS.
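The many-to-many point can be made concrete: one dataset supports many uses, and one use draws on many datasets. The sketch below is a minimal illustration with hypothetical dataset and use names (not drawn from the EOSDIS Knowledge Graph, GCIS, or SELFIE); it stores the edges in both directions and walks two hops to find datasets that share a use, a query that maps directly onto a graph but needs a join table in a relational schema.

```python
from collections import defaultdict

# Hypothetical (dataset, use) edges: one dataset feeds many uses,
# and one use draws on many datasets.
edges = [
    ("dataset_A", "paper_1"),
    ("dataset_A", "model_run_2"),
    ("dataset_B", "model_run_2"),
    ("dataset_B", "decision_report_3"),
]

uses_of = defaultdict(set)       # dataset -> uses that rely on it
datasets_for = defaultdict(set)  # use -> datasets it relies on
for dataset, use in edges:
    uses_of[dataset].add(use)
    datasets_for[use].add(dataset)


def related_datasets(dataset):
    """Datasets that share at least one use with `dataset` (a two-hop traversal)."""
    related = set()
    for use in uses_of.get(dataset, set()):
        related |= datasets_for[use]
    related.discard(dataset)
    return related


print(sorted(uses_of["dataset_A"]))            # ['model_run_2', 'paper_1']
print(sorted(related_datasets("dataset_A")))   # ['dataset_B']
```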




Speakers
Chris Lynnes

System Architect, NASA/GSFC
Doug Newman

EED Data Use Architect



Thursday January 9, 2020 10:15am - 11:45am
White Flint

11:45am

Networking Break
Thursday January 9, 2020 11:45am - 12:00pm
Salon A-C Foyer

12:00pm

License Up! What license works for you and your downstream repositories?
Many repositories are seeing an increase in the use and diversity of licenses and other intellectual property management (IPM) tools applied to externally-created data submissions and software developed by staff. However, adding a license to data files may have unexpected or unintended consequences in the downstream use or redistribution of those data. Who “owns” the intellectual property rights to data collected by university researchers using Federal and State (i.e., public) funding that must be deposited at a Federal repository? What license is appropriate for those data and what — exactly — does that license allow and disallow? What kind of license or other IPM instrument is appropriate for software written by a team of Federal and Cooperative Institute software engineers? Is there a significant difference between Creative Commons, GNU, and other ‘open source licenses’?

We have invited a panel of legal advisors from Federal and other organizations to discuss the implications of these questions for data stewards and the software teams that work collaboratively with those stewards. We may also discuss the latest information about Federal data licenses as they apply to the OPEN Government Data Act of 2019. How to Prepare for this Session: Consider what, if any, licenses, copyright, or other intellectual property rights management you apply, or think applies, to your work. Also consider Federal requirements such as the OPEN Government Data Act of 2019 and Section 508 of the Rehabilitation Act of 1973.

Speakers:
Dr. Robert J. Hanisch is the Director of the Office of Data and Informatics, Material Measurement Laboratory, at the National Institute of Standards and Technology in Gaithersburg, Maryland. He is responsible for improving data management and analysis practices and helping to assure compliance with national directives on open data access. Prior to coming to NIST in 2014, Dr. Hanisch was a Senior Scientist at the Space Telescope Science Institute, Baltimore, Maryland, and was the Director of the US Virtual Astronomical Observatory. For more than twenty-five years Dr. Hanisch led efforts in the astronomy community to improve the accessibility and interoperability of data archives and catalogs.
Henry Wixon is Chief Counsel for the National Institute of Standards and Technology (NIST) of the U.S. Department of Commerce. His office provides programmatic legal guidance to NIST, as well as intellectual property counsel and representation to the Department of Commerce and other Department bureaus. In this role, it interacts with principal developers and users of research, including private and public laboratories, universities, corporations and governments. Responsibilities of Mr. Wixon’s office include review of NIST Cooperative Research and Development Agreements (CRADAs), licenses, Non-Disclosure Agreements (NDAs) and Material Transfer Agreements (MTAs), and the preparation and prosecution of the agency’s patent applications. As Chief Counsel, Mr. Wixon is active in standing Interagency Working Groups on Technology Transfer, on Bayh-Dole, and on Research Misconduct, as well as in the Federal Laboratory Consortium. He is a Certified Licensing Professional and a Past Chair of the Maryland Chapter of the Licensing Executives Society, USA and Canada (LES), and is a member of the Board of Visitors of the College of Computer, Mathematical and Natural Sciences of the University of Maryland, College Park.

Presentations
See attached

View Recording: https://youtu.be/5Ng5FDW1LXk

Takeaways



Speakers
Donald Collins

Oceanographer, NESDIS/NCEI Archive Branch
Send2NCEI, NCEI archival processes, records management



Thursday January 9, 2020 12:00pm - 1:30pm
Forest Glen

12:00pm

Hands-on labeling workshop
Intended as a follow on to the "Do You Have a Labeling Problem?" session and to get your feet wet, this working session is for people to experiment with two of the tools presented in that session, Labelimg and Bokeh. Presenters will provide some sample data for participants to work with. Attendees can also bring some of their own data to work with in the time remaining after the planned activities.

It would be best for workshop participants to preinstall Labelimg before coming to the session. Regarding Bokeh, Anaconda is providing 25 accounts for workshop participants. (Thank you, Jim and Anaconda!) Installing Bokeh is also an option. Links for getting these tools are:
  • Labelimg via https://github.com/tzutalin/labelImg#installation
  • Bokeh as part of the HoloViz suite via http://holoviz.org/installation.html

Presentations

View Recording: https://youtu.be/y8NqTLgT8Ao

Takeaways


Speakers
Ziheng Sun

Research Assistant Professor, George Mason University
My research interests are mainly on geospatial cyberinfrastructure and agricultural remote sensing.
Anne Wilson

Senior Software Engineer, Laboratory for Atmospheric and Space Physics
Yuhan Rao

Ph.D., North Carolina Institute for Climate Studies


Thursday January 9, 2020 12:00pm - 1:30pm
Glen Echo

12:00pm

Research Object Citation Cluster Working Session
ESIP has published guidelines for citing data and for citing software and services. These have been important and influential ESIP products. Now a new cluster is working to address the issues of “research object” citation writ large. The cluster has been working to identify the various types of research objects that could or should be cited such as samples, instruments, annotations, and other artifacts. We have also been examining the various concerns that may be addressed in citing the objects such as access, credit or attribution, and scientific reproducibility. We find that citation of different types of objects may need to address different concerns and that different approaches may be necessary for different concerns and objects. We have, therefore, been working through a matrix that attempts to map all the various objects and citation concerns.

In this working session, we will provide a brief overview of the cluster's work to date on determining when different research objects get IDs. We will then work in small groups to determine when different research objects need to be identified to ensure reproducibility or validity of a result. For this purpose, we define reproducibility as the ability to independently recreate or confirm a result (not the data). A result could be a finding in a scientific paper, a legal brief, a policy recommendation, a model output or derived product — essentially any formal, testable assertion. This is essentially a provenance use case. It is very broad, but distinct from the credit and even the access concerns of citation. This is primarily about unambiguous reference. When does an object become a first-class research object?
To approach the problem, we will break up into 4-5 groups to define and give examples of different clusters of research objects and then work to answer the question: when, or under what circumstances, is it necessary to identify an object to enable reproducibility?
Potential groups include:
  1. Literature and related objects (not to be discussed)
  2. Software and related objects — Dan Katz
  3. Data and related objects — Mark Parsons
  4. Samples — Sarah Ramdeen
  5. Ontologies and vocabularies — Ruth Duerr
  6. Complex research objects (esp. but not exclusively learning resources) — Nancy Hoebelheinrich
  7. Instruments and facilities — Mike Daniels
  8. Organizations 
  9. Activities
We encourage everyone to start drafting definitions and examples in the spreadsheet now: https://docs.google.com/spreadsheets/d/1VEYPLgTsCR_zbMUbThonBrqaYqBiMT4e525NzFi7ql8/edit#gid=1494916301
Our goal is to have a draft recommendation or complete matrix by the end of the meeting as well as potential follow-on activities for the cluster.

How to Prepare for this Session: Participants should be familiar with existing ESIP citation guidelines and have reviewed the minutes of the last several meetings, especially the "Objects and Concerns Matrix". See http://wiki.esipfed.org/index.php/Research_Object_Citation

Presentations

View Recording:
https://youtu.be/5MXzBLu7hjg (abbreviated due to breakout group emphasis of session).

Takeaways
  • What is a ‘thing’? ‘Research object’ is a defined term in other communities. Our conception is broader. Perhaps we need a new term, but much of the issue is defining when something becomes a ‘thing’ that is named and located.
  • The particular citation use case matters a lot. Reproducibility demands different considerations than credit. The cluster will consider more use cases.
  • There appear to be classes of things that can be treated similarly, but we haven’t sorted that out yet.



Speakers
Jessica Hausman

Data Engineer, PO.DAAC JPL
Mark Parsons

Research Scientist, Rensselaer Polytechnic Institute



Thursday January 9, 2020 12:00pm - 1:30pm
Linden Oak

12:00pm

Fire effects on soil morphology across time scales: Data needs for near- and long-term land and hazard management
Fire impacts soil hydrology and biogeochemistry at both near (hours to days) and long (decades to centuries) time scales. Burns, especially in soils with high organic carbon stocks like peatlands, induce a loss of absolute soil carbon stock. Additionally, fire can alter the chemical makeup of the organic matter, potentially making it more resistant to decomposition. On the shorter timescales, fire can also change the water repellent properties or hydrophobicity of the soil, leading to an increased risk of debris flows and floods.

In this session, we will focus on the varying data needs for assessing the effects of burns across time scales, from informing emergency response managers in the immediate post-burn days, to monitoring post-burn recovery, to managing carbon in a landscape decades out.

Speaker abstracts (in order of presentation):

James MacKinnon (NASA GSFC)
Machine learning methods for detecting wildfires 

This talk shows the innovative use of deep neural networks, a type of machine learning, to detect wildfires in MODIS multispectral data. This effort attained a very high classification accuracy, showing that neural networks could be useful in a scientific context, especially when dealing with sparse events such as fire anomalies. Furthermore, we laid the groundwork to continue beyond binary fire classification towards being able to detect the "state," or intensity, of the fire, eventually allowing for more accurate fire modeling. With this knowledge, we developed software to enable neural networks to run on even the typically compute-limited spaceflight-rated computers, and tested it by building a drone payload equipped with a flight computer analog and flying it over controlled burns to prove its efficacy.

Kathe Todd-Brown (U. FL Gainesville)
An overview of effects of fire on ecosystems

Fire is a defining characteristic of many ecosystems worldwide, and, as the climate warms, both fire frequency and severity are expected to increase. In addition to the effects of smoke on the climate and human health, there are less apparent effects of fire on the terrestrial ecosystem. From alterations in the local soil properties to changes in the carbon budget as organic carbon is combusted into CO2 and pyrogenic carbon, fire is deeply impactful to the local landscape. The long-term climate implication of fire on the terrestrial carbon budget is a tension between carbon lost to the atmosphere as carbon dioxide and carbon sequestered in the soil as recalcitrant pyrogenic carbon. Here we present a new model to simulate the interaction between ecosystem growth, decomposition, and fire on carbon dynamics. We find that the carbon lost to burned carbon dioxide will always be recovered if there is any recalcitrant pyrogenic carbon generated by the fires. The time scale of this recovery, however, is highly variable and often not relevant to land managers. This model highlights key data gaps at the annual and decadal time scales. Quantifying and predicting the loss of soil, litter, and vegetation carbon in an individual fire event is a key unknown. Relatedly, the amount of pyrogenic carbon generated by fire events is another near-term data need for better constraining this model. Finally, on the longer time scales, the degree of recalcitrance of pyrogenic carbon is a critical unknown.

Daniel Fuka (VA Tech)

Rapidly improving the spatial representation of soil properties using topographically derived initialization with a proposed workflow for new data integration
Topography exerts critical controls on many hydrologic, geomorphologic, biophysical, and forest fire processes. However, in modeling these systems, the current use of topographic data neglects opportunities to account for topographic controls on processes such as soil genesis, soil moisture distributions, and hydrological response, all factors that significantly characterize the post-fire effects and potential risks of the new landscape. In this presentation, we demonstrate a workflow that takes advantage of data brokering to combine the most recent topographic data and best available soil maps to increase the resolution and representational accuracy of spatial soil morphologic and hydrologic attributes: texture, depth, saturated conductivity, bulk density, porosity, and the water capacities at field and wilting point tensions. We show several proofs of concept and initial performance tests of the topographically adjusted soil parameter values against those from the NRCS SSURGO (Soil Survey Geographic database). Finally, we propose that a quickly configurable open-source data brokering system (NSF BALTO) could be used to make the most recently updated topographic and soil characteristics available, so that this workflow can rapidly re-characterize post-fire landscapes and increase their resolution.

Dalia Kirschbaum (NASA GSFC)
Towards characterization of global post-fire debris flow hazard

Post-fire debris flows commonly occur in the western United States, but the extent of this hazard in other regions is poorly known. These events occur when rain falls on ground with little vegetative cover and hydrophobic soils, two common side effects of wildfire. The storms that trigger post-fire debris flows are typically high-intensity, short-duration events. Thus, a first step toward global modeling of this hazard is to evaluate how well GPM IMERG and other global precipitation data detect these storms. The second step is to determine how effectively MCD64 and other globally available predictors identify locations susceptible to debris flows. Finally, rainfall and other variables can be combined into a single global model of post-fire debris flow occurrence. This research can show both where post-fire debris flows are currently most probable and where their historical impact has been greatest.

How to Prepare for this Session:

Presentations

View Recording: https://youtu.be/I89om-kBYB0

Takeaways
  • Methods for modeling and detecting fires and fire impacts are changing (e.g. neural networks, carbon modeling) and need to continue to improve
  • Many data needs must be met before post-fire debris flow and soil modeling can be operationalized
  • Fires severely alter ecosystems and soils, but the exact changes are not yet well understood; more research is needed in this area


Speakers
Kathe Todd-Brown

University of Florida Gainesville
Dan Fuka

Virginia Tech
Bill Teng

NASA GES DISC (ADNET)



Thursday January 9, 2020 12:00pm - 1:30pm
Salon A-C

12:00pm

Datacubes for Analysis-Ready Data: Standards & State of the Art
This workshop session will follow up on the OGC Coverage Analytics sprint, focusing specifically on advanced services for spatio-temporal datacubes. In the Earth sciences, datacubes are accepted as an enabling paradigm for offering massive spatio-temporal Earth data in analysis-ready form and, more generally, for easing access, extraction, analysis, and fusion. Datacubes also homogenize APIs across dimensions, allowing unified wrangling of 1-D sensor data, 2-D imagery, 3-D x/y/t image timeseries and x/y/z geophysics voxel data, and 4-D x/y/z/t climate and weather data.
Based on the OGC datacube reference implementation, we introduce datacube concepts, the state of standardization, and real-life 2D, 3D, and 4D examples using services from three continents. Ample time will be available for discussion, and Internet-connected participants will be able to replay and modify many of the examples shown. Further, the session will relate these topics to key datacube activities worldwide, within and beyond the Earth sciences.
Session outcomes could take a number of forms: ideas and issues for OGC, ISO, or ESIP to consider; example use cases; challenges not yet sufficiently addressed; entirely novel use cases; and work and collaboration plans for future ESIP work. Outcomes of the session will be reported at the next OGC TC meeting's Big Data and Coverage sessions.

How to Prepare for this Session: Introductory and advanced material is available from http://myogc.org/go/coveragesDWG

Presentations
https://doi.org/10.6084/m9.figshare.11562552.v1

View Recording: https://youtu.be/82WG7soc5bk

Takeaways
  • The abstract coverage construct defines the base, which is filled in by a coverage implementation schema. This matters because previous implementations were not interoperable across different servers and clients.
  • The coordinate system retrieved from sensors reporting in real time has been embedded into the XML schema so that the sensor data can be integrated into the broader system. Data can be delivered not only in GML but also in JSON and RDF; the latter could be used to link into Semantic Web technologies.
  • The principle is to send a URL-encoded query to the server over HTTP and receive results extracted from the datacube, e.g., sourced from many hyperspectral images.
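The query principle in the last takeaway can be sketched with the Python standard library. This assumes a rasdaman-style server that accepts WCPS queries through the WCS Processing Extension's `ProcessCoverages` request; the endpoint URL, the coverage name `HyperspectralCube`, and the `Lat`/`Long` axis names are hypothetical placeholders. The sketch only builds the URL-encoded request and does not contact any server.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

# Hypothetical endpoint and coverage; replace with a real WCPS-enabled server.
ENDPOINT = "https://example.org/rasdaman/ows"
# Slicing the (hypothetical) Lat and Long axes of a hyperspectral cube leaves
# the spectrum at one location, encoded as CSV.
QUERY = ('for $c in (HyperspectralCube) '
         'return encode($c[Lat(40.0), Long(-105.0)], "text/csv")')


def wcps_url(endpoint, query):
    """Build a WCS ProcessCoverages GET URL carrying a URL-encoded WCPS query."""
    params = {
        "service": "WCS",
        "version": "2.0.1",
        "request": "ProcessCoverages",
        "query": query,
    }
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return endpoint + "?" + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    url = wcps_url(ENDPOINT, QUERY)
    print(url)
    # Decoding the query string recovers the original WCPS expression.
    assert parse_qs(urlsplit(url).query)["query"] == [QUERY]
```

Because the whole analysis is expressed in the query string, the same pattern works for 1-D through 4-D coverages; only the subsetting expression changes.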

Speakers

Thursday January 9, 2020 12:00pm - 1:30pm
White Flint

1:30pm

Lunch
Thursday January 9, 2020 1:30pm - 2:15pm
Salon D