Join the 2020 ESIP Winter Meeting Highlights Webinar on Feb. 5th at 3 pm ET for a fast-paced overview of what took place at the meeting. More info here.


Tuesday, January 7
 

11:00am EST

Analytic Centers for Air Quality
The Analytic Center Framework (ACF) is a concept to support scientific investigations with a harmonized collection of data from a wide range of sources and vantage points, tools and computational resources. Four recent NASA AIST competitive awards focus on either ACFs or components that could feed into Air Quality (AQ) ACFs. Previous projects have developed tools and improved the accessibility and usability of data for Air Quality analysis, and have tried to address issues related to inconsistent metadata, uncertainty quantification, interoperability among tools and computing resources, and visualization to aid scientific investigation or applications. The format for this meeting will be a series of brief presentations by invited speakers followed by a discussion. This generally follows the panel model. How to Prepare for this Session: A link to a set of pre-read materials will be provided.

View Recording: https://youtu.be/fy4eoOfSbpo

Takeaways
  • Is there enough interest to start an Air Quality cluster? Yes!
  • Technologists and scientists should both be involved in the cluster to ensure usability through stakeholder engagement


Speakers

Mike Little

ESTO, NASA
Computational Technology to support scientific investigations


Tuesday January 7, 2020 11:00am - 12:30pm EST
Glen Echo
  Glen Echo, Working Session

2:00pm EST

Data Skills & Competencies Requirements for Data Stewards: Views from the ESIP Community & Beyond
At the ESIP 2019 Summer Meeting, many ESIP community members offered their feedback on the range and importance of skills and competencies for data specialists whose job responsibilities focus upon offering data "advice" (e.g., from data curators) and data "service providers" (e.g., from data librarians). By means of an interactive poster, participants were asked to choose whether a competency was of high, medium, low or no importance from a subset of competencies identified by a European Open Science Cloud (EOSC) project. In this session, session leaders will present the results of the ESIP community feedback within the context of the full list of EOSC competencies, visualized from both a poster synthesis and a research data lifecycle point of view. Session leaders are hoping to have the audience participate by providing feedback and engaging in discussion on the data and views presented. One outcome of this work will be a "Career Compass" to be published by the American Geosciences Institute for students interested in becoming data stewards. How to Prepare for this Session:

Presentations:

View Recording: https://youtu.be/1s1L3Jter8w

Takeaways



Speakers
Karl Benedict

Director of Research Data Services & Information Technology, University of New Mexico
Since 1986 I have had parallel careers in Information Technology, Data Management and Analysis, and Archaeology. Since 1993 when I arrived at UNM I have worked as a Graduate Student in Anthropology, Research Scientist, Research Faculty, Applied Research Center Director, and currently...


Tuesday January 7, 2020 2:00pm - 3:30pm EST
Linden Oak
  Linden Oak, Breakout

2:00pm EST

COPDESS: Facilitating a FAIR Publishing Workflow Ecosystem
COPDESS, the Coalition for Publishing Data in the Earth and Space Sciences (https://copdess.org/), was established in October 2014 as a platform for Earth and Space Science publishers and data repositories to jointly define, implement, and promote common policies and procedures for the publication and citation of data and other research results (e.g., samples, software, etc.) across Earth Science journals. In late 2018, COPDESS became a cluster of ESIP to give the initiative the needed sustainability to support a long-term FAIR publishing workflow ecosystem and be a springboard to pursue future enhancements of it.

In 2017, with funding from the Arnold Foundation, the ‘Enabling FAIR Data Project’ (https://copdess.org/enabling-fair-data-project/) moved mountains towards implementing the policies and standards that connect researchers, publishers, and data repositories in their desire to accelerate scientific discovery through open and FAIR data. Implementation of the new FAIR policies has advanced rapidly across Earth, Space, and Environmental journals, but supporting infrastructure, guidelines, and training for researchers, publishers, and data repositories has yet to catch up. The primary challenges are:
  • Repositories struggle to keep up with the demands of researchers, who want to be able to instantly deposit data and obtain a DOI, without considering the data quality/data ingest requirements and review procedures of individual repositories - producing a situation where data publication is inconsistent in quality and content.
  • Many publishers who have signed the Commitment Statement for FAIR Data (https://copdess.org/enabling-fair-data-project/commitment-statement-in-the-earth-space-and-environmental-sciences/) agree with it at a high, conceptual level. However, many journal editors and reviewers lack clarity on how to validate that datasets, which underpin scholarly publications, conform with the Commitment Statement.
  • Researchers experience confusion, and in some cases barriers to publication of their papers, whilst they try to meet the requirements of the Commitment Statement. Clarity of requirements, timelines, and criteria for selecting repositories are needed to minimize the barriers to the joint publication of papers and associated data.

Funders have a role to play, in that they need to allow for the time and resources required to curate data and ensure compliance, particularly with respect to the assignment of valid DOIs. Funders can also begin to reward those researchers who do make the effort to properly manage their data and make it available, in a similar way to how they reward scholarly publications and citation of those publications.

The goal of this session is to start a conversation on developing an integrated publishing workflow ecosystem that seamlessly integrates researchers, repositories, publishers and funders. Perspectives from all viewpoints will be presented.

Notes document: https://docs.google.com/document/d/12M0F6mcUZSn2GdBN-Id__smXhYxbLzKDrAViPAgnH6w/edit?usp=sharing

Presentations:

View Recording: https://youtu.be/x6a1QRNbifQ

Takeaways
  • COPDESS has moved to ESIP as a cluster to ensure the sustainability of the project to address the publishing & citation of research data



Speakers
Karl Benedict

Director of Research Data Services & Information Technology, University of New Mexico
Since 1986 I have had parallel careers in Information Technology, Data Management and Analysis, and Archaeology. Since 1993 when I arrived at UNM I have worked as a Graduate Student in Anthropology, Research Scientist, Research Faculty, Applied Research Center Director, and currently...

Kerstin Lehnert

President, IGSN e.V.
Kerstin Lehnert is Senior Research Scientist at the Lamont-Doherty Earth Observatory of Columbia University and Director of EarthChem, the System for Earth Sample Registration, and the Astromaterials Data System. Kerstin holds a Ph.D in Petrology from the University of Freiburg in...

Lesley Wyborn

Adjunct Fellow, Australian National University


Tuesday January 7, 2020 2:00pm - 3:30pm EST
Salon A-C
  Salon A-C, Breakout

2:00pm EST

Current Data that are available on the Cloud
NASA, NOAA and USGS are in the process of moving data onto the cloud. While they have discussed what types of services are available and future plans of what data can be found, it is not completely clear what datasets users can currently access. This session will go over what datasets are currently up in the cloud and what data to expect in the near future. This way as users are transitioning to the cloud for their compute, they can also know what data are available to them on the cloud as well. There will also be presentations from AWS. Speakers:
Katie Baynes - NASA/EOSDIS
Jon O'Neil - NOAA
Jeff de La Beaujardière - NCAR
Kristi Kline - USGS/EROS
Joe Flasher - AWS

Presentations: See attached.

View Recording: https://youtu.be/yssgXB7iaxw

Takeaways
  • Petabyte scale data is being moved into the cloud. This is concentrated in AWS, Google Cloud and Microsoft depending on the agency and dataset
  • Some concern around partnerships with companies (AWS most discussed) in terms of long term relationships, moving data etc. and how those things might impact access or data use
  • Need to make clear the authoritative source of the data, who is stewarding it, and any modifications done when copying to cloud. Users should exercise due diligence in selecting and using data.



Speakers
Jon O'Neil

Director, NOAA Big Data Program, NOAA

Joe Flasher

Open Geospatial Data Lead, Amazon Web Services
Joe Flasher is the Open Geospatial Data Lead at Amazon Web Services helping organizations most effectively make data available for analysis in the cloud. The AWS open data program has democratized access to petabytes of data, including satellite imagery, genomic data, and data used...

Christopher Lynnes

Systems Architect, NASA/EOSDIS, NASA/GSFC
Christopher Lynnes is currently System Architect for NASA’s Earth Observing System Data and Information System, known as EOSDIS. He has been working on EOSDIS since 1992, over which time he has worked multiple generations of data archive systems, search engines and interfaces, science...

Jessica Hausman

Data Engineer, PO.DAAC JPL

Jeff de La Beaujardière

Director, Information Systems Division, NCAR
Big data, cloud computing, object storage, data management.

Dave Meyer

GES DISC manager, NASA


Tuesday January 7, 2020 2:00pm - 3:30pm EST
White Flint
  White Flint, Breakout

4:00pm EST

Bringing Science Data Uncertainty Down to Earth - Sub-orbital, In Situ, and Beyond
In the Fall of 2019, the Information Quality Cluster (IQC) published a white paper entitled “Understanding the Various Perspectives of Earth Science Observational Data Uncertainty”. The intention of this paper is to provide a diversely sampled exposition of both prolific and unique policies and practices, applicable in an international context of diverse policies and working groups, made toward quantifying, characterizing, communicating and making use of uncertainty information throughout the diverse, cross-disciplinary Earth science data landscape; to these ends, the IQC addressed uncertainty information from the following four perspectives: Mathematical, Programmatic, User, and Observational. These perspectives affect policies and practices in a diverse international context, which in turn influence how uncertainty is quantified, characterized, communicated and utilized. The IQC is now in a scoping exercise to produce a follow-on paper that is intended to provide a set of recommendations and best practices regarding uncertainty information. It is our hope that we can consider and examine additional areas of opportunity with regard to the cross-domain and cross-disciplinary aspects of Earth science data. For instance, the existing white paper covers uncertainty information from the perspective of satellite-based remote sensing well, but does not adequately address the in situ or airborne (i.e., sub-orbital) perspective. This session intends to explore such opportunities to expand the scope of the IQC’s awareness of what is being done with regard to uncertainty information, while also providing participants and observers with an opportunity to weigh in on how best to move forward with the follow-on paper. How to Prepare for this Session:

Agenda:
  1. "IQC Uncertainty White Paper Status Summary and Next Steps" - Presented by: David Moroni (15 minutes)
  2. "Uncertainty quantification for in situ ocean data: The S-MODE sub-orbital campaign" - Presented by: Fred Bingham (15 minutes)
  3. "Uncertainty Quantification for Spatio-Temporal Mapping of Argo Float Data" - Presented by: Mikael Kuusela (20 minutes)
  4. Panel Discussion (35 minutes)
  5. Closing Comments (5 minutes)
Notes Page: https://docs.google.com/document/d/1vfYBK_DLTAt535kMZusTPVCBAjDqptvT0AA5D6oWrEc/edit?usp=sharing

Presentations:
https://doi.org/10.6084/m9.figshare.11553681.v1

View Recording: https://youtu.be/vC2O8FRgvck

Takeaways

Speakers
David Moroni

Data Stewardship and User Services Team Lead, Jet Propulsion Laboratory, Physical Oceanography Distributed Active Archive Center
I am a Senior Science Data Systems Engineer at the Jet Propulsion Laboratory and Data Stewardship and User Services Team Lead for the PO.DAAC Project, which provides users with data stewardship services including discovery, access, sub-setting, visualization, extraction, documentation...

Ge Peng

Research Scholar, CISESS/NCEI
Dataset-centric scientific data stewardship, data quality management

Fred Bingham

University of North Carolina at Wilmington

Mikael Kuusela

Carnegie Mellon University


Tuesday January 7, 2020 4:00pm - 5:30pm EST
Forest Glen

4:00pm EST

Experiences Migrating Mission Scale Data in the Cloud
We will describe our project to upload a 2.4 PB dataset encapsulated into ~80K fused files from the 5 instruments on the Terra satellite into NASA AWS S3.
We will share the bottlenecks and lessons learned during this process, and we expect to exchange experiences with similar projects in order to understand best practices and collect guidelines for future projects that are adopting cloud solutions for their data needs.

We'll discuss data volumes, data integrity strategies for migration, S3 bucket organization, metadata curation, transfer rates, transfer pipelines, etc. We will also discuss and share data access patterns, costs, and architectures and how we can construct guidelines for access to these datasets efficiently.
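One common way to approach the data integrity part of such a migration is a checksum manifest: hash every file before transfer, then re-hash the copies and compare. The sketch below is illustrative only and is not the Terra project's actual pipeline. The function names are hypothetical, and it verifies a local copy rather than S3 objects so that it runs with the Python standard library alone.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, block_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large files never sit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(manifest: dict[str, str], copy_root: Path) -> list[str]:
    """Return the relative paths that are missing or differ in the copy."""
    problems = []
    for rel_path, expected in manifest.items():
        candidate = copy_root / rel_path
        if not candidate.is_file() or sha256_of(candidate) != expected:
            problems.append(rel_path)
    return problems
```

Building the manifest at the source and running the verification at the destination keeps the two steps independent, so the check can be repeated after any later copy or reorganization.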

We encourage discussion among different projects that have faced similar processes or are looking to migrate their datasets into the cloud.

https://drive.google.com/file/d/1fts06XDM2dbZxxljBTpplCEMSiTqfp6t/view?usp=sharing

Presentations:
https://doi.org/10.6084/m9.figshare.11553147.v1

View Recording: https://youtu.be/1xVJghJI4Gg

Takeaways
  • Project required/used a combination of NSF, NASA and AWS resources. Some interesting discussion around AWS or other cloud services as a stand in or follow on to limited term NSF assets
  • Some interesting discussion of tailoring to appropriate end users: a wide range of potential users and thus of requirements for the dataset. This includes access guidelines, user capabilities, etc.
  • Project aimed to make a paradigm shift from understanding/observing physical processes to a full climate observing objective



Speakers
Ben Galewsky

Research Programmer, National Center for Supercomputing Applications


Tuesday January 7, 2020 4:00pm - 5:30pm EST
White Flint
  White Flint, Breakout
 
Wednesday, January 8
 

11:00am EST

Pangeo in Action
The NSF-funded Pangeo project (http://pangeo.io/) is a community-driven architectural framework for big data geoscience. A typical Pangeo software stack leverages Python open-development libraries including elements such as Jupyter Notebooks for interactive data analysis, Intake catalogs to provide a higher level of abstraction, Dask for scalable, parallelized data access, and Xarray for working with labeled multi-dimensional arrays of data, and can support data formats including NetCDF as well as the cloud-optimized Zarr format for chunked, compressed, N-dimensional arrays.
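To make the idea of chunked N-dimensional storage concrete, the sketch below computes which chunks a rectangular selection touches. It assumes the dot-separated chunk key layout that Zarr version 2 uses by default (for example "0.0.1"). The function name, array shape and chunk sizes are hypothetical, and in a real Pangeo workflow Xarray, Dask and Zarr handle this bookkeeping themselves.

```python
from itertools import product


def chunk_keys(shape, chunks, selection):
    """List the chunk keys touched by a rectangular selection.

    shape     -- array shape, e.g. (365, 180, 360)
    chunks    -- chunk shape, e.g. (30, 90, 90)
    selection -- one (start, stop) pair per dimension, stop exclusive
    """
    ranges = []
    for size, chunk, (start, stop) in zip(shape, chunks, selection):
        if not 0 <= start < stop <= size:
            raise ValueError(f"selection ({start}, {stop}) outside 0..{size}")
        first = start // chunk        # index of the first chunk touched
        last = (stop - 1) // chunk    # index of the last chunk touched
        ranges.append(range(first, last + 1))
    # One key per combination of chunk indices, joined with dots.
    return [".".join(map(str, idx)) for idx in product(*ranges)]


# Hypothetical daily (time, lat, lon) grid: 31 days over one 90 x 90 tile.
keys = chunk_keys((365, 180, 360), (30, 90, 90), ((0, 31), (0, 90), (90, 180)))
print(keys)  # -> ['0.0.1', '1.0.1']
```

Because each chunk is a separate object, a reader that needs this selection fetches only the listed chunks, which is the property that makes the format attractive for object storage.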

This session includes presentations describing implementations, results, or lessons learned from using these tools, as well as some time for open discussion. We encourage attendance by people interested in knowing more about Pangeo.

Draft schedule:
Dr. Amanda Tan, U. Washington: Pangeo overview and lessons learned
Dr. Rich Signell, USGS: The USGS EarthMap Pangeo: Success Stories and Lessons Learned
Dr. Jeff de La Beaujardière, NCAR: Climate model outputs on AWS using Pangeo framework
Dr. Karl Benedict, UNM: Pangeo as a platform for workshops
Open discussion

How to Prepare for this Session:

Presentations:
https://doi.org/10.6084/m9.figshare.11559174.v1

View Recording: https://youtu.be/VNfpGIIjL3E

Takeaways
  • Pangeo is a community platform for big data geoscience; a cohesive ecosystem of open community, open source software, open ecosystem; three core Python packages: Jupyter, Xarray, Dask
  • Deploying Pangeo on the cloud faces challenges:
    • Cloud costs
    • Cloud skills
    • Need for cloud-optimized data
    • Best strategy for Pangeo deployment on changing cloud service platforms
  • Pangeo can be applied to leverage Jupyter notebooks and other resources for different levels of data users (NCAR: scientists new to cloud computing platforms; University of New Mexico: workshop platform, etc.)

Speakers
Karl Benedict

Director of Research Data Services & Information Technology, University of New Mexico
Since 1986 I have had parallel careers in Information Technology, Data Management and Analysis, and Archaeology. Since 1993 when I arrived at UNM I have worked as a Graduate Student in Anthropology, Research Scientist, Research Faculty, Applied Research Center Director, and currently...

Rich Signell

Oceanographer, USGS
Ocean Modeling, Python, NetCDF, THREDDS, ERDDAP, UGRID, SGRID, CF-Conventions, Jupyter, JupyterHub, CSW, TerriaJS

Amanda Tan

Data Scientist, University of Washington
Cloud computing, distributed systems

Jeff de La Beaujardière

Director, Information Systems Division, NCAR
Big data, cloud computing, object storage, data management.


Wednesday January 8, 2020 11:00am - 12:30pm EST
Linden Oak
  Linden Oak, Breakout

2:00pm EST

Participatory design and evaluation of a 3D-Printed Automatic Weather Station to explore hardware, software and data needs for community-driven decision making
The development of low-cost, 3D-printed weather stations aims to revolutionize the way communities collect long-term data about local weather phenomena, as well as develop climate resilience strategies to adapt to the impacts of increasingly uncertain climate trends. This session will engage teachers and scientists in the evaluation and participatory design of the IoTwx 3D-printed weather station that is designed to be constructed and extended by students in middle and high school. We aim to explore the full spectrum of the station from construction (from pre-printed parts), to data collection and development of learning activities, to analysis of scientific phenomena within the data. The stations also represent a unique opportunity to develop community-based strategies to extend the capabilities of the platform, and in the session we are encouraging full discussion of data collection and sensing technologies of specific relevance to communities adopting the stations.

In this working session, we will work directly with teachers on evaluation and development using a participatory design approach to stimulate and encourage relationships between ESIP Education Committee members and teachers.

Preparing for this Session: TBD

Presentations:

View Recording: https://youtu.be/AfvWhZBkQd8

Takeaways
  • Very valuable for the schools and community. It is an opportunity to include multiple departments within the school system (engineering, computer science, maths, earth science, etc.)
  • Need to understand the constraints that school systems may present: security, wifi, processing power, cloud access, only required for part of the year



Speakers
Shelley Olds

Science Education Specialist, UNAVCO
Data visualization tools, Earth science education, human dimensions of natural hazards, disaster risk reduction (DRR), resilience building.

Becky Reid

Science Educator, Learners Without Walls
I discovered ESIP in the summer of 2009 when I was teaching science in Santa Barbara and attended the Summer meeting there. Ever since then, I have been volunteering with the ESIP Education Committee in various capacities, serving as Chair in 2013, 2019, and now, 2020! I currently...


Wednesday January 8, 2020 2:00pm - 3:30pm EST
Brookside A
  Brookside A, Working Session

2:00pm EST

Advancing Data Integration approaches of the structured data web
Political, economic, social or scientific decision making is often based on integrated data from multiple sources across potentially many disciplines. To be useful, data need to be easy to discover and integrate.
This session will feature presentations highlighting recent breakthroughs and lessons learned from experimentation and implementation of open knowledge graph, linked data concepts and Discrete Global Grid Systems. Practicality and adoptability will be the emphasis - focusing on incremental opportunities that enable transformational capabilities using existing technologies. Best practices from the W3C Spatial Data on the Web Working Group, OGC Environmental Linked Features Interoperability Experiment, ESIP Science on Schema.org; implementation examples from Geoscience Australia, Ocean Leadership Consortium, USGS and other organisations will be featured across the entire session.
This session will highlight how existing technologies and best practices can be combined to address important and common use cases that have been difficult if not impossible until recent developments. A follow up session will be used to seed future collaborative development through co-development, github issue creation, and open documentation generation.
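As a rough illustration of the kind of structured data markup these efforts build on, the sketch below assembles a minimal schema.org Dataset description as JSON-LD. It is a hypothetical example, not taken from any of the presentations: the function name and the example.org values are placeholders, and only the widely used schema.org properties name, description, url and keywords are shown.

```python
import json


def dataset_jsonld(name, description, url, keywords):
    """Build a minimal schema.org Dataset description as a JSON-LD dict."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "keywords": list(keywords),
    }


# Placeholder values for illustration only.
record = dataset_jsonld(
    name="Example groundwater level observations",
    description="Hypothetical daily groundwater levels from one monitoring well.",
    url="https://example.org/datasets/groundwater-levels",
    keywords=["groundwater", "hydrology"],
)

# The serialized text is what a landing page would typically embed in a
# <script type="application/ld+json"> element so that crawlers can read it.
print(json.dumps(record, indent=2))
```

Keeping the description as a plain dictionary until the last step makes it easy to add further properties later without changing how the page is generated.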

How to Prepare for this Session: Review: https://opengeospatial.github.io/ELFIE/, https://github.com/ESIPFed/science-on-schema.org, https://www.w3.org/TR/sdw-bp/, and http://locationindex.org/.

Notes, links, and attendee contact info here.

View Recording: https://youtu.be/-raMt2Y1CdM

Session Agenda:
1.  2.00 - 2.10,  Sylvain Grellet, Abdelfettah Feliachi, BRGM, France
'Linked data' the glue within interoperable information systems
“Our Environmental Information Systems have been exposing environmental features, their monitoring systems and the observations they generate in an interoperable way (technical and semantic) for years. In Europe, there is even a legal obligation to follow such practices via the INSPIRE directive. However, the practice of inducing data providers to set up services in a "Discovery > View > Download data" pattern hides data behind the services. This hinders data discovery and reuse. Linked Data on the Web Best Practices turn this stack upside down, and data is now back in the first line. This completely revamps the design and capacities of our Information Systems. We'll highlight the new data frontiers opened by such practices, taking examples from the French National Groundwater Information Network.”
View Slides: https://doi.org/10.6084/m9.figshare.11550570.v1

2.  2.10 - 2.20,  Adam Leadbetter, Rob Thomas, Marine Institute, Ireland
Using RDF Data Cubes for data visualization: an Irish pilot study for publishing environmental data to the semantic web
The Irish Wave and Weather Buoy Networks return metocean data at 5-60 minute intervals from 9 locations in the seas around Ireland. Outside of the Earth Sciences an example use case for these data is in supporting Blue Economy development and growth (e.g. renewable energy device development). The Marine Institute, as the operator of the buoy platforms, in partnership with the EU H2020 funded Open Government Intelligence project has published daily summary data from these buoys using the RDF DataCube model[1]. These daily statistics are available as Linked Data via a SPARQL endpoint making these data semantically interoperable and machine readable. This API underpins a pilot dashboard for data exploration and visualization. The dashboard presents the user with the ability to explore the data and derive plots for the historic summary data, while interactively subsetting from the full resolution data behind the statistics. Publishing environmental data with these technologies makes accessing environmental data available to developers outside those with Earth Science involvement and effectively lowers the entry bar for usage to those familiar with Linked Data technologies.
View Slides: https://doi.org/10.6084/m9.figshare.11550570.v1

3. 2.20 - 2.30,  Boyan Brodaric, Eric Boisvert, Geological Survey of Canada, Canada; David Blodgett, USGS, USA
Toward a Linked Water Data Infrastructure for North America
We will describe progress on a pilot project using Linked Data approaches to connect a wide variety of water-related information within Canada and the US, as well as across the shared border.
View Slides: https://doi.org/10.6084/m9.figshare.11541984.v1

4.  2.30 - 2.40,  Dalia Varanka, E. Lynn Usery, USGS, USA
The Map as Knowledge Base; Integrating Linked Open Topographic Data from The National Map of the U.S. Geological Survey
This presentation describes the objectives, models, and approaches for a prototype system for cross-thematic topographic data integration based on semantic technology. The system framework offers new perspectives on conceptual, logical, and physical system integration in contrast to widely used geographic information systems (GIS).
View Slides: https://doi.org/10.6084/m9.figshare.11541615.v1

5.  2.40 – 2.50,  Alistair Ritchie, Landcare, New Zealand
ELFIE at Landcare Research, New Zealand
Landcare Research, a New Zealand Government research institute, creates, manages and publishes a large set of observational and modelling data describing New Zealand’s land, soil, terrestrial biodiversity and invasive species. We are planning to use the findings of the ELFIE initiatives to guide the preparation of a default view of the data to help discovery (by Google), use (by web developers) and integration (into the large environmental data commons managed by other agencies). This integration will not only link data about the environment together, but will also expose more advanced data services. Initial work is focused on soil observation data, and the related scientific vocabularies, but we anticipate near universal application across our data holdings.
View Slides: https://doi.org/10.6084/m9.figshare.11550369.v1

6.  2.50 - 3.00,  Irina Bastrakova, Geoscience Australia, Australia
Location Index Project (Loc-I) – integration of data on people, business & the environment
Location Index (Loc-I) is a framework that provides a consistent way to seamlessly integrate data on people, business, and the environment.
Location Index aims to extend the characteristics of the foundation spatial data of taking geospatial data (multiple geographies) which is essential to support public safety and wellbeing, or critical for a national or government decision making that contributes significantly to economic, social and environmental sustainability and linking it with observational data. Through providing the infrastructure to suppo

Speakers
Jonathan Yu

Research data scientist/architect, CSIRO
Jonathan is a data scientist/architect with the Environmental Informatics group in CSIRO. He has expertise in information and web architectures, data integration (particularly Linked Data), data analytics and visualisation. Dr Yu is currently the technical lead for the Loc-I project...

Dalia Varanka

Research Physical Scientist, U.S. Geological Survey
Principal Investigator and Project Lead, The Map as Knowledge Base

Alistair Ritchie

Landcare Research NZ

Adam Leadbetter

Marine Institute

Rob Thomas

Marine Institute

Boyan Brodaric

Natural Resources Canada

Eric Boisvert

Natural Resources Canada
Irina Bastrakova

Director, Spatial Data Architecture, Geoscience Australia
I have been actively involved with international and national geoinformatics communities for more than 19 years. I am the Chair of the Australian and New Zealand Metadata Working Group. My particular interest is in developing and practical application of geoscientific and geospatial...

David Blodgett

U.S. Geological Survey


Wednesday January 8, 2020 2:00pm - 3:30pm EST
White Flint

4:00pm EST

Citizen Science Data in Earth Science: Challenges and Opportunities
Citizen science is scientific data collection and research performed primarily or in part by non-professional and amateur scientists. Citizen science data has been used in a variety of the physical sciences, including physics, ecology, biology, and water quality. As volunteer-contributed datasets continue to grow, they represent a unique opportunity to collect and analyze earth-science data on spatial and temporal scales impossible to achieve by individual researchers. This session will explore the ways open citizen science data sets can be used in earth science research and some of the associated challenges and opportunities for the ESIP community to use and partner with citizen science organizations.

Speakers:

View Recording: https://youtu.be/jTNgWZI6Cik

Takeaways


How to Prepare for this Session: https://www.nationalgeographic.org/encyclopedia/citizen-science/
http://www.earthsciweek.org/citizen-science

Speakers

Alexis Garretson

Community Fellow, ESIP

Kelsey Breseman

Archiving Program Lead, Environmental Data & Governance Initiative
Governmental accountability around public data & the environment. Decentralized web. Intersection of tech & ethics & civics.


Wednesday January 8, 2020 4:00pm - 5:30pm EST
Linden Oak
  Linden Oak, Breakout

4:00pm EST

Planning for new Agriculture and Climate Cluster focus area on automated agriculture with AI
The Agriculture and Climate (ACC) Cluster will host a planning session for a new focus area on automated agriculture and AI ("Agro-AI"). Some initial ideas on possible activities in this space were presented at the ACC October 2019 telecon, including those related to the “Data-to-Decisions” ESIP Lab project (https://www.esipfed.org/wp-content/uploads/2018/07/Wee.pdf). Currently, there are many initiatives and funding opportunities for automated agriculture with AI. The National Science Foundation, e.g., recently announced a program aimed at significantly advancing research in AI (https://www.nsf.gov/news/news_summ.jsp?cntn_id=299329&org=NSF&from=news), including, in its initial set of high-priority areas, “AI-Driven Innovation in Agriculture and the Food System.”
Among the topics for discussion in this planning session will be related proposal opportunities and sponsoring an ACC breakout session on agriculture and AI at the ESIP 2020 Summer Meeting. How to Prepare for this Session: TBD; there will be an intro presentation, prior to the group discussion. This presentation may be made available ahead of the meeting in the scheduled session page.

Presentations:

View Recording: https://youtu.be/GhnSINRFNBg

Takeaways
  • Next step 1: Conduct a survey of available dashboards, existing data, ML use cases, existing APIs
  • Next step 2: Decide on an example question for a use case
  • Next step 3: Define and survey potential users



Speakers

Arif Albayrak

Senior Software Engineer, ADNET (GESDISC)

Bill Teng

NASA GES DISC (ADNET)


Wednesday January 8, 2020 4:00pm - 5:30pm EST
Salon A-C
  Salon A-C, Business Meeting
 
Thursday, January 9
 

10:15am EST

Working Group for the Data Stewardship Committee
This session is a working group for the 2020-2021 year for the Data Stewardship Committee. We will discuss priorities for the next year and potential collaborative outputs, and review the work in progress from the last year.

Notes Document: https://docs.google.com/document/d/1B_0K5jGnFgH72U3P2-oGr5vEqHOGU8CWU-IkZ6pjXbM/edit?ts=5e174588

Presentations

View Recording: https://youtu.be/am-ZLfHgM4w

Takeaways
  • Wow, the members of the Committee really are active! Practically everyone has their own cluster or two!
  • Six activities proposed for the upcoming year have champions who will lead the effort to define the outputs of their selected activity.


Speakers

Alexis Garretson

Community Fellow, ESIP

Kelsey Breseman

Archiving Program Lead, Environmental Data & Governance Initiative
Governmental accountability around public data & the environment. Decentralized web. Intersection of tech & ethics & civics.


Thursday January 9, 2020 10:15am - 11:45am EST
Forest Glen
  Forest Glen, Business Meeting

10:15am EST

Do you have a labeling problem? Three tools for labeling data
The ESIP community and others in machine learning regularly lament the lack of labeled datasets, needed for certain classes of training algorithms. Generating accurate, useful labels is a hard problem, with no general automated solution in sight. Thus, labeling generally involves human effort, which is challenging because the volume of data needed for training can be very large.

Tools exist to help in labeling data. This session will demonstrate three labeling tools and associated processes:
  • Image Labeler, a fast, scalable cloud-based tool to facilitate the rapid development of Earth science event databases and to aid in automated ML-based image classification, presented by Rahul Ramachandran
  • LabelImg, an open source graphical image annotation tool, https://github.com/tzutalin/labelImg, presented by Ziheng Sun
  • Bokeh, a Python-based plotting and annotation tool set for building arbitrary labeling workflows, https://bokeh.org/, presented by Jim Bednar
Time permitting, the session will conclude with a short discussion of thoughts and tradeoffs about the tools.

This session is followed by a hands-on workshop for using LabelImg and Bokeh. If you are interested in participating, please see the session abstract for "Hands-on labeling workshop" for information on preparing for that workshop.

Presentations
https://doi.org/10.6084/m9.figshare.11629110.v1
https://doi.org/10.6084/m9.figshare.11591739.v1

View Recording: https://youtu.be/3ufBOoD3M1E

Takeaways
  • Machine learning-based classification applications require high-quality labeled datasets for both model training and evaluation. There are many existing tools for labeling images (including Earth science data), but labeling tasks are very labor- and time-intensive.
  • If the pre-built labeling tools don’t work for your problem, Anaconda provides a general-purpose labeler-building toolkit based on Bokeh for Python users; see https://examples.pyviz.org/ml_annotators/ml_annotators.html
  • There is opportunity in combining partly automated, partly human labeling, to automate the easy cases while leaving the final call to a person. Currently there is not much tool support or established good practice, and the approach is hard to integrate. The art of avoiding extra work!
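
The combined automated and human approach in the last takeaway can be sketched in a few lines of Python. This is a minimal illustration, not a tool shown in the session: the function name triage_labels, the 0.9 confidence threshold, and the sample predictions are all assumptions made for the example. Predictions that meet the threshold are accepted automatically; everything else goes to a review queue for a person.

```python
from typing import Dict, List, Tuple

# Each prediction is (item_id, proposed_label, model_confidence in [0, 1]).
Prediction = Tuple[str, str, float]


def triage_labels(
    predictions: List[Prediction], threshold: float = 0.9
) -> Tuple[Dict[str, str], List[Prediction]]:
    """Split predictions into auto-accepted labels and a human review queue.

    The threshold is a hypothetical tuning knob, not a recommended value.
    """
    auto_labeled: Dict[str, str] = {}
    needs_review: List[Prediction] = []
    for item_id, label, confidence in predictions:
        if confidence >= threshold:
            auto_labeled[item_id] = label
        else:
            needs_review.append((item_id, label, confidence))
    # Show reviewers the least certain items first.
    needs_review.sort(key=lambda p: p[2])
    return auto_labeled, needs_review


# Hypothetical sample data, for illustration only.
sample = [
    ("img_001", "hurricane", 0.98),
    ("img_002", "dust", 0.55),
    ("img_003", "smoke", 0.91),
    ("img_004", "hurricane", 0.40),
]
auto, review = triage_labels(sample)
print(auto)
print([item_id for item_id, _, _ in review])
```

Lowering the threshold reduces human effort at the cost of more unreviewed labels, so in practice it would likely be tuned against a small hand-checked sample.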

Speakers

Ziheng Sun

Research Assistant Professor, George Mason University
My research interests are mainly on geospatial cyberinfrastructure and agricultural remote sensing.

Anne Wilson

Senior Software Engineer, Laboratory for Atmospheric and Space Physics

Yuhan (Douglas) Rao

Postdoctoral Research Scholar, CISESS/NCICS/NCSU


Thursday January 9, 2020 10:15am - 11:45am EST
Glen Echo
  Glen Echo, Breakout

10:15am EST

Connecting Data with Data Usage: a Graph Approach
We will investigate graph-based methods of connecting data with the uses made of, and the knowledge gained from, those data, from science research to applications to strategic planning. We will examine the diverse capabilities enabled by connecting uses with data for a variety of stakeholders, and explore how to connect existing knowledge graphs together to scale out across the ESIP Federation and related communities toward an inter-connected mega-graph.

0-5 min: Chris Lynnes (NASA): Documenting how data matters...
5-15 min: Doug Newman (NASA): EOSDIS Knowledge Graph
https://doi.org/10.6084/m9.figshare.11561805.v1
15-25 min: Reid Sherman (GCIS): Global Change Information System
https://doi.org/10.6084/m9.figshare.11560011.v1
25-35 min: Dave Blodgett (USGS): SELFIE
https://doi.org/10.6084/m9.figshare.11559093.v1
35-45 min: Joe Conran (NOAA): Interagency Coordination of Satellite Needs
https://doi.org/10.6084/m9.figshare.11561946.v1
45-55 min: Wil Doane (IDA): Assessing the Impact of Land Imaging
https://doi.org/10.6084/m9.figshare.11561913.v1
55-90 min: The Way Forward:
1 - Got Use Case?
2 - ESIP Cluster? https://www.esipfed.org/get-involved/collaborate
3 - Who's In?

Session Notes

View Recording: https://youtu.be/yi05crW6Ya0

Takeaways
  • How to connect data with the uses of that data = documenting how data matter.
    Federating knowledge bases is a daunting task, but possible.
  • Connect research and data to place (but there is a gap around using place identifiers in linked data).
    Discussed whether to form a new cluster or use an existing one. Decision to recharter/repurpose/rename the Data Discovery cluster.
  • A sin of computer science is giving people the impression that most relationships are one-to-one, when life and the universe are full of many-to-many relationships; hence graph databases > RDBMS.




Speakers

Christopher Lynnes

Systems Architect, NASA/EOSDIS, NASA/GSFC
Christopher Lynnes is currently System Architect for NASA’s Earth Observing System Data and Information System, known as EOSDIS. He has been working on EOSDIS since 1992, over which time he has worked on multiple generations of data archive systems, search engines and interfaces, science...

Doug Newman

EED Data Use Architect


Thursday January 9, 2020 10:15am - 11:45am EST
White Flint
  White Flint, Panel

12:00pm EST

License Up! What license works for you and your downstream repositories?
Many repositories are seeing an increase in the use and diversity of licenses and other intellectual property management (IPM) tools applied to externally-created data submissions and software developed by staff. However, adding a license to data files may have unexpected or unintended consequences in the downstream use or redistribution of those data. Who “owns” the intellectual property rights to data collected by university researchers using Federal and State (i.e., public) funding that must be deposited at a Federal repository? What license is appropriate for those data and what — exactly — does that license allow and disallow? What kind of license or other IPM instrument is appropriate for software written by a team of Federal and Cooperative Institute software engineers? Is there a significant difference between Creative Commons, GNU, and other ‘open source licenses’?

We have invited a panel of legal advisors from Federal and other organizations to discuss the implications of these questions for data stewards and the software teams that work collaboratively with those stewards. We may also discuss the latest information about Federal data licenses as it applies to the OPEN Government Data Act of 2019. How to Prepare for this Session: Consider what, if any, licenses, copyright, or other intellectual property rights management you apply or think applies to your work. Also consider Federal requirements such as the OPEN Government Data Act of 2019 and Section 508 of the Rehabilitation Act of 1973.

Speakers:
Dr. Robert J. Hanisch is the Director of the Office of Data and Informatics, Material Measurement Laboratory, at the National Institute of Standards and Technology in Gaithersburg, Maryland. He is responsible for improving data management and analysis practices and helping to assure compliance with national directives on open data access. Prior to coming to NIST in 2014, Dr. Hanisch was a Senior Scientist at the Space Telescope Science Institute, Baltimore, Maryland, and was the Director of the US Virtual Astronomical Observatory. For more than twenty-five years Dr. Hanisch led efforts in the astronomy community to improve the accessibility and interoperability of data archives and catalogs.
Henry Wixon is Chief Counsel for the National Institute of Standards and Technology (NIST) of the U.S. Department of Commerce. His office provides programmatic legal guidance to NIST, as well as intellectual property counsel and representation to the Department of Commerce and other Department bureaus. In this role, it interacts with principal developers and users of research, including private and public laboratories, universities, corporations and governments. Responsibilities of Mr. Wixon’s office include review of NIST Cooperative Research and Development Agreements (CRADAs), licenses, Non-Disclosure Agreements (NDAs) and Material Transfer Agreements (MTAs), and the preparation and prosecution of the agency’s patent applications. As Chief Counsel, Mr. Wixon is active in standing Interagency Working Groups on Technology Transfer, on Bayh-Dole, and on Research Misconduct, as well as in the Federal Laboratory Consortium. He is a Certified Licensing Professional and a Past Chair of the Maryland Chapter of the Licensing Executives Society, USA and Canada (LES), and is a member of the Board of Visitors of the College of Computer, Mathematical and Natural Sciences of the University of Maryland, College Park.

Presentations
See attached

View Recording: https://youtu.be/5Ng5FDW1LXk

Takeaways



Speakers
DC

Donald Collins

Oceanographer, NESDIS/NCEI Archive Branch
Send2NCEI, NCEI archival processes, records management


Thursday January 9, 2020 12:00pm - 1:30pm EST
Forest Glen
  Forest Glen, Panel

12:00pm EST

Hands-on labeling workshop
Intended as a follow-on to the "Do You Have a Labeling Problem?" session and as a way to get your feet wet, this working session is for people to experiment with two of the tools presented in that session, LabelImg and Bokeh. Presenters will provide some sample data for participants to work with. Attendees can also bring some of their own data to work with in the time remaining after the planned activities.

It would be best for workshop participants to preinstall LabelImg before coming to the session. Regarding Bokeh, Anaconda is providing 25 accounts for workshop participants. (Thank you, Jim and Anaconda!) Installing Bokeh is also an option. Links for getting these tools are:
  • LabelImg via https://github.com/tzutalin/labelImg#installation
  • Bokeh as part of the HoloViz suite via http://holoviz.org/installation.html

Presentations

View Recording: https://youtu.be/y8NqTLgT8Ao

Takeaways


Speakers

Ziheng Sun

Research Assistant Professor, George Mason University
My research interests are mainly on geospatial cyberinfrastructure and agricultural remote sensing.

Anne Wilson

Senior Software Engineer, Laboratory for Atmospheric and Space Physics

Yuhan (Douglas) Rao

Postdoctoral Research Scholar, CISESS/NCICS/NCSU


Thursday January 9, 2020 12:00pm - 1:30pm EST
Glen Echo
  Glen Echo, Workshop

12:00pm EST

Datacubes for Analysis-Ready Data: Standards & State of the Art
This workshop session will follow up on the OGC Coverage Analytics sprint, focusing specifically on advanced services for spatio-temporal datacubes. In the Earth sciences, datacubes are accepted as an enabling paradigm for offering massive spatio-temporal Earth data in analysis-ready form and, more generally, for easing access, extraction, analysis, and fusion. Datacubes also homogenize APIs across dimensions, allowing unified wrangling of 1-D sensor data, 2-D imagery, 3-D x/y/t image timeseries and x/y/z geophysics voxel data, and 4-D x/y/z/t climate and weather data.
Based on the OGC datacube reference implementation, we introduce datacube concepts, the state of standardization, and real-life 2D, 3D, and 4D examples utilizing services from three continents. Ample time will be available for discussion, and Internet-connected participants will be able to replay and modify many of the examples shown. We will also relate this work to key datacube activities worldwide, within and beyond the Earth sciences.
Session outcomes could take a number of forms: ideas and issues for OGC, ISO, or ESIP to consider; example use cases; challenges not yet addressed sufficiently, and entirely novel use cases; work and collaboration plans for future ESIP work. Outcomes of the session will be reported at the next OGC TC meeting's Big Data and Coverage sessions. How to Prepare for this Session: Introductory and advanced material is available from http://myogc.org/go/coveragesDWG

Presentations
https://doi.org/10.6084/m9.figshare.11562552.v1

View Recording: https://youtu.be/82WG7soc5bk

Takeaways
  • The abstract coverage construct defines the base, which can be filled in with a coverage implementation schema. This is important because implementations were previously not interoperable across different servers and clients.
  • The coordinate system retrieved from sensors reporting in real time has been embedded in the XML schema so that the sensor data can be integrated into the broader system. In addition to GML, data can be delivered as JSON and RDF, which could be used to link into semantic web technologies.
  • The principle is to send an HTTP URL-encoded query to the server and get back results extracted from the datacube, e.g., sourced from many hyperspectral images.
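
The query principle in the last takeaway can be sketched with Python's standard library. This is an illustrative sketch only, not an example from the session: the endpoint URL, the coverage name, the "ansi" time axis, and the function name build_datacube_request are hypothetical. The request parameters follow the WCS key-value style commonly used for WCPS ProcessCoverages requests; check the target server's documentation before relying on them.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and coverage name; substitute a real service's values.
ENDPOINT = "https://example.org/ows"
COVERAGE = "HypotheticalHyperspectralCube"


def build_datacube_request(endpoint: str, wcps_query: str) -> str:
    """Return a GET URL that carries a URL-encoded WCPS query."""
    params = {
        "service": "WCS",
        "version": "2.0.1",
        "request": "ProcessCoverages",
        "query": wcps_query,
    }
    # urlencode percent-escapes characters such as $, quotes, and brackets.
    return f"{endpoint}?{urlencode(params)}"


# Assumed WCPS expression: slice the cube at one time step, return it as PNG.
# The axis name "ansi" is an assumption; real coverages define their own axes.
query = (
    f"for $c in ({COVERAGE}) "
    'return encode($c[ansi("2020-01-09")], "image/png")'
)
url = build_datacube_request(ENDPOINT, query)
print(url)
```

The resulting URL could then be fetched with any HTTP client; the server evaluates the query against the datacube and returns only the extracted result rather than the underlying source files.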

Speakers

Thursday January 9, 2020 12:00pm - 1:30pm EST
White Flint