The NSF-funded Pangeo project (http://pangeo.io/) is a community-driven architectural framework for big data geoscience. A typical Pangeo software stack builds on open-development Python libraries, including Jupyter Notebooks for interactive data analysis, Intake catalogs to provide a higher level of abstraction, Dask for scalable, parallelized data access, and Xarray for working with labeled multi-dimensional arrays of data. The stack can support data formats including NetCDF as well as the cloud-optimized Zarr format for chunked, compressed, N-dimensional arrays.
This session includes presentations describing implementations, results, or lessons learned from using these tools, as well as some time for open discussion. We encourage attendance by people interested in knowing more about Pangeo.
Draft schedule:
Dr. Amanda Tan, U. Washington: Pangeo overview and lessons learned
Dr. Rich Signell, USGS: The USGS EarthMap Pangeo: Success Stories and Lessons Learned
Dr. Jeff de La Beaujardière, NCAR: Climate model outputs on AWS using Pangeo framework
Dr. Karl Benedict, UNM: Pangeo as a platform for workshops
Open discussion
Presentations: https://doi.org/10.6084/m9.figshare.11559174.v1
View Recording: https://youtu.be/VNfpGIIjL3E
Takeaways
- Pangeo is a community platform for big data geoscience. It combines an open community, open-source software, and an open ecosystem into a cohesive whole.
- Three core Python packages: Jupyter, Xarray, and Dask.
- Deploying Pangeo on the cloud faces challenges:
  - Cloud costs
  - Cloud skills
  - The need for cloud-optimized data
  - Choosing the best Pangeo deployment strategy as cloud service platforms change
- Pangeo can be applied to leverage Jupyter notebooks and other resources for different levels of data users (NCAR: scientists new to cloud computing platforms; University of New Mexico: workshop platform, etc.)
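
The need for cloud-optimized data listed among the takeaways can be illustrated with a toy sketch. The code below is not Zarr's format or API; it is a standard-library illustration of the general idea behind chunked, compressed arrays: each chunk is compressed independently, so a reader can fetch and decompress only the chunks that overlap the requested range. The names `write_chunks`, `read_slice`, and `CHUNK` are hypothetical, and the dictionary stands in for an object store.

```python
import struct
import zlib

CHUNK = 4  # values per chunk (hypothetical; real stores use far larger chunks)


def write_chunks(values, chunk_size=CHUNK):
    """Split a 1-D sequence of floats into independently compressed chunks."""
    store = {}  # stands in for an object store: chunk index -> compressed bytes
    for start in range(0, len(values), chunk_size):
        block = values[start:start + chunk_size]
        raw = struct.pack(f"<{len(block)}d", *block)  # little-endian float64
        store[start // chunk_size] = zlib.compress(raw)
    return store


def read_slice(store, start, stop, chunk_size=CHUNK):
    """Return values[start:stop] and the indices of the chunks that were read."""
    out, touched = [], []
    if stop <= start:
        return out, touched
    first = start // chunk_size
    last = (stop - 1) // chunk_size
    for key in range(first, last + 1):
        raw = zlib.decompress(store[key])  # only overlapping chunks are decoded
        block = struct.unpack(f"<{len(raw) // 8}d", raw)
        base = key * chunk_size
        lo = max(start, base) - base
        hi = min(stop, base + len(block)) - base
        out.extend(block[lo:hi])
        touched.append(key)
    return out, touched


data = [float(i) for i in range(10)]
store = write_chunks(data)
values, touched = read_slice(store, 5, 7)
print(values, touched)
```

In this sketch, reading two values touches one of the three chunks. The same property is what lets a cloud-hosted dataset serve small subsets without transferring whole files.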