Workshop Event: Frontiers of Research Computing
October 30, 2017 - October 31, 2017
The Center for Institutional Research Computing (CIRC) is pleased to announce a two-day seminar event focusing on the state of the art and future of research computing, to be held on October 30 and 31, 2017, from 8:30am to 5:00pm in Chinook 150.
The workshop will feature plenary speakers from across the US discussing the role that leadership computing plays in academic research. These presentations will be complemented by talks from our faculty, as well as roundtable discussions on what research computing will look like in the years to come.
Schedule of Events – October 30th
8:30 am – Coffee and refreshments
8:50 am – Welcoming Remarks
10:40 am – Jack Wells, Oak Ridge National Laboratory (Director of Science, National Center for Computational Sciences): Approaching Exascale Computing: Energy constraints and diverse science requirements drive new paradigms in high-performance computing
11:30 am – Lunch Break (with Roundtable Discussions)
3:00 pm – Coffee and Networking Break
4:30 pm – Roundtable Discussions
5:30 pm – End of Day 1
Schedule of Events – October 31st
8:30 am – Coffee and refreshments
8:50 am – Welcoming Remarks
11:30 am – Lunch Break with Roundtable Discussions
3:30 pm – Coffee and Networking Break
4:00 pm – Recommendations for investments/activities in CIRC from roundtable discussions
5:00 pm – End of Event
Guest Speakers – Bios and Presentations
Alan Craig, PhD
National Science Foundation – Extreme Science and Engineering Discovery Environment
Humanities, Arts, and Social Science Specialist
Alan B. Craig, PhD, is a consultant to the National Science Foundation’s Extreme Science and Engineering Discovery Environment (XSEDE). His focus with XSEDE is to explore the application of high performance computing (HPC) in the humanities, arts, and social sciences, and to engage with scholars around the nation who are interested in using HPC resources in support of their scholarship and teaching. Prior to his role at XSEDE, he was a research scientist at the National Center for Supercomputing Applications (NCSA) and the Associate Director for Human-Computer Interaction at the Institute for Computing in Humanities, Arts, and Social Science. Additionally, he has been developing and studying virtual reality and augmented reality, particularly their applications in education, since the early 1990s. He has authored three books (Understanding Virtual Reality, Developing Virtual Reality Applications, and Understanding Augmented Reality) and holds three patents.
PRESENTING: High Performance Computing and Data Analysis in the Humanities, Arts, and Social Sciences
Data of interest to scholars in the humanities, arts, and social sciences abounds. High performance computing offers the opportunity to analyze large collections of data to help answer questions of interest to humankind, as well as to derive new questions of interest. In this talk I will address the kinds of questions and problems that scholars in the humanities, arts, and social sciences face with big data from large text collections, image collections, video collections, network databases, and more, and discuss examples of projects that are currently underway. Likewise, I will discuss how data of interest to scholars in these communities can be created. I will address how to get started using high performance computing, and in particular the resources that are available from the National Science Foundation’s Extreme Science and Engineering Discovery Environment (XSEDE). These resources are free to scholars, including those in the humanities, arts, and social sciences. In addition, I will present tools, including gateways, that are in development to aid in these analyses.
Matthew Horton, PhD
University of California, Berkeley
Lawrence Berkeley National Laboratory
Following an undergraduate degree and master's in Materials Science at the University of Cambridge, Matthew earned his PhD at Imperial College London studying extended defects in device materials used for energy-efficient lighting. He is currently a postdoc at Lawrence Berkeley National Lab and UC Berkeley, working on a variety of projects within the Materials Project and for its industrial collaborators.
PRESENTING: The Materials Project: High-Throughput Materials Design and Discovery
The Materials Project is an open, collaborative effort to develop tools for the large-scale, automated simulation of materials. High-throughput materials simulation has only become possible in recent years, but it promises to transform the search for, design, and development of new materials. The Materials Project now offers its database of 70,000 materials and millions of materials properties for free on its website, and develops its full software stack in the open, under an open-source license, for anyone to use. The Materials Project is growing, having recently hosted its second workshop, and will soon offer ways for users to contribute data back to the Materials Project website and share their own research data online. In this talk, I will discuss the methods and opportunities of high-throughput materials simulation and the history and current status of the Materials Project, and showcase some of our recent success stories.
Rob Jasper
Pacific Northwest National Laboratory
National Security Directorate
Rob Jasper is a manager in Pacific Northwest National Laboratory’s (PNNL) National Security Directorate. He serves as Initiative Lead for the Analysis in Motion streaming analytics initiative, Deputy Director for the Northwest Regional Technology Center for Homeland Security and Lead for the Innovation District Strategy.
Before joining PNNL, Rob served as Vice President of Data Sciences at compensation analytics provider PayScale. Prior to PayScale, Rob was Chief Technology Officer at Intelligent Results, developing the first SaaS-based machine learning and analytics platform. Intelligent Results was sold to financial transactions provider First Data in 2007. At First Data, Rob was Vice President of Information Services and Chief Technology Officer of the Analytics Center of Excellence where he oversaw the design, development and support of over 20 products in the areas of fraud, analytics, and data solutions.
Additionally, Rob has served as Chief Scientist at Fizzylab, was an Adjunct Professor of Software Engineering at Seattle University, and was a Research Manager at Boeing Research and Technology. Rob was owner of Octave Labs, LLC, an independent analytics consulting firm to Ignition Partners, Vulcan, and Clearsight Systems. He is an advisory board member for the University of Washington’s Certificate Program in Big Data Technologies and Master of Software Engineering program advisory board at Seattle University. Rob received his Master of Software Engineering from Seattle University.
PRESENTING: Interactive Streaming Analytics at Scale
Technological advances in electronics, connectivity, and network bandwidth have enabled the continuous streaming of data from a diverse set of sources including scientific instruments, sensors, and consumer electronics. Analysis of the streaming data allows both humans and machines to interpret and analyze the world in ways that would be impossible without the aid of technology.
PNNL’s Analysis in Motion Initiative is a five-year, multi-million-dollar research effort focused on enabling interactive streaming analytics (ISA) at scale. In his talk, Mr. Jasper will define interactive streaming analytics and the six key challenges for ISA applications. He will describe two research use cases, electron microscopy and cyber insider threat, used across the initiative to conduct research. Finally, Mr. Jasper will describe several individual research projects conducted under the initiative.
Molly Maleckar, PhD
Allen Institute for Cell Science
Director of Modeling
Enabled by the rich, high-content imaging datasets resulting from the Institute’s live cell imaging pipeline, the Modeling team focuses on predictive modeling of cell organization and function, moving from data-driven to fundamental mechanistic approaches. Dr. Maleckar joined the Allen Institute for Cell Science in January 2017 with a decade of research experience at the boundary between agile corporate and academic cultures. Her research foci include multi- and mesoscale models of cells and tissues, multiphysical approaches to understanding structure-function relationships, and effective methodological development and tool delivery to support the wider community.
PRESENTING: Open Science: How the Allen Institute for Cell Science wants to revolutionize cell biology
The mission of the Allen Institute for Cell Science (AICS) is to create dynamic, multi-scale visual models of cell organization, dynamics, and activities that capture experimental observation, theory, and prediction to understand and predict cellular behavior in its normal, regenerative, and pathological contexts. Core to this mission are the concepts of big science (broad AND deep), team science, and open science. After an introduction to AICS workflows and a look into its scientific outputs, Dr. Maleckar will take a deeper dive into the Modeling team: its purpose, vision, and current research activities. Finally, we’ll look at next steps for simulation and modeling at AICS and some completely fresh results from a few promising techniques.
Robert Rallo, PhD
Pacific Northwest National Laboratory
Dr. Robert Rallo is the Technical Group Manager of the Data Sciences Group in the Advanced Computing, Mathematics, and Data Division at Pacific Northwest National Laboratory. Before joining PNNL, he was an Associate Professor in Computer Science and Artificial Intelligence and Director of the Advanced Technology Innovation Center (ATIC) at the Universitat Rovira i Virgili in Catalonia. Dr. Rallo served in the European Commission as chair of the Modeling WG in the NanoSafety Cluster (2013-2016) and as co-chair of the US-EU Nano-Dialogue Community of Research on Predictive Modeling and Health (2013-2015). He has also served as a reviewer for research organizations such as the European Research Council, Horizon 2020, COST, and the NWO Research Council for Earth and Life Sciences (ALW). Dr. Rallo’s research interests are in data-driven analysis and modeling of complex systems of industrial, environmental, and social relevance.
PRESENTING: Data Sciences in Environmental Impact Assessment
In this talk we will give an overview of the use of Data Sciences principles to develop computational models and tools for assessing the environmental impact of chemicals and nanomaterials. Specific examples will be presented, such as the design of scientific data management systems, the use of machine learning techniques for developing structure-activity relationships, and the embedding of computational tools into web platforms. The talk will conclude by discussing current challenges and future research directions.
Jack Wells, PhD
Oak Ridge Leadership Computing Facility
Oak Ridge National Laboratory
Director of Science
Jack Wells is the Director of Science for the Oak Ridge Leadership Computing Facility (OLCF), a DOE Office of Science national user facility, and the Titan supercomputer, located at Oak Ridge National Laboratory (ORNL). Wells is responsible for the scientific outcomes of the OLCF’s user programs.
Wells previously led both ORNL’s Computational Materials Sciences group in the Computer Science and Mathematics Division and the Nanomaterials Theory Institute in the Center for Nanophase Materials Sciences. Prior to joining ORNL as a Wigner Fellow in 1997, Wells was a postdoctoral fellow within the Institute for Theoretical Atomic and Molecular Physics at the Harvard-Smithsonian Center for Astrophysics.
Wells has a Ph.D. in physics from Vanderbilt University, and has authored or co-authored over 100 scientific papers and edited one book, spanning nanoscience, materials science and engineering, nuclear and atomic physics, computational science, applied mathematics, and novel analytics for measuring the impact of scientific publications.
PRESENTING: Approaching Exascale Computing: Energy constraints and diverse science requirements drive new paradigms in high-performance computing
The dawn of the twenty-first century has witnessed the widespread adoption of high-performance computing (HPC) as an essential tool in the modeling and simulation of complex scientific phenomena. And today, many research institutions consider excellence in modeling, simulation and data analysis via HPC to be essential in solving forefront problems in science and society. To this end, the United States (U.S.) Department of Energy’s (DOE) planned deployment of the Summit supercomputer at the Oak Ridge Leadership Computing Facility (OLCF) in the 2018 timeframe will increase the computing capability available within the U.S. by an order of magnitude, for a performance of up to 200 petaflops. The system will result in a 5-10x increase in scientific capability, compared to today’s Titan supercomputer at ORNL. This increase in computing capability enables scientists in industry, universities, national laboratories, and other federal agencies to pursue ever more challenging research questions that in turn drive the need for even more powerful systems.
Looking forward toward the age of exascale computing, the DOE is engaged in an ambitious enterprise, integrating the Exascale Computing Project (ECP) (exascaleproject.org) and the computing facilities at major DOE Laboratories, such as Oak Ridge National Laboratory (ORNL), to procure and deploy exascale supercomputers in the 2021 to 2023 time frame to deliver 50x to 100x today’s capabilities. Supporting this effort is a wide range of research-community-engagement activities, including exascale ecosystem requirements workshops sponsored by the DOE Office of Science held over the past two years (exascaleage.org).
These combined efforts must address the key technical challenges to reach exascale computing capabilities: massive parallelism, memory and storage efficiencies, reliability, and energy consumption. Focusing on experiences within DOE’s Leadership Computing Facility Program at ORNL (OLCF), this presentation will highlight OLCF’s user programs, and the goals, current status, and next steps for DOE’s Summit project. I will highlight requirements from DOE’s Office of Science users for integrated compute- and data-intensive capabilities that are likely to drive new operational paradigms within HPC centers of the future, such as data-intensive machine-learning applications, and the integration of high-throughput computing and high-performance computing workloads.
WSU Faculty Abstracts
Associate Professor, Department of Civil and Environmental Engineering
Integrated Modeling to Inform Agricultural and Natural Resource Management Decisions
As managers of agricultural and natural resources are confronted with uncertainties in global change impacts, the complexities associated with the interconnected cycling of nitrogen, carbon, and water present daunting management challenges. Existing models provide detailed information on specific sub-systems (land, air, water, economics, etc.). An increasing awareness of the unintended consequences of management decisions resulting from the interconnectedness of these sub-systems, however, necessitates coupled regional earth system models (EaSMs). Decision makers’ needs and priorities can be integrated into the model design and development processes to enhance decision-making relevance and “usability” of EaSMs. BioEarth is a current research initiative with a focus on the U.S. Pacific Northwest region that explores the coupling of multiple stand-alone EaSMs to generate usable information for resource decision-making. Direct engagement between model developers and non-academic stakeholders involved in resource and environmental management decisions throughout the model development process is a critical component of this effort. BioEarth utilizes a “bottom-up” approach, upscaling a catchment-scale model to basin and regional scales, as opposed to the “top-down” approach of downscaling global models utilized by most other EaSM efforts. This talk describes the BioEarth initiative and highlights opportunities and challenges associated with coupling multiple stand-alone models to generate usable information for agricultural and natural resource decision-making.
Associate Professor, School of Mechanical and Materials Engineering
Modeling Based Design of Materials for Next-Generation Energy Conversion and Storage Devices
Addressing the energy challenge relies on the design of molecularly tailored materials that can be processed in a cost-effective manner to manufacture highly efficient energy conversion and storage devices such as batteries and solar cells. Identifying ideal materials for these devices requires (i) developing a detailed understanding of the key physical processes that determine device performance and (ii) establishing a relationship between the structural and functional properties of the materials that constitute the various components and the overall performance. In this talk, Dr. Banerjee will introduce ongoing research efforts in his group, the computational nanoscience laboratory, toward modeling the transport phenomena, electrochemistry, and self-assembly occurring in the liquid phase and at liquid-solid and solid-solid interfaces relevant to batteries and solar cells. He will illustrate through examples, in the realm of batteries and thin-film photovoltaics, that atomistic simulations are powerful tools that can leverage recent advancements in high performance computing to provide mechanistic insights and also help identify novel materials for energy devices. For instance, in lithium batteries, the choice of ideal electrolytes that facilitate the transport of lithium ions and are stable against the respective electrodes is critical for enhancing electrochemical performance. In perovskite solar cells, the selection of solution-processed photoactive layers with ideal electronic properties and morphology for charge transfer and transport determines the power conversion efficiency. Dr. Banerjee will provide an overview of how his group is working to address these complex requirements through modeling efforts that span a range of length scales.
Associate Professor, School of the Environment
Computational Geodynamics: Why sometimes a rock hammer just won’t do
Geology/Earth science is driven by a desire for a deeper understanding of our planet. Often geologists use observations and analyses of rocks and/or the Earth’s surface as their primary investigative tools. Others explore the planet’s interior through the application of various disciplines in physics, such as acoustic waves, gravitational attraction, and magnetism. There is, however, a collection of us who take an entirely different approach to geology: modeling the dynamics of the Earth’s surface and interior. I will give a short overview of the type of numerical modeling I use, the complexities of modeling Earth processes, and why many geologists have put down the rock hammer and instead become early adopters of high performance computing.
Assistant Professor, School of Biological Sciences
A Population Genetics Approach to the Study of Host and Microbe Interactions
Disentangling how mutation, recombination, and selection interact to shape genetic variation and determine the genomic architecture of organisms is a central question in the evolutionary biology of hosts and microbes. How important is homologous recombination for the evolution of traits involved in host shifts or adaptation to new environments? During which stages of the complex life cycle of organisms do we expect to find hotspots of adaptation? How do changes in the recombination landscape of the genome affect the accumulation of deleterious mutations and the spread of adaptive variants? To understand how these processes shape genomic variation, my lab applies a multidisciplinary approach that uses high performance computational tools in combination with population genetic/genomic analyses, phylogenetics, statistical methods, metagenomic analysis, simple mathematical modeling, and wet lab experiments. Developing a better understanding of the forces shaping the genetic architecture of organisms will have enormous implications for the design of strategies for predicting susceptibility, drug resistance, and disease load in populations and species of interest. More importantly, it will allow us to better understand the probable outcomes of interactions between hosts and microbes.
In my talk, I will discuss recent work showing how simulations and dense genotyping data from full genome sequencing of a large number of genomes can be used to better understand the evolution of DARC (the Duffy locus) in humans. DARC is a classic example of how selection from parasites has shaped genetic variation in humans. We are extending these analyses to other loci potentially involved in resistance to infectious diseases, and expect to gain a deeper understanding of how parasites shape genetic variation in their hosts and to apply it to other parasite-host relationships.
Assistant Professor, School of Electrical Engineering and Computer Science
Machine Learning for Data-Driven Science and Engineering
We are witnessing the rise of the “data-driven” paradigm, in which massive amounts of data can be analyzed to extract understanding and make useful predictions. In this talk, I will provide a high-level overview of some of the research projects in my group to advance this paradigm: 1) To fully realize the promise of Big Data (e.g., text, images, videos, speech), we need automated systems that can transform unstructured data into structured formats (e.g., resolving coreferences of entity and event mentions in a piece of text, interpreting a visual scene, translating from one language to another) so that we can query and reason with these data; 2) With Moore’s law aging quickly, we need innovative high-performance and energy-efficient computing systems for emerging Big Data applications; 3) Humans improve the speed of their reasoning processes with experience: as a child learns to read or play chess, the reasoning processes involved become more automatic and perform better per unit time. How can we give computers this learning capability so that they automatically improve their speed for processing large-scale data?; and 4) How can we develop data-driven cyber-physical systems that are proactive, by developing models that predict future events and activities?
Assistant Professor, Department of Horticulture
A Dynamic Local and National Cyberinfrastructure in Support of Large-Scale Systems-Biology Analyses
Many complex traits of agricultural plants, such as disease resistance, have an important effect on human and environmental health and food security. Therefore, it is of high interest to understand the genetic and environmental context in which multiple genes interact to affect important traits. To this end, Systems Genetics, a branch of Systems Biology, combines data collection and management (genomics, genetics, etc.), modeling (mathematics and statistics), and computing to predict the interaction of gene products underlying important complex traits. Fortunately, large online repositories, such as the National Center for Biotechnology Information (NCBI) Short Read Archive (SRA), house data collected from previous large-scale genomic experiments. As a result, there is potential for inter- and intra-species systems-level discoveries using these large compendia of samples. However, processing these data to model inter- and intra-species gene interactions requires appropriate cyberinfrastructure for data storage, transfer, and processing. To meet these needs, the NSF-funded SciDAS (Scientific Data at Scale) project aims to support any large-scale analysis. This presentation reports on the efforts of SciDAS to construct a large-scale dynamic cyberinfrastructure that integrates both local and national resources, and on a SciDAS-enabled Systems Biology use case for large-scale multi-species gene-interaction analysis.
Assistant Professor, Department of Physics and Astronomy
Neutron Stars and Superfluid Quantum Turbulence from Fermionic Density Functional Theory
Rotating neutron stars generally slow down as they lose angular momentum, but occasionally suddenly start rotating faster. This phenomenon, called a pulsar glitch, is thought to be caused by a superfluid on the interior of the star transferring angular momentum to the crust. Despite nearly half a century of study, the key microscopic mechanism of vortex pinning has remained unquantified due to the computational complexity of the problem. Recently, through a series of technical advances, and the availability of high performance computers, we have been able to quantify this basic ingredient by measuring the vortex-pinning force with real-time dynamical simulations.
In this talk I will discuss how we use a custom superfluid density functional theory (DFT) and leadership class computing to perform real-time simulations of superfluid dynamics. This approach exhibits both strong and weak scaling, and effectively uses the GPU accelerators responsible for the majority of the computing power at many leadership class facilities, enabling us to make scientific breakthroughs in nuclear astrophysics, fission, cold atom physics, and opening a new regime in which to study quantum turbulence.
Assistant Professor, School of Electrical Engineering and Computer Science
Effective Parallelization of Graph Algorithms in Computing Applications
Graph algorithms are critical kernels in many scientific computing and data analytics applications. When problems are of large scale, the graph computations need to be performed in parallel. Parallelizing graph algorithms with scalability and performance as a goal is especially challenging because graph algorithms characteristically involve irregular and memory-intensive computations and they generally permit low concurrency. In this talk, I will give an overview of methods we have developed over the years that proved effective in overcoming this challenge in the context of various graph problems and different parallel computing platforms. The platforms we consider include multicore and massively multithreaded architectures as well as distributed-memory platforms consisting of thousands of processors.
Assistant Professor, Department of Civil and Environmental Engineering
AIRPACT Regional Air Quality Forecasting System
Air quality (AQ) modeling is of particular value due to the well-known health risks posed by air pollution. Exposure to poor AQ can increase various adverse health outcomes, such as asthma and heart, respiratory, and cardiovascular diseases. Nearly 3 million deaths are attributed to poor ambient AQ. Federal and state agencies and the public can use AQ forecast results to take actions that minimize health impacts.
AQ is a complex process involving direct emissions, chemical and physical transformations, advection, and removal, which strongly depends on meteorology (e.g., temperature, relative humidity, and precipitation). AQ forecasting is commonly based on a numerical model called 3-D Chemical Transport Model (CTM) that explicitly represents the atmospheric processes influencing AQ and describes the causal relationship of AQ with emissions, meteorology, deposition, and other factors.
We, the Laboratory for Atmospheric Research (LAR) at WSU, run AIRPACT (http://lar.wsu.edu/airpact/), an operational CTM-based regional AQ forecast for the Pacific Northwest. AIRPACT uses a 4-km gridded domain over the Pacific Northwest and provides 48-hr forecasts every day. It requires large input data, approximately tens of gigabytes, consisting of spatial and temporal information on pollutant emissions, meteorology, and initial and boundary conditions. With these inputs, AIRPACT solves for the concentrations of 155 air pollutant species using a mass conservation equation for each grid cell (~2.7 million grid cells) and each time step (varying from a few seconds to minutes) that accounts for advection, diffusion, gas and aerosol chemistry, aerosol dynamics, cloud and aqueous chemistry, and removal processes. A one-day AIRPACT forecast takes ~1.75 hours of computer time using 120 CPU processors, handling hundreds of gigabytes of input and output data daily. Clearly, AQ modeling requires High Performance Computing (HPC). Currently AIRPACT runs on the VCEA HPC Aeolus cluster, and LAR is working on setting up a new AIRPACT system on the Kamiak cluster.
Assistant Professor, Department of Physics and Astronomy
Advances in Condensed Matter Physics with High Performance Computing
Condensed matter physics, the largest branch of physics, considers the physical properties of condensed phases of matter. The understanding obtained has arguably had the greatest impact on both our understanding of the Universe and our daily lives. The underlying physical laws necessary for the mathematical theory are completely known; their exact application, however, leads to equations that are much too complicated to be solvable. The development of approximate practical methods, though, allows accurate numerical answers to be obtained. With high performance computing, these methods can be applied to study realistic systems. In the McMahon Research Group, we develop and apply such methods to address problems at the forefront of this field. In this talk, I will discuss these efforts. Results from recent applications will be presented.
Assistant Professor, Integrative Physiology and Neuroscience
Modeling Molecular Motors in Muscle Contraction