I am seeking a post-doctoral research associate to join a National Science Foundation (NSF) project on high-performance computing system reliability and energy efficiency modeling. If you are interested, or you know someone who might be interested, the details of the position are below.
Post-Doctoral Research Associate
Computer System Reliability and Energy Efficiency Modeling Research Project Description
As node counts for high-performance computing systems grow to tens of thousands, and with proposed exascale systems likely to contain hundreds of thousands of nodes, overall system reliability and energy consumption are increasingly critical issues. New approaches to balance hardware, software and support costs are needed to address systemic resilience. Likewise, the rising energy requirements of ever-larger high-performance computing systems now pose limits on the practicality of their deployment, due to both energy availability and cost.
To make larger systems usable and cost-effective, we must develop and adopt new design and operational models that embody two important realities of large-scale systems: (a) frequent hardware component failures are part of normal operation and (b) system and application optimization must be multivariate, including energy cost and efficiency as complements to performance and scalability. New design ideas drawn from commercial cloud computing, including adaptable designs for hardware failure and energy efficiency, are needed if proposed exascale designs are to be feasible, much less practical.
The project focuses on development of (a) scalable analytic and simulation models for hardware performability (performance plus reliability) based on the principle of near-complete decomposability, (b) assessment and sizing of zero-touch field-replaceable hardware modules (FRMs) to reduce hardware repair errors and total cost of ownership (TCO) models, (c) energy-aware batch scheduling models that incorporate bounds on energy availability and energy costs, and (d) user resource allocation cost models with energy as a cost proxy.
Candidates should have a PhD in computer science, electrical and computer engineering or an allied discipline. Experience in computer system simulation, analytic modeling, high-performance computing systems and parallel applications is highly desirable. The successful candidate will be expected to work independently on original research problems related to the project and help coordinate the activities of PhD students.
Desired Start Date
For more information, please contact
Professor Daniel A. Reed email@example.com
2660 University Capitol Centre +1 319-335-2132
Iowa City, Iowa 52242 USA www.hpcdan.org