Update: The proceedings from the meeting are now online.
Every year, the three U.S. Department of Energy (DOE) weapons laboratories (LANL, LLNL and SNL) organize a workshop on the state of high-performance computing and computational science. By long tradition, the meeting is held at the Salishan lodge on the Oregon coast. The attendees are drawn from the three weapons laboratories, the DOE Office of Science laboratories, other computing-intensive elements of the U.S. government (e.g., the NSA and DoD HPC Modernization Program), the NSF supercomputing centers, key academic researchers and industry high-performance computing leaders. I've been attending for many years, both as the former NCSA director and as a federal science policy wonk.
From Petascale to Exascale
The technology and economic hurdles to building an exascale system were the opening topics. Peter Kogge began the first full day of the meeting by outlining the device physics constraints and the technologies needed for exascale. Peter took care to note that although he is chairing the DARPA exascale study group, he was reporting his own opinions. He then estimated that billion-way parallelism would be required to reach exascale rates. Bill Camp (Intel), Chuck Moore (AMD) and Michael Paolini (IBM) echoed these challenges from the corporate side.
All of this reminded me of the original petaflops meeting in 1994. (It also reminded me that I am getting older, but that's a different topic.) We met in the Pasadena Hilton, hosted by Paul Messina and Caltech, to debate the architectural, system software, programming and application challenges and opportunities. It was a stimulating event, later summarized in a book, and Seymour Cray gave the opening keynote, a rare treat in itself. This was followed by a series of detailed planning meetings that culminated in a proposed federal program in petascale computing. Alas, it was not funded, despite the efforts of many of us.
At the time, there were considerable doubts that CMOS and clustered systems could scale to the desired levels. A teraflop was still a rare and precious thing, and many believed we would need more exotic superconducting technologies to reach petascale. We were wrong then, but we are now approaching some fundamental physical limits. The bounds on CMOS junction voltages are near, and the number of atoms on a transistor junction is countable. Parallelism, some new technologies or radical new approaches (quantum computing or biological computing) will be required. All of this is driving the multicore revolution.
In addition to exascale computing, the other debate at Salishan concerned programming models for multicore chips and the right mix of on-chip functions. There were the inevitable discussions of FPGAs and GPU accelerators, but also a broad realization that more general solutions would be required, particularly given the parallel programming challenges and the need for code portability. Several of us also reminded the audience that Amdahl's law still applied.
Achieved multicore performance, as with any parallel application, depends on the degree of parallelism and the remaining serial fraction. This reminder also led to a discussion of parallel programming models. In this spirit, several speakers, including Kathy Yelick and Vikram Adve, mentioned the Microsoft-Intel Universal Parallel Computing Research Centers (UPCRCs) at UC-Berkeley and UIUC. I also discussed the rationale for this investment in my after dinner talk.
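For readers who want to put numbers on the Amdahl's law reminder, here is a minimal Python sketch of the standard formula, speedup = 1 / (s + (1 - s) / N), where s is the serial fraction and N is the number of cores. The function names and the 5% serial fraction are illustrative assumptions of mine, not figures presented at the meeting.

```python
def amdahl_speedup(serial_fraction: float, cores: int) -> float:
    """Ideal speedup on `cores` cores when `serial_fraction` of the
    work cannot be parallelized (Amdahl's law)."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)


def speedup_limit(serial_fraction: float) -> float:
    """Upper bound on speedup as the core count grows without limit."""
    if serial_fraction == 0.0:
        return float("inf")
    return 1.0 / serial_fraction


if __name__ == "__main__":
    serial = 0.05  # hypothetical: 5% of the run time is serial
    for cores in (4, 16, 64, 1024):
        print(f"{cores:5d} cores: {amdahl_speedup(serial, cores):6.2f}x")
    print(f"limit: {speedup_limit(serial):.1f}x")
```

With a 5% serial fraction, 16 cores yield a speedup of only about 9.1, and no number of cores can exceed 20. That is why the serial fraction, not the core count, quickly becomes the limiting factor.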
The After Dinner Talk
This year, Adolfy Hoisie (LANL) asked me to give the after dinner presentation, which is normally held on the evening of the second day. An after dinner talk is always fraught with danger, as the desired attributes are brevity, humor, brevity and insight. Oh, and did I mention brevity? Such events are especially challenging when the after dinner topic is heterogeneous multicore futures. I am sure all of you see the clear and abundant opportunities for situational humor in multicore. Perhaps more importantly, I was the lone remaining obstacle between the attendees and an evening of informal technical conversations over wine.
Gathering my courage, I tried to make three points. First, we are at a parallel computing triple point, the confluence of (a) technology constraints (chip power and CMOS limitations), (b) rapidly rising application complexity and (c) diverse parallel programming models. Rarely during the past twenty years have we been so "surrounded by opportunities" to define the future of computing.
Second, I argued that we must embrace multicore heterogeneity and identify those functions best implemented in hardware. Multicore means much more than a plethora of cookie-cutter homogeneous cores. We can mix in-order and out-of-order cores, network protocol processing, encryption, signal and image processing, rendering and other functions on chip in innovative ways. I likened this situation to the choice between a package of standard sugar cookies and a box of designer chocolates. We get to choose, and we must choose wisely and well.
Third, we must explore this heterogeneous multicore space aggressively and at scale, while raising the level of programming abstraction. Parallel computing must move beyond the "hero developer" model to become mainstream.
This means building a set of prototypes – hardware, system software and applications – and evaluating each for performance, programmability and manufacturing cost. It also means accepting the possibility of failure. As I noted, research is what you do when you don't know what you are doing. Development is what you do when you do know what you are doing. We should not confound these two. I believe we are too risk averse in our research agenda.
In this spirit, I highlighted some of the recommendations of the PCAST report on computing research. I also noted that Microsoft's Chuck Thacker was building the BEE3 emulation board for architectural exploration, and that Microsoft and Intel were encouraging the academic community to explore multicore options aggressively and expansively.
At this point, it was time for a drink.