"Eighty to ninety percent of life is showing up." The line has been variously attributed to Yogi Berra, Woody Allen or even an anonymous wag. It's wise, though obvious advice – showing up and doing the expected generally allows one to avoid a host of problems. Appearing for jury duty avoids one being held in contempt of court, and you can't fly if you don't show up at the airport on time. I was reflecting on the implications of "showing up" while at a recent meeting in Italy.
Show Up and See What Happens
My friend Dave Turek, IBM's Vice President for Deep Computing, once explained IBM's open source and Linux strategy by saying that IBM had a deeply considered, two-phase strategy for Linux and clusters for HPC: "Show up and see what happens." As he once remarked at an NCSA Private Sector Partners (PSP) meeting, "We've showed up. Now, we are waiting to see what happens."
At NCSA, we partnered with IBM in 2001 to deploy two of the first large-scale commodity clusters for open scientific use: a pair of 1-teraflop systems based on Intel Pentium III and Itanium processors. At the time, this was a radical, almost heretical idea – deploying commodity PC clusters as production HPC platforms. Of course, such commodity clusters now dominate the Top500 list.
In a reprise of this experience, Microsoft and NCSA recently partnered to deploy Windows HPC Server 2008 on the latest incarnation of commodity cluster hardware. (The customer story has the technical details.) I don't generally evangelize for Microsoft products in this blog, but I was very impressed that the Windows HPC cluster achieved substantially higher performance on the same hardware than Linux did. Microsoft, in the form of Kyril Faenov's HPC team, has definitely "showed up" in this space in a big way, and I think there are great opportunities to offer not only Windows compute clusters but also backend acceleration for desktop applications. Of course, all of this is ultimately connected to the ferment in cloud computing.
Avoid the Obviously Wrong
At the recent Cetraro meeting on High-Performance Computing and Grids, Miron Livny extended the "show up and see what happens" maxim by offering a corollary: "Show up and avoid doing something stupid." His observation was that, evolutionarily, human success was defined by avoiding being trampled by a woolly mammoth, being eaten by a hungry Bengal tiger, or falling into a crevasse.
The computing implication of Livny's corollary is that one should do reasonable things when presented with opportunities. In terms of research infrastructure, this means avoiding our academic tendency to delight in second-system syndrome – building complex systems that embody all of our personal favorite features without determining whether they are either needed or useful.
At Cetraro, we debated the impact of the multicore revolution, the similarities and differences between Grids and clouds, and the commonalities between future exascale systems and the architecture of megascale data centers. (By the way, if you have not read the Department of Energy's exascale computing study, I highly recommend it.)
There are deep technical challenges in all of these areas. However, we must avoid being trampled by the woolly mammoths, for this domain is fraught with academic, government, and industrial politics. I believe we need a wider dynamic range (time horizon, risk/reward, and fiscal scale) of research and development projects if we are to solve these problems.
I have made this point many times, most recently as part of the PCAST report on the U.S. NITRD program. I am scheduled to testify about this again to the House Science and Technology Committee on July 31. I will report on the hearing in August.
Do Simple Things Quickly
At the same Cetraro meeting, I opined that there was a second corollary: "Do the obvious, simple things quickly." I think this is the key lesson to be drawn from Web 2.0 mashups and the rapid evolution of commercial clouds. The simplicity of the APIs and hosted infrastructure encourages external groups to innovate rapidly. We have seen clear evidence of this in the explosive growth of social networking sites and of the hosted services that have appeared.
By contrast, I think this is one of the places we have struggled with academic Grids. The software has often been too complex, and this complexity has been engendered by the distributed nature of the participating organizations, which requires "glue code" to integrate disparate policies and infrastructure across virtual organizations. Mashups and cloud services, by comparison, can be deployed quickly (by academic standards) using very simple APIs and service level agreements (SLAs). It will be interesting to see how the Grid/cloud mashup evolves.