Disclaimer

  • The postings on this site are my own and don’t necessarily represent Microsoft's positions, strategies or opinions.

    High-Performance Computing

    November 23, 2008

    Reflections on SC08

    OK, I admit it: what I said in my previous post was wrong. There was singing at SC08. The conference included both a music room where attendees could perform and a music booth where one could lip sync to classic hits. Beyond singing, the conference broke all previous attendance records, with roughly 11,000 attendees, though I doubt singing had anything to do with that!

    Clouds and Accelerators

    "Cloud" was undoubtedly the buzzword of the conference. Like the word "Grid" in the past, "cloud" is now a tabula rasa onto which research groups and companies are projecting their own definitions and spins. Somewhere, there's a Dennis Milleresque cultural reference lurking that invokes either Joni Mitchell

    I've looked at clouds from both sides now,
    From up and down, and still somehow
    It's cloud illusions I recall.
    I really don't know clouds at all.

    or The Rolling Stones

    I said, Hey! You! Get off of my cloud
    Hey! You! Get off of my cloud
    Hey! You! Get off of my cloud
    Don't hang around 'cause two's a crowd
    On my cloud, baby

    In either case, I'm too tired to emit such a pithy aphorism.

    On the hardware front, accelerators, notably GPUs, and solid state storage (SSDs) dominated the exhibit floor. NVIDIA was highly visible, and vendors large and small were demonstrating software tools for accelerator programming and for SSDs.

    Microsoft News

    Microsoft broke into the top ten of the Top500 list of the world's fastest machines, based on execution of the high-performance Linpack (HPL) benchmark atop Windows HPC Server 2008. Like all Top500 runs, this required long hours by a dedicated team of people who pushed the hardware and themselves to the absolute limit. Everyone who has done this, and I remember it well from my NCSA days, knows that it is a caffeine- and adrenaline-fueled, sleep-deprived process, wherever you happen to be.

    I was also pleased that HPCWire awarded its Editor's Choice Award for best industry/government collaboration to the Microsoft/Intel Universal Parallel Computing Research Center (UPCRC) program, which involves the University of Illinois at Urbana-Champaign and UC-Berkeley. Andrew Chien (Intel) and I are responsible for coordinating this program across the two companies and two universities.

    Top500 Perils

    First, one of the growing challenges for HPL and the Top500 is the time required to complete the benchmark run. Given the scale of today's systems, regardless of hardware/software stack, the mean time between failures (MTBF) of these systems is roughly equal to the time to complete the HPL run. This alone makes benchmarking a rather stressful business.
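
    A back-of-the-envelope sketch shows why this is stressful. It assumes failures arrive as a memoryless (exponential) process and that a failed run restarts from scratch with no checkpoint; both are illustrative assumptions, not properties of HPL or of any particular machine, and the 24-hour figures are hypothetical.

```python
import math


def completion_probability(run_hours: float, mtbf_hours: float) -> float:
    """Probability that a run of length run_hours sees no failure.

    Assumes exponentially distributed time between failures (an
    illustrative model, not a measured property of any real system).
    """
    return math.exp(-run_hours / mtbf_hours)


def expected_attempts(run_hours: float, mtbf_hours: float) -> float:
    """Mean number of attempts until one run finishes without a failure.

    Assumes each failed attempt restarts from scratch with no checkpoint,
    so attempts are independent and the count is geometric: 1 / p.
    """
    return 1.0 / completion_probability(run_hours, mtbf_hours)


if __name__ == "__main__":
    # Hypothetical numbers: a 24-hour HPL run on a system with a 24-hour MTBF.
    p = completion_probability(24.0, 24.0)
    print(f"chance of a clean run: {p:.2f}")
    print(f"expected attempts:     {expected_attempts(24.0, 24.0):.2f}")
```

    Under these assumptions, when the run time equals the MTBF, a given attempt finishes cleanly only about 37% of the time (e^-1), and one should expect nearly three attempts on average.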

    Beyond that, at the time of benchmark runs, the hardware is normally very new, and component infant mortality is still common. Finally, one generally has only a single window to secure the highest position on the list, because new and even larger systems appear regularly. If you miss your target of opportunity for the June or November ranking, your system will slip several positions on the list.

    Maybe I am unable to generate a pithy aphorism for clouds, but I will close with an allusion to Conrad's Heart of Darkness. Considering the challenges of multicore, exascale, multidisciplinary application software and reliability, one is inclined to remark, "The horror, the horror." We have serious work ahead.

    November 15, 2008

    SC: The Family Gathering

    It's "supercomputing week," which means that almost everyone who can spell HPC and who can walk, drive, swim or fly will be in Austin, Texas during the week of November 16 for SC08. Drawing on my youth, there will be preaching (academic papers, vendor presentations and government meetings), singing (on second thought, maybe not – geeks are not best known for their performing arts ability) and an all-day dinner on the grounds (receptions, parties and dinners). In short, it's the place to see and be seen, or perhaps not to be seen if you are spending all of your time in closed-door meetings with vendors or government officials.

    I have been attending SC (the conference formerly known as Supercomputing XY) since 1990. Sadly, I missed the first one in Florida, where Seymour Cray gave the opening keynote, and the second one in Reno, Nevada. It is interesting to reflect on how much the conference has changed over twenty years.

    Remembering the Big Apple

    In 1990, the conference was held in a New York hotel. The technical papers presentations were all in a single ballroom, and the small (and I do mean small) vendor booths and demonstrations were in a second, nearby ballroom. I have two particular memories of that 1990 event, beyond a long meeting about trace formats for parallel system performance analysis.

    The first concerns the humble beginnings of academic research booth space. Unlike today's massive show floor, with academic and laboratory booths that rival those of major vendors, the research exhibit space consisted of two or three draped tables. I distinctly remember Jack Dongarra sitting at one of the tables with a SUN workstation, demonstrating linear algebra software.

    My second memory of 1990 was the apparent disappearance of the Intel vendor booth. As I recall, the truck containing the Intel booth arrived at the hotel loading dock, to be met by a group of workers who assured the driver that hotel rules required them to unload the truck. The truck contents – Intel's booth – disappeared and were (to my knowledge) never seen again. (I always wondered what the thieves did with an exhibit booth. I suspect there were two unhappy groups that day, Intel and the people who absconded with the booth.) Intel did manage to create a very nice booth using some backup materials, however. Welcome to the Big Apple!

    Experiencing New Mexico

    In 1991, I was a member of the SC program committee, which was chaired by the late Ken Kennedy. That year, the conference was held in Albuquerque, NM, in the convention center, leading to substantial expansion of the scale and scope of the conference.

    That year, I created a research booth (a massive 10'x10' space) that highlighted the results of our DARPA-sponsored Pablo project and the performance measurement and visualization tools we were developing. I remember that we printed some black-and-white posters to stick on a backdrop and distributed "booth duty" among the group of students, staff and me (the professor).

    Lawrence Livermore National Laboratory (LLNL) occupied the equally spacious 10'x10' space next to my booth. I remember watching with fascination when the LLNL team arrived on Sunday with several sections of 8' PVC pipe, elbow connectors, and a hacksaw. They then built a frame for their booth. This was literally cutting edge technology from our national laboratories!

    Looking Forward to Austin

    As always, I am looking forward to the meeting. It is a chance to see old friends, make some new ones, trade rumors and stories, survey the evolution of technology and discuss the future. It will also be a new experience for me, as a member of Microsoft. Kyril Faenov and his team have accomplished some impressive things with Windows HPC Server 2008 and I look forward to seeing the discussion of clouds, multicore and the future of HPC services.

    Coming full circle to Seymour Cray, this year, I was pleased to chair the IEEE Seymour Cray Award committee and select my old friend Steve Wallach as the honoree. The award will be presented at SC08. By the way, you might want to check out Steve's new venture – Convey (that's Convex plus one).

    In addition to my usual random walk across the convention and exhibit floors, attending technical paper sessions, private meetings and participating in Microsoft events, I will be speaking at several events:

    Finally, check out Todd Gamblin's Thursday afternoon paper presentation on scalable performance analysis for very large systems. It's pretty cool, though I am biased, as a thesis advisor!

    Preaching, singing (well, maybe not) and dinner on the grounds – sounds like fun. I suspect there will be a few margaritas and some barbecue consumed as well.

    July 13, 2008

    Showing Up and Two Corollaries

    "Eighty to ninety percent of life is showing up." The line has been variously attributed to Yogi Berra, Woody Allen or even an anonymous wag. It's wise, though obvious, advice – showing up and doing the expected generally allows one to avoid a host of problems. Appearing for jury duty keeps one from being held in contempt of court, and you can't fly if you don't show up at the airport on time. I was reflecting on the implications of "showing up" while at a recent meeting in Italy.

    Show Up and See What Happens

    My friend, Dave Turek, IBM's Vice President for Deep Computing, once explained IBM's open source and Linux strategy by saying that IBM had a deeply considered, two-phase strategy for Linux and clusters for HPC, "Show up and see what happens." As he once remarked at an NCSA Private Sector Partners (PSP) meeting, "We've showed up. Now, we are waiting to see what happens."

    At NCSA, we partnered with IBM in 2001 to deploy two of the first large-scale commodity clusters for open scientific use: two 1-teraflop systems based on Intel Pentium III and Itanium processors. At the time, this was a radical, almost heretical idea – deploying commodity PC clusters as production HPC platforms. Of course, such commodity clusters now dominate the Top500 list.

    In a reprise of this experience, Microsoft and NCSA recently partnered to deploy Windows HPC Server 2008 on the latest incarnation of commodity cluster hardware. (The customer story has the technical details.) I don't generally evangelize for Microsoft products in this blog, but I was very impressed that Windows HPC Server achieved substantially higher performance on the same hardware than did Linux. Microsoft, in the form of Kyril Faenov's HPC team, has definitely "showed up" in this space in a big way, and I think there are great opportunities to offer not only Windows compute clusters but also backend acceleration for desktop applications. Of course, all of this is ultimately connected to the ferment in cloud computing.

    Avoid the Obviously Wrong

    At the recent Cetraro meeting on High-Performance Computing and Grids, Miron Livny extended the "show up and see what happens" maxim by offering a corollary, "Show up and avoid doing something stupid." His observation was that evolutionarily, human success was defined by avoiding being trampled by a woolly mammoth, eaten by a hungry Bengal tiger or falling into a crevasse.

    The computing implication of Livny's corollary is that one should do reasonable things when presented with opportunities. In terms of research infrastructure, this means avoiding our academic tendency to delight in second system syndrome – building complex systems that embody all of our personally favorite features without determining if they are either needed or useful.

    At Cetraro, we debated the impact of the multicore revolution, the similarities and differences between Grids and clouds, and the commonalities between future exascale systems and the architecture of megascale data centers. (By the way, if you have not read the Department of Energy's exascale computing study, I highly recommend it.)

    There are deep technical challenges in all of these areas. However, we must avoid being trampled by the woolly mammoths; this domain is fraught with academic, government and industrial politics. I believe we need a wider dynamic range (time horizon, risk/reward and fiscal scale) of research and development projects if we are to solve these problems.

    I have made this point many times, most recently as part of the PCAST report on the U.S. NITRD program. I am scheduled to testify about this again to the House Science and Technology Committee on July 31. I will report on the hearing in August.

    Do Simple Things Quickly

    At the same Cetraro meeting, I opined that there was a second corollary, "Do the obvious, simple things quickly." I think this is the key lesson to be drawn from Web 2.0 mashups and the rapid evolution of commercial clouds. The simplicity of the APIs and hosted infrastructure encourages external groups to innovate rapidly. We have seen clear evidence of this in the explosive growth in social networking sites and in the hosted services that have appeared.

    By contrast, I think this is one of the places we have struggled with academic Grids. The software has often been too complex, and this complexity has been engendered by the distributed nature of the participating organizations, requiring "glue code" to integrate disparate policies and infrastructure across virtual organizations. In contrast, mashups and cloud services can be deployed quickly (by academic standards) using very simple APIs and service level agreements (SLAs). It will be interesting to see how the Grid/Cloud mashup evolves.

    May 10, 2008

    HPC and Climate Change: Senate Hearing

    On Thursday, May 8, I testified to the U.S. Senate Committee on Commerce, Science, and Transportation. The full committee hearing was on improving the "Capacity of U.S. Climate Modeling for Decision Makers and End-Users." The other members of the hearing panel were

    Jim Hack and I represented the computing and computational science issues, and the other four focused on the climate aspects. Within a few weeks, our written testimony will be posted on the Committee's hearing page, and in due time (many months), our oral testimony will appear in the Congressional Record.

    April 26, 2008

    HPC Procurements and NRE

    In the current issue of HPCWire, John West has some thoughtful comments on the HPC procurement process. In an article entitled, "HPC Innovation in the Era of 'Good Enough'," he analyzes the competitive pressures on vendors and procurements and the small margins available to fund non-recurring engineering costs (i.e., vendor research and development).

    In the article, John references my Congressional testimony on the universality of computing as an intellectual amplifier. I recently reprised some of those comments in an essay for SIAM News, discussed in this blog entry. John also quotes me as saying one cannot build a national strategy on a series of point procurements. Let me expand on that observation.

    I believe we need to recognize that innovation has real costs, which must be jointly borne by academia, industry and government. We also must view hardware procurements in a larger context, where their acquisition is driven by both a coherent R&D strategy and by an innovation-driven deployment rationale.

    The "mine's bigger than yours" bragging associated with machoFLOPS is counter-productive. Let's measure new systems by the innovation they enable, in new technology, scientific discovery and competitiveness. More to the point, let's integrate over periods longer than the latest benchmarks and deployment milestones.