Disclaimer

  • The postings on this site are my own and don’t necessarily represent Microsoft's positions, strategies or opinions.

    Technology

    November 23, 2008

    Reflections on SC08

    Ok, I admit it, what I said in my previous post was wrong. There was singing at SC08. The conference included both a music room where attendees could perform and a music booth where one could lip sync to classic hits. Beyond singing, the conference broke all previous attendance records, with roughly 11,000 attendees, though I doubt singing had anything to do with that!

    Clouds and Accelerators

    "Cloud" was undoubtedly the buzzword of the conference. Like the word "Grid" in the past, "cloud" is now a tabula rasa on which research groups and companies are projecting their own definitions and spins. Somewhere, there's a Dennis Milleresque cultural reference lurking that invokes either Joni Mitchell

    I've looked at clouds from both sides now,
    From up and down, and still somehow
    It's cloud illusions I recall.
    I really don't know clouds at all.

    or The Rolling Stones

    I said, Hey! You! Get off of my cloud
    Hey! You! Get off of my cloud
    Hey! You! Get off of my cloud
    Don't hang around 'cause two's a crowd
    On my cloud, baby

    In either case, I'm too tired to emit such a pithy aphorism.

    On the hardware front, accelerators, notably GPUs, and solid state storage (SSDs) dominated the exhibit floor. NVIDIA was highly visible, and vendors large and small were demonstrating software tools for accelerator programming and for SSDs.

    Microsoft News

    Microsoft broke into the top ten of the Top500 list of the world's fastest machines, based on execution of the high-performance Linpack (HPL) benchmark atop Windows HPC Server 2008. Like all Top500 runs, this required long hours by a dedicated team of people who pushed the hardware and themselves to the absolute limit. Everyone who has done this, and I remember it well from my NCSA days, knows that this is a caffeine- and adrenaline-fueled, sleep-deprived process, wherever you happen to be.

    I was also pleased that HPCWire awarded its Editor's Choice Award for best industry/government collaboration to the Microsoft/Intel Universal Parallel Computing Research Center (UPCRC) program, which involves the University of Illinois at Urbana-Champaign and UC-Berkeley. Andrew Chien (Intel) and I are responsible for coordinating this program across the two companies and two universities.

    Top500 Perils

    First, one of the growing challenges for HPL and the Top500 is the time required to complete the benchmark run. Given the scale of today's systems, regardless of hardware/software stack, the mean time between failures (MTBF) of these systems is roughly equal to the time to complete the HPL run. This alone makes benchmarking a rather stressful business.

    Beyond that, at the time of benchmark runs, the hardware is normally very new, and component infant mortality is still common. Finally, one generally has only a single window to secure the highest position on the list, because new and even larger systems appear regularly. If you miss your target of opportunity for the June or November ranking, your system will slip several positions on the list.
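    To put a rough number on the MTBF observation above, here is a minimal sketch. It assumes failures are exponentially distributed (memoryless), that any failure forces a full restart of the run, and that there is no checkpointing. The 24-hour figure is purely hypothetical and is not a measurement from any Top500 system.

```python
import math


def completion_probability(run_hours: float, mtbf_hours: float) -> float:
    """Chance that a run of `run_hours` sees no failure.

    Assumption: failures follow an exponential distribution with
    mean `mtbf_hours`, so P(no failure in t) = exp(-t / MTBF).
    """
    return math.exp(-run_hours / mtbf_hours)


def expected_attempts(run_hours: float, mtbf_hours: float) -> float:
    """Mean number of attempts until one run finishes cleanly.

    Assumption: attempts are independent, so the count is geometric
    with success probability p, and its mean is 1 / p.
    """
    return 1.0 / completion_probability(run_hours, mtbf_hours)


if __name__ == "__main__":
    mtbf = 24.0  # hypothetical system MTBF, in hours
    for ratio in (0.25, 0.5, 1.0, 2.0):
        run = ratio * mtbf
        p = completion_probability(run, mtbf)
        n = expected_attempts(run, mtbf)
        print(f"run = {ratio:4.2f} x MTBF: P(success) = {p:.3f}, "
              f"expected attempts = {n:.2f}")
```

    Under these assumptions, a run whose length equals the MTBF finishes cleanly only about 37% of the time (e^-1), so one should expect between two and three attempts on average. Real machines, with infant mortality on new hardware, may well behave worse than this simple model suggests.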

    Maybe I am unable to generate a pithy aphorism for clouds, but I will close with an allusion to Conrad's Heart of Darkness. Considering the challenges of multicore, exascale, multidisciplinary application software and reliability, one is inclined to remark, "The horror, the horror." We have serious work ahead.

    November 15, 2008

    SC: The Family Gathering

    It's "supercomputing week," which means that almost everyone who can spell HPC and who can walk, drive, swim or fly will be in Austin, Texas during the week of November 16 for SC08. To draw on my youth, there will be preaching (academic papers, vendor presentations and government meetings), singing (on second thought, maybe not – geeks are not best known for their performing arts ability) and an all-day dinner on the grounds (receptions, parties and dinners). In short, it's the place to see and be seen, or perhaps not to be seen if you are spending all of your time in closed-door meetings with vendors or government officials.

    I have been attending SC (the conference formerly known as Supercomputing XY) since 1990. Sadly, I missed the first one in Florida, where Seymour Cray gave the opening keynote, and the second one in Reno, Nevada. It is interesting to reflect on how much the conference has changed over twenty years.

    Remembering the Big Apple

    In 1990, the conference was held in a New York hotel. The technical paper presentations were all in a single ballroom, and the small (and I do mean small) vendor booths and demonstrations were in a second, nearby ballroom. I have two particular memories of that 1990 event, beyond a long meeting about trace formats for parallel system performance analysis.

    The first concerns the humble beginnings of academic research booth space. Unlike today's massive show floor, with academic and laboratory booths that rival those of major vendors, the research exhibit space consisted of two or three draped tables. I distinctly remember Jack Dongarra sitting at one of the tables with a SUN workstation, demonstrating linear algebra software.

    My second memory of 1990 was the apparent disappearance of the Intel vendor booth. As I recall, the truck containing the Intel booth arrived at the hotel loading dock, to be met by a group of workers who assured the driver that hotel rules required them to unload the truck. The truck contents – Intel's booth – disappeared and were (to my knowledge) never seen again. (I always wondered what the thieves did with an exhibit booth. I suspect there were two unhappy groups that day, Intel and the people who absconded with the booth.) Intel did manage to create a very nice booth using some backup materials, however. Welcome to the Big Apple!

    Experiencing New Mexico

    In 1991, I was a member of the SC program committee, which was chaired by the late Ken Kennedy. That year, the conference was held in Albuquerque, NM, in the convention center, leading to substantial expansion of the scale and scope of the conference.

    That year, I created a research booth (a massive 10'x10' space) that highlighted the results of our DARPA-sponsored Pablo project and the performance measurement and visualization tools we were developing. I remember that we printed some black-and-white posters to stick on a backdrop and distributed "booth duty" among the group of students, staff and me (the professor).

    Lawrence Livermore National Laboratory (LLNL) occupied the equally spacious 10'x10' space next to my booth. I remember watching with fascination when the LLNL team arrived on Sunday with several sections of 8' PVC pipe, elbow connectors, and a hacksaw. They then built a frame for their booth. This was literally cutting edge technology from our national laboratories!

    Looking Forward to Austin

    As always, I am looking forward to the meeting. It is a chance to see old friends, make some new ones, trade rumors and stories, survey the evolution of technology and discuss the future. It will also be a new experience for me, as a member of Microsoft. Kyril Faenov and his team have accomplished some impressive things with Windows HPC Server 2008 and I look forward to seeing the discussion of clouds, multicore and the future of HPC services.

    Coming full circle to Seymour Cray, this year, I was pleased to chair the IEEE Seymour Cray Award committee and select my old friend Steve Wallach as the honoree. The award will be presented at SC08. By the way, you might want to check out Steve's new venture – Convey (that's Convex plus one).

    In addition to my usual random walk across the convention and exhibit floors, attending technical paper sessions, private meetings and participating in Microsoft events, I will be speaking at several events:

    Finally, check out Todd Gamblin's Thursday afternoon paper presentation on scalable performance analysis for very large systems. It's pretty cool, though I am biased, as a thesis advisor!

    Preaching, singing (well, maybe not) and dinner on the grounds – sounds like fun. I suspect there will be a few margaritas and some barbecue consumed as well.

    October 28, 2008

    Beyond The Azure Blue

    From the first day I arrived at Microsoft, my academic colleagues have been asking me about Microsoft's strategy for cloud computing and when (or if) there would be public announcements. Those questions rose to a crescendo as academic groups prepared responses to the NSF eXtreme Digital (XD) TeraGrid solicitation. All I could say was that we were working on a plan, and it would become clear soon.

    I don't normally pitch Microsoft products in the blog, preferring to discuss science policy, technology research and development and global competitiveness. However, something big just happened at Microsoft, something I think will affect all of us. Moreover, as I write this, the Pacific Northwest sky is clear and azure blue, and that doesn't happen often this time of year. An omen, perhaps?

    Microsoft Azure Cloud Services

    At our Professional Developers Conference (PDC), Microsoft announced Azure, our cloud computing platform, with on-demand compute and storage to host, scale and manage Internet or cloud applications. The press release has additional business perspective and a link to the presentation. Azure is one element of the vision Ray Ozzie (See "Mind to Mind: Building Innovation") described in his 2005 Internet Services Disruption memorandum.

    The simplest description of Azure is that the initial release allows you to develop hosted Windows applications using .NET Services, though future releases will support unmanaged code and open source tools as well (Eclipse, Ruby, PHP, and Python). Within Azure, a fabric controller manages application instances and access to storage via SQL Data Services (SDS), and it hosts applications atop virtualized multicore hardware. Finally, Microsoft's Live Services offerings will be layered atop the Azure framework.

    You can read the white paper for details on the Azure design and usage approach. The software development kit (SDK) is also available for download. Beyond the Azure SDK itself, there are SDKs for Visual Studio, .NET and SDS Services, as well as Java and Ruby SDKs for .NET Services. This is a Community Technology Preview (CTP), meaning Microsoft welcomes feedback on these early capabilities and will continue to expand the capabilities of Azure over the coming months.

    Science and Technology Implications

    Earlier in the year, I wrote on both my blog and in HPCWire ("Dan's Cloudy Crystal Ball") about the possibility of outsourcing research computing services and infrastructure to the cloud. I noted then that the explosive growth of computing as an enabler of scientific discovery had strained university capabilities and Federal research budgets. Given our current economic crisis, university operating budgets and Federal research expenditures will be under even greater strain and there will be increased scrutiny of the need for each investment.

    In a world of (at best) modest research budget increases, we must ask hard questions about the best use of limited funds. Cloud computing offers a potential mechanism to increase the efficiency of current research, ensure continuity of critical data and enable new kinds of research not now feasible.

    In this model, researchers focus on the higher levels of the software stack -- applications and innovation, not low-level infrastructure. University and Federal research agency administrators, in turn, procure services from the providers based on capabilities and pricing. Finally, the cloud service providers deliver economies of scale and capabilities driven by a large market base and energy efficient infrastructure. Remember, computing infrastructure exists to enable discovery, not as monuments to technological prowess.

    In addition to efficiency, the scalability of cloud services and infrastructure opens new research possibilities. Not only is it possible to federate multidisciplinary research data at far larger scales than possible in a university environment (think tens to hundreds of petabytes of low latency storage), we can also escape the pernicious cycle of transitory research infrastructure.

    How often have we created data repositories as part of research projects, only to find few mechanisms to ensure their long-term sustainability and access by the broader research community? How often have we faced a miasma of distributed data sources with unknown provenance and incompatible metadata, each supported pro bono on a best effort basis? (See my recent comments on digital document preservation.) Instead, imagine multidisciplinary data fusion and mining, where students can pose queries against integrated but diverse data sources using robust tools.

    Finally, by leveraging "pay as you go" models, we can trade time and scale on a continuous basis. Imagine applying 50,000 processors for one hour at the same cost as 50 processors for one thousand hours. In the cloud, the integral under the curve is the same and the costs are comparable, but the research effects are qualitatively different.
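    The arithmetic behind the "integral under the curve" can be sketched in a few lines. This assumes a flat price per processor-hour with no volume discounts or minimum charges. The rate of 0.10 is a hypothetical placeholder, not an actual Azure (or any provider's) price.

```python
def processor_hours(processors: int, hours: float) -> float:
    """Total resource consumed: the area under the usage curve."""
    return processors * hours


def cost(processors: int, hours: float, rate_per_processor_hour: float) -> float:
    """Cost under a flat pay-as-you-go rate (an assumed pricing model)."""
    return processor_hours(processors, hours) * rate_per_processor_hour


if __name__ == "__main__":
    rate = 0.10  # hypothetical price per processor-hour

    # The two scenarios from the text: same area, very different wall-clock time.
    scenarios = {
        "burst": (50_000, 1.0),    # 50,000 processors for one hour
        "steady": (50, 1_000.0),   # 50 processors for one thousand hours
    }
    for name, (procs, hours) in scenarios.items():
        print(f"{name:>6}: {processor_hours(procs, hours):,.0f} processor-hours, "
              f"cost {cost(procs, hours, rate):,.2f}, "
              f"time to result {hours:,.0f} h")
```

    Both scenarios consume 50,000 processor-hours and so cost the same under flat pricing, but the burst delivers its result in an hour rather than roughly six weeks. The sketch deliberately ignores whether the application actually scales to 50,000 processors, which is a separate question.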

    The Standard Questions

    Every new approach to computing prompts the same questions, and cloud services and data storage are no exception.

    • Is it reliable and will my data persist?
    • Is it safe, private and secure?
    • Will I be captured and become captive?
    • What does it cost and what if I can't continue paying?

    We tend to forget that there are complementary issues about local infrastructure because we have already internalized and accepted the implications and risks. Moreover, local failures are rarely publicized.

    • What happens if my disks crash?
    • What if I can't pay for backups or maintenance or physical plant or …?
    • What if my network is penetrated?

    These are the standard cost/benefit/risk tradeoffs. One must make them based on statistics, economics and practical constraints. Remember that we debated the same issues when we shifted research computing from vendor-backed HPC designs to predominantly commodity components.

    Let's Reason Together

    I welcome discussion of how we can exploit cloud services and infrastructure effectively – all cloud infrastructure, not just Microsoft's Azure. To do this, the cloud service providers, hardware vendors, universities and Federal government must work together to outline an agenda, conduct experiments at scale and speak with a united voice on the opportunities.

    It's a sunny day, but my head is in the clouds.

    September 08, 2008

    ManyCore: Able Was I Ere I Saw Elba



    As I write this, I am attending the ETH LASER summer school on concurrency, which is being held on the island of Elba. The island sits off the coast of Tuscany, a few miles from Pisa. It is perhaps best known as the place where Napoleon was exiled after his forced abdication and where he spent the interregnum before his final defeat at Waterloo. (Let me express my thanks to Bertrand Meyer for the invitation to speak at the summer school.)

    As I prepare to deliver six lectures on multicore and cloud computing here on Elba, the geographic irony of grand ambition, hubris and ignominious defeat is not lost on me. We have been struggling for the past forty years to find elegant and efficient parallel and distributed programming paradigms, with modest success. To continue my 19th century metaphor, we remain, as Matthew Arnold sadly put it, "Swept with confused alarms of struggle and flight, where ignorant armies clash by night."

    The Virtuous Cycle

    Metaphors aside, our struggle is real and extraordinarily important. The virtuous cycle that has long driven the computing industry is in flux, and if it is broken, we will struggle to restart it – for deep economic reasons. The desire for new functionality leads to richer, more complex software, which imposes greater demands on extant hardware, with concomitant performance constraints. In turn, this stimulates demand for faster processors, and the cycle of innovation turns.

    One interesting corollary of this cycle is that we demand new, faster processors at the same price, rather than the same performance at a lower price. This consumer demand generates the revenue needed to fuel commercial software development, power new chip designs and fund semiconductor fabrication line construction. These are multibillion dollar (U.S.) investments, ones only repaid if tens to hundreds of millions of units are sold. In turn, this creates deep partnerships among companies such as Microsoft, Intel, AMD and the PC vendors. A similar virtuous cycle exists in the mobile telephone market.

    ManyCore Directions

    This ecosystem of software and hardware innovation is challenged by consumer parallelism, in the form of large-scale multicore (manycore) chips. No longer can we expect dramatic increases in single core performance, due to power and heat dissipation constraints on consumer devices. Perhaps more tellingly, all of us wonder what the next "killer app" will be that excites and incents consumers to buy new, manycore systems. Personally, I believe it will be some combination of graphics-intensive massively multiplayer games (MPGs) and contextually-adaptive, situationally-aware information spheres. (One can think of the latter as Vannevar Bush's Memex reborn.)

    Of course, as James Thornton (CDC) said many years ago, "Anyone who says he knows how computers should be built should have his head examined! The man who says it is either inexperienced or really mad." I'm too old to be inexperienced, so perhaps I am really mad after all!

    The fundamental question is how large multicore (manycore) chips and development software will evolve. I see at least four architectural directions, at least two of which are already commercially prevalent. The first is the "cookie cutter" homogeneous multicore design, exemplified by today's Intel and AMD flagship x86 offerings, along with similar homogeneous multicore designs from SUN (Niagara) and IBM (Power5-7). Tilera's TILE64 and Intel's Larrabee are other examples of this approach, combining standard cores (MIPS-derived and x86, respectively) with a regular interconnect (mesh for Tilera and ring for Larrabee).

    The second is ISA-compatible homogeneous, but performance heterogeneous multicore. In this case, one combines, for example, a smaller number of complex, out-of-order cores with a larger number of simpler, in-order cores. The motivations for this approach are simple – the implications of Amdahl's Law and the need to execute legacy code efficiently while still delivering some of the performance and power advantages of low-power multicore. In this same spirit, Mark Hill has recently written a great paper about the importance of performance heterogeneity in multicore design.
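    The Amdahl's Law argument for performance heterogeneity can be illustrated with a small sketch. This is a deliberately simplified model of my own for illustration, not the model from Mark Hill's paper: it assumes one "big" core runs the serial portion, all cores cooperate on the parallel portion, and the big core delivers twice the performance of a small core while occupying the area of four small cores. Those numbers are assumptions chosen only to make the comparison concrete.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Classic Amdahl's Law: speedup on `cores` identical cores,
    relative to a single one of those cores."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


def heterogeneous_speedup(parallel_fraction: float,
                          big_core_perf: float,
                          small_cores: int) -> float:
    """Speedup relative to one small core for a chip with one big core
    plus `small_cores` small cores.

    Assumptions (illustrative only): the serial part runs on the big
    core at `big_core_perf` times small-core speed; the parallel part
    uses every core, for a throughput of small_cores + big_core_perf.
    """
    serial = 1.0 - parallel_fraction
    parallel_throughput = small_cores + big_core_perf
    return 1.0 / (serial / big_core_perf
                  + parallel_fraction / parallel_throughput)


if __name__ == "__main__":
    f = 0.9  # hypothetical fraction of the work that parallelizes

    # Same notional area budget of 16 small-core equivalents:
    homogeneous = amdahl_speedup(f, cores=16)
    # One big core (area of 4 small cores, 2x performance) plus 12 small cores.
    heterogeneous = heterogeneous_speedup(f, big_core_perf=2.0, small_cores=12)

    print(f"homogeneous, 16 small cores : {homogeneous:.2f}x")
    print(f"heterogeneous, 1 big + 12   : {heterogeneous:.2f}x")
```

    With 90% of the work parallel, the homogeneous chip tops out at 6.4x because the serial 10% dominates, while the heterogeneous chip reaches 8.75x under these assumptions, despite having fewer cores. The faster core pays for itself by shortening the serial bottleneck.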

    The third is functional heterogeneity, currently exemplified by chips such as the IBM Cell, AMD's announced Fusion chip and a host of embedded and domain-specific chips. I believe there are many opportunities for architectural innovation in this space, combining graphics, DSP, packet processing, cryptographic functions, SDR and a host of other functions with novel interconnects and memory sharing approaches.

    The fourth is what I call non-traditional architectures that embody more radical alternatives. One great example of this class is Doug Burger's TRIPS system, based on explicit data graph execution (EDGE). Doug recently joined Microsoft Research from the University of Texas at Austin, and I am excited about the collaboration possibilities.

    Back to the War

    I believe we are at an inflection point in parallel computing, with the economic impetus of consumer parallelism now driving us. Let us hope that we fare better than the warriors in another 19th century conflict, the Battle of Balaclava during the Crimean War. As Tennyson wrote so well, the Light Brigade charged into the mouth of hell. With cannons to the right (multicore architecture), cannons to the left (programming models) and cannons in front (next-generation applications), we ride into an unknown future, one fraught with peril but also with opportunity.

    May 26, 2008

    Rikei Banare and Global Competition

    On Saturday, May 17, the New York Times ran a front page story (below the fold) on the dearth of Japanese students entering science and engineering fields. Japanese universities call it rikei banare or "flight from science." The article notes:

    The decline is growing so drastic that industry has begun advertising campaigns intended to make engineering look sexy and cool, and companies are slowly starting to import foreign workers, or sending jobs to where the engineers are, in Vietnam and India.

    The article continues by relating comments from Japanese students that they prefer high-paying jobs in disciplines that do not require the long hours and hard work associated with science and technology careers.

    Does this sound familiar? It should, as we in the U.S. are also struggling to attract enough students into computing disciplines with marketing campaigns, curricula changes and outreach programs. These outreach programs are critically important, because we and other science and engineering disciplines have for too long failed to include a sufficiently broad and diverse community in computing. We can and must do better, for both ethical and practical reasons.

    International Competition

    At roughly the same time as the New York Times article appeared, Georgia Tech's Technology Policy and Assessment Center (TPAC) released its bi-annual "High-Tech Indicators" report. Via TPAC's metrics, China has now surpassed the United States in a key measure of international competitiveness. On a 100 point scale, China's technological standing is 82.8, versus the United States at 76.1. While China's ranking increased from 22.5 in 1996 to 82.8 in 2007, the United States ranking peaked at 95.4 in 1999. Equally tellingly, if the European Union were considered as a single entity, it too would have surpassed the United States.

    This is not news to those of us in the computing and technology world. Global competition is fierce and international companies seek competitive advantage wherever they can find it. As Manufacturing and Technology News put it, there has been "no Sputnik moment" to awaken the broader population to the competitive challenge and the need for an internationally competitive knowledge workforce.

    Looking Ahead

    Without doubt, there has always been ennui about the next generation and their interests. Many of us have heard the old saw about walking five miles in the snow -- barefoot -- to school and that it was uphill both ways. In my case, my late father regularly asked me if I were ever going to get a "real" job. (Perhaps my now working at Microsoft qualifies as a real job!)

    Generational jocularity aside, in a technological society where continued economic vitality depends on knowledge creation, a qualified pool of knowledge workers is the only truly renewable resource. Smart, educated people will always be in short supply. Each country's long-term competitiveness depends on having enough such people to engage their international peers. The "Gathering Storm" report made this point clearly and pointedly.

    Closer to home in computing, Andy Grove got it exactly right when he famously said that only the paranoid survive. However, most people do not realize what he really said. The full quotation is more thoughtful and thought provoking:

    Success breeds complacency.

    Complacency breeds failure.

    Only the paranoid survive.