As I write this, I am attending the ETH
LASER summer school on concurrency, which is being held on the island of Elba. The island sits off the coast of Tuscany, a few miles from Pisa. It is perhaps best known as the place where Napoleon was exiled after his forced abdication and where he spent the interregnum before his final defeat at Waterloo. (Let me express my thanks to Bertrand Meyer for the invitation to speak at the summer school.)
As I prepare to deliver six lectures on multicore and cloud computing here on Elba, the geographic irony of grand ambition, hubris and ignominious defeat is not lost on me. We have been struggling for the past forty years to find elegant and efficient parallel and distributed programming paradigms, with modest success. To continue my 19th century metaphor, we remain, as Matthew Arnold sadly put it, "Swept with confused alarms of struggle and flight, where ignorant armies clash by night."
The Virtuous Cycle
Metaphors aside, our struggle is real and extraordinarily important. The virtuous cycle that has long driven the computing industry is in flux, and if it is broken, we will struggle to restart it, for deep economic reasons. The desire for new functionality leads to richer, more complex software, which imposes greater demands on extant hardware, with concomitant performance constraints. In turn, this stimulates demand for faster processors, and the cycle of innovation turns.
One interesting corollary of this cycle is that we demand new, faster processors at the same price, rather than the same performance at a lower price. This consumer demand generates the revenue needed to fuel commercial software development, power new chip designs and fund semiconductor fabrication line construction. These are multibillion dollar (U.S.) investments, ones repaid only if tens to hundreds of millions of units are sold. In turn, this creates deep partnerships among companies such as Microsoft, Intel, AMD and the PC vendors. A similar virtuous cycle exists in the mobile telephone market.
This ecosystem of software and hardware innovation is challenged by consumer parallelism, in the form of large-scale multicore (manycore) chips. No longer can we expect dramatic increases in single-core performance, due to power and heat dissipation constraints on consumer devices. Perhaps more tellingly, all of us wonder what the next "killer app" will be that excites consumers and incents them to buy new, manycore systems. Personally, I believe it will be some combination of graphics-intensive massively multiplayer online games (MMOGs) and contextually adaptive, situationally aware information spheres. (One can think of the latter as a modern realization of Vannevar Bush's memex.)
Of course, as James Thornton (CDC) said many years ago, "Anyone who says he knows how computers should be built should have his head examined! The man who says it is either inexperienced or really mad." I'm too old to be inexperienced, so perhaps I am really mad after all!
The fundamental question is how large multicore (manycore) chips and development software will evolve. I see at least four architectural directions, at least two of which are already commercially prevalent. The first is the "cookie cutter" homogeneous multicore design, exemplified by today's Intel and AMD flagship x86 offerings, along with similar homogeneous multicore designs from Sun (Niagara) and IBM (POWER5-7). Tilera's TILE64 and Intel's Larrabee are other examples of this approach, combining standard cores (MIPS-derived and x86, respectively) with a regular interconnect (mesh for Tilera and ring for Larrabee).
The second is ISA-homogeneous but performance-heterogeneous multicore. In this case, one combines, for example, a small number of complex, out-of-order cores with a larger number of simpler, in-order cores. The motivations for this approach are simple: the implications of Amdahl's Law and the need to execute legacy code efficiently while still delivering some of the performance and power advantages of many low-power cores. In this same spirit, Mark Hill has recently written a great paper about the importance of performance heterogeneity in multicore design.
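Hill's point can be made concrete with a little Amdahl's Law arithmetic. The sketch below follows the symmetric-versus-asymmetric multicore comparison popularized by Hill and Marty, under the common (and debatable) assumption that a core built from r resource units delivers sqrt(r) the performance of a one-unit core. The function names, the parallel fraction f, and the budget of 64 resource units are mine, chosen purely for illustration.

```python
import math

def perf(r):
    # Assumption: single-core performance scales as the square root
    # of the resources devoted to the core (a Pollack's-rule style model).
    return math.sqrt(r)

def symmetric_speedup(f, n, r):
    """Speedup for a chip of n resource units built as n/r identical
    cores of r units each; f is the parallelizable fraction of the work."""
    # Serial fraction runs on one r-unit core; parallel fraction
    # runs on all n/r cores of performance perf(r) each.
    return 1.0 / ((1 - f) / perf(r) + f * r / (perf(r) * n))

def asymmetric_speedup(f, n, r):
    """Speedup for one big r-unit core plus n - r simple one-unit cores."""
    # Serial fraction runs on the big core; parallel fraction uses
    # the big core plus all the small cores together.
    return 1.0 / ((1 - f) / perf(r) + f / (perf(r) + n - r))

# A 95%-parallel workload on a 64-unit chip: the asymmetric design
# wins because the big core attacks the serial bottleneck without
# sacrificing most of the parallel throughput.
f, n = 0.95, 64
for r in (1, 4, 16):
    print(r, symmetric_speedup(f, n, r), asymmetric_speedup(f, n, r))
```

With r = 1 the two designs coincide with classic Amdahl's Law; as r grows, the asymmetric curve pulls ahead, which is exactly the case for performance heterogeneity.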
The third is functional heterogeneity, currently exemplified by chips such as the IBM Cell, AMD's announced Fusion chip and a host of embedded and domain-specific chips. I believe there are many opportunities for architectural innovation in this space, combining graphics, DSP, packet processing, cryptographic functions, SDR and a host of other functions with novel interconnects and memory sharing approaches.
The fourth is what I call non-traditional architectures that embody more radical alternatives. One great example of this class is Doug Burger's TRIPS system, based on explicit data graph execution (EDGE). Doug recently joined Microsoft Research from the University of Texas at Austin, and I am excited about the collaboration possibilities.
Back to the War
I believe we are at an inflection point in parallel computing, with the economic impetus of consumer parallelism now driving us. Let us hope that we fare better than the warriors in another 19th century war, the Battle of Balaclava during the Crimean War. As Tennyson wrote so well, the Light Brigade charged into the mouth of hell. With cannons to the right (multicore architecture), cannons to the left (programming models) and cannons in front (next-generation applications), we ride into an unknown future, one fraught with peril but also with opportunity.