Since I joined Microsoft in late 2007, I have written about science policy, Federal government interactions, and national competitiveness studies, in my role as a member of PCAST and chair of the Computing Research Association (CRA). Throughout, I have emphasized the need for strategic investment in long-term, basic research, especially as part of the economic stimulus package.
I have also discussed the rise of multicore computing, the consequent software crisis and the need for innovation in both architecture and software, including Microsoft's support for the Microsoft/Intel-funded Universal Parallel Computing Research Centers (UPCRCs) at Illinois and UC-Berkeley. I have also mused on the future of high-performance computing and its role as an enabler of scientific discovery. I have even written about my family, my rural childhood and my life experiences.
What I have not done is write about why I came to Microsoft and what I am doing – until now. Yes, my team manages the UPCRCs in partnership with Intel. Yes, I devote time and energy to research policy, both for the community and on behalf of Microsoft. Yes, I am involved in the future of high-performance computing, both politically and technically. However, that's not the entire story.
It's time to talk infrastructure so large it makes petascale systems seem small. It's time to talk about why I can't remember the last time I had this much fun. It's time to pull back the curtain and talk about the future of clouds. No, I'm not talking about weather forecasting, though I really enjoyed my past collaboration with the LEAD partnership.
I came to Microsoft to lead a new research initiative in cloud computing, one that complements our production data center infrastructure and our nascent Azure cloud software platform. You can read the press release and the web site for the official story. What follows is my personal perspective.
The Infrastructure of Our Lives
We all know the cloud premise – Internet delivery of software and services to distributed clients, from mobile devices to desktops. We tend not to think about how dependent we now are on those delivered services, though we are, just as we depend on the telephone and our water and electrical utilities.
Imagine a day without the web, without search engines, without social networks, without online games, without electronic commerce, without streaming audio and video. Our world has changed, and government, business, education, recreation and social interaction are now critically dependent on reliable Internet services and the hardware and software infrastructure behind them. However, more research and technology evaluation are needed to make them as trustworthy as the telephone network.
Building Internet services infrastructure using standard, off-the-shelf technology made sense during the 1990s Internet boom. (And yes, I remember how cool Mosaic was, when I first saw it at Illinois.) The facilities were small by today's standards, and the infrastructure could be deployed quickly. Today, however, the scale is vastly larger, our social and economic dependence is much greater and the consequences of failure are profound. Web service outages are now international news, and a cyberattack is considered an act of war.
For background on some of the challenges and problems in scaling, you might want to follow the Data Center Knowledge and High Scalability web sites. If you are new to this space, they and other reading will redefine your notions of large and reliable. You might not think 100 megawatts could be a data center design constraint, but it is. More importantly, you should fear – yea, verily, be absolutely terrified by – the wrath of 100 million unhappy customers should your Internet service fail. Every nightmare that has ever awakened a CIO in a cold sweat at 2 a.m. is real, but magnified a thousandfold. If it were easy, though, it would neither be exciting nor fun.
Cloud Infrastructure Challenges
Microsoft's business, like that of other cloud service providers – Amazon, Google, Yahoo and others – depends on an ever-expanding network of massive data centers: hundreds of thousands of servers, many, many petabytes of data, hundreds of megawatts of power, and billions of dollars in capital and operational expenses. This enormous scale – far larger than even the largest high-performance computing facilities – brings new design, deployment and management challenges, including energy efficiency, rapid deployment, resilience, geo-distribution, composability, and graceful recovery.
I have been a "big iron" guy for a long time, and Internet and cloud services infrastructures do have analogs with petascale and exascale computing, but the workloads and optimization axes are different. Like today's HPC systems, cloud computing facilities are being built with hardware and software technologies not originally designed for deployment at such massive scale. Consequently, they are less efficient and less flexible than they either can or should be. If we built utility power plants the same way we build cloud infrastructure, we would start by visiting The Home Depot and buying millions of gasoline-powered generators. This must change.
Imagine a world where heterogeneous multicore processors are designed and optimized for diverse workloads, where solid state storage changes our historical notions of latency and bandwidth, where on-chip optics, system interconnects and LAN/WAN networking simplify data movement, where scalable systems are resilient to component failures, where programming abstractions facilitate functional dispersion across devices and facilities, and where new applications are developed more quickly and efficiently. This can be.
Cloud Computing Futures
Over the past fourteen months, I have been quietly building the Cloud Computing Futures (CCF) team, starting with a key concept. We must treat cloud service infrastructure as an integrated system – a holistic entity – and optimize all aspects of hardware and software. I have recruited hardware and software researchers, software developers and industry partners to pursue this vision. It's been a blast.
The CCF agenda spans next-generation storage devices and memories, new processors and processor architectures, networks, system packaging, programming models and software tools. We are a research and technology transfer team, whose roles are to explore radical new alternatives – "blank sheet of paper" approaches to cloud hardware and software infrastructure – and to drive those ideas into implementation and practice.
Effective research in this space requires changes to both hardware and software, and the resulting prototypes must be constructed and tested at a scale difficult for small teams. This type of research and technology transfer is rare in academia, because the efforts often cross many research disciplines.
For this reason, the CCF team is taking an integrated approach, drawing insights and lessons from Microsoft's production services and data center operations, and partnering with researchers, vendors and product teams worldwide. Our work builds on technical partnerships and collaborations across Microsoft, including Microsoft Research, Debra Chrapaty's Global Foundation Services (GFS) data center construction, operations and delivery team, and Ray Ozzie's Azure cloud services group. We are also partnering with an array of hardware-technology providers and companies as we build prototypes.
Now You Know
For me, CCF has been an opportunity to apply research experiences and ideas gleaned over the past twenty-five years of my academic career. Equally importantly, it is a chance to build prototypes at scale to test those ideas, and then help drive the promising technologies into practice. The past year has been great fun, and I have been privileged to attract some wonderful people to this adventure and partner with them, including Jim Larus and Dennis Gannon.
Now you know why I came to Microsoft. It was a chance to practice what I've been preaching. It was a chance to help design the biggest of big iron. It was a chance to help invent the future. It's a pretty cool gig for a balding old geezer like me!
The political maneuvering and theater are well underway as the U.S. Congress debates the merits of various proposals to stimulate the economy. The U.S. House of Representatives has passed H.R. 1, the American Recovery and Reinvestment Act of 2009, and the Nelson/Collins (Senators Ben Nelson and Susan Collins) adjustments to S. 336 will likely come to the floor of the U.S. Senate for a vote in a few days. If the modified bill is approved by the Senate, we will await the negotiations that follow in conference.
Support for scientific research is a small fraction of the stimulus plan, and the House and Senate plans differ in some marked ways. ASTRA has a handy comparison of the two proposals with respect to research investment.
If you haven't seen legislative sausage made before, it is important to understand the process. After each chamber passes its version of a bill, a conference committee reconciles the differences, and the compromise must then be approved (again) by both chambers. It is a competitive and often messy rugby scrum. Hence, we do not yet know what may emerge in support of scientific research and development.
Steve Ballmer on Science
Microsoft's CEO, Steve Ballmer, recently spoke to the U.S. House Democratic Caucus Retreat. Although you can read the complete speech, I would like to highlight a few excerpts that emphasize Microsoft's strong support for innovation and the importance of continued investment in basic research. In his speech, Steve noted:
… America really has to return to growth that's built on innovation and productivity, rather than leverage and private debt. That must happen.
He went on to say,
We need to pursue breakthroughs over the coming years in green technology, alternative energy, bioengineering, parallel computing, quantum computing. Without greater government investment in the basic research, there is a danger that important advances will happen in other countries. This is truly I think not only an issue of competitiveness, but also in a sense of national security. Companies like ours and others can do our fair share in terms of funding of basic research, but government needs to take the lead.
I could not agree more.
Microsoft Policy Blog
On the subject of Microsoft and policy, the company recently launched a policy blog (Microsoft on the Issues), including support for research. A few weeks ago, I penned an entry for the Microsoft policy blog on the federal stimulus plan and scientific innovation. In addition to noting the critical importance of innovation to fuel the economy, I observed that we should treat the current crisis and any new research funds as an opportunity to rethink the way we approach university research and public/private partnerships:
Beyond critically needed funding, the bill gives government, academia and industry a chance to rethink research partnerships and policies in ways that will harness the benefits of scientific innovation for the good of the entire nation. …
We now have the opportunity to further streamline our nation's research infrastructure, particularly in U.S. research universities. …
By rethinking public-private sector partnerships, and refining processes for acquiring and deploying information technology, we can increase research efficiency and catalyze new discoveries while reducing costs for both universities and the federal government.
The potential influx of research funds from the stimulus package creates a great opportunity for research innovation. However, these are perilous times, and we should not (by default) assume that "business as usual" is the best approach to accelerating research. It may indeed be the best approach, but we should face the issues squarely and thoughtfully.
What is the best way to apply information technology to science and engineering research? How can we best advance computing research itself? How can we retain our research strengths while also addressing the rising cost of higher education? What can we learn from new and effective approaches elsewhere? How can we continue to compete effectively and efficiently? As Spiderman says, "With great power comes great responsibility."
Over the past thirty years, I have accumulated the common artifact of an academic research career – bookshelves overflowing with research journals and conference proceedings. Each time I pull an old and yellowing volume from my shelves, it is simultaneously nostalgic and thought provoking to read a few randomly selected articles. Not only does this stroll down memory lane illuminate how far we have come, both technologically and theoretically, it shows how profoundly the publication culture of our field has changed.
Not that many years ago, CRA published a "best practices" memorandum entitled, "Evaluating Computer Scientists and Engineers for Promotion and Tenure." At a time when many departments were struggling to make the case to their science and engineering colleagues that conference publications mattered, this memorandum demonstrated that computing conference publications were of a quality comparable to those in archival journals.
The perception battle won, is all well in our publication world? Perhaps, but I suspect not. Our prestigious conferences have become the moral equivalent of highly selective journals in other fields. The computing conference review process is rigorous and highly selective, and polished results are required for publication. In many of our sub-disciplines, the conference paper is the final result. There is no expectation that the preliminary results will be expanded, augmented and published in a journal. Consequently, many – arguably most – of our journals have receded in significance. I believe this is a regrettable and worrisome development.
First, it has truncated the continuum of publication options. In most disciplines, conferences are the venue where late-breaking results, thought-provoking theories and controversial ideas are aired and debated. Many of these ideas are later proven incorrect or validated and expanded with additional data, but the free exchange of ideas stimulates research and innovation. At the risk of sounding like an "old geezer," I encourage you to read some old conference proceedings. It is illuminating to see how many of our conferences have evolved from idea exchanges to publication venues.
<BEGIN Old Geezer Story>
Recently, I told a group about one of my undergraduate experiences – being caught in an unexpected thunderstorm with my compiler under my arm. That would be the box of punched cards containing my compiler. I spent most of the evening with an iron and ironing board, flattening my cards very carefully, so the card reader could process them and punch a new, undamaged deck. My audience looked at me as if I were a walking dinosaur. Now back to our regularly scheduled programming.
<END Old Geezer Story>
Our emphasis on the conference cycle has also encouraged and rewarded production of publishing "quarks" – units of intellectual endeavor that can be generated, summarized and reviewed in a calendar year. We now see new faculty and research staff candidates with more publications than were once common in promotion and tenure dossiers.
Do not misunderstand. I am not suggesting that our current conference-centric culture is all bad, merely that we should be more thoughtful regarding the timescales and range of our publication options. I would also humbly suggest that we consider how this approach shapes the types of research conducted. We all know that quality trumps quantity, and that research results have a wide range of natural sizes and time scales.
What then becomes of our often languishing journals? Are they a hidebound and archaic notion, doomed to irrelevance by ubiquitous electronic access? To be sure, the nature of publication is in flux in both popular and professional culture, with the physical artifacts likely to disappear. However, the notions of scholarly review and archival recording of research are independent of these artifacts.
I believe we need to restore journals to their rightful place as the lasting archives of scientific knowledge. This will require a cultural shift, making our conferences the harbingers of extended, rigorous publication in journals. Equally importantly, it will require us to review those journal submissions thoughtfully and with alacrity.
As anyone who has ever been the editor of a computing journal knows, obtaining timely reviews is challenging. Even with gentle (and sometimes not so gentle) nagging, the weeks can stretch to months; the months sometimes turn to years. Contrast this with other technical disciplines where submissions can be reviewed and published in weeks or months. Is it any wonder that paper authors in our field eschew journals for conferences with known publication dates?
As a discipline, we benefit from the entire continuum of venues for communicating research ideas and results, from informal workshops and conferences to research surveys and expanded publication in archival journals. Let's recognize and embrace the distinct and important roles that each plays in the free and fruitful exchange of research ideas.