N.B. I also write for the Communications of the ACM (CACM). The following essay recently appeared on the CACM blog.
To appropriate a line from The Music Man, there is trouble here in River City, and anyone who cares about the future of scientific discovery and innovation should be worried. What is that trouble, you say? It is the divide separating the lands of high-performance computing (HPC) and big data, and the political and funding implications created by this divide.
Whether in the U.S., Europe or Japan, the competition for research and infrastructure funding is intense. In the midst of our lingering economic malaise, budgets are being stretched to the breaking point. Should we invest in a new telescope or a new accelerator, a new polar station or a new biology initiative? These are legitimate, though painful, questions, born of budget exigencies and fiscal realities.
Many of us are concerned that in this time of limited resources, we could face a similar funding competition that pits HPC, particularly exascale plans, against big data. This competition would be disastrous for science, for computing and computational science research, for infrastructure deployment and for global innovation and economic growth. Both HPC and big data are essential elements of the research portfolio. We must make sure this rumble in the policy halls does not take place.
Understanding Cultures, Preventing Conflict
Even those who have never heard the name of the philosopher George Santayana can parrot his famous dictum, "Those who cannot remember the past are condemned to repeat it." Thus, it is worth drawing a few lessons from the Peloponnesian War, which pitted two great Greek city-states, Athens and Sparta, against one another. Although Sparta was ultimately the military victor, the adverse social, economic and political effects devastated all of Greece, and the war marked the end of the Greek golden age. We cannot afford to reprise the Peloponnesian War in the guise of big data versus HPC.
The root of this potential conflict is the differing norms of computer science and computational science. Big data and machine learning are the creations of the academic computing and business cultures, while computational science is more the offspring of science and engineering. Both share historical roots, and both are cross-fertilized by the other. They need not and should not be inimical, especially since many of the technical challenges are common to both.
Some see big data as an egalitarian opportunity, one that could readily benefit both big and small science and yield scientific and economic benefits very quickly. Let me be clear: this statement is unquestionably true. The unprecedented richness of scientific and engineering data being produced by large-scale instruments and ubiquitous sensors is ripe for harvest and correlation via advanced data mining. Implicit in this is the need for investment both in new data analysis tools and techniques and in large-scale data repositories. (See My Scientific Big Data Are Lonely.)
Others see exascale computing system proposals as a quixotic quest for national bragging rights that will benefit only a few. Let me be equally clear: this statement is unquestionably false. Yes, there are elements of national and regional competition in the Top500 rankings. However, the underlying technology challenges and scientific opportunities are profound. Low-power memory and processor designs, post-Dennard device scaling and the software and reliability challenges of large, complex systems all are at the forefront of 21st-century computing system design, with deep implications for the future of the information economy. Similarly, some of our most pressing scientific and social problems in climate change, energy and biomedicine are dependent on powerful, advanced computing capabilities. Implicit in this is the need for continued, balanced investment in technology research and high-end system deployment for HPC.
A Grand Concord
We need a concord and strategic research investment plan that recognizes the shared importance of HPC and big data. Both warrant investments in basic research, and both need investments in large-scale infrastructure deployments. (Make no mistake, though, research and infrastructure are different, as I noted in Research {preposition} Infrastructure.) In a time of straitened budgets, this will not be easy, and it will undoubtedly require political compromise.
Neither the proponents of big data nor those of HPC may get all they want on the time scale they desire, but neither can be sacrificed for the other. If the proponents on each side adopt strategies that treat the other community as the enemy, the relevant lesson of the Peloponnesian War is unmistakable -- in such a battle, there are only losers. That would be disastrous for us all, particularly when there is a win-win so tantalizingly close.