N.B. The Science Magazine Policy Forum article was jointly authored. The opinions expressed in this blog post are my own, expanded thoughts on our shared call for action.
Update: A free link to the AAAS Science article is here.
On February 20, 2025, my colleagues and I (Ewa Deelman (USC/ISI), Jack Dongarra (Tennessee/ORNL), Bruce Hendrickson (LLNL), Amanda Randles (Duke), Ed Seidel (Wyoming), and Kathy Yelick (UC-Berkeley)) published a science policy perspective on high-performance computing (HPC) in Science, the flagship journal of the American Association for the Advancement of Science (AAAS). Entitled "High Performance Computing at a Crossroads," it calls for the U.S. government to organize a task force charged with creating a national, decadal roadmap for high-performance computing in the post-exascale era and supporting that roadmap via a whole-of-nation integrated investment strategy.
Our Science policy piece builds on our collective expertise, informed by a series of government and professional society reports we have co-authored alongside other national leaders in high-performance computing and computational science. These reports include:
- “Charting a Path in a Shifting Technical and Geopolitical Landscape: Post-Exascale Computing for the National Nuclear Security Administration,” The National Academies Press (2023), https://doi.org/10.17226/26916, Yelick (committee chair), Dongarra and Reed (committee members)
- “The Future of Computational Science,” Society for Industrial and Applied Mathematics (2024), https://www.siam.org/media/cfufuosh/siam-report-on-the-future-of-computational-science.pdf, Hendrickson (committee chair)
- “Can the United States Maintain Its Leadership in High-Performance Computing?” DOE ASCAC Subcommittee on American Competitiveness and Innovation to the ASCR Office (2023), https://www.osti.gov/biblio/1989107, Dongarra (committee chair), Deelman and Yelick (committee members), Reed (committee charge, as ASCAC chair)
- “2024 Advanced Scientific Computing Advisory Committee Facilities Subcommittee Recommendations” (2024), DOE Advanced Scientific Computing Advisory Committee, https://www.osti.gov/biblio/2370379, Seidel and Randles (committee co-chairs), Deelman, Hendrickson, and Reed (committee members)
Across these reports, several consistent themes have repeatedly emerged, ones we fear are not being sufficiently heeded by the U.S. government, industry, and the research community. Beyond two undeniable realities – the end of decades-long Dennard scaling and the diminishing returns of Moore’s Law transistor scaling, and the dominance of cloud hyperscaler and generative AI economics – the U.S. in particular lacks a coherent strategy for developing future generations of high-performance computing systems, ones critical for national security, economic growth, scientific discovery, and public well-being. As we noted in the Science policy forum article:
Governments worldwide are heavily investing in HPC infrastructure to support research, industrial innovation, and national security, each adopting distinct approaches shaped by national interests and regulatory landscapes. Conversely, in the U.S. there is no long-term plan or comprehensive vision for the next era of HPC advancements, leaving the future trajectory of U.S. HPC and scientific and technological leadership uncertain.
A Changing Technical World
As my long-time readers know, I have warned repeatedly that HPC is in major flux, shaped by both technical challenges and market forces. (See Computing Futures: Technical, Economic, and Geopolitical Challenges Ahead and American Competitiveness: IT and HPC Futures – Follow the Money and the Talent. I also co-authored an arXiv article, Reinventing High Performance Computing: Challenges and Opportunities, and a Communications of the ACM article, HPC Forecast: Cloudy and Uncertain, with Dennis Gannon and Jack Dongarra.) The Science policy forum article touches on these themes as well, notably those related to floating point arithmetic, memory bandwidth, and computing energy needs.
When Cache and Cash Are Not Enough
Most measures of algorithmic complexity still focus on integer and floating point operations and ignore data movement costs. Decades ago, that was reasonable; floating point operations were costly. Today, data movement costs, measured in both latency and energy, often dwarf those of arithmetic. For this reason, the TOP500 list, which focuses on dense matrix multiplication, is no longer the best predictor of achieved performance. It once was, when systems were better balanced and multiphysics simulations with more irregular memory access patterns were less common.
Today, the TOP500 list is increasingly an abstract dynamometer measure, when an off-road driving metric would be more appropriate. Moreover, as China no longer submits HPC system entries and the hyperscalers see limited marketing value in participating, the TOP500’s relevance as a historical record is also declining.
This increasing skew of the floating point operations per second (FLOPS) to memory bandwidth ratio is one of the “dirty little secrets” of modern semiconductors. Neither memory bandwidth nor latency has kept pace with microprocessor performance increases. Consequently, when used for computational science, modern microprocessors often stall awaiting memory contents, and achieved application performance is often just a small fraction of the publicity claims. (In fairness, the publicity claims are just that, numbers generally not achievable by practical applications.)
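To make the imbalance concrete, here is a minimal back-of-the-envelope roofline sketch in Python. The peak compute and bandwidth figures, and the arithmetic intensities, are illustrative placeholders of my own, not measurements of any particular processor; the point is that a bandwidth-bound kernel such as a stream-style triad achieves only a sliver of peak, while a cache-friendly dense matrix multiply can approach it.

```python
# A minimal, back-of-the-envelope roofline sketch. The peak and bandwidth
# numbers below are illustrative placeholders, not measurements of any
# specific processor.

def attainable_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
    """Roofline model: performance is capped by the compute peak or by
    memory bandwidth times the kernel's arithmetic intensity (flops/byte)."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)

peak = 2000.0   # hypothetical peak, in GFLOP/s
bw = 200.0      # hypothetical sustained memory bandwidth, in GB/s

# A blocked dense matrix multiply reuses data heavily (high arithmetic
# intensity); a stream-style triad (a[i] = b[i] + s*c[i]) performs roughly
# 2 flops per 24 bytes moved in double precision.
for name, intensity in [("dense matmul (blocked)", 50.0),
                        ("stream triad", 2 / 24)]:
    perf = attainable_gflops(peak, bw, intensity)
    print(f"{name:24s}: {perf:8.1f} GFLOP/s  ({100 * perf / peak:.1f}% of peak)")
```

Under these assumed numbers, the memory-bound triad reaches well under one percent of peak, which is why benchmark headlines and achieved application performance so often diverge.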
In the interregnum between the Chicxulub meteor strike, which caused the Cretaceous–Paleogene extinction event, and the birth of the IBM Personal Computer, computer memory system designs better matched processor performance. The Cray-1, released in 1976, used high-speed static RAM (SRAM) for its 16-way interleaved memory system, allowing four 64-bit words to be read each clock cycle. When coupled with vector registers and instruction chaining (i.e., feeding vector operation outputs from one functional unit directly to another), the Cray-1 was a model of balanced system design, beloved by application developers and admired by computer architects.
As commodity PCs began to dominate the computing marketplace, the growing performance gap between microprocessors and higher-capacity but slower dynamic random access memory (DRAM) – with access latencies of hundreds of clock cycles – was ameliorated by small, fast caches. For most business and consumer workloads, caches are quite effective, but they are far less so for memory-intensive computational modeling applications. (Aside: Caches have a long history, with the IBM System/360 Model 85, announced in 1968, as an early, notable example.)
In some ways, high bandwidth memory (HBM) represents a “back to the past” embodiment of some elements of the Cray-1 memory model, albeit implemented by stacking DRAM dies and connecting them to the processor via through-silicon vias or interposers. Unlike the Cray-1, where all the memory was fast, HBM constitutes only a small fraction of total memory capacity, due to cost and packaging constraints. While HBM improves performance, it is no substitute for a truly balanced system design.
The Artificial Intelligence (AI) Juggernaut and Numerical Precision
Unlike most HPC models, which rely on 64-bit or 32-bit IEEE floating point arithmetic for numerical precision, generative AI models can operate quite effectively with 16-bit, 8-bit, or even lower precision arithmetic. For AI, this has the twin advantages of higher floating point operation rates and reduced data movement and energy consumption. Given the rapidly widening gap between the sizes of the computational science and AI markets, there is an increasing risk that future hardware may focus on lower precision, to the detriment of traditional modeling and simulation applications. This makes algorithmic research on mixed and low precision arithmetic more critical than ever.
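As a small, hypothetical illustration (not drawn from the article) of why precision matters, the Python snippet below sums many small values: a running float16 accumulator stalls long before reaching the true total, while a float64 accumulator does not. This is exactly the kind of pitfall that mixed-precision algorithms address by computing in low precision while accumulating or correcting in higher precision.

```python
# Hypothetical demonstration of low-precision accumulation error.
# Summing 10,000 copies of 0.01 should give 100.0, but a float16
# accumulator stops growing once each addend falls below half a unit
# in the last place of the running sum.
import numpy as np

x = np.full(10_000, 0.01, dtype=np.float16)

fp16_sum = np.float16(0.0)
for v in x:                                # sequential float16 accumulation
    fp16_sum = np.float16(fp16_sum + v)

fp64_sum = np.sum(x, dtype=np.float64)     # same data, wide accumulator

print("float16 accumulation:", float(fp16_sum))  # stalls far short of 100
print("float64 accumulation:", float(fp64_sum))  # ~100, up to float16 rounding of 0.01
```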
Finally, there are the practical issues of capital and operating costs. As semiconductor feature sizes asymptotically approach zero, and transistor design complexity continues rising (from planar transistors through FinFETs to gate-all-around (GAA) designs), leading-edge foundry construction costs are skyrocketing. Meanwhile, with semiconductor-driven performance gains slowing, greater performance increasingly means building larger systems, a problem common to both the HPC and AI markets.
In response, hyperscalers are now investing tens of billions of dollars annually in massive data centers, filled with AI accelerators designed to support training and inference and consuming gigawatts of power. It is a scale of construction (capex) and operation (opex) that those of us in traditional HPC, where one billion dollars is still a lot of money, can only observe with open-mouthed wonder and awe.
A U.S. Call to Action
In the 1960s and 1970s, one could legitimately argue that U.S. HPC drove mainstream computing advances, as exemplified by IBM’s Stretch and Seymour Cray’s eponymously named Cray-1. Both were based on partnerships with Los Alamos National Laboratory and Lawrence Livermore National Laboratory. In the 1990s and 2000s, we leveraged and helped steer the “attack of the killer micros” personal computer revolution, which birthed scalable Linux clusters.
Today, in a world filled with billions of smartphones and cloud-based AI, HPC’s influence has waned even further. We are mostly passive passengers on the AI bus to the future, hoping it serendipitously drives past our neighborhood. This must change, if we are to continue expanding the reach and scale of first principles computational modeling.
Without doubt, there are deep technical challenges in semiconductors and HPC, but there are also profound opportunities for innovation. As the t-shirt slogan says, “Science – it’s like magic, but real.” It’s time for some new HPC and AI magic.
Step up to the mirror. Peering back at you is nature’s existence proof that human-level intelligence need not consume megawatts of power; a few tens of watts suffice. Meanwhile, the distinction between HPC and AI is fading. AI offers new opportunities in computational modeling via efficient, learned surrogate models, and HPC insights are guiding AI. HPC is AI, and AI is HPC. Carpe diem!
All too often, we fixate on hardware, but algorithmic advances have given us just as much, and the promise is just as real. New software models can elevate productivity and create new behavioral ecosystems, and they have. Quantum and biologically inspired computing (e.g., neuromorphic) are pregnant with possibilities, as are semi-custom chiplet-based architectures, ones better matched to scientific and engineering workloads.
Although the technical opportunities before us are large, the geopolitical risks of these inflection points are equally profound. We face the very real possibility of a catastrophe, both in the literal and the mathematical sense, where small parameter changes in a non-linear system (the global semiconductor ecosystem) cause equilibria to shift or disappear. At stake is global leadership in computing, with consequences that extend far beyond next-generation smartphones or impressive AI applications. (See On Catastrophes and Rebooting the Planet.)
Computing, quantum technologies, and biotechnology are critical foundations of the 21st century economy, the enablers of global competitiveness and national security. Those countries and regions that dominate these technologies will have disproportionate influence on the world order. Make no mistake, the equilibria are shifting, and the U.S. is losing ground, already eclipsed in many of these critical areas by committed and sophisticated global competitors.
Today, Taiwan Semiconductor Manufacturing Company (TSMC), located in territory claimed by the People’s Republic of China, fabricates most of the high-end semiconductors on which the world depends. That geopolitical supply chain worry triggered the U.S. CHIPS and Science Act, intended to fund the “re-shoring” of U.S. semiconductor manufacturing; today, it faces an uncertain political future. Although some of the planned investments in U.S. semiconductor plants have been made, there is still much to be done. Equally worrisome, the majority of the “and science” part of the CHIPS and Science Act was merely authorized and never funded.
Meanwhile, NVIDIA, a fabless semiconductor behemoth and hardware driver of the AI revolution, depends critically on TSMC to fabricate its chips. Intel, though once dominant, is struggling to redefine itself in the post-PC world. The U.S. needs Intel, NVIDIA, AMD, and other domestic semiconductor designers and fabricators to flourish.
The new darlings of U.S. venture capitalists and NASDAQ investors – Amazon, Apple, Google, Microsoft, Meta, and OpenAI – are flying high, though the Chinese DeepSeek AI shock showed there is no monopoly on AI innovation. In our deeply interconnected world, semiconductor embargoes are leaky constraints at best, and promising AI ideas spread at Internet speed.
Intellectual and market leadership are not birthrights; they are earned every day, in classrooms, offices, laboratories, and manufacturing plants. As Intel’s Andy Grove presciently warned, “Success breeds complacency. Complacency breeds failure. Only the paranoid survive.” Even more succinctly – you snooze; you lose.
This technical and geopolitical backdrop underscores why the HPC reports cited by our Science policy forum article have repeatedly called for a coordinated national R&D and funding strategy – one that combines work on algorithms, software, novel architectures, and next-generation materials and semiconductors. Crucially, this strategy must also include construction of advanced, scaled prototypes, built with custom silicon, to test these ideas. This has not happened, at least not in any coordinated way, nor at the needed scale.
While chair of the U.S. National Science Board, I made complementary pleas for greater investment in both basic research and workforce development via a next-generation National Defense Education Act. Such an NDEA 2.0 would expand research and development in next-generation technologies via an all-of-nation strategy. (See A Call To Action: Congressional Testimony and Upgrading the Future: House Science Committee Testimony.) This too has not happened.
Let me be clear: our Science policy forum article is not a call for scattered research solicitations backed by a few million dollars. Building a better future for high-performance computing demands a sustained, multibillion-dollar, whole-of-nation research and development initiative.
The future is not something that just happens – it is something we create, one idea at a time. And it is time – long past time – to invest in the future.