N.B. An abridged version of this essay was submitted as a white paper to the BDEC Frankfurt workshop.
I am going to be a bit radical and suggest we must reason in new ways about the need to integrate high-performance computing (HPC) and big data analytics. That must begin with our thinking about how we label and discuss them. Any expert in journalism, communications or psychology will immediately and unhesitatingly affirm that names have extraordinary power. If you doubt that, look at the philosophy of proper names and the cultural history associated with true names. And if that's not convincing, ask any child who was taunted because his or her name rhymed with something unfortunate.
Names shape our thinking, define our discourse and reinforce our biases and perspectives. We use names and descriptors to categorize people, organizations and objects; we use names to define the positive and the negative; and we use names to market, reward and punish. Brand equity has real value. Sadly, as many studies have shown, names also play on stereotypes and our implicit bias. Take Harvard's Project Implicit bias test; you will be humbled and chastened as it exposes your own bias.
High-performance computing (HPC) and big data/machine learning are not exempt from the power of names and a "they could learn from us if they'd only listen" dichotomy. Done any low-performance computing (LPC) lately? Analyzed any small data? Of course you did, when you checked your email and text messages on your smartphone a few minutes ago, but you did not think about it in those terms. In fact, you ran a trans-petascale computation and staged data across a global set of fiber-optically connected data caches (a content delivery network) when you searched for a coffee shop yesterday. But none of us call it that – it's just a quick web search in the ill-defined and amorphous cloud to which we are connected via a global broadband (wired and wireless) infrastructure.
I suspect we are trapped in a semantic cul-de-sac when we talk about high-performance computing and computational science. They are code words for physics-inspired numerical solution of partial differential equations; all too frequently, everything else is potentially suspect and inferior. The denotation of the word exascale says nothing about computing at all, but the connotation of FLOPS and batch-oriented HPC platforms is there in all of our minds. In turn, deep learning and big data conjure images of computer science types tuning neural nets and recommender systems in cloud data centers for targeted product marketing.
It's time to stop categorizing and name calling, time to end cultural and disciplinary silos. There is no hierarchy of needs or intellectual purity. There are just ideas and people who can learn from one another. Dare I say it – death to HPC; death to big data. It's time to stop the religious wars over HPC and big data cultures and technologies and focus on informatics-mediated discovery and innovation. The goal of an HPC center should not be self-perpetuation, either of infrastructure or organization. Nor should a cloud service be restricted to commercial domains. That means starting with intellectual outcomes and deriving approaches, rather than defining approaches and searching for feasible intellectual outcomes.
From False Dichotomy to Unification
There are many technical ways to envision and realize the continuum and fusion of traditional HPC and computational science with big data and machine learning. These include integration of stream-based workflows and just-in-time schedulers, containerization and software stack packaging for application-specific configuration, fine-grained parallelization of learning packages, learning for algorithm and software adaptation and tuning, custom ASIC designs (see Google's recent TPU announcement), software-defined networks and storage (SDN and SDS), and renewable and energy-efficient hardware design and configuration. There are an equally large number of social and economic approaches, including denominating informatics costs in currency to illuminate investment priorities, spot pricing for priority access, economics-driven infrastructure selection and deployment, and institutional and funding agency policies for data management.
What we cannot do is allow ourselves to be bound by names, labels and cultures. Richard Hamming wisely noted, "The purpose of computing is insight, not numbers." HPC and machine learning are not goals; they are enablers. And that should be the major consensus narrative.