Update: Jim Pool reminds me of the following book, which touches on this issue.
The parallel application contains millions of lines of code, combining multiple models of physical, engineering, biological, social and/or economic processes, operating over temporal and spatial scales that span ten orders of magnitude. It was written by tens or even hundreds of graduate students, post-doctoral associates, software developers and yes, even a few professors, over a decade. It involves numerical libraries and functions from diverse research groups and companies, and a single execution requires thousands of hours on tens of thousands of processor cores. In short, it's a typical example of an extreme scale high-performance computing code.
And The Answer Is …
What is the probability that any execution of this code produces the "right" answer? Does a "right" answer even exist? If so, how might we know? Or, is this a nonsensical question?
These are not simply Gedanken experiments in software engineering or hardware reliability, nor are they just abstract questions in epistemology. Rather, they are very practical and real questions about the nature of extreme scale computational science. They are the essence of verification and validation (V&V) processes, and we should be much more rigorous and systematic in applying them.
The central lesson of software engineering is that, no matter how rigorous the design processes, testing methodologies and boundary condition specifications, large applications of any type will contain multiple errors. We have all seen and experienced examples, from the blue screen of death on personal computers to the loss of the Mars Climate Orbiter due to mixed unit calculations (thruster data in pound-force seconds where newton-seconds were expected). Computational science applications have no special dispensation to escape this destiny.
Second, because the raison d'être for multidisciplinary applications is enabling researchers to gain insight into complex and often poorly understood phenomena, testing them can be problematic, as the answers are often known only for simplified, model problems or boundary conditions. What constitutes a rigorous test suite when little or no experimental data is available for independent comparison?
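One partial answer is verification against model problems with known solutions: even when no experimental data exists, a numerical scheme can be checked by confirming that its error shrinks at the theoretically expected rate as the discretization is refined. A minimal Python sketch of the idea, using a central-difference second derivative of sin(x) as the model problem (the function and step sizes are illustrative choices, not from the original text):

```python
import math

# Verification against a model problem with a known answer: approximate
# u''(x) for u(x) = sin(x) with a central difference, and confirm that
# the error falls at the expected second-order rate under refinement.
def d2(u, x, h):
    """Central-difference approximation to the second derivative."""
    return (u(x - h) - 2.0 * u(x) + u(x + h)) / (h * h)

x = 1.0
exact = -math.sin(x)  # analytic second derivative of sin
err_coarse = abs(d2(math.sin, x, 2e-2) - exact)
err_fine = abs(d2(math.sin, x, 1e-2) - exact)
print(err_coarse / err_fine)  # ~4 for a second-order scheme
```

Halving the step size should cut the error by roughly a factor of four; a code change that silently breaks this ratio is a red flag even when no reference data exists for the full problem.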
Third, there's the small matter of numerical stability. The IEEE floating point standard balances range (exponent) and precision (mantissa) in a fixed number of bits, and necessarily approximates many real numbers. After large numbers of floating point operations, even a stable algorithm applied to a well-conditioned problem will accumulate some error. Rarely do we bound that error via techniques such as interval analysis.
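A small Python sketch makes the accumulation, and an interval-style bound on it, concrete (the loop length and the use of one-ulp outward rounding are illustrative simplifications):

```python
import math

n = 100_000
naive = 0.0
lo = hi = 0.0
for _ in range(n):
    naive += 0.1  # each addition rounds the running sum
    # Emulate directed rounding crudely: push the bounds one ulp
    # outward so [lo, hi] brackets every faithful accumulation.
    lo = math.nextafter(lo + 0.1, -math.inf)
    hi = math.nextafter(hi + 0.1, math.inf)

accurate = math.fsum([0.1] * n)  # correctly rounded sum

print(f"naive    = {naive!r}")
print(f"accurate = {accurate!r}")
print(f"interval = [{lo!r}, {hi!r}]")
```

The naively accumulated sum drifts away from the correctly rounded one, and the interval width is a conservative bound on how far rounding can move any such accumulation.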
Finally, there's the small matter of hardware errors, an increasingly common phenomenon for large memories constructed from high-capacity DRAMs and for microprocessor datapaths. We have been taught that error correcting codes correct DRAM bit errors. In truth, the standard SECDED code only corrects single bit errors and detects double bit errors. Burst DRAM errors (triple bit or more) are not corrected and are increasingly common. Some iterative algorithms can recover from bit errors, converging to the "correct" answer; others cannot.
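The impact of an uncorrected flip depends entirely on which bit it lands in. A small Python sketch (illustrative only, not how ECC hardware works) reinterprets a double's bits and flips one:

```python
import struct

def flip_bit(x: float, bit: int) -> float:
    """Return x with one bit of its 64-bit IEEE 754 encoding flipped."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", bits ^ (1 << bit)))
    return flipped

# Bit 0 is the lowest mantissa bit; bit 62 is the highest exponent bit.
print(flip_bit(1.0, 0))   # 1.0000000000000002 -- a one-ulp nudge
print(flip_bit(1.0, 62))  # inf -- the same fault, catastrophically amplified
```

Whether an iterative solver shrugs off such a perturbation or diverges depends on where in its state the flip lands, which is exactly the distinction between the algorithms that recover and those that cannot.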
Computation As Experiment
Are you afraid? We all should be. It is time to embrace the scientific process for computational science. We must view the execution of a large, multidisciplinary code as what it is – an experiment, with all the possible error sources attendant on any physical experiment. This includes repeating the experiment (computation) to determine confidence intervals on the answer, conducting perturbation studies to determine the sensitivity of the answer to environmental (hardware and software) conditions, identifying sources of experimental bias, and defining the experiment rigorously enough for independent verification.
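Even the first step, repeating the computation, is revealing: reduction order in a parallel sum is generally nondeterministic, so honest repetition yields a distribution of answers rather than a single number. A minimal Python sketch, using shuffled summation order as a stand-in for run-to-run nondeterminism (the data, seed and trial count are illustrative):

```python
import random
import statistics

# Wildly varying magnitudes make the sum sensitive to evaluation order,
# as in an ill-conditioned parallel reduction.
rng = random.Random(42)
values = [rng.uniform(-1.0, 1.0) * 10.0 ** rng.randint(-8, 8)
          for _ in range(10_000)]

results = []
for trial in range(30):
    rng.shuffle(values)  # a different "run" of the same computation
    acc = 0.0
    for v in values:
        acc += v
    results.append(acc)

mean = statistics.fmean(results)
stdev = statistics.stdev(results)
# ~95% confidence interval for the answer over reorderings
half_width = 1.96 * stdev / len(results) ** 0.5
print(f"{mean:.6e} +/- {half_width:.2e}")
```

Reporting the answer as a mean with a confidence interval, rather than the output of one run, is precisely the experimental discipline argued for above.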
In the days of analog scientific computing via slide rule, we all understood that the computation was an approximation. It's time to relearn that lesson.