A theory of "probable correctness" is proposed to assess the trustworthiness of software through testing. Most current testing methods are intended for debugging: they find failures and connect them to program faults for repair. Once these methods no longer expose failures, no analysis exists to establish the confidence that may be placed in the software. (Preliminary results here are that this confidence should be low.) The application of conventional decision theory to tests as program samples is suspect because its assumptions concerning independence and distribution do not hold. Thus there is no valid theoretical basis for predicting probable software quality from test results. As software becomes ever more complex, and its applications more critical, the need for a sound evaluation metric grows. It is crucial that a theory of testing for trustworthiness be plausible, which requires an analysis of the foundations of program sampling. The flaws in current theory result from sampling the wrong space: faults are distributed not over the input domain but over the program's text and variable state space. Preliminary results using a probabilistic model that arose in learning theory are promising. It is possible to analyze practical testing methods and to understand why they fail. Continued development of the theory will yield quantitative comparisons between methods, and can be used to compare programs according to their "testability."