In every scientific discipline, researchers are digitizing their knowledge and adopting computational methods. This process has generated enormous amounts of ad hoc data, that is, data for which standard data processing tools are not readily available. Such data poses challenges to both the users and the software that manipulate it. To maximize the efficiency and accuracy with which scientists deal with ad hoc data, new work will expand the existing data description and processing language. The language will be extended to handle more of the prevalent kinds of data sources, and the system will be modified to generate tools robustly and automatically. These changes will allow descriptive information about the data to be supplied directly to the tools. The system will also be formalized so that the correctness of the tools can be proved. The research combines novel programming language design, high-performance systems engineering, and theoretical analysis to solve crucial data processing problems. The system will be tested on real problems in fraud detection, genomic pathway modeling, and cosmology. Both graduate and undergraduate students are engaged in this interdisciplinary research.