The informatics problem facing scientists in the Petabyte Age has three parts: 1) finding, accessing, and loading massive amounts of data, 2) figuring out the appropriate data reduction or abstraction to make sense of the combined data set, and 3) operationally processing those data. In this SGER, the focus is on the second of these sub-problems: how scientists deal cognitively with petabyte-size data sets of their own and others' provenance. Natural language processing applications saw orders-of-magnitude improvement when statistical processing and machine learning were applied to massive data sets. Some linguists observe that these improvements level off at some point, and that subsequent improvements come only after domain knowledge (in that case, linguistic theory) is also applied to the processing. There is a similar situation in science. While some application areas in a Petabyte Age might require only non-domain-specific machine learning to predict phenomena (what Chris Anderson calls "agnostic statistics"), others will require the deeper understanding of phenomena that most scientists seek. In other words, for some domains, and in most sciences, it is not enough to answer "what"; one also needs to answer "how".
This work is high risk because there is little research on the extent to which visualization of natural phenomena can be made "cognitively" consonant across disparate spatial and temporal scales. Further, because of the disparity between the scientific-visualization and information-visualization communities, it is unclear how to connect analytics with visualization of ecological phenomena. Finally, the data integration necessary for the proposed visualization requires managing much larger volumes of data, and many more kinds of data.
The intellectual merit of the proposed work lies in new conceptual data structures and data representations for scientific visualization that can be used to generate domain-specific visualization templates, from which a range of specific visualizations for a domain could be drawn.
The broader impacts of the proposed work are three-fold: 1) potential application of the work beyond the realm of environmental science and climate change to that of natural resource management and policy, and to other sciences, 2) educational impact on computational thinking through curricular development at Evergreen College, which will be disseminated via the NSF CPATH project Northwest Distributed Computer Science Department (NWDCSD), and 3) free, open-source distribution of the software tool to the scientific community.