The high performance data center planning project provides computational and storage resources to generate meaningful data sets and disseminate them to the public. This project utilizes computing infrastructure with large numbers of processing cores to provide a parallel computing environment required for data processing using algorithms developed for this project. The planning phase of this project is limited in scale and provides a roadmap to building large scale infrastructure to support data management and dissemination for a range of applications. The infrastructure acquired under this project is used to (i) generate spatiotemporal data sets describing the extent and motion of hurricanes, storm systems, and other natural phenomena; (ii) create visual words, visual phrases, and visual blocks based on image contents to improve image search; (iii) manage information on search preferences and search characteristics of users for improved web search; and (iv) provide utilities to enable access to deep web data sources for scientific domains. Data sets and utilities created through this project will be made publicly available.

This project serves as a catalyst for research in many fields through the production of high quality data sets and of the methods and utilities required to collect, manipulate, and use the data. Furthermore, this work and the hardware supporting it are being directly integrated into computer science classes to expose students to high performance computing and the value of high quality data sets for research.

Project Report

This CRI grant has supported construction of a shared high performance data center that benefits Computer Science faculty, students, courses, and collaborating researchers across Texas State University in their research and education activities. The computational and storage resources provided by the data center have enabled management of large volumes of data and collection of datasets that are valuable to researchers within and beyond Texas State University. The data center has been instrumental in meeting the needs of resource-demanding research projects, adding a critically important piece to the general computing infrastructure of the CS department. The CRI grant has led to fruitful results in the past two years:

1) Data Center: With this CRI grant, three pieces of powerful equipment were ordered (1 Dell Precision T7500n workstation with 12 cores, 16GB RAM, and a 1TB drive; 1 Dell PowerEdge R610 server with 24GB RAM, 3 cores, and a 1.5TB drive; and 1 Dell PowerEdge R815 server with 64GB RAM, 6 cores, and an 876GB drive) and a shared high performance data center was built, which has benefited many projects and numerous faculty members and students in the CS department.

2) Activities: 8 projects were supported: User-centric Organization of Search Results, Kepler Scientific Workflow Design and Execution with Contexts, Geospatial Overlay Computation on the GPU, Learning to Judge Image Search Results, User-centric Organization of Search Results – Clustering Interface, ServiceXplorer, Discriminative Codebook Learning for Web Image Search, and Sketch-Based 3D Model Retrieval. These projects have generated many novel algorithms and new technologies, leading to significant contributions within and beyond the discipline.

3) Training: 3 post-docs, 23 undergraduate students, and 7 graduate students participated in the research projects led by the PIs. An NSF/OCI REU Site project (OCI-1062439, 3/1/2011–2/28/2014) was conducted in our department during the last two summers. 20 undergraduate students from diverse institutions participated in this program, and the infrastructure enabled by this CRI grant provided the necessary servers for all the REU students. These research projects helped undergraduate students gain research experience and develop an interest in research. Many undergraduate students made impressive achievements and were motivated to pursue graduate studies. Graduate students and post-docs involved in the projects were able to strengthen their research backgrounds, skills, and publication records. As a consequence, 2 graduate students (1 female) were admitted to Ph.D. programs at reputable universities and one post-doc obtained a tenure-track assistant professor position.

4) Publications: From Feb. 2011 to Jan. 2013, research supported by this grant generated 16 published papers in highly recognized leading journals (e.g., Transactions on Multimedia, Signal Processing, and IEEE Internet Computing) and conferences (e.g., ACM Multimedia, CIKM, and ICSOC). In addition, 7 papers have been submitted, and at least 4 more are in preparation for submission in 2013.

5) Funding: 4 external research grants and 1 internal research grant were funded, including NSF-STEM 1153688, DoD W911NF-12-1-0057, TxDOT 0-6789, Army Research URAP W911NF-12-R-0007, and the Texas State Research Enhancement Program. In addition, 4 external proposals are under review.

6) Data and Systems: Three datasets were created: a synonymous image search queries dataset, a large-scale sketch-based 3D shape retrieval benchmark dataset, and a web service dataset. These datasets have been made open for public access via http://mvlab.cs.txstate.edu/, www.itl.nist.gov/iad/vug/sharp/contest/2013/SBR/, and http://eil.cs.txstate.edu/ServiceXplorer. Two user-centric search result organization systems, Rants (http://dmlab.cs.txstate.edu/rants/) and ClusteringWiki (http://dmlab.cs.txstate.edu/clusteringwiki/), and a web service similarity search engine, ServiceXplorer (http://eil.cs.txstate.edu/WSXplore), were built and maintained. The collected datasets, the developed systems, and the REU website (http://reu.cs.txstate.edu/) are all hosted on the above workstation and servers.

7) Education: The collected datasets and research results have been integrated into classes at various levels in the Computer Science Department. Students in these classes are able to use the datasets and systems to gain first-hand experience and intuition in the fundamental concepts and research frontiers of data mining, machine learning, computer vision, and web technology. The relevant graduate courses include CS5332 (Database Theory and Design), CS5369U (Advanced Data Mining), CS5375 (Multimedia Computing), CS5369L (Machine Learning and its Applications), CS5376 (Enterprise Application Integration), and CS5369G (Web Services Engineering). The relevant undergraduate courses include CS4332 (Introduction to Database Systems), CS4378U (Data Mining), CS4378V (Introduction to Machine Learning), and CS2388 (Internet Programming on the World Wide Web).

In summary, with this CRI grant, a shared high performance data center was built, providing the computational power and storage space needed to collect and manage raw data and to develop novel research and technologies. It gives undergraduate and graduate students at Texas State University and beyond significantly more opportunities to participate and gain valuable experience in research. The collected datasets and developed technologies have been directly integrated into courses, student projects, and interdisciplinary education programs. The datasets and systems have been made publicly accessible through the Internet, contributing to local and international research and education activities.

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Network Systems (CNS)
Type: Standard Grant (Standard)
Application #: 1058724
Program Officer: Almadena Chtchelkanova
Project Start:
Project End:
Budget Start: 2011-02-01
Budget End: 2013-01-31
Support Year:
Fiscal Year: 2010
Total Cost: $50,000
Indirect Cost:
Name: Texas State University - San Marcos
Department:
Type:
DUNS #:
City: San Marcos
State: TX
Country: United States
Zip Code: 78666