Many distributed applications on today's Internet are massively replicated to ensure data robustness and scalability; however, constant data churn (i.e., updates at the source) and delayed synchronization leave replicas stale, which lowers the performance of these systems. The goal of this project is to pioneer a stochastic theory of data replication that can tackle non-trivial dependency issues in the synchronization of general non-Poisson point processes, to design more accurate sampling and prediction algorithms for measuring data churn, to solve novel multi-source and multi-replica staleness-optimization problems, to establish a new fundamental understanding of cooperative and multi-hop replication, and to model non-stationary update processes of real sources.
The now omnipresent cloud technology has become a vast consumer and generator of data that must be stored, replicated, and streamed to a variety of clients. This project studies the theoretical and experimental properties of data evolution and staleness in such systems. Its outcomes are likely to impact Internet computing by yielding insight that leads to better content-distribution mechanisms, more accurate search results, and ultimately higher satisfaction among everyday users. Furthermore, this project blends a variety of interdisciplinary scientific areas, reaches out to the student population at Texas A&M to engage them in research activities from the early stages of their careers, trains well-rounded PhD students knowledgeable in both theoretical and experimental aspects of large-scale networked systems, engages underrepresented student groups in STEM fields, disseminates information through two new seminars at Texas A&M, and shares data models and experimental results with the public.