This proposal aims to establish a national short course in Big Data Image Processing & Analysis (BigDIPA) intended to increase the number and overall skills of competent research scientists now encountering large, complex image data sources derived from cutting-edge biological and biomedical research approaches. Extraction of knowledge from these imaging sources requires specialized skills and an interdisciplinary mindset. Yet effective training opportunities for this sector of the Big Data science community are underappreciated and underserved compared to other big data fields such as omics. UC Irvine is ideally suited to host a short course to address this training deficit on account of the co-location of multiple facilities renowned for the development of numerous advanced imaging techniques, and the outstanding instructional environment provided by faculty with collaborative expertise in biological image processing and computer vision, bioinformatics, and high performance computational approaches. Specifically, our BigDIPA proposal assembles an interdisciplinary alliance of faculty experts who can leverage the preeminent imaging resource facilities, such as the Laboratory for Fluorescence Dynamics (LFD) and the Beckman Laser Institute, and link these to ongoing campus big data initiatives, e.g. UCI's Data Science Initiative, to create a top-rated training course designed for senior graduate students, postdoctoral researchers, faculty, and industry scientists from diverse scientific disciplines who have a nascent interest in, and a need to handle, BIG DATA sources beyond their current level of competency.
The course focuses on discrete examples drawn from the analysis of complex data acquired with different microscopy imaging modalities used to investigate the dynamics of cellular and tissue processes, including signal transduction networks, development, neuroscience, and biomedical applications, that were heretofore hidden or inaccessible to standard methods of analysis. Participants will be guided along the complete acquisition-processing-analysis pipeline through exposure to a coherent progression of topics and issues typically encountered when handling BIG DATA. We believe this training approach will therefore be attractive to a broad and significant untapped pool of researchers from the biological disciplines, biomedical engineering, systems biology, math, biophysics, computer science, bioinformatics, and statistics who possess some, but not all, of the requisite competencies to effectively traverse the BD2K landscape. We have designed the course such that skills and experience gained by trainees will be transferable to their own research interests. The BigDIPA course format will combine didactic lectures on the theory and foundational frameworks that underpin each step with practical instruction on implementation and hands-on tutorials in image acquisition, large data handling, basic scripting of computational tools, image processing on high performance computing architectures, and feature extraction, evaluation, and visualization of results. The course is designed to offer an intense learning experience delivered in a compact time frame, along with opportunities to foster interdisciplinary interactions through small team exercises. Participants will also be encouraged to take advantage of pre-courses (separate and distinct training opportunities not funded by this proposal) that will be coordinated to directly precede our course.
This unique format has multiple benefits: it offers an efficient mechanism to address individual participants' training deficiencies, permitting a more productive experience in the BigDIPA course; adds no-cost mutual benefits to independent but synergistic programs; and facilitates recruitment of applicants who are interested but feel intimidated by a perceived lack of adequate prior training. Beyond providing an intensive on-site training course, we will make all course materials (lecture notes, video lectures, and tutorials), tutorial exercises, open source software resources, and sample datasets freely available online to maximize outreach and to encourage the community to contribute additional curated training resources.
We propose to train and expand the cadre of researchers capable of effectively using the deluge of complex BIG DATA being generated by advanced biomedical imaging approaches. These data sources represent a rich source of complex information relevant to many scientific areas of inquiry, and are informative at multiple scales ranging from fundamental biological processes at the cellular level to patient diagnostics for diseases such as cancer or neurological disorders.