The Center for Computational Learning Systems (CCLS) is collaborating with the Computational Neurophysiology Laboratory (CNL) in the Department of Neurology, Columbia University Medical School (CUMC) to develop a distributed framework for data management and machine learning on intracranial EEG data obtained from patients suffering from epilepsy.
Drs. Schevon and Emerson have initiated a trial of a dense, two-dimensional microelectrode array that can record over long periods at a sampling rate of up to 30 kHz per channel. To date, approximately 30 TB of data have been collected. The large volume of complex EEG data compels us to rethink how we deal with this "data avalanche." Designing a data center for storage and analysis is particularly challenging because the traditional approach of storing data on a single server does not allow machine learning algorithms to run within a reasonable time. Further, because of the conditions under which the data are collected, noise of multiple types and from multiple sources is pervasive; the data must be extensively cleaned and potential seizure precursors carefully labeled. The project is investigating how to develop a cluster architecture (using Apache Hadoop) for the EEGMine Data Center that incorporates reliable storage and backup, developing a library of machine learning algorithms (the EEGMine-ML library), and addressing the scalability of those algorithms, potentially by leveraging the MapReduce programming paradigm.
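To illustrate the MapReduce paradigm mentioned above, the following is a minimal single-process sketch in plain Python, not the project's Hadoop implementation. It assumes the recording has been split into chunks of the form (channel, samples); the function names and the choice of statistic (per-channel mean and standard deviation) are hypothetical and chosen only because partial sums combine cleanly across chunks.

```python
from collections import defaultdict
from functools import reduce
import math


def map_chunk(chunk):
    """Map step: turn one (channel, samples) chunk into partial sums."""
    channel, samples = chunk
    n = len(samples)
    total = sum(samples)
    total_sq = sum(x * x for x in samples)
    return [(channel, (n, total, total_sq))]


def combine(a, b):
    """Merge two partial results; associative, so order does not matter."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def reduce_by_channel(pairs):
    """Shuffle and reduce step: group partial sums by channel and merge."""
    grouped = defaultdict(list)
    for channel, partial in pairs:
        grouped[channel].append(partial)
    return {ch: reduce(combine, parts) for ch, parts in grouped.items()}


def finalize(stats):
    """Convert (count, sum, sum of squares) into (mean, std)."""
    n, total, total_sq = stats
    mean = total / n
    variance = max(total_sq / n - mean * mean, 0.0)  # guard rounding error
    return mean, math.sqrt(variance)


def channel_statistics(chunks):
    """Run map, then reduce, then finalize over all chunks."""
    pairs = [pair for chunk in chunks for pair in map_chunk(chunk)]
    return {ch: finalize(st) for ch, st in reduce_by_channel(pairs).items()}
```

Because each chunk is mapped independently and the merge is associative, the map step could in principle run on whichever node stores the chunk, with only the small partial sums sent over the network.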
This research will have immediate impact on both epilepsy and computer science research. Because human-derived microelectrode EEG data are unique and valuable, the seizure prediction community would benefit from data sharing and long-distance collaboration. The most practical means of sifting through terabytes of complex EEG data is to combine distributed storage on a cluster with local processing that prepares the data and generates metadata for use as inputs to machine learning algorithms, thus enabling identification of physiologically significant patterns. From an education perspective, the project will benefit the EWarn Research Group, which is part of CCLS and CUMC, by training its members in signal processing, machine learning, and the basics of EEG.
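As a rough illustration of the local processing step described above, the sketch below reduces one channel's raw samples to a short list of per-window metadata records. The features (root mean square amplitude and line length), the function name, and the record layout are assumptions made for this example; the project's actual feature set is not specified here.

```python
import math


def window_features(samples, sampling_rate_hz, window_seconds):
    """Summarize one channel as per-window records (hypothetical layout).

    Splits the samples into non-overlapping windows and computes two
    simple features for each window; a trailing partial window is dropped.
    """
    window = int(sampling_rate_hz * window_seconds)
    if window < 1:
        raise ValueError("window must contain at least one sample")
    records = []
    for start in range(0, len(samples) - window + 1, window):
        segment = samples[start:start + window]
        # Root mean square amplitude of the window.
        rms = math.sqrt(sum(x * x for x in segment) / window)
        # Line length: sum of absolute differences of consecutive samples.
        line_length = sum(abs(b - a) for a, b in zip(segment, segment[1:]))
        records.append({
            "start_s": start / sampling_rate_hz,
            "rms": rms,
            "line_length": line_length,
        })
    return records
```

Records like these are far smaller than the raw signal, so they can be collected from the nodes that hold the data and passed to a learning algorithm without moving the raw samples.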
Website Address: http://www1.ccls.columbia.edu/~dutta/EEGMine