The world is increasingly awash in data. As more and more human activities move online, and as a growing array of connected devices becomes an integral part of daily life, the amount and diversity of data being generated continue to explode. According to one estimate, more than a zettabyte (one billion terabytes) of new information was created in 2010 alone, with the rate of new information increasing by roughly 60% annually. This data takes many forms: free-form tweets, text messages, blogs and documents; structured streams produced by computers, sensors and scientific instruments; and media such as images and video. Buried in this flood of data are the keys to solving huge societal problems, improving productivity and efficiency, creating new economic opportunities, and unlocking new discoveries in medicine, science and the humanities. However, raw data alone is not sufficient; we can only make sense of our world by turning this data into knowledge and insight. This challenge, known as the Big Data problem, cannot be solved by the straightforward application of current data analytics technology because of the sheer volume and diversity of information. Rather, solving it requires throwing away old preconceptions about data management and breaking down many of the traditional boundaries in and around Computer Science and related disciplines.
The Algorithms, Machines, and People (AMP) expedition at the University of California, Berkeley is addressing this challenge head-on. AMP is a collaboration of researchers with a wide range of data-related expertise, committed to working together to create a new data analytics paradigm. AMP will produce fundamental innovations in and a deep integration of three very different types of computational resources:
1. Algorithms: new machine-learning and analysis methods that can operate at large scale and can give flexible tradeoffs between timeliness, accuracy, and cost.
2. Machines: systems infrastructure that allows programmers to easily harness the power of scalable cloud and cluster computing for making sense of data.
3. People: crowdsourcing human activity and intelligence to create hybrid human/computer solutions to problems not solvable by today's automated data analysis technologies alone.
AMP research will be guided and evaluated through close collaboration with domain experts in key societal applications, including cancer genomics and personalized medicine, large-scale sensing for traffic prediction and environmental monitoring, urban planning, and network security. Advances pioneered by the project will be made widely available through the development of the Berkeley Data Analysis System (BDAS), an open-source software platform that seamlessly blends Algorithm, Machine and People resources to solve Big Data problems.
For more information visit http://amplab.cs.berkeley.edu