In any technology field, there is always some set of core concepts that dominate the agenda for research and practice. As a field progresses, that set evolves, with new concepts replacing old ones. This project will model the dynamic social system through which technological concepts come to be perceived and understood. The primary research question is: How do the actions and opinions of individual actors give rise to more globally accepted concepts in a technology field, and how do such micro-macro dynamics change over time? That question will be answered through an iterative process in which computational analysis of text is used to populate a model of salient aspects of social dynamics. Interpretations based on that model will then be used to guide refinement and enrichment of the computational analysis. This "computationally supported case study" process offers a promising new approach to building and testing theories for social science research. By coupling focused extraction and classification of high-volume, multi-source data with a multi-concept computational analysis strategy, the project will create a new middle ground between today's richly analyzed but narrowly focused case studies and the presently available scalable but relatively shallow techniques, such as citation analysis. The project will focus on Information Technology as an exemplar field, leveraging the broad and accessible discourse on that topic found in vast collections of formal and informal sources. Text analytic techniques from information retrieval and computational linguistics will be adapted to detect specific concepts and to connect those concepts with the people who write about them and with the attitudes those people express. An early goal will be to explain the extent to which the popularity of concepts results from social actors' actions and opinions. Contributions of this research are expected to include a scalable analytical framework that can affordably be extended to a broad range of other technologies, and a deeper understanding of the leverage that emerging text analytic techniques can provide for new approaches to social science research.
We have developed computational systems for understanding the dynamic social system and processes underlying the development, diffusion, and use of Information Technology (IT) innovations. We have achieved outcomes in three intertwined areas.

First, we developed and evaluated a human-supervised computational approach for understanding the complex relationships among IT innovations. Contemporary strategies for classifying technologies are costly and time-consuming because they rely on expert manual evaluation of the technologies' features and functions, and as the number of new ITs increases dramatically, conventional classification techniques cannot keep up. Given the wide availability of machine-readable news articles on various ITs, we instead turned to the texts that discuss the technologies. By analyzing language patterns such as co-occurrences of key terms and word frequencies, we found that similar technologies tend to be described in similar language, and thus that automatically mining the texts can efficiently classify technologies into meaningful categories. Specifically, we automatically classified 50 information technologies by computationally analyzing the technology news of the past 20 years. This automated text-mining approach proved to be efficient, reliable, and scalable, suggesting that the language used to describe ITs carries important signals that can help policy-makers, developers, and users untangle the complex relationships among the technologies. This computational linguistics approach promises to significantly ease the tasks of tracking, classifying, and making sense of innovations in science and technology.

Second, we applied and advanced manual and automated content analysis to investigate the role of human values in the development, diffusion, use, and regulation of IT innovations. We developed and evaluated the Meta-Inventory for Human Values (MIHV), the first instrument for coding human values developed for and by content analysis, which integrates 12 value inventories. The MIHV was successfully tested on testimony about net neutrality prepared for hearings held by the U.S. Congress and the FCC. We found two statistically significant differences between the values invoked by proponents and opponents of net neutrality: proponents invoked the value of innovation significantly more often than opponents, and opponents invoked the value of wealth significantly more often than proponents. Based on this difference, we conclude that proponents of net neutrality are more motivated by long-term thinking (since, for IT companies, innovation in the long run typically leads to wealth), while opponents are more motivated by short-term thinking. In addition, we developed new techniques for manually identifying instances of specific human values (such as freedom, honor, innovation, justice, social order, and wealth) reflected in public discourse on IT innovations. We further created what we believe to be the first automated classification algorithms that can identify instances of specific human values; when tested on new public statements about IT regulation, these algorithms automatically detect about two-thirds of all value instances.
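
To make the first outcome more concrete, the minimal sketch below shows the general kind of text-similarity analysis involved: each technology is represented by the news language that describes it, pairwise textual similarity is computed, and similar technologies are grouped together. The toy corpus, category count, and library choice (scikit-learn 1.2 or later) are illustrative assumptions, not the project's actual data or code.

```python
# Illustrative sketch only: group technologies by the similarity of the
# language used to describe them (toy stand-ins for pooled news coverage).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.cluster import AgglomerativeClustering

tech_docs = {
    "cloud computing": "on-demand servers storage virtualization data center services",
    "grid computing":  "distributed servers clusters virtualization scientific computing",
    "RFID":            "tags readers supply chain tracking inventory sensors",
    "NFC":             "contactless tags readers mobile payments proximity sensors",
}

names = list(tech_docs)
X = TfidfVectorizer().fit_transform(tech_docs.values())  # word-frequency signals
similarity = cosine_similarity(X)                        # pairwise textual similarity

# Technologies whose coverage uses similar language fall into the same cluster.
clusters = AgglomerativeClustering(
    n_clusters=2, metric="precomputed", linkage="average"
).fit_predict(1 - similarity)

for name, cluster in zip(names, clusters):
    print(cluster, name)
```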
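
Similarly, a minimal sketch of the second outcome's automated value-instance detection, framed as supervised text classification evaluated by recall (the kind of metric behind the "about two-thirds" figure). The labeled sentences below are invented placeholders, not MIHV-coded data, and the model choice is an assumption rather than the project's algorithm.

```python
# Illustrative sketch only: detect sentences that invoke one specific value
# ("innovation") with a supervised classifier; the labels here are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.pipeline import make_pipeline

train_sentences = [
    "these rules will stifle innovation in new online services",
    "open networks let startups innovate without asking permission",
    "the proposal threatens carrier revenue and shareholder wealth",
    "regulation protects consumers from unfair pricing practices",
]
train_labels = [1, 1, 0, 0]  # 1 = sentence invokes the value "innovation"

classifier = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
classifier.fit(train_sentences, train_labels)

test_sentences = ["blocking traffic would slow innovation",
                  "higher fees raise broadband profits"]
test_labels = [1, 0]
predictions = classifier.predict(test_sentences)

# Recall is the share of true value instances the classifier finds,
# reported on held-out statements about IT regulation.
print("recall:", recall_score(test_labels, predictions))
```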
Third, we developed computational system components for finding opinions about IT innovations. Social scientists need automatic sentiment analysis in order to test hypotheses about how sentiment influences the innovation process using broad-scale data. Current methods, however, rely on computationally intensive techniques to detect, first, which pieces of a text express sentiment and, second, what that sentiment is. While the error rates of each of these tasks are individually acceptable, pipelining them compounds their errors and yields end-to-end accuracy below acceptable levels (for example, two stages that are each 85% accurate compound to roughly 72% accuracy when chained). We demonstrated that more reliable named-entity techniques can be used to diagnose the opinion-innovation relationship, and we demonstrated the effective use of crowdsourced data annotation for this task. This entity-first technique is a more reliable first step toward fully automatic sentiment analysis and provides an intermediate point for human analysis of appropriately annotated text, allowing social scientists to focus on passages in which the opinion- and sentiment-bearing components have already been automatically segregated.

Over the course of the project, five faculty members at the University of Maryland, two international visiting scholars (from China and Japan), nine graduate students, and 13 undergraduate students contributed to the work, producing five journal papers, three completed dissertations, 26 conference papers, and two book chapters. In addition to the project's website, we built a community website (http://stick.ischool.umd.edu/community) with two main objectives: (1) to share the data and research findings from this project with the general public and other researchers interested in science and technology innovation; and (2) to encourage other innovation researchers to join the community, share their own data and analyses, and interact with one another. This initiative goes beyond conventional dissemination mechanisms to establish and sustain a community-based research infrastructure that can persist long after the project officially ends.

The capability to identify, understand, and predict trends in science and technology offers tremendous opportunities to improve economic and social welfare. This research advances the frontiers of knowledge about the nation's ecosystem for IT innovation, offering opportunities for better-informed innovation policy-making and investment.
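
As a concrete illustration of the entity-first opinion analysis described in the third outcome, the sketch below uses a reliable list of named innovations to anchor opinion detection and to route only the relevant sentences to human (or crowdsourced) annotation. The innovation list, cue words, and sample text are hypothetical, and simple string matching stands in for a full named-entity recognizer.

```python
# Illustrative sketch only: use trusted innovation names to anchor opinion
# detection, then hand the selected sentences to human annotators.
import re

INNOVATIONS = {"cloud computing", "RFID", "net neutrality"}
OPINION_CUES = {"promising", "overhyped", "breakthrough", "worthless"}

def sentences_for_annotation(text):
    """Yield (sentence, innovations, cues) triples worth sending to annotators."""
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        lowered = sentence.lower()
        mentioned = [name for name in INNOVATIONS if name.lower() in lowered]
        cues = [word for word in OPINION_CUES if word in lowered]
        if mentioned and cues:  # an entity mention anchors the opinion target
            yield sentence, mentioned, cues

sample = ("Analysts call cloud computing a promising shift. "
          "RFID tags were installed in every store. "
          "Critics say net neutrality rules are overhyped.")
for sentence, innovations, cues in sentences_for_annotation(sample):
    print(innovations, cues, "->", sentence)
```

Sentences that mention an innovation but carry no opinion cue (the second sentence above) are filtered out; that filtering is the segregation step that lets analysts concentrate on opinion-bearing text.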