Scientific research relies on an infrastructure of networked computers (cyberinfrastructure), which must be secured against malicious threats and intrusions. For instance, the privacy of clinical trial subjects, the integrity of medical data, and the safety of tissue samples and cultures can be compromised if the cyberinfrastructure of a university hospital is breached. Information Security Officers (ISOs) are in charge of maintaining a secure cyberinfrastructure for research, but their task is complicated by the speed at which new threats emerge, and by the proliferation of software (often written in different languages) and obsolete or specialized hardware among labs and research centers. In their effort to fight intrusions and vulnerabilities, ISOs search documentation and online discussion forums for answers, but this is a time-consuming process. To streamline it, an interdisciplinary team of researchers is developing a computer system that automatically and continuously harvests that knowledge from online sources and organizes it into a network of concepts and relations that can be queried and reasoned upon to keep ISOs a step ahead of malicious attacks.
The project develops a number of interconnected modules: 1) A sub-system that identifies concepts and relations in software documentation, advisory bulletins, and online technical forums, and then retrieves that information using state-of-the-art natural language processing techniques; 2) An ontology for cybersecurity, which is a knowledge representation system that organizes the retrieved concepts and relations into a logical network, allowing implicit knowledge to be extracted by means of automatic reasoning algorithms; and 3) A querying interface, which allows ISO staff to access the knowledge represented in the ontology to find answers to their questions about cybersecurity. This innovative approach to cybersecurity extends the use of ontologies in the biomedical field, leveraging the metaphor of vulnerabilities in information systems as viruses or infections. Even though the initial stages in the creation of the ontology will involve curation by human experts, the researchers expect that the system can update itself automatically thanks to the use of information retrieval techniques, thereby overcoming one of the known bottlenecks in the usefulness of ontologies.
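As a rough illustration of how an ontology lets implicit knowledge be derived from explicit facts, the following minimal Python sketch stores concepts and relations as triples and answers a query by following "is_a" links transitively. It is not the project's implementation: the triples, the relation names (is_a, affected_by), and the function names are all hypothetical, and a real system would likely rely on a dedicated ontology language and reasoner.

```python
from collections import defaultdict

# Hypothetical (subject, relation, object) triples, standing in for what an
# extraction module might harvest from documentation and forums.
TRIPLES = [
    ("ExampleBufferOverflow", "is_a", "MemoryCorruption"),
    ("MemoryCorruption", "is_a", "Vulnerability"),
    ("ExampleImagingTool", "affected_by", "ExampleBufferOverflow"),
]


def build_index(triples):
    """Map each (subject, relation) pair to the set of its objects."""
    index = defaultdict(set)
    for subject, relation, obj in triples:
        index[(subject, relation)].add(obj)
    return index


def ancestors(index, concept):
    """Return every class reachable from `concept` through is_a links."""
    seen = set()
    stack = [concept]
    while stack:
        node = stack.pop()
        for parent in index.get((node, "is_a"), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def affected_by_kind(index, triples, kind):
    """Return subjects affected by anything that is, directly or implicitly, a `kind`."""
    result = set()
    for subject, relation, obj in triples:
        if relation == "affected_by" and (obj == kind or kind in ancestors(index, obj)):
            result.add(subject)
    return result
```

No triple states that ExampleImagingTool is affected by a Vulnerability, yet a query such as affected_by_kind(build_index(TRIPLES), TRIPLES, "Vulnerability") can recover that fact by chaining the is_a relations, which is the kind of inference the querying interface would expose to ISO staff.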
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.