The digital revolution of the 21st century has brought many advances in technology, along with a dramatic increase in the number and size of digital records. This growth has outpaced the organizational and preservation practices of existing archives, especially where repository staff, rather than depositors, organize deposited collections for long-term preservation and access. In 2011 the National Science Foundation (NSF) issued a mandate requiring every researcher who applies for NSF funding to submit a Data Management Plan explaining exactly how the collected research data will be stored, secured, preserved, and accessed. A 2013 memorandum from the White House Office of Science and Technology Policy specified that scientific data collected with federal funds must be made openly available for public use. These mandates have overwhelmed the existing infrastructure of digital archives, especially archives focused on endangered languages. This project will investigate a potentially transformative methodology and set of best practices that address this challenge by enabling researchers in linguistics and language documentation to organize and curate their own data collections. If successful, the results will be broadly useful to endangered language researchers as well as to related disciplines such as anthropology, sociology, and the digital humanities. A transformation that facilitates the archiving of, and access to, endangered language and other social science data would have a major positive impact on data dissemination and would foster further scholarship through data reuse. Exploring an approach that minimizes financial cost while maximizing open access offers a high-value response to a major challenge of the digital era.
The Archive of the Indigenous Languages of Latin America (AILLA) is ideally positioned to carry out this project and confront this data logjam. AILLA is the foremost example of a regional digital language archive whose holdings are both large and easily accessible. Founded in 2001 at the University of Texas at Austin, AILLA covers all of Latin America, which is home to several hundred endangered languages and to numerous language documentation projects, many funded by NSF and other federal agencies. With portals in both English and Spanish, AILLA makes its holdings accessible to a wider audience than archives with English-only portals. Project activities will target a specific set of data collections, first assessing the number and size of files and the level of organization of each deposit, and then curating them. To gauge the extent of the data backlog facing the discipline more broadly, the project will survey fellow member archives of the Digital Endangered Languages and Musics Archives Network (DELAMAN) about their workflows, repository organization, and types of backlogged collections. The resulting information will be synthesized into a shared set of best practices, metadata, and methodologies, refined in consultation with DELAMAN archivists, and disseminated through video tutorials on YouTube in both English and Spanish. This training has the potential to enable digital archive staff to add new data collections to their repositories quickly and easily. Faster and easier ingestion will both streamline repository workflows and increase the accessibility and discoverability of data, helping to fulfill the open access mandate.
Additionally, the project will co-convene a gathering of representatives of DELAMAN archives in 2018 at the Institute on Collaborative Language Research (CoLang) to highlight the newly developed methodologies and to connect archivists with researchers who hold unarchived data collections. Archivist and Co-PI Susan Smythe Kung will also conduct a CoLang workshop to disseminate results and train researchers. Other deliverables include a presentation at the Linguistic Society of America, designed to reach a broad audience in the language sciences.