To build domain-specific, web-scale video digital libraries, it is critical to identify and extract information of interest (termed information segments) efficiently and automatically. For instance, by collecting only so-called academic videos and their information segments from the Web, one can build a next-generation digital library similar to CiteSeer or Google Scholar, but one that archives and indexes academic videos instead of academic papers. Toward this goal, we conduct a preliminary study on identifying and extracting latent information segments from domain-specific videos on the Web. The key emphasis is on how to unearth diverse metadata and associated data from video contents and from the web pages from which the videos are downloaded. Techniques from machine learning (e.g., LDA), data extraction and integration (e.g., wrapper/mediator), natural language processing (e.g., named entity recognition and extraction), and multimedia processing (e.g., near-duplicate detection) are evaluated, applied, and extended as appropriate. The scalability of these techniques over large volumes of video data is also being explored.