This subproject is one of many research subprojects utilizing the resources provided by a Center grant funded by NIH/NCRR. The subproject and investigator (PI) may have received primary funding from another NIH source, and thus could be represented in other CRISP entries. The institution listed is for the Center, which is not necessarily the institution for the investigator.

Constructing time decompositions of time-stamped documents is an important first step in extracting temporal information from a document set. Efficient algorithms are described for computing optimal lossy decompositions of a given document set, where the loss of information is constrained to be within a specified bound. A novel and efficient algorithm is proposed for computing the information loss values required to construct optimal lossy decompositions. Experimental results are reported comparing optimal lossy decompositions with equal-length decompositions on several parameters, including information loss. In particular, the results show that optimal lossy decompositions outperform equal-length decompositions by preserving more of the information content of the underlying document set. The results also demonstrate that permitting even small amounts of variability in the length of the subintervals of a decomposition captures more of the temporal information content of a document set than equal-length decompositions do.