Cross-validation of neural network applications for automatic new topic identification


JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, vol.59, no.3, pp.339-362, 2008 (SCI-Expanded)


The purpose of this study is to report results from experiments designed to investigate the cross-validation of an artificial neural network application that automatically identifies topic changes in Web search engine user sessions, using data logs of different Web search engines for training and testing the neural network. Sample data logs from the FAST and Excite search engines are used in this study. The results show that topic shifts and continuations in a user session on one Web search engine can be identified with neural networks that are trained on the data log of a different Web search engine. Although FAST and Excite users differ with respect to some user characteristics (e.g., number of queries per session, number of topics per session), the results demonstrate that users of both search engines display similar characteristics as they shift from one topic to another during a single search session. The key finding of this study is that a neural network trained on a selected data log could be universal; that is, it could be applicable to all Web search engine transaction logs regardless of the source of the training data log.
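The cross-log design described above (train on one engine's log, test on another's) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two features (a scaled time gap between consecutive queries and a term-overlap ratio), the toy values in `LOG_A` and `LOG_B`, and the single logistic neuron, which stands in for the study's neural network since its architecture and inputs are not given in this abstract.

```python
import math

# Hypothetical toy logs, invented for illustration only.
# Each record: (scaled_time_gap, term_overlap, label)
# label 1 = topic shift, label 0 = topic continuation.
LOG_A = [
    (0.9, 0.0, 1), (0.8, 0.1, 1), (0.7, 0.0, 1),
    (0.1, 0.8, 0), (0.2, 0.9, 0), (0.1, 0.6, 0),
]
LOG_B = [
    (0.85, 0.05, 1), (0.6, 0.1, 1),
    (0.15, 0.7, 0), (0.3, 0.9, 0),
]


def predict(weights, bias, features):
    """Return the neuron's estimated probability of a topic shift."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def train(log, epochs=500, learning_rate=0.5):
    """Fit one logistic neuron with batch gradient descent."""
    weights, bias = [0.0, 0.0], 0.0
    n = len(log)
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for gap, overlap, label in log:
            error = predict(weights, bias, (gap, overlap)) - label
            grad_w[0] += error * gap
            grad_w[1] += error * overlap
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def accuracy(model, log):
    """Fraction of records in `log` that the model labels correctly."""
    weights, bias = model
    correct = sum(
        (predict(weights, bias, (gap, overlap)) >= 0.5) == bool(label)
        for gap, overlap, label in log
    )
    return correct / len(log)


# Cross-log evaluation: the model never sees the log it is tested on.
print("train on A, test on B:", accuracy(train(LOG_A), LOG_B))
print("train on B, test on A:", accuracy(train(LOG_B), LOG_A))
```

Evaluating in both directions mirrors the idea of swapping which engine supplies the training log. A high score on toy data like this says nothing about real transaction logs; it only shows the shape of the procedure.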