Knowledge Commons of National Science Library, CAS
Exploring features for automatic identification of news queries through query logs
ZHANG Xiaojuan; LI Jian (Email: lijian@swu.edu.cn)
2014-12-25
Source Publication | Chinese Journal of Library and Information Science
Volume | 7 | Issue | 4 | Pages | 31-45 |
Abstract | Purpose: Existing research on predicting queries with news intent has extracted classification features from external knowledge bases. This paper shows how features extracted from query logs alone, without any external resources, can be applied to the automatic identification of news queries.
Design/methodology/approach: First, we manually labeled 1,220 news queries from Sogou.com. Based on an analysis of these queries, we identified three features of news queries, covering query content, time of query occurrence and user click behavior. We then used 12 effective features proposed in the literature as a baseline and conducted experiments with a support vector machine (SVM) classifier. Finally, we compared the impact of the features used in this paper on the identification of news queries.
Findings: Compared with the baseline features, adding the three newly identified features improved the F-score from 0.6414 to 0.8368; of these, the burst point (bst) was the most effective for predicting news queries. In addition, query expression (qes) was more useful than query terms, and among the click behavior-based features, news URL was the most effective.
Research limitations: Because all features are extracted from query logs, the analysis may yield limited results. In addition, the segmentation tool used in this study has mainly been applied to long texts rather than short queries.
Practical implications: The research will help general-purpose search engines address search intents for news events.
Originality/value: Our approach offers a new perspective on recognizing queries with news intent without relying on large news corpora such as blogs or Twitter. |
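To illustrate two ideas in the abstract, the following Python sketch computes a burst-point style feature from a query's daily frequencies and the F-score used to compare feature sets. It is a minimal sketch under stated assumptions, not the authors' implementation. The burst rule (a day counts as a burst point when its frequency exceeds the mean plus `k` population standard deviations, with `k = 2.0` by default) is a hypothetical stand-in, since this record does not give the paper's definition of bst. The reported F-score is assumed to be the standard F1 (harmonic mean of precision and recall) for the news-query class. The function names `burst_points` and `f_score` are hypothetical.

```python
from statistics import mean, pstdev


def burst_points(daily_counts, k=2.0):
    """Return indices of days whose count exceeds mean + k * population std.

    Hypothetical burst rule for illustration only; the paper's own
    definition of the burst point (bst) feature may differ.
    """
    mu = mean(daily_counts)
    sigma = pstdev(daily_counts)
    threshold = mu + k * sigma
    return [day for day, count in enumerate(daily_counts) if count > threshold]


def f_score(y_true, y_pred, positive=1):
    """Standard F1 for the positive (news-query) class.

    Assumes the reported F-score is F1; returns 0.0 when there are no
    true positives, which also avoids division by zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical daily frequencies of one query; day 5 spikes sharply.
    counts = [10, 12, 11, 9, 10, 95, 40, 12]
    print(burst_points(counts))  # [5]
    # 1 = news query, 0 = non-news query (toy labels, not the paper's data).
    print(round(f_score([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]), 4))  # 0.6667
```

With these toy counts the threshold is roughly 81.3, so only day 5 is flagged. A binary feature such as "has at least one burst point" could then be passed to a classifier such as an SVM alongside content and click-behavior features; the threshold rule is a simple choice made here for readability, not a claim about the paper's design.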
Subtype | Research Paper |
Keyword | Query intent; News query; News intent; Query classification; Automatic identification |
Subject Area | Journalism and Communication; Library, Information and Documentation Science |
Indexed By | Other |
Project Number | Grant No.: 2011QNCB28 |
Language | English |
Funding Organization | Social Science Planning Foundation of Chongqing |
Document Type | Journal article |
Identifier | http://ir.las.ac.cn/handle/12502/7623 |
Collection | Journal of Data and Information Science_Chinese Journal of Library and Information Science-2014 |
Corresponding Author | Jian Li (Email: lijian@swu.edu.cn) |
Affiliation | School of Computer and Information Science, Southwest University, Chongqing 400715, China |
Recommended Citation GB/T 7714 | ZHANG Xiaojuan, LI Jian. Exploring features for automatic identification of news queries through query logs[J]. Chinese Journal of Library and Information Science, 2014, 7(4): 31-45. |
APA | Zhang, X., & Li, J. (2014). Exploring features for automatic identification of news queries through query logs. Chinese Journal of Library and Information Science, 7(4), 31-45. |
MLA | Zhang, Xiaojuan, and Jian Li. "Exploring features for automatic identification of news queries through query logs." Chinese Journal of Library and Information Science 7.4 (2014): 31-45. |
Files in This Item:
File Name/Size | DocType | Version | Access | License |
31-Zhang Xiaojuan.pd (5182 KB) | Journal article | Published version | Open access | CC BY-NC-ND |
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.