Exploring features for automatic identification of news queries through query logs
ZHANG Xiaojuan; LI Jian; Jian Li (Email: lijian@swu.edu.cn)
2014-12-25
Journal: Chinese Journal of Library and Information Science
Volume: 7  Issue: 4  Pages: 31-45
Abstract
Purpose: Existing research on predicting queries with news intent has extracted classification features from external knowledge bases. This paper shows how features extracted from query logs alone can be used to identify news queries automatically, without any external resources.

Design/methodology/approach: First, we manually labeled 1,220 news queries from Sogou.com. Based on an analysis of these queries, we identified three features of news queries in terms of query content, time of query occurrence, and user click behavior. We then used 12 effective features proposed in the literature as the baseline and ran experiments with a support vector machine (SVM) classifier. Finally, we compared the impact of the features used in this paper on the identification of news queries.
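The abstract names time- and click-based features but does not define how they are computed. As a rough illustration only, the sketch below derives two toy features from query log data: a burst flag over a query's daily counts (time of occurrence) and the share of clicks that land on news sites (click behavior). The burst rule (a daily count exceeding the mean plus `k` standard deviations of the preceding `window` days), the `NEWS_DOMAINS` set, and all function names are hypothetical assumptions, not the authors' definitions of bst or of the news URL feature.

```python
from statistics import mean, pstdev
from urllib.parse import urlparse

# Hypothetical news domains; the paper's actual list of news URLs is not given here.
NEWS_DOMAINS = {"news.example.com", "news.example.cn"}


def has_burst(daily_counts, window=7, k=3.0):
    """Illustrative burst test (assumption, not the paper's bst definition).

    Flags a burst if any day's count exceeds mean + k * std of the
    preceding `window` days. The std is floored at 1.0 so that a perfectly
    flat history does not trigger on a trivial increase.
    """
    for i in range(window, len(daily_counts)):
        history = daily_counts[i - window:i]
        mu, sigma = mean(history), pstdev(history)
        if daily_counts[i] > mu + k * max(sigma, 1.0):
            return True
    return False


def news_click_ratio(clicked_urls):
    """Fraction of clicked URLs whose host is in NEWS_DOMAINS (0.0 if no clicks)."""
    if not clicked_urls:
        return 0.0
    hosts = [urlparse(u).netloc for u in clicked_urls]
    return sum(h in NEWS_DOMAINS for h in hosts) / len(hosts)


def feature_vector(daily_counts, clicked_urls):
    """Hypothetical two-dimensional feature vector: [burst flag, news click ratio]."""
    return [1.0 if has_burst(daily_counts) else 0.0, news_click_ratio(clicked_urls)]
```

For example, `feature_vector([10, 11, 9, 10, 12, 10, 11, 80], ["http://news.example.com/a"])` returns `[1.0, 1.0]`. In a full pipeline such vectors would be concatenated with baseline features and passed to an SVM classifier; that step is omitted here.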

Findings: Compared with the baseline features, using the three newly identified features improved the F-score from 0.6414 to 0.8368; among them, the burst point (bst) was the most effective for predicting news queries. In addition, query expression (qes) was more useful than query terms, and among the click behavior-based features, news URL was the most effective.
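The abstract reports F-scores without stating the weighting between precision and recall. Assuming the common balanced F1 (beta = 1), i.e. the harmonic mean of precision and recall, a minimal helper computing it from confusion counts might look like this; the function name and signature are illustrative, not taken from the paper.

```python
def f_score(tp, fp, fn, beta=1.0):
    """F-beta score from true positives, false positives, and false negatives.

    beta = 1.0 gives the balanced F1 (assumed here; the abstract does not say).
    Returns 0.0 when precision and recall are both zero.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

For instance, with 8 correctly identified news queries, 2 false alarms, and 2 misses, precision and recall are both 0.8, so `f_score(8, 2, 2)` is 0.8.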

Research limitations: Analyses based only on features extracted from query logs may yield limited results. In addition, the word segmentation tool used in this study is more widely applied to long texts than to short queries.

Practical implications: This research can help general-purpose search engines address users' search intents regarding news events.

Originality/value: Our approach offers a new perspective on recognizing queries with news intent without relying on large corpora such as blogs or Twitter.
Article type: Research Paper
Keywords: Query Intent; News Query; News Intent; Query Classification; Automatic Identification
Subject areas: Journalism and Communication; Library, Information and Documentation Science
Indexed in: Other
Project number: Grant No. 2011QNCB28
Language: English
Funder: This work is supported by the Social Science Planning Foundation of Chongqing.
Document type: Journal article
Item identifier: http://ir.las.ac.cn/handle/12502/7623
Collection: Journal of Data and Information Science_Chinese Journal of Library and Information Science-2014
Corresponding author: Jian Li (Email: lijian@swu.edu.cn)
Affiliation: School of Computer and Information Science, Southwest University, Chongqing 400715, China
Recommended citation
GB/T 7714
ZHANG Xiaojuan, LI Jian, Jian Li. Exploring features for automatic identification of news queries through query logs[J]. Chinese Journal of Library and Information Science, 2014, 7(4): 31-45.
APA: ZHANG Xiaojuan, LI Jian, & Jian Li. (2014). Exploring features for automatic identification of news queries through query logs. Chinese Journal of Library and Information Science, 7(4), 31-45.
MLA: ZHANG Xiaojuan, et al. "Exploring features for automatic identification of news queries through query logs." Chinese Journal of Library and Information Science 7.4 (2014): 31-45.
Files in this item
File name/size | Document type | Version | Access | License
31-Zhang Xiaojuan.pd (5182KB) | Journal article | Published version | Open access | CC BY-NC-ND
Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.