National Science Library, Chinese Academy of Sciences: Institutional Repository
Title: Exploring features for automatic identification of news queries through query logs
Author: ZHANG Xiaojuan; LI Jian
Source: Chinese Journal of Library and Information Science
Issued Date: 2014-12-25
Volume: 7, Issue: 4, Pages: 31-45
Keyword: Query intent; News query; News intent; Query classification; Automatic identification
Subject: Journalism and Communication; Library, Information and Documentation Science
Indexed Type: Other
Corresponding Author: Jian Li (Email: lijian@swu.edu.cn)
DOC Type: Research Paper
Abstract: Purpose: Existing research on predicting queries with news intent has extracted classification features from external knowledge bases. This paper shows how features extracted from query logs can be applied to the automatic identification of news queries without using any external resources.

Design/methodology/approach: First, we manually labeled 1,220 news queries from Sogou.com. Based on an analysis of these queries, we then identified three features of news queries in terms of query content, time of query occurrence, and user click behavior. Next, we used 12 effective features proposed in the literature as a baseline and conducted experiments with a support vector machine (SVM) classifier. Finally, we compared the impact of the features used in this paper on the identification of news queries.
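As a rough illustration of what log-derived features of these three kinds might look like, the following is a minimal sketch only. The record layout `(query, day_index, clicked_url)`, the burst proxy (peak daily count divided by mean daily count), and the "host name contains `news`" heuristic are all assumptions made for this example; the abstract does not give the paper's actual feature definitions or the Sogou log format.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical log records: (query, day_index, clicked_url).
# This layout is an assumption, not the real Sogou log format.
LOG = [
    ("earthquake", 0, "http://news.example.com/a"),
    ("earthquake", 3, "http://news.example.com/b"),
    ("earthquake", 3, "http://news.example.com/c"),
    ("earthquake", 3, "http://www.example.org/wiki"),
    ("weather", 0, "http://www.example.org/forecast"),
    ("weather", 1, "http://www.example.org/forecast"),
    ("weather", 2, "http://www.example.org/forecast"),
    ("weather", 3, "http://www.example.org/forecast"),
]


def burst_score(days, n_days):
    """Peak daily frequency divided by mean daily frequency (assumed proxy)."""
    counts = Counter(days)
    mean_per_day = len(days) / n_days
    return max(counts.values()) / mean_per_day


def news_click_ratio(urls):
    """Share of clicks whose host name contains 'news' (assumed heuristic)."""
    hits = sum(1 for url in urls if "news" in urlparse(url).netloc)
    return hits / len(urls)


def features(query, log, n_days):
    """Build one feature dict per query: content, time, and click behavior.

    Assumes the query occurs at least once in the log.
    """
    rows = [row for row in log if row[0] == query]
    days = [row[1] for row in rows]
    urls = [row[2] for row in rows]
    return {
        "query_length": len(query),             # query content
        "burst": burst_score(days, n_days),     # time of occurrence
        "news_click_ratio": news_click_ratio(urls),  # click behavior
    }


earthquake = features("earthquake", LOG, n_days=4)
weather = features("weather", LOG, n_days=4)
```

In this toy log, the "earthquake" query concentrates on one day and mostly leads to clicks on a news host, while "weather" is spread evenly and never does. Feature vectors like these could then be passed to any SVM implementation for classification.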

Findings: Compared with the baseline features, the F-score improved from 0.6414 to 0.8368 when the three newly identified features were used. Among them, the burst point (bst) was the most effective for predicting news queries. In addition, query expression (qes) was more useful than query terms, and among the click behavior-based features, news URL was the most effective.
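For readers who want to relate the two reported scores, here is a small sketch. It assumes the balanced F1 form of the F-score (the abstract does not say which variant was used), and the function names and the confusion counts in the usage line are hypothetical; only the two score values come from the abstract.

```python
def f_score(tp, fp, fn):
    """Balanced F-score (F1), written as 2*TP / (2*TP + FP + FN).

    This equals the harmonic mean of precision and recall.
    Using F1 is an assumption; the abstract only says "F-score".
    """
    return 2 * tp / (2 * tp + fp + fn)


def relative_gain(baseline, improved):
    """Relative improvement of `improved` over `baseline`."""
    return (improved - baseline) / baseline


# Hypothetical confusion counts, for illustration only.
example_f = f_score(tp=8, fp=2, fn=2)

# The two F-scores reported in the abstract.
gain = relative_gain(0.6414, 0.8368)
print(f"relative gain: {gain:.1%}")
```

With the reported values, the relative gain works out to roughly 30%.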

Research limitations: Analyses based only on features extracted from query logs may produce limited results. In addition, the segmentation tool used in this study has been applied more widely to long texts than to short queries.

Practical implications: This research can help general-purpose search engines address search intents related to news events.

Originality/value: Our approach provides a new perspective on recognizing queries with news intent without relying on large news corpora such as blogs or Twitter.
 
 
Project Number: Grant No. 2011QNCB28
Funder: Social Science Planning Foundation of Chongqing
Language: English
Content Type: Journal article
URI: http://ir.las.ac.cn/handle/12502/7623
Appears in Collections: Chinese Journal of Library and Information Science-2014_Journal articles

Files in This Item:
File Name: 31-Zhang Xiaojuan.pdf
File Size: 5182 KB
Content Type: Journal article
Version: Published version
Access: Open access

Institution: School of Computer and Information Science, Southwest University, Chongqing 400715, China

Recommended Citation:
ZHANG Xiaojuan, LI Jian. Exploring features for automatic identification of news queries through query logs[J]. Chinese Journal of Library and Information Science, 2014, 7(4): 31-45.