Person-specific named entity recognition using SVM with rich feature sets
NIE Hui; Nie Hui (E-mail:issnh@mail.sysu.edu.cn)
2012-11-20
Journal: Chinese Journal of Library and Information Science
ISSN: 1674-3393
Volume: 5, Issue: 3, Pages: 27-46
Abstract

Purpose: This study explores the potential of natural language processing (NLP) and machine learning (ML) techniques and aims to find a feasible strategy and an effective approach for the named entity recognition (NER) task in Web-oriented, person-specific information extraction.

Design/methodology/approach: An SVM-based multi-classification approach, combined with a rich set of features derived from state-of-the-art NLP techniques, is proposed for the NER task. A group of experiments was designed to investigate the influence of various NLP-based features, especially semantic features, on system performance. Optimal parameter settings for the SVM models, including the kernel function, the margin parameter, and the context window size, were also explored experimentally.
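As a rough illustration of what a context window means for a token classifier, the sketch below builds a feature dictionary from the tokens surrounding a target position. It is not the feature set used in the paper: the function name `window_features`, the `<PAD>` marker, the capitalization cue, and the example sentence are all assumptions made for illustration.

```python
from typing import Dict, List


def window_features(tokens: List[str], index: int, window: int = 2) -> Dict[str, str]:
    """Build features for tokens[index] from tokens within `window` positions."""
    features: Dict[str, str] = {}
    for offset in range(-window, window + 1):
        position = index + offset
        # Positions outside the sentence are filled with a padding marker.
        token = tokens[position] if 0 <= position < len(tokens) else "<PAD>"
        features[f"w[{offset}]"] = token.lower()
    # A simple orthographic cue (hypothetical, not taken from the paper).
    features["is_capitalized"] = str(tokens[index][:1].isupper())
    return features


# Hypothetical example sentence; window size 1 looks one token left and right.
sentence = ["Nie", "Hui", "works", "at", "Sun", "Yat-sen", "University"]
features = window_features(sentence, 4, window=1)
edge_features = window_features(sentence, 0, window=1)
```

A larger `window` gives the classifier more context per token at the cost of a larger, sparser feature space, which is one reason the window size is worth tuning experimentally.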

Findings: The SVM-based multi-classification approach proved effective for the NER task. The work shows that NLP-based features, particularly semantic features, are of great importance in data-driven NE recognition. The study indicates that a higher-order kernel function may not be desirable for this specific classification problem in practical applications; the simple linear-kernel SVM model performed better in this case. Moreover, the modified SVM models with an uneven margin parameter are more general and flexible, and they were shown to handle the imbalanced data problem better.

Research limitations/implications: The SVM-based approach to the NER problem has been shown to be effective only on limited experimental data. Further research needs to be conducted on large batches of real Web data. In addition, the performance of the NER system needs to be tested when it is incorporated into a complete IE framework.

Originality/value: The specially designed experiments make it feasible to fully explore the characteristics of the data and obtain the optimal parameter settings for the NER task, leading to favorable recall, precision, and F1 measures. The overall system performance (F1 value) for all types of named entities can reach above 88.6%, which meets the requirements of practical application.
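For readers unfamiliar with the measures named above, the following sketch computes precision, recall, and F1 from counts of correct and incorrect entity predictions. The counts are made-up illustrative values, not results from the paper, and the function name `precision_recall_f1` is an assumption.

```python
from typing import Tuple


def precision_recall_f1(true_pos: int, false_pos: int, false_neg: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for one entity type."""
    predicted = true_pos + false_pos   # entities the system proposed
    actual = true_pos + false_neg      # entities present in the gold data
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    denominator = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Illustrative counts only: 80 correct entities, 20 spurious, 20 missed.
precision, recall, f1 = precision_recall_f1(80, 20, 20)
```

Because F1 is a harmonic mean, it stays low unless both precision and recall are high, which makes it a reasonable single figure for imbalanced entity classes.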

Keywords: Named Entity Recognition; Natural Language Processing; SVM-based Classifier; Feature Selection
Subject area: Editing and Publishing
Funding: This work is supported by the Special Research Foundation for Young Teachers of Sun Yat-sen University (Grant No. 2000-3161101) and the Humanity and Social Science Youth Foundation of Ministry of Education of China (Grant No. 08JC870013).
Document type: Journal article
Item identifier: http://ir.las.ac.cn/handle/12502/5606
Collection: Journal of Data and Information Science_Chinese Journal of Library and Information Science-2012
Corresponding author: Nie Hui (E-mail: issnh@mail.sysu.edu.cn)
Recommended citation
GB/T 7714
NIE Hui, Nie Hui. Person-specific named entity recognition using SVM with rich feature sets[J]. Chinese Journal of Library and Information Science, 2012, 5(3): 27-46.
APA NIE Hui, & Nie Hui. (2012). Person-specific named entity recognition using SVM with rich feature sets. Chinese Journal of Library and Information Science, 5(3), 27-46.
MLA NIE Hui, et al. "Person-specific named entity recognition using SVM with rich feature sets". Chinese Journal of Library and Information Science 5.3 (2012): 27-46.
Files in this item
File name/size  Document type  Version type  Access type  License
27-46-Nie Hui[19].pd (1671KB)  Open access  License  Request full text