Institutional Repository of the National Science Library, Chinese Academy of Sciences
Title: Person-specific named entity recognition using SVM with rich feature sets
Author: NIE Hui
Source: Chinese Journal of Library and Information Science
Issued Date: 2012-11-20
Volume: 5, Issue:3, Pages:27-46
Keyword: Named entity recognition ; Natural language processing ; SVM-based classifier ; Feature selection
Subject: Editing and Publishing
Corresponding Author: Nie Hui (E-mail: issnh@mail.sysu.edu.cn)
English Abstract:

Purpose: This study explores the potential of natural language processing (NLP) and machine learning (ML) techniques, and aims to find a feasible strategy and an effective approach for the named entity recognition (NER) task in Web-oriented, person-specific information extraction.

Design/methodology/approach: An SVM-based multi-class classification approach, combined with a rich set of features derived from state-of-the-art NLP techniques, is proposed for the NER task. A group of experiments was designed to investigate the influence of various NLP-based features, especially semantic features, on system performance. Optimal parameter settings for the SVM models, including the kernel function, the margin parameter and the context window size, were also explored experimentally.
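
As an illustration of what a context window means for token-level NER, the sketch below builds a feature dictionary for one token from its neighbours. It is a minimal example under stated assumptions, not the paper's feature set: the function name `window_features`, the feature keys, the `<PAD>` placeholder and the capitalization feature are all hypothetical, and the abstract does not list the actual features used.

```python
from typing import Dict, List


def window_features(tokens: List[str], index: int, window: int = 2) -> Dict[str, str]:
    """Build features for tokens[index] from tokens within `window` positions.

    Hypothetical feature names; the paper's real feature set is not given
    in the abstract.
    """
    features: Dict[str, str] = {}
    for offset in range(-window, window + 1):
        position = index + offset
        if 0 <= position < len(tokens):
            value = tokens[position].lower()
        else:
            value = "<PAD>"  # placeholder for positions outside the sentence
        features[f"w[{offset:+d}]"] = value
    # A simple orthographic cue for the focus token (an assumed example feature).
    features["is_capitalized"] = str(tokens[index][:1].isupper())
    return features


if __name__ == "__main__":
    sentence = ["Professor", "Nie", "Hui", "works", "at", "the", "university"]
    print(window_features(sentence, 1, window=1))
```

A larger `window` gives the classifier more context per token at the cost of a sparser feature space, which is why the window size is a parameter worth tuning experimentally.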

Findings: The SVM-based multi-class classification approach proved effective for the NER task. The work shows that NLP-based features, particularly semantic features, are of great importance in data-driven NE recognition. The study indicates that higher-order kernel functions may not be desirable for this classification problem in practical applications; the simple linear-kernel SVM model performed better in this case. Moreover, the modified SVM models with an uneven margin parameter are more general and flexible, and were shown to handle the imbalanced data problem better.
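
One common way to express an uneven margin is to require a margin of 1 for positive examples and a smaller margin τ for negative examples, which appears in the hinge loss as shown below. This is a sketch of that general idea only; the abstract does not state the exact formulation the paper uses, and the function name `uneven_margin_hinge` and the default `tau=0.4` are assumptions chosen for illustration.

```python
def uneven_margin_hinge(score: float, label: int, tau: float = 0.4) -> float:
    """Hinge loss with a margin of 1 for positives and `tau` for negatives.

    `score` is the raw decision value w.x + b; `label` is +1 or -1.
    With tau = 1.0 this reduces to the standard hinge loss.
    """
    if label not in (1, -1):
        raise ValueError("label must be +1 or -1")
    required_margin = 1.0 if label == 1 else tau
    return max(0.0, required_margin - label * score)


if __name__ == "__main__":
    # Same distance from the boundary, different penalties per class.
    print(uneven_margin_hinge(0.5, 1))    # positive example inside its margin
    print(uneven_margin_hinge(-0.5, -1))  # negative example beyond its margin
```

With τ below 1, a negative example at a given distance from the boundary is penalized less than a positive example at the same distance, which shifts the boundary in favour of the rarer positive class.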

Research limitations/implications: The SVM-based approach to the NER problem has only been shown to be effective on limited experimental data. Further research needs to be conducted on large batches of real Web data. In addition, the performance of the NER system needs to be tested when it is incorporated into a complete IE framework.

Originality/value: The specially designed experiments make it feasible to fully explore the characteristics of the data and to obtain optimal parameter settings for the NER task, leading to favorable recall, precision and F1 measures. The overall system performance (F1 value) across all types of named entities reaches above 88.6%, which meets the requirements of practical application.
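
For readers unfamiliar with the measures named above, the sketch below computes precision, recall and F1 from entity counts. The function name and the counts in the usage lines are hypothetical and serve only to show the arithmetic; they are not results from the paper.

```python
from typing import Tuple


def precision_recall_f1(
    true_positives: int, false_positives: int, false_negatives: int
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from counts of entity predictions."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    total = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    p, r, f = precision_recall_f1(true_positives=8, false_positives=2, false_negatives=2)
    print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```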

Content Type: Journal article
URI: http://ir.las.ac.cn/handle/12502/5606
Appears in Collections: Chinese Journal of Library and Information Science-2012, Journal articles

Files in This Item:
File Name: 27-46-Nie Hui[19].pdf; File Size: 1671KB; Access: Open access

Recommended Citation:
NIE Hui. Person-specific named entity recognition using SVM with rich feature sets[J]. Chinese Journal of Library and Information Science,2012,5(3):27-46.

Items in IR are protected by copyright, with all rights reserved, unless otherwise indicated.