Institutional Repository of the National Science Library, Chinese Academy of Sciences
Title: A method for improving the accuracy of automatic indexing of Chinese-English mixed documents
Author: ZHAO Yan ; SHI Hui
Source: Chinese Journal of Library and Information Science
Issued Date: 2012-12-25
Volume: 5, Issue: 4, Pages: 77-92
Keyword: Chinese-English mixed documents ; String matching ; Accuracy of automatic indexing ; Cybernetics ; Dedicated hepatitis B virus (HBV) database
Subject: Editing and Publishing
Corresponding Author: Yan Zhao (E-mail: zhaoyan2000@shisu.edu.cn)
Abstract:

Purpose: The thrust of this paper is to present a method for improving the accuracy of automatic indexing of Chinese-English mixed documents.
Design/methodology/approach: Based on the inherent characteristics of Chinese-English mixed texts and on cybernetics theory, we propose an integrated control method for indexing documents. It consists of "feed-forward control", "in-progress control" and "feed-back control", and aims to improve the accuracy of automatic indexing of Chinese-English mixed documents. An experiment was conducted to investigate the effect of the proposed method.
Findings: This method distinguishes Chinese from English in terms of grammatical structures and word formation rules. Applying it in the three phases of automatic indexing of Chinese-English mixed documents gave encouraging results: precision increased from 88.54% to 97.10% and recall improved from 97.37% to 99.47%.
Research limitations: The indexing method is relatively complicated and the whole indexing process requires substantial human intervention. Because pattern matching is based on a brute-force (BF) approach, indexing efficiency is reduced to some extent.
Practical implications: The research is of both theoretical significance and practical value in improving the accuracy of automatic indexing of multilingual documents (not confined to Chinese-English mixed documents). The proposed method will benefit not only the indexing of life science documents but also the indexing of documents in other subject areas.
Originality/value: So far, few studies have been published on methods for increasing the accuracy of multilingual automatic indexing. This study provides insights into the automatic indexing of multilingual documents, especially Chinese-English mixed documents.
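
The abstract mentions brute-force (BF) pattern matching and reports results as precision and recall. The following Python sketch illustrates those two ideas only; it is not the authors' implementation. The function names, the sample sentence, the vocabulary and the "relevant" term set are all hypothetical, and matching is assumed to be case-sensitive and character-based.

```python
def brute_force_find(text: str, pattern: str) -> list[int]:
    """Return every start offset where pattern occurs in text (naive BF scan)."""
    n, m = len(text), len(pattern)
    if m == 0:
        return []
    hits = []
    for i in range(n - m + 1):
        # Compare the pattern character by character at offset i.
        j = 0
        while j < m and text[i + j] == pattern[j]:
            j += 1
        if j == m:
            hits.append(i)
    return hits


def index_terms(text: str, vocabulary: set[str]) -> set[str]:
    """Assign every vocabulary term that occurs at least once in text."""
    return {term for term in vocabulary if brute_force_find(text, term)}


def precision_recall(assigned: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision = correct / assigned; recall = correct / relevant."""
    correct = len(assigned & relevant)
    precision = correct / len(assigned) if assigned else 0.0
    recall = correct / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical mixed Chinese-English sentence and controlled vocabulary.
text = "乙型肝炎病毒(HBV)的DNA聚合酶"
vocabulary = {"乙型肝炎病毒", "HBV", "DNA", "RNA"}
relevant = {"乙型肝炎病毒", "HBV", "DNA聚合酶"}  # assumed human-assigned terms

assigned = index_terms(text, vocabulary)
precision, recall = precision_recall(assigned, relevant)
```

In this toy case the scan assigns "DNA" although the assumed correct term is "DNA聚合酶", which lowers both precision and recall. The BF scan costs O(n·m) comparisons per term in the worst case, which is consistent with the efficiency limitation the abstract notes.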

Content Type: Journal article
URI: http://ir.las.ac.cn/handle/12502/5628
Appears in Collections: Chinese Journal of Library and Information Science-2012_Journal Articles

File: Zhao Yan.pdf (1595 KB, Adobe PDF, open access)

Recommended Citation:
ZHAO Yan, SHI Hui. A method for improving the accuracy of automatic indexing of Chinese-English mixed documents[J]. Chinese Journal of Library and Information Science, 2012, 5(4): 77-92.
Items in IR are protected by copyright, with all rights reserved, unless otherwise indicated.

 

 
