A method for improving the accuracy of automatic indexing of Chinese-English mixed documents
ZHAO Yan; SHI Hui (corresponding author: Yan Zhao, E-mail: zhaoyan2000@shisu.edu.cn)
2012-12-25
Journal: Chinese Journal of Library and Information Science
ISSN: 1674-3393
Volume: 5; Issue: 4; Pages: 77-92
Abstract

Purpose: This paper presents a method for improving the accuracy of automatic indexing of Chinese-English mixed documents.
Design/methodology/approach: Based on the inherent characteristics of Chinese-English mixed texts and on cybernetics theory, we propose an integrated control method for indexing such documents. It consists of three stages ("feed-forward control", "in-progress control" and "feedback control"), all aimed at improving the accuracy of automatic indexing of Chinese-English mixed documents. An experiment was conducted to evaluate the effect of the proposed method.
Findings: The method distinguishes between Chinese and English text according to their different grammatical structures and word-formation rules. Applying it in all three phases of automatic indexing of Chinese-English mixed documents gave encouraging results: precision increased from 88.54% to 97.10%, and recall improved from 97.37% to 99.47%.
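The abstract states that the method treats Chinese and English text differently. As a minimal sketch, assuming that one early step is to separate a mixed string into Chinese and English runs, the Python below labels each character by Unicode range (CJK Unified Ideographs and Extension A versus ASCII letters and digits) and groups consecutive characters of the same label. This Unicode-range heuristic is a swapped-in illustration, not the authors' grammar- and word-formation-based procedure; `char_script`, `split_runs` and the sample string are hypothetical.

```python
from itertools import groupby


def char_script(ch: str) -> str:
    """Label one character 'zh', 'en' or 'other' by Unicode range (assumed scheme)."""
    cp = ord(ch)
    # CJK Unified Ideographs (U+4E00-U+9FFF) and Extension A (U+3400-U+4DBF).
    if 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF:
        return "zh"
    # ASCII letters and digits are treated as English.
    if ch.isascii() and ch.isalnum():
        return "en"
    return "other"  # spaces, punctuation, full-width symbols, etc.


def split_runs(text: str) -> list[tuple[str, str]]:
    """Split mixed text into maximal same-script runs, dropping 'other' runs."""
    runs: list[tuple[str, str]] = []
    for script, chars in groupby(text, key=char_script):
        if script != "other":
            runs.append((script, "".join(chars)))
    return runs


if __name__ == "__main__":
    # Hypothetical sample: "hepatitis B virus" + HBV + "of" + DNA + "polymerase".
    for script, run in split_runs("乙型肝炎病毒HBV的DNA聚合酶"):
        print(script, run)
```

English runs come out word by word because spaces and punctuation separate them, whereas each Chinese run is still an unsegmented character string and would need a dedicated Chinese word segmenter, since Chinese writes no spaces between words.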
Research limitations: The indexing method is relatively complicated, and the whole indexing process requires substantial human intervention. Because pattern matching relies on a brute-force (BF) approach, indexing efficiency is reduced to some extent.
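The abstract attributes part of the efficiency loss to brute-force (BF) pattern matching. The Python sketch below shows what a generic BF matcher does when index terms are looked up in a document; it is an illustration, not the authors' code, and `is_english_char`, `bf_find_all`, `index_terms` and the sample vocabulary are hypothetical. The whole-word check for English terms is an added assumption, meant to show one way matching rules can differ between space-delimited English and undelimited Chinese text.

```python
def is_english_char(ch: str) -> bool:
    """True for ASCII letters and digits (hypothetical definition of 'English')."""
    return ch.isascii() and ch.isalnum()


def bf_find_all(text: str, pattern: str, whole_word: bool = False) -> list[int]:
    """Return every start index of pattern in text using brute-force matching.

    Every alignment i is tried; characters are compared left to right until the
    first mismatch, so the worst case is O(n * m) comparisons per pattern.
    """
    n, m = len(text), len(pattern)
    hits: list[int] = []
    if m == 0:
        return hits
    for i in range(n - m + 1):
        j = 0
        while j < m and text[i + j] == pattern[j]:
            j += 1
        if j < m:
            continue  # mismatch at this alignment
        if whole_word:
            # Reject matches glued to other English letters/digits, e.g. "DNA" inside "cDNA".
            if i > 0 and is_english_char(text[i - 1]):
                continue
            if i + m < n and is_english_char(text[i + m]):
                continue
        hits.append(i)
    return hits


def index_terms(text: str, vocabulary: list[str]) -> dict[str, list[int]]:
    """Match each vocabulary term against text; all-English terms must be whole words."""
    found: dict[str, list[int]] = {}
    for term in vocabulary:
        english = all(is_english_char(c) for c in term)
        positions = bf_find_all(text, term, whole_word=english)
        if positions:
            found[term] = positions
    return found


if __name__ == "__main__":
    doc = "乙型肝炎病毒(HBV)基因组; HBV DNA"
    print(index_terms(doc, ["HBV", "乙型肝炎病毒", "DNA", "RNA"]))
    # prints {'HBV': [7, 16], '乙型肝炎病毒': [0], 'DNA': [20]}
```

With k terms of length at most m and a document of length n, this costs up to O(k·n·m) character comparisons per document, which fits the efficiency loss the authors report. Multi-pattern algorithms such as Aho-Corasick scan the text once for all terms and are the usual remedy.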
Practical implications: The research has both theoretical significance and practical value for improving the accuracy of automatic indexing of multilingual documents (not only Chinese-English mixed documents). The proposed method will benefit not only the indexing of life science documents but also the indexing of documents in other subject areas.
Originality/value: So far, few studies have been published on methods for increasing the accuracy of multilingual automatic indexing. This study will provide insights into the automatic indexing of multilingual documents, especially Chinese-English mixed documents.

Keywords: Chinese-English mixed documents; String matching; Accuracy of automatic indexing; Cybernetics; Dedicated hepatitis B virus (HBV) database
Subject area: Editing and publishing
Document type: Journal article
Item identifier: http://ir.las.ac.cn/handle/12502/5628
Collection: Journal of Data and Information Science_Chinese Journal of Library and Information Science-2012
Corresponding author: Yan Zhao (E-mail: zhaoyan2000@shisu.edu.cn)
Recommended citation
GB/T 7714: ZHAO Yan, SHI Hui. A method for improving the accuracy of automatic indexing of Chinese-English mixed documents[J]. Chinese Journal of Library and Information Science, 2012, 5(4): 77-92.
APA: Zhao, Y., & Shi, H. (2012). A method for improving the accuracy of automatic indexing of Chinese-English mixed documents. Chinese Journal of Library and Information Science, 5(4), 77-92.
MLA: Zhao, Yan, and Hui Shi. "A method for improving the accuracy of automatic indexing of Chinese-English mixed documents." Chinese Journal of Library and Information Science 5.4 (2012): 77-92.
Files in this item
Zhao Yan.pdf (1595 KB), open access

Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.