T. Tamilselvi, Jayam College of Engineering and Technology, Dharmapuri, Tamilnadu, India.
G. Tholkappia Arasu, Principal, AVS Engineering College, Salem, Tamilnadu, India.
DOI : 01.0401/ijaict.2014.07.32
International Journal of Advanced Information and Communication Technology
Received On : February 15, 2017
Revised On : March 10, 2017
Accepted On : April 12, 2017
Published On : May 05, 2017
Volume 04, Issue 05
Pages : 658-663
Abstract
Despite the growth and development of the web in scientific publishing, there remain significant obstacles to the application of computer-based text processing technologies. One obvious obstacle is the relative paucity of freely and publicly available full-text articles. Such obstacles have concentrated much text processing research on the relatively small amount of suitable material that is currently available, notably MedLine abstracts. In this paper, we discuss a processing framework (PTX) for scientific documents guided by two main principles. First, we start from the de facto position that most published material is available in PDF, a layout or document appearance format, whereas text processing requires the (hierarchical) structure of the text. Second, we believe that the most likely users of scientific text processing will be scientists exploring literature within a particular specialism. Consequently, the framework can and should exploit (in a modular fashion) knowledge about that specific literature. The framework is being developed in the context of two E-Science projects: FlySlip and CitRAZ. The former is developing tools to aid human database curation of Drosophila genetics literature. The latter is combining Argumentative Zoning with citation information in order to help improve both citation indexing and text summarisation.
Keywords
Hierarchical, FlySlip, CitRAZ.
Cite this article
T. Tamilselvi, G. Tholkappia Arasu, “Retrieving Hierarchical Text Structure from Typescript Scientific Articles – A Requirement for E-Science Text Mining,” INTERNATIONAL JOURNAL OF ADVANCED INFORMATION AND COMMUNICATION TECHNOLOGY, pp. 658-663, May 05, 2017.
Copyright
© 2017 T. Tamilselvi, G. Tholkappia Arasu. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.