International Journal of Advanced Information and Communication Technology


Retrieving Hierarchical Text Structure from Typescript Scientific Articles – A Requirement for E-Science Text Mining

T. Tamilselvi, Jayam College of Engineering and Technology, Dharmapuri, Tamilnadu, India.

G. Tholkappia Arasu, Principal, AVS Engineering College, Salem, Tamilnadu, India.

DOI : 01.0401/ijaict.2014.07.32

Received On : February 15, 2017

Revised On : March 10, 2017

Accepted On : April 12, 2017

Published On : May 05, 2017

Volume 04, Issue 05

Pages : 658-663

Abstract


Despite the growth of the web in scientific publishing, significant obstacles remain to the application of computer-based text processing technologies. One obvious obstacle is the relative paucity of freely and publicly available full-text articles. As a result, text processing research has concentrated heavily on the relatively small amount of suitable material that is currently available, notably MedLine abstracts. In this paper, we discuss a processing framework (PTX) for scientific documents guided by two main principles. First, we start from the de facto position that most published material is available in PDF, a format that describes layout or document appearance, whereas text processing requires the (hierarchical) structure of the text. Second, we believe that the most likely users of scientific text processing will be scientists exploring literature within a particular specialism. Consequently, the framework can and should exploit (in a modular fashion) knowledge about that specific literature. The framework is being developed in the context of two E-Science projects: FlySlip and CitRAZ. The former is developing tools to aid human database curation of Drosophila genetics literature. The latter is combining Argumentative Zoning with citation information in order to help improve both citation indexing and text summarisation.

Keywords


Hierarchical, FlySlip, CitRAZ.

Cite this article


T. Tamilselvi, G. Tholkappia Arasu, “Retrieving Hierarchical Text Structure from Typescript Scientific Articles – A Requirement for E-Science Text Mining”, INTERNATIONAL JOURNAL OF ADVANCED INFORMATION AND COMMUNICATION TECHNOLOGY, pp. 658-663, May 05, 2017.

Copyright


© 2017 T. Tamilselvi, G. Tholkappia Arasu. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.