Title: Text/non-text classification of connected components in document images
Date: Oct. 17-20, 2017
Author:
1 Julca-Aguilar, Frank Dennis
2 Maia, Ana Lucia Lima Marreiros
3 Hirata, Nina Sumiko Tomita
Affiliation:
1 University of São Paulo
2 State University of Feira de Santana, University of São Paulo
3 University of São Paulo
Conference Name: Conference on Graphics, Patterns and Images, 30 (SIBGRAPI)
Conference Location: Niterói, RJ
Book Title: Proceedings
Publisher: IEEE Computer Society
Publisher City: Los Alamitos
Keywords: text segmentation, connected component, convolutional neural network.
Abstract: Text segmentation is an important problem in document analysis applications. We address the problem of classifying connected components of a document image as text or non-text. Inspired by previous work in the literature, besides common size- and shape-related features extracted from the components, we also consider component images, with and without context information, as inputs to the classifiers. Multi-layer perceptrons and convolutional neural networks are used to classify the components. High precision and recall are obtained for both text and non-text components.
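The size and shape features mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the record does not list the actual feature set, so the 4-connectivity rule, the function names `connected_components` and `shape_features`, and the chosen features (width, height, area, aspect ratio, extent) are all assumptions made for illustration.

```python
from collections import deque


def connected_components(image):
    """Return the 4-connected foreground components of a binary image.

    `image` is a list of rows of 0/1 values; each component is a list of
    (row, col) pixel coordinates. 4-connectivity is an assumption here.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and image[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def shape_features(pixels):
    """Compute simple bounding-box features for one component (hypothetical set)."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    height = max(ys) - min(ys) + 1
    width = max(xs) - min(xs) + 1
    area = len(pixels)
    return {
        "width": width,
        "height": height,
        "area": area,
        "aspect_ratio": width / height,
        "extent": area / (width * height),  # fraction of the bounding box filled
    }


# Toy binary image: a 2x2 blob and a 1x3 vertical stroke.
IMAGE = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
FEATURES = [shape_features(p) for p in connected_components(IMAGE)]
```

Feature vectors of this kind could be fed to a multi-layer perceptron, whereas a convolutional network would instead take the cropped component image, optionally with its surrounding context, as input.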
