%0 Conference Proceedings
%T Text/non-text classification of connected components in document images
%D 2017
%A Julca-Aguilar, Frank Dennis,
%A Maia, Ana Lucia Lima Marreiros,
%A Hirata, Nina Sumiko Tomita,
%@affiliation University of São Paulo
%@affiliation State University of Feira de Santana, University of São Paulo
%@affiliation University of São Paulo
%E Torchelsen, Rafael Piccin,
%E Nascimento, Erickson Rangel do,
%E Panozzo, Daniele,
%E Liu, Zicheng,
%E Farias, Mylène,
%E Vieira, Thales,
%E Sacht, Leonardo,
%E Ferreira, Nivan,
%E Comba, João Luiz Dihl,
%E Hirata, Nina,
%E Schiavon Porto, Marcelo,
%E Vital, Creto,
%E Pagot, Christian Azambuja,
%E Petronetto, Fabiano,
%E Clua, Esteban,
%E Cardeal, Flávio,
%B Conference on Graphics, Patterns and Images, 30 (SIBGRAPI)
%C Niterói, RJ
%8 Oct. 17-20, 2017
%S Proceedings
%I IEEE Computer Society
%J Los Alamitos
%K text segmentation, connected component, convolutional neural network
%X Text segmentation is an important problem in document analysis applications. We address the problem of classifying connected components of a document image as text or non-text. Inspired by previous work in the literature, besides common size- and shape-related features extracted from the components, we also consider component images, with and without context information, as classifier inputs. Multi-layer perceptrons and convolutional neural networks are used to classify the components. High precision and recall are obtained for both text and non-text components.
%@language en
%3 PID4960469.pdf