1. Identity statement | |
Reference Type | Conference Paper (Conference Proceedings) |
Site | sibgrapi.sid.inpe.br |
Holder Code | ibi 8JMKD3MGPEW34M/46T9EHH |
Identifier | 8JMKD3MGPEW34M/45E54GS |
Repository | sid.inpe.br/sibgrapi/2021/09.13.19.25 |
Last Update | 2021:10.13.16.06.07 (UTC) administrator |
Metadata Repository | sid.inpe.br/sibgrapi/2021/09.13.19.25.26 |
Metadata Last Update | 2022:09.10.00.16.17 (UTC) administrator |
Citation Key | CarvalhoBorg:2021:CoStTe |
Title | A Comparative Study of Text Document Representation Approaches Using Point Placement-based Visualizations |
Format | On-line |
Year | 2021 |
Access Date | 2024, Mar. 29 |
Number of Files | 1 |
Size | 4243 KiB |
|
2. Context | |
Author | 1 Carvalho, Hevelyn Sthefany Lima de; 2 Borges, Vinicius Ruela Pereira |
Affiliation | 1 University of Brasília; 2 University of Brasília |
Editor | Paiva, Afonso; Menotti, David; Baranoski, Gladimir V. G.; Proença, Hugo Pedro; Junior, Antonio Lopes Apolinario; Papa, João Paulo; Pagliosa, Paulo; dos Santos, Thiago Oliveira; e Sá, Asla Medeiros; da Silveira, Thiago Lopes Trugillo; Brazil, Emilio Vital; Ponti, Moacir A.; Fernandes, Leandro A. F.; Avila, Sandra |
e-Mail Address | hevelyn.sthefany@gmail.com |
Conference Name | Conference on Graphics, Patterns and Images, 34 (SIBGRAPI) |
Conference Location | Gramado, RS, Brazil (virtual) |
Date | 18-22 Oct. 2021 |
Publisher | Sociedade Brasileira de Computação |
Publisher City | Porto Alegre |
Book Title | Proceedings |
Tertiary Type | Undergraduate Work |
History (UTC) | 2021-10-13 16:06:07 :: hevelyn.sthefany@gmail.com -> administrator :: 2021; 2022-09-10 00:16:17 :: administrator -> :: 2021 |
|
3. Content and structure | |
Is the master or a copy? | is the master |
Content Stage | completed |
Transferable | 1 |
Keywords | visualization; word-embedding; feature extraction; text; multidimensional scaling |
Abstract | In natural language processing, text representation plays an important role, as it can affect the performance of language models and machine learning algorithms. Basic vector space models, such as term frequency-inverse document frequency, became popular approaches to represent text documents. In recent years, approaches based on word embeddings have been proposed to preserve the meaning and semantic relations of words, phrases and texts. In this paper, we study the influence of different text representations on the quality of layouts generated by state-of-the-art visualizations based on point placement. For that purpose, a visualization-assisted approach is proposed to support users when exploring such representations in classification tasks. Experiments using two public labeled corpora were conducted to assess the quality of the layouts and to discuss possible relations to classification performance. The results are promising, indicating that the proposed approach can guide users to understand the relevant patterns of a corpus in each representation. |
Arrangement | urlib.net > SDLA > Fonds > SIBGRAPI 2021 > A Comparative Study... |
doc Directory Content | access |
source Directory Content | there are no files |
agreement Directory Content | |
|
4. Conditions of access and use | |
data URL | http://urlib.net/ibi/8JMKD3MGPEW34M/45E54GS |
zipped data URL | http://urlib.net/zip/8JMKD3MGPEW34M/45E54GS |
Language | en |
Target File | WUW-9.pdf |
User Group | hevelyn.sthefany@gmail.com |
Visibility | shown |
|
5. Allied materials | |
Mirror Repository | sid.inpe.br/banon/2001/03.30.15.38.24 |
Next Higher Units | 8JMKD3MGPEW34M/45PQ3RS |
Citing Item List | sid.inpe.br/sibgrapi/2021/11.12.11.46 1 |
Host Collection | sid.inpe.br/banon/2001/03.30.15.38 |
|
|