Alberto Simões
University of Minho
Publications
Featured research published by Alberto Simões.
Behavior Research Methods | 2012
Ana Paula Soares; Montserrat Comesaña; Ana P. Pinheiro; Alberto Simões; Carla Sofia Frade
This study presents the adaptation of the Affective Norms for English Words (ANEW; Bradley & Lang, 1999a) for European Portuguese (EP). The EP adaptation of the ANEW was based on the affective ratings made by 958 college students who were EP native speakers. Each subject rated about 60 words on the affective dimensions of valence, arousal, and dominance, using the Self-Assessment Manikin (SAM) in either a paper-and-pencil or a Web survey procedure. Results of the adaptation of the ANEW for EP are presented. Furthermore, the differences between the EP, American (Bradley & Lang, 1999a), and Spanish (Redondo, Fraga, Padrón, & Comesaña, Behavior Research Methods, 39, 600–605, 2007) standardizations were explored. Results showed that the ANEW words were understood in a similar way by EP, American, and Spanish subjects, although some sex and cross-cultural differences were observed. The EP adaptation of the ANEW is shown to be a valid and useful tool that will allow researchers to control and/or manipulate the affective properties of stimuli, as well as to develop cross-linguistic studies. The normative values of the EP adaptation of the ANEW can be downloaded at http://brm.psychonomic-journals.org/content/supplemental.
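Normative values of this kind amount to per-word means and standard deviations of the SAM ratings on each dimension; since the abstract reports sex differences, they can also be computed per rater sex. A minimal Python sketch, assuming each rating arrives as a (word, sex, valence, arousal, dominance) tuple on the 1-9 SAM scale. That tuple layout, the `norms` helper and the sample words are hypothetical illustrations, not the published procedure.

```python
from collections import defaultdict
from statistics import mean, stdev

DIMENSIONS = ("valence", "arousal", "dominance")


def norms(ratings, sex=None):
    """Per-word (mean, sd) for each SAM dimension.

    ratings: iterable of (word, rater_sex, valence, arousal, dominance)
    tuples, each score on the 1-9 SAM scale (hypothetical layout).
    sex: if given (e.g. "F" or "M"), only that group's ratings are used.
    """
    scores = defaultdict(lambda: {dim: [] for dim in DIMENSIONS})
    for word, rater_sex, *values in ratings:
        if sex is not None and rater_sex != sex:
            continue  # skip raters outside the requested group
        for dim, value in zip(DIMENSIONS, values):
            scores[word][dim].append(value)
    # A single rating has no spread, so report sd = 0.0 in that case.
    return {
        word: {
            dim: (mean(xs), stdev(xs) if len(xs) > 1 else 0.0)
            for dim, xs in dims.items()
        }
        for word, dims in scores.items()
    }
```

For example, with ratings `("amor", "F", 9, 7, 6)` and `("amor", "M", 8, 6, 6)`, the pooled valence norm for "amor" is a mean of 8.5 with a standard deviation of about 0.71, while `sex="F"` yields a mean of 9.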
Behavior Research Methods | 2014
Ana Paula Soares; José Carlos Medeiros; Alberto Simões; João Machado; Ana Rita Costa; Álvaro Iriarte; José João de Almeida; Ana P. Pinheiro; Montserrat Comesaña
In this article, we introduce ESCOLEX, the first European Portuguese children’s lexical database with grade-level-adjusted word frequency statistics. Computed from a 3.2-million-word corpus, ESCOLEX provides 48,381 word forms extracted from 171 elementary and middle school textbooks for 6- to 11-year-old children attending the first six grades in the Portuguese educational system. Like other children’s grade-level databases (e.g., Carroll, Davies, & Richman, 1971; Corral, Ferrero, & Goikoetxea, Behavior Research Methods, 41, 1009–1017, 2009; Lété, Sprenger-Charolles, & Colé, Behavior Research Methods, Instruments, & Computers, 36, 156–166, 2004; Zeno, Ivens, Millard, & Duvvuri, 1995), ESCOLEX provides four frequency indices for each grade: overall word frequency (F), index of dispersion across the selected textbooks (D), estimated frequency per million words (U), and standard frequency index (SFI). It also provides a new measure, contextual diversity (CD). In addition, the number of letters in the word and its part(s) of speech, number of syllables, syllable structure, and adult frequencies taken from P-PAL (a European Portuguese corpus-based lexical database; Soares, Comesaña, Iriarte, Almeida, Simões, Costa, …, Machado, 2010; Soares, Iriarte, Almeida, Simões, Costa, França, …, Comesaña, in press) are provided. ESCOLEX will be a useful tool both for researchers interested in language processing and development and for professionals in need of verbal materials adjusted to children’s developmental stages. ESCOLEX can be downloaded along with this article or from http://p-pal.di.uminho.pt/about/databases.
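Two of the indices named above have standard closed forms in the grade-level frequency literature: Carroll's dispersion index D, D = [log(Σf) − Σ(f·log f)/Σf] / log(n) over n textbooks, and SFI = 10·(log10 U + 4). A minimal Python sketch of both, assuming ESCOLEX follows these textbook formulations; the article as summarized here does not spell out its exact computation, and the Carroll-style U (which itself folds in D) is taken as an input rather than reproduced.

```python
import math


def carroll_d(freqs):
    """Carroll's dispersion index D of one word over n textbooks.

    freqs: per-textbook frequencies. Pass size-normalized values
    (count / textbook length) when textbooks differ in size; D is
    unchanged if all values are scaled by the same constant.
    Returns 1.0 for a perfectly even spread and 0.0 when the word
    occurs in a single textbook.
    """
    n = len(freqs)
    total = sum(freqs)
    if n < 2 or total <= 0:
        return 0.0
    # Zero frequencies contribute nothing (limit of f*log f as f -> 0).
    weighted_log = sum(f * math.log(f) for f in freqs if f > 0) / total
    return (math.log(total) - weighted_log) / math.log(n)


def sfi(u):
    """Standard frequency index from U, the estimated frequency per million words."""
    return 10 * (math.log10(u) + 4)
```

Under this formula a word with U = 1 (once per million words) gets SFI = 40, and every tenfold increase in U adds 10 points, which is why SFI is convenient for comparing grades of very different corpus sizes.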
Processing of the Portuguese Language | 2012
Alberto Simões; Álvaro Iriarte Sanromán; José João Almeida
In this paper we describe how Dicionário-Aberto, an online dictionary for the Portuguese language, is being used as the basis for constructing diverse resources relevant to the processing of the Portuguese language. We briefly present its history, explaining how we got here. Then we describe the resources already available for download and use, followed by a discussion of the resources currently under development.
Psicologia: Reflexão e Crítica | 2014
Ana Paula Soares; Álvaro Iriarte; José João de Almeida; Alberto Simões; Ana Rita Costa; Patrícia Cunha França; João Machado; Montserrat Comesaña
In this paper we present the strategies and procedures undertaken in the development of a new measure of lexical frequency for contemporary European Portuguese: Procura-PALavras (P-PAL). Based on a corpus of over 227 million words, P-PAL offers default frequencies per million words (for both lemmas and wordforms) and the computation of several other objective (lexical and sublexical) and subjective word metrics. We also describe how lexical entries were integrated and how word frequencies were extracted. Its large number of indices and lexical entries makes P-PAL an advanced and indispensable web application for the promotion and internationalization of Portuguese research. P-PAL is available at http://p-pal.di.uminho.pt/tools
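A frequency per million words is simply a raw count scaled by 1,000,000 / corpus size, computed separately for wordforms and for lemmas. A minimal Python sketch, assuming a tokenized corpus and a wordform-to-lemma mapping; the `lemma_of` dictionary and the helper name are hypothetical, since P-PAL's lemmatization pipeline is not described here.

```python
from collections import Counter


def frequencies_per_million(tokens, lemma_of):
    """Return (wordform_fpm, lemma_fpm) dictionaries for a tokenized corpus.

    tokens: list of wordforms in corpus order.
    lemma_of: hypothetical mapping wordform -> lemma; forms missing from
    it are treated as their own lemma.
    """
    scale = 1_000_000 / len(tokens)
    forms = Counter(tokens)
    lemmas = Counter(lemma_of.get(token, token) for token in tokens)
    wordform_fpm = {form: count * scale for form, count in forms.items()}
    lemma_fpm = {lemma: count * scale for lemma, count in lemmas.items()}
    return wordform_fpm, lemma_fpm
```

As a worked example at P-PAL's scale, a wordform seen 454 times in a 227-million-word corpus has 454 / 227,000,000 × 1,000,000 = 2 occurrences per million; its lemma frequency is at least as high, because it also pools the other inflected forms of the same lemma.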
IberSPEECH 2014 Proceedings of the Second International Conference on Advances in Speech and Language Technologies for Iberian Languages - Volume 8854 | 2014
Alberto Simões; Xavier Gómez Guinovart
In this article we exploit the possibility of bootstrapping a European Portuguese WordNet from the English, Spanish and Galician wordnets, using Probabilistic Translation Dictionaries automatically created from parallel corpora. The process generated a total of 56,770 synsets and 97,058 variants. An evaluation of the results using the Brazilian OpenWordNet-PT as a gold standard yielded a precision ranging from 53% to 75%, depending on the cut-line. The results were satisfactory and comparable to similar experiments using the WN-Toolkit.
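The core idea can be sketched as: translate every variant of a source synset through the matching Probabilistic Translation Dictionary, keep Portuguese candidates whose translation probability reaches the cut-line, then score the candidates against a gold standard. A minimal Python sketch under stated assumptions: the nested-dictionary PTD layout, the "best score wins" rule for candidates proposed by several source words, and the sample data are all hypothetical, not the paper's exact scoring.

```python
def candidate_variants(synset, ptds, cutline):
    """Portuguese candidate variants for one synset.

    synset: {language: [variants]} from the source wordnets.
    ptds: {language: {source_word: {portuguese_word: probability}}}
          (hypothetical PTD layout).
    cutline: minimum translation probability a candidate must reach.
    A candidate proposed by several source words keeps its best score
    (an assumption made for illustration).
    """
    scores = {}
    for lang, words in synset.items():
        ptd = ptds.get(lang, {})
        for word in words:
            for target, prob in ptd.get(word, {}).items():
                if prob >= cutline:
                    scores[target] = max(prob, scores.get(target, 0.0))
    return scores


def precision(predicted, gold):
    """Fraction of predicted variants found in the gold-standard synset."""
    predicted = set(predicted)
    if not predicted:
        return 0.0
    return len(predicted & set(gold)) / len(predicted)
```

Lowering the cut-line admits more low-probability translations, which raises coverage but tends to lower precision; this is the trade-off behind the 53% to 75% range reported above.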
Symposium on Languages, Applications and Technologies | 2013
Xavier Gómez Guinovart; Alberto Simões
Even in the 21st century, paper dictionaries are still compiled and developed using standard word processors. Many publishing companies are now converting their dictionaries into machine-readable documents so that they can offer new features, such as online access. Fortunately, most of these publishers can pay review teams to fix and even enhance these dictionaries. Unfortunately, research institutions cannot hire that many workers. In this article we present the process of retreading a Galician dictionary that was originally developed and compiled in Microsoft Word. The dictionary was converted, through automatic rewriting, into a subset of the Text Encoding Initiative (TEI) schema. We detail this process and discuss the problems found. Because a recent normative reform changed Galician orthography, the dictionary also underwent a semi-automatic modernization process. Finally, we show two applications of the resulting dictionaries.
1998 ACM Subject Classification: I.7.2 Document Preparation
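To illustrate the kind of automatic rewriting involved, the sketch below turns one flattened dictionary line into a TEI-style `<entry>` using the dictionary elements `form`/`orth`, `gramGrp`/`pos` and `sense`/`def`. It is a minimal Python sketch, assuming a hypothetical `headword | pos | def1; def2` line format; the real Word-derived input and the paper's TEI subset are certainly richer.

```python
import xml.etree.ElementTree as ET


def line_to_tei(line):
    """Convert 'headword | pos | def1; def2' (hypothetical format) to a TEI entry."""
    headword, pos, senses = (part.strip() for part in line.split("|", 2))
    entry = ET.Element("entry")
    # Headword as a lemma form.
    form = ET.SubElement(entry, "form", type="lemma")
    ET.SubElement(form, "orth").text = headword
    # Grammatical information.
    gram = ET.SubElement(entry, "gramGrp")
    ET.SubElement(gram, "pos").text = pos
    # One numbered sense per semicolon-separated definition.
    for n, definition in enumerate(senses.split(";"), start=1):
        sense = ET.SubElement(entry, "sense", n=str(n))
        ET.SubElement(sense, "def").text = definition.strip()
    return entry
```

For example, `ET.tostring(line_to_tei("casa | s. f. | edificio destinado a vivenda; familia"), encoding="unicode")` produces an entry with one `orth`, one `pos` and two numbered senses. Building a tree rather than concatenating strings keeps the output well-formed even when definitions contain characters such as `&` or `<`.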
World Conference on Information Systems and Technologies | 2013
Nuno Ramos Carvalho; Alberto Simões; José João Almeida
Besides source code, the fundamental source of information about Open Source Software lies in documentation and other non-source-code files, such as README, INSTALL, or HowTo files, commonly available in the software ecosystem. These documents, written in natural language, provide valuable information during the software development stage, but also in future maintenance and evolution tasks.
Symposium on Languages, Applications and Technologies | 2014
Alberto Simões; José João Almeida; Simon D. Byers
One of the first tasks when building a Natural Language application is detecting the language of the input, in order to adapt the system to that language. This task has been addressed several times; nevertheless, most of these attempts were made long ago, when the amount of computer data and the available computational power were limited. In this article we analyze and explain the use of a neural network for language identification, in which features are extracted automatically, making it easy to adapt to new languages. Our experiments held some surprises, namely with the two Chinese variants, which forced us to apply some language-dependent tweaking to the neural network. In the end, the network reached a precision of 95%, failing only for the Portuguese language.
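As a much simpler stand-in for the paper's network, language identification can be sketched with a multiclass perceptron (a single-layer neural network) over character trigram counts. This is not the authors' architecture: the fixed trigram features replace their automatic feature extraction, and the class name, training sentences and epoch limit are hypothetical choices for illustration.

```python
from collections import Counter, defaultdict


def trigrams(text):
    """Character trigram counts; space padding turns word edges into features."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


class PerceptronLangID:
    """Multiclass perceptron: one weight vector per language."""

    def __init__(self, labels):
        self.labels = list(labels)
        self.weights = {label: defaultdict(float) for label in self.labels}

    def _score(self, feats, label):
        w = self.weights[label]
        return sum(w.get(f, 0.0) * c for f, c in feats.items())

    def _best(self, feats):
        return max(self.labels, key=lambda label: self._score(feats, label))

    def predict(self, text):
        return self._best(trigrams(text))

    def train(self, examples, epochs=1000):
        """Classic perceptron updates; stops early after an error-free epoch."""
        for _ in range(epochs):
            mistakes = 0
            for text, gold in examples:
                feats = trigrams(text)
                guess = self._best(feats)
                if guess != gold:
                    mistakes += 1
                    # Move weights toward the gold language, away from the guess.
                    for f, c in feats.items():
                        self.weights[gold][f] += c
                        self.weights[guess][f] -= c
            if mistakes == 0:
                break
        return self
```

Usage: `PerceptronLangID(["en", "pt", "es"]).train(examples)` on a list of `(sentence, language)` pairs. Closely related languages such as Portuguese and Spanish share many trigrams ("gato está", "mesa"), so a realistic system needs far more training text than a toy example, consistent with the abstract's observation that Portuguese was the one language the network still failed on.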
Cross-Language Evaluation Forum | 2005
Nuno Cardoso; Leonardo Andrade; Alberto Simões; Mário J. Silva
This paper presents the participation of the XLDB Group in the CLEF 2005 ad-hoc monolingual and bilingual subtasks for Portuguese. We participated with an improved and extended configuration of the tumba! search engine software. We detail the new features and evaluate their performance.
Computer Science and Information Systems | 2014
Nuno Carvalho; Alberto Simões; José João Almeida
Besides source code, the fundamental source of information about open source software lies in documentation and other non-source-code files, such as README, INSTALL, or How-To files, commonly available in the software ecosystem. These documents, written in natural language, provide valuable information during the software development stage, but also in later maintenance and evolution tasks. DMOSS is a toolkit designed to systematically assess the quality of the non-source-code content found in software packages. The toolkit handles a package as an attribute tree and performs several tree traversal algorithms through a set of plugins, each specialized in retrieving specific metrics from text and gathering information about the software. These metrics are later used to infer knowledge about the software and are composed to build reports that assess the quality of specific features. This paper discusses the motivation for this work, then describes the toolkit's implementation and design goals, and finally presents an example of its use to process a software package, along with the resulting report.
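The design described above (package as attribute tree, plugins run during traversal, metrics composed into a report) can be sketched in a few lines. A minimal Python sketch, assuming leaf nodes are files with text content; `Node`, the two plugins, `DOC_NAMES` and the report fields are hypothetical illustrations, not DMOSS's actual API or metrics.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Iterator


@dataclass
class Node:
    """One entry of a software package: directories have children, files have text."""
    name: str
    text: str = ""
    children: list[Node] = field(default_factory=list)
    attrs: dict = field(default_factory=dict)


Plugin = Callable[[Node], dict]
DOC_NAMES = {"README", "INSTALL", "HOWTO", "CHANGES"}


def word_count(node: Node) -> dict:
    """Plugin: size of the file's natural-language content."""
    return {"words": len(node.text.split())}


def doc_kind(node: Node) -> dict:
    """Plugin: classify well-known documentation files by name."""
    stem = node.name.upper().split(".")[0]
    return {"doc": stem if stem in DOC_NAMES else None}


def files(node: Node) -> Iterator[Node]:
    """Depth-first walk yielding leaf nodes (treated as files)."""
    if node.children:
        for child in node.children:
            yield from files(child)
    else:
        yield node


def traverse(root: Node, plugins: list[Plugin]) -> Node:
    """Run every plugin on every file and store its metrics as node attributes."""
    for node in files(root):
        for plugin in plugins:
            node.attrs.update(plugin(node))
    return root


def report(root: Node) -> dict:
    """Compose per-file metrics into a package-level documentation report.

    Expects traverse() to have run with both word_count and doc_kind.
    """
    docs = {n.attrs["doc"]: n.attrs["words"] for n in files(root) if n.attrs.get("doc")}
    return {
        "documentation_files": sorted(docs),
        "doc_words": sum(docs.values()),
        "has_install_guide": "INSTALL" in docs,
    }
```

Keeping plugins as plain functions that return attribute dictionaries means new metrics can be added without touching the traversal or reporting code, which mirrors the plugin-based extensibility the abstract describes.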