Jorge J. García Flores

Publication

Featured research published by Jorge J. García Flores.

International Conference on NLP | 2012

Tracking Researcher Mobility on the Web Using Snippet Semantic Analysis

Jorge J. García Flores; Pierre Zweigenbaum; Zhao Yue; William A. Turner

This paper presents the Unoporuno system: an application of natural language processing methods to the sociology of migration. Our approach extracts names of people from a scientific publications database, refines Web search queries using bibliographical data, and decides the international mobility category of a person according to the location analysis of those snippets classified as mobility traces. In order to identify mobility traces, snippets are filtered with a name validation grammar, analyzed with mobility-related semantic features, and classified with a support vector machine. This classification method is complemented by a semi-automatic one, where Unoporuno selects 5 snippets to help a sociologist decide upon the mobility status of authors. Empirical evidence for the automatic person classification task suggests that Unoporuno classified 78% of the mobile persons in the right mobility category, with F=0.71. We also present empirical evidence for the semi-automatic task: in 80% of the cases sociologists are able to choose the right category with a moderate level of inter-rater agreement (0.60) based on the 5-snippet selection.

International Conference on Computational Linguistics | 2014

LIPN: Introducing a new Geographical Context Similarity Measure and a Statistical Similarity Measure based on the Bhattacharyya coefficient

Davide Buscaldi; Jorge J. García Flores; Joseph Le Roux; Nadi Tomeh; Belém Priego Sanchez

This paper describes the system used by the LIPN team in Task 10, Multilingual Semantic Textual Similarity, at SemEval 2014, in both the English and Spanish sub-tasks. The system uses a support vector regression model, combining different text similarity measures as features. Compared with our 2013 participation, we included a new feature that takes the geographical context into account and a new semantic distance based on the Bhattacharyya distance, calculated on co-occurrence distributions derived from the Spanish Google Books n-grams dataset.
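As an illustration of the measure named in this abstract, the following is a minimal sketch of the Bhattacharyya coefficient and distance between two discrete distributions. It is not the LIPN implementation: the function names and the toy co-occurrence counts are hypothetical, and the abstract does not say how the distributions were built from the n-grams data.

```python
import math

def normalize(counts):
    """Turn raw co-occurrence counts into a probability distribution."""
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

def bhattacharyya_coefficient(p, q):
    """BC(p, q) = sum_i sqrt(p_i * q_i); only shared words contribute."""
    return sum(math.sqrt(p[w] * q[w]) for w in p.keys() & q.keys())

def bhattacharyya_distance(p, q):
    """D_B(p, q) = -ln BC(p, q); infinite when the supports are disjoint."""
    bc = min(1.0, bhattacharyya_coefficient(p, q))  # guard against rounding above 1
    return math.inf if bc == 0 else -math.log(bc)

# Hypothetical co-occurrence counts, invented for this example.
ctx_a = normalize({"río": 4, "agua": 4, "puente": 2})
ctx_b = normalize({"río": 4, "agua": 4, "puente": 2})
ctx_c = normalize({"banco": 5, "dinero": 5})

print(bhattacharyya_distance(ctx_a, ctx_b))  # identical distributions: close to 0
print(bhattacharyya_distance(ctx_a, ctx_c))  # disjoint supports: inf
```

A coefficient near 1 means the two distributions overlap almost completely, so the distance could serve as one similarity feature among several in a regression model.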

Applications of Natural Language to Data Bases | 2004

Semantic filtering of textual requirements descriptions

Jorge J. García Flores

This paper explores the use of semantic filtering techniques for the analysis of large textual requirements descriptions. Our approach makes use of the Contextual Exploration Method to extract, within large textual requirements descriptions, those statements considered relevant from a requirements engineering perspective: concept relationships, aspecto-temporal organisation, and cause and control statements. We investigate to what extent filtering with these criteria can be the basis of requirements analysis and validation processing, and what kind of software tools are necessary to support contextual exploration systems for this purpose.

North American Chapter of the Association for Computational Linguistics | 2015

SOPA: Random Forests Regression for the Semantic Textual Similarity task

Davide Buscaldi; Jorge J. García Flores; Ivan Meza; Isaac Rodriguez

This paper describes the system used by the LIPN-IIMAS team in Task 2, Semantic Textual Similarity, at SemEval 2015, in both the English and Spanish sub-tasks. We included some features based on alignment measures and tested different learning models, in particular Random Forests, which proved the best among those used in our participation.

Soft Computing | 2017

Cross-domain deception detection using support vector networks

Ángel Hernández-Castañeda; Hiram Calvo; Alexander F. Gelbukh; Jorge J. García Flores

Our motivation is to assess the effectiveness of support vector networks (SVN) on the task of detecting deception in texts, as well as to investigate to what degree it is possible to build a domain-independent detector of deception in text using SVN. We experimented with different feature sets for training the SVN: a continuous semantic space model source represented by the latent Dirichlet allocation topics, a word-space model, and dictionary-based features. In this way, a comparison of performance between semantic information and behavioral information is made. We tested several combinations of these features on different datasets designed to identify deception. The datasets used include the DeRev dataset (a corpus of deceptive and truthful opinions about books obtained from Amazon), OpSpam (a corpus of fake and truthful opinions about hotels), and three corpora on controversial topics (abortion, death penalty, and a best friend) on which the subjects were asked to write an idea contrary to what they really believed. We experimented in a one-domain setting, training and testing our models separately on each dataset (with fivefold cross-validation); in a mixed-domain setting, merging all datasets into one large corpus (again, with fivefold cross-validation); and in a cross-domain setting, using one dataset for testing and a concatenation of all other datasets for training. We obtained an average accuracy of 86% in the one-domain setting, 75% in the mixed-domain setting, and 52% to 64% in the cross-domain setting.
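The cross-domain setting described in this abstract (one dataset held out for testing, all others concatenated for training) can be sketched as follows. This is only an illustration of the split, not the authors' code: the function name is hypothetical and the sample texts and labels are invented placeholders, although the dataset names come from the abstract.

```python
def cross_domain_splits(datasets):
    """Yield (held_out_name, train_items, test_items), one triple per dataset.

    Each dataset is used once for testing while the concatenation of
    all the other datasets is used for training.
    """
    for held_out in datasets:
        train = [
            item
            for name, items in datasets.items()
            if name != held_out
            for item in items
        ]
        yield held_out, train, list(datasets[held_out])

# Placeholder (text, label) samples; the real corpora are far larger.
datasets = {
    "DeRev": [("book review 1", "deceptive"), ("book review 2", "truthful")],
    "OpSpam": [("hotel review 1", "deceptive"), ("hotel review 2", "truthful")],
    "abortion": [("essay 1", "deceptive"), ("essay 2", "truthful")],
}

for name, train, test in cross_domain_splits(datasets):
    print(f"test on {name}: {len(train)} training items, {len(test)} test items")
```

Because no training item comes from the held-out domain, this split measures how well a detector transfers to unseen domains, which is consistent with the lower accuracy reported for that setting.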

North American Chapter of the Association for Computational Linguistics | 2016

LIPN-IIMAS at SemEval-2016 Task 1: Random Forest Regression Experiments on Align-and-Differentiate and Word Embeddings penalizing strategies

Oscar William Lightgow Serrano; Ivan Vladimir Meza Ruiz; Albert Manuel Orozco Camacho; Jorge J. García Flores; Davide Buscaldi

This paper describes the SOPA-N system used by the LIPN-IIMAS team in SemEval 2016 Semantic Textual Similarity (Task 1). We based our work on the SOPA 2015 system, which used 16 similarity features (including WordNet, Information Retrieval and Syntactic Dependencies) within a Random Forest learning model. We expanded this system with an Align-and-Differentiate based strategy, word embeddings and penalization, which showed a 6.8% improvement on the development set. However, we found that on the evaluation data for the 2016 STS shared task, the 2015 system outperformed our newer systems.

Joint Conference on Lexical and Computational Semantics | 2013