Cristina Soguero-Ruiz
King Juan Carlos University
Publication
Featured research published by Cristina Soguero-Ruiz.
ACM Symposium on Computing and Development | 2013
Vanessa Frias-Martinez; Cristina Soguero-Ruiz; Enrique Frias-Martinez; Malvina Josephidou
National Statistical Institutes typically hire large numbers of enumerators to carry out periodic surveys regarding the socioeconomic status of a society. Such an approach suffers from two drawbacks: (i) the survey process is expensive, especially for emerging countries that struggle with their budgets, and (ii) the socioeconomic indicators are computed ex post, i.e., after socioeconomic changes have already happened. We propose the use of human behavioral patterns computed from calling records to predict future values of socioeconomic indicators. Our objective is to help institutions forecast socioeconomic changes before they happen while reducing the number of surveys they need to carry out. For that purpose, we explore a battery of different predictive approaches for time series and show that multivariate time-series models yield R-square values of up to 0.65 for certain socioeconomic indicators.
Journal of Biomedical Informatics | 2016
Cristina Soguero-Ruiz; Kristian Hindberg; Inmaculada Mora-Jiménez; José Luis Rojo-Álvarez; Stein Olav Skrøvseth; Fred Godtliebsen; Kim Erlend Mortensen; Arthur Revhaug; Rolv-Ole Lindsetmo; Knut Magne Augestad; Robert Jenssen
OBJECTIVE: In this work, we have developed a learning system capable of exploiting information conveyed by longitudinal Electronic Health Records (EHRs) for the prediction of a common postoperative complication, Anastomosis Leakage (AL), in a data-driven way and by fusing temporal population data from different and heterogeneous sources in the EHRs. MATERIAL AND METHODS: We used linear and non-linear kernel methods individually for each data source, and leveraged multiple kernels for their effective combination. To validate the system, we used data from the EHR of the gastrointestinal department at a university hospital. RESULTS: We first investigated the early prediction performance from each data source separately, by computing Area Under the Curve values for processed free text (0.83), blood tests (0.74), and vital signs (0.65). When the heterogeneous data sources were combined using the composite kernel framework, the prediction capabilities increased considerably (0.92). Finally, posterior probabilities were evaluated for risk assessment of patients as an aid for clinicians to raise alertness at an early stage, in order to act promptly to avoid AL complications. DISCUSSION: Machine-learning statistical models built from EHR data can be useful to predict surgical complications. The combination of EHR-extracted free text, blood sample values, and patient vital signs improves the model performance. These results can be used as a framework for preoperative clinical decision support.
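The abstract above does not spell out how the composite kernel is built. As a minimal sketch, assuming the simplest common construction (a convex combination of one kernel matrix per data source), the idea can be illustrated as follows. The function names, the choice of a linear kernel for one source and an RBF kernel for another, and the weights are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def linear_kernel(X):
    """Gram matrix of the linear kernel for the rows of X."""
    return X @ X.T

def rbf_kernel(X, gamma):
    """Gram matrix of the Gaussian (RBF) kernel for the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))  # clip tiny negative distances

def composite_kernel(kernels, weights):
    """Convex combination of per-source Gram matrices (assumed construction).

    Non-negative weights keep the result positive semidefinite, so it can be
    passed to any kernel method that accepts a precomputed Gram matrix.
    """
    w = np.asarray(weights, dtype=float)
    if np.any(w < 0) or w.sum() == 0:
        raise ValueError("weights must be non-negative and not all zero")
    w = w / w.sum()
    return sum(wi * K for wi, K in zip(w, kernels))

# Hypothetical toy data: 6 patients, two sources with different dimensions.
rng = np.random.default_rng(0)
text_features = rng.normal(size=(6, 10))   # stand-in for processed free text
blood_features = rng.normal(size=(6, 4))   # stand-in for blood test values

K = composite_kernel(
    [linear_kernel(text_features), rbf_kernel(blood_features, gamma=0.5)],
    weights=[0.6, 0.4],
)
```

In practice the weights would be chosen by validation or learned, which is what multiple kernel learning methods do; here they are fixed by hand purely for illustration.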
IEEE Transactions on Biomedical Engineering | 2013
Cristina Soguero-Ruiz; Luis Lechuga-Suarez; Inmaculada Mora-Jiménez; Javier Ramos-López; Óscar Barquero-Pérez; Arcadi García-Alberola; José Luis Rojo-Álvarez
The electronic health record (EHR) automates the clinician workflow, allowing evidence-based decision support and quality management. We aimed to start a framework for domain standardization of cardiovascular risk stratification into the EHR, including risk indices whose calculation involves ECG signal processing. We propose the use of biomedical ontologies completely based on the conceptual model of SNOMED-CT, which allows us to implement our domain in the EHR. In this setting, the present study focused on the heart rate turbulence (HRT) domain, because of its concise guidelines and clear procedures for parameter calculations. We used 289 concepts from SNOMED-CT, and generated 19 local extensions (new concepts) for the HRT-specific concepts not present in the current version of SNOMED-CT. New concepts included averaged and individual ventricular premature complex tachograms, initial sinus acceleration for turbulence onset, and sinusal oscillation for turbulence slope. Two representative use studies were implemented: first, a prototype was inserted in the hospital information system for supporting HRT recordings and their simple follow-up by medical societies; second, advanced support for prospective scientific research, involving standard and emergent signal processing algorithms in the HRT indices, was generated and then tested in an example database of 27 Holter patients. Concepts of the proposed HRT ontology are publicly available through a terminology server, hence their use in any information system will be straightforward due to the interoperability provided by SNOMED-CT.
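The abstract above mentions turbulence onset without defining it. As a small illustration, assuming the commonly cited definition (the relative change, in percent, between the two sinus RR intervals after the compensatory pause and the two sinus RR intervals before the ventricular premature complex), the index can be computed as below. The function name and the sample values are hypothetical and are not taken from the paper.

```python
def turbulence_onset(rr_before, rr_after):
    """Turbulence onset (TO) in percent, under the commonly cited definition.

    rr_before: the two sinus RR intervals (ms) preceding the premature beat.
    rr_after:  the two sinus RR intervals (ms) following the compensatory pause.
    A negative value indicates an initial acceleration of the sinus rhythm.
    """
    if len(rr_before) != 2 or len(rr_after) != 2:
        raise ValueError("exactly two RR intervals are expected on each side")
    before = float(sum(rr_before))
    after = float(sum(rr_after))
    return 100.0 * (after - before) / before

# Hypothetical tachogram values in milliseconds.
to_percent = turbulence_onset((1000, 1000), (980, 980))
```

For these made-up values the RR intervals shorten by 2% after the premature beat, so `to_percent` is negative, which corresponds to the initial sinus acceleration named in the abstract.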
Scientific Reports | 2017
Kasper Jensen; Cristina Soguero-Ruiz; Karl Øyvind Mikalsen; Rolv-Ole Lindsetmo; Irene Kouskoumvekaki; Mark A. Girolami; Stein Olav Skrøvseth; Knut Magne Augestad
With an aging patient population and increasing complexity in patient disease trajectories, physicians are often met with complex patient histories from which clinical decisions must be made. Due to the increasing rate of adverse events and hospitals facing financial penalties for readmission, there has never been a greater need to enforce evidence-led medical decision-making using available health care data. In the present work, we studied a cohort of 7,741 patients, of whom 4,080 were diagnosed with cancer, surgically treated at a University Hospital in the years 2004–2012. We have developed a methodology that allows disease trajectories of the cancer patients to be estimated from free text in electronic health records (EHRs). By using these disease trajectories, we predict 80% of patient events ahead in time. By control of confounders from 8326 quantified events, we identified 557 events that constitute high subsequent risks (risk > 20%), including six events for cancer and seven events for metastasis. We believe that the presented methodology and findings could be used to improve clinical decision support and personalize trajectories, thereby decreasing adverse events and optimizing cancer treatment.
Pattern Recognition | 2018
Karl Øyvind Mikalsen; Filippo Maria Bianchi; Cristina Soguero-Ruiz; Robert Jenssen
Similarity-based approaches represent a promising direction for time series analysis. However, many such methods rely on parameter tuning, and some have shortcomings if the time series are multivariate (MTS), due to dependencies between attributes, or the time series contain missing data. In this paper, we address these challenges within the powerful context of kernel methods by proposing the robust time series cluster kernel (TCK). The approach taken leverages the missing data handling properties of Gaussian mixture models (GMM) augmented with informative prior distributions. An ensemble learning approach is exploited to ensure robustness to parameters by combining the clustering results of many GMM to form the final kernel. We evaluate the TCK on synthetic and real data and compare to other state-of-the-art techniques. The experimental results demonstrate that the TCK is robust to parameter choices, provides competitive results for MTS without missing data and outstanding results for missing data.
Expert Systems With Applications | 2012
Cristina Soguero-Ruiz; Francisco Javier Gimeno-Blanes; Inmaculada Mora-Jiménez; María Pilar Martínez-Ruiz; José Luis Rojo-Álvarez
In recent years, sales promotions have become a paramount issue in the marketing strategies of many companies, and they are even more relevant in the present economic situation. Empirical models aimed at assessing consumer behavior in response to sales promotion activities, such as temporary price reductions, are currently receiving growing attention in this research field, mainly for two reasons: (1) the complexity of the interactions among the different elements of promotional campaigns; and (2) the increased availability of electronic records on sales history. Hence, it is important that the performance description and comparison of the available machine learning promotion models, as well as the selection of their design parameters, be performed using a robust and statistically rigorous procedure, while keeping functionality and usefulness. In this paper, we first propose a simple nonparametric statistical tool, based on paired bootstrap resampling, to allow an operative comparison of results among different learning-from-samples promotional models. Secondly, we use the bootstrap statistical description to evaluate the models in terms of average and scatter measurements, for a more complete efficiency characterization of the promotional sales models. These statistical characterizations allow us to readily work with the distribution of the actual risk, in order to avoid overoptimistic performance evaluation in the machine learning based models. We also present the analysis performed to determine whether the figure of merit has a significant impact on the final result, together with an in-depth design parameter selection to optimize final results during the promotion evaluation using statistical learning techniques. No significant difference was obtained in terms of figure of merit choice, and Mean Absolute Error was selected for performance measurement.
In summary, the applied technique clarifies the design of the promotional sales models for a real database (milk category), according to the influence of the figure of merit used for design parameter selection, showing the robustness of the machine learning techniques in this setting. Results obtained in this paper will be subsequently applied, and presented in the companion paper, devoted to a more detailed quality analysis, to evaluate four well-known machine learning algorithms in real databases for two categories with different promotional behavior.
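The paired bootstrap comparison described in the abstract above can be sketched roughly as follows. This is a generic illustration under stated assumptions, not the authors' procedure: the function name, the number of resamples, the 95% percentile interval, and the toy data are all choices made here. The key point is that both models are scored on the same resampled indices, so the difference in Mean Absolute Error is paired.

```python
import numpy as np

def paired_bootstrap_mae_diff(y_true, pred_a, pred_b, n_boot=2000, seed=0):
    """Bootstrap distribution of MAE(model A) - MAE(model B).

    The same resampled indices are used for both models (paired resampling).
    Returns the mean difference and a 95% percentile confidence interval.
    """
    y_true = np.asarray(y_true, dtype=float)
    err_a = np.abs(y_true - np.asarray(pred_a, dtype=float))
    err_b = np.abs(y_true - np.asarray(pred_b, dtype=float))
    n = len(y_true)
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, n, size=(n_boot, n))       # one row per resample
    diffs = err_a[idx].mean(axis=1) - err_b[idx].mean(axis=1)
    low, high = np.percentile(diffs, [2.5, 97.5])
    return diffs.mean(), (low, high)

# Hypothetical toy data: model A is off by 0.1, model B by 1.0 on every sample.
y = np.arange(50, dtype=float)
mean_diff, (ci_low, ci_high) = paired_bootstrap_mae_diff(y, y + 0.1, y + 1.0)
```

If the confidence interval excludes zero, the difference between the two models is taken as significant at the chosen level; if it contains zero, the data do not separate them. In the toy example the interval lies entirely below zero, because model A has the smaller error on every sample.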
4th International Conference on Physiological Computing Systems | 2017
Javier Fernández-Sánchez; Cristina Soguero-Ruiz; Pablo de Miguel-Bohoyo; Francisco Javier Rivas-Flores; Ángel Gómez-Delgado; Francisco Javier Gutiérrez-Expósito; Inmaculada Mora-Jiménez
Hypertension is a chronic condition that has a considerable prevalence in the elderly. Furthermore, hypertensive patients cost twice as much as normotensive individuals. Budget reductions and increasing concern about the sustainability of the healthcare system have made improving the efficiency and use of resources a priority in developed countries. Identification of chronic hypertensive patients, i.e., patients with high blood pressure, can be performed by means of population classification systems such as Clinical Risk Groups (CRGs). CRGs classify individuals into health status categories using both demographic and clinical information from the encounters that individuals have with the healthcare system during a defined period of time. In this work, we determine the characteristic profile and the evolution of diagnosis codes according to the International Classification of Diseases 9th revision, Clinical Modification (ICD9-CM), focusing on healthy and chronic hypertensive patients at different chronic statuses (CRG). Our data correspond to the population associated with the University Hospital of Fuenlabrada (Madrid, Spain) during the year 2012, providing about 46000 healthy and 16000 hypertensive individuals. We found that profiles associated with different health statuses have different patterns in terms of ICD-9 diagnosis codes. Furthermore, a prediction method is proposed to determine the health status of a new patient according to demographic (age and gender) and clinical (diagnosis codes) data. We conclude that gender is the least informative characteristic, whereas the combination of age and diagnosis codes has great potential when they are nonlinearly combined.
Digital Signal Processing | 2014
Cristina Soguero-Ruiz; Francisco Javier Gimeno-Blanes; Inmaculada Mora-Jiménez; María Pilar Martínez-Ruiz; José Luis Rojo-Álvarez
New economic conditions have led to innovations in retail industries, such as more dynamic retail approaches based on flexible strategies. We propose and compare different approaches incorporating nonlinear methods for promotional decision-making using retail aggregated data registered at the point of sale. Specifically, this paper describes a reliable quantification tool as an effective information system leveraged on recent and historical data that provides managers with an operative vision. Furthermore, a new set of indicators is proposed to evaluate the reliability and stability of the data model in the multidimensional feature space by using nonparametric resampling techniques. This allows the user to make a clearer comparison among linear, nonlinear, static, and dynamic data models, and to identify the uncertainty of different feature space regions, for example, those corresponding to the most frequent deal features. This methodology allows retailers to use aggregated data in suitable conditions that will result in acceptable confidence intervals. To test the proposed methodology, we used a database containing the sales history of representative products registered by a Spanish retail chain. The results indicate that: (1) the deal effect curve analysis and the time series linear model do not provide enough expressive capacity, and (2) nonlinear promotional models more accurately follow the actual sales pattern obtained in response to the implemented sales promotions. The quarterly temporal analysis conducted enabled the authors to identify long-term changes in the dynamics of the model for several products, especially during the early stage of the most recent economic crisis, consistent with the information provided by the reliability indices in terms of the feature space. We conclude that the proposed method provides a reliable operative tool for decision support, allowing retailers to alter their strategies to accommodate consumer behavior.
Expert Systems With Applications | 2012
Cristina Soguero-Ruiz; Francisco Javier Gimeno-Blanes; Inmaculada Mora-Jiménez; María Pilar Martínez-Ruiz; José Luis Rojo-Álvarez
The assessment of promotional sales with models constructed by machine learning techniques is arousing interest due, among other reasons, to the current economic situation leading to a more complex environment of simultaneous and concurrent promotional activities. An operative model diagnosis procedure was previously proposed in the companion paper, which can be readily used both for agile decision making on the architecture and implementation details of the machine learning algorithms, and for differential benchmarking among models. In this paper, a detailed example of model analysis is presented for two representative databases with different promotional behaviour, namely, a non-seasonal category (milk) and a heavily seasonal category (beer). The performance of four well-known machine learning techniques with increasing complexity is analyzed in detail here. In particular, k-Nearest Neighbours, General Regression Neural Networks, Multilayer Perceptron (MLP), and Support Vector Machines (SVM) are differentially compared. The present paper evaluates these techniques across the experiments described for both categories, applying the methodological findings obtained in the companion paper. We conclude that some elements included in the architecture are not essential for a good performance of the machine learning promotional models, such as the semiparametric nature of the kernel in SVM models, whereas others can be strongly dependent on the database, such as the convenience of multiple output models in MLP regression schemes. Additionally, the specificity of the behaviour of certain categories and product ranges determines the need to establish suitable and specific procedures for a better prediction and feature extraction.
Diabetes and Metabolic Syndrome: Clinical Research and Reviews | 2018
Rafael Garcia-Carretero; Luis Vigil-Medina; Inmaculada Mora-Jiménez; Cristina Soguero-Ruiz; Rebeca Goya-Esteban; Javier Ramos-López; Óscar Barquero-Pérez
BACKGROUND: The aim of our study was to determine whether prediabetes increases cardiovascular (CV) risk compared to non-prediabetic patients in our hypertensive population. Once this was achieved, the objective was to identify relevant CV prognostic features among prediabetic individuals. METHODS: We included 1652 hypertensive patients. The primary outcome was a composite of incident CV events: cardiovascular death, stroke, heart failure and myocardial infarction. We performed a Cox proportional hazard regression to assess the CV risk of prediabetic patients compared to non-prediabetic patients and to produce a survival model in the prediabetic cohort. RESULTS: The risk of developing a CV event was higher in the prediabetic cohort than in the non-prediabetic cohort, with a hazard ratio (HR) = 1.61, 95% CI 1.01-2.54, p = 0.04. Our Cox proportional hazard model selected age (HR = 1.04, 95% CI 1.02-1.07, p < 0.001) and cystatin C (HR = 2.4, 95% CI 1.26-4.22, p = 0.01) as the most relevant prognostic features in our prediabetic patients. CONCLUSIONS: Prediabetes was associated with an increased risk of CV events when compared with non-prediabetic patients. Age and cystatin C were found to be significant risk factors for CV events in the prediabetic cohort.
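The reported hazard ratio, confidence interval and p-value for prediabetes can be related to each other with a short calculation. The sketch below assumes a Wald-type interval that is symmetric on the log scale, which is the usual form for Cox regression output but is not stated in the abstract; the function name is hypothetical. It recovers the standard error of the log hazard ratio from the interval and then an approximate two-sided p-value.

```python
import math

def wald_p_from_hr_ci(hr, ci_low, ci_high, z_crit=1.959964):
    """Approximate two-sided p-value from a hazard ratio and its 95% CI.

    Assumes the interval is exp(log(HR) +/- z_crit * SE), i.e. symmetric on
    the log scale. Returns (standard error of log HR, z statistic, p-value).
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return se, z, p

# Values reported in the abstract for prediabetic vs non-prediabetic patients.
se, z, p = wald_p_from_hr_ci(1.61, 1.01, 2.54)
```

Under this assumption the recovered p-value is close to the reported 0.04, which is consistent with a lower confidence limit sitting just above 1. Small discrepancies are expected because the published figures are rounded.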