Yindalon Aphinyanaphongs
New York University
Publications
Featured research published by Yindalon Aphinyanaphongs.
Journal of the Association for Information Science and Technology | 2014
Yindalon Aphinyanaphongs; Lawrence D. Fu; Zhiguo Li; Eric R. Peskin; Efstratios Efstathiadis; Constantin F. Aliferis; Alexander Statnikov
An important aspect of performing text categorization is selecting appropriate supervised classification and feature selection methods. A comprehensive benchmark is needed to inform best practices in this broad application field. Previous benchmarks have evaluated performance for only a few supervised classification and feature selection methods and limited ways to optimize them. The present work updates prior benchmarks by increasing the number of classifiers and feature selection methods by an order of magnitude, including recently developed, state-of-the-art methods. Specifically, this study used 229 text categorization data sets/tasks and evaluated 28 classification methods (both well-established and proprietary/commercial) and 19 feature selection methods according to 4 classification performance metrics. We report several key findings that will be helpful in establishing best methodological practices for text categorization.
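As a rough illustration of this kind of benchmark (not the authors' pipeline), the sketch below crosses a few scikit-learn classifiers with a few feature selection methods on a public text corpus and scores each pairing by cross-validated AUC; the dataset, method list, and feature counts are placeholders chosen for illustration.

```python
# Minimal sketch of a classifier x feature-selection benchmark for text
# categorization. Dataset, methods, and feature counts are illustrative only.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score

data = fetch_20newsgroups(subset="train", categories=["sci.med", "sci.space"])

classifiers = {
    "logreg": LogisticRegression(max_iter=1000),
    "linear_svm": LinearSVC(),
    "naive_bayes": MultinomialNB(),
}
selectors = {
    "chi2": SelectKBest(chi2, k=1000),
    "mutual_info": SelectKBest(mutual_info_classif, k=1000),
}

for clf_name, clf in classifiers.items():
    for sel_name, sel in selectors.items():
        pipe = Pipeline([
            ("tfidf", TfidfVectorizer(max_features=5000)),
            ("select", sel),
            ("clf", clf),
        ])
        # AUC is one of several metrics a full benchmark would report.
        auc = cross_val_score(pipe, data.data, data.target,
                              cv=5, scoring="roc_auc").mean()
        print(f"{clf_name} + {sel_name}: AUC = {auc:.3f}")
```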
Journal of Biomedical Informatics | 2011
Lawrence D. Fu; Yindalon Aphinyanaphongs; Lily Wang; Constantin F. Aliferis
Evaluating the quality of the biomedical literature and of health-related websites is a challenging information retrieval task. Commonly used methods include the impact factor for journals, PubMed's clinical query filters and machine-learning-based filter models for articles, and PageRank for websites. Previous work has focused on the average performance of these methods without considering the topic, and it is unknown how performance varies for specific topics or focused searches. Clinicians, researchers, and users should be aware when expected performance is not achieved for specific topics. The present work analyzes the behavior of these methods across a variety of topics. Impact factor, clinical query filters, and PageRank vary widely across topics, while a topic-specific impact factor and machine-learning-based filter models are more stable. The results demonstrate that a method may perform excellently on average yet struggle on a number of narrower topics. Topic-adjusted metrics and other topic-robust methods have an advantage in such situations. Users of traditional topic-sensitive metrics should be aware of their limitations.
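To make the contrast concrete, here is a toy sketch of an overall impact factor versus a topic-specific variant computed over a hypothetical citation table; the journals, topics, and citation counts are invented for illustration and are not from the paper.

```python
# Toy comparison of a journal-level impact factor with a topic-specific one.
# All journals, topics, and citation counts below are hypothetical.
from collections import defaultdict

# (journal, topic, citations_in_window) for each article
articles = [
    ("Journal A", "cardiology", 12),
    ("Journal A", "cardiology", 3),
    ("Journal A", "oncology", 40),
    ("Journal B", "cardiology", 8),
    ("Journal B", "oncology", 2),
]

def impact_factor(rows):
    """Mean citations per article over whatever subset of rows is passed in."""
    return sum(c for _, _, c in rows) / len(rows)

by_journal = defaultdict(list)
by_journal_topic = defaultdict(list)
for journal, topic, cites in articles:
    by_journal[journal].append((journal, topic, cites))
    by_journal_topic[(journal, topic)].append((journal, topic, cites))

for journal, rows in by_journal.items():
    print(journal, "overall IF:", round(impact_factor(rows), 2))
for (journal, topic), rows in by_journal_topic.items():
    print(journal, topic, "topic IF:", round(impact_factor(rows), 2))
```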
World Congress on Medical and Health Informatics (MEDINFO) | 2013
Yindalon Aphinyanaphongs; Lawrence D. Fu; Constantin F. Aliferis
Building machine learning models that identify unproven cancer treatments on the Health Web is a promising approach for dealing with the dissemination of false and dangerous information to vulnerable health consumers. Aside from the obvious requirement of accuracy, two issues are of practical importance in deploying these models in real-world applications. (a) Generalizability: the models must generalize to all treatments, not just those used to train them. (b) Scalability: the models must be applicable efficiently to billions of documents on the Health Web. First, we provide methods and related empirical data demonstrating strong accuracy and generalizability. Second, by combining the MapReduce distributed architecture with high-dimensionality compression via Markov boundary feature selection, we show how to scale the application of the models to WWW-scale corpora. The present work provides evidence that (a) a very small subset of unproven cancer treatments is sufficient to build a model that identifies unproven treatments on the web; (b) unproven treatments use distinct language to market their claims, and this language is learnable; and (c) through distributed parallelization and state-of-the-art feature selection, it is possible to prepare the corpora and to build and apply models with large scalability.
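A minimal sketch of the scalability idea, assuming a compact linear model is what remains after aggressive feature selection: scoring a document reduces to a small dictionary lookup, so the map phase of a MapReduce job (approximated here by a local process pool) can apply the model to each page independently. The weights, bias, and documents below are hypothetical, not the published model.

```python
# Sketch: apply a small linear text model to documents in parallel (map-style).
# Weights, bias, and documents are hypothetical placeholders.
from multiprocessing import Pool
import math
import re

# A tiny linear model over a Markov-boundary-sized feature set (hypothetical).
WEIGHTS = {"miracle": 1.7, "cure": 1.2, "toxin": 0.9, "trial": -1.1, "evidence": -1.4}
BIAS = -0.5

def score(document: str) -> float:
    """Logistic score of one document under the compact linear model."""
    tokens = re.findall(r"[a-z]+", document.lower())
    z = BIAS + sum(WEIGHTS.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-z))

documents = [
    "This miracle cure removes every toxin with no evidence needed",
    "A randomized controlled trial provides evidence of efficacy",
]

if __name__ == "__main__":
    # Pool.map plays the role of the map phase in a MapReduce deployment.
    with Pool() as pool:
        for doc, p in zip(documents, pool.map(score, documents)):
            print(f"{p:.2f}  {doc[:50]}")
```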
Clinical and Translational Science | 2014
Claudia S. Plottel; Yindalon Aphinyanaphongs; Yongzhao Shao; Keith J. Micoli; Yixin Fang; Judith D. Goldberg; Claudia R. Galeano; Jessica H. Stangel; Deborah Chavis-Keeling; Judith S. Hochman; Bruce N. Cronstein; Michael H. Pillinger
Senior housestaff and junior faculty are often expected to perform clinical research, yet may not always have the requisite knowledge and skills to do so successfully. Formal degree programs provide such knowledge, but require a significant commitment of time and money. Short‐term training programs (days to weeks) provide alternative ways to accrue essential information and acquire fundamental methodological skills. Unfortunately, published information about short‐term programs is sparse. To encourage discussion and exchange of ideas regarding such programs, we here share our experience developing and implementing INtensive Training in Research Statistics, Ethics, and Protocol Informatics and Design (INTREPID), a 24‐day immersion training program in clinical research methodologies. Designing, planning, and offering INTREPID was feasible, and required significant faculty commitment, support personnel and infrastructure, as well as committed trainees.
Academic Radiology | 2016
Andrew B. Rosenkrantz; Ankur M. Doshi; Luke A. Ginocchio; Yindalon Aphinyanaphongs
RATIONALE AND OBJECTIVES This study aimed to assess the performance of a text classification machine-learning model in predicting highly cited articles within the recent radiological literature and to identify the model's most influential article features. MATERIALS AND METHODS We downloaded from PubMed the title, abstract, and medical subject heading terms for 10,065 articles published in 25 general radiology journals in 2012 and 2013. Three machine-learning models were applied to predict the top 10% of included articles in terms of the number of citations to the article in 2014 (reflecting the 2-year time window in conventional impact factor calculations). The model with the highest area under the curve was selected to derive a list of article features (words) predicting high citation volume, which was iteratively reduced to identify the smallest possible core feature list that maintained predictive power. Overall themes were qualitatively assigned to the core features. RESULTS The regularized logistic regression (Bayesian binary regression) model had the highest performance, achieving an area under the curve of 0.814 in predicting articles in the top 10% of citation volume. We reduced the initial 14,083 features to 210 features that maintained predictive power. These features corresponded to topics relating to various imaging techniques (e.g., diffusion-weighted magnetic resonance imaging, hyperpolarized magnetic resonance imaging, dual-energy computed tomography, computed tomography reconstruction algorithms, tomosynthesis, elastography, and computer-aided diagnosis), particular pathologies (prostate cancer, thyroid nodules, hepatic adenoma, hepatocellular carcinoma, and non-alcoholic fatty liver disease), and other topics (radiation dose, electroporation, education, general oncology, gadolinium, and statistics). CONCLUSIONS Machine learning can be successfully applied to create specific feature-based models for predicting articles likely to achieve high influence within the radiological literature.
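The sketch below mimics the general setup on synthetic data: TF-IDF features from titles and abstracts, a regularized logistic regression (an L1 penalty standing in for the Bayesian binary regression used in the study), and pruning to a short list of high-weight features. The texts, labels, and thresholds are placeholders, not the study's data or exact procedure.

```python
# Sketch: predict "highly cited" articles from text and extract a core feature
# list. Synthetic texts and labels stand in for the study's data.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

texts = ["diffusion weighted mri of prostate cancer",
         "radiation dose reduction in ct",
         "case report of a rare finding",
         "tomosynthesis for breast screening"] * 50
labels = np.array([1, 1, 0, 1] * 50)   # 1 = top-10% cited (synthetic)

X_train_txt, X_test_txt, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=0, stratify=labels)

vec = TfidfVectorizer()
X_train = vec.fit_transform(X_train_txt)
X_test = vec.transform(X_test_txt)

# L1 penalty stands in for the Bayesian regularization used in the paper.
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
model.fit(X_train, y_train)
print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))

# Keep only the strongest coefficients as a candidate "core" feature list.
coef = model.coef_.ravel()
top = np.argsort(np.abs(coef))[::-1][:20]
core_features = [vec.get_feature_names_out()[i] for i in top]
print(core_features)
```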
American Journal on Addictions | 2017
Babak Tofighi; Frank Grazioli; Sewit Bereket; Ellie Grossman; Yindalon Aphinyanaphongs; Joshua David Lee
BACKGROUND AND OBJECTIVES Missed visits are common in office-based buprenorphine treatment (OBOT). The feasibility of text message (TM) appointment reminders among OBOT patients is unknown. METHODS This 6-month prospective cohort study provided TM reminders to OBOT program patients (N = 93). A feasibility survey was completed following delivery of the TM reminders and at 6 months. RESULTS Respondents reported that the reminders should be provided to all OBOT patients (100%) and helped them adhere to their scheduled appointments (97%). At 6 months, there were no reports of intrusion on their privacy or disruption of daily activities due to the TM reminders. Most participants reported that the TM reminders were helpful in adhering to scheduled appointments (95%), that the reminders should be offered to all clinic patients (95%), and that they favored receiving only TM reminders rather than telephone reminders (95%). Barriers to adhering to scheduled appointment times included transportation difficulties (34%), not being able to take time off from school or work (31%), long clinic wait times (9%), being hospitalized or sick (8%), feeling sad or depressed (6%), and child care (6%). CONCLUSIONS This study demonstrated the acceptability and feasibility of TM appointment reminders in OBOT. Older age and longer duration in buprenorphine treatment did not diminish interest in receiving the TM intervention. Although OBOT patients expressed concern regarding the privacy of TM content sent from their providers, privacy issues were uncommon in this cohort. SCIENTIFIC SIGNIFICANCE Findings from this study highlight patient barriers to adherence to scheduled appointments, including transportation difficulties (34%), not being able to take time off from school or work (31%), long clinic wait times (9%), and other factors that may confound the effect of future TM appointment reminder interventions. Further research is also required to assess (1) the level of system changes required to integrate TM appointment reminder tools with existing electronic medical records and appointment records software; (2) acceptability among clinicians and administrators; and (3) financial and resource constraints on healthcare systems. (Am J Addict 2017;26:581-586).
Seminars in Musculoskeletal Radiology | 2017
Yindalon Aphinyanaphongs
This article reviews examples of big data analyses in health care with a focus on radiology. We review the defining characteristics of big data, the use of natural language processing, traditional and novel data sources, and large clinical data repositories available for research. This article aims to invoke novel research ideas through a combination of examples of analyses and domain knowledge.
Proceedings of the 2015 International Conference on Healthcare Informatics (ICHI '15) | 2015
Bisakha Ray; Yindalon Aphinyanaphongs; Sean P. Heffron
A lack of recruitment of appropriate subjects plagues most clinical research trials. One barrier is the absence of an efficient way to identify eligible subjects. Researchers have worked to harness computing power to improve automated identification of potential subjects for clinical trials, with modest success. We use text classification to automatically identify patients for a hypothetical Acute Coronary Syndrome clinical research study from intensive care unit discharge summaries. We apply several state-of-the-art classification methods, including Bayesian Logistic Regression, AdaBoost, Support Vector Machines, and Random Forests, to build models from manually assigned administrative ICD-9 codes. We then apply these models to discharge summaries labeled by a board-certified cardiologist for patients eligible for the hypothetical research study. The best models achieve an area under the ROC curve of 0.95 for identifying eligible patients. This pilot study suggests that text-based classification holds promise for identifying potential clinical trial subjects. Our methods require further validation in studies involving multiple inclusion and exclusion criteria.
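As a hedged sketch of this weak-supervision pattern (not the authors' code), the example below trains a few scikit-learn classifiers on notes labeled via ICD-9 codes and evaluates them with ROC AUC against a separately gold-labeled set; all notes, codes, and labels are fabricated for illustration.

```python
# Sketch: train text classifiers on ICD-9-derived labels, evaluate against a
# clinician-labeled gold standard. All notes and labels are fabricated.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score

train_notes = ["chest pain troponin elevated nstemi",
               "admitted for cellulitis of the left leg"] * 30
train_labels_icd9 = [1, 0] * 30          # 1 if ACS-related ICD-9 code present

test_notes = ["acute coronary syndrome ruled in troponin rising",
              "uncomplicated pneumonia no cardiac history"] * 5
test_labels_gold = [1, 0] * 5            # cardiologist adjudication

vec = TfidfVectorizer()
X_train = vec.fit_transform(train_notes)
X_test = vec.transform(test_notes)

for name, clf in [("random_forest", RandomForestClassifier(random_state=0)),
                  ("adaboost", AdaBoostClassifier(random_state=0)),
                  ("svm", SVC(probability=True, random_state=0))]:
    clf.fit(X_train, train_labels_icd9)
    auc = roc_auc_score(test_labels_gold, clf.predict_proba(X_test)[:, 1])
    print(f"{name}: AUC = {auc:.2f}")
```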
Journal of Translational Medicine | 2016
Alisa Surkis; Janice A. Hogle; Deborah DiazGranados; Joe Hunt; Paul E. Mazmanian; Emily Connors; Kate Westaby; Elizabeth C. Whipple; Trisha Adamus; Meridith Mueller; Yindalon Aphinyanaphongs
Gynecologic Oncology | 2016
J. Lee; Yindalon Aphinyanaphongs; John P. Curtin; Jing-Yi Chern; Melissa K. Frey; Leslie R. Boyd