John Sharko
University of Massachusetts Lowell
Publication
Featured research published by John Sharko.
IEEE Transactions on Visualization and Computer Graphics | 2008
John Sharko; Georges G. Grinstein; Kenneth A. Marx
Radviz is a radial visualization with dimensions assigned to points called dimensional anchors (DAs) placed on the circumference of a circle. Records are assigned locations within the circle as a function of their relative attraction to each of the DAs. The DAs can be moved either interactively or algorithmically to reveal different meaningful patterns in the dataset. In this paper we describe Vectorized Radviz (VRV), which extends the number of dimensions through data flattening. We show how VRV increases the power of Radviz through these extra dimensions by enhancing the flexibility in the layout of the DAs. We apply VRV to the problem of analyzing the results of multiple clusterings of the same dataset, called multiple cluster sets or cluster ensembles. We show how features of VRV help discern patterns across the multiple cluster sets. We use the Iris dataset to explain VRV and a newt gene microarray dataset used in studying limb regeneration to show its utility. We then discuss further applications of VRV.
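The placement rule described in this abstract can be illustrated with a minimal sketch of the commonly used Radviz mapping, in which each record lands at the weighted average of the anchor positions. This is not the authors' implementation: the function name `radviz_point`, the equal spacing of the anchors, and the assumption that values are already normalized to [0, 1] are all illustrative choices.

```python
import math


def radviz_point(record, anchors=None):
    """Place one record inside the unit circle (hypothetical helper).

    record:  non-negative values, one per dimension, assumed normalized to [0, 1].
    anchors: optional list of (x, y) dimensional-anchor positions; when omitted,
             anchors are spaced equally around the unit circle (an assumption).
    """
    n = len(record)
    if anchors is None:
        anchors = [
            (math.cos(2 * math.pi * j / n), math.sin(2 * math.pi * j / n))
            for j in range(n)
        ]
    total = sum(record)
    if total == 0:
        # No attraction to any anchor: leave the record at the center.
        return (0.0, 0.0)
    # Each anchor pulls the record in proportion to the record's value
    # in that dimension; the position is the weighted average of anchors.
    x = sum(v * ax for v, (ax, _) in zip(record, anchors)) / total
    y = sum(v * ay for v, (_, ay) in zip(record, anchors)) / total
    return (x, y)


# A record dominated by one dimension sits on that dimension's anchor,
# while a record with equal values in all dimensions sits at the center.
on_anchor = radviz_point([1.0, 0.0, 0.0, 0.0])
balanced = radviz_point([0.5, 0.5, 0.5, 0.5])
```

Passing a custom `anchors` list corresponds to moving the DAs, which changes where the same records land.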
Breast Journal | 2009
Elissa M. Ozanne; Andrea Loberg; Sherwood S. Hughes; Christine Lawrence; Brian Drohan; Alan Semine; Michael S. Jellinek; Claire Cronin; Frederick Milham; Dana Dowd; Caroline Block; Deborah Lockhart; John Sharko; Georges G. Grinstein; Kevin S. Hughes
Abstract: Despite advances in identifying genetic markers of high-risk patients and the availability of genetic testing, it remains challenging to efficiently identify women who are at hereditary risk and to manage their care appropriately. HughesRiskApps, an open‐source family history collection, risk assessment, and Clinical Decision Support (CDS) software package, was developed to address the shortcomings in our ability to identify and treat the high-risk population. This system is designed for use in primary care clinics, breast centers, and cancer risk clinics to collect family history and risk information and provide the necessary CDS to increase quality of care and efficiency. This paper reports on the first implementation of HughesRiskApps in the community hospital setting. HughesRiskApps was implemented at the Newton‐Wellesley Hospital. Between April 1, 2007 and March 31, 2008, 32,966 analyses were performed on 25,763 individuals. Within this population, 915 (3.6%) individuals were found to be eligible for risk assessment and possible genetic testing based on the 10% risk of mutation threshold. During the first year of implementation, physicians and patients fully accepted the system, and 3.6% of patients assessed were referred to risk assessment and consideration of genetic testing. These early results indicate that the number of patients identified for risk assessment has increased dramatically and that the care of these patients is more efficient and likely more effective.
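The reported proportion can be checked directly from the counts given in this abstract. The short calculation below uses only those three figures; the variable names are illustrative.

```python
# Counts as reported in the abstract above.
analyses = 32_966
individuals = 25_763
eligible = 915

# Share of individuals meeting the 10% risk-of-mutation threshold.
eligible_pct = round(100 * eligible / individuals, 1)

# Average number of analyses performed per individual.
analyses_per_individual = round(analyses / individuals, 2)
```

The result agrees with the 3.6% quoted in the abstract, and the counts imply roughly 1.28 analyses per individual over the year.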
Journal of Pathology Informatics | 2012
Julliette M. Buckley; Suzanne B. Coopey; John Sharko; Fernanda Polubriaginof; Brian Drohan; Ahmet K. Belli; Elizabeth Min Hui Kim; Judy Garber; Barbara L. Smith; Michele A. Gadd; Michelle C. Specht; Constance A. Roche; Thomas M. Gudewicz; Kevin S. Hughes
Objective: The opportunity to integrate clinical decision support systems into clinical practice is limited due to the lack of structured, machine-readable data in the current format of the electronic health record. Natural language processing has been designed to convert free text into machine-readable data. The aim of the current study was to ascertain the feasibility of using natural language processing to extract clinical information from >76,000 breast pathology reports. Approach and Procedure: Breast pathology reports from three institutions were analyzed using natural language processing software (Clearforest, Waltham, MA) to extract information on a variety of pathologic diagnoses of interest. Data tables were created from the extracted information according to date of surgery, side of surgery, and medical record number. The variety of ways in which each diagnosis could be represented was recorded, as a means of demonstrating the complexity of machine interpretation of free text. Results: There was widespread variation in how pathologists reported common pathologic diagnoses. We report, for example, 124 ways of saying invasive ductal carcinoma and 95 ways of saying invasive lobular carcinoma. There were >4000 ways of saying invasive ductal carcinoma was not present. Natural language processor sensitivity and specificity were 99.1% and 96.5% when compared to expert human coders. Conclusion: We have demonstrated how a large body of free text medical information, such as that seen in breast pathology reports, can be converted to a machine-readable format using natural language processing, and described the inherent complexities of the task.
IEEE International Conference on Information Visualization | 2007
John Sharko; Georges G. Grinstein; Kenneth A. Marx; Jianping Zhou; Chia Ho Cheng; Shannon J. Odelberg; Hans Georg Simon
Since clustering algorithms are heuristic, multiple clustering algorithms applied to the same dataset will typically not generate the same sets of clusters. This is especially true for complex datasets such as those from microarray time series experiments. Two such microarray datasets describing gene expression activities from regenerating newt forelimbs at various times following limb amputation were used in this study. A cluster stability matrix, which shows the number of times two genes appear in the same cluster, was generated as a heat map. This was used to evaluate the overall variation among the clustering algorithms and to identify similar clusters. A comparison of the cluster stability matrices for two related microarray experiments with different levels of precision was shown to be an effective basis for comparing the quality of the two sets of experiments. A pairwise heat map was generated to show which pairs of clustering algorithms grouped the data into similar clusters.
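The cluster stability matrix described in this abstract, which counts how often two genes share a cluster, can be sketched as follows. This is an illustrative reading of the abstract rather than the authors' code; the function name `cluster_stability_matrix` and the representation of each clustering as a list of labels are assumptions.

```python
def cluster_stability_matrix(cluster_sets):
    """Count how often each pair of records shares a cluster (hypothetical helper).

    cluster_sets: list of clusterings; each clustering is a list of cluster
                  labels, one per record, in the same record order.
    Returns an n x n matrix whose entry [i][j] is the number of clusterings
    that place records i and j in the same cluster.
    """
    n = len(cluster_sets[0])
    matrix = [[0] * n for _ in range(n)]
    for labels in cluster_sets:
        if len(labels) != n:
            raise ValueError("every clustering must label the same records")
        for i in range(n):
            for j in range(n):
                # Labels are only compared within one clustering, so label
                # names need not match across clustering algorithms.
                if labels[i] == labels[j]:
                    matrix[i][j] += 1
    return matrix


# Three made-up clusterings of four records, for illustration only.
example_sets = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
stability = cluster_stability_matrix(example_sets)
```

Rendering `stability` as a heat map would show consistently co-clustered pairs as the hottest cells; the diagonal always equals the number of clusterings.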
2009 13th International Conference Information Visualisation | 2009
John Sharko; Georges G. Grinstein
Clustering techniques are heuristic processes that typically do not yield one optimal solution. Therefore, it is common practice to generate multiple cluster sets and investigate the consistency of the record groupings. Fuzzy clustering essentially achieves the same objective by providing a measure of the extent to which each record belongs to each cluster, i.e., the strength of association of each record to each cluster. This paper develops a visualization of fuzzy clustering using RadViz, compares it to a Vectorized RadViz visualization of the results of multiple cluster ensembles, and then applies it to a microarray dataset.
Bioinformatics and Bioengineering | 2007
Kenneth A. Marx; John Sharko; Georges G. Grinstein; Shannon J. Odelberg; Hans Georg Simon
We have studied newt (N. viridescens) gene expression levels in ~1200 selected genes important in tissue regeneration at various times post-amputation at 6 different limb amputation sites. Here we provide analyses of microarray data that demonstrate a global gene expression correlation decrease on going from proximal to distal amputation sites of either limb or tail appendages. Also, the proximal (upper) forelimb and hindlimb regenerates have by far the most highly pairwise correlated gene expression levels of all sites. In contrast, the distal (lower) forelimb and hindlimb and tail regenerates reveal the least pairwise correlated gene expression levels. These data support the idea that limb loss at a proximal site produces a far more robust response as compared to a more distal site and requires a greater level of gene regulation to properly rebuild the lost structure.
Archive | 2010
Kevin S. Hughes; Mahmoud El-Tamer; Sherwood S. Hughes; Brian Drohan; John Sharko; Christine Lawrence; Andrea Loberg; Georges G. Grinstein
The electronic health record continues to mature as a component of the health care system. As these systems are increasingly adopted, they bring more widespread access to computer technology in the hospital and the office. An increasing number of specialty-specific applications are possible. These can be used to store data, facilitate decision making through the use of clinical algorithms, track various quality measures, and assess performance of a group or the individual.
Cancer Research | 2009
Elissa M. Ozanne; John Sharko; Brian Drohan; Georges G. Grinstein; Kevin S. Hughes
Abstract #3001. Purpose: Pathology reports contain extensive research information that is inaccessible except through costly and time-consuming chart reviews. This is because pathology reports are recorded as semi-structured prose with critically important descriptive text intended for human interpretation. Key challenges for processing this data include interpreting multiple methods of describing the same finding, and subsequently aggregating the findings of multiple reports into episodes of care. Investigators tested NLP techniques in the processing of pathology reports into structured data and episodes of care, allowing for the rapid identification and epidemiologic modeling of high-risk breast lesions. Methods: Using state-of-the-art NLP software (ClearForest, A Thomson Reuters Company, Waltham, MA), breast pathology reports stored as text files were processed into a structured electronic database using these steps: 1) identification of diagnosis of interest (i.e. high-risk lesions, cancer), 2) use of NLP to identify all terms and phrases used to report each finding (e.g. atypical hyperplasia, hyperplasia with atypia), 3) grouping of relevant terms into categories, 4) identification of categories occurring in each patient report, and 5) grouping of patient reports into episodes of care (defined as all reports within 6 months of an initial diagnosis). Results: Under IRB approval, 27,931 breast pathology reports from Massachusetts General Hospital in 16,208 patients seen between 1990 and 2007 were analyzed. The results were compared against manually reviewed pathology reports for quality control. For DCIS diagnoses, the initial error rate for both the NLP process and the manual process was 2%. The NLP process was then re-tuned using the identified discrepancies, which reduced the error rate to zero.
Using the refined model, we identified 1) patients with atypical lesions (atypical ductal hyperplasia (ADH), severe ADH, atypical lobular hyperplasia (ALH), and lobular carcinoma in situ (LCIS)) without prior or concurrent cancer, and 2) patients who developed cancer greater than 6 months post diagnosis. Conclusion: This process successfully identified high-risk diagnoses that were otherwise relatively inaccessible, and appears to match the accuracy of a human research associate. The results of this first implementation are promising and will be further validated over time. In the future, this approach can be applied to other medical reports and diseases. NLP has significant potential to decrease the cost of research and to improve patient care. Citation Information: Cancer Res 2009;69(2 Suppl):Abstract nr 3001.
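Step 5 of the methods above, grouping reports into episodes of care, can be sketched for a single patient as follows. This is one possible reading of "all reports within 6 months of an initial diagnosis", not the investigators' procedure: the function name `group_into_episodes`, the 183-day approximation of six months, and the rule that the first report outside the window opens a new episode are all assumptions.

```python
from datetime import date, timedelta


def group_into_episodes(report_dates, window_days=183):
    """Group one patient's report dates into episodes of care (hypothetical helper).

    An episode starts at the earliest ungrouped report and includes every
    report dated within `window_days` of that initial report. The 183-day
    default is an assumed stand-in for "6 months".
    """
    episodes = []
    for d in sorted(report_dates):
        # Compare against the first (initial) report of the current episode.
        if episodes and d - episodes[-1][0] <= timedelta(days=window_days):
            episodes[-1].append(d)
        else:
            episodes.append([d])
    return episodes


# Made-up report dates for one patient, for illustration only.
example_dates = [
    date(2000, 3, 1),
    date(2000, 1, 10),
    date(2000, 12, 1),
    date(2000, 9, 1),
]
example_episodes = group_into_episodes(example_dates)
```

With these dates, the January and March reports form one episode and the September and December reports form a second, because September falls more than 183 days after the initial January report.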
Breast Cancer Research and Treatment | 2012
Suzanne B. Coopey; Emanuele Mazzola; Julliette M. Buckley; John Sharko; Ahmet K. Belli; Elizabeth Min Hui Kim; Fernanda Polubriaginof; Giovanni Parmigiani; Judy Garber; Barbara L. Smith; Michele A. Gadd; Michelle C. Specht; Anthony J. Guidi; Constance A. Roche; Kevin S. Hughes
Archive | 2005
Curt Rawley; Georges G. Grinstein; Alex Bauman; Jon Victorine; Abdulrahmane Bezzati; Vivek Gupta; John Sharko; Paul Bubert