Zhuoran Wang | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Zhuoran Wang is active.

Explore More

Publication

Featured researches published by Zhuoran Wang.

PLOS ONE | 2012

Extracting diagnoses and investigation results from unstructured text in electronic health records by semi-supervised machine learning.

Zhuoran Wang; Anoop Dinesh Shah; A Rosemary Tate; Spiros Denaxas; John Shawe-Taylor; Harry Hemingway

Background Electronic health records are invaluable for medical research, but much of the information is recorded as unstructured free text which is time-consuming to review manually. Aim To develop an algorithm to identify relevant free texts automatically based on labelled examples. Methods We developed a novel machine learning algorithm, the ‘Semi-supervised Set Covering Machine’ (S3CM), and tested its ability to detect the presence of coronary angiogram results and ovarian cancer diagnoses in free text in the General Practice Research Database. For training the algorithm, we used texts classified as positive and negative according to their associated Read diagnostic codes, rather than by manual annotation. We evaluated the precision (positive predictive value) and recall (sensitivity) of S3CM in classifying unlabelled texts against the gold standard of manual review. We compared the performance of S3CM with the Transductive Vector Support Machine (TVSM), the original fully-supervised Set Covering Machine (SCM) and our ‘Freetext Matching Algorithm’ natural language processor. Results Only 60% of texts with Read codes for angiogram actually contained angiogram results. However, the S3CM algorithm achieved 87% recall with 64% precision on detecting coronary angiogram results, outperforming the fully-supervised SCM (recall 78%, precision 60%) and TSVM (recall 2%, precision 3%). For ovarian cancer diagnoses, S3CM had higher recall than the other algorithms tested (86%). The Freetext Matching Algorithm had better precision than S3CM (85% versus 74%) but lower recall (62%). Conclusions Our novel S3CM machine learning algorithm effectively detected free texts in primary care records associated with angiogram results and ovarian cancer diagnoses, after training on pre-classified test sets. It should be easy to adapt to other disease areas as it does not rely on linguistic rules, but needs further testing in other electronic health record datasets.

north american chapter of the association for computational linguistics | 2007

Kernel Regression Based Machine Translation

Zhuoran Wang; John Shawe-Taylor; Sandor Szedmak

We present a novel machine translation framework based on kernel regression techniques. In our model, the translation task is viewed as a string-to-string mapping, for which a regression type learning is employed with both the source and the target sentences embedded into their kernel induced feature spaces. We report the experiments on a French-English translation task showing encouraging results.

workshop on statistical machine translation | 2008

Kernel Regression Framework for Machine Translation: UCL System Description for WMT 2008 Shared Translation Task

Zhuoran Wang; John Shawe-Taylor

The novel kernel regression model for SMT only demonstrated encouraging results on small-scale toy data sets in previous works due to the complexities of kernel methods. It is the first time results based on the real-world data from the shared translation task will be reported at ACL 2008 Workshop on Statistical Machine Translation. This paper presents the key modules of our system, including the kernel ridge regression model, retrieval-based sparse approximation, the decoding algorithm, as well as language modeling issues under this framework.

Machine Translation | 2010

A kernel regression framework for SMT

Zhuoran Wang; John Shawe-Taylor

This paper presents a novel regression framework to model both the translational equivalence problem and the parameter estimation problem in statistical machine translation (SMT). The proposed method kernelizes the training process by formulating the translation problem as a linear mapping among source and target word chunks (word n-grams of various length), which yields a regression problem with vector outputs. A kernel ridge regression model and a one-class classifier called maximum margin regression are explored for comparison, between which the former is proved to perform better in this task. The experimental results conceptually demonstrate its advantages of handling very high-dimensional features implicitly and flexibly. However, it shares the common drawback of kernel methods, i.e. the lack of scalability. For real-world application, a more practical solution based on locally linear regression hyperplane approximation is proposed by using online relevant training examples subsetting. In addition, we also introduce a novel way to integrate language models into this particular machine translation framework, which utilizes the language model as a penalty item in the objective function of the regression model, since its n-gram representation exactly matches the definition of our feature space.

bioinformatics and biomedicine | 2010

Semi-supervised feature learning from clinical text

Zhuoran Wang; John Shawe-Taylor; Anoop Dinesh Shah

This paper is focused on the automated identification of the clinical free-text records that contain useful information (e.g. symptoms, modifiers, diagnosis, etc) of a certain disease. We introduce a novel semi-supervised machine learning algorithm to address this problem, by training the set covering machine in a bootstrapping procedure. The advantage of the proposed technique is that not only can it find the documents of interest more accurately than searching based on diagnostic codes, the features it learned could also be directly used as a knowledge representation of the given topic and to assist either further machine learning algorithms or manual post-processing and analysis.

In: (Proceedings) Machine Translation Summit XII. : Ottawa, Canada. (2009) | 2009