Chris Hokamp
Dublin City University
Publications
Featured researches published by Chris Hokamp.
International Conference on Semantic Systems | 2013
Joachim Daiber; Max Jakob; Chris Hokamp; Pablo N. Mendes
There has recently been an increased interest in named entity recognition and disambiguation systems at major conferences such as WWW, SIGIR, ACL, KDD, etc. However, most work has focused on algorithms and evaluations, leaving little space for implementation details. In this paper, we discuss some implementation and data processing challenges we encountered while developing a new multilingual version of DBpedia Spotlight that is faster, more accurate and easier to configure. We compare our solution to the previous system, considering time performance, space requirements and accuracy in the context of the Dutch and English languages. Additionally, we report results for 9 additional languages among the largest Wikipedias. Finally, we present challenges and experiences to foment the discussion with other developers interested in recognition and disambiguation of entities in natural language text.
Workshop on Statistical Machine Translation | 2015
Ondrej Bojar; Rajen Chatterjee; Christian Federmann; Barry Haddow; Matthias Huck; Chris Hokamp; Philipp Koehn; Varvara Logacheva; Christof Monz; Matteo Negri; Matt Post; Carolina Scarton; Lucia Specia; Marco Turchi
This paper presents the results of the WMT15 shared tasks, which included a standard news translation task, a metrics task, a tuning task, a task for run-time estimation of machine translation quality, and an automatic post-editing task. This year, 68 machine translation systems from 24 institutions were submitted to the ten translation directions in the standard translation task. An additional 7 anonymized systems were included and were evaluated both automatically and manually. The quality estimation task had three subtasks, with a total of 10 teams submitting 34 entries. The pilot automatic post-editing task had a total of 4 teams submitting 7 entries.
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers | 2016
André F. T. Martins; Ramón Fernández Astudillo; Chris Hokamp; Fabio Kepler
This paper presents the contribution of the Unbabel team to the WMT 2016 Shared Task on Word-Level Translation Quality Estimation. We describe our two submitted systems: (i) UNBABEL-LINEAR, a feature-rich sequential linear model with syntactic features, and (ii) UNBABEL-ENSEMBLE, a stacked combination of the linear system with three different deep neural networks, mixing feedforward, convolutional, and recurrent layers. Our systems achieved F1-OK × F1-BAD scores of 46.29% and 49.52%, respectively, which were the two highest scores in the challenge.
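The evaluation metric named above, the product of the F1 scores for the OK and BAD word labels, is simple to compute; a minimal sketch (the function names are ours, not taken from the shared-task scoring code):

```python
def f1_for_label(gold, pred, label):
    """F1 score for one tag, treated as the positive class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == p == label)
    pred_pos = sum(1 for p in pred if p == label)
    gold_pos = sum(1 for g in gold if g == label)
    if tp == 0:
        return 0.0
    precision = tp / pred_pos
    recall = tp / gold_pos
    return 2 * precision * recall / (precision + recall)

def f1_multiplied(gold, pred):
    """F1-OK x F1-BAD over word-level OK/BAD tag sequences."""
    return f1_for_label(gold, pred, "OK") * f1_for_label(gold, pred, "BAD")
```

Multiplying the two scores penalizes systems that do well on the majority OK class while ignoring the rarer BAD class.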
Workshop on Statistical Machine Translation | 2015
Varvara Logacheva; Chris Hokamp; Lucia Specia
This paper describes the DCU-SHEFF word-level Quality Estimation (QE) system submitted to the QE shared task at WMT15. Starting from a baseline set of features and a CRF algorithm to learn a sequence tagging model, we propose improvements in two ways: (i) by filtering out the training sentences containing too few errors, and (ii) by adding incomplete sequences to the training data to enrich the model with new information. We also experiment with considering the task as a classification problem, and report results using a subset of the features with Random Forest classifiers.
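The first improvement described above, filtering out training sentences that contain too few errors, can be sketched as follows (the threshold value and function name are illustrative assumptions, not taken from the paper):

```python
def filter_training_data(tagged_sentences, min_error_ratio=0.1):
    """Keep only sentences whose share of BAD tags meets a threshold.

    tagged_sentences: list of (tokens, tags) pairs, with tags in {"OK", "BAD"}.
    The 0.1 default is illustrative, not the value used by DCU-SHEFF.
    """
    kept = []
    for tokens, tags in tagged_sentences:
        if not tags:
            continue
        error_ratio = tags.count("BAD") / len(tags)
        if error_ratio >= min_error_ratio:
            kept.append((tokens, tags))
    return kept
```

The filtered pairs would then be fed to a sequence labeller such as a CRF, as in the submitted system.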
Workshop on Statistical Machine Translation | 2014
Chris Hokamp; Iacer Calixto; Joachim Wagner; Jian Zhang
We describe the DCU-MIXED and DCU-SVR submissions to the WMT-14 Quality Estimation task 1.1, predicting sentence-level perceived post-editing effort. Feature design focuses on target-side features, as we hypothesise that the source side has little effect on the quality of the human translations included in task 1.1 of this year’s WMT Quality Estimation shared task. We experiment with features of the QuEst framework, features from our past work, and three novel feature sets. Despite these efforts, our two systems perform poorly in the competition. Follow-up experiments indicate that the poor performance is due to improperly optimised parameters.
North American Chapter of the Association for Computational Linguistics | 2015
Mahmoud Azab; Chris Hokamp; Rada Mihalcea
We introduce an interactive interface that aims to help English as a Second Language (ESL) students overcome language-related hindrances while reading a text. The interface allows the user to find supplementary information on selected difficult words. It is powered by our lexical substitution engine, which provides context-based synonyms for difficult words. We also provide a practical solution for a real-world usage scenario: we demonstrate the lexical substitution engine as a browser extension that can annotate and disambiguate difficult words on any webpage.
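The core idea of context-based lexical substitution, choosing the candidate synonym that best fits the surrounding words, can be sketched in a few lines (the co-occurrence sets stand in for the engine's actual distributional model, and all names here are hypothetical):

```python
def pick_synonym(word, context, candidates):
    """Pick the candidate whose typical contexts best overlap the sentence.

    candidates: dict mapping each candidate synonym to a set of words it
    typically co-occurs with (a toy stand-in for a distributional model).
    """
    context_words = {w.lower() for w in context if w.lower() != word}

    def score(candidate):
        return len(candidates[candidate] & context_words)

    return max(candidates, key=score)
```

A real engine would score candidates with distributional statistics rather than hand-built sets, but the disambiguation step has this shape.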
North American Chapter of the Association for Computational Linguistics | 2015
Piyush Arora; Chris Hokamp; Jennifer Foster; Gareth J. F. Jones
We describe the work carried out by the DCU team on the Semantic Textual Similarity task at SemEval-2015. We learn a regression model to predict a semantic similarity score for a sentence pair. Our system exploits distributional semantics in combination with tried-and-tested features from previous tasks in order to compute sentence similarity. Our team submitted 3 runs for each of the five English test sets. For two of the test sets, belief and headlines, our best system ranked second and fourth out of the 73 submitted systems. Our best submission, averaged over all test sets, ranked 26th out of the 73 systems.
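A typical "tried-and-tested" feature of the kind mentioned above is simple word overlap between the two sentences; a minimal sketch of one such feature (the regression model would consume several of these, and this particular formulation is our illustration, not the paper's feature set):

```python
def jaccard_similarity(sent1, sent2):
    """Word-overlap feature: |intersection| / |union| of the token sets."""
    a = set(sent1.lower().split())
    b = set(sent2.lower().split())
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

Features like this would be stacked into a vector per sentence pair and passed to a regressor trained on the gold similarity scores.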
North American Chapter of the Association for Computational Linguistics | 2016
Chris Hokamp; Piyush Arora
We experiment with learning word representations designed to be combined into sentence-level semantic representations, using an objective function that does not directly make use of the supervised scores provided with the training data, instead opting for a simpler objective that encourages similar phrases to be close together in the embedding space. This simple objective lets us start with high-quality embeddings trained using the Paraphrase Database (PPDB) (Wieting et al., 2015; Ganitkevitch et al., 2013), and then tune these embeddings using the official STS task training data, as well as synthetic paraphrases for each test dataset, obtained by pivoting through machine translation. Our submissions include runs which only compare the similarity of phrases in the embedding space, directly using the similarity score to produce predictions, as well as a run which combines vector similarity with a suite of features we investigated for our 2015 SemEval submission. For the cross-lingual task, we simply translate the Spanish sentences to English and use the same system we designed for the monolingual task.
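The runs that "only compare the similarity of phrases in the embedding space" amount to averaging word vectors and taking a cosine; a minimal sketch under the assumption of simple averaging (the paper's actual composition and tuning are more involved):

```python
import math

def sentence_vector(tokens, embeddings):
    """Average the word vectors of in-vocabulary tokens; None if none found."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

The cosine score can either be rescaled directly into a similarity prediction or used as one feature alongside others.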
Recent Advances in Natural Language Processing | 2013
Bharath Dandala; Chris Hokamp; Rada Mihalcea; Razvan C. Bunescu
Transactions of the Association for Computational Linguistics | 2017
André F. T. Martins; Marcin Junczys-Dowmunt; Fabio Kepler; Ramón Fernández Astudillo; Chris Hokamp; Roman Grundkiewicz