Chris Hokamp
Dublin City University
Publications
Featured researches published by Chris Hokamp.
International Conference on Semantic Systems | 2013
Joachim Daiber; Max Jakob; Chris Hokamp; Pablo N. Mendes
There has recently been an increased interest in named entity recognition and disambiguation systems at major conferences such as WWW, SIGIR, ACL, KDD, etc. However, most work has focused on algorithms and evaluations, leaving little space for implementation details. In this paper, we discuss some implementation and data processing challenges we encountered while developing a new multilingual version of DBpedia Spotlight that is faster, more accurate and easier to configure. We compare our solution to the previous system, considering time performance, space requirements and accuracy in the context of the Dutch and English languages. Additionally, we report results for 9 additional languages among the largest Wikipedias. Finally, we present challenges and experiences to foment the discussion with other developers interested in recognition and disambiguation of entities in natural language text.
Workshop on Statistical Machine Translation | 2015
Ondrej Bojar; Rajen Chatterjee; Christian Federmann; Barry Haddow; Matthias Huck; Chris Hokamp; Philipp Koehn; Varvara Logacheva; Christof Monz; Matteo Negri; Matt Post; Carolina Scarton; Lucia Specia; Marco Turchi
This paper presents the results of the WMT15 shared tasks, which included a standard news translation task, a metrics task, a tuning task, a task for run-time estimation of machine translation quality, and an automatic post-editing task. This year, 68 machine translation systems from 24 institutions were submitted to the ten translation directions in the standard translation task. An additional 7 anonymized systems were included and were evaluated both automatically and manually. The quality estimation task had three subtasks, with a total of 10 teams submitting 34 entries. The pilot automatic post-editing task had a total of 4 teams submitting 7 entries.
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers | 2016
André F. T. Martins; Ramón Fernández Astudillo; Chris Hokamp; Fabio Kepler
This paper presents the contribution of the Unbabel team to the WMT 2016 Shared Task on Word-Level Translation Quality Estimation. We describe our two submitted systems: (i) UNBABEL-LINEAR, a feature-rich sequential linear model with syntactic features, and (ii) UNBABEL-ENSEMBLE, a stacked combination of the linear system with three different deep neural networks, mixing feedforward, convolutional, and recurrent layers. Our systems achieved F1-OK × F1-BAD scores of 46.29% and 49.52%, respectively, which were the two highest scores in the challenge.
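The evaluation metric named above, the product of the F1 scores for the OK and BAD word labels, is simple to compute; a minimal sketch (the function names are ours, not taken from the shared-task scoring code):

```python
def f1_for_label(gold, pred, label):
    """F1 score for one tag, treated as the positive class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == p == label)
    pred_pos = sum(1 for p in pred if p == label)
    gold_pos = sum(1 for g in gold if g == label)
    if tp == 0:
        return 0.0
    precision = tp / pred_pos
    recall = tp / gold_pos
    return 2 * precision * recall / (precision + recall)

def f1_multiplied(gold, pred):
    """F1-OK x F1-BAD over word-level OK/BAD tag sequences."""
    return f1_for_label(gold, pred, "OK") * f1_for_label(gold, pred, "BAD")
```

Multiplying the two scores penalizes systems that do well on the majority OK class while ignoring the rarer BAD class.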
Workshop on Statistical Machine Translation | 2015
Varvara Logacheva; Chris Hokamp; Lucia Specia
This paper describes the DCU-SHEFF word-level Quality Estimation (QE) system submitted to the QE shared task at WMT15. Starting from a baseline set of features and a CRF algorithm to learn a sequence tagging model, we propose improvements in two ways: (i) by filtering out the training sentences containing too few errors, and (ii) by adding incomplete sequences to the training data to enrich the model with new information. We also experiment with considering the task as a classification problem, and report results using a subset of the features with Random Forest classifiers.
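The first improvement described above, filtering out training sentences that contain too few errors, can be sketched as follows (the threshold value and function name are illustrative assumptions, not taken from the paper):

```python
def filter_training_data(tagged_sentences, min_error_ratio=0.1):
    """Keep only sentences whose share of BAD tags meets a threshold.

    tagged_sentences: list of (tokens, tags) pairs, with tags in {"OK", "BAD"}.
    The 0.1 default is illustrative, not the value used by DCU-SHEFF.
    """
    kept = []
    for tokens, tags in tagged_sentences:
        if not tags:
            continue
        error_ratio = tags.count("BAD") / len(tags)
        if error_ratio >= min_error_ratio:
            kept.append((tokens, tags))
    return kept
```

The filtered pairs would then be fed to a sequence labeller such as a CRF, as in the submitted system.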
Workshop on Statistical Machine Translation | 2014
Chris Hokamp; Iacer Calixto; Joachim Wagner; Jian Zhang
We describe the DCU-MIXED and DCU-SVR submissions to the WMT-14 Quality Estimation task 1.1, predicting sentence-level perceived post-editing effort. Feature design focuses on target-side features, as we hypothesise that the source side has little effect on the quality of the human translations included in task 1.1 of this year’s WMT Quality Estimation shared task. We experiment with features of the QuEst framework, features from our past work, and three novel feature sets. Despite these efforts, our two systems perform poorly in the competition. Follow-up experiments indicate that the poor performance is due to improperly optimised parameters.
North American Chapter of the Association for Computational Linguistics | 2015
Mahmoud Azab; Chris Hokamp; Rada Mihalcea
We introduce an interactive interface that aims to help English as a Second Language (ESL) students overcome language-related hindrances while reading a text. The interface allows the user to find supplementary information on selected difficult words. It is powered by our lexical substitution engine, which provides context-based synonyms for difficult words. We also provide a practical solution for a real-world usage scenario: we demonstrate the lexical substitution engine as a browser extension that can annotate and disambiguate difficult words on any webpage.
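The core idea of context-based lexical substitution, choosing the candidate synonym that best fits the surrounding words, can be sketched in a few lines (the co-occurrence sets stand in for the engine's actual distributional model, and all names here are hypothetical):

```python
def pick_synonym(word, context, candidates):
    """Pick the candidate whose typical contexts best overlap the sentence.

    candidates: dict mapping each candidate synonym to a set of words it
    typically co-occurs with (a toy stand-in for a distributional model).
    """
    context_words = {w.lower() for w in context if w.lower() != word}

    def score(candidate):
        return len(candidates[candidate] & context_words)

    return max(candidates, key=score)
```

A real engine would score candidates with distributional statistics rather than hand-built sets, but the disambiguation step has this shape.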
North American Chapter of the Association for Computational Linguistics | 2015
Piyush Arora; Chris Hokamp; Jennifer Foster; Gareth J. F. Jones
We describe the work carried out by the DCU team on the Semantic Textual Similarity task at SemEval-2015. We learn a regression model to predict a semantic similarity score for a sentence pair. Our system exploits distributional semantics in combination with tried-and-tested features from previous tasks in order to compute sentence similarity. Our team submitted 3 runs for each of the five English test sets. For two of the test sets, belief and headlines, our best system ranked second and fourth out of the 73 submitted systems. Our best submission, averaged over all test sets, ranked 26th out of the 73 systems.
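A typical "tried-and-tested" feature of the kind mentioned above is simple word overlap between the two sentences; a minimal sketch of one such feature (the regression model would consume several of these, and this particular formulation is our illustration, not the paper's feature set):

```python
def jaccard_similarity(sent1, sent2):
    """Word-overlap feature: |intersection| / |union| of the token sets."""
    a = set(sent1.lower().split())
    b = set(sent2.lower().split())
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

Features like this would be stacked into a vector per sentence pair and passed to a regressor trained on the gold similarity scores.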
North American Chapter of the Association for Computational Linguistics | 2016
Chris Hokamp; Piyush Arora
We experiment with learning word representations designed to be combined into sentence-level semantic representations, using an objective function that does not directly make use of the supervised scores provided with the training data, instead opting for a simpler objective that encourages similar phrases to be close together in the embedding space. This simple objective lets us start with high-quality embeddings trained using the Paraphrase Database (PPDB) (Wieting et al., 2015; Ganitkevitch et al., 2013), and then tune these embeddings using the official STS task training data, as well as synthetic paraphrases for each test dataset, obtained by pivoting through machine translation. Our submissions include runs which only compare the similarity of phrases in the embedding space, directly using the similarity score to produce predictions, as well as a run which combines vector similarity with a suite of features we investigated for our 2015 SemEval submission. For the cross-lingual task, we simply translate the Spanish sentences to English and use the same system we designed for the monolingual task.
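The runs that "only compare the similarity of phrases in the embedding space" amount to averaging word vectors and taking a cosine; a minimal sketch under the assumption of simple averaging (the paper's actual composition and tuning are more involved):

```python
import math

def sentence_vector(tokens, embeddings):
    """Average the word vectors of in-vocabulary tokens; None if none found."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

The cosine score can either be rescaled directly into a similarity prediction or used as one feature alongside others.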
Recent Advances in Natural Language Processing | 2013
Bharath Dandala; Chris Hokamp; Rada Mihalcea; Razvan C. Bunescu
Transactions of the Association for Computational Linguistics | 2017
André F. T. Martins; Marcin Junczys-Dowmunt; Fabio Kepler; Ramón Fernández Astudillo; Chris Hokamp; Roman Grundkiewicz