Silvia Bernardini | Researchain

Publication

Featured research published by Silvia Bernardini.

Language Resources and Evaluation | 2009

The WaCky Wide Web: A Collection of Very Large Linguistically Processed Web-Crawled Corpora

Marco Baroni; Silvia Bernardini; Adriano Ferraresi; Eros Zanchetta

This article introduces ukWaC, deWaC and itWaC, three very large corpora of English, German, and Italian built by web crawling, and describes the methodology and tools used in their construction. The corpora contain more than a billion words each, and are thus among the largest resources for the respective languages. The paper also provides an evaluation of their suitability for linguistic research, focusing on ukWaC and itWaC. A comparison in terms of lexical coverage with existing resources for the languages of interest produces encouraging results. Qualitative evaluation of ukWaC versus the British National Corpus was also conducted, so as to highlight differences in corpus composition (text types and subject matters). The article concludes with practical information about format and availability of corpora and tools.

Archive | 2013

Old Needs, New Solutions: Comparable Corpora for Language Professionals

Silvia Bernardini; Adriano Ferraresi

Use of corpora by language service providers and language professionals remains limited due to the existence of competing resources that are likely to be perceived as less demanding in terms of the time and effort required to obtain and (learn to) use them (e.g. translation memory software, term bases and so forth). These resources, however, have limitations that could be compensated for through the integration of comparable corpora and corpus building tools in the translator’s toolkit. This chapter provides an overview of the ways in which different types of comparable corpora can be used in translation teaching and practice. First, two traditional corpus typologies are presented, namely small and specialized “handmade” corpora collected by end users themselves for a specific task, and large and general “manufactured” corpora collected by expert teams and made available to end users. We suggest that striking a middle ground between these two opposites is vital for professional uptake. To this end, we show how the BootCaT toolkit can be used to construct largish and relatively specialized comparable corpora for a specific translation task, and how, by varying the search parameters in very simple ways, the size and usability of the corpora thus constructed can be further increased. The process is exemplified with reference to a simulated task (the translation of a patient information leaflet from English into Italian) and its efficacy is evaluated through an end-user questionnaire.

Encyclopedia of Language & Linguistics (Second Edition) | 2006

Machine Readable Corpora

Silvia Bernardini

This article reviews three areas at the interface between corpus linguistics and translation, namely, corpus-based approaches to (1) translation studies, (2) translator education, and (3) translation practice. With reference to (1), it surveys approaches, corpus typologies, and findings concerning translation strategies, norms/laws, and universals. Moving on to (2), it provides examples of possible uses of corpora (parallel, comparable, learner) in the translation classroom and points out that building a corpus may in itself be a valid exercise in text analysis and documentation. Finally, (3), it considers the role (still rather marginal, but growing) currently played by corpus tools in the translation professions.

Language Resources and Evaluation | 2004