Publication


Featured research published by Lawrence Y. L. Cheung.


Software - Practice and Experience | 2003

Bilingual legal document retrieval and management using XML

Robert W. P. Luk; Benjamin K. Tsou; Tom B. Y. Lai; Oi Yee Kwong; Francis C. Y. Chik; Lawrence Y. L. Cheung

In certain bilingual and multi-lingual societies, translated legal documents are as important as the original legal documents because they have the same legal status as the originals. However, there is little reported work on the retrieval and management of bilingual legal documents. We describe the design and development of a bilingual document retrieval and management prototype, called ELDoS, which is used by court interpreters and judges from the Hong Kong Judiciary. Since the speed of retrieval is a major concern for user acceptance, and therefore for widespread deployment of the system, the architecture of the prototype is designed to balance the workload of the client and server. Extensible Markup Language (XML) is used to mark up the bilingual legal documents for a variety of document retrieval and management tasks. XML enables the use of Extensible Stylesheet Language Transformations (XSLT) to align bilingual data in the client, instead of the server, and improve alignment speed linearly with respect to the size of the document, using a high-end PC, when the server has no concurrent access. The design of the interface was continually improved after extensive consultation with court interpreters and after the user acceptance tests. In our evaluation, the facilities for highlighting translated terms have a macro-averaged precision of 90+% and a macro-averaged recall of 80+%, which were considered acceptable by our users. We believe that the experience in the design and development of this prototype is applicable to other language pairs as well as to other domains.
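The macro-averaged figures quoted in the abstract can be illustrated with a minimal sketch. It assumes that the averaging unit is the document and that an empty denominator scores 0.0; the abstract states neither, and the function name `macro_average` and the toy term sets are hypothetical.

```python
from typing import Dict, Set, Tuple


def macro_average(
    predicted: Dict[str, Set[str]],
    gold: Dict[str, Set[str]],
) -> Tuple[float, float]:
    """Return (macro precision, macro recall) over the documents in `gold`."""
    if not gold:
        return 0.0, 0.0
    precisions, recalls = [], []
    for doc_id, gold_terms in gold.items():
        pred_terms = predicted.get(doc_id, set())
        correct = len(pred_terms & gold_terms)
        # Assumed convention: an empty denominator contributes 0.0.
        precisions.append(correct / len(pred_terms) if pred_terms else 0.0)
        recalls.append(correct / len(gold_terms) if gold_terms else 0.0)
    # Macro-averaging: score each document first, then take the plain mean.
    return sum(precisions) / len(gold), sum(recalls) / len(gold)


# Toy data (hypothetical): highlighted terms versus reference terms.
predicted = {"doc1": {"a", "b"}, "doc2": {"c", "d", "e", "f"}}
gold = {"doc1": {"a", "b", "x"}, "doc2": {"c"}}
precision, recall = macro_average(predicted, gold)
# precision is 0.625; recall is about 0.833
```

Because each document is scored before averaging, a short document weighs as much as a long one, which is the property that distinguishes macro-averaging from pooling all terms together.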


International Conference on Computational Linguistics | 2002

Some considerations on guidelines for bilingual alignment and terminology extraction

Lawrence Y. L. Cheung; Tom B. Y. Lai; Robert W. P. Luk; Oi Yee Kwong; King Kui Sin; Benjamin K. Tsou

Despite progress in the development of computational means, human input is still critical in the production of consistent and usable aligned corpora and term banks. This is especially true for specialized corpora and term banks whose end-users are often professionals with very stringent requirements for accuracy, consistency and coverage. In the compilation of a high-quality Chinese-English legal glossary for the ELDoS project, we have identified a number of issues that make human input critical for term alignment and extraction. They include the identification of low-frequency terms, paraphrastic expressions and discontinuous units, and the maintenance of consistent term granularity. Although manual intervention can address these issues more satisfactorily, steps must also be taken to address intra- and inter-annotator inconsistency.


International Conference on Computational Linguistics | 2010

A machine learning parser using an unlexicalized distituent model

Samuel W. K. Chan; Lawrence Y. L. Cheung; Mickey W. C. Chong

Despite the popularity of lexicalized parsing models, practical concerns such as data sparseness and applicability to domains with different vocabularies mean that unlexicalized models, which do not refer to word tokens themselves, deserve more attention. A classifier-based parser using an unlexicalized parsing model has been developed. A machine learning method integrates linguistic attributes and information-theoretic attributes in two tasks, namely sentence chunking and phrase recognition. Most importantly, to enhance the accuracy of these tasks, we investigated the notion of distituency (the possibility that two parts of speech cannot remain in the same constituent or phrase) and incorporated it as attributes using various statistical measures. The parser was applied to parsing English and Chinese sentences in the Penn Treebank and the Tsinghua Chinese Treebank. It achieved a parsing performance of F-Score 80.3% in English and 82.4% in Chinese.


International Journal of Computer Processing of Languages | 2006

Court Stenography-To-Text ("STT") in Hong Kong: A Jurilinguistic Engineering Effort

Benjamin K. Tsou; Tom B. Y. Lai; King Kui Sin; Lawrence Y. L. Cheung

Implementation of legal bilingualism in Hong Kong after 1997 has necessitated the production of voluminous and extensive court proceedings and judgments in both Chinese and English. For the Chinese records, Cantonese, a dialect of Chinese, is the home language of more than 90% of the population in Hong Kong and is thus officially used in the courts. For the court proceedings, Cantonese speech would have to be recorded, and a Cantonese Computer-Aided Transcription system has been developed. The transcription system converts stenographic codes into Chinese text, i.e. from phonetic to orthographic representation of the language. The main challenge lies in the resolution of the severe ambiguity resulting from homocode problems in the conversion process. Cantonese Chinese is typified by problematic homonymy, which presents serious challenges. The N-gram statistical model is employed to estimate the most probable character string of the input transcription codes. Domain-specific corpora have been compiled to support the statistical computation. To improve accuracy, scalable techniques such as domain-specific transcription and special encoding are used. Put together, these techniques deliver 96% transcription accuracy.
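The N-gram step described in the abstract can be sketched as a Viterbi search over a bigram model, as follows. This is a minimal illustration and not the authors' system: the code labels, candidate lists, probabilities, and the fixed `floor` value for unseen bigrams are all hypothetical stand-ins.

```python
import math
from typing import Dict, List, Tuple


def decode(
    codes: List[str],
    candidates: Dict[str, List[str]],
    bigram: Dict[Tuple[str, str], float],
    floor: float = 1e-6,
) -> List[str]:
    """Viterbi search for the most probable character string under a bigram model."""
    if not codes:
        return []
    # best[ch] = (log probability, best character sequence ending in ch)
    best = {
        ch: (math.log(bigram.get(("<s>", ch), floor)), [ch])
        for ch in candidates[codes[0]]
    }
    for code in codes[1:]:
        step = {}
        for ch in candidates[code]:
            # Keep only the best-scoring predecessor for each candidate.
            score, path = max(
                (
                    (prev_score + math.log(bigram.get((prev, ch), floor)), prev_path)
                    for prev, (prev_score, prev_path) in best.items()
                ),
                key=lambda item: item[0],
            )
            step[ch] = (score, path + [ch])
        best = step
    return max(best.values(), key=lambda item: item[0])[1]


# Toy data (hypothetical): each code maps to several homocode candidates.
CANDIDATES = {"faat": ["法", "發"], "ting": ["庭", "廷", "停"]}
BIGRAM = {
    ("<s>", "法"): 0.5,
    ("<s>", "發"): 0.5,
    ("法", "庭"): 0.6,
    ("發", "庭"): 0.01,
    ("法", "停"): 0.01,
}
result = decode(["faat", "ting"], CANDIDATES, BIGRAM)  # ["法", "庭"]
```

Working in log probabilities avoids underflow on long inputs. A trigram version would track the last two characters in each state instead of one.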


International Conference on Computational Linguistics | 2002

Alignment and extraction of bilingual legal terminology from context profiles

Oi Yee Kwong; Benjamin K. Tsou; Tom B. Y. Lai; Lawrence Y. L. Cheung; Francis C. Y. Chik; Robert W. P. Luk

In this study, we propose a knowledge-independent method for aligning terms and thus extracting translations from a small, domain-specific corpus consisting of parallel English and Chinese court judgments from Hong Kong. With a sentence-aligned corpus, translation equivalences are suggested by analysing the frequency profiles of parallel concordances. The method overcomes the limitations of conventional statistical methods which require large corpora to be effective, and lexical approaches which depend on existing bilingual dictionaries. Pilot testing on a parallel corpus of about 113K Chinese words and 120K English words gives an encouraging 85% precision and 45% recall. Future work includes fine-tuning the algorithm upon the analysis of the errors, and acquiring a translation lexicon for legal terminology by filtering out general terms.
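To make the idea of suggesting translation equivalences from a sentence-aligned corpus concrete, the sketch below scores word pairs with the Dice coefficient over aligned sentences. This is a simpler, well-known stand-in and not the frequency-profile method of the paper; the function name `dice_scores` and the toy sentences are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Tuple


def dice_scores(
    pairs: List[Tuple[List[str], List[str]]],
) -> Dict[Tuple[str, str], float]:
    """Score each (source, target) word pair by Dice over aligned sentences."""
    src_freq, tgt_freq, joint = Counter(), Counter(), Counter()
    for src_sent, tgt_sent in pairs:
        # Count each word once per sentence (sentence-level frequencies).
        src_words, tgt_words = set(src_sent), set(tgt_sent)
        src_freq.update(src_words)
        tgt_freq.update(tgt_words)
        joint.update(product(src_words, tgt_words))
    return {
        (s, t): 2 * n / (src_freq[s] + tgt_freq[t])
        for (s, t), n in joint.items()
    }


# Toy sentence-aligned corpus (hypothetical, pre-tokenized).
corpus = [
    (["the", "plaintiff", "appealed"], ["原告", "上訴"]),
    (["the", "plaintiff", "sued"], ["原告", "起訴"]),
    (["the", "court", "adjourned"], ["法庭", "休庭"]),
]
scores = dice_scores(corpus)
# scores[("plaintiff", "原告")] is 1.0; scores[("the", "原告")] is 0.8
```

On a corpus this small, rare words that co-occur once also score 1.0, which illustrates why purely statistical association measures usually need large corpora to be reliable.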


International Conference on Computational Linguistics | 2000

Jurilinguistic engineering in Cantonese Chinese: an N-gram-based speech to text transcription system

Benjamin K. Tsou; King Kui Sin; Samuel W. K. Chan; Tom B. Y. Lai; Caesar Suen Lun; K. T. Ko; Gary K. K. Chan; Lawrence Y. L. Cheung

A Cantonese Chinese transcription system to automatically convert stenograph code to Chinese characters is reported. The major challenge in developing such a system is the critical homocode problem because of homonymy. The statistical N-gram model is used to compute the best combination of characters. Supplemented with a 0.85 million character corpus of domain-specific training data and enhancement measures, the bigram and trigram implementations achieve 95% and 96% accuracy respectively, as compared with 78% accuracy in the baseline model. The system performance is comparable with other advanced Chinese Speech-to-Text input applications under development. The system meets an urgent need of the Judiciary of post-1997 Hong Kong.


Journal of East Asian Linguistics | 2009

Dislocation focus construction in Chinese

Lawrence Y. L. Cheung


Journal of East Asian Linguistics | 2009

Negative wh-construction and its semantic properties

Lawrence Y. L. Cheung


International Conference on Computational Linguistics | 2010

Tree Topological Features for Unlexicalized Parsing

Samuel W. K. Chan; Lawrence Y. L. Cheung; Mickey W. C. Chong


Archive | 2001

A preliminary study of lexical density for the development of XML-based discourse structure tagger

Lawrence Y. L. Cheung; Tom B. Y. Lai; Benjamin K. Tsou; Francis C. Y. Chik; Robert W. P. Luk; Oi Yee Kwong

Collaboration


Dive into Lawrence Y. L. Cheung's collaborations.

Top Co-Authors

Benjamin K. Tsou, City University of Hong Kong
Tom B. Y. Lai, City University of Hong Kong
Samuel W. K. Chan, The Chinese University of Hong Kong
King Kui Sin, City University of Hong Kong
Gary K. K. Chan, City University of Hong Kong
K. T. Ko, City University of Hong Kong
Oi Yee Kwong, City University of Hong Kong
Robert W. P. Luk, Hong Kong Polytechnic University
Francis C. Y. Chik, City University of Hong Kong
Mickey W. C. Chong, The Chinese University of Hong Kong