The Chinese/English Political Interpreting Corpus

About the Project

The Chinese/English Political Interpreting Corpus (CEPIC), about 6.5 million word tokens in size, is designed for the study of Chinese/English political interpreting and translation.

CEPIC Data

The CEPIC consists of transcripts of speeches delivered by top political figures from Hong Kong, Beijing, Washington DC and London, as well as their translated/interpreted texts.

The main speech types in CEPIC include readings of government reports such as policy addresses and budget speeches, Q&A sessions at press conferences, parliamentary debates, and remarks delivered at bilateral meetings (for details, please refer to the section Basic Statistics).

The corpus features a parallel display of up to six versions of the same speech segment, aligned at paragraph level. In addition to part-of-speech (POS) tagging, the corpus is annotated with prosodic and paralinguistic features relevant to the study of spoken language and interpreting.
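
To make the idea of paragraph-level alignment concrete, the following is a minimal sketch of how one aligned segment with up to six parallel versions might be represented and searched. It is an illustration only and does not describe CEPIC's actual storage format or interface: the class name, the version labels (such as source_en and interpreted_zh), the helper function and the sample sentences are all hypothetical placeholders, not corpus data.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# The text above states "up to six versions" per speech segment.
MAX_VERSIONS = 6


@dataclass
class AlignedSegment:
    """One paragraph-level segment with its parallel versions (hypothetical model)."""
    segment_id: str
    # Maps a version label (hypothetical, e.g. "source_en") to that version's text.
    versions: Dict[str, str] = field(default_factory=dict)

    def add_version(self, label: str, text: str) -> None:
        # Refuse a seventh distinct version; overwriting an existing label is allowed.
        if label not in self.versions and len(self.versions) >= MAX_VERSIONS:
            raise ValueError("a segment holds at most six versions")
        self.versions[label] = text


def search(segments: List[AlignedSegment], keyword: str) -> List[AlignedSegment]:
    """Return segments in which any version contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [
        seg for seg in segments
        if any(needle in text.lower() for text in seg.versions.values())
    ]


# Placeholder content, invented for illustration only.
seg1 = AlignedSegment("seg-001")
seg1.add_version("source_en", "We will continue to invest in housing.")
seg1.add_version("interpreted_zh", "我們會繼續投資房屋。")

seg2 = AlignedSegment("seg-002")
seg2.add_version("source_en", "Thank you for the question.")

hits = search([seg1, seg2], "housing")
```

Because a hit returns the whole segment, every parallel version of the matching paragraph comes back together, which mirrors the side-by-side display described above.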

The CEPIC can be used to investigate matters relating to Chinese/English political translation/interpreting and political discourse at large. It can also serve students, teachers, as well as people working in political settings, in aspects of political speech delivery and translation/interpreting production. Users can also download search results from the corpus for their own teaching/research purposes.