This section defines the formalisms used by the project participants to annotate and exchange resources. A detailed description of the proposed formalisms is given in deliverable 1.1 of the project. The following files are distributed:

  • transread_v1-2.xsd: the XML Schema, which defines the annotation formalism.
  • transread_v1-2.dtd: the DTD of the annotation scheme, which describes the structure of an annotation document.

We also provide a sample text annotated according to this scheme. The sample is based on an excerpt from the novel "The Last of the Mohicans" by James Fenimore Cooper and consists of three files:

  • sample_Mohicans_en.xhtml: the first two chapters of the English version of the novel, taken from Project Gutenberg.
  • sample_Mohicans_fr.xhtml: the first two chapters of the French version of the novel, taken from Project Gutenberg.
  • sample_Mohicans_annot.xml: the annotation file linking the previous two documents. The file contains alignments at various levels, morpho-syntactic annotations, word sense disambiguation information, etc.
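
As a first look at the annotation file, the following minimal Python sketch counts how often each element name occurs in an XML document. It uses only the standard library and assumes nothing about the element names defined by the schema; the helper names count_elements and local_name are hypothetical and not part of the distribution. It does not validate the file against the XSD or the DTD.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def count_elements(xml_source) -> Counter:
    """Count how often each element name occurs in an XML document.

    xml_source may be a file path or a binary file-like object.
    """
    counts = Counter()
    # iterparse streams the document, so a large annotation file
    # does not have to be held in memory all at once.
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        counts[local_name(elem.tag)] += 1
        elem.clear()  # release the content of elements already counted
    return counts
```

Calling count_elements("sample_Mohicans_annot.xml") and printing the result of most_common() gives a quick overview of which annotation layers the sample actually contains.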

Parallel Corpora

This section lists the parallel corpora created and manually annotated during the project. These corpora are distributed under the CC-BY licence. If you use them in your research, please cite:

  author    = {Xu, Yong and Yvon, François},
  title     = {Novel annotation schemes for sentential and sub-sentential alignments of bi-texts},
  booktitle = {Proceedings of 10th Language Resources and Evaluation Conference},
  year      = 2016,
  series    = {LREC'16},
  address   = {Portorož (Slovenia)}

