Participants

LIMSI/CNRS (French National Centre for Scientific Research) is one of France's largest research laboratories working on language technologies; it covers the full spectrum from low-level signal processing to language generation and machine translation. Building on its expertise in Automatic Speech Recognition and Statistical Language Modeling, it has also pioneered Statistical Machine Translation in France. It also brings to the project valuable expertise in the development and use of comparable corpora, and in Machine Learning techniques for language and translation models. The main activities of the Spoken Language Processing Group cover the following domains: Speech Recognition, Speech Understanding, Dialog Systems, Speaker and Language Recognition, Audio Indexing and Statistical Machine Translation. Associated activities include data collection, system evaluation and technology transfer. The group has succeeded in both basic and applied research, developing new algorithms, prototypes and databases. Advanced commercialized products developed from studies at LIMSI are now being used in several applications, with special emphasis on speech recognition (EC IST CORETEX), spoken language systems for information retrieval (EC MASK, Railtel, Arise, Home, DISC and Amities projects), audio document indexing and retrieval in multiple languages (EC Olive, Alert, Echo, RNRT Theoreme, AudioSurf), facilitating human-human communication (EC CHIL) and statistical models for machine translation (EC TC-STAR, ANR CroTal, ANR Trace). In the context of the Quaero program, LIMSI has developed leading-edge Machine Translation systems; as a result, the group's submissions were ranked among the best for the French/English language pairs in the international WMT evaluations of machine translation.

CEDRIC (Centre d'Études et De Recherche en Informatique et Communications) is a joint research laboratory of CNAM in Paris and ENSIIE in Evry. With 5 faculty members and 10 PhD students, the research group "Interaction pour lire et jouer" (ILJ) is among the few research teams in France that have developed strong skills in HCI research for video games, digital reading and digital libraries. The "Association des Bibliophiles Universels" (ABU) was created by P. Cubaud in 1993 and still allows people to freely access public domain French literature (approx. 900 full-text downloads/day). In 1998, together with CNAM librarians and historians, ILJ started the "Conservatoire Numérique" (CNUM), a digital library dedicated to the history of French technology. The CNUM is partly funded by the French National Library and now holds more than 600K book pages. These two running digital libraries have been the foundations of the ILJ group's subsequent work on digital reading:
- development of 3D visualization interfaces for digital book reading (PhD theses of A. Topol, 2002 and J. Dupire, 2006)
- innovative multi-degree-of-freedom input devices and immersive output for document collection management (PhD thesis of R. Almeida, 2009)
- design studies for personal reading appliances, such as photo albums (PhD thesis of S.H. Hsu, 2010, with France Telecom R&D)
- application of visual (PhD thesis of A. Damala, 2009) and audio (PhD thesis of F. Kaghat, ongoing) augmented reality technologies to personal appliances for museum visits
Today, the ILJ group contributes its expertise to the FUI / DEMAT FACTORY project, which is dedicated to digitization farms. In this project, ILJ has investigated 3D book scanning and is in charge of visualization tools for quality control of the scanning in the digitization farm.
Apart from its well-established master's degree in video games (ENJMIN), the ILJ team is now also in charge of a master's degree in interaction design, in cooperation with ENSCI Les Ateliers and University Paris 8 (opening October 2011).

Reverso-Softissimo is a software publisher specializing in linguistic software, in business since 1986. It has released the first grammar checker for French (Hugo), electronic dictionaries (Collins Lexibase), high-quality translation software (Reverso) and various tools: online spell and grammar checking, conjugation, collaborative multilingual dictionaries. It provides MT solutions to over 5 million professionals in over 100 corporate clients (Total, Société Générale, Renault, etc.) and operates Reverso.net, a Web portal offering various linguistic services for the benefit of more than 10 million users. Reverso-Softissimo has been or is involved in various research projects, including:
- WebCrossling (ANR project, with CEA-LIST), which aims at using cross-lingual search mechanisms to develop innovative MT systems;
- Trace (ANR project, with LIMSI), which aims at providing robust translation systems, on the one hand by improving source texts with linguistic and statistical techniques, and on the other hand by designing confidence measures for MT;
- Flavius (European FP7 project), which will develop an end-to-end solution for publishing and indexing websites in various languages;
- Faust (European FP7 project), whose main objective is to take advantage of user feedback to improve MT systems.

The CLILLAC-ARP laboratory (Centre de Linguistique Inter-Langues, de Lexicologie, de Linguistique Anglaise et de Corpus) is one of the few major research groups in France that deal with the full spectrum of theoretical and applied linguistics. The research areas of CLILLAC-ARP cover both written and spoken language, in such diverse areas as syntax, semantics, pragmatics, phonology and phonetics, corpus linguistics, phraseology, terminology and neology, lexicology, socio-linguistics, and language policies. The team's 42 members all work on authentic data, be it language corpora or field studies. The languages covered by the team include English, French, Spanish, German and Vietnamese; some members also work on Chinese, Turkish, Czech and Thai. Within CLILLAC-ARP, the LSCT group (LSP (Languages for Specific Purposes), Phraseology, and Translation) has earned over the years a high degree of national and international recognition in its areas of specialization. Research themes such as phraseological and terminological analysis, technical writing, the study of controlled languages, the development of writing aid tools for scientific English and the study of Languages for Specific Purposes are central to the research interests of the group. LSCT's experience in the corpus-based analysis of scientific texts is particularly well known. The LSCT group was the promoter of the European Leonardo da Vinci project MeLLANGE, in which the different strategies adopted by trainee translators, and the different errors they make, were studied on a translator learner corpus developed by the project. Research on terminology has led to the development of a corpus-based bilingual term base (ARTES), mainly in Earth Science but also in numerous other very specific domains, such as neuroimaging or electronic intelligence.
The team also conducts corpus studies of the linguistic characterisation of textual genres (user manuals, scientific papers, abstracts, etc.) in order to provide writing aids for non-native speakers of English. Corpus linguistics applied to pragmatic translation is one of the leading themes developed over the years. Another subgroup, Discourse, Speech, and Cognition, develops cognition-oriented studies on discourse for native speakers (ANR EMPHILINE) or learners of a second language (Diderot-Longdale Project). The group also works on the interface between phonetics, phonology and prosody in English and French (ANR Coregraphy). The Syntax, Semantics and Pragmatics group conducts research on lexical semantics (ANR NOMAGE), verb or noun complementation (ANR EMPHILINE, working on noun complementation expressing surprise), anaphora and ellipsis, and the syntax of sentences and complex utterances.