このページは http://www.slideshare.net/ToshiakiNakazawa/pacling2015 の内容を掲載しています。
A high-quality parallel corpus needs to be manually created to achieve good machine translation f...
A high-quality parallel corpus needs to be manually created to achieve good machine translation for the domains which do not have enough existing resources. Although the quality of the corpus to some extent can be improved by asking the professional translators to translate, it is impossible to completely avoid making any mistakes. In this paper, we propose a framework for cleaning the existing professionally-translated parallel corpus in a quick and cheap way. The proposed method uses a 3-step crowdsourcing procedure to efficiently detect and edit the translation flaws, and also guarantees the reliability of the edits. The experiments using the fashion-domain e-commerce-site (EC-site) parallel corpus show the effectiveness of the proposed method for the parallel corpus cleaning.