myPREP can also produce training corpora for statistical machine translation in Moses format. These corpora are split into separate parts for training, tuning, and evaluation.
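To illustrate what such a split looks like, here is a minimal sketch that writes sentence pairs into line-aligned plain-text files, the layout Moses training expects (one sentence per line, line i of the source file paired with line i of the target file, files sharing a stem and differing by language extension). This is not myPREP's own procedure: the helper name `split_parallel_corpus`, the random shuffle, the default tune/test sizes, and the `en`/`fr` extensions are all hypothetical choices for illustration.

```python
import random
from pathlib import Path


def split_parallel_corpus(pairs, out_dir, src_lang="en", tgt_lang="fr",
                          tune_size=2000, test_size=2000, seed=0):
    """Hypothetical helper: write train/tune/test splits as line-aligned files.

    Produces e.g. train.en / train.fr, tune.en / tune.fr, test.en / test.fr.
    Sizes, names and language codes are illustrative assumptions.
    """
    pairs = list(pairs)
    if tune_size + test_size >= len(pairs):
        raise ValueError("corpus too small for the requested tune/test sizes")

    # Shuffle a copy with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(pairs)
    splits = {
        "tune": pairs[:tune_size],
        "test": pairs[tune_size:tune_size + test_size],
        "train": pairs[tune_size + test_size:],
    }

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, subset in splits.items():
        with open(out / f"{name}.{src_lang}", "w", encoding="utf-8") as fs, \
             open(out / f"{name}.{tgt_lang}", "w", encoding="utf-8") as ft:
            for src, tgt in subset:
                # One sentence per line: embedded newlines would break alignment.
                fs.write(src.replace("\n", " ").strip() + "\n")
                ft.write(tgt.replace("\n", " ").strip() + "\n")
    return {name: len(subset) for name, subset in splits.items()}
```

Keeping the tune and test sets disjoint from the training set is the point of the split: tuning weights or measuring quality on sentences the model was trained on would give optimistic results.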
myPREP can also align comparable corpora. The alignment produces a set of sentence pairs, each associated with a score, the number of aligned terms, and the sentence lengths. These values can be used to check the alignments.
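As an illustration of how the per-pair values can serve to check alignments, the sketch below keeps only pairs that pass simple thresholds. The record type `AlignedPair`, the score scale, and the threshold values are hypothetical assumptions, not myPREP's output format or its recommended settings.

```python
from dataclasses import dataclass


@dataclass
class AlignedPair:
    """Hypothetical record for one aligned sentence pair."""
    source: str
    target: str
    score: float        # alignment score (scale assumed to be 0..1 here)
    aligned_terms: int  # number of aligned terms in the pair
    source_length: int  # sentence lengths, e.g. in tokens (assumed unit)
    target_length: int


def keep_pair(p, min_score=0.5, min_terms=2, max_length_ratio=2.0):
    """Return True if the pair passes all (illustrative) thresholds."""
    if p.score < min_score or p.aligned_terms < min_terms:
        return False
    shorter = min(p.source_length, p.target_length)
    longer = max(p.source_length, p.target_length)
    if shorter == 0:
        return False  # an empty side cannot be a valid alignment
    # Very different lengths often indicate a misalignment.
    return longer / shorter <= max_length_ratio


def filter_alignments(pairs, **thresholds):
    """Keep the pairs that pass keep_pair with the given thresholds."""
    return [p for p in pairs if keep_pair(p, **thresholds)]
```

Combining the score with a length-ratio check is a common heuristic because a high score on a pair of very unequal lengths is often a partial match rather than a translation.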
myPREP requires corpora of segmented documents encoded in UTF-8. The myCAT converter and segmentation tool are included in the myPREP installation.
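Before loading a corpus, it can help to confirm that each file really is valid UTF-8. The sketch below uses only the Python standard library; the helper name `first_invalid_utf8` is hypothetical, and it checks encoding only, not whether the documents are segmented.

```python
from pathlib import Path


def first_invalid_utf8(path):
    """Return the byte offset of the first invalid UTF-8 sequence, or None.

    Hypothetical helper: reads the whole file, so it suits moderate file sizes.
    """
    data = Path(path).read_bytes()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start  # offset where decoding failed
    return None
```

Files reported with an offset would need to be converted (for instance from Latin-1 or Windows-1252) before use.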
myPREP is available for both Windows and GNU/Linux (Ubuntu 12.04); the links to the respective installation files are given below. The source code is the same for both versions.
Software owned by Olanto is distributed under the GNU Affero General Public License, version 3 (AGPL v3).