|Journal||International Journal of Computer Applications|
|Title||Building English-Punjabi Parallel corpus for Machine Translation|
|Index Term||Information Systems|
A parallel corpus is the key resource for English-Punjabi machine translation, yet no English-Punjabi corpus is widely available. Such a corpus is a primary requirement for training a statistical machine translation system.
In this paper, the authors focus on building a large-scale English-Punjabi corpus. Developing the corpus posed difficulties and required intensive labor. We elaborate on the collection of texts as well as the workflow for constructing the parallel corpus. Once the raw text has been obtained, the corpus must be refined so that every source-language sentence has a corresponding target-language sentence.
The paper explores existing tools and also builds new ones. One of the goals is the alignment of the bilingual corpus. Alignment algorithms are used to pair the sentences, and their accuracy depends on the type of corpus.
A careful effort has been made to capture different types of texts.
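The refinement requirement described in the abstract, that every source-language sentence should have a corresponding target-language sentence, can be illustrated with a minimal sketch. This is not the authors' tool or algorithm: the function name `pair_sentences`, the positional one-to-one pairing, and the `max_ratio` threshold of 2.5 are all assumptions made for illustration. It pairs sentences by position and sets aside pairs whose character lengths differ too much for manual review.

```python
def pair_sentences(english, punjabi, max_ratio=2.5):
    """Pair sentences by position and flag pairs whose lengths diverge.

    Hypothetical helper: assumes both lists are already sentence-split
    and in the same order. max_ratio is an illustrative threshold.
    """
    if len(english) != len(punjabi):
        raise ValueError("sentence counts differ; the texts need manual alignment")

    kept, flagged = [], []
    for en, pa in zip(english, punjabi):
        en, pa = en.strip(), pa.strip()
        # An empty side means the pair has no usable translation.
        if not en or not pa:
            flagged.append((en, pa))
            continue
        # Ratio of the longer sentence to the shorter one, in characters.
        ratio = max(len(en), len(pa)) / min(len(en), len(pa))
        if ratio <= max_ratio:
            kept.append((en, pa))
        else:
            flagged.append((en, pa))
    return kept, flagged


if __name__ == "__main__":
    english = ["How are you?", "This sentence is much longer than its counterpart."]
    punjabi = ["ਤੁਸੀਂ ਕਿਵੇਂ ਹੋ?", "ਹਾਂ"]
    kept, flagged = pair_sentences(english, punjabi)
    print("kept:", kept)
    print("flagged:", flagged)
```

A character-length ratio is only a rough filter; a real alignment step would also need to handle one-to-many sentence mappings, which this sketch deliberately does not attempt.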
|Keywords||Bilingual corpora, Machine-translation, English, Punjabi, NLP.|
|No. of Pages||4|
|Author Names||Shishpal Jindal, Vishal Goyal, Jaskarn Singh Bhullar|
|Author Emails||firstname.lastname@example.org, email@example.com, firstname.lastname@example.org|
|Start Page No.||26|