diTii.com Digital News Hub



Google Translates 16 Million Words for Wikipedia in Arabic, Gujarati, Hindi, Kannada, Swahili, Tamil and Telugu

At Wikimania 2010 in Gdańsk, Poland, Google said: “Over the last 16 months or so, we’ve been working with the Wikimedia Foundation, students, professors, Google volunteers, paid translators and members of the Wikipedia community to translate more than 16 million words for Wikipedia into Arabic, Gujarati, Hindi, Kannada, Swahili, Tamil and Telugu.”

Google said: “We began these efforts in 2008, starting with Hindi Wikipedia, which had only 3.4 million words across 21,000 articles; by contrast, English Wikipedia had 1.3 billion words across 2.5 million articles. We selected articles using a couple of different criteria. First, we used Google search to determine the most popular English Wikipedia articles read in India. Then, using Google Trends, we found articles that were consistently read over time, not just temporarily popular.

“Finally, we used Google Translator Toolkit to translate articles that either didn’t exist in Hindi Wikipedia or existed only as placeholder articles, or ‘stubs.’ Over three months, we used a combination of human and machine translation to translate 600,000 words from more than 100 English Wikipedia articles, growing Hindi Wikipedia by almost 20%. We’ve since repeated this process for other languages, bringing our total to 16 million translated words.”

