
First time using wikipydia

12 January 2010 – Chris – Baltimore

This weekend I taught myself Python and started writing some more code for the wikipydia.py module that James started. I’m extremely impressed by how easy Python and JSON are to use, and by how nicely written James’s code is. I picked it up with no problems at all.

Choosing What To Translate

As an introductory exercise, I decided to tackle the question of which pages we should translate for WikiTrans. Because Wikipedia has so many articles, we won’t be able to translate all of them into all other languages: using Mechanical Turk would be too expensive, and even machine translation would probably require too many CPU hours to be feasible. So my goal was to select a subset of articles to translate first.

Getting A List Of Featured Articles

We could draw our translation candidates from Wikipedia’s Featured Articles, which meet the criteria of being well written, comprehensive, and well researched. These articles are tagged with Category:Featured_articles. I wrote a method for retrieving members of a given category:
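A minimal sketch of how such a lookup can work, assuming the MediaWiki API's `list=categorymembers` query and its `continue` mechanism for paging through results. The names `category_members` and `fetch_json` are hypothetical and are not taken from wikipydia.py; the `fetch` parameter exists only so the paging logic can be exercised without a network connection.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def fetch_json(url):
    """Download a URL and parse the response body as JSON."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "wikitrans-sketch/0.1"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def category_members(category, fetch=fetch_json, limit=500):
    """Return the titles of all pages in `category` (hypothetical helper)."""
    titles = []
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmlimit": limit,
        "format": "json",
    }
    while True:
        data = fetch(API_URL + "?" + urllib.parse.urlencode(params))
        titles.extend(m["title"] for m in data["query"]["categorymembers"])
        # When more results remain, the API returns a "continue" object
        # whose fields are merged into the next request.
        if "continue" not in data:
            return titles
        params.update(data["continue"])
```

Calling `category_members("Category:Featured_articles")` would issue one request per batch of up to 500 titles until the API stops returning a `continue` object.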

Here’s an example of its output:

Ranking Articles By Popularity

Even 2739 articles may be too many to start out with. Ideally, I’d like to be able to sort them by popularity, using page view statistics as the measure. Wikipedia user Henrik maintains stats for daily Wikipedia page views at stats.grok.se (there’s also a 3-month archive of raw hourly traffic data at dammit.lt/wikistats). Henrik provides a JSON interface to stats.grok.se, so I wrote a new wikipydia method to query for page views.
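A rough sketch of such a query, under two assumptions that should be checked against the service itself: that the JSON interface is reachable at a URL of the form `stats.grok.se/json/<lang>/<YYYYMM>/<title>`, and that the response carries a `daily_views` object mapping dates to counts. The function names here are hypothetical, not the ones in wikipydia.py.

```python
import json
import urllib.parse
import urllib.request

# Assumed URL pattern for the stats.grok.se JSON interface.
STATS_URL = "http://stats.grok.se/json/{lang}/{month}/{title}"


def fetch_json(url):
    """Download a URL and parse the response body as JSON."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def page_views_url(title, lang="en", month="201001"):
    """Build the stats URL for one article and one month (YYYYMM)."""
    safe_title = urllib.parse.quote(title.replace(" ", "_"))
    return STATS_URL.format(lang=lang, month=month, title=safe_title)


def total_page_views(title, lang="en", month="201001", fetch=fetch_json):
    """Sum the daily view counts for one article over one month."""
    data = fetch(page_views_url(title, lang, month))
    # Assumed response shape: {"daily_views": {"2010-01-01": 123, ...}, ...}
    return sum(data.get("daily_views", {}).values())
```

Passing `fetch` as a parameter keeps the summing logic separate from the network call, so it can be tried against a canned response first.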

Here’s an example of what that returns:

It takes about an hour to gather the stats for all 2738 featured articles in English. After that we can sort them based on their total views:
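The sorting step can be sketched as follows, assuming the totals have already been collected into a dict that maps article title to view count. The helper name `rank_by_views` is hypothetical, and the sample numbers are made up purely for illustration; they are not real traffic figures.

```python
def rank_by_views(view_counts):
    """Return (title, views) pairs, most viewed first.

    Ties are broken alphabetically by title so the order is stable
    from run to run.
    """
    return sorted(view_counts.items(), key=lambda item: (-item[1], item[0]))


# Made-up totals, for illustration only.
sample = {"Article A": 1200, "Article B": 45000, "Article C": 1200}

for title, views in rank_by_views(sample):
    print(f"{views:>7}  {title}")
```

Since gathering the stats is the slow part, it is probably worth saving the collected totals to disk once so that the ranking can be rerun without querying the stats service again.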

Creative Commons License
