
Running wikitrans locally

04 January 2010 – James – Brooklyn

Update!

Please check the new setup document found on our wiki.

Setting up the environment

First we need to install some apps and then configure our virtual environment. I use MacPorts but it's not much different to set up on a Linux machine. Then pick a directory where virtualenv will store its data. I like to use ~/.virtualenvs/. Virtualenv will fail if this directory doesn't exist.
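A rough sketch of that setup; the exact install command is an assumption, so substitute your package manager's equivalent:

```shell
# Install virtualenv and virtualenvwrapper into the system Python.
# (easy_install shown as one option; MacPorts or apt packages work too.)
easy_install virtualenv virtualenvwrapper

# virtualenvwrapper keeps every environment under one directory and
# fails if that directory is missing, so create it up front.
mkdir -p ~/.virtualenvs
```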

The next lines should go in your environment and then be sourced into any shell you're working in. Check the paths against what your system uses. On Ubuntu 9.04 there was no virtualenvwrapper package, so I downloaded the source into my src directory and used ~/src/virtualenv-src/virtualenvwrapper_bashrc.
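Something along these lines; the /usr/local/bin path is an assumption for a typical MacPorts-era install, and the script location is exactly the part to check against your own system:

```shell
# Tell virtualenvwrapper where the environments live
export WORKON_HOME=$HOME/.virtualenvs

# Pull in the workon/mkvirtualenv helper functions.
# The path varies by install method; on Ubuntu 9.04 with a source
# checkout it is ~/src/virtualenv-src/virtualenvwrapper_bashrc instead.
source /usr/local/bin/virtualenvwrapper_bashrc
```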

Setting up the system

WikiTrans relies on many packages for its functionality. There is a snapshot of Pinax on my github page that I build against. We will clone that and then clone wikitrans.

I keep my source code in a Projects directory on my laptop.
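The clone step looks roughly like this; the clone URLs are placeholders, since the exact repositories live on my github page:

```shell
mkdir -p ~/Projects
cd ~/Projects

# Clone the Pinax snapshot and wikitrans. The <github-user> part is a
# placeholder -- use the repositories from my github page.
git clone git://github.com/<github-user>/pinax.git
git clone git://github.com/<github-user>/wikitrans.git
```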

You will notice your shell's prompt is a bit different. It should have (dev-wikitrans) in front of it now, telling you that you have activated the dev-wikitrans virtual environment. If you type 'which python', you will see that it now points to /your/home/dir/.virtualenvs/dev-wikitrans/bin/python. You also have access to pip, a tool for installing Python packages from the Python Package Index. Goopytrans, Wikipydia, and Pyango View all live there.
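Creating and activating the environment is two virtualenvwrapper commands:

```shell
# Create the environment (stored under $WORKON_HOME) and activate it
mkvirtualenv dev-wikitrans
workon dev-wikitrans

# Confirm the active interpreter now lives inside the virtualenv:
# it should print ~/.virtualenvs/dev-wikitrans/bin/python
which python
```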

Anyway, we have only done some preliminary setup. This next step is gonna take a little while.

This fetches everything from the Pinax project and installs Pinax into the virtual environment. We use a similar file for setting up the wikitrans environment too. Once the external_apps install is finished, we use wikitrans's requirements.txt file to configure its environment.
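Sketched out, assuming the Pinax snapshot ships its external_apps dependency list as a pip requirements file (the exact filename is a guess, so check the snapshot):

```shell
cd ~/Projects/pinax

# Install Pinax itself into the active virtualenv
pip install -e .

# Fetch and install the external_apps Pinax builds against
# (filename may differ in your snapshot)
pip install -r external_apps.txt
```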

There is an ordering bug with pyyaml that I can't explain just yet, so we install it first. We then install the Python Imaging Library; its installation varies and requires compilation, so install it separately. Then install all the libraries listed in the requirements file and we're off.
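In order, that is:

```shell
cd ~/Projects/wikitrans

# pyyaml first, to dodge the ordering bug mentioned above
pip install pyyaml

# PIL compiles C extensions, so install it on its own and watch the output
pip install PIL

# Now everything else wikitrans depends on
pip install -r requirements.txt
```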

Assuming everything installed correctly, we are just about ready to go. We also have to download the data NLTK requires. This will create an nltk_data directory in your home directory. We need the Punkt tokenizer's configuration files so it knows how to split up sentences.
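A one-liner handles the download (newer NLTK versions accept the package identifier directly; older ones open an interactive downloader when called with no arguments):

```shell
# Download the Punkt sentence-tokenizer models into ~/nltk_data
python -c "import nltk; nltk.download('punkt')"
```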

Let's turn it on! The syncdb step synchronizes the database against the described data models and gives apps a chance to run some of their own setup, via a signals framework. One of the operations that gets called creates the admin user, so during this step it will ask you for a username, email and password. This will be the first user account created in the system. 'runserver' starts Django's built-in webserver. It's not meant to be used for production, but it makes it easy to develop locally.
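Both are standard manage.py commands, run from the wt-app directory:

```shell
cd ~/Projects/wikitrans/wt-app

# Create the database tables; this is the step that prompts for the
# admin username, email and password
python manage.py syncdb

# Start Django's development server on 127.0.0.1:8000
python manage.py runserver
```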

If that command is working for you, try going to http://127.0.0.1:8000. One of the really neat things about Django development is the good design of URLs. I can link you to a local URL built off 127.0.0.1 and know it will work on your side too.

Try poking around the site a little bit. I would go to the Request Translation page and request "Mew_(band)" (a rock jazz band) with a title language of English and a target language of Spanish. If someone gives us a title, they must also give us the language of the title. For example, 'Johns Hopkins University' is 'Universidad Johns Hopkins' in Spanish. Knowing which language we're looking at saves us from issuing a search query for language offerings every time we want to load an article by its title.

The admin page

You can see the entire database system for any Django project by visiting the top level of the admin page. This page is very useful for quickly constructing data while developing a web site. Once some data is constructed, you can export it using Django's dumpdata command and reimport it using loaddata. And if you have some data that you want initialized every time you generate the database, to bootstrap the system, you put that data extract in a file called 'initial_data.json' in a fixtures folder inside your apps. For example, try 'cat wt_languages/fixtures/initial_data.json'. You'll see a default language competency for the first user in the system. Feel free to change it.
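The export/import cycle looks like this, using the wt_languages app as the example:

```shell
cd ~/Projects/wikitrans/wt-app

# Export the current contents of the wt_languages app as JSON.
# Anything named initial_data.json in an app's fixtures folder is
# reloaded automatically on every syncdb.
python manage.py dumpdata wt_languages > wt_languages/fixtures/initial_data.json

# Fixtures can also be loaded explicitly by name
python manage.py loaddata initial_data.json

# Inspect the bootstrapped language competency
cat wt_languages/fixtures/initial_data.json
```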

The Translation Request you entered for "Mew_(band)" put an entry in the ArticleOfInterest table. See for yourself here. WikiTrans generally lives in two applications: wt_articles and wt_languages.

Batches

I currently have two batches that are important for development. There are two more batch files for handling email distribution, but those are beyond the scope of this entry. Anyway.

The first command, update_wiki_articles, reads the entries from the ArticleOfInterest table, uses Wikipydia to communicate with Wikipedia, and fetches the articles. When it saves the articles to the database, it also splits them into sentences, using the Punkt tokenizer from NLTK. After a successful run you will see the articles here and the sentences here.

Now that you've consumed an article from Wikipedia and split it into sentence segments, a Translation Request is generated. That will be here. The next batch command, translate_google, searches the Translation Request table for any entries with Google marked as the translator and uses Goopytrans to have Google translate the sentences one by one. We are getting close to having Mechanical Turk hooked in, and will be translating sentence by sentence via MTurk as well. It is also possible to mark a request to be translated by a human, but nothing is implemented for that yet.

Together, those commands look like this.
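```shell
cd ~/Projects/wikitrans/wt-app

# Fetch requested articles from Wikipedia and split them into sentences
python manage.py update_wiki_articles

# Send pending sentence translations to Google via Goopytrans
python manage.py translate_google
```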

After an article has been translated, you will see it in The Article List. This list is populated by what languages you have marked yourself competent in. By default, you are marked competent in Spanish as part of the bootstrapping setup from wt_languages.

The Articles landing page is set manually right now. It'd be cool to make this an automated process eventually, but I'm not sure yet how to determine what should be featured. Anyway, go here to pick a Translated Article. Refresh the landing page and there is your article.

Educational destruction

Part of development is breaking things for progress. WikiTrans comes bundled with Werkzeug as part of the code base, so let's break the system and see what it offers us. Open up ~/Projects/wikitrans/wt-app/settings.py and put a comment (# symbol) at the beginning of the line that says "DATABASE_ENGINE = 'sqlite3'". This disables the database system and produces an error page on the landing page. If you put your mouse on one of the dark grey lines that has code on it, two icons appear on the right side. The one that looks like a terminal offers an interactive console at each level of the stack trace. The icon that looks like a page exposes the relevant code.

Amazing, right? Werkzeug is pretty incredible. Don’t forget to remove the comment character from line 26 in settings.py.

Conclusion

And so that’s the gist of how you can get the WikiTrans codebase running locally.

Creative Commons License
