While I was using GAE, the web app was at http://wware-reflector.appspot.com/. Now you can go to http://68.169.50.100/hack/ and you'll be immediately redirected to https://68.169.50.100/hack/.
This file (README.rst) is used to create the index file for the server. I haven't figured out a good way to set up a makefile for this, so just do the following manually when you change this file:
rst2html README.rst django/mysite/index.html # then commit index.html in git
With that out of the way, what is long polling?
My employer maintains a server for long polling, which I've only just learned about. Some of what I want to do is outside the scope of what that server will do, and it's in use for existing products and services so it's not available for my tinkering. So I'm looking at doing something of my own.
One issue with a long-polling server is a race condition that I'll call the "gap problem". When a client's connection times out, the client is supposed to issue a new HTTP request to re-open the connection. The potential problem is that between the dropping of the old connection and the opening of the new one, a message may become available that ought to go to the client. There are different approaches to this and the one used in that server is not to my liking.
The server at work is an HTTP server designed to offer a long-polling service. To use it, you open a connection to it using a unique id. That connection will block until a response is received or it times out. In either of those cases, you just reconnect and repeat. To post a message to a waiting client, you post it to the server using the id. Here's an example of how it works.
In terminal 1:
curl "http://foobar.example.com/activity?id=my_test"
In terminal 2:
curl "http://foobar.example.com/publish?id=my_test" -d "hello world"
As soon as you send the message in terminal 2, terminal 1 should print it. You can have any number of listeners on a particular id and they will all get notified.
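Here's a rough Python sketch of the "reconnect and repeat" loop a listener would run. The URL is the placeholder from the curl example, and the names `fetch_once` and `listen` and the 30-second timeout are my own assumptions, not anything the server at work prescribes.

```python
import urllib.request

# Placeholder endpoint, copied from the curl example above.
ACTIVITY_URL = "http://foobar.example.com/activity?id=my_test"


def fetch_once(url, timeout=30.0):
    """Issue one blocking GET; return the body, or None if nothing arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8")
    except OSError:
        # Covers timeouts, dropped connections, and HTTP error statuses
        # (URLError and socket timeouts are both OSError subclasses).
        return None


def listen(url, handle, fetch=fetch_once, max_requests=None):
    """Reconnect in a loop, passing each received message to handle()."""
    issued = 0
    while max_requests is None or issued < max_requests:
        message = fetch(url)  # blocks until a message or a timeout
        issued += 1
        if message is not None:
            handle(message)
```

Calling `listen(ACTIVITY_URL, print)` would behave roughly like re-running the terminal 1 command forever. The `fetch` and `max_requests` parameters are only there so the loop can be exercised without a real server.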
How does the server at work address the gap problem? When you use the "publish" URL, it spends the next five seconds broadcasting your information to anybody who opens a connection for that ID. This means that a client will typically receive several copies of each piece of information going out for a particular ID. It also limits how often information for a particular ID can go out.
First, my thoughts on the gap problem. Each client (a client being defined by an IP address and an ID) should be assigned a queue, in the form of a circular buffer. The queue should ensure that the client receives exactly one copy of each piece of information published for that ID, in the order the pieces were published. A queue can time out if there has been no active connection on it for, say, five seconds. It's reasonable to require that a client re-establish its connection in under five seconds.
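A minimal sketch of such a queue in Python, using a bounded `collections.deque` as a stand-in for the circular buffer. The class name `ClientQueue`, the capacity of 4, and the injectable `clock` are assumptions made for illustration. Note that a bounded buffer silently drops the oldest message when it overflows, so "exactly one copy" only holds while the client keeps up.

```python
import time
from collections import deque

QUEUE_TIMEOUT = 5.0  # seconds; the figure suggested in the text


class ClientQueue:
    """Per-client message queue that can expire after a connection drop."""

    def __init__(self, capacity=4, clock=time.monotonic):
        # With maxlen set, appending to a full deque discards the oldest item.
        self._items = deque(maxlen=capacity)
        self._clock = clock
        self._dropped_at = clock()  # not connected yet, so the timer is running

    def put(self, message):
        self._items.append(message)

    def drain(self):
        """Return pending messages in publication order, each exactly once."""
        pending = []
        while self._items:
            pending.append(self._items.popleft())
        return pending

    def connect(self):
        self._dropped_at = None  # None means "connection open right now"

    def disconnect(self):
        self._dropped_at = self._clock()

    def expired(self):
        """True once the client has been disconnected longer than the timeout."""
        if self._dropped_at is None:
            return False
        return self._clock() - self._dropped_at > QUEUE_TIMEOUT
```

A message published during the gap just sits in the queue until the client reconnects and calls `drain()`, which is the point of the design.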
There will be two levels of hashtable. The first hashtable branches on the ID, and is used for both publisher and subscriber requests. When you get to a leaf of that hashtable, it has two things. One is a lock for that ID, the other is a hashtable of clients by IP address. The leaf nodes of the latter hashtable are, for each client, the circular buffer queue, a lock for the queue, and a timestamp for the most recent connection drop (or a special value to indicate an active connection at the present moment).
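The two-level layout might look something like this in Python, with plain dicts as the hashtables. All the names here (`Channel`, `ClientEntry`, `subscribe`, `publish`, `mark_dropped`) are hypothetical, and the extra `channels_lock` guarding the top-level table is my own assumption since the text doesn't say how that table is protected.

```python
import threading
import time
from collections import deque

ACTIVE = None  # special "dropped_at" value: a connection is open right now


class ClientEntry:
    """Leaf of the second table: one client's queue, lock, and drop time."""

    def __init__(self, capacity=64):
        self.queue = deque(maxlen=capacity)  # stand-in for a circular buffer
        self.lock = threading.Lock()
        self.dropped_at = ACTIVE


class Channel:
    """Leaf of the first table: a lock for the ID plus its clients by IP."""

    def __init__(self):
        self.lock = threading.Lock()
        self.clients = {}  # IP address -> ClientEntry


channels = {}  # first-level table: ID -> Channel
channels_lock = threading.Lock()  # assumption: guards the top-level table


def get_channel(channel_id):
    with channels_lock:
        return channels.setdefault(channel_id, Channel())


def subscribe(channel_id, ip):
    """Find or create the entry for (ID, IP) and mark it as connected."""
    channel = get_channel(channel_id)
    with channel.lock:
        entry = channel.clients.setdefault(ip, ClientEntry())
    with entry.lock:
        entry.dropped_at = ACTIVE
    return entry


def mark_dropped(entry):
    """Record the moment a client's connection went away."""
    with entry.lock:
        entry.dropped_at = time.time()


def publish(channel_id, message):
    """Append the message to every client queue registered for the ID."""
    channel = get_channel(channel_id)
    with channel.lock:
        entries = list(channel.clients.values())
    for entry in entries:
        with entry.lock:
            entry.queue.append(message)
    return len(entries)
```

Both publisher and subscriber requests go through `get_channel` first, which matches the idea that the first table branches on the ID for both kinds of request.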
I think a lot of this stuff can be prototyped in Python using the low-level socket interface described at http://docs.python.org/library/socket.html and then, if appropriate, can be translated to C. When I talk about "circular buffers", I'm thinking about C, but there may be a better data structure for a queue in Python, such as described at http://docs.python.org/library/collections.html#deque-objects.
There are a few other things I want in a separate long-polling server. Two of them are the ability to do HTTPS transactions and the ability to do POST operations, both for security reasons.
If the server decides to time out a queue, it responds to the client with a 504 error, which indicates a "gateway timeout".
I'd like the server to run Django. One reason for this is that I'd like the web app to do AJAX with the same JSON interface that the mobile client uses. So I've got things working now with Apache/WSGI/Django/MySQL. To do this on Ubuntu, you need to install the following packages and run these additional steps:
$ sudo apt-get install mysql-server mysql-client python-mysqldb \
apache2 libapache2-mod-wsgi python-django
$ mysqladmin -u root -p password p@ssword
$ mysql -u root -p
mysql> create database long_poll;
mysql> ^D
$ python manage.py syncdb
That stuff is working pretty well except I'm still using HTTP since I'm testing on a local machine. I should probably try to nail that down too. These are two important files for turning on HTTPS.
Limitations of the current implementation include
When the time comes to start doing real HTTPS, refer to http://www.redrobotstudios.com/blog/2009/02/18/securing-django-with-ssl/. You need secure session cookies for Django, and a RewriteRule for Apache. Maybe also http://www.linuxquestions.org/linux/answers/Networking/Apache_SSL_Howto.
Publish by making an HTTPS POST request to "/publish/", and in the parameters specify "channel_id" and "data".
Subscribe by making HTTPS POST requests to "/subscribe/", and in the parameters specify "channel_id" and "client_id".
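As a sketch, here is how a Python client might build those two requests. The base URL is an assumption (the server address mentioned at the top of this file, with no extra path prefix), and the helper names are hypothetical. The functions only construct the requests and don't send them.

```python
import urllib.parse
import urllib.request

# Assumption: server address from the top of this README, no path prefix.
BASE_URL = "https://68.169.50.100"


def build_post(path, **params):
    """Build (but do not send) a form-encoded POST request."""
    body = urllib.parse.urlencode(params).encode("ascii")
    return urllib.request.Request(BASE_URL + path, data=body, method="POST")


def publish_request(channel_id, data):
    """POST to /publish/ with the "channel_id" and "data" parameters."""
    return build_post("/publish/", channel_id=channel_id, data=data)


def subscribe_request(channel_id, client_id):
    """POST to /subscribe/ with the "channel_id" and "client_id" parameters."""
    return build_post("/subscribe/", channel_id=channel_id, client_id=client_id)
```

To actually send one, pass it to `urllib.request.urlopen`. With a self-signed certificate, the default verification will most likely reject the connection, so the client would need an SSL context that trusts that certificate.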
I'd initially used Google App Engine, then saw some things I wanted to tweak in the long-polling server, so I switched to a rented VPS running Ubuntu. Here is how you set up HTTPS on such a beast:
sudo a2enmod ssl
sudo a2ensite default-ssl
# create a certificate
openssl genrsa -des3 -out server.key 1024
openssl rsa -in server.key -out server.key.insecure
mv server.key server.key.secure
mv server.key.insecure server.key
openssl req -new -key server.key -out server.csr
openssl x509 -req -days 365 -in server.csr -signkey server.key -out server.crt
sudo cp server.crt /etc/ssl/certs
sudo cp server.key /etc/ssl/private
# don't bother with setting up a certification authority
sudo /etc/init.d/apache2 reload
# that worked! I can now hit https://192.168.1.2/
I found it convenient to say "sudo hostname 68.169.50.100" right before the "openssl req" command. This means that the certificate will list the IP address as the hostname, so when you type it in as the URL, your browser won't complain about an inconsistency.
I've decided not to do this on the server because I want people to be able to enter via normal HTTP and then get redirected to HTTPS. But if you wanted to shut off port 80 and only do HTTPS, then you'd follow these instructions. Go into /etc/apache2/ports.conf and comment out these two lines:
NameVirtualHost *:80
Listen 80
and add this line inside the <IfModule mod_ssl.c> section:
NameVirtualHost *:443
Finally, go into /etc/apache2/sites-available/default-ssl and change this line:
<VirtualHost _default_:443>
to:
<VirtualHost *:443>