Friday, July 8, 2011

Python, Tornado, Kafka, Oh My!

Something about Kafka:

Recently, LinkedIn's Kafka was accepted into the Apache Incubator.

Kafka is a high-throughput distributed publish-subscribe messaging system written in Scala.

Kafka scales well as both the data volume and the number of subscribers grow. For detailed performance results, check this out.
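
To make "publish-subscribe" and "scales with subscribers" a little more concrete, here is a toy in-memory model in plain Python. It is not Kafka's API or implementation, and ToyTopicLog and ToySubscriber are hypothetical names. Producers append to a shared per-topic log, and each subscriber pulls from its own offset. In this toy the log keeps no per-subscriber state at all, so adding a subscriber costs only one integer on the subscriber's side. That loosely mirrors how Kafka consumers track their own position in the log, but it ignores partitions, persistence and networking entirely.

```python
from collections import defaultdict


class ToyTopicLog:
    """Toy model of publish-subscribe over an append-only log (not Kafka itself).

    One shared list per topic; the log never records who has read what.
    """

    def __init__(self):
        self.topics = defaultdict(list)  # topic name -> list of messages

    def publish(self, topic, message):
        """Append a message to the topic and return its offset."""
        log = self.topics[topic]
        log.append(message)
        return len(log) - 1

    def fetch(self, topic, offset, max_messages=10):
        """Return up to max_messages starting at offset, plus the next offset."""
        log = self.topics[topic]
        batch = log[offset:offset + max_messages]
        return batch, offset + len(batch)


class ToySubscriber:
    """Pulls messages from one topic, remembering only an integer offset."""

    def __init__(self, log, topic):
        self.log = log
        self.topic = topic
        self.offset = 0

    def poll(self):
        # Read from our own position, then advance it past what we received.
        batch, self.offset = self.log.fetch(self.topic, self.offset)
        return batch


log = ToyTopicLog()
a = ToySubscriber(log, "app-update")
b = ToySubscriber(log, "app-update")
log.publish("app-update", "event-1")
log.publish("app-update", "event-2")
print(a.poll())  # ['event-1', 'event-2']
print(a.poll())  # []
print(b.poll())  # ['event-1', 'event-2']
```

Both subscribers see every message, yet the messages are stored once; each one just reads at its own pace.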

Capturing internet events:

We were looking to build a data application that captures activity from mobile devices.

The requirements were:
  • High volume
  • Data sent over the internet
Kafka was the obvious choice for streaming messages to our backend systems, but of course we didn't want to expose our Kafka endpoint to the web.

So we needed to build an HTTP proxy to front our Kafka cluster.

Python and Tornado:

As a recent Python convert (I learned Django from Lei), I wanted to build this proxy in Python.

Django is a little heavy for this use case; all I needed was an HTTP server.

Luckily, Ikai facebooked me his talk on Tornado, a lightweight HTTP server written in Python.

Since Kafka already has a Python client, voilà: we have an HTTP proxy that listens for events and pumps them into Kafka.

Here is the code:

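import tornado.ioloop
import tornado.web

# Kafka's Python client; KafkaProducer(host, port) connects to a broker.
from kafka import KafkaProducer


class KafkaHandler(tornado.web.RequestHandler):
    # Every event posted to this endpoint goes to the same Kafka topic.
    topic = "app-update"
    # One producer (and broker connection) shared by all requests,
    # created once when the class is defined.
    producer = KafkaProducer('localhost', 9092)

    def post(self):
        # Forward the raw request body to Kafka unchanged.
        d = self.request.body
        self.producer.send([d], self.topic)
        print(d)


# Route POST /app-update to the handler above.
application = tornado.web.Application([
    (r"/app-update", KafkaHandler),
])

if __name__ == "__main__":
    application.listen(8080)
    tornado.ioloop.IOLoop.instance().start()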
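
You don't need a running Kafka broker to try the proxy pattern itself. Below is a stdlib-only sketch (Python 3, using http.server rather than Tornado) that swaps the real producer for a hypothetical FakeProducer with the same send(messages, topic) call shape as the handler above. It then starts the proxy on a free local port and POSTs one JSON event the way a mobile client might. The names FakeProducer, ProxyHandler and post_event, the JSON payload, and the 204 response are all assumptions for illustration, not part of the original setup.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class FakeProducer:
    """Hypothetical stand-in for the Kafka producer: records what it is sent."""

    def __init__(self):
        self.sent = []

    def send(self, messages, topic):
        # Same argument order as producer.send([d], self.topic) in the Tornado handler.
        self.sent.append((topic, list(messages)))


class ProxyHandler(BaseHTTPRequestHandler):
    producer = FakeProducer()  # shared by all requests, like the class attribute above
    topic = "app-update"

    def do_POST(self):
        if self.path != "/app-update":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        self.producer.send([body], self.topic)  # forward the raw body unchanged
        self.send_response(204)  # accepted, nothing to return
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def post_event(url, payload):
    """Client side: POST one JSON-encoded event to the proxy, return the status."""
    data = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(url, data=data, method="POST",
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return resp.status


server = HTTPServer(("127.0.0.1", 0), ProxyHandler)  # port 0: let the OS pick a free port
thread = threading.Thread(target=server.serve_forever, daemon=True)
thread.start()
url = "http://127.0.0.1:%d/app-update" % server.server_address[1]
status = post_event(url, {"app": "demo", "action": "update"})
server.shutdown()
server.server_close()
print(status, ProxyHandler.producer.sent)
```

Running it should print the 204 status followed by the single recorded (topic, messages) pair. Keeping the producer as a class attribute, as the Tornado handler does, is what makes this kind of swap easy when testing without a broker.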


2 comments:

  1. Dude, that is awesome. Tornado might even be a bit heavyweight for this - there's also Flask and Bottle. Tornado just sits between "too lightweight" and "too heavyweight" for me. Hits a really sweet spot.

  2. What do you think about Twisted? (http://twistedmatrix.com/trac/)
