Something about Kafka:
LinkedIn's Kafka was recently accepted into the Apache Incubator.
Kafka is a high-throughput distributed publish-subscribe messaging system written in Scala.
Kafka scales very well as both the dataset and the number of subscribers grow. For detailed performance results, check this out.
Capturing internet events:
We were looking to build a data application that captures mobile activities.
Requirements are:
- High volume
- Data sent over the internet
Kafka is the obvious choice for streaming messages to our backend systems, but of course we don't want to expose our Kafka endpoint on the web.
So, we need to build an HTTP proxy to front our Kafka cluster.
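Seen from the device, this means events travel as ordinary HTTP POSTs and the client never talks to Kafka directly. Here is a minimal sketch of that client side in Python 3, using only the standard library. The JSON encoding and the event fields are assumptions for illustration (the proxy in this post simply forwards whatever body it receives), the helper names are hypothetical, and the default URL is a placeholder that happens to match the port and path used in the listing further down.

```python
import json
import urllib.request


def build_event_request(event, url="http://localhost:8080/app-update"):
    """Build (but do not send) an HTTP POST carrying one event as JSON."""
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_event(event, url="http://localhost:8080/app-update", timeout=5):
    """POST one event to the proxy and return the HTTP status code."""
    request = build_event_request(event, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

With a proxy running locally, `send_event({"device": "abc", "action": "open"})` would post a single event. Building the request separately from sending it keeps the payload easy to inspect without a network connection.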
Python and Tornado:
Django is a little heavy for this use case; all I needed was an HTTP server.
Given that Kafka already has a Python client, voila: we have an HTTP proxy that listens for events and pumps them into Kafka.
Here is the code:
import tornado.ioloop
import tornado.web
from kafka import KafkaProducer


class KafkaHandler(tornado.web.RequestHandler):
    # Topic and producer are class attributes, shared by all requests.
    topic = "app-update"
    producer = KafkaProducer('localhost', 9092)

    def post(self):
        # Forward the raw request body to Kafka as a single message.
        d = self.request.body
        self.producer.send([d], self.topic)
        print(d)


application = tornado.web.Application([
    (r"/app-update", KafkaHandler),
])

if __name__ == "__main__":
    # Listen on port 8080 and run the event loop until interrupted.
    application.listen(8080)
    tornado.ioloop.IOLoop.instance().start()
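The producer calls above are specific to the Python client used here. If you are on the kafka-python package instead, the equivalent step would look roughly like the sketch below. It assumes kafka-python is installed and a broker is reachable at localhost:9092; `make_producer` and `forward_event` are hypothetical helper names, not part of any library.

```python
def make_producer(bootstrap_servers="localhost:9092"):
    """Create a producer with the kafka-python package (imported lazily)."""
    # Third-party dependency (assumption): pip install kafka-python
    from kafka import KafkaProducer
    return KafkaProducer(bootstrap_servers=bootstrap_servers)


def forward_event(producer, body, topic="app-update"):
    """Hand one raw request body (bytes) to Kafka and return the send result.

    The producer is passed in, so any object with a compatible
    send(topic, value=...) method can stand in for it during tests.
    """
    return producer.send(topic, value=body)
```

Inside the handler's `post` method this would be called as `forward_event(self.producer, self.request.body)`. Note that kafka-python's `send` is asynchronous and returns a future, so a message may still be buffered when the HTTP response goes out.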
Dude, that is awesome. Tornado might even be a bit heavyweight for this - there's also Flask and Bottle. Tornado just sits between "too lightweight" and "too heavyweight" for me. Hits a really sweet spot.
What do you think about Twisted? (http://twistedmatrix.com/trac/)