What is Megaphone?

The Megaphone project is about enhancing open source chat software. Specifically, the goal is to allow ejabberd to support 1,000,000 simultaneous users. See The Plan page for more details on how I plan to solve this problem. See the About this Blog page for more details on why I created this blog.

The Plan

Ejabberd is a chat system like Yahoo! Instant Messenger, Microsoft Messenger or AOL Instant Messenger.  The difference is that ejabberd is not tied to any particular company and anybody can set up their own chat system using the server.

As of this writing, the basic ejabberd requires about 100kB per user hosted.  This works just fine until you start getting hundreds of thousands of users on the system.  For 1,000,000 users, this means that you will need around 100GB of memory: possible, but not what I would call a "commodity system."
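The arithmetic behind that estimate can be checked with a short sketch.  It assumes decimal units (1 kB = 1,000 bytes) and takes the 100kB-per-user figure at face value; the function name is made up for illustration.

```python
def memory_needed_gb(users: int, kb_per_user: float = 100.0) -> float:
    """Estimate total memory in GB, assuming decimal units (1 kB = 1,000 bytes)."""
    total_bytes = users * kb_per_user * 1_000
    return total_bytes / 1_000_000_000


if __name__ == "__main__":
    # Show how the requirement grows with the number of hosted users.
    for users in (10_000, 100_000, 1_000_000):
        print(f"{users:>9,} users -> about {memory_needed_gb(users):,.0f} GB")
```

The per-user cost is the lever here: the user count is the goal, so the only way to shrink the total is to shrink what each connection costs the server.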

The plan is to use a library called libevent to handle the TCP connections and then multiplex all the users over a single TCP connection to ejabberd.  This would require two pieces:
  • a libevent module
  • an ejabberd module
The libevent module would handle up to about 1,000,000 connections and multiplex them onto a single TCP connection to the ejabberd server.

I was originally planning to use a program called node-xmpp-bosh (NXB) to handle the TCP and HTTP connections, but after some work with it I concluded that the additional complexity of the BOSH server logic made it more effort than it was worth.  I have since decided to write my own simplified Node.js program, called the ejabberd connection manager (ECM).

The ejabberd server would demux the connection into individual sessions, and then treat each of them as a regular XMPP connection.  This part will likely be based on the http_bind module that currently comes with ejabberd, since that module has to deal with something other than a straight XMPP connection.

ejabberd and ECM would communicate by sending packets back and forth that have the form:

    <packet length>|<connection ID>|<content>

Here <packet length> is a zero-padded, 10-digit integer giving the number of bytes in the packet, including the header.  This is needed so that either side can figure out where one packet ends and the next begins without having to parse the content.  A participant reads the first 10 bytes, converts them into an integer, and then reads the rest of the packet.
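As a rough illustration of that read loop, here is a minimal sketch that splits a buffer of received bytes into complete packets.  It is written in Python for brevity rather than in the nodejs that ECM is planned in, it assumes the digits are ASCII text, and split_packets is a hypothetical name, not part of ejabberd or ECM.

```python
from __future__ import annotations

LENGTH_DIGITS = 10  # the length field is a zero-padded, 10-digit integer


def split_packets(buffer: bytes) -> tuple[list[bytes], bytes]:
    """Split buffered bytes into complete packets plus any unfinished remainder.

    Each returned packet still carries its own 10-byte length field.
    """
    packets = []
    while len(buffer) >= LENGTH_DIGITS:
        # Assumption: the length digits are sent as ASCII text.
        total = int(buffer[:LENGTH_DIGITS].decode("ascii"))
        if total < LENGTH_DIGITS:
            # The length counts the header too, so it can never be smaller.
            raise ValueError(f"impossible packet length: {total}")
        if len(buffer) < total:
            break  # the rest of this packet has not arrived yet
        packets.append(buffer[:total])
        buffer = buffer[total:]
    return packets, buffer
```

The caller would append each newly received chunk to the returned remainder and call the function again, so a packet that arrives in several pieces is only handed on once it is complete.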

The <connection ID> is a zero-padded, 20-digit integer that tells the participant who the packet is bound for or who it is from.  This is required because we are multiplexing many connections over a single data stream.

The <content> part of the packet is a block of data of arbitrary size and arbitrary content.  While BOSH may put requirements on how this field is formatted, ECM knows nothing about such requirements and does not enforce them.
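Putting the three fields together, a packet could be built and taken apart roughly as follows.  This is a sketch in Python rather than the planned nodejs, and it rests on assumptions the description does not settle: that the "|" characters are sent literally as one byte each, that the digits are ASCII, and that the length counts both fields and both separators.  The function names are hypothetical.

```python
from __future__ import annotations

LENGTH_DIGITS = 10
ID_DIGITS = 20
# Assumption: each "|" is a literal one-byte separator, giving a 32-byte header.
HEADER_SIZE = LENGTH_DIGITS + 1 + ID_DIGITS + 1


def encode_packet(connection_id: int, content: bytes) -> bytes:
    """Build <packet length>|<connection ID>|<content> as bytes."""
    total = HEADER_SIZE + len(content)
    if not 0 <= connection_id < 10 ** ID_DIGITS:
        raise ValueError("connection ID does not fit in 20 digits")
    if total >= 10 ** LENGTH_DIGITS:
        raise ValueError("packet does not fit in a 10-digit length")
    header = f"{total:0{LENGTH_DIGITS}d}|{connection_id:0{ID_DIGITS}d}|"
    return header.encode("ascii") + content


def decode_packet(packet: bytes) -> tuple[int, bytes]:
    """Return (connection ID, content) from one complete packet."""
    if len(packet) < HEADER_SIZE:
        raise ValueError("packet is shorter than its header")
    id_start = LENGTH_DIGITS + 1
    id_end = id_start + ID_DIGITS
    if packet[LENGTH_DIGITS:id_start] != b"|" or packet[id_end:HEADER_SIZE] != b"|":
        raise ValueError("missing field separator")
    if int(packet[:LENGTH_DIGITS].decode("ascii")) != len(packet):
        raise ValueError("length field does not match packet size")
    connection_id = int(packet[id_start:id_end].decode("ascii"))
    # The content is returned untouched; nothing here inspects or validates it.
    return connection_id, packet[HEADER_SIZE:]
```

Because both header fields have a fixed width, the content may itself contain "|" characters without confusing the parser.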

Here are the steps that I currently see as being required to complete this project:
  1. Create ECM
    1. Create the portion that deals with client requests
      1. Create code that reads in the HTTP POST requests.
      2. Create code that packages the request into a format that includes a connection number.
    2. Create the portion that deals with server responses
      1. Create code that accepts responses from the server.
      2. Create code that routes responses to the proper exchange.
  2. Create a new ejabberd module
    1. Create the portion that deals with requests
      1. Determine how to send a packet of data from ECM so that it looks like something coming from HTTP.
      2. Modify the existing BOSH module to "remember" the connection ID when responding.
    2. Create the portion that deals with responses
      1. Create code that formats the response into an ECM packet.
      2. Determine how the ejabberd BOSH module responds to requests.
      3. Modify the response process to use ECM instead of HTTP.
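The bookkeeping behind the ECM steps "includes a connection number" and "routes responses to the proper exchange" might look something like the sketch below.  It is Python rather than the planned nodejs, the class and method names are invented for illustration, and it assumes each exchange receives exactly one response, which the plan does not actually specify.

```python
from typing import Callable, Dict

# Hypothetical type: a callable that sends content back to one waiting client.
Responder = Callable[[bytes], None]


class ConnectionTable:
    """Maps connection IDs to the client exchanges waiting for a response."""

    def __init__(self) -> None:
        self._next_id = 1
        self._pending: Dict[int, Responder] = {}

    def register(self, responder: Responder) -> int:
        """Record a waiting exchange and return the ID to put in its packet."""
        connection_id = self._next_id
        self._next_id += 1
        self._pending[connection_id] = responder
        return connection_id

    def route(self, connection_id: int, content: bytes) -> bool:
        """Deliver a server response; return False if nobody is waiting for it."""
        # Assumption: one response per exchange, so the entry is removed here.
        responder = self._pending.pop(connection_id, None)
        if responder is None:
            return False
        responder(content)
        return True
```

A long-lived connection that receives many packets would keep its entry instead of popping it; which behaviour is right depends on how the modified BOSH module ends up responding.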
What's All This Then?
This page defines the overall plan to realize the Megaphone project.  On a more practical level, it identifies the next step to be taken in order to advance the project.

The purpose of this page is to keep the author on track with regard to what is going on and, hopefully, to goad him into taking action.