What is Megaphone?

The Megaphone project is about enhancing open source chat software. Specifically, the goal is to allow ejabberd to support 1,000,000 simultaneous users. See The Plan page for more details on how I plan to solve this problem. See the About this Blog page for more details on why I created this blog.

Monday, April 9, 2012

It's Been Fun

Previously...
  • IT ACTUALLY WORKED!
  • I talked a bit about gen_tcp and active mode.
  • Initial results were very disappointing.

I ran across an issue where ECM would only deliver one packet per blob of data given to it.  This works fine if the data blob only contains one megaphone packet, but when the server gets busy, more than one packet can be contained in each blob.  I fixed this problem: ECM now keeps sending out data until there is nothing left to deliver.
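As a rough sketch of the "keep going until nothing is left" idea, here is one way to pull every complete packet out of a blob.  The framing is an assumption made purely for illustration (a 4-byte big-endian length followed by the payload); it is not megaphone's actual wire format, and drain_sketch is a hypothetical module name.  The same idea applies whatever language ECM is written in.

```erlang
-module(drain_sketch).
-export([drain/1]).

%% Assumed framing (hypothetical, not megaphone's real format):
%% each packet is a 4-byte big-endian length followed by that many bytes.
%% Returns {Packets, Rest}: every complete packet in the blob, in order,
%% plus any trailing partial packet to keep until more data arrives.
drain(Blob) ->
    drain(Blob, []).

%% A complete packet is at the front: take it and keep going.
drain(<<Len:32, Packet:Len/binary, Rest/binary>>, Acc) ->
    drain(Rest, [Packet | Acc]);
%% Nothing complete is left: stop and hand back the remainder.
drain(Rest, Acc) ->
    {lists:reverse(Acc), Rest}.
```

The single-packet bug corresponds to calling the first clause once and dropping Rest; the recursion is what delivers the second and later packets in a busy blob.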

In addition, megaphone now sends the HTTP status code to ECM along with the rest of the data.  ECM, in turn, uses that code when it sends a response to a client.

There is still an issue where connections get dropped during high load periods.  In addition, once the error state is reached, I cannot seem to log on using JWChat.  All this leads me to believe there is some problem lurking in ECM.

At this point, I am going to try contacting some people to see if there is an easy solution (HA!) to ejabberd's memory consumption.  Unless ejabberd's memory use can be curtailed, I don't see a lot of value in megaphone --- yes it does bring some value, but not enough to accomplish the goal that I have set (being able to host 1,000,000 users).

In any case, I will put the code for megaphone/ECM up on github in the next few days.  I may also collect some of the insights I have gained during this process.  

Next time: code and insights?

Friday, April 6, 2012

This is Not the Result You Are Looking for

Previously...
  • I changed the pass-through from single to multi-threaded.
  • IT ACTUALLY WORKED!
  • I talked a bit about gen_tcp and active mode.

I did some testing of erlang, ejabberd and megaphone.  I have to say, the results were not what I was hoping for.  

On its own, erlang consumes around 2kB of memory per process and 10kB of memory per TCP connection.  ejabberd uses around 100kB per connection.  This is bad news for megaphone because I was hoping to save a lot of memory by shrinking the number of connections down to one, but if the TCP connection only accounts for 10% of the total memory usage per connection, then megaphone is not going to enable significantly larger numbers of connections per server.

Initial testing with megaphone bears this out...unfortunately.  When using megaphone, ejabberd still uses about 100kB per user.  At 1,000,000 users, while megaphone might save 10GB, if the server still needs roughly 90GB, it's not going to make much difference.  
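The arithmetic behind those figures, as a small sketch.  It uses the rounded per-connection numbers quoted above and decimal units (1 GB = 1,000,000 kB); mem_estimate is a hypothetical module name.

```erlang
-module(mem_estimate).
-export([estimate_gb/2]).

%% Rough total in GB (decimal: 1 GB = 1,000,000 kB) for a number of
%% users at a given per-user cost in kB.
estimate_gb(Users, KbPerUser) ->
    Users * KbPerUser / 1000000.
```

mem_estimate:estimate_gb(1000000, 100) gives 100.0 for ejabberd's total, and mem_estimate:estimate_gb(1000000, 10) gives 10.0 for the part attributable to TCP connections, which leaves roughly 90 GB untouched.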

Another issue is that, during testing, ejabberd issued a number of 404 result codes in addition to 200's.  The headers returned appeared to be the same as for 200 messages, but the message body was empty in the case of a 404 --- as would be expected.  Currently, the megaphone protocol does not return a status code --- if things are to move forward, then this will have to be changed.

Next time: what to do.

Thursday, April 5, 2012

gen_tcp and active mode

Previously...
  • I was able to send and receive messages.
  • I changed the pass-through from single to multi-threaded.
  • IT ACTUALLY WORKED!

One of the things I came across while developing megaphone and working with erlang has been the notion of "active mode" when dealing with TCP sockets.  When using gen_tcp, you can open a socket in active mode or passive mode.  

In active mode, when a packet of data arrives at the socket, the process associated with the socket gets an erlang message.  In passive mode, the process must ask for data via gen_tcp:recv.  
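A minimal sketch of the two modes side by side, using a loopback connection inside a single process.  The module name mode_sketch and the choice of port 0 (let the OS pick a free port) are just for the demo.

```erlang
-module(mode_sketch).
-export([demo/0]).

demo() ->
    %% Port 0 asks the OS for any free port; loopback only.
    {ok, Listen} = gen_tcp:listen(0, [binary, {active, false}, {ip, {127,0,0,1}}]),
    {ok, Port} = inet:port(Listen),
    {ok, Client} = gen_tcp:connect({127,0,0,1}, Port, [binary, {active, false}]),
    {ok, Server} = gen_tcp:accept(Listen),

    %% Passive mode: nothing is delivered until we explicitly ask for it.
    ok = gen_tcp:send(Client, <<"one">>),
    {ok, Passive} = gen_tcp:recv(Server, 3, 1000),

    %% Active mode: the owning process gets a {tcp, Socket, Data} message.
    ok = inet:setopts(Server, [{active, true}]),
    ok = gen_tcp:send(Client, <<"two">>),
    Active = receive {tcp, Server, Data} -> Data after 1000 -> timeout end,

    ok = gen_tcp:close(Client),
    ok = gen_tcp:close(Server),
    ok = gen_tcp:close(Listen),
    {Passive, Active}.
```

The first value comes back from an explicit gen_tcp:recv call; the second arrives in the mailbox without the process asking for it.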

In a language like C or Java one would always use what is more or less passive mode: you perform a read to get the next block of data.  Coming from that background I had trouble understanding what the rationale behind active mode was.  I think now I understand a little better.

Languages like C and Java do not have the notion of message passing that erlang does, hence active mode does not make as much sense.  With erlang, I can use active mode and create one process (thread) that handles both sending and receiving data:

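loop(Socket) ->
    receive
        {tcp, Socket, Data} ->
            %% Data arrived on the socket: do something with it...
            handle_data(Data);   % handle_data/1 is a placeholder
        {write, Data} ->
            %% Another process asked us to send Data.
            gen_tcp:send(Socket, Data)
    end,
    loop(Socket).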

Note that the "write" message would have to be sent from another process.
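To make that concrete, here is a runnable variation on the loop above, driven from a second process over a loopback connection.  It is only a sketch: the Parent argument, the {got, Data} message and the {tcp_closed, Socket} clause are additions for the demo, and loop_sketch is a hypothetical module name.

```erlang
-module(loop_sketch).
-export([demo/0]).

%% Same shape as the loop above, except that incoming data is forwarded
%% to Parent and the loop stops when the peer closes the connection.
loop(Socket, Parent) ->
    receive
        {tcp, Socket, Data} ->
            Parent ! {got, Data},
            loop(Socket, Parent);
        {write, Data} ->
            ok = gen_tcp:send(Socket, Data),
            loop(Socket, Parent);
        {tcp_closed, Socket} ->
            ok
    end.

demo() ->
    {ok, Listen} = gen_tcp:listen(0, [binary, {active, true}, {ip, {127,0,0,1}}]),
    {ok, Port} = inet:port(Listen),
    Parent = self(),
    %% The loop process accepts the connection itself, so it owns the
    %% socket and receives the active-mode messages.
    Pid = spawn(fun() ->
                    {ok, Socket} = gen_tcp:accept(Listen),
                    loop(Socket, Parent)
                end),
    {ok, Peer} = gen_tcp:connect({127,0,0,1}, Port, [binary, {active, false}]),

    %% Outgoing: another process (this one) asks the loop to write.
    Pid ! {write, <<"hello">>},
    {ok, Sent} = gen_tcp:recv(Peer, 5, 1000),

    %% Incoming: data from the peer reaches the loop as a message.
    ok = gen_tcp:send(Peer, <<"world">>),
    Received = receive {got, D} -> D after 1000 -> timeout end,

    ok = gen_tcp:close(Peer),
    ok = gen_tcp:close(Listen),
    {Sent, Received}.
```

One process handles both directions here: the write request and the incoming data are simply two different messages in the same mailbox.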

I find this a more natural way of handling socket I/O than the C/Java approaches that I've used.  

Next time: some results from testing with ejabberd.

Wednesday, April 4, 2012

Communicating

Previously...
  • I changed the format exchanged by ECM and megaphone.
  • I was able to send and receive messages.
  • I changed the pass-through from single to multi-threaded.

I found and resolved the problem that was preventing the system from working: the code was prepending the last character of the previous message to the front of the next message.  After that change everything magically worked.

IT ACTUALLY WORKED!

I was able to send messages without waiting for an incoming message, and I could connect several times through the same port.  It was beautiful.

The actual code for doing this is depressingly small.  I attribute this to
  1. My incredible programming skill
  2. (mostly) The task wasn't that complex
A lot of time got used up trying to understand what was going on in ejabberd and becoming more familiar with erlang and nodejs.

At any rate...

IT ACTUALLY WORKED!

(muhahahahaha!)

Next time: some interesting observations on active vs. passive TCP modes in erlang.


Tuesday, April 3, 2012

One Stop Processes

Previously...
  • I got Pidgin to work with the pass-through.
  • I changed the format exchanged by ECM and megaphone.
  • I was able to send and receive messages.

In my last installment I mentioned a problem where I was unable to send messages from the client unless it was receiving a message from someone else.  It turns out that the problem was related to the inherent design of the old version of the code.

The pass-through only had a single process (erlang's version of a thread) to send and receive messages.  What's more, a client sends a POST to the server and then when the server has some data, the server responds with the content of the data in the body of the response.  That's not quite how it works, but close enough.

The problem was that, while the single thread was blocked waiting for data from the server, new data from the client (for example, when a client tried to send a message) had to wait until either the server had something for the client or a timeout took place.

The multi-client version of the code deals with this issue by having separate threads to send and receive data over the TCP connection.  After a bit of head-banging I figured out the problem.

In a continuation of the whole deja-vu thing, I ran into the problem where the process was not the owner of the socket and was therefore getting socket closed errors.  I still don't understand why I was getting this because I was using code along the lines of:

Pid = spawn(SomeModule, some_function, SomeArgs),
ok = gen_tcp:controlling_process(Socket, Pid)

That should avoid the whole mess, but I got the error anyways.  I replaced this with:

gen_tcp:controlling_process(Socket, self())

I ran this from inside the process instead of from the process that spawned it, and that seemed to clear things up.
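It is not clear from the above what went wrong in the original code.  For reference, gen_tcp:controlling_process/2 is documented to return {error, not_owner} unless it is called by the socket's current owner, and a freshly spawned process can start running before the handoff has happened.  One pattern that sidesteps both points is sketched below: the current owner hands the socket over, and the child waits for a go-ahead before touching it.  The socket_ready and child_read messages and the module name handoff_sketch are made up for the example.

```erlang
-module(handoff_sketch).
-export([demo/0]).

demo() ->
    {ok, Listen} = gen_tcp:listen(0, [binary, {active, false}, {ip, {127,0,0,1}}]),
    {ok, Port} = inet:port(Listen),
    {ok, Peer} = gen_tcp:connect({127,0,0,1}, Port, [binary, {active, false}]),
    {ok, Socket} = gen_tcp:accept(Listen),
    Parent = self(),
    %% The child does not touch the socket until it is told it owns it.
    Pid = spawn(fun() ->
                    receive socket_ready -> ok end,
                    {ok, Data} = gen_tcp:recv(Socket, 4, 1000),
                    Parent ! {child_read, Data}
                end),
    %% Called by the current owner (this process), which accepted the socket.
    ok = gen_tcp:controlling_process(Socket, Pid),
    Pid ! socket_ready,
    ok = gen_tcp:send(Peer, <<"ping">>),
    Result = receive {child_read, D} -> D after 2000 -> timeout end,
    ok = gen_tcp:close(Peer),
    ok = gen_tcp:close(Listen),
    Result.
```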

Next time: more results from the new, multi-threaded pass-through.

Monday, April 2, 2012

Over 10 Messages Delivered

Previously...
  • I made progress with a simple BOSH pass-through.
  • I got Pidgin to work with the pass-through.
  • I changed the format exchanged by ECM and megaphone.

After a bit of work, I am able to connect to ejabberd using Pidgin and megaphone.  To avoid some problems I am having with some plugins that I use with Pidgin, I switched over to JWChat, a JavaScript client for BOSH systems.

JWChat is an awesome BOSH client.  What's more, it's quite fast when coupled with ejabberd.  Shout out to Stefan Strigler: you did a great job with the development of this thing.

I am able to send and receive messages now, but there are some quirks.  While I can receive messages to a client connected to megaphone, messages sent from such a client seem to "stick" and avoid being delivered until the next message for the client is received.  I thought this might be caused by using the default value for TCP's "no delay" option (default value is false), but the problem persists when that option is set to true.  

An additional annoyance is that the socket that listens for new megaphone connections goes into a "TIME_WAIT" state if I restart ejabberd, necessitating a pause of 60 seconds or so between cycles.  I tried using gen_tcp:close but that does not seem to help.
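An option commonly used for this kind of rebind delay is {reuseaddr, true} on the listening socket.  Whether it cures the behavior seen here is untested; the sketch below only shows where the option goes and that the same port can be bound again straight after a close.  With no lingering connections that rebind would likely succeed even without the option, so treat it as a usage example rather than proof.  The module name listen_sketch is hypothetical.

```erlang
-module(listen_sketch).
-export([demo/0]).

demo() ->
    Opts = [binary, {active, false}, {ip, {127,0,0,1}},
            {reuseaddr, true}],   %% allow rebinding the local address
    {ok, Listen1} = gen_tcp:listen(0, Opts),
    {ok, Port} = inet:port(Listen1),
    ok = gen_tcp:close(Listen1),
    %% Bind the same port again straight away, as a restart would.
    {ok, Listen2} = gen_tcp:listen(Port, Opts),
    {ok, SamePort} = inet:port(Listen2),
    ok = gen_tcp:close(Listen2),
    Port =:= SamePort.
```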

Next time: (hopefully) progress on the send/receive message front.