Monday 30 December 2013

Encore!

Few things are just plain addictive. Like writing blogs. Once you start enjoying it, there is no stopping you. Or as some would agree, doing open source projects! Or for that matter, improving upon previously done ones especially when we do get a chance to do that.

Till now we worked on the boa version 0.94.13 available on the official website, and having a second opportunity to correct the wrong, we decided to go with the development version, 0.94.14rc21. Not sure, if it is the right choice, but that isn't what we are exactly worried about right now!

Its good to be back!

Saturday 14 December 2013

Not just code but beyond

A project, merely taken up maybe as a part of a course or otherwise would be limited to just coding. Making the end product work. Just the end result that matters.
This can lead to a much contrived way of doing things.

A project, done with the aim of software engineering on the other hand would concentrate more on the SE principles and aspects such as group dynamics and documentation.

Having experienced all of that and more, wondering how an open source project would be, we stepped into the world of delights as we ventured to take up the course offered as one of our electives. In it, we were mainly required to work on an open source project and it was our shot at giving something back to the community.
And this is what bloomed out of the entire process. A revelation. A sense of achievement and still the fire to do more, much more.
An open source project in its truest sense gave us the liberty to think and opened our minds to newer dimensions.
Not just the end result, but the journey, the journey mattered more than anything else. The knowledge gained out of the entire process. Not only making us technically sound, but ensuring an all-round development.
Distance  not  Displacement.

A self reference: The bloag : boa log that formed into a full fledged blog.
Creating a Mailing list!
Trying to contact the developers and the ecstasy when we got a reply.
Submitting a patch
Learning so many more things in the journey.
Making a video!
Who would have done that otherwise!


Our very own: Mailing List

oh what fun is porting

Amidst the chaos of testing for performance, I almost forgot to talk about cross-compatibility. Till now, we discussed about epoll instead of select, but this applies only to Linux systems. Epoll wouldn't work on Mac! We needed to ensure platform independence at least to an extent.

Hence, we went forward with implementing kqueue so as to be compatible with the FreeBSD environment. And a rush of conditional directives! We had to make changes in the Makefile too, so as to enable conditional compilation. This was again done for ensuring cross platform working of boa.

epoll and kqueue are similar in many respects. To start with, both are event based and have a control mechanism to add/delete modify file descriptors and events.
Kqueue is much more abstracted and hence general.

After implementing kqueue, we were wondering about the performance comparison between epoll and kqueue. But, it was not exactly the right thing to do since we had different system configurations on our laptops. Performance comparisons are only valid when all the other parameters remain same.
Theoretically, kqueue would do better. In the paper given link to below, more on performance has been discussed.

While going through the implementation of boa, we came across this in a comment "oh what fun is porting" and we agree too!!


Few useful links:
  Paper : Comparing and Evaluating Mechanisms : Page 217
  Select, Epoll, Kqueue
  Epoll



More tests!

By principle, software engineering states that the amount of time dedicated to testing the product should be atleast 40% of the total timeline. Even though we had not planned to do so, we did practice this principle.

Bombarding the servers with 1000 connections concurrently was not as easy as it seemed to be. Having various restrictions on system limits, proved to be a deterrent. Changing the ulimit helped to an extent. But having tested on systems which were not supposed to be used as servers, led to complications.

The number of times our laptops hung and had to be restarted by the hard way is more than our fingers can count. This led us to conclude that we had to find another way of bench marking our server.
This was the effect of huge files on the system, but using smaller files did not exactly make sense, since there was no guarantee that the desired level of concurrency would be maintained. Using commands like netstat helped us to find out the actual number of connections active (ESTABLISHED).
Hence, we knew we had to use larger files so that they would take more time to download.


Over the course of the tests, ab spewed out a multitude of errors few of them being...
         apr_poll: timeout specified has expired
This error would pop up whenever the server took too long to respond. We couldn't find a way to increase the hard-coded timeout value of 30s.
When testing on a Mac, this error usually implied that the system had "crashed" and was no longer responding to any user action.
         apr_socket_recv: connection reset by peer
This error is caused by the server sending a RST signal. It occurred whenever the server did not have sufficient resources to process the client requests.


Large file, High concurrency - exactly the conditions we want, for a  test, but a bad recipe for our machines.
Other options being limited (subjected to the ulimit), we started looking at different directions. Also for ab, we could not find out a way to limit the rate of the request to be fetched as we could do in wget. Again, acting upon the advice of an expert in the field, we changed the mtu.
This made the time taken for a single request longer and hence ensured the level of concurrency to be maintained to an extent. And, one more parameter gets added!

Initially we were benchmarking by only varying the total number of requests and the number of concurrent connections. Now we had two more. One was the mtu and the other, the file size. Instead of taking a specific file of an average size, we decided to vary that too, to test the server.

So we continued our tests, hoping to see the light at the end of the tunnel.

Thursday 12 December 2013

Performance analysis


Now, let us look at things a bit more practically.
After all the expected reality is always different from the actual one! Even though one's definitions of reality itself may vary.

Lets look at the performance of boa. So we decided. Historically, boa has been bench marked with ab and zb. We decided to go ahead with the Apache Benchmark.

A simple command:
   ab -n 1000 -c 500 http://10.0.0.1:12345/file.txt
   ab -n <total number of requests> -c <concurrency level> http://<ip addr>:<port>/<location>
and lots of time, is what we required

Repeatedly performing the same on epoll implementation gave us quite surprising results. No, not surprising to the extent of select performing better, but both of them performing at about the same level.


               Time taken
Concurrency Level

epoll vs select

When the lines coincide so perfectly, you wonder if this was the result expected! But Alas! It was not. We expected epoll to perform better, not just better much better!

But theoretically, we did expect select and epoll to perform just about the same for a lesser number of concurrent connections. Upon consulting an expert for advice on these matters, we were told that a concurrency level of 250 is peanuts!

Planning to try out a concurrency of a 1000 clients we wondered how loaded our machines would be.
So we decided to find out!

Tuesday 10 December 2013

Great Expectations

Performance analysis

Relying on the competence of Apache Benchmark, we decided to run performance tests on our version of Boa implemented with epoll and the original one implemented with select.
Theoretically, we expected to observe the following:

  • With large number of active connections, select may perform better as the possibility of activity on an fd in the fd set is high. Hence traversing the fd set gives rise to a low "miss" rate.
  • Similarly, with low concurrency, select's bit comparison will yield better results than the looping structure and word comparison in epoll.
  • A better performance of epoll arises when there are a large number of inactive connections. This is a scenario that is difficult to simulate.
  • Only a hard-limit (depending on the value specified by RLIMIT_NOFILE) persists for the maximum number of concurrent connections for epoll. For select, it is 1024. Although, in select, some workarounds are possible to change this maximum value by manipulating FD_SETSIZE (system dependent)
  • Within select and epoll, increasing the number of connections while keeping concurrency constant should not yield a varying value. The performance is expected to remain constant.
With this frame of reference in mind, we proceed to perform the tests.
Wish us Luck!

Monday 9 December 2013

Debugging macros


Revisiting C was another by product of working on boa.
It was not that there were complex constructs which we had difficulty comprehending but it was while implementing simple ones did we realize once again the beauty of C.

Especially during playing along with macros!

Macros being really simple in their definition and working, are at the same time most difficult to debug. That's what we felt, when we overlooked the simple macros when we got few errors.
Not compilation ones, they are the simplest to fix!

Upon adding our code and attempting to run boa for the second time, it ran and accepted requests.
Not a bad thing at all or on second thoughts, was it? This was better than the previous time where it would just exit upon attempting to connect.

Enter GDB.
THE most helpful tool. It works wonders. No need of adding extra printf statements everywhere just to check where exactly the program goes wrong.

And guess what?! The point to be pinpointed was, in a macro. A very simple and silly mistake. Perhaps it required a fresher mind to figure out.
Macros perform simple code substitution.
That's what we did. But a simple mapping doesn't always work out.

Simple mapping from the return value of a function call to a value to be set inside a macro.

#define EPOLL_ISSET(args..) {\
int i=0;\
for(i=0; i<count; ++i) \
  if(condition) {\
     return_val = 1; \
     break; \
} \
return_val = 0;\   //always sets return val to 0
}


#define EPOLL_ISSET(args..) {\
int i=0;\
return_val = 0;\   //SHOULD BE HERE
for(i=0; i<count; ++i) \
  if(condition) {\
     return_val = 1; \
     break; \
} \
}

This is what happens when mindless mapping is done. This was between a return statement returning a value from the function and the setting of a variable inside a macro!
The first thought was to write a function for the same. But it was avoided considering the manner in which boa is structured as such. An overhead avoided!

Time saved is time gained.

GDB With macros


Friday 29 November 2013

The Boa FSM

     We began the solemn and long awaited task of our crusade through Boa, considering it as a finite state machine (FSM). The following is our representation of the aforementioned FSM. As you can notice, it seems pretty self-explanatory once spread out and separated from the distractions of function calls and a multitude of code statements. We do hope there are no errors in our interpretation.

The FSM.
Any repeated state should be considered as a single state. It was done for the sake of neatness.
Following the FSM is the function call graph. It depicts the function that is called when a major state change occurs. This is an overly simplistic graph solely for the purpose of providing an association between the FSM and the actual code.

The function call graph

Monday 4 November 2013

Head on with select


And we attack select with epoll. Or so we thought.

Having written basic sever programs with select and epoll, we could figure out there was some sort of a correspondence between the two. For instance the select( ) call was to be replace by epoll_wait( ) and so on. Select definitely did not have an equivalent for all the other calls and the variables required for epoll mechanism but we could generalize.

So when the task of converting the approach of the web server to an epoll based one, we naturally assumed that all we had to do was a mapping. We did not expect it to be simple though, but we made a mistake, a huge one - of ASSUMING. One of our professors always told us what-ever we do, we definitely should not assume.

And this led to few difficulties.


We has initially thought we could use the information of the vague sort of mapping that existed between the two server programs written using select and epoll approaches.
But that was absolutely not true!
As all things are relative, so are programs!
The very essence of boa being the select approach to web server architecture, it had its entire code centered around the setting and "unsetting" of various fds.

FD FD everywhere, not a FD to set, FD FD everywhere, not a FD to clear!

As we know, with epoll we do not fiddle with file descriptors at all! Events, just pure events! True, we cannot call them pure, since epoll is not completely based on the pure event based structure. We shall not digress here about epoll though. Will write about it some other day perhaps.

With no fds in hand, and just a bunch of events, the entire idea of some sort of a mapping went for a toss!

With select, the entire game involves playing with the fd sets. In epoll, we do not have such a direct handle of the fds. On top of that, a single event set in place of two different versions of write fd-set and read fd-set led to further considerations to be looked into, while trying to map the same.

Monday 28 October 2013

Few Clarifications

Following in from the design issue discussed earlier, as we realized we could not completely replace select with epoll, there had to be macros involved.
Fallback options : select - to make sure in the absence of epoll, boa could work with the original configuration.
And hence we considered the option of checking the type of the system, which was just a check on the predefined macros __linux and __APPLE__.

But was this check enough?
Linux has inbuilt epoll support from kernel version 2.5.44.
Is it enough to check the kind of system that is and assume that the feature is supported?
I was not sure. Hence, the double check. Found a piece of code which claimed to check for epoll support across systems [1]. Considered to use it in the epoll implementation to make a choice.


Another realization that came our way was that we needn't change the request structure itself, instead build the code around it. The request structure could remain the same way and function as it always did.

Also, as mentioned in the comment and the previous blog post [2] we do not have to consider altering the setting of the max connections as it is set to RLIMIT_NOFILE. Even though epoll may not have a limit on the maximum number of clients connected, but system limits have to be always accounted for. Also, setting the value to RLIMIT_INFINITY could cause some system dependent issues.


[1] http://www.gnu.org/software/autoconf-archive/ax_have_epoll.html
[2] http://a-voyage-into-the-land-of-boa.blogspot.in/2013/10/first-glance-at-code.html

Monday 21 October 2013

Choosing a Mode of Execution: A Design Issue

An important aspect to examine prior to diving into the actual implementation is the design issue regarding choice of modes among select, epoll and kqueue.

Several possible solutions exist. We shall consider ones that appear most relevant.

Case 1: 
Replacing select in its entirety with either epoll or kqueue
As a convenience to the user, it makes sense to take the responsibility of choosing the path away from the user; make it in-built. 
Since the choice is only among 2 options, usage of macros would be a convenient solution.[1]
That's pretty nifty.

As our primary aim is to provide an enhancement to the web server, augmenting extensibility features will be an added advantage. The code can be made generic, adapting a common structure to handle requests through either approach. The function activated can depend on the macro chosen in as per the approach declared previously.

struct kevent {
             uintptr_t       ident;          /* identifier for this event */
             int16_t         filter;         /* filter for event */
             uint16_t        flags;          /* general flags */
             uint32_t        fflags;         /* filter-specific flags */
             intptr_t        data;           /* filter-specific data */
             void            *udata;         /* opaque user data identifier */
     };

typedef union epoll_data {
               void    *ptr;
               int      fd;
               uint32_t u32;
               uint64_t u64;
} epoll_data_t;

struct epoll_event {
               uint32_t     events;    /* Epoll events */
               epoll_data_t data;      /* User data variable */
};

We have provided the corresponding kqueue and epoll structures for reference.

            
Case 2: 
Keeping select implementation, and having a choice between select and epoll/kqueue

This approach is an extension of the above. The user has a choice to either use select or an event mechanism (ie epoll/kqeueue. This choice will be made inherently depending on the type of the system)

Code modules have a design potential.[2] But they might be an overkill with respect to BOA. 

            
Case 3: 
The choice among select, epoll and kqueue is left to the user.

You might wonder whether this is really advisable. After all, the user must not be bothered with implementation technicality. But, in the future, there may arise an operating system that provides libraries containing both epoll and kqueue.[3] Perhaps a user extremely particular about performance with the knowledge of the type of traffic expected will also find this useful.

In the actual code, a three-way switch will comfortably handle this circumstance.

Note regarding kFreeBSD: (see link [3])
Apparently you can run both Linux and FreeBSD binaries on it. It supports calls up to Linux version 2.6.18. But, using macros here will result in only epoll being used. (we check for __APPLE__ and not FreeBSD. This operating system will fall under __UNIX__ flag.)


The situation appears thus, having articulated possible solutions. We do hope for your valuable suggestions or additions. They would be most helpful in making an appropriate choice.



Monday 14 October 2013

Determining what to Change


God grant me the serenity to accept the things I cannot change, the courage to change the things I can, and the wisdom to know the difference.

This wisdom is generally what we may lack at the crucial time when the code is handled for changing. We change things that might have had other impacts, dependencies and as a result of which our entire effort might just have to be redone.

Following in from the previous brief discussion of the various function calls made, we can clearly distinguish between the two groups.

The major goal of the entire project being the transition from the use of select to efficiently make use of the epoll mechanism, the majority of the work involved in playing with socket based calls. It is clear that no initialization functions need any sort of tweaks.

The file or the function that has to be majorly changed is of course select_loop( ) itself!
The file select.c contains the major logic of the select statement which needs to be adapted for epoll.
Also, the processing is done in the form of queues. Three different types of queues are used for processing the requests. These do not require any major changes but for the dependence on their use of the structure request.

Examining dependencies further, we come across a structure - struct request which stores the incoming request details and is further used for processing the actual request. Defined in the globals.h file, it is now marked for change to incorporate the epoll based event structure.

A deluge of pointers awaits as we explore the files request.c and response.c. Mainly used for processing the request structure, they become potential candidates and so does buffer.c which contains functions which help in buffering the data before sending it to the client. To he file, get.c performs a part to get the headers of the incoming request. Reading and writing involves the use of pipes and hence the file, pipe.c is another contender. Inspecting further, it is found that the file cgi.c may need minor tweaks for it uses the file descriptor from the request structure.

Here, we have now enumerated the ones that possibly need to be changed. Now let us look at them in detail so as to understand them better.

Sunday 6 October 2013

First glance at the code

Basically written in C, the code is a simplistic one, with not many surprises(yet).

It is always a good thing to begin from the start, and hence we started looking at the file boa.c. Yes, it is practically the first one to get executed (or rather the binary of it that is).

A couple of interesting observations can be made about it looking at its organization. The file itself does not give any deep insights into the code as such, but describes the structure to the execution.

Following a top-down approach, a few static functions were found, code to set various permissions, making stdin and stdout point to dev/null and the usual parsing of command line arguments followed.

What interested me here is the setting of max_connections to RLIMIT_NOFILE which is one of the limitations we shall improve upon by not using select. Possibly this was the first line of code we knew will be altered when it was spotted.

Glimpses of few of the functions..

There are calls to several functions which basically performed the initial setup the server required to run. The foremost of them being fixup_server_root() making sure the root given in the defines.h file or the one entered via command line (-c option) is valid.
The function, read_config_files, reads the configuration file! Well what else could we expect it to do!
The configuration file must be located in the location folder marked as server root.

Digressing here a little about the config file, it has a flat hierarchy, making it all the more easy to read, understand and use effectively. To mention a few of the directives, the port number and the IP address to listen on are most commonly used. Others specify the file locations of the access and error logs. Document root is another directive that one would want to set.
The file itself is parsed with the use of lex/yacc or a similar generated parser. The function, after reading the config file, sets the appropriate variables as given in the boa.conf file. The structure to store the configuration parameters is in its entirety a marvel. All the possible config parameters are stored in 4 value tuples - Name, Type of the config entry, Boolean value that indicates the presence/absence of the entry, Value of the entry if any. The type of the config entry, being S1A, S0A, or S2A determines the number of arguments it can have.

Returning back to our file, boa.c we have several more functions to be called.
The call open_logs() placed right after the parsing of the config file, checks the values set by the previous function and opens the log files accordingly.
The server socket is created and the initial parameters set by the function create_server_socket(). The socket is bound to the appropriate server family depending on the value given in the configuration file.
As the name suggests, init_signals does the task of setting up various signal handlers. It is then checked if the server is running as root, and if not, a log entry in the error_log appears as a warning.
The function create_common_env() sets up environment variables common to CGI scripts.
To make sure the difference in the implementations wont affect how it works, the function build_needs_escape() escapes characters based on the bit positions available in unsigned long.

Only after all this initial setting up, it forks itself to push into the background if the command line option '-d' is not toggled on which tells the program to not fork.
Immediately after it forks, the timestamp is logged in.

The main loop function: select_loop() which takes the server socket as parameter is finally called. And this just marks the beginning of the voyage.

Sunday 22 September 2013

The First step to Change is Understanding

To enhance anything, we first need to completely understand it. Inside out.
Only then can we make sure that the enhancements augment the preexisting model perfectly.

Our first encounter with boa was in the classroom, although a brief one, it implored us to explore..

To bring about a change, we also need to know why it is imperative to do so and hence understand the shortcomings of the current paradigm.
Having already learnt about the different approaches to building a scalable web server, we concluded that boa, being implemented using the select approach, did have some limitations inherent to the model itself.

The select( ) system call, is not entirely scalable, which is as a consequence of the limit imposed on the number of clients - indirectly by the size of the fd_set, which generally is 1024.
It takes a state based approach that makes the kernel maintain the entire set of file descriptors.
To determine the file descriptor of interest (file descriptor on which data has been received / the one to which data has to be written to), this set has to be scanned in its entirety. This is time consuming and inefficient as the result sets are mostly sparse.

The epoll( ) system call in Linux and kqueue( ) in BSD can be used for overcoming the aforementioned overhead.
The epoll mechanism essentially lies in between that of an event based approach and a state based one. It does not set any limit on the maximum number of connections. It is also considered better in terms of performance as compared to the select based approach.
The major performance enhancement comes from the fact that the epoll system call relies on the arrival of events on particular file descriptors and does not scan the entire set of fds linearly. It also supports two different modes of operations: edge triggered and level triggered.
The kqueue approach, though is very similar to that of epoll, provides a further increase in performance. It allows multiple updates of the specified set in a single system call.

Having looked at into why something has to be changed, we decided to look into what exactly had to be changed and most importantly, things that have to be left unchanged because - Change for the sake of change is not always a good idea!

Sunday 15 September 2013

Background check

Before deep diving into the server and its architecture, I decided to first look at existing enhancements or latest changes made to it. Along with that, the timeline as I could figure out was pretty interesting.

First look at the boa website [1] lead me jumping to a premature conclusion that the server had faded away in the background as long back as in 2005.
The first official release of boa was in 1996 and after that it did have many revisions and versions, the latest being in 2005.

But what happened after that? Did it lose its importance in the computing world as other better servers came up and Google gave fewer results of boa, the web server when the phrase 'boa' was entered..?
At least it still did appear in the top 5 alternatives to the Apache web server if that was any consolation.
The web exploration of boa the web server was made even more tougher with other popular services being named boa, thankfully they were not web servers!

But boa, being the web server it is, has not lost its relevance. Having low memory requirements and working wonderfully on smaller machines, boa is still operational.
Here are a few pretty recent papers to prove this fact :
i. Design and development of embedded web server based on Arm9 and Linux [2]
ii. The Transplantation of BOA [3]

The papers describe how boa can be transplanted to embedded devices to perform tasks such as network monitoring and for a variety of intelligent appliances based applications. The prospects of an embedded web server are now wider is what the papers also seem to claim.

It is gratifying to know that boa is taken into due notice for various applications and also is one of the foremost contenders for embedded applications.

There is another thing I came across more recently. We are all aware of the hype surrounding any android application. I was elated to find out that the humble web server had an equivalent app for it too [4].

This made me finally conclude that boa was definitely in the game and after all not a wrong choice to spend some time on!


[1]Official web site http://www.boa.org/

Boa is not dead, not yet!

[2] Design and development of embedded web server based on Arm9 and Linux
[3] The Transplantation of BOA Based on Linux3.0.1 and S3C6410
[4] Android app

The beginning: Boa Log (Bloag)

Boa...
The first time I heard of it, least did I know it would get associated in a manner that the server would continue dishing out pages to me forever.

When I initially learnt about boa, along with a couple of other web servers, the thing that fascinated me was the size of the code. It reminded me of 'Small size, big impact' things.

Whenever software such as a web server is discussed, we do not imagine it to be less than 10k lines. But boa, is different. It is sleek and leaves behind a small footprint.

Its limited functionality might not have made it huge in the market, but it is good at what it was actually made for.
Being lightweight, it is extremely useful for embedded applications. It is a tiny web server that also offers high performance.