Sunday 6 October 2013

First glance at the code

Written in C, the code is simple, with few surprises (yet).

It is always a good idea to begin at the beginning, so we started by looking at the file boa.c. Yes, it is practically the first file to be executed (or rather, the code compiled from it is).

A couple of interesting observations can be made from its organization. The file itself does not give any deep insight into the code as such, but it lays out the structure of the execution.

Following a top-down approach, we found a few static functions, then code that sets various permissions, makes stdin and stdout point to /dev/null, and finally the usual parsing of command line arguments.
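
The /dev/null redirection and argument parsing described above can be sketched roughly as follows. This is a minimal sketch assuming POSIX open(), dup2() and getopt(); the -c and -d options are the ones this post mentions, while the struct boa_options type and both function names are hypothetical, not taken from boa.c.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical holder for the command-line settings. */
struct boa_options {
    char *server_root;  /* -c <dir>: overrides the compiled-in default */
    int do_fork;        /* cleared by -d: stay in the foreground */
};

/* Point stdin and stdout at /dev/null. Returns 0 on success, -1 on error. */
int redirect_std_to_devnull(void)
{
    int fd = open("/dev/null", O_RDWR);

    if (fd == -1)
        return -1;
    if (dup2(fd, STDIN_FILENO) == -1 || dup2(fd, STDOUT_FILENO) == -1) {
        close(fd);
        return -1;
    }
    /* The duplicates keep /dev/null open; only close fd if it is not
     * itself one of the standard descriptors. */
    if (fd > STDERR_FILENO)
        close(fd);
    return 0;
}

/* Parse -c <dir> and -d. Returns 0 on success, -1 on a bad option. */
int parse_args(int argc, char **argv, struct boa_options *opt)
{
    int c;

    opt->server_root = NULL;
    opt->do_fork = 1;
    while ((c = getopt(argc, argv, "c:d")) != -1) {
        switch (c) {
        case 'c':
            free(opt->server_root);
            opt->server_root = strdup(optarg);
            if (opt->server_root == NULL)
                return -1;
            break;
        case 'd':
            opt->do_fork = 0;
            break;
        default:
            fprintf(stderr, "Usage: %s [-c serverroot] [-d]\n", argv[0]);
            return -1;
        }
    }
    return 0;
}
```

Doing the redirection before anything else means a stray read from stdin or write to stdout later on cannot touch the terminal the server was started from.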

What interested me here is that max_connections is set from RLIMIT_NOFILE, the limit on the number of open file descriptors. This is one of the limitations we intend to improve upon by not using select. It was possibly the first line of code we knew would be altered, the moment we spotted it.
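
For reference, reading RLIMIT_NOFILE looks roughly like the sketch below, assuming the POSIX getrlimit() interface. The function name and the rule of preferring an explicitly configured value are illustrative, and the RLIM_INFINITY branch (what a limit of 'unlimited' would report on systems that permit it) returns a hypothetical fallback value.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

#define FALLBACK_MAX_CONNECTIONS 1024L  /* hypothetical value for "unlimited" */

/* Return the configured limit if positive, otherwise the soft limit on
 * open file descriptors. Returns -1 if getrlimit() fails. */
long default_max_connections(long configured)
{
    struct rlimit rl;

    if (configured >= 1)
        return configured;

    if (getrlimit(RLIMIT_NOFILE, &rl) == -1) {
        perror("getrlimit");
        return -1;
    }
    /* rlim_cur is the soft limit (what "ulimit -n" shows); rlim_max is the
     * hard ceiling the soft limit may be raised to. */
    if (rl.rlim_cur == RLIM_INFINITY)
        return FALLBACK_MAX_CONNECTIONS;
    return (long) rl.rlim_cur;
}
```

Since every connection holds at least one descriptor, the soft limit is a natural upper bound on simultaneous connections; the log files and the listening socket use a few descriptors of their own.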

Glimpses of a few of the functions

There are calls to several functions that perform the initial setup the server needs to run. The first of them is fixup_server_root(), which makes sure that the server root, either the one given in the defines.h file or the one entered on the command line (-c option), is valid.
The function read_config_files() reads the configuration file. Well, what else could we expect it to do!
The configuration file must be located in the folder designated as the server root.
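
A check in the spirit of fixup_server_root() can be sketched with realpath() and stat(), assuming the goal is an absolute path to an existing directory. The function name is hypothetical, and the real function may do more, for example changing into the directory with chdir() so that a relative config file name resolves against the server root.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Resolve a server root to an absolute, canonical path and make sure it is
 * a directory. Returns a malloc'd string, or NULL if it is unusable. */
char *resolve_server_root(const char *root)
{
    struct stat st;
    char *abs;

    if (root == NULL || *root == '\0')
        return NULL;
    abs = realpath(root, NULL);     /* POSIX.1-2008: allocates the result */
    if (abs == NULL)
        return NULL;                /* missing path, permission problem, ... */
    if (stat(abs, &st) == -1 || !S_ISDIR(st.st_mode)) {
        free(abs);
        return NULL;                /* exists, but is not a directory */
    }
    return abs;
}
```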

Digressing a little about the config file: it has a flat hierarchy, which makes it all the easier to read, understand and use effectively. Among the directives, the port number and the IP address to listen on are the most commonly used. Others specify the locations of the access and error logs. The document root is another directive one would want to set.
The file itself is parsed by a lex/yacc-generated parser (or something similar). After reading the config file, the function sets the corresponding variables to the values given in boa.conf. The structure that stores the configuration parameters is a marvel in its own right: every possible config parameter is stored as a four-value tuple of the name, the type of the config entry, a Boolean indicating whether the entry is present, and the value of the entry, if any. The type (S0A, S1A or S2A) determines how many arguments the entry takes (zero, one or two, as the names suggest).
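
To make the tuple idea concrete, here is a minimal sketch of a table-driven parser for a single config line. It follows the four fields described above, but it swaps the lex/yacc grammar for one sscanf() call, and the table holds only an illustrative subset of directives; config_entry, config_table and parse_config_line are hypothetical names.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define S0A 0   /* directive takes no argument   */
#define S1A 1   /* directive takes one argument  */
#define S2A 2   /* directive takes two arguments */

#define MAX_ARG 256

/* One entry per directive: name, type, presence flag, value. */
struct config_entry {
    const char *name;
    int type;                   /* S0A, S1A or S2A */
    int present;                /* set to 1 once the directive is seen */
    char value[2 * MAX_ARG];    /* argument text, empty for S0A */
};

/* Illustrative subset only, not Boa's full table. */
static struct config_entry config_table[] = {
    {"Port", S1A, 0, ""},
    {"DocumentRoot", S1A, 0, ""},
    {"Alias", S2A, 0, ""},
    {"VerboseCGILogs", S0A, 0, ""},
};

#define N_ENTRIES (sizeof config_table / sizeof config_table[0])

/* Parse one "Directive [arg1 [arg2]]" line. Returns 0 on success or for a
 * blank/comment line, -1 for an unknown directive or wrong argument count.
 * A fourth token, if any, is not checked in this sketch. */
int parse_config_line(const char *line)
{
    char name[64], a1[MAX_ARG], a2[MAX_ARG];
    int n = sscanf(line, "%63s %255s %255s", name, a1, a2);
    size_t i;

    if (n < 1 || name[0] == '#')
        return 0;

    for (i = 0; i < N_ENTRIES; i++) {
        struct config_entry *e = &config_table[i];

        if (strcmp(e->name, name) != 0)
            continue;
        if (n - 1 != e->type)
            return -1;          /* argument count does not match the type */
        e->present = 1;
        if (e->type == S0A)
            e->value[0] = '\0';
        else if (e->type == S1A)
            snprintf(e->value, sizeof e->value, "%s", a1);
        else
            snprintf(e->value, sizeof e->value, "%s %s", a1, a2);
        return 0;
    }
    return -1;                  /* unknown directive */
}
```

With this table, "Port 80" marks Port as present with the value "80", while a bare "Port" line is rejected because an S1A entry needs exactly one argument.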

Returning to our file, boa.c, there are several more functions to be called.
open_logs(), called right after the config file is parsed, checks the values set by the previous function and opens the log files accordingly.
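
Opening a log the way a daemon usually does (append mode, created if missing, and closed across exec() so CGI children do not inherit it) could look like this. open_log_file() is a hypothetical helper and the 0600 mode is an assumption; the post does not say which flags Boa passes.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Open (creating if needed) a log file for appending.
 * Returns the descriptor, or -1 on error. */
int open_log_file(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0600);
    int flags;

    if (fd == -1)
        return -1;
    /* Keep the log out of CGI processes started with exec(). */
    flags = fcntl(fd, F_GETFD);
    if (flags == -1 || fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

O_APPEND makes every write() land at the current end of the file, even if the log is also written by another process. A server that reports errors on stderr could additionally dup2() the error log's descriptor onto STDERR_FILENO.
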
create_server_socket() creates the server socket and sets its initial parameters. The socket is bound to an address of the appropriate family, depending on the value given in the configuration file.
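
A sketch of what creating the listening socket involves, assuming the address to listen on arrives as a textual IPv4 or IPv6 literal (as the listen-address directive mentioned earlier would provide). The address family is picked by trying inet_pton() for each family in turn; create_listen_socket() and its parameters are hypothetical, not Boa's signature.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <arpa/inet.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a non-blocking listening socket on ip:port, choosing AF_INET or
 * AF_INET6 from the form of the address. Returns the fd, or -1 on error. */
int create_listen_socket(const char *ip, unsigned short port, int backlog)
{
    struct sockaddr_in a4;
    struct sockaddr_in6 a6;
    struct sockaddr *sa;
    socklen_t len;
    int fd, family, flags, one = 1;

    memset(&a4, 0, sizeof a4);
    memset(&a6, 0, sizeof a6);
    if (inet_pton(AF_INET, ip, &a4.sin_addr) == 1) {
        family = AF_INET;
        a4.sin_family = AF_INET;
        a4.sin_port = htons(port);
        sa = (struct sockaddr *) &a4;
        len = sizeof a4;
    } else if (inet_pton(AF_INET6, ip, &a6.sin6_addr) == 1) {
        family = AF_INET6;
        a6.sin6_family = AF_INET6;
        a6.sin6_port = htons(port);
        sa = (struct sockaddr *) &a6;
        len = sizeof a6;
    } else {
        return -1;              /* not a valid IPv4 or IPv6 literal */
    }

    if ((fd = socket(family, SOCK_STREAM, 0)) == -1)
        return -1;
    /* SO_REUSEADDR allows a quick restart while old connections sit in
     * TIME_WAIT; O_NONBLOCK keeps accept() from stalling the main loop. */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one) == -1 ||
        (flags = fcntl(fd, F_GETFL)) == -1 ||
        fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ||
        bind(fd, sa, len) == -1 ||
        listen(fd, backlog) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```
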
As the name suggests, init_signals() sets up the various signal handlers. The server then checks whether it is running as root, and if it is not, a warning is written to the error_log.
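
A minimal sketch of signal setup with sigaction() and a root check with geteuid(). The post does not list which signals Boa handles or what each handler does, so the choice of SIGHUP, SIGTERM and an ignored SIGPIPE is an assumption, as are the function and flag names.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Flags set by the handlers and polled by the main loop. */
static volatile sig_atomic_t got_sighup = 0;
static volatile sig_atomic_t got_sigterm = 0;

static void on_sighup(int sig)  { (void) sig; got_sighup = 1; }
static void on_sigterm(int sig) { (void) sig; got_sigterm = 1; }

static int set_handler(int sig, void (*handler)(int))
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;   /* restart interrupted system calls */
    return sigaction(sig, &sa, NULL);
}

int install_signal_handlers(void)
{
    if (set_handler(SIGHUP, on_sighup) == -1 ||     /* e.g. reload/reopen */
        set_handler(SIGTERM, on_sigterm) == -1 ||   /* graceful shutdown */
        set_handler(SIGPIPE, SIG_IGN) == -1)        /* handle EPIPE instead */
        return -1;
    return 0;
}

/* Warn, but carry on, when not running as root. Returns 1 if root. */
int check_root(FILE *error_log)
{
    if (geteuid() != 0) {
        fprintf(error_log, "warning: not running as root\n");
        return 0;
    }
    return 1;
}
```

The handlers only set flags: very little is async-signal-safe, so the real work is left to the main loop.
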
The function create_common_env() sets up environment variables common to CGI scripts.
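
CGI scripts receive their parameters through environment variables (GATEWAY_INTERFACE, SERVER_PORT and so on, as defined by the CGI/1.1 specification, RFC 3875). Building the request-independent part once could look like the sketch below; the fixed-size array and add_common_env() are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_COMMON_ENV 16   /* hypothetical capacity */

/* NULL-terminated "NAME=value" array, the layout execve() expects. */
static char *common_env[MAX_COMMON_ENV + 1];
static int common_env_count = 0;

/* Append "name=value" to the common CGI environment. Returns 0 or -1. */
int add_common_env(const char *name, const char *value)
{
    size_t len;
    char *entry;

    if (common_env_count >= MAX_COMMON_ENV)
        return -1;
    len = strlen(name) + 1 + strlen(value) + 1;     /* '=' and NUL */
    if ((entry = malloc(len)) == NULL)
        return -1;
    snprintf(entry, len, "%s=%s", name, value);
    common_env[common_env_count++] = entry;
    common_env[common_env_count] = NULL;
    return 0;
}
```

Per-request variables such as REQUEST_METHOD or QUERY_STRING would then be appended to a copy of this array for each CGI request.
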
To make sure differences between implementations don't affect how it works, build_needs_escape() builds its table of characters that need escaping from the number of bits actually available in an unsigned long, rather than assuming a fixed word size.
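
The idea can be sketched as a bitmap: one bit per 7-bit character, packed into unsigned long words whose width is computed from sizeof and CHAR_BIT rather than assumed. The set of characters marked here (control characters, space, DEL and a few URL-unsafe symbols) and the function names are assumptions for illustration; Boa's actual list may differ.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Bits per word, computed rather than assumed to be 32 or 64. */
#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)
#define ESCAPE_CHARS 128    /* the bitmap covers 7-bit ASCII */
#define ESCAPE_WORDS ((ESCAPE_CHARS + WORD_BITS - 1) / WORD_BITS)

static unsigned long needs_escape_map[ESCAPE_WORDS];

void build_needs_escape_map(void)
{
    /* hypothetical set of URL-unsafe characters */
    const char *special = "%\"<>#{}|\\^~[]`";
    unsigned int c;

    memset(needs_escape_map, 0, sizeof needs_escape_map);
    for (c = 0; c < ESCAPE_CHARS; c++) {
        /* c <= ' ' short-circuits NUL, so strchr never sees c == 0 */
        if (c <= ' ' || c == 127 || strchr(special, (int) c) != NULL)
            /* 1UL, not 1: shifting an int past its width is undefined */
            needs_escape_map[c / WORD_BITS] |= 1UL << (c % WORD_BITS);
    }
}

/* Nonzero if c must be %-escaped. Bytes >= 128 are always escaped. */
int needs_escape(unsigned char c)
{
    if (c >= ESCAPE_CHARS)
        return 1;
    return (int) ((needs_escape_map[c / WORD_BITS] >> (c % WORD_BITS)) & 1UL);
}
```

With 64-bit longs the map is two words; with 32-bit longs it is four, and the lookup code is identical in both cases.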

Only after all this setup does it fork itself into the background, unless the '-d' command line option is given, which tells the program not to fork.
Immediately after the fork, a timestamp is written to the log.
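
The fork step and a log timestamp can be sketched as below, assuming the usual "parent exits, child carries on" daemon pattern. The post does not give Boa's timestamp format, so the Common Log Format-style layout and both function names are assumptions.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Fork into the background unless do_fork is 0 (the -d case).
 * Returns 0 in the process that carries on, -1 if fork() failed;
 * the parent exits. */
int maybe_background(int do_fork)
{
    if (!do_fork)
        return 0;               /* -d: stay in the foreground */
    switch (fork()) {
    case -1:
        perror("fork");
        return -1;
    case 0:
        return 0;               /* child: continue as the server */
    default:
        exit(0);                /* parent: hand the shell back its prompt */
    }
}

/* Format t as a UTC timestamp such as "01/Jan/1970:00:00:00 +0000".
 * Returns 0, or -1 if buf is too small. */
int format_timestamp(char *buf, size_t size, time_t t)
{
    struct tm tm;

    if (gmtime_r(&t, &tm) == NULL)
        return -1;
    return strftime(buf, size, "%d/%b/%Y:%H:%M:%S +0000", &tm) == 0 ? -1 : 0;
}
```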

Finally, the main loop function, select_loop(), which takes the server socket as its parameter, is called. And this marks just the beginning of the voyage.
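
As a small illustration of the select() call such a loop is built around, and of the limit we want to move away from, here is a hypothetical helper that waits for one descriptor to become readable. The key constraint is that an fd_set is a fixed-size bitmap of FD_SETSIZE bits (commonly 1024), so descriptors at or above FD_SETSIZE cannot be watched with select(), however high RLIMIT_NOFILE is.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <sys/select.h>
#include <unistd.h>

/* Wait up to timeout_ms for fd to become readable using select().
 * Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    fd_set readfds;
    struct timeval tv;
    int rc;

    /* fd_set cannot represent descriptors >= FD_SETSIZE at all. */
    if (fd < 0 || fd >= FD_SETSIZE)
        return -1;

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* The first argument is the highest descriptor plus one. */
    rc = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (rc == -1)
        return -1;
    return rc > 0 && FD_ISSET(fd, &readfds);
}
```

A full loop would rebuild the set with the server socket plus every active connection on each pass, which is also why select() gets slower as the number of connections grows.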

1 comment:

  1. You need to have full clarity on how RLIMIT_NOFILE works when ulimit is set to 'unlimited'. When dealing with a large number of connections, you may also want to consider other resources such as RLIMIT_RSS and RLIMIT_AS, since these also have some impact on how requests are served when there are many concurrent requests.
