Saturday, 14 December 2013

More tests!

A common software-engineering principle says that the time dedicated to testing the product should be at least 40% of the total timeline. Even though we had not planned to, we ended up practicing this principle.

Bombarding the servers with 1000 concurrent connections was not as easy as it seemed. Various restrictions on system limits proved to be a deterrent. Raising the ulimit helped to an extent, but testing on systems that were never meant to be used as servers led to complications.
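The first limit we hit was the per-process cap on open file descriptors, since every connection consumes one. A minimal sketch of inspecting and raising it for the current shell only (2048 is an illustrative value; the soft limit can only be raised up to the hard limit, and persistent changes go elsewhere, e.g. /etc/security/limits.conf on Linux):

```shell
# Current soft limit on open file descriptors (one per connection)
ulimit -n
# Hard ceiling that the soft limit cannot exceed without root
ulimit -Hn
# Raise the soft limit for this shell session only (illustrative value)
ulimit -n 2048 2>/dev/null || echo "hard limit too low to raise"
ulimit -n
```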

Our laptops hung and had to be hard-restarted more times than we can count. This led us to conclude that we had to find another way of benchmarking our server.
The crashes were the effect of huge files on the system, but using smaller files did not make much sense either, since there was no guarantee that the desired level of concurrency would be maintained. Commands like netstat helped us find the actual number of active (ESTABLISHED) connections.
Hence, we knew we had to use larger files, so that each request would take longer to download and the connections would actually overlap.
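A rough sketch of that check, run from another terminal during a test (port 8080 is a placeholder for whatever port the server actually listens on):

```shell
# Count fully established TCP connections on the (placeholder) server port
count=$(netstat -ant 2>/dev/null | grep ':8080 ' | grep -c ESTABLISHED || true)
echo "ESTABLISHED connections on :8080: ${count:-0}"
```

If the count stays well below the concurrency level passed to ab, the requests are completing too quickly to pile up.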

Over the course of the tests, ab spewed out a multitude of errors, a few of them being:
         apr_poll: timeout specified has expired
This error would pop up whenever the server took too long to respond; we could not find a way to increase ab's hard-coded timeout value of 30s.
When testing on a Mac, this error usually meant that the system had "crashed" and was no longer responding to any user action.
         apr_socket_recv: connection reset by peer
This error is caused by the server sending a TCP RST segment. It occurred whenever the server did not have sufficient resources left to process the client requests.

Large files and high concurrency: exactly the conditions we wanted for a test, but a bad recipe for our machines.
With other options limited (everything being subject to the ulimit), we started looking in different directions. We also could not find a way to limit the rate at which ab fetches a request, as we could with wget. So, acting on the advice of an expert in the field, we changed the MTU.
This made each request take longer and hence ensured, to an extent, that the level of concurrency was maintained. And so one more parameter got added!
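For comparison, a sketch of the two throttling routes (the address, rate, and interface name are all placeholders, and the root-only commands are left commented out):

```shell
# wget can cap the bandwidth of a single fetch directly:
#   wget --limit-rate=50k http://192.168.1.10:8080/big.bin
# We found no equivalent ab flag, so we slowed the link itself instead:
#   sudo ip link set dev eth0 mtu 1200
# Inspect the current MTU of each interface without changing anything (Linux):
cat /sys/class/net/*/mtu 2>/dev/null || true
```

A smaller MTU means more packets, and more per-packet overhead, for the same file, so each transfer takes longer and more connections stay ESTABLISHED at once.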

Initially we benchmarked by varying only the total number of requests and the number of concurrent connections. Now we had two more parameters: the MTU and the file size. Instead of picking one file of an average size, we decided to vary that too.
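The resulting sweep can be sketched as nested loops; here echo stands in for the real ab run, all values are illustrative, and the MTU would be set once per outer run since changing it needs root:

```shell
# Sweep file size and concurrency; total requests stays fixed here.
for size in 1M 10M 100M; do
  for c in 100 500 1000; do
    # Placeholder for: ab -n 5000 -c "$c" http://192.168.1.10:8080/file_${size}.bin
    echo "run: n=5000 c=$c file=file_${size}.bin"
  done
done
```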

So we continued our tests, hoping to see the light at the end of the tunnel.
