A customer was unable to debug or reproduce why a web-based application was crashing periodically. The crashes were becoming more frequent, and increasingly affecting their business and customer satisfaction.
Steps used to solve the problem:
1. Rule out performance issues through performance testing
2. Aggregate and Review logs for finding issue trends
3. Improve error tracking in the application
4. Wait till issue reproduced
5. Used additional logging and new tools to discover underlying issue
6. Develop fix
7. Confirm resolution
Tools used to solve the problem:
1. Rational Performance Tester
Cause of the problem:
Search entry string with a tab.
Tips to avoid in the future:
1. Properly escape characters used to delimit data/fields
2. Never make assumptions when dealing with user inputted content
3. Use design patterns including encapsulation and low coupling
4. Use debug statements in logging
I have been working on a project for a client on and off for approximately a month. I was engaged to debug and solve an intermittent problem in a Java client-server application that is over ten years old. Periodically, the server application would get locked up, using 100% of the CPU, until it was manually killed and restarted. During this time, the client applet was unbearably slow or completely didn’t work. The problem had been around for a long time, but was starting to occur more frequently, likely due to more customers using the application.
After doing load testing using Rational Performance Tester (RPT) and monitoring the memory usage of the server application, it was determined that the issue was not caused by loading. I successfully verified that the application ran fine with approximately 10x the maximum number of actual users observed in production.
While performing the load testing, I analyzed server log files. They had been looked at previously, but nothing in particular stood out. I looked for various patterns around the times where problems were logged by operations personnel. Luckily, the developers created log entries when commands were started and completed. After matching up intertwined start and complete logs for commands using Cygwin scripts and Excel, I discovered that a specific command was started but never completed right around the time when the issue occurred. This was found in all of the dozen log files analyzed.
After narrowing the cause, I focused my testing using Sikuli to exhaustively test the functionality in question. I also added more logging to better understand how to solve the problem. I was not able to reproduce the problem, so we decided to go live with the version of the application with added logging and wait for it to occur again.
The problem occurred soon after and I looked at the generated log files. Immediately, I was able to determine the cause and recreate the problem. In the application, messages are being sent from the client applet to the server using delimited strings. However, the delimiters were not being escaped. Therefore, whenever a user entered a delimiter in the user interface, the corresponding message was not formed properly, causing the server application to improperly process it.
In the case of this application, message contents are used as arguments in SQL procedures with minimal error checking. Invalid and/or missing arguments were causing infinite results. So the server application was becoming locked up attempting to fetch and process the results.
There are several modifications I made to the client-server application to solve the issue and make it generally more robust.
First, I modified and tested changes to the server application to ensure the changes were effective. I modified the loop that reads from the database to include a loop counter, and to exit after an overly large number of iterations. This change eliminates the infinite loop, eliminating the need for operations personnel to restart the server application. The break condition was made overly large to guarantee it would not exit prematurely.
The algorithm used to process the message was also improved. It was modified to handle, as best it reasonably can, extra delimiters in the command. It also now validates arguments passed to SQL procedures.
Second, I modified and tested changes to the client applet. The class that packages messages was modified so that delimiters are removed from the fields put in the message. The modifications were complicated by the fact that in some cases the class was not used to create the message, rather the message was created manually with string concatenations and hard-coded delimiters (poor encapsulation). It was extremely time consuming to go through all of the messages, verifying they are created properly, modifying them so that they are using the class for packaging, and to test and verify all the changes. The lack of encapsulation and high coupling made the client-server application far more difficult to maintain.
Conclusion/Tips for Developers
This engagement highlights the importance of encapsulation and low coupling. If the class designed to package messages was consistently used, reducing coupling and making it well encapsulated, eliminating the delimiters in the fields of the message would have been much simpler to add. A small change to the class would have sufficed. Instead, the class had to be modified, code had to be changed to use the class for creating the messages, and a bunch of code had to be traced so that it was better understood.
Last, escaping delimiters is extremely important and no assumptions should be made when dealing with user entered content. Escaping delimiters is an extremely common in programming. Assuming a class will only be used for a specific purpose, for example content without tabs, can lead to future issues headaches when the class is reused. In this case, the tab character was used as a delimiter. Users should never enter the tab character, and in fact it is difficult to do so because pressing a tab changes the focus to the next element. However, it is still possible to paste tab characters into the user interface, which is what users were doing.