2013-01-18

Tips to Maximize Productivity - Cygwin



Cygwin is a collection of tools that provides a Linux-like, console-based environment on Windows. It is free software.
When internationalizing an application, matching the keys referenced in the source code with the entries in the properties files is a relatively large task. Going through the entire application for each language to ensure that every key referenced in the source code is also defined in the properties files is time consuming.
A quicker and more efficient way to find missing keys is to write scripts that parse the source code. With Cygwin and some creativity, you can write scripts that extract the keys from the source code and match them against the keys in the properties files. Cygwin provides a command prompt with access to a collection of common Linux/UNIX tools. Using tools such as find, xargs, cat, grep, sed, wc, sort and diff, mismatched keys are easy to find: keys that appear in the source code but not in a properties file (or vice versa), as well as duplicate keys.
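As a rough illustration of those comparisons, the sketch below uses comm and uniq (two more standard tools that work on sorted input) instead of diff. It is a minimal sketch, not the author's script: it assumes the keys referenced in code have already been extracted one per line into a file, and that the properties file uses simple key=value lines. The function names and file names are hypothetical.

```shell
export LC_ALL=C   # byte-order sorting, so sort and comm always agree

# Print the keys defined in a properties file, sorted.
# Assumes key=value lines; skips comments (# or !) and blank lines.
property_keys() {   # $1 = properties file
  grep -v '^[[:space:]]*[#!]' "$1" \
    | grep -v '^[[:space:]]*$' \
    | sed 's/^[[:space:]]*//; s/[[:space:]]*=.*//' \
    | sort
}

# Keys referenced in code but not defined in the properties file.
missing_in_properties() {   # $1 = code keys file, $2 = properties file
  code_sorted=$(mktemp)
  sort -u "$1" > "$code_sorted"
  property_keys "$2" | sort -u | comm -23 "$code_sorted" -
  rm -f "$code_sorted"
}

# Keys defined in the properties file but never referenced in code.
unused_in_code() {          # $1 = code keys file, $2 = properties file
  code_sorted=$(mktemp)
  sort -u "$1" > "$code_sorted"
  property_keys "$2" | sort -u | comm -13 "$code_sorted" -
  rm -f "$code_sorted"
}

# Keys defined more than once in the properties file.
duplicate_properties() {    # $1 = properties file
  property_keys "$1" | uniq -d
}

# Demo on throwaway sample data.
tmp=$(mktemp -d)
printf '%s\n' greeting farewell title greeting > "$tmp/code_keys.txt"
printf '%s\n' '# labels' 'greeting=Hello' 'title=Main' 'unused=Old' 'title=Main again' > "$tmp/messages.properties"
echo "Missing from properties file:"
missing_in_properties "$tmp/code_keys.txt" "$tmp/messages.properties"   # farewell
echo "Not referenced in code:"
unused_in_code "$tmp/code_keys.txt" "$tmp/messages.properties"          # unused
echo "Defined more than once:"
duplicate_properties "$tmp/messages.properties"                         # title
rm -r "$tmp"
```

comm -23 keeps lines found only in the first (code) list and comm -13 keeps lines found only in the second (properties) list, so both directions come from the same two sorted inputs.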
These tools can be used for other tasks as well:

  • Searching projects for strings
  • Counting instances of specific strings for estimating effort
  • Listing files that require translation so that they can be added to tasks/work items
    • Calculating line counts of the files to estimate effort
  • Generating repetitive script files that apply changes to a database
  • Mass renaming of files
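A few of these tasks can be sketched as small shell functions. This is a minimal sketch rather than a definitive toolkit: the function names, the SQL table and columns (app_user, active, id) and the .txt-to-.bak rename are hypothetical placeholders, and grep -r and -o assume GNU grep, as shipped with Cygwin.

```shell
export LC_ALL=C   # predictable sort order

# Search a project for a string: list the files that contain it.
search_files() {        # $1 = directory, $2 = literal string
  grep -rlF -- "$2" "$1" | sort
}

# Count every occurrence of a string (grep -o prints one line per match,
# so a line containing the string twice counts twice).
count_occurrences() {   # $1 = directory, $2 = literal string
  grep -roF -- "$2" "$1" | wc -l
}

# Generate one SQL statement per ID read from standard input.
# Table and column names are hypothetical.
make_sql() {
  while IFS= read -r id; do
    [ -n "$id" ] || continue
    printf 'UPDATE app_user SET active = 0 WHERE id = %s;\n' "$id"
  done
}

# Mass rename: change every .txt file in a directory to .bak.
rename_ext() {          # $1 = directory
  for f in "$1"/*.txt; do
    [ -e "$f" ] || continue   # no matches: the glob stays literal
    mv -- "$f" "${f%.txt}.bak"
  done
}

# Demo on throwaway files.
demo=$(mktemp -d)
printf 'label = "Save";\nbutton = "Save";\n' > "$demo/A.java"
printf 'title = "Open";\n' > "$demo/B.java"
search_files "$demo" '"Save"'        # prints the path of A.java only
count_occurrences "$demo" '"Save"'   # prints 2
printf '%s\n' 101 102 | make_sql     # prints two UPDATE statements
rm -r "$demo"
```

The output of search_files can be pasted straight into a task list, and make_sql's output redirected to a .sql file that is then run against the database.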
One example is the following script I created, full of relatively complicated commands, which gets the property keys used in the Java source. Here getEscapedHTMLString, getEscapedJavaScriptString and getPlainString are the methods called every time the properties file is accessed. It took several attempts to get right and was built slowly, piece by piece. The result is a sorted list of all the property keys referenced in the Java source files, which the script then compares against the English properties file. I added echo statements throughout to display progress.

#!/bin/bash
# copy Java files
echo Cleaning local directory...
rm -f *.java *.properties PROPERTY_* JAVA
echo Copying Java source files...
find /cygdrive/c/src/ -name "*.java" -exec cp {} . \;
rm Internationalizer.java
# copy properties files
echo Copying properties files...
cp /cygdrive/c/src/XXX*.properties .
# get all property entries from Java source into file JAVA
echo Parsing Java source files...
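# Put each in.get... call on its own line, keep only the lookup calls, then
# strip the method name, the opening (" and everything from the closing quote to the end of the call
cat *.java | sed "s/in\.get/\nin\.get/g;s/\")/\")\n/g" | grep "getEscape\|getPlain" | sed "s/in.getEscapedHTMLString//g;s/in.getEscapedJavaScriptString//g;s/in.getPlainString//g;s/(\"//g;s/\"[ ,a-zA-Z0-9)]*)[\;]*//g"  | sort > JAVA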
# get all property keys from the English properties file into file PROPERTY_EN
echo Parsing properties files...
cat PIOS_en_CA.properties | sed "s/=.*//g" | sort > PROPERTY_EN
# Output properties missing in properties file
echo Missing English Properties \(PIOS_en_CA.properties\)
diff --ignore-blank-lines --ignore-all-space JAVA PROPERTY_EN | grep "<"
echo \*\* PROPERTY KEYS THAT ARE USED MULTIPLE TIMES IN JAVA SOURCE WILL BE REPORTED \*\*
echo Removing copied files...
rm -f *.java *.properties PROPERTY_* JAVA

I used this script frequently to match the keys referenced in the source code with the keys in the properties files. This ensured I wouldn't get exceptions during execution, and it quickly caught common typos.
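For comparison, the key-extraction step can also be written with grep -o, which prints only the matching part of each line and so avoids splitting lines with sed. This is an alternative sketch, not the script above: it assumes each key is a string literal passed directly to in.getEscapedHTMLString, in.getEscapedJavaScriptString or in.getPlainString on the same line, and extract_keys is a hypothetical name. Because it ends with sort -u, a key used several times appears only once.

```shell
export LC_ALL=C

# Print each distinct key passed to the three lookup methods, sorted.
# Assumes the key is a string literal on the same line as the call.
extract_keys() {   # arguments: Java source files
  grep -ohE 'in\.get(EscapedHTMLString|EscapedJavaScriptString|PlainString)\("[^"]*"' "$@" \
    | sed 's/^[^"]*"//; s/"$//' \
    | sort -u
}

# Demo on a throwaway source file.
tmp=$(mktemp -d)
cat > "$tmp/Demo.java" <<'EOF'
String a = in.getPlainString("app.title");
String b = in.getEscapedHTMLString("app.save") + in.getPlainString("app.title");
EOF
extract_keys "$tmp/Demo.java"   # prints app.save, then app.title
rm -r "$tmp"
```

Writing its output to the JAVA file (for example, extract_keys *.java > JAVA) would let the same diff run without reporting keys that are merely used more than once.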
These tools are also useful for estimating effort. In the same project, translation effort was estimated from the number of lines of code that needed internationalization. I took the complete list of files and broke it down into functional categories to make testing and verification easier. Then, to quickly determine the number of lines (which was later converted to effort in hours), I ran the following, which outputs the total number of lines:

cat file1.java file2.java file3.java file4.java | wc -l
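Once the files are broken down into categories, the same count can be taken per category. A minimal sketch, assuming each category is kept as a text file listing one source file per line (category_*.txt is a hypothetical naming scheme, and count_lines a hypothetical name). It sums wc -l for each file, which gives the same total as piping the concatenated files into wc -l, since both count newline characters.

```shell
# Sum the line counts of every file named in a list file.
count_lines() {   # $1 = text file listing one source file per line
  total=0
  # "|| [ -n ... ]" also handles a last line with no trailing newline
  while IFS= read -r src || [ -n "$src" ]; do
    [ -n "$src" ] || continue          # skip blank lines in the list
    n=$(wc -l < "$src")
    total=$((total + n))
  done < "$1"
  echo "$total"
}

# Report a total per category list.
for list in category_*.txt; do
  [ -e "$list" ] || continue           # no category files: glob stays literal
  printf '%s: %s lines\n' "$list" "$(count_lines "$list")"
done
```

Reading the list line by line also copes with file names that contain spaces, which a plain $(cat list) would split apart.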

A final, relatively simple example is breaking a massive log file into pieces. A client didn't have log rotation set up on their server, which resulted in a 2GB log file. A log file that size is far too large to open and analyze as is. However, split makes it easy to break the file into multiple manageable files.
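A minimal sketch of that split, assuming the log is plain text. Splitting by line count (-l) keeps every log line intact, whereas -b splits by size and may cut a line in half. The piece size, file names and the split_log name are arbitrary examples. The default suffixes (aa, ab, ...) sort in order, so the pieces can be concatenated back into the original.

```shell
# Split a log into pieces of a fixed number of lines.
# Output files: <log>.part.aa, <log>.part.ab, ...
split_log() {   # $1 = log file, $2 = lines per piece
  split -l "$2" "$1" "$1.part."
}

# Demo on a small generated log: 10 lines in pieces of 4 gives 3 files
# (4, 4 and 2 lines).
tmp=$(mktemp -d)
awk 'BEGIN { for (i = 1; i <= 10; i++) print "entry " i }' > "$tmp/server.log"
split_log "$tmp/server.log" 4
ls "$tmp"
# The pieces concatenate back into the original file.
cat "$tmp"/server.log.part.* | cmp -s - "$tmp/server.log" && echo "pieces match original"
rm -r "$tmp"
```

On a real 2GB log the call would look like split_log server.log 500000, with the line count chosen so each piece opens comfortably in an editor.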
The nice thing is that all of this is command-line based, so it's resource efficient and generally won't tie up your computer. It's also easy to save bash scripts for reference or repeated execution.