2013-10-07

How do you deploy multiple versions of the same portlet in Liferay?

While developing a portal site, it can be very helpful to support deploying multiple versions of the same portlet.  Some of the reasons our development team has encountered include:

  • Helping to debug issues that are introduced in new versions
  • Comparing functionality and performance between two portlets, while keeping everything else equal
  • When a portlet is used multiple times on a single site, deploying multiple versions lets dependent pages keep using an older version, so they don't all have to be updated when new features are added to the shared portlet
  • In Liferay, JAR files are not cleaned up when a portlet is redeployed.  Therefore, if the JAR files a portlet uses are updated and it is redeployed, both the original JAR files and the new JAR files will be in the lib directory.  This can cause issues if the new JAR files contain different versions of the same classes.
    • We have run into this situation a couple of times, and it leads to confusing and unexpected results
To properly version a portlet, two things need to be made unique, which we achieved by adding version numbers to both:
  1. Make the directory the war file is deployed to unique so that Liferay treats them as separate portlets (in the webapps directory)
  2. Make the name of the portlet that shows up in Liferay's Add menu unique so that you can control the version of the portlet that is added to a page, and later on determine which version of the portlet is on each page
Controlling the webapps directory

Through experimentation, I found that the webapps directory is based on the WAR filename.  The exact WAR filename is used as the name of the webapps directory, except when the filename contains certain character sequences, in which case everything after the sequence is ignored.  The character sequences that I know about are: -portlet; -hook; -ext.  Here are a couple of examples:

WAR Filename                       webapps directory
calendar_1.0.3.1.war               calendar_1.0.3.1/
calendar_1_0_4_0.war               calendar_1_0_4_0/
myportlet-portletAA.war            myportlet-portlet/
crazystuff-ext.war                 crazystuff-ext/
myaccount-hook12-production.war    myaccount-hook/

Controlling the portlet name (in Liferay)

The first thing that needs to be done is to make the portlet ID unique so that Liferay can track it.  The portlet ID is set in liferay-display.xml.  I simply concatenate the version number onto the end of the portlet ID.
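
For illustration, a minimal liferay-display.xml entry might look like the following (the portlet ID and category are hypothetical, assuming a calendar portlet at version 1.0.4.0):

<display>
    <category name="category.sample">
        <!-- the version number is concatenated onto the end of the portlet ID -->
        <portlet id="calendar_1_0_4_0" />
    </category>
</display>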

Next, update the portlet name.  The name must be updated and kept consistent in portlet.xml and liferay-portlet.xml.  To be consistent, I simply concatenate the version number onto the end of the portlet name.
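
As a sketch, the matching entries could look like this (the portlet class is illustrative):

<!-- portlet.xml: version number appended to the portlet name -->
<portlet>
    <portlet-name>calendar_1_0_4_0</portlet-name>
    <display-name>Calendar</display-name>
    <portlet-class>com.example.CalendarPortlet</portlet-class>
</portlet>

<!-- liferay-portlet.xml: the portlet-name must match portlet.xml exactly -->
<portlet>
    <portlet-name>calendar_1_0_4_0</portlet-name>
</portlet>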

One thing to keep in mind is that there is a known bug in Liferay, where an exception is thrown if the portlet name has a hyphen ('-') in it.  We also found issues with periods ('.') in the name and in the WAR filename.  So we avoid both of these characters (as well as spaces), and replace them with underscores ('_').
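
For illustration, this replacement can be done with a quick shell one-liner (the name is hypothetical):

echo "my calendar-1.0.4.0" | tr ' .-' '___'
# prints: my_calendar_1_0_4_0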

Maven

In our case, we are using Maven.  So to simplify portlet versioning, we keep the version number in the POM file and use variables throughout the other files where necessary.  The variables are automatically replaced with the version number when the WAR file is generated.

In the POM file we set up the WAR filename to be ${pom.name}_${pom.version}.war to minimize the number of changes required when changing the name or version of the portlet or hook.  Note that Maven can't be used with ext plugins.
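
A minimal sketch of the relevant POM configuration (the group and artifact IDs are illustrative, and the underscore version follows the naming convention above):

<project>
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.example</groupId>
    <artifactId>calendar-portlet</artifactId>
    <name>calendar</name>
    <version>1_0_4_0</version>
    <packaging>war</packaging>
    <build>
        <!-- produces calendar_1_0_4_0.war -->
        <finalName>${pom.name}_${pom.version}</finalName>
    </build>
</project>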

2013-08-05

Liferay - Multi-Stage Development and Data Management

Our development environments and data migration strategy have evolved during the project, and will continue to evolve during development.  Process and design improvements are encouraged and investigated in an attempt to optimize the end product.

When multiple developers are involved in a large project, multi-staging is extremely important.  It allows developers to work simultaneously and independently, without affecting each other.  It also allows stability and load testing, among other advantages.  The number of stages required depends on several things (this is obviously not a complete list):
  • Size of the project
  • Complexity
  • Number of developers
Two stages is the minimum required for any project: one for development and one for production.  However, it's highly recommended to have a Test environment between Development and Production.  The Test environment should be identical, or at least as close to the Production environment as possible, to minimize risk when deploying updates to Production.

In our situation we have three stages, which are:
  • Development
  • Test
  • Production
Furthermore, we are currently debating adding a fourth stage, QA, that would always be kept identical to the Production environment.

To meet our requirements, the Production environment is designed to be highly available with session replication.  Each stage incrementally becomes more similar to the Production environment.  The incremental changes spread out the issues that are due to environment variations, easing debugging.  Each environment is designed specifically for a purpose. 

Initial development is performed on a local desktop or laptop, which is not listed above.  The local environment is the most flexible, is easily restarted, and is best for independent development and debugging.  Each developer has a complete environment running locally on their desktop/laptop.  This environment is used for rapid development, initial integration and testing by the developer, debugging, and so on.  It gives developers the maximum freedom to work without affecting each other, which is very important near the start of the project because restarting services is quite common.  Locally we're developing using Liferay Developer Studio on Microsoft Windows.

We're using Development for initial integration between developers and for testing.  It is running on RHEL.  The Test environment adds session replication, is located in a DMZ with public access, and uses an Oracle database.  These differences from the Development environment increase reliability and performance.  They also make the Test environment extremely similar to the Production environment.

Data migration between environments quickly became important to synchronize configurations and reduce overall effort.  We use built-in features of Liferay to migrate documents, content and configurations.

To migrate web content as well as documents, we export and import LAR (Liferay ARchive) files.  They're easy to use; however, we sometimes notice issues with improper migration and have to repeat the export and/or import.  Issues we've encountered include missing content and permissions.

You can use staging to migrate pages and other configuration; however, we are using database migration (Control Panel -> Server Administration -> Data Migration) in Liferay.  Data migration is a bit more work to use, but right now we aren't synchronizing individual pages -- which is the main advantage of staging -- we generally want to synchronize everything.

The headaches that go along with database migration are having to initialize the target database (shutting down the instance of Liferay that's using the database, and deleting all tables within the database) and then restarting both Liferay instances.  It is much more powerful though, allowing us to migrate from any environment to any other.  We find it extremely useful to migrate from Development or Test to our local environments during integration and testing.  It will also be important to migrate from Production to Test, Development or local environments when debugging and trying to recreate problems that occur in Production.

To aid with data migration, we switched from the default Hypersonic database to MySQL in all the environments before Test.  The Test and Production environments are using an Oracle database.  Switching to MySQL was done for several reasons:
  1. Allows database backups and quick restores because the database does sometimes become corrupt. We have nightly backups of our local and development databases.
  2. Allows database migration to and from any environment
  3. Allows direct access to the database for debugging and clearing lock records (Lock_ table); see the sketch below
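
As a rough sketch of items 1 and 3 in practice (the database name lportal, user liferay and backup path are assumptions, not our actual configuration):

# nightly backup of a local Liferay database
mysqldump -u liferay -p lportal > /backups/lportal_$(date +%F).sql

# restore the backup after corruption
mysql -u liferay -p lportal < /backups/lportal_2013-08-01.sql

# clear stale lock records while debugging
mysql -u liferay -p lportal -e "DELETE FROM Lock_;"
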
Originally we were hosting Liferay on GlassFish, but we ran into several issues, including with session replication and data migration.  Since then we have switched to Tomcat as our application server, and have not had any application server problems since.

So for now, we have a pretty complete development strategy with respect to staging, backups and data migration. We did not arrive at this state right away. Several of the decisions were made during development in an attempt to streamline processes, reduce effort, improve efficiency and end up with a more easily maintainable end product; something that our team is constantly working to achieve.

2013-06-08

RSS - Replacing Google Reader with ifttt and email

I used to be a Google Reader user, and used it religiously.  I used the web interface, and had NewsRob on my phone.  It was great.  Wherever I was, I had access to all my news feeds, and never missed an article.  I had my feeds nicely organized by subject, and kept on top of them.  Checking my feeds was a regular part of my day: quickly skimming titles for articles I was interested in and archiving items using other tools so that I could read them later when I had the time.

When I heard that Google Reader was getting axed I wasn't sure what I'd switch to.  I tried a couple of the alternatives that people posted (e.g. Lifehacker's Five Best Google Reader Alternatives), specifically Feedly and NetVibes, but they weren't quite what I was looking for.  NetVibes is good, but I had periodic synchronization issues (entire RSS feeds would become unread) and it only syncs 100 items at a time, so it isn't that useful offline.

I continued to use NetVibes for a month or so, and then it clicked: I was interacting with feeds a lot like email (and in fact Google used the same UI for both Google Reader and Gmail).  So I figured, why not just develop something that converts RSS feed items into emails, then use whatever email client I want to manage the articles?  Some of the advantages of managing feed items as emails are:
  • Email addresses are free
  • Email is sticking around and commonly used
  • There are numerous email clients
  • A lot of people are working on improving email clients and managing emails
Developing a daemon that read RSS feeds and generated emails from them had been on my to-do list for a while.  I knew it wouldn't be too difficult to do, because I had developed daemons that generated emails before (I wrote a simple one in Python that checked the stock status of the Nexus 4 every 5 minutes and sent me an email when it changed).

The much easier solution, which I came to before spending any development effort, was to use ifttt.  I already use it to forward specific G+ posts to Yammer, Twitter, etc., and it works great.

I created a recipe for each RSS feed I follow, and configured the recipes so that they're easy to manage within ifttt and the resulting emails are easily manageable.  Here are the steps I followed to get set up:
  • Sign up for a gmail account
  • Sign up for an ifttt account
  • Create a recipe for each RSS feed
    • Add RSS tag to recipe description (#RSS)
    • Add RSS feed name to recipe description
    • Trigger: RSS feed (need the URL)
    • Action: Send an email, prefix the subject with an acronym for the RSS feed
The prefixes in the subject allow you to quickly look through items and determine the feed they came from.  You can also use the prefix to sort/filter incoming emails and redirect them to folders (most email clients) or add the appropriate labels (Gmail).

Archiving items for reading later or reference is very easy to do with emails as well, which is another nice benefit.

Some final notes:
  • You can use an existing email address.  If you do, I suggest adding something in the subject line of the email that allows you to create a custom filter to separate the feed items from normal emails.
  • Keep the prefixes used in the subject short so that they don't dominate the subject line when looking at emails in your email client (especially on a small screen like your phone's)

2013-03-05

Remote Contractors - Lessons Learned



1. Executive Summary

A customer contracted us to update one of their applications.  Since the project was not technically challenging and involved a significant amount of repetitive effort, outsourcing was deemed to be low risk.
The project trialed outsourcing development effort to remote contractors in order to better understand what is involved and when outsourcing is beneficial.  Because of the potential to reduce development costs and maximize profits, as well as the highly repetitive nature of the project, it was decided that some of the development effort would be outsourced.
The following describes project details and focuses primarily on lessons learned while working with remote contractors.

2. Introduction

2.1.    Objectives

The purpose of the project was to upgrade an existing application.  The application utilizes a couple of different technologies and programming languages.  The required updates were clearly defined by the client.

2.2.    Approach

The approach was to outsource part of the project to contract developers based in India.  This was done as a trial, with the desire to exploit outsourcing in future projects.  The project was selected as an ideal candidate for outsourcing because the effort and tasks were well known and low risk.  
It was decided to outsource as much of the development as possible, while keeping any complicated development effort internal.  This was done because the design pattern for a significant amount of the effort is quite straightforward, so it can be easily documented, taught to contractors, and reviewed for adherence.
Other development effort was kept internal because there was a desire to increase the knowledge base of developers within the company, adding to their familiarity with the development environment and applications.  It was important to keep this knowledge and experience in-house, so specific, less repetitive tasks were not outsourced.  We also wanted to keep the more interesting and challenging work internal to benefit employees and to ensure it was done properly.

3.     Lessons Learned

There are several lessons learned from the project, including ones specific to outsourcing effort.

3.1.    Development Environment

Due to the limited overlap of business hours, it’s very easy to waste time getting development environments working.  Originally, the plan was to host a couple of Virtual Machines (VMs) locally and have the contractors access them remotely.  This would allow us to tightly control and monitor the development environment used by contractors, as well as limit their access to the end customer’s source code.  However, this did not work at all.  The latency that the contractors were observing made development extremely difficult: typing and mouse movements were delayed by approximately one-third of a second (based on pinging www.pluggd.in).
To eliminate the latency issue, VMs were packaged and made available to the developers.  In my opinion, remote contractors have to use local development environments for a majority of their work to be productive. 
I highly recommend using VMs to encapsulate the development environment.  If this approach is taken, the following steps can help the process go smoothly:
  • Have the VM packaged and available well before the start date of the contractors.
    • Transferring it will take a significant amount of time.
    • If possible, get them to transfer it over the weekend to minimize wasted hours.
  • Contractors should use a user account with the minimum permissions required
  • Thoroughly test the VM remotely to ensure it will work for the contractors right away, and for the entire project.  It is difficult to make changes to the VM after the fact
If a development environment is going to be provided to contractors that they can install, provide a detailed step-by-step installation guide, making it as simple as possible.

3.2.    Productivity and Task Assignment

Initially, assume that contractors require very specific tasks that include detailed descriptions.  Concisely specifying tasks minimizes assumptions and ensures that they are completed as desired.  Assumptions likely differ due to differences in education, culture and project familiarity.  After a while, once you become more comfortable with the work and capabilities of the contractors, the level of detail required in descriptions can probably be reduced.
Assume that contractors will take 3-5 days to get familiar with new development patterns.  During this period, progress will be slow.  Once familiar, the contractors will reach their full potential and productivity will peak.  This is most relevant for highly repetitive tasks.
From experience with this project, it’s safe to assume that contractors will work slower and will be less efficient than local resources.  Without doing significant investigation into the numbers, it’s assumed that the contractors were approximately 75% as productive.  This could be due to various reasons such as: language barrier; no direct supervision outside of their company; individual capabilities.  An exact number could be calculated by analyzing the estimated and actual hours of work items for various resources tracked in Rational Team Concert (RTC).
Expect to spend 30-60 minutes per day reviewing the work of the contractors to ensure quality.  Initially, 2-3 hours of review was required because there was a lot of feedback required.  This review time decreased to 30-60 minutes once they became familiar with the design pattern.
RTC was used to manage work items and as a code repository.  Although RTC has tools and workflows that are designed for reviewing and accepting developers’ code into the repository, only the reviewing functionality was used.  The reviewing functionality allowed us to efficiently examine proposed changes, including source code diffs with syntax highlighting.  The ability to accept and reject submitted source code changes would have been useful; however, it was not used due to lack of familiarity with RTC.
Due to the learning curve, keeping contractors working on similar and repetitive tasks was most efficient.  Each time a work item with a different work flow was started there was a learning curve that had to be overcome.  This included the contractor being less productive, as well as requiring more supervision, review and feedback.  Expect the learning curve to be larger than with local resources, likely due to less overlap in business hours, indirect contact and a language barrier causing miscommunications.
Be wary whenever a new task or workflow is being performed by contractors.  Review the results closely.  The contractors tended to be good at repeating tasks, however struggled with new tasks that required undefined design patterns.  For example, they struggled with extending development patterns to other source code languages.  Design patterns had to be clearly defined and documented.  Then, the contractors were trained and their initial work closely monitored.
Getting the contractors to do testing and validation work didn’t work very well.  They missed some relatively obvious defects introduced during the updates, both in the user interface and the business logic of the application.  It must be noted though, that they weren’t pushed very hard or given a lot of time for this work.  Given more time and direction, they may have been more successful.  Last, they may have had difficulties with testing and validation work because they didn’t fully understand the applications.  Further training would have helped with this.

3.3.    Communication

The contractors were provided with a class and methods that encapsulated most of the changes required.  This greatly simplified their training, development workflow and effort.  It also minimized their changes, which in turn reduced the possibility of introducing bugs, straying from the design pattern, and our source code review effort.
A design document was also provided to the contractors, clearly detailing their aspect of the project, the development environment and their work flow.  Examples of all the source code that required updating were provided in the document.  However, the design document alone was not enough.  A screen sharing session with a Skype conference call was used to step through the workflow they were to follow.  The screen sharing session was definitely required, and was likely more effective than the design document.
As with managing any inexperienced or unfamiliar resource, it was important to frequently follow up with the contractors to ensure that they were working on tasks, keeping busy and hadn’t hit a roadblock.  Often roadblocks were easily solved by a second set of eyes, yet resources tended to avoid asking for help or advice and remained unproductive.  It was made clear that they should contact us whenever they ran into issues and/or had nothing to work on.  This was fundamental to keeping the developers productive.  Note that it is still important to regularly check up on them, because sometimes they don’t realize there is a problem, or continue working without realizing they are astray.
It is important to provide all possible contact information, along with the situations in which to use each communication path.  Several methods of communication were provided, including phone, Skype calling, Skype IM, email and comments on work items in RTC.  Each has advantages and disadvantages; it’s important to use them effectively and for the contractors to feel that you’re easily accessible.
At times the contractors chose the wrong method of communication.  Therefore, it’s important to correct them ASAP and clearly define the method of communication that is appropriate for each situation or problem.  For example, adding a comment to a work item is great and highly recommended for documentation purposes.  However, if the contractor is blocked and can’t continue working until the comment is resolved, adding a comment to a work item is not adequate.  Feedback on the comment could be delayed, leaving the contractor waiting and unproductive.  Therefore, some other form of communication that guarantees immediate feedback should also be used.

3.4.    Online Project Management Software

Online project management software is essential when working with remote contractors.  To properly manage resources and keep them productive, documenting, assigning and queuing tasks is required.  Documenting the tasks reduces the need for direct communication, and queuing tasks ensures there is a backlog of work to keep resources busy.  Both are important because there is a reduced number of overlapping work hours that allow direct communication regarding issues and new tasks.  Maintaining a queue of tasks keeps developers busy even when there is no time to assign more work due to higher priorities and distractions.
Repositories of tasks that include details and estimated effort, are assigned to developers, and are regularly updated also allow managers to monitor progress and productivity on their own time without interfering with developers.  Such a repository is also a relatively good documentation tool.
It is good practice to maintain a queue of tasks for resources to work on so that if they hit a roadblock and others are not available to help remove the roadblock, developers will not run out of tasks to work on.  They can document the roadblock and request help, then move onto another task while waiting for feedback.  If appropriate, it also allows developers to select tasks from a shortlist of equal priority tasks.  This gives them a bit of variety and empowerment, helping to keep them interested in the project and work when they need a change of pace.
When using online project management software, it is imperative that all resources involved in the project use the repository and keep it up to date.  Otherwise the repository becomes dated and useless, and can’t be used for documentation purposes.
The online project management software used for the project was RTC.  It was more than adequate for the project, and provides several features that were not exploited.  It was only used for source control and to create and manage work items.  Arguably, it was overkill for the size and requirements of the project, and at times seemed slow and cumbersome.
Some alternatives to RTC are Redmine, Trac and Trello.  Redmine and Trac are both open source free alternatives that can easily be self-hosted.  They provide the ability to manage and schedule tasks, as well as integration into external source code repositories.  Trello is a hosted solution that seems to be well suited to Agile development.  The interface is very intuitive and easy to use, although it does lack effort tracking, and general reporting and status features.  Last, since Trello is not internally hosted, before using it you should review their privacy statement and avoid including sensitive information.

4.     Conclusion and Recommendations

There is a lot of potential in reducing costs by outsourcing development to remote contractors.  However, overlooking the added management and learning curves could be costly.  For the project involved, the development outsourced was primarily focused on updating an existing application, which involved relatively simple work flows and the development was highly repetitive.  By doing so, management, learning curves and code reviews were minimized.  The contractors were involved with a very limited amount of development that was more technically challenging, so it’s difficult to determine how effective outsourcing this type of development would be.
Due to latency issues, it should be assumed that remote contractors must develop using local environments.  The communications latency present when contractors access remote environments is far too great to be productive.  So that a project utilizing outsourcing starts more smoothly in the future, a complete development environment should be created, verified, packaged and delivered to the contractors prior to their starting date.  Development workflows and expectations should be well documented and provided to the contractors as well.  Furthermore, these should be reviewed with each contractor via a phone or Skype call, and a screen sharing session.
Online project management software is a necessity due to the geographic distribution of the project team and the limited overlap of business hours.  It allows assigning tasks to developers, creating backlogs of tasks to keep developers busy during non-overlapping business hours, along with the ability to clearly define implementation details.  This allows managers to keep resources tasked throughout the project.

2013-01-18

Tips to Maximize Productivity - Cygwin



Cygwin is a free software project that provides a collection of tools offering a Linux-style console environment on Windows.
When internationalizing an application, matching keys in the source code with lines in properties files is a relatively large task.  Going through the entire application for each language, and ensuring all properties referenced in the source code are also listed in the property files, is time-consuming.
A quicker and more efficient method of finding missing keys is to write scripts that parse the source code.  Scripts that extract keys from the source code and match them against the keys located in the property files can be written using Cygwin and some creativity.  Cygwin provides a command prompt that gives access to a collection of common Linux/UNIX tools.  By employing tools including find, xargs, cat, grep, sed, wc, sort and diff, mismatched keys can easily be found.  This includes keys located in either the source code or the property files but not the other, as well as duplicate keys.
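
As a small illustration, duplicate keys in a property file can be listed with a quick one-liner like this (using the property file from the script below):

sed 's/=.*//' PIOS_en_CA.properties | sort | uniq -d
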
These tools can be used for other tasks as well:

  • Searching projects for strings
  • Counting instances of specific strings for estimating effort
  • Listing files that require translation so that they can be added to tasks/work items
    • Calculating line counts of the files to estimate effort
  • Generating repetitive script files that apply changes to a database
  • Mass renaming of files (see the sketch after this list)
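
A sketch of such a mass rename (the extensions are hypothetical):

# rename every .txt file to .text
for f in *.txt; do
    mv "$f" "${f%.txt}.text"
done
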
The following is an example of one such script, full of relatively complicated commands, that extracts the keys used within properties files.  getEscapedHTMLString and getEscapedJavaScriptString are the methods that are called every time the properties file is accessed.  The script took several attempts to get correct and was built slowly, piece by piece.  The result is a sorted list of all the property keys referenced in the Java source.  I added echos throughout to display progress.

#!/bin/bash
# copy Java files
echo Cleaning local directory...
rm -f *.java *.properties PROPERTY_* JAVA
echo Copying Java source files...
find /cygdrive/c/src/ | grep "java" | xargs -I file cp file .
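# don't parse the internationalization helper class itself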
rm Internationalizer.java
# copy properties files
echo Copying properties files...
cp /cygdrive/c/src/XXX*.properties .
# get all property entries from Java source into file JAVA
echo Parsing Java source files...
cat *.java | sed "s/in\.get/\nin\.get/g;s/\")/\")\n/g" | grep "getEscape\|getPlain" | sed "s/in.getEscapedHTMLString//g;s/in.getEscapedJavaScriptString//g;s/in.getPlainString//g;s/(\"//g;s/\"[ ,a-zA-Z0-9)]*)[\;]*//g"  | sort > JAVA
# get all property entries from the properties file into file PROPERTY_EN
echo Parsing properties files...
cat PIOS_en_CA.properties | sed "s/=.*//g" | sort > PROPERTY_EN
# Output properties missing in properties file
echo Missing English Properties \(PIOS_en_CA.properties\)
diff --ignore-blank-lines --ignore-all-space JAVA PROPERTY_EN | grep "<"
echo \*\* PROPERTY KEYS THAT ARE USED MULTIPLE TIMES IN JAVA SOURCE WILL BE REPORTED \*\*
echo Removing copied files...
rm -f *.java *.properties PROPERTY_* JAVA

I used this script frequently to match keys referenced in the source code with property file keys.  This ensured I wouldn't get exceptions during execution, and quickly scanned for common typos.
It's also really useful for estimating effort.  In the same project, translation effort was estimated by the number of lines of code that needed internationalization.  I took the complete list of files and broke them down into functional categories to make testing/verification easier.  Then, to quickly determine the number of lines (which was later converted to effort in hours), I ran the following, which outputs the total number of lines:

cat file1.java file2.java file3.java file4.java | wc -l

The last, and another relatively simple, example is breaking a massive log file into pieces.  A client didn't have rotating logs implemented on their server, which resulted in a 2GB log file.  Obviously the log file is much too large to open and analyze as-is.  However, by using split it's really easy to break the file into multiple manageable files.
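
A minimal sketch (the log filename and piece size are assumptions):

# break a 2GB log into 100MB pieces named server.log.aa, server.log.ab, ...
split -b 100m server.log server.log.
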
The nice thing is, all of this is command-line based, so it's resource efficient and generally won't tie up your computer.  It's also extremely easy to save bash scripts for reference or repeating execution.