Saturday, May 3, 2014

The offshoring cost delusion

At face value it is always going to look more cost-effective to perform labour, irrespective of the industry, in countries where labour costs are cheaper. It is a hard fact to argue against, almost impossible really, and it has been happening for years across industries such as manufacturing and banking. If you can charge out a resource at a third of the price of an onshore resource, that is a compelling proposition not just for you but for your customers as well. With respect to IT, for offshored managed service work, packaged software implementations, helpdesk and call centre operations (the kind of work that is governed by prescriptive processes and made up of repeatable, measurable steps, i.e. less need to “think” and more to just “do”), it usually is cheaper. That is the proposal being sold by companies that can offshore, and you really can’t argue with it. But when that same line is used for bespoke software development the situation is different. Very different.

Bespoke application development requires a different set of skills and qualities – ones that aren’t confined to an organisation’s or team’s competency with the technologies needed to deliver a solution. It requires you to think laterally, logically, and about the solution in its entirety. You need to be able to think outside the box to get things done, whether you are a developer, tester or architect – there are no distinctions. There are always grey areas and hurdles to overcome, which makes the work unpredictable, so a lot of what governs and underpins your ability to deliver successfully is the set of processes that prescribe how you identify, resolve and future-proof those areas of uncertainty.

In the bad old days of waterfall this uncertainty would always freak people out. Because you released in a “big bang”, the architecture and design tended to be gold-plated inside and out from the start (whether it was necessary or not), accompanied by endless reams of documentation that would get reviewed once and then dropped somewhere in a CMS, never to see the light of day again. The really big waterfall projects would hardly ever make a delivery date because the big-bang approach to deployments meant getting everything installed, configured and integrated correctly in the testing and production environments could waste days or weeks. Then Agile came along and we started to see the light that little bit better by releasing smaller and building integrity into the product.

When application development work started to get moved offshore in Australia around the early 2000s, thanks to a big push from the banks and the big telcos, every problem we faced doing traditional waterfall got exacerbated. Despite the best efforts of the onshore teams, the offshore teams would struggle to understand the requirements due to the highly complex nature of the systems that had been produced. Architecture gold-plating (driven by big-bang unpredictability) and the need to wade through all the documentation just to understand what the hell was going on confused people a great deal. It would take months to understand it all, yet the offshore teams were (stupidly) expected to come up to speed in a matter of weeks – as promised by the glittering PowerPoint presentations of the high-end consultants. They never did, of course – no one could. A big part of the “sell” from these big offshoring companies was their sophisticated training centres that claimed to be able to train up resources to meet demand anywhere in the world, a claim that sounds as ridiculous now as it did then.



I worked on a project such as this once; here is my story

The architecture of the system I worked on was highly complex. Ridiculously so. It was probably the worst candidate for offshoring that I have seen in all my years working in IT. When the system was first transitioned offshore it collapsed in spectacular fashion, because the offshore teams had neither the domain knowledge nor the deep technology experience to understand the system and to change, enhance and support it correctly. All the delivery dates for new initiatives were missed, everyone worked late nights, and after a few months the onshore resources no longer even tried to hide their hatred towards the company, management and the poor offshore resources themselves (who were doing their best). The client was furious at the missed dates and the costs blew out badly. In this early stage numerous people had to travel to India to rescue the project, at great cost to the company. In short, every promise of improved quality, productivity and cost was broken.
The onshore teams got badly burnt by the late nights spent making up for offshore’s mistakes, which dragged down workplace morale. To make matters worse, as schedules slipped and the money stopped coming in, the company started cutting costs to compensate for the project blowouts. Even beers on Fridays were cancelled (in an Australian workforce that is like inviting mutiny upon yourself), and morale got worse and worse.

Down the rabbit hole of fail we go....

Because the offshore teams were failing so badly, causing much finger-pointing and blaming on both sides of the fence, the architects and designers (under a considerable amount of pressure) started to gold-plate their systems and designs even more to remove any trace of ambiguity – and to cover themselves in the face of any management wrath. The designs became so complex that one of the architects remarked years later, “we made it so generic it actually became unusable”! As a result, estimates to deliver the software doubled, even tripled, to compensate for the delivery problems, as no one was crazy enough to make promises based on pre-offshoring estimates.

The client got even more upset at the cost blowouts (because they had been promised things would get cheaper), and managers, who had already worn out any goodwill with their architects and tech leads, could not get anyone to budge on estimates. So, under pressure, they started over-promising to the clients, hoping and praying things would improve.

And of course they didn’t.

Now in a total panic, senior company management implemented more knee-jerk responses to stem the cost bleeding because they were losing control. To appease investors alarmed at the spiralling costs they made even more cutbacks and produced lots of spin, both internally and externally, to paint a better picture of the situation, “sell” the new offshore delivery model and pretend things were going well. This was in total contrast to the experience on the ground and made the whole situation even worse, as it bred a massive amount of contempt amongst those doing the work. Anonymous blogs even sprang up that mocked the company and told many home truths, which the company scrambled to have shut down and blocked by ISPs and network providers! The environment became incredibly toxic and hostile, and people left the company in droves, fed up with how badly everything had been managed. Those that were left jumped back on the plane for extended stays in India, but then had to fight tooth and nail to claim back the expenses they incurred, as the company kept cutting more and more corners to compensate for the cost blowouts eating into its capital.

It was an absolute nightmare, mismanaged from top to bottom. The initial cost-cutting set the scene for failure, and as we went further down the spiral, more cost-cutting just exacerbated the issues.



So how do you get it to work?

There is a lot of literature and some great publications now around the DevOps movement – a movement born out of Agile to integrate development and operations into a single flow-through work process that enables fast integration of code, automation of virtually everything and continuous one-click deployments, to name but a few desirable capabilities. If you aren’t DevOps capable, don’t even think about doing bespoke application delivery out of another country. For a start you can’t be as Agile as you can be onshore, so you’re already at a disadvantage when it comes to team collaboration and working closely with the business. And if offshore starts spruiking cheaper labour costs as a reason not to invest up-front in DevOps practices such as automated testing, code quality checking, continuously deployable pipelines and coded infrastructure – because they’ll just “assign cheaper resources to review and validate manually” – then start scanning the job classifieds, because you are going to go down.

Move it all to the cloud and automate the shit out of it

You have to invest in your offshore delivery centres, more so than onshore, and you have to do it well. For a start, avoid physical infrastructure and put all your development and testing in the cloud. Before you do that, ensure all your servers and environments are coded up, meaning you can restore and rebuild entire environments at the click of a button. It is not impossible – yes, it will take time – but the long-term benefits will protect and insure you against failure. This must be completed and in place weeks before development even starts, so the environments are built, properly monitored, tested and cost-forecasted.
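To make “coded up” concrete, here is a minimal sketch of rebuilding a test environment from scratch, assuming AWS, CloudFormation and the AWS Tools for PowerShell (the approach I describe in an earlier post). The stack name, template path and polling interval are placeholders of mine, not a prescription.

    # Minimal sketch: tear down and rebuild a test environment from a
    # version-controlled CloudFormation template. Names and paths are placeholders.
    Import-Module AWSPowerShell
    Set-DefaultAWSRegion -Region ap-southeast-2

    $stackName    = "test-environment"             # hypothetical stack name
    $templatePath = "C:\infra\test-env.template"   # template held in source control

    function Test-StackExists([string]$name) {
        try   { Get-CFNStack -StackName $name | Out-Null; $true }
        catch { $false }
    }

    # Remove the old environment if it is still there
    if (Test-StackExists $stackName) {
        Remove-CFNStack -StackName $stackName -Force
        while (Test-StackExists $stackName) { Start-Sleep -Seconds 30 }
    }

    # Rebuild it from the template and wait for the result
    New-CFNStack -StackName $stackName -TemplateBody (Get-Content $templatePath -Raw)
    do {
        Start-Sleep -Seconds 30
        $status = (Get-CFNStack -StackName $stackName).StackStatus
    } until ("$status" -like "*COMPLETE*" -or "$status" -like "*FAILED*")

    Write-Output "Stack $stackName finished with status $status"

The point is not the specific tooling; it is that an environment you can rebuild with one command is an environment nobody is afraid to throw away.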

Hire properly offshore

No graduates.

If the people on your delivery project do not have a minimum of 5 years technical delivery experience then do not hire them. In some countries there is a culture of “it’s okay, they got good marks, they’ll learn as they go” and in IT this theory simply does not work. You need delivery experience, you need to be technically proficient and you need to have experience working with other cultures in other countries. Good people cost money, offshore is no different, but offshore it is even more critical because you really need smart and motivated people that can think on their feet.

Automate quality checking to the nth degree

And that means everything: unit test code coverage above 90%, code metrics checking for cyclomatic complexity, lines of code, class coupling, maintainability and anything else you can throw at it. This all has to be automated on check-in. Furthermore, database script changes should be tested against database rebuilds, and selective automated test execution for critical end-to-end functions should be thrown in as well.
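As a sketch of what one of those check-in gates might look like, here is a small PowerShell fragment that fails the build if unit-test line coverage drops below 90%. It assumes a Cobertura-style coverage.xml (with a line-rate attribute on the root element) produced by whatever coverage tool you use; the path and threshold are placeholders.

    # Check-in quality gate sketch: fail the build when coverage drops below
    # the agreed threshold. Assumes a Cobertura-style coverage.xml report
    # (line-rate attribute on the root <coverage> element); path is a placeholder.
    $coverageFile = "C:\build\artifacts\coverage.xml"
    $threshold    = 0.90

    [xml]$report = Get-Content $coverageFile
    $lineRate    = [double]$report.coverage.'line-rate'

    if ($lineRate -lt $threshold) {
        Write-Error ("Unit test coverage {0:P1} is below the {1:P0} gate - failing the build." -f $lineRate, $threshold)
        exit 1    # non-zero exit code fails the CI job
    }
    Write-Output ("Coverage gate passed: {0:P1}" -f $lineRate)

The same pattern applies to the other metrics: parse the report, compare against the agreed limits, and fail fast before the check-in gets any further.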

Use code reporting tools in conjunction with build and code management software as much as you can. Produce reports and get them delivered to delivery managers daily so they can see how things are tracking.

Test the hell out of it often

The full suite of automated tests should run every night, as a minimum. Automation testers should start on day one, as should DevOps people. No project should be without this. The capability to extend and support the test suite must work hand-in-glove with development and functional testing to amplify the feedback loops and keep everything rigorously tested.
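For illustration, the nightly job can be as simple as a scheduled Jenkins job calling a wrapper script like the one below, which runs every test assembly it can find and fails loudly if anything breaks. The vstest.console.exe location and the *.Tests.dll naming convention are assumptions of mine, not requirements.

    # Sketch of a nightly "run everything" wrapper for a scheduled Jenkins job.
    # The runner location and the *.Tests.dll convention are assumptions.
    $testRunner = "C:\Program Files (x86)\Microsoft Visual Studio 12.0\Common7\IDE\CommonExtensions\Microsoft\TestWindow\vstest.console.exe"
    $testDlls   = Get-ChildItem "C:\build\output" -Filter "*.Tests.dll" -Recurse |
                  Select-Object -ExpandProperty FullName

    & $testRunner $testDlls /Logger:trx

    if ($LASTEXITCODE -ne 0) {
        Write-Error "Nightly automation run failed - see the trx results attached to the Jenkins job."
        exit $LASTEXITCODE
    }
    Write-Output "Nightly automation run passed."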

Get onshore and offshore passports and visas ready

People need to travel and they need to travel often. If you want to have any hope of doing offshore delivery successfully you have to do this. There should be back-and-forth travel occurring every two weeks for the duration of the project. Extended stays should not happen; two-week stints are enough. This ensures the teams are always working together and stops the “us and them” situation from developing. The more people work together, the better they work together, and travel should be happening at every level, from PMs through to the most junior team members.

Do not avoid doing this. Do not think that travel at the start of the project to cement the understanding is all that is needed with travel at the end catered for to “bring it all home”. This does not work because there is all that time in the middle where the problems occur. I have seen this situation occur numerous times.

Summary

Cost-cutting up-front without investing properly in transitioning, training, infrastructure etc… is incredibly stupid when offshoring application development and it is even more disastrous when performed on projects in-flight when things start to go bad. Offshoring is not going to be cheap in the short term, and it is not going to be cheap in the long-term if you try and cut corners. You have to invest and invest heavily to get it to work from day zero in every area.

In my experience the developers, testers, DevOps people and architects (mostly) get the need for all this extra overhead, both onshore and offshore. And it is not a one-off thing: it is a cycle of investment, from project to project, that must be repeated constantly to refine and enhance your delivery standards so they are continuously improved. The first project will be the worst – it will cost you way more than you planned – but if you stick with it and maintain the investment and support, the costs will start to come down. You have to retain this focus; the more you keep improving quality, the faster and faster you will get. Doing bespoke application delivery between countries is not easy and it is not, and never will be, fundamentally cheaper. As soon as you realise this, and budget your projects accordingly, you will have a hope of succeeding.

Friday, April 18, 2014

Offshoring - lessons and observations from the coal-face

IT is all about adaptation and change; if you can’t handle it, you’re in the wrong industry.

October 2013. My fifth trip to India and, like my previous trips, it was as eventful, colourful and fun as ever. I really love visiting this country: the people are really cool, the food hot and the culture vibrant. It is a great place to visit for work in IT and I have had many wonderful experiences there. This series of blog posts deals with the major observations I have made of IT delivery projects in India, and the challenges I have encountered over the last 10 years being involved with them, initially as a .NET/COM developer and later in solution architecture and DevOps.

First up, let me say that there are a lot of great things about offshoring that work, and work very well. Managed services work, call centres and help-desks are great candidates for offshoring because, by and large, the processes they follow are (usually) well defined and the steps to resolve and assist with issues are highly prescriptive. Application delivery, however, is a very different beast and this is what I am going to concentrate on.

In a typical IT application delivery project you are always going to require some base, non-negotiable fundamentals for success such as:
  • Strong management support to streamline and plan work
  • Close collaboration and good working relationships between teams – in Agile that goes without saying
  • Good architecture and well-defined requirements – complemented by governance processes to manage gaps and change.
  • People who can think laterally to solve issues and solve them early, quickly and efficiently
  • Good investment in development and environment infrastructure to ensure there are no excessive downtimes waiting for software to build and deploy.
  • An embedded culture of continuous improvement and automation to future-proof against risk and maintain a high quality of work
  • Plenty more but that will do for now.

Running a delivery project offshore is no different. What is amazing, though, is how poorly the items listed above can be done when their importance is scaled down or corners get cut in an effort to rein in costs.

Cutting corners on an IT delivery project is always a very stupid thing to do, but this is exacerbated so much more when you throw offshore delivery into the mix. Simply put, if you believe that delivering software offshore for your customers will be fundamentally cheaper for you as a business than doing it onshore, or that you can cut corners by not investing heavily in supporting infrastructure and staff, reducing travel budgets, or not hiring experienced architects, testers, developers and DevOps staff because people will just “work harder” or “work smarter”, then you are delusional.

One of the most important things for success is effective communication between teams and the right infrastructure to support it, such as video conferencing, tools like Lync and frequent travel back and forth. Failure to do this creates an “us and them” culture that spreads like a virus. If it is not addressed, the environment can turn very toxic as the bad blood spreads from the teams at the lower end of the spectrum, who are usually the most fearful of losing their jobs, and bubbles up to the top, causing reactionary and defensive tactics on both sides to avoid accountability.



Typically the end result is that onshore teams will try their hardest to put down, discredit and (in extreme cases) deliberately sabotage work done by offshore. Conversely, offshore teams become hell-bent on proving how much better they are than the onshore teams by constantly raising issues, escalating even the most trivial problems to management and being deliberately vague and obtuse, even lying, to avoid doing work or taking responsibility.

In short, nobody wins, and if these toxic attitudes are left unchecked to fester then all hell breaks loose, projects fail, and people leave the company. I’ve seen numerous cases where onshore staff turn highly aggressive in telephone conferences as the pressure builds, finger-pointing emails begin flowing back and forth, and everything descends into an ugly mess. And all this within the same company – thankfully clients were never witness to things like this!

 And it is really stupid because it is so easy to rectify by doing a couple of simple things such as:
  1. Have a lot of onshore/offshore travel so the teams mix and integrate. Once every few months is not enough, people should be flying back and forth every two weeks so relationships are formed and people get comfortable with each other.
  2. Don’t place employees on these projects who have difficulty dealing with offshore teams. For some people the cultural barriers are just too much to overcome and they will be affronted by the feeling of being forced to adapt and change “their ways” to suit others. These people need to be weeded out immediately.

But considering this stuff has been documented a thousand times over, and a lot better than I can do it, I’ll instead delve into some other areas.

Coming up next time: The Cost Delusion

Wednesday, April 2, 2014

AWS, PowerShell and Jenkins – your complete cloud automation management solution

I had the opportunity to set up a complete DevOps architecture for a big onshore/offshore (Australia/India) project recently, and among the many tasks I was set was that the entire development environment (source control, builds etc.) and the test environments (automation test, functional test, performance test and showcase) had to be hosted in AWS in Sydney, within a VPC, and secured.

First up great! This was music to my ears, no more stuffing around with physical machines and fighting death cage matches with support people to get hardware upgraded. I could control the environment, the domain, basically everything.

So over the next 9 months I toiled away and came up with what I think was a really good solution, a fair bit of which I detail below. Going into the total ins and outs of it would be akin to rivalling War and Peace, so I’ll concentrate on the important parts, namely how I got the most out of the AWS SDKs.

The setup

Setting all this up initially took a lot of trial and error. You really cannot do this kind of thing without properly planning how your VPC will be set up. Security groups, subnets, routing tables, ACLs etc. – there is a bit to get your head around, but having said that, this excellent blog post sums it all up nice and quick:  Get your head around that and you’re well on your way to nailing this stuff.
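For a flavour of the building blocks involved, here is a rough sketch using the AWS Tools for PowerShell to stand up a VPC, a subnet and a security group that only allows RDP in from a corporate range. The CIDR blocks, names and source range are placeholders, so plan your own addressing before running anything like this.

    # Illustrative only: basic VPC building blocks via the AWS Tools for PowerShell.
    # CIDR ranges, names and the RDP source range are placeholders.
    Import-Module AWSPowerShell
    Set-DefaultAWSRegion -Region ap-southeast-2    # Sydney

    $vpc    = New-EC2Vpc -CidrBlock "10.0.0.0/16"
    $subnet = New-EC2Subnet -VpcId $vpc.VpcId -CidrBlock "10.0.1.0/24"

    # Security group that only allows RDP in from a (placeholder) corporate range
    $sgId = New-EC2SecurityGroup -VpcId $vpc.VpcId -GroupName "rdp-gateway" `
                                 -GroupDescription "RDP access from the corporate range"

    $rdpRule = New-Object Amazon.EC2.Model.IpPermission
    $rdpRule.IpProtocol = "tcp"
    $rdpRule.FromPort   = 3389
    $rdpRule.ToPort     = 3389
    $rdpRule.IpRanges.Add("203.0.113.0/24")        # placeholder corporate CIDR

    Grant-EC2SecurityGroupIngress -GroupId $sgId -IpPermission $rdpRule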

After a week and a few long nights we had Active Directory set up, groups and user accounts provisioned, and we had come to grips with the Remote Desktop gateway server and the NAT server. At this point we also started campaigning long and hard to get our support team to set up a VPN between the corporate network and our AWS VPC and trust the two domains. Eventually, after two months of emails, phone calls, risk escalation and intensive nagging, we got 4 hours of the support guys’ time to set it up. You don’t have to say anything at this point, I know what you are thinking, and yes, it is true: we started saving time immediately.

So cool, we now have the AWS VPC set up, I can RDP to the AD machine straight from my local desktop, and I have created a Windows Server 2012 Core image to build all my machines upon.

Next hurdle: how do I manage the infrastructure and categorise it?

Experience tells me that if I had just started creating images everywhere at the whims of developers, testers and architects I would have had a hideous mess on my hands by nightfall. Plus I still needed to set up TFS for source control, builds and project work tracking, so of course that means SQL Server too.
In short, I needed a way to categorise my instances so I could control them – enter AWS metadata tags. This very simple feature allows you to “tag” an instance with whatever key/value pairs you like. Create 1 or create 100, it doesn’t matter (well, creating 100 is probably going to be a pain, but you get the idea). A couple of hours of putting thoughts to paper, a meeting and a quick chat later, we came up with a set of tags to categorise our instances:

  • Core – always on, candidate for reserved instances
  • DevInfra – Development Infrastructure – almost always on, 20 hours a day minimum.
  • TestInfra – Testing Infrastructure, on for about 16 hours a day
  • DemandOnly – Demand instances only, manual startup, always shut down every day if running
We added a couple more over the journey, but these four are certainly good enough to get most stuff off the ground. A quick sketch of what tagging (and querying by tag) looks like follows.
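This is roughly the idea with the AWS Tools for PowerShell; the tag key “Category” and the instance id are placeholders, not necessarily what we used.

    # Tag an instance with its category, then query everything in that category
    # so scheduled jobs can act on it. Key name and instance id are placeholders.
    $tag       = New-Object Amazon.EC2.Model.Tag
    $tag.Key   = "Category"
    $tag.Value = "DevInfra"

    New-EC2Tag -Resource "i-0123456789abcdef0" -Tag $tag

    # Later: find every DevInfra instance (e.g. to start or stop it on a schedule)
    $devInfra = (Get-EC2Instance -Filter @{ Name = "tag:Category"; Values = "DevInfra" }).Instances
    $devInfra | Select-Object InstanceId, State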

So now we have TFS installed, developers are developing, builds are building, delivery managers are setting up work items and…. you get the idea.

Next hurdle, how to automate the shit out of everything so that I keep costs down?

Firstly, I did not want to have to worry about checking startup, shutdown, backups, snapshots and so on all day. I needed a way to set up a machine with the right software that would let me schedule automation jobs, keep a history, and work with Windows operating systems and the AWS .NET SDK – oh yeah, and I didn’t want to pay for it either.

There are a number of ways to skin this pussy cat, but I combined a bunch of modularised PowerShell scripts and ran it all through Jenkins.

Why PowerShell?

Because it’s all built on Windows. If you’re not using PowerShell to build up and configure your Windows machines you’re doing it wrong.

Why Jenkins?

I know the product well (always good to stick with known knowns) and it really is a great tool with good online support. It enables scheduling of jobs that can run virtually anything, it can build software, it can chain jobs together into pipelines, and it has a ton of plugins too. Sure it’s a Java tool, but only an idiot would assume you can only look for answers in the Microsoft world.

The end result

After a month of solid scripting and testing I had created enough PowerShell scripts and functions to do the following with the instances in my VPC, all controlled through Jenkins using the metadata tags (a simplified sketch of one of these jobs follows the list):

  • Startup and Shutdown of instances
    • Core on all the time, DevInfra on 20 hours per day, TestInfra on 16 hours a day
  • Snapshots – Core and DevInfra snapshots are created every day
  • S3 Database Backups – All my database full backups that ran every night were copied to S3
  • Redundancy – New snapshots created were also copied over to the US West Region every night
  • Environment rebuilds – Cloud Formation scripts ran every night to rebuild the test environments so we had a totally clean machine to deploy to daily
  • AWS Cleanup - I created jobs to clean up S3 and instance snapshots once they got older than a couple of weeks
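Here is a heavily simplified sketch of what the tag-driven shutdown job boiled down to. The tag key, the category name and the schedule are placeholders, and the real scripts also handled start-up, snapshotting and error reporting.

    # Simplified sketch of a Jenkins-scheduled shutdown job: stop every running
    # instance in a given category. Tag key and category are placeholders.
    param(
        [string]$Category = "TestInfra"
    )

    Import-Module AWSPowerShell
    Set-DefaultAWSRegion -Region ap-southeast-2

    $instances = (Get-EC2Instance -Filter @(
                      @{ Name = "tag:Category";        Values = $Category },
                      @{ Name = "instance-state-name"; Values = "running" }
                  )).Instances

    foreach ($instance in $instances) {
        Write-Output "Stopping $($instance.InstanceId) ($Category)"
        Stop-EC2Instance -InstanceId $instance.InstanceId | Out-Null
    }

Jenkins just runs a job like this on a cron schedule per category, which is why adding a new instance to the regime was simply a matter of tagging it correctly.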

The best part about this solution was that if we added new instances we just tagged them appropriately and then all the maintenance of the startup, shutdown, snapshotting took care of itself. Matter of fact I stopped looking at it after a couple of months as it all ran like clockwork.

We even got really clever with it, for example shutting down TFS build agents when the number of queued builds was low (by polling the TFS API services) and then starting them back up when builds queued up again, and so on. We also extended Jenkins to do software builds for the purpose of running Sonar over the top of them, and then to create deployment pipelines so that the testers could self-serve their own environments.

Couple of things I learned along the way

CloudFormation can be a pain in the butt for Windows. When it works for you it’s beers all round; when it doesn’t you’ll be swearing long and hard into the night getting that goddamn, effing server to join the friggin domain! And if you’re using SharePoint, don’t use CloudFormation with it. Matter of fact, find the guy that recommended SharePoint as part of the technical solution and slap them – it’s a horrendously painful and complicated product to set up and it does not play nice with CloudFormation, or with re-attaching volumes from snapshots either. SharePoint – I hate it. There, I said it.

Conclusion

Using AWS for your dev and testing is an absolute win over the more traditional methods (physical servers – argh! – and VMware hosts forever running out of capacity).

Sure it will take time and investment in your DevOps staff to plan for and use it appropriately (show me a new infrastructure technology that doesn't need it), but the payoff in being able to completely automate your environments, increase/decrease resources as you need, scale up instances (such as your build servers when they are starting to run hard) and the lower TCO is impossible to ignore. Best of all once you have done it once, and done it properly, you can reuse a lot of what you created for other projects and clients.