Wednesday, April 2, 2014

AWS, PowerShell and Jenkins – your complete cloud automation management solution

I recently had the opportunity to set up a complete DevOps architecture for a big onshore/offshore (Australia/India) project. Amongst the many tasks I was set, the entire development environment (source control, builds etc.) and the test environments (automation test, functional test, performance test and showcase) had to be hosted in AWS in Sydney, within a VPC, and secured.

First up: great! This was music to my ears. No more stuffing around with physical machines and fighting death cage matches with support people to get hardware upgraded. I could control the environment, the domain, basically everything.

So over the next 9 months I toiled away and came up with what I think was a really good solution, a fair bit of which I detail below. Going into all the ins and outs of it would rival War and Peace, so I'll concentrate on the important parts, namely how I got the most out of the AWS SDKs.

The setup

Setting all this up initially took a lot of trial and error. You really cannot do this kind of thing without properly planning how your VPC will be set up. Security groups, subnets, routing tables, ACLs etc.: there is a bit to get your head around, but having said that, this excellent blog post sums it all up nice and quick. Get your head around that and you're well on your way to nailing this stuff.
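
To make the subnet planning step a little more concrete, here is a minimal Python sketch (the real work was all PowerShell; Python is used here purely for illustration) that carves a VPC CIDR block into equally sized subnets with the standard ipaddress module. The 10.0.0.0/16 range, the /24 subnet size and the tier names are assumptions for the example, not the layout we actually used. The one AWS-specific detail is that every subnet loses five addresses to AWS (the first four and the last one).

```python
import ipaddress
from itertools import islice

# AWS reserves five addresses in every subnet: the first four and the last one.
AWS_RESERVED_PER_SUBNET = 5


def plan_subnets(vpc_cidr: str, new_prefix: int, tiers: list) -> dict:
    """Give each named tier its own /new_prefix subnet carved from vpc_cidr, in order."""
    vpc = ipaddress.ip_network(vpc_cidr)
    # subnets() is a generator, so islice only materialises the subnets we need.
    chosen = list(islice(vpc.subnets(new_prefix=new_prefix), len(tiers)))
    if len(chosen) < len(tiers):
        raise ValueError(f"{vpc_cidr} is too small for {len(tiers)} /{new_prefix} subnets")
    return {
        tier: {
            "cidr": str(subnet),
            "usable_hosts": subnet.num_addresses - AWS_RESERVED_PER_SUBNET,
        }
        for tier, subnet in zip(tiers, chosen)
    }


if __name__ == "__main__":
    # Hypothetical tiers: a public subnet for the NAT and RD gateway servers,
    # then private subnets for the Core, DevInfra and TestInfra machines.
    layout = plan_subnets("10.0.0.0/16", 24, ["public", "core", "dev", "test"])
    for name, info in layout.items():
        print(f"{name:<8}{info['cidr']:<16}usable hosts: {info['usable_hosts']}")
```

Doing this on paper (or in a script) before creating anything makes it much easier to reason about which security groups and route tables each tier needs.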

After a week and a few long nights we had Active Directory set up, groups and user accounts provisioned, and we had come to grips with the Remote Desktop gateway server and the NAT server. At this point, though, we started campaigning long and hard to get our support team to set up a VPN between the corporate network and our AWS VPC, and to establish a trust between the two domains. Eventually, after two months of emails, phone calls, risk escalation and intensive nagging, we got 4 hours of the support guys' time to set it up. You don't have to say anything at this point. I know what you are thinking, and yes, it is true: we started saving time immediately.

So, cool: we now have our AWS VPC set up, I can RDP to the AD machine directly from my local desktop, and I have created a Windows Server 2012 Core image to build all my machines upon.

Next hurdle: how do I manage the infrastructure and categorise it?

Experience tells me that if I had just started creating images everywhere for the whims of developers, testers and architects, I would have had a hideous mess on my hands by nightfall. Plus I still needed to set up TFS for source control, builds and project work tracking, which of course meant SQL Server too.

So, in short, I needed a way to categorise my instances so I could control them. Enter AWS metadata tags. This very simple feature lets you “tag” an instance with whatever key/value pair you like. Create 1, create 100, it doesn't matter. Well, creating 100 is probably going to be a pain, but you get the idea. A couple of hours of putting thoughts to paper, a meeting and a quick chat later, we came up with a set of tags that would categorise our instances.

  • Core – always on, candidate for reserved instances
  • DevInfra – Development Infrastructure – almost always on, 20 hours a day minimum.
  • TestInfra – Testing Infrastructure, on for about 16 hours a day
  • DemandOnly – Started on demand only: manual startup, always shut down every day if running

We added a couple more over the journey, but these four are certainly good enough to get most stuff off the ground.
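
A minimal Python sketch of the categorisation step (illustrative only; the real scripts were PowerShell). It assumes the tag key is called Schedule, a hypothetical name, and that instance data arrives in the shape the EC2 API returns, where each instance carries a list of {"Key": ..., "Value": ...} tags. Anything missing the tag, or carrying a value outside the four categories, is grouped under Untagged rather than guessed at.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical tag key; the four values are the categories listed above.
TAG_KEY = "Schedule"
CATEGORIES = {"Core", "DevInfra", "TestInfra", "DemandOnly"}


def tag_value(instance: dict, key: str = TAG_KEY) -> Optional[str]:
    """Read one tag from an EC2-style instance record (Tags is a list of Key/Value dicts)."""
    for tag in instance.get("Tags", []):
        if tag.get("Key") == key:
            return tag.get("Value")
    return None


def categorise(instances: list) -> dict:
    """Group instance IDs by category; missing or unrecognised tags land in 'Untagged'."""
    groups = defaultdict(list)
    for inst in instances:
        value = tag_value(inst)
        category = value if value in CATEGORIES else "Untagged"
        groups[category].append(inst["InstanceId"])
    return dict(groups)


if __name__ == "__main__":
    fleet = [
        {"InstanceId": "i-aaa", "Tags": [{"Key": "Schedule", "Value": "Core"}]},
        {"InstanceId": "i-bbb", "Tags": [{"Key": "Schedule", "Value": "TestInfra"}]},
        {"InstanceId": "i-ccc", "Tags": [{"Key": "Name", "Value": "scratch box"}]},
    ]
    print(categorise(fleet))
    # {'Core': ['i-aaa'], 'TestInfra': ['i-bbb'], 'Untagged': ['i-ccc']}
```

Reporting the Untagged group is one way to keep the hideous mess from creeping back in: anything nobody has categorised shows up in a list instead of quietly running up the bill.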

So now we have TFS installed, developers are developing, builds are building, delivery managers are setting up work items and... you get the idea.

Next hurdle: how do I automate the shit out of everything to keep costs down?

Firstly, I did not want to have to worry about checking startup, shutdown, backups, snapshots etc. all day. I needed a machine with the right software to schedule automation jobs, keep a history, and work with Windows operating systems and the AWS .NET SDK. Oh yeah, and I didn't want to pay for it either.

There are a number of ways to skin this pussy cat, but I combined a bunch of modularised PowerShell scripts and ran them all through Jenkins.
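
One way to structure modularised scripts run through Jenkins is a single entry point that exposes each maintenance task as a named action, so each Jenkins job is just a different command line. The sketch below is in Python rather than PowerShell, and the action names, the --category and --dry-run options and the placeholder messages are all hypothetical; real actions would call the AWS SDK instead of returning a string.

```python
import argparse
import sys

CATEGORIES = ["Core", "DevInfra", "TestInfra", "DemandOnly"]
ACTIONS = {}  # action name -> function; filled in by the @action decorator


def action(name):
    """Register a function as a named maintenance action."""
    def register(func):
        ACTIONS[name] = func
        return func
    return register


def _prefix(dry_run):
    return "[dry run] " if dry_run else ""


# Placeholder actions: real versions would call the AWS SDK for each tagged instance.
@action("start")
def start(category, dry_run):
    return f"{_prefix(dry_run)}start instances tagged {category}"


@action("stop")
def stop(category, dry_run):
    return f"{_prefix(dry_run)}stop instances tagged {category}"


@action("snapshot")
def snapshot(category, dry_run):
    return f"{_prefix(dry_run)}snapshot volumes of instances tagged {category}"


def main(argv):
    parser = argparse.ArgumentParser(description="Tag-driven AWS maintenance jobs")
    parser.add_argument("action", choices=sorted(ACTIONS))
    parser.add_argument("--category", required=True, choices=CATEGORIES)
    parser.add_argument("--dry-run", action="store_true",
                        help="report what would happen without touching AWS")
    args = parser.parse_args(argv)
    message = ACTIONS[args.action](args.category, args.dry_run)
    print(message)
    return message


if __name__ == "__main__":
    # Jenkins passes real arguments; with none, fall back to a harmless dry run.
    main(sys.argv[1:] or ["stop", "--category", "TestInfra", "--dry-run"])
```

A nightly Jenkins job could then run something like `python aws_ops.py stop --category TestInfra` (the file name is hypothetical), and adding a new job means adding a new action rather than a new script.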

Why PowerShell?

Because it’s all built on Windows. If you’re not using PowerShell to build up and configure your Windows machines you’re doing it wrong.

Why Jenkins?

I know the product well (always good to stick with known knowns) and it really is a great tool with good online support. It can schedule jobs that run virtually anything, build software and chain jobs together into pipelines, and it has a ton of plugins too. Sure, it's a Java tool, but only an idiot would assume you can only look for answers in the Microsoft world.

The end result

After a month of solid scripting and testing, I had created enough PowerShell scripts and functions to do the following with the instances in my VPC, all controlled through Jenkins using the metadata tags:

  • Startup and Shutdown of instances
    • Core on all the time, DevInfra on 20 hours per day, TestInfra on 16 hours a day
  • Snapshots – Core and DevInfra snapshots are created every day
  • S3 Database Backups – All the full database backups that ran every night were copied to S3
  • Redundancy – New snapshots were also copied over to the US West Region every night
  • Environment rebuilds – CloudFormation scripts ran every night to rebuild the test environments, so we had a totally clean machine to deploy to daily
  • AWS Cleanup – I created jobs to clean up S3 and instance snapshots once they got older than a couple of weeks
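
The cleanup jobs boil down to an age test. The Python sketch below (illustrative, not the original PowerShell) picks out snapshots older than a retention period, using the field names EC2 returns for snapshots (SnapshotId, VolumeId, StartTime). The 14-day default stands in for "a couple of weeks", and always keeping the newest snapshot of each volume is an extra safety rule added for this example, not something described above. The actual delete would go through the SDK, for example Remove-EC2Snapshot in the AWS Tools for PowerShell.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=14)  # stands in for "a couple of weeks"


def snapshots_to_delete(snapshots, now, retention=RETENTION):
    """Return IDs of snapshots older than `retention`, never including the newest per volume.

    Each snapshot is a dict with the EC2 field names SnapshotId, VolumeId and
    StartTime (a timezone-aware datetime).
    """
    newest = {}
    for snap in snapshots:
        current = newest.get(snap["VolumeId"])
        if current is None or snap["StartTime"] > current["StartTime"]:
            newest[snap["VolumeId"]] = snap
    protected = {snap["SnapshotId"] for snap in newest.values()}
    cutoff = now - retention
    return sorted(
        snap["SnapshotId"]
        for snap in snapshots
        if snap["StartTime"] < cutoff and snap["SnapshotId"] not in protected
    )


if __name__ == "__main__":
    now = datetime(2014, 4, 2, tzinfo=timezone.utc)
    snaps = [
        {"SnapshotId": "snap-1", "VolumeId": "vol-a", "StartTime": now - timedelta(days=30)},
        {"SnapshotId": "snap-2", "VolumeId": "vol-a", "StartTime": now - timedelta(days=1)},
        {"SnapshotId": "snap-3", "VolumeId": "vol-b", "StartTime": now - timedelta(days=40)},
    ]
    print(snapshots_to_delete(snaps, now))  # ['snap-1']
```

The same age test would work for the S3 backup files, using each object's last-modified time instead of StartTime.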

The best part about this solution was that if we added new instances, we just tagged them appropriately and all the maintenance of startup, shutdown and snapshotting took care of itself. Matter of fact, I stopped looking at it after a couple of months because it all ran like clockwork.
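
Because everything keys off the tag rather than a hard-coded list of machines, a scheduled job only needs to compare each instance's current state with what its category says it should be doing at that hour. The Python sketch below does that comparison. The run windows (04:00 to midnight for DevInfra, 06:00 to 22:00 for TestInfra), the 22:00 shutdown for DemandOnly instances and the pre-extracted Category field are assumptions chosen to match the hours per day in the tag list; the resulting lists would be handed to calls such as Start-EC2Instance and Stop-EC2Instance.

```python
# Hypothetical daily run windows in local time: (start hour inclusive, end hour exclusive).
# None means "never start automatically".
WINDOWS = {
    "Core": (0, 24),       # on all the time
    "DevInfra": (4, 24),   # 20 hours a day
    "TestInfra": (6, 22),  # 16 hours a day
    "DemandOnly": None,    # manual start only
}
NIGHTLY_SHUTDOWN_HOUR = 22  # hypothetical hour at which DemandOnly instances are stopped


def plan_actions(instances, hour):
    """Return (to_start, to_stop) instance ID lists for the given hour of the day.

    Each instance is a dict with InstanceId, Category (already read from its tag)
    and State ("running" or "stopped"). Unknown categories are left alone.
    """
    to_start, to_stop = [], []
    for inst in instances:
        category, running = inst["Category"], inst["State"] == "running"
        if category not in WINDOWS:
            continue
        window = WINDOWS[category]
        if window is None:
            # DemandOnly: never started automatically, stopped at the nightly cutoff.
            if running and hour == NIGHTLY_SHUTDOWN_HOUR:
                to_stop.append(inst["InstanceId"])
            continue
        start, end = window
        should_run = start <= hour < end
        if should_run and not running:
            to_start.append(inst["InstanceId"])
        elif not should_run and running:
            to_stop.append(inst["InstanceId"])
    return to_start, to_stop


if __name__ == "__main__":
    fleet = [
        {"InstanceId": "i-core", "Category": "Core", "State": "stopped"},
        {"InstanceId": "i-dev", "Category": "DevInfra", "State": "running"},
        {"InstanceId": "i-test", "Category": "TestInfra", "State": "stopped"},
        {"InstanceId": "i-demand", "Category": "DemandOnly", "State": "running"},
    ]
    print(plan_actions(fleet, 2))  # (['i-core'], ['i-dev'])
```

A newly tagged instance simply shows up in the next run's fleet list, which is why nothing else has to change when machines are added.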

We even got really clever with it, for example by polling the TFS API to shut down TFS build agents when the number of queued builds was low and starting them back up when builds queued up again. We also extended Jenkins to run software builds so we could run Sonar over the top of them, and to create deployment pipelines so that the testers could self-serve their own environments.
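
The build agent trick is essentially autoscaling with some hysteresis: scale up as soon as the queue outgrows the running agents, but only scale down one agent at a time once the queue is empty, so agents are not flapping on and off. In the Python sketch below, queued_builds is assumed to come from polling the TFS API (not shown), and the minimum, maximum and builds-per-agent values are hypothetical defaults.

```python
def agents_wanted(queued_builds: int, running_agents: int, min_agents: int = 1,
                  max_agents: int = 4, builds_per_agent: int = 3) -> int:
    """Decide how many build agents should be running for the current queue."""
    if queued_builds > running_agents * builds_per_agent:
        # Queue has outgrown the current agents: scale up straight away.
        needed = -(-queued_builds // builds_per_agent)  # ceiling division
        return max(min_agents, min(max_agents, needed))
    if queued_builds == 0:
        # Idle: shed one agent per poll, never dropping below the minimum.
        return max(min_agents, running_agents - 1)
    # Some work queued but within capacity: hold steady (clamped to the limits).
    return max(min_agents, min(max_agents, running_agents))


if __name__ == "__main__":
    for queued, running in [(10, 1), (0, 3), (2, 2)]:
        print(f"queued={queued} running={running} -> want {agents_wanted(queued, running)}")
```

The difference between the wanted and running counts tells the job how many agent instances to start or stop on that poll.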

A couple of things I learned along the way

CloudFormation can be a pain in the butt for Windows. When it works for you it's beers all round; when it doesn't, you'll be swearing long and hard into the night getting that goddamn, effing server to join the friggin domain! And yeah, if you're using SharePoint, don't use CloudFormation for it. Matter of fact, find the guy that recommended SharePoint as part of the technical solution and slap them. It's a horrendously painful and complicated product to set up, and it does not play nice with CloudFormation or with re-attaching volumes from snapshots either. SharePoint: I hate it. There, I said it.


Using AWS for your dev and testing is an absolute win over the more traditional methods (physical servers: arrgh!; VMware: hosts forever running out of capacity).

Sure, it will take time and investment in your DevOps staff to plan for and use it appropriately (show me a new infrastructure technology that doesn't need that), but the payoff is impossible to ignore: being able to completely automate your environments, increase and decrease resources as you need, scale up instances (such as your build servers when they start to run hard), and a lower TCO. Best of all, once you have done it once, and done it properly, you can reuse a lot of what you created for other projects and clients.
