Tips: Deploying a web application to the cloud

I am wrapping up helping a client with a build-out of a Drupal site on EC2. The site itself is a pretty standard CMS implementation–custom content types, etc. The site is an extension of an existing brand, and exists to collect email addresses and send out email newsletters. We were a team of three technical people (there were some designers and other folks involved, but I was pretty much insulated from them by my client), and I was lucky enough to do a lot of the infrastructure work, which is where most of the challenge, exploration and experimentation was.

The biggest attraction of the cloud was the ability to spin up and spin down extra servers as the expected traffic on the site increased or decreased. We chose Amazon’s EC2 for hosting. They seem a bit like the IBM of the cloud–no one ever got fired for choosing them. They have a rich set of offerings and great documentation.

Below are some lessons I learned from this project about EC2. While it was a Drupal project, I believe many of these lessons are applicable to anyone building a similar system in the cloud. If you are building a video-processing supercomputer, maybe not so much.

Fork your AMI

Running EC2 instances are instantiations of an Amazon Machine Image (AMI). Anyone can create a machine image and make it available for others to use. If you start an instance off an image and the owner later deletes or otherwise removes that image, your instance continues to run happily, but if you ever need to spin up a second instance off the same AMI, you can’t. In this case, we were leveraging some of the work done by Chapter Three called Project Mercury. This was an evolving project that released several times while we were developing with it. Each time, there was a bit of suspense to see if what we’d done on top of it worked with the new release.

This was suboptimal, of course, but the solution is easy. Once you find an AMI that works, you can start up an instance, and then create your own AMI from the running instance. Then, you use that AMI as a foundation for all your instances. You can control your upgrade cycle. Unless you are running against a very generic AMI that is unlikely to go away, forking is highly recommended.
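
For the record, rebundling looks roughly like the sketch below as a Capistrano task. This is not the exact procedure we followed, just an outline: the bucket name, key locations and account id are placeholders, the EC2 AMI tools need to be installed on the instance, and registration happens from your workstation with the API tools.

task :rebundle_ami, :roles => :web do
  # bundle the running root filesystem into /mnt (placeholder keys and account id)
  sudo "ec2-bundle-vol -d /mnt -k /mnt/pk.pem -c /mnt/cert.pem -u YOUR_AWS_ACCOUNT_ID"
  # push the bundle up to S3 (placeholder bucket and credentials)
  sudo "ec2-upload-bundle -b your-ami-bucket -m /mnt/image.manifest.xml -a YOUR_ACCESS_KEY -s YOUR_SECRET_KEY"
  # then, from your workstation: ec2-register your-ami-bucket/image.manifest.xml
end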

Use Capistrano

For remote deployment, I haven’t seen or heard of anything that compares to Capistrano. Even if you do have to learn a new scripting language (Ruby), the power you get from ‘cap’ is fantastic. There’s pretty good EC2 integration, though you’ll want to have the EC2 response XML documentation close by when you’re trying to parse responses. There’s also some hassle involved in getting cap to run on EC2. Mostly it involves making sure the right set of ssh keys is in the correct place. But once you’ve got it up and running, you’ll be happy. Trust me.
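
As an illustration of the XML-parsing piece, here is a minimal sketch of pulling public DNS names out of a DescribeInstances response so they can feed Capistrano roles. The element names match the response format, but double-check them against the version of the response XML docs you are working from.

require 'rexml/document'

def instance_hostnames(response_xml)
  doc = REXML::Document.new(response_xml)
  names = []
  # each running instance appears as an item under instancesSet
  doc.elements.each('//instancesSet/item/dnsName') do |el|
    names << el.text unless el.text.nil? || el.text.empty?
  end
  names
end

# then, in deploy.rb, something like:
#   instance_hostnames(response_xml).each { |host| role :web, host }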

There’s also a direct capistrano/EC2 integration project, but I didn’t use that. It might be worth a look too.

Use EBS

If you are doing any kind of database-driven website, there’s really no substitute for persistent storage. Amazon’s Elastic Block Storage (EBS) is relatively cheap. Here’s an article explaining how to set up MySQL on EBS. I do have a friend who is using EC2 in a different, very write-intensive manner and is having some performance issues with his database on EBS, but for a write-seldom, read-often website like this one, EBS seems plenty fast.
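
The linked article has the full recipe; the sketch below is just the general shape of it as a Capistrano task, not something to run verbatim. The device, mount point and volume id are placeholders, and the filesystem choice is up to you.

task :setup_mysql_on_ebs, :roles => :db do
  # first, from your workstation: ec2-attach-volume vol-xxxxxxxx -i i-xxxxxxxx -d /dev/sdh
  sudo "mkfs.xfs /dev/sdh"            # first time only; this erases the volume
  sudo "mkdir -p /vol"
  sudo "mount /dev/sdh /vol"
  sudo "/etc/init.d/mysql stop"
  sudo "mv /var/lib/mysql /vol/mysql"
  # then point datadir at /vol/mysql in my.cnf and start mysql back up
end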

EC2 Persistence

Some of the reasons to use Capistrano are that it forces you to script everything, and makes it easy to keep everything in version control. The primary reason to do that is that EC2 instances aren’t guaranteed to be persistent. While there is an SLA around overall EC2 availability, individual instances don’t have any such assurances. That’s why you should use EBS. But, surprisingly, the EC2 instances that we are using for the website haven’t bounced at all. I’m not sure what I was expecting, but they (between three and eight instances) have been up and running for over 30 days, and we haven’t seen a single failure.

Use ElasticFox

This is a FireFox extension that lets you do every workaday task, and almost every conceivable operation, on your EC2 instances. Don’t delay, use this today.

Consider CloudFront

For serving images to a set of distributed web nodes, CloudFront is a natural fit. Each instance can reference the image via its CloudFront URL, without you needing to sync files across instances. You could use this for other static files as well.

Use Internal Network Addressing where possible

When you start an EC2 instance, Amazon assigns it two addresses: an external DNS name that can be used to access it from the internet, and an internal one. For most contexts, the external name is more useful, but when you are communicating within the cloud (pushing files around, or a database connection), prefer the internal address. It looks like there are some performance benefits, and there are definitely pricing benefits: “Always use the internal address when you are communicating between Amazon EC2 instances. This ensures that your network traffic follows the highest bandwidth, lowest cost, and lowest latency path through our network.” We actually used the internal DNS, but it makes more sense to use the IP address, as you don’t get any abstraction benefits from the internal DNS name, which you don’t control–that takes a bit of mental adjustment for me.
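
As a minimal sketch of what that might look like in a Capistrano config (the hostnames below are placeholders, and db_host is just an illustrative variable, not a standard Capistrano setting): external DNS for anything reached from your workstation, internal address for instance-to-instance traffic such as the MySQL connection.

# reachable from your workstation, so fine for cap deploys
role :web, "ec2-75-101-xxx-xxx.compute-1.amazonaws.com"

# used inside the cloud, e.g. as the database host in settings.php
set :db_host, "ip-10-251-xxx-xxx.ec2.internal"   # or the 10.x.x.x address itself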

Consider reserved instances

If you are planning to use Amazon for hosting, make sure you explore reserved instance pricing. For an upfront cost, you get significant savings on your runtime costs.

On Flexibility

You have a lot of flexibility with EC2–AMIs are essentially yours to customize as you want, starting up another node takes about 5 minutes, you control your own DNS, etc. However, some things are fixed at startup time. Security groups (the built-in firewall rules) fall into this category, so spend some time thinking about them up front. Switching to a different AMI requires starting up a new instance. Right now we’re using DNS round robin to distribute load across multiple nodes, but we are planning to use elastic IPs, which allow you to remap a routable IP address to a new instance without waiting for DNS timeouts. EBS volumes and the instances they attach to must be in the same availability zone. None of this is groundbreaking news; it’s really just a matter of reading all the documentation, especially the FAQs.
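
For the curious, remapping an elastic IP looks roughly like this with the EC2 API tools (the address and instance id below are placeholders, and the address has to be allocated to your account first with ec2-allocate-address):

elastic_ip   = "75.101.xxx.xxx"   # previously allocated with ec2-allocate-address
new_instance = "i-xxxxxxxx"       # the instance taking over
system("ec2-associate-address #{elastic_ip} -i #{new_instance}")

No DNS change is needed; the routable address just moves to the new instance.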

Documentation

Be aware that there is a ton of documentation (one set for each API release) for EC2 and the other web services that Amazon provides. Rather than starting with Google, which often leads you to an outdated version of the documentation, you should probably start at the AWS documentation center. This is especially true if you’re working with any of the newer services, where the API is perhaps not as stable.

In the end

Remember that, apart from new tools and a few catches, using EC2 is not that different from using a managed server where you don’t have access to the hardware. The best document I found on deploying Drupal to EC2 doesn’t talk about EC2 at all–it focuses on the architecture of Drupal (Drupal 5, at that) and how best to scale it with additional servers.



Setting variables across tasks in capistrano

I am learning to love Capistrano–it’s a fantastic deployment system for remote server management.  I’m even learning enough Ruby to be dangerous.

One of the issues I ran into was that I wanted to set a variable in one task and use it in another (or, more likely, in several other tasks).  I couldn’t find any examples of how to do this online, so here’s how I did it:

task :set_var do
  localvar = "some value"   # whatever you want to share between tasks
  self[:myvar] = localvar
end

task :read_var do
  puts self[:myvar]
end

Note that myvar and localvar need to be different identifiers–“local variables take precedence”.  Also, the variable can be anything, I think.  I use this method to create an array in one task, then iterate over it in another.
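
Here’s roughly what the array case looks like (the values are placeholders):

task :gather_hosts do
  hostnames = ["app1.example.com", "app2.example.com"]
  self[:hosts] = hostnames
end

task :list_hosts do
  self[:hosts].each { |host| puts host }
end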



Amazon AMI search

It’s interesting to me that there is no Amazon Machine Image (AMI) search.  AMIs are virtual machine images that you can run on EC2, Amazon’s cloud computing offering.  Sure, you can browse the list of AMIs, but that doesn’t really help.  Finding an image seems to be haphazard, via a Google search (how I found this Alfresco image) or via the community around a product that ships on an image (like this image for Pressflow, a high-performance Drupal distribution).

I’m not the only person with this complaint.  The Amazon EC2 API only provides limited data about various images, but surely some kind of search mechanism wouldn’t be too hard to whip up, if only on the image owner and platform fields.

Does anyone know where this exists?  My current best solution for finding a specific AMI is to use the fantastic ElasticFox FireFox plugin and just search free-form on the ‘Images’ tab.
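
Failing that, a crude command-line stopgap is to grep the full image list yourself.  Here’s a quick Ruby sketch; it assumes the EC2 API tools are installed with your keys set in the environment, and it will be slow, since it pulls every public image:

pattern = Regexp.new(ARGV[0] || "drupal", Regexp::IGNORECASE)
`ec2-describe-images -a`.each_line do |line|
  puts line if line =~ pattern
end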



Notes from Tom Malaher’s cloud computing presentation

A former colleague, Tom Malaher, did an online presentation about cloud computing on Mar 11 at the Calgary JUG.  You can view the recording of it now.  It was titled: Cloud Computing and Amazon Web services (AWS), and was a great survey of cloud computing and then a nice dive into AWS.  I used to work with Tom and always enjoy the depth and breadth of his presentations.

Below are some of my notes.

  • This was their first online meeting, due to cash flow issues (lack of sponsorship) and to make it easier for speakers outside the Calgary area.  It was put on using Elluminate.com.  (The client was installed using JNLP; very easy to install and set up.)  You can use Elluminate for up to three participants for free (but you cannot record your session).
  • The definition of cloud computing is in a tug of war in vendor land.  According to the Infrastructure Executive Council, cloud computing is elastic, multi-tenant, on-demand and self-service, with usage-based metering (no long-term contracts).

Tom outlined a number of variations on cloud computing

  • Infrastructure as a service (S3, EC2)
  • Platform as a service (Google App Engine, Microsoft Azure)
  • Software as a service (Google Docs, salesforce.com)
  • Grid computing–more homogeneous, but lots of overlap

Diving into Amazon Web Services, he outlined all the web services that Amazon provides.  I had already heard of a number of these, but two caught my eye:

  • DevPay–pass-through payment for Amazon Web Services.
  • Public Data Sets–public domain data sets easily available for computation on the AWS platform

Composing AWS services makes sense, since there are no bandwidth charges between Amazon service calls within Amazon’s data centers (e.g. EC2->S3).

He had some interesting figures from the IEC: 70% of those surveyed are not using cloud computing (40% aren’t even considering it).  Only 10% are hosting an ‘app’ on the cloud (with no definition of an app).  I asked Tom what counts as an app.  I have a client who is hosting backups and images on S3, and friends who regularly back up servers to S3.  Is that an ‘app’?  I don’t think so, but Tom didn’t have a definition of ‘app’ for this survey.

Tom also did an interesting cost analysis when he was looking at pros and cons for AWS.

The 1and1.com high-end hosting agreement: 1 GB RAM, 50 GB disk, 2000 GB transfer: $59/month.

For a comparable AWS setup (an EC2 instance with 1.7 GB RAM, 160 GB of ephemeral local disk, a persistent 50 GB EBS volume, and 2000 GB of transfer): worst case about $479.50/month, but for one day: ~$16.

In my opinion, this is the key con of AWS right now, at least for full fledged applications. It’s simply not cost competitive with some of the hosting you can find out there.

And with regular hosts, you don’t have to deal with as much infrastructure overhead.  Tools like ElasticFox and S3Fox can help.  I’ve used S3Fox and love it.

The development model is surprisingly similar (Tom mentioned building his demo on his home machine and using some of the more exotic services, like SQS; then, when he was ready for the full cloud deployment, he just moved his war file to the appropriate image after some setup).

Then Tom demoed an app built by composing a number of Amazon web services.  Starting an instance from an EC2 machine image (AMI) takes a while (but still less than building a machine from scratch :).  During the entire presentation and demo (1 hour, 3 instances, some messaging), he was only charged 50 cents.

Other interesting uses: the NY Times used it to build a bunch of web-friendly PNGs from TIFFs of papers past.
You can also use a regular RDBMS, with Elastic Block Storage.

Someone asked: where does AWS fit in larger organizations?  Tom thought it was a good fit for small organizations…  But he was not really sure about large organizations.

In my opinion, many of the technical decision makers I know are willing to use S3 as a storage mechanism, but they still want a backup solution, in case Amazon is unavailable (as it sometimes is).  This unavailability would be even more damning if you had an entire webapp running off ec2 and the other services.

Buying your own dedicated server has its own risks, but many people are still used to that paradigm.  For quick scaling, though, or for a special one-time project that needs a lot of firepower (like the NY Times project above), the cloud makes sense.

Stepping back from AWS, cloud computing in general seems to be making steady progress on the issues of network connectivity, security and cost that make it a hard sell at present.  I love the delineation of the variations (infrastructure as a service, etc.), and not all cloud computing will look like AWS.
Overall, a great presentation.  If you have the time (I stayed for some of the Q&A, and left at the 90 minute mark), it’s worth a listen. Go ahead, check it out.



Oracle AMIs for EC2

Many years ago, I did an internship with the database group at the company where I was working. I still have the printout on installing Oracle, and I remember it being at least 40 pages. There was a lot of voodoo with user accounts and kernel settings.

While I’ve worked with Oracle since, I haven’t been responsible for installing it, so things may have become easier over the years. But now they definitely are easier. From the Amazon Developer Newsletter:

Oracle has produced four publicly-available Amazon EC2 AMIs with pre-installed and configured software for Enterprise, Standard or Express editions. In a matter of minutes, developers can have a fully configured Oracle Database computing environment running on Amazon EC2 that includes the web-based management tool Enterprise Manager Database Control and the web-based rapid development tool Applications Express (APEX).
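
For context, launching one of these AMIs is the same as launching any other.  A minimal sketch with the EC2 API tools (the AMI id, keypair and instance type below are placeholders; check Oracle’s listing for the real ids and sizing requirements):

# from a Ruby script or a cap task on your workstation
system("ec2-run-instances ami-xxxxxxxx -k your-keypair -t m1.large")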



