

Palm Pre ‘Hello World’ error on Windows

If you’re running through the Palm WebOS ‘Hello World’ application on Windows (I use XP) with the command line tools, you’ll want to change the line this.controller.pushScene("first"); to this.controller.pushScene("First");, as outlined in this forum post.  Apparently, case matters somewhere in there, and the palm command line tools generate upper case view and assistant file names.

This oversight is a bit embarrassing/peculiar, since the ‘Hello World’ example is often the first thing developers turn to when learning a new platform, and it was reported in July and acknowledged by a Palm employee that same month. I can only surmise that this works fine on other platforms (which is definitely possible, given that it is on Windows that the palm tools fail to lowercase the first letter of the scene name as expected).

I’m not sure whether this affects the Eclipse plugin. This is using the 1.3.1-314 SDK.

[tags]palm pre,hello world[/tags]

Mozilla Crash Reports

Check it out: Mozilla has made their crash reports available online.  You can see crashes for all kinds of Mozilla apps, not just Firefox.  But I think that the Firefox stats are the most interesting.  Here are the top crashing domains for the last-but-one FF release (3.5.4).  Here are the top crashing URLs for 3.5.4 (go FarmVille!).  Here’s more about Firefox crash reporting in general (including links to the software that runs the online crash report server).

I’m not sure how useful this data is to normal web developers, since you can’t see if your domain is causing crashes unless it is among the top urls and/or domains in the trend reports.  However, if you had a relatively high traffic website and you noticed, after a new FF rollout or new rollout of your webapp, that the FF user percentage had dropped off a cliff, you could try to see if your webapp was listed here with a large number of crashes.

Microsoft has the other main crash reporting program I’ve seen regularly, but apparently they don’t release statistics, based on their privacy policy.  OpenOffice collects crash data, but that organization doesn’t appear to release the data either.

Bravo to FF for releasing their crash data.  I looked around, but didn’t see any academic research based on this data–I imagine you could find some interesting trends (checkins vs crashes per version, etc).

[tags]firefox,mozilla,transparency[/tags]

Tips: Deploying a web application to the cloud

I am wrapping up helping a client with a build out of a drupal site to ec2. The site itself is a pretty standard CMS implementation–custom content types, etc. The site is an extension to an existing brand, and exists to collect email addresses and send out email newsletters. It was a team of three technical people (there were some designers and other folks involved, but I was pretty much insulated from them by my client) and I was lucky enough to do a lot of the infrastructure work, which is where a lot of the challenge, exploration and experimentation was.

The biggest attraction of the cloud was the ability to spin up and spin down extra servers as the expected traffic on the site increased or decreased. We chose Amazon’s EC2 for hosting. They seem a bit like the IBM of the cloud–no one ever got fired for picking them. They have a rich set of offerings and great documentation.

Below are some lessons I learned from this project about EC2. While it was a drupal project, I believe many of these lessons are applicable to anyone who is building a similar system in the cloud. If you are building a video processing supercomputer, maybe not so much.

Fork your AMI

Amazon EC2 instances are instantiations of an Amazon Machine Image (AMI). Anyone can create a machine image and make it available for others to use. If you start an instance off an image, and the owner then deletes the image (or otherwise removes it), your instance continues to run happily, but if you ever need to spin up a second instance off the same AMI, you can’t. In this case, we were leveraging some of the work done by Chapter Three called Project Mercury. This was an evolving project that released several times while we were developing with it. Each time, there was a bit of suspense to see if what we’d done on top of it worked with the new release.

This was suboptimal, of course, but the solution is easy. Once you find an AMI that works, you can start up an instance, and then create your own AMI from the running instance. Then, you use that AMI as a foundation for all your instances. You can control your upgrade cycle. Unless you are running against a very generic AMI that is unlikely to go away, forking is highly recommended.
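
For the record, the forking itself is just a few commands, sketched here as Ruby system calls. This assumes Amazon’s AMI and API command line tools are installed; the key paths, bucket name and ids below are placeholders, so check the EC2 docs for the details.

# on the running instance: bundle up the filesystem and push it to S3
system("ec2-bundle-vol -d /mnt -k /tmp/pk.pem -c /tmp/cert.pem -u YOUR_AWS_ACCOUNT_ID")
system("ec2-upload-bundle -b your-ami-bucket -m /mnt/image.manifest.xml -a YOUR_ACCESS_KEY -s YOUR_SECRET_KEY")

# from your workstation: register the bundle as your very own AMI
system("ec2-register your-ami-bucket/image.manifest.xml")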

Use Capistrano

For remote deployment, I haven’t seen or heard of anything that compares to Capistrano. Even if you do have to learn a new scripting language (Ruby), the power you get from ‘cap’ is fantastic. There’s pretty good EC2 integration, though you’ll want to have the EC2 response XML documentation close by when you’re trying to parse responses. There’s also some hassle involved in getting cap to run on EC2. Mostly it involves making sure the right set of ssh keys is in the correct place. But once you’ve got it up and running, you’ll be happy. Trust me.
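
To give a flavor, here’s a minimal sketch of a cap recipe for EC2; the hostname, user and keypair path are placeholders for your own values.

role :web, "ec2-75-101-1-1.compute-1.amazonaws.com"
set :user, "root"
ssh_options[:keys] = ["#{ENV['HOME']}/.ssh/my-ec2-keypair.pem"] # the ssh key wrangling mentioned above

desc "Restart apache on all web instances"
task :restart_apache, :roles => :web do
  sudo "/etc/init.d/apache2 restart"
end

Once that’s in your Capfile, ‘cap restart_apache’ does the rest.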

There’s also a direct capistrano/EC2 integration project, but I didn’t use that. It might be worth a look too.

Use EBS

If you are doing any kind of database driven website, there’s really no substitute for persistent storage, and Amazon’s Elastic Block Storage (EBS) is relatively cheap. Here’s an article explaining setting up MySQL on EBS. I do have a friend who is using EC2 in a different, very write intensive manner, and who is having some performance issues with his database on EBS; but for a write-seldom, read-often website like this one, EBS seems plenty fast.
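
Since cap makes you script everything anyway (see below), the MySQL-on-EBS setup can be a task too. A hedged sketch, assuming the volume is already created and attached as /dev/sdh via ElasticFox or the API tools; the device name, mount point and paths are the usual examples, not gospel.

task :mysql_on_ebs, :roles => :db do
  run "sudo mkfs.xfs /dev/sdh"  # first run only -- this erases the volume!
  run "sudo mkdir -p /vol && sudo mount /dev/sdh /vol"
  run "sudo /etc/init.d/mysql stop"
  run "sudo mv /var/lib/mysql /vol/"
  run "sudo ln -s /vol/mysql /var/lib/mysql"
  run "sudo /etc/init.d/mysql start"
end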

EC2 Persistence

Some of the reasons to use Capistrano are that it forces you to script everything, and makes it easy to keep everything in version control. The primary reason to do that is that EC2 instances aren’t guaranteed to be persistent. While there is an SLA around overall EC2 availability, individual instances don’t have any such assurances. That’s why you should use EBS. But, surprisingly, the EC2 instances that we are using for the website haven’t bounced at all. I’m not sure what I was expecting, but they (between three and eight instances) have been up and running for over 30 days, and we haven’t seen a single failure.

Use ElasticFox

This is a FireFox extension that lets you perform every workaday task, and almost every conceivable operation, on your EC2 instances. Don’t delay, use this today.

Consider CloudFront

For distributing images, CloudFront is a natural fit: each instance can reference the same CloudFront URL, so there’s no need to sync files across instances. You could use it for other static files as well.

Use Internal Network Addressing where possible

When you start an EC2 instance, Amazon assigns it two DNS names–an external one that can be used to access it from the internet, and an internal one. For most contexts, the external name is more useful, but when you are communicating within the cloud (pushing files around, or making a database connection), prefer the internal address. There may be some performance benefits, and there are definitely pricing benefits: “Always use the internal address when you are communicating between Amazon EC2 instances. This ensures that your network traffic follows the highest bandwidth, lowest cost, and lowest latency path through our network.” We actually used the internal DNS names, but it makes more sense to use the internal IP addresses directly, since you don’t control the internal DNS and so get no abstraction benefit from it–that took a bit of mental adjustment for me.
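
If you need to look these names up programmatically, every instance can ask the EC2 metadata service, which lives at a well known address inside the cloud. A quick Ruby illustration (it only works from inside an instance):

require 'open-uri'

internal = open('http://169.254.169.254/latest/meta-data/local-hostname').read
external = open('http://169.254.169.254/latest/meta-data/public-hostname').read
puts "use #{internal} inside the cloud, #{external} from outside"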

Consider reserved instances

If you are planning to use Amazon for hosting, make sure you explore reserved instance pricing. For an upfront cost, you get significant savings on your runtime costs.
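
A back of the envelope comparison for a single small instance running all year makes the tradeoff concrete. The rates below are illustrative placeholders, so check Amazon’s current price list before deciding.

hours     = 24 * 365
on_demand = hours * 0.10           # hourly on demand rate, in dollars
reserved  = 227.50 + hours * 0.03  # upfront fee plus discounted hourly rate
printf("on demand: $%.2f, reserved: $%.2f\n", on_demand, reserved)
# => on demand: $876.00, reserved: $490.30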

On Flexibility

You have a lot of flexibility with EC2–AMIs are essentially yours to customize as you want, starting up another node takes about 5 minutes, you control your own DNS, etc. However, some things are set at startup time. Make sure you spend some time thinking about security groups (built in firewall rules)–they fall into this category. Switching between AMIs requires starting up a new instance. Right now we’re using DNS round robin to distribute load across multiple nodes, but we are planning to use elastic IPs, which allow you to remap a routable IP address to a new instance without waiting for DNS timeouts. EBS volumes and the instances they attach to must be in the same availability zone. None of this is groundbreaking news; it’s really just a matter of reading all the documentation, especially the FAQs.
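
The elastic IP remap, for reference, is just two calls to the EC2 API tools (assuming they’re installed; the instance id and address below are made up):

system("ec2-allocate-address")                            # one time: get yourself an elastic IP
system("ec2-associate-address 75.101.1.1 -i i-12345678")  # point it at the new instance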

Documentation

Be aware that there is a ton of documentation–one set for each API release–for EC2 and the other web services Amazon provides. Rather than starting with Google, which often leads you to an outdated version of the documentation, you should probably start at the AWS documentation center. This is especially true if you’re working with any of the newer services, whose APIs are perhaps not as stable.

In the end

Remember that, apart from new tools and a few catches, using EC2 is not that different from using a managed server where you don’t have access to the hardware. The best document I found on deploying drupal to EC2 doesn’t talk about EC2 at all–it focuses on the architecture of drupal (drupal 5 at that) and how best to scale it with additional servers.

[tags]ec2,amazon web services,capistrano rocks[/tags]

Interesting GWO Case Study

I’ve written before about Google Website Optimizer.  But it’s always nice to see hard data.

Here’s an interesting GWO Case Study I found online, via a presentation by Angie Pascale.  It focuses on optimizing landing pages for a college system.  Conclusions:

Although the SEM agency did not find a correlation between brain lateralization and form location, they did succeed in optimizing Westwood’s program landing pages. On average, the program pages saw a 39.87% conversion rate improvement, with 83.1% being the highest upgrade. After significant results were revealed, the agency stopped each experiment and changed the format for every page to reflect the best-performing contact form location.

[tags]gwo, case study[/tags]

Setting variables across tasks in capistrano

I am learning to love capistrano–it’s a fantastic deployment system for remote server management.  I’m even learning enough ruby to be dangerous.

One of the issues I ran into was that I wanted to set a variable in one task and use it in another (or, more likely, in more than one other task).  I couldn’t find any examples of how to do this online, so here’s how I did it:

task :set_var do
  localvar = "some value"  # any local value works here
  self[:myvar] = localvar
end

task :read_var do
  puts self[:myvar]
end

Note that myvar and localvar need to be different identifiers–“local variables take precedence”.  Also, the variable can be anything, I think.  I use this method to create an array in one task, then iterate over it in another.

[tags]capistrano, remote deployment, ruby newbie[/tags]

Amazon AMI search

It’s interesting to me that there is no Amazon Machine Image (AMI) search.  AMIs are virtual machine images that you can run on EC2, Amazon’s cloud computing offering.  Sure, you can browse the list of AMIs, but that doesn’t really help.  Finding an image seems to be haphazard, via a google search (how I found this alfresco image) or via the community around a product that is distributed as an image (like this image for pressflow, a high performance drupal).

I’m not the only person with this complaint.  The Amazon EC2 API only provides limited data about various images, but surely some kind of search mechanism wouldn’t be too hard to whip up, if only on the image owner and platform fields.
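
In the meantime, you can fake a search by listing all the public images with the EC2 API tools and grepping. A crude Ruby sketch, assuming ec2-describe-images is on your path:

term = ARGV[0] || 'alfresco'  # whatever you're hunting for
`ec2-describe-images -a`.each_line do |line|
  puts line if line =~ /#{Regexp.escape(term)}/i
end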

Does anyone know where this exists?  My current best solution for finding a specific AMI is to use the fantastic ElasticFox FireFox plugin and just search free form on the ‘Images’ tab.

[tags]amazon, ec2, can I get a ‘search search'[/tags]

A survey of CDNs for use with Drupal

I have spent some time researching Content Delivery Networks (CDNs) and how they can integrate with Drupal.  Note that I have not yet implemented a CDN solution, so my experiences and opinion may change….  I will try to do a second post or update when we’ve actually rolled something out live.

Here are some criteria I’d use in selecting a drupal module for CDN management:

  • Do you need a CDN?  This is the key question, as a CDN can speed up your site, but introduces a layer of management and expense that might not be worth the hassle.
  • Do you mind patching drupal core?  This might be a maintenance issue going forward.
  • Do you want to have just images on your CDN, or javascript and CSS as well?  What about video?
  • How contained within the drupal interface do you need your interactions with a CDN to be?  Are you comfortable using a third party tool sometimes?
  • Do you have an existing CDN to work with, or are you selecting a CDN from scratch?  Obviously, you have more flexibility in the second case.
  • Do you mind coding? Some of these modules seem like they are 75% of the solution, but you might need to write some code to finish things up.

There are a number of modules that attempt to integrate a CDN into Drupal, or might help doing so.  All of these had a release for Drupal 6.

  • CDN: this seems like a great fit.  Active development, good sized issue queue, support for multiple CDNs.  It also patches core. Here’s a list of CDNs used with this module.
  • media_mover: this module seems like it might be useful if you need to move image and/or video files to a CDN.  That might require some coding, although I remember there being some S3 and FTP support.
  • creeper: this module is all about Amazon API integration, including CloudFront.  Plus, what a great name!
  • parallel: a fairly new module that changes the source hostnames in image, css and javascript html tags.  Therefore, those files can be served off a CDN, another web server, etc.
  • storage_api: this is a general storage service with a CDN focus, but doesn’t appear to be well documented or supported as of this time.
  • cloudfront: adds Amazon CloudFront support to the imagecache module

These all seem to be useful in their own ways.  The current project I’m working on is already invested in the Amazon infrastructure, mainly because of Project Mercury, so cloudfront is our current choice.

Did I miss any key modules?

[tags]drupal cms, cdns rock[/tags]

NearlyFreeSpeech.net: pay only for the hosting you use

I had a friend tell me about NearlyFreeSpeech.net. Much like Amazon’s cloud computing services, you only pay for what you use.  Unlike Amazon, there’s no complicated infrastructure or proprietary protocols to get familiar with.  I doubt it has the reliability and scalability of Amazon either.

The pricing is pretty crazy: a penny a day for a website, $1/GB for your first GB of transfer, etc.  There’s a calculator to give you an idea of what you’d pay.
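
To make that concrete, here’s the arithmetic for a small static site at the rates above (my usage numbers are made up, and real bandwidth pricing is tiered, so treat it as a ballpark):

site_days   = 30   # a penny a day per site
transfer_gb = 2.0  # hypothetical monthly transfer
cost = site_days * 0.01 + transfer_gb * 1.00
printf("about $%.2f for the month\n", cost)  # => about $2.30 for the month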

For a certain type of user–one for whom my ‘web presence in two hours’ method just won’t work, and who can spend time in lieu of money–this seems like a great solution. I’m thinking, for example, of non-profits that are just trying to get a web presence and don’t want to use one of the blog sites, for reasons of design control.  If all you have is a static site, this can be very affordable:

“Static sites don’t have any baseline charges at all; you pay only for the storage and bandwidth you use, making them incredibly affordable if you’re on a limited budget and you’re working with a prebuilt website like those produced by many of the most popular web design programs.”

I don’t have any idea what kind of support or uptime they offer, but I love the idea of hosting that might start at $3/month, but can scale up easily and transparently.  They’ve been around since 2002, so they must be doing something right.

[tags]hosting, don’t buy what you can’t afford[/tags]

Using APIs to move time entries from FreshBooks to Harvest

I recently was working for a client who has their own time tracking system–they use Harvest.  They want me to enter the time I work for them into that system–they want more insight into my time use than a monthly invoice provides. However, I still use my own invoicing system, FreshBooks (more on that choice here) and will need to invoice them as well.  Before the days when APIs were common, or if either of these sites did not have an API, I would have had three, equally unsavory, choices:

  • Convince the client to use my system or at least access it for whatever data they needed
  • Send reports (spreadsheets) to the client from my system and let them process it
  • Enter my time in both places.  This option would have won, as I don’t like to inconvenience people who write me checks.

Luckily, both Harvest and FreshBooks provide APIs for time tracking (Harvest doco here, FreshBooks doco here). I was surprised at how similar the time tracking data formats were.  With the combination of curl, gnu date, sed, Perl and bash, I was able to write a small script (~80 lines, sketched below) that

  • pulled down my time data for this client, for this week, from FreshBooks (note you have to enable API access to your account for this to work)
  • mapped it from the FreshBooks format to the Harvest format
  • then posted it to Harvest.
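
My actual script was a pile of curl, sed, Perl and bash, but here’s a hedged Ruby sketch of the same flow. The account names, credentials and id mappings are placeholders, and you should double check the XML field names against each API’s current documentation.

require 'net/https'
require 'uri'
require 'rexml/document'

# 1. pull time entries from FreshBooks; the API token doubles as the
#    basic auth username (the password can be anything)
fb_uri = URI.parse('https://YOURACCOUNT.freshbooks.com/api/2.1/xml-in')
fb = Net::HTTP.new(fb_uri.host, fb_uri.port)
fb.use_ssl = true
fb_req = Net::HTTP::Post.new(fb_uri.path)
fb_req.basic_auth('your-freshbooks-token', 'X')
fb_req.body = '<?xml version="1.0"?><request method="time_entry.list"/>'
fb_res = fb.request(fb_req)

# 2. map FreshBooks project/task ids to their Harvest equivalents
#    (made up ids; building this mapping was the tedious part)
project_map = { '10' => '77' }
task_map    = { '3'  => '12' }

# 3. push each entry to Harvest's daily/add endpoint
REXML::Document.new(fb_res.body).elements.each('//time_entry') do |entry|
  field = lambda { |name| entry.elements[name].text }
  body  = "<request><notes>#{field['notes']}</notes>" +
          "<hours>#{field['hours']}</hours>" +
          "<project_id>#{project_map[field['project_id']]}</project_id>" +
          "<task_id>#{task_map[field['task_id']]}</task_id>" +
          "<spent_at>#{field['date']}</spent_at></request>"
  hv_uri = URI.parse('https://YOURCOMPANY.harvestapp.com/daily/add')
  hv = Net::HTTP.new(hv_uri.host, hv_uri.port)
  hv.use_ssl = true
  hv_req = Net::HTTP::Post.new(hv_uri.path, 'Content-Type' => 'application/xml')
  hv_req.basic_auth('you@example.com', 'your-harvest-password')
  hv_req.body = body
  hv.request(hv_req)
end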

A couple of caveats:

  • I still log in to Harvest to submit my time (I didn’t see a way to submit my time in the API documentation), but it’s a heck of a lot easier to press one button and submit a week’s worth of time than to do double entry.
  • I used similar project and task codes in both systems (or, more accurately, I set up the FreshBooks tasks and projects to map to the Harvest ones, since FreshBooks is what I had control over).  That mapping was probably the most tedious part of writing the script.

You can view my script here, or at least a sanitized version thereof.  It took about an hour and a half to do this. Double entry might have been quicker in the short term, but now I’m not worried about entry mistakes, and submitting my time every week is easy!  I could also have used XSLT to transform from one data format to the other, but the formats were so similar that it was easier to just parse the text.

[tags]getharvest,freshbooks,time tracking, process automation[/tags]

Useful Tools: piwik, a worthy web statistics package

I recently installed an open source web analytics tool called piwik.  (You can demo it at that site.) I found out about it via the sourceforge.net mailing list. It was the featured project for July 2009. It bills itself as an alternative to Google Analytics (GA) (actually, right now, the home page states “Piwik aims to be an open source alternative to Google Analytics.”) and I can see why it does so. The architecture is similar, JavaScript executing on every page and sending data to a server; the interface is similar as well, with lots of whizzy Web 2.0, JavaScript heavy features and detailed data.

Previously, I had been using the Wusage installation that came with my web hosting service. piwik was quite a step up from that, with richer graphs, results and UI. Plus, because it works via JavaScript execution, I was assured that every visit was an actual visit by an actual person. And since it’s hosted on my server, I control all the data, which was a sticking point for me when considering Google Analytics.

I recently upgraded to 0.4.2, which broke the dashboard, but I’ve been assured a fix is in SVN (Update Aug 4: They no longer plan to fix the bug, but there is a workaround in that thread.).  If you want to get the latest code, go here. You can download 0.4.1, the last working version I know of, here. I’ll update this to point to the piwik website when they have a release up that works. For some reason they don’t have a release archive that I could find.

So what’s good about piwik?  Well, compared to what–Google Analytics, or other website analytics tools? This is a fundamental question, because if you are using GA just for the web stats piece, or are using some other static logfile analysis tool, piwik is well worth reviewing.

In comparison to Google Analytics

The downsides are:

  • you have to maintain another server/database, etc.  I imagine that someone will offer piwik via SaaS sometime soon, though I couldn’t find anyone doing that right now.
  • it’s a beta product and is not as mature as Google Analytics, as evidenced by the 0.4.2 issue above
  • some key GA features are missing (goals, funnels, etc).

In comparison to the other website analytics tools I’ve used, AWstats (which I’ve written about before and is open source) and wusage (not open source, but free with my hosting contract), piwik has

  • a slick user interface
  • JavaScript execution, so you know you’re getting a real browser instead of a bot (the javascript browser guarantee)
  • easier tracking of click outs
  • easier configuration
  • javascript widgets available

The downside is that piwik only counts visitors who execute JavaScript (a logfile tool sees every request, bots and all), and that you have to install and maintain PHP and a database to run it.

This is obviously not intended to be a full, detailed analysis of all the differences between these tools, but I think that piwik has a lot of promise.  They have a roadmap full of planned features, but they definitely aren’t yet an alternative to Google Analytics for anyone who uses the more advanced features of that product. Funnels, click overlays and goals are all unsupported in piwik as of this version. In the forums, I saw several requests for such richer analysis tools, and in the roadmap I saw a goal tracking plugin as a blocker for version 1.0, so the team is aware of the lack.

When browsing around doing research for this post, I saw a post (sorry, couldn’t find it again) arguing that piwik’s feature development would be captured by the needs of smaller websites, since it is the open-source alternative; but I believe that the support of openX (an ad server company that I wrote about in the past), which is funding at least one of the developers, will prevent such feature capture.  In addition, I find that open source projects that have an existing product to model themselves on (like GA) tend to try to reach feature parity.  If piwik continues on its current path of replicating Google Analytics features, then I think it will live up to its aim.

If you’re simply using Google Analytics to see who referred traffic to your sites, or for which keywords search engines are showing your site, and you want something more open or more control of your data, piwik is a good fit.  If you use any other web stats tool, and want a slicker admin interface or the javascript browser guarantee, piwik is also worth a look.

Update, 7/31: A friend pointed out this broad survey of the current state of free (as in beer) web analytics options.

[tags]piwik,the javascript browser guarantee,google analytics, piwik vs google analytics, web stats[/tags]