Masterless puppet and CloudFormation

I’ve had some experience with CloudFormation in the past, and recently gained some puppet expertise.  I thought it’d be great to combine the two, working on a new project to set up the ELK stack for a client.

Basically, we are creating an EC2 instance (or a number of them) from a vanilla image using a CloudFormation template, doing a small amount of initialization via the UserData section, and then using puppet to configure them further.  However, puppet is used in a masterless context, where the intelligence (knowing which machine should be configured which way) isn’t in the manifest file, but rather in the code that checks out the modules and manifests. Here’s a great example of a project set up to use masterless puppet.

Before I dive into more details, other solutions I looked at included:

  • doing all the machine setup in UserData
    • This is a bad idea because it forces you to set up and tear down machines each time you want to make a configuration change.  That leads to a longer development cycle, especially at first.  Plus bash is great for small configurations, but when you have dependencies and other complexities, the scripts can get hairy.
  • pulling a bash script from s3/github in UserData
    • puppet is made for configuration management and handles more complexity than a bash script.  I’ll admit, I used puppet with an eye to the future, when we had more machines and more types of machines.  I suppose you could do the same with bash, but puppet handles more of the typical CM tasks, including setting up cron jobs, making sure services run, and deriving dependencies between services, files and artifacts.
  • using a different CM tool, like ansible or chef
    • I was familiar with puppet.  I imagine the same solution would work with other CM tools.
  • using a puppet master
    • This presentation convinced me to avoid setting up a puppet master.  Cattle, not pets.
  • using cloud-init instead of UserData for initial setup
    • I tried.  I couldn’t figure out cloud-init, even with this great post.  It’s been a few months, so I’m afraid I don’t even remember what the issue was, but I remember this solution not working for me.
  • create an instance/AMI with all software installed
    • puppet allows for more flexibility, is quicker to set up, and allows you to manage your configuration in a VCS rather than in a pile of different AMIs.
  • use a container instead of AMIs
    • Isn’t docker the answer to everything? I didn’t choose this because I was entirely new to containerization and didn’t want to take the risk.

Since I’ve already outlined how the solution works, let’s dive into details.

Here’s the UserData section of the CloudFormation template:


          "Fn::Base64": {
            "Fn::Join": [
              "",
              [
                "#!/bin/bash \n",
                "exec > /tmp/part-001.log 2>&1 \n",
                "date >> /etc/provisioned.date \n",
                "yum install puppet -y \n",
                "yum install git -y \n",
                "aws --region us-west-2 s3 cp s3://s3bucket/auth-files/id_rsa/root/.ssh/id_rsa && chmod 600 /root/.ssh/id_rsa \n",
                "# connect once to github, so we know the host \n",
                "ssh -T -oStrictHostKeyChecking=no git@github.com \n",
                "git clone git@github.com:client/repo.git \n",
                "puppet apply --modulepath repo/infra/puppet/modules pure-spider/infra/puppet/manifests/",
                { "Ref" : "Environment" },
                "/logstash.pp \n",
                "date >> /etc/provisioned.date\n"
              ]
            ]

So, we are using a bash script, but only for a little bit.  The second line (starting with exec) stores output into a logfile for debugging purposes.  We then store off the date and install puppet and git.  The aws command pulls down a private key stored in s3.  This instance has access to s3 because of an IAM setup elsewhere in the CloudFormation template–the access we have is read-only, and the corresponding public key has already been added to our github repository as a deploy key.  Then we connect to github via ssh to ‘get to know the host’.  Then we clone the repository containing the infrastructure code.  Finally, we apply the manifest, which is partially determined by a parameter to the CloudFormation template.

This bash script will run on creation of the EC2 instance.  Once this script is solid, if you are testing additional puppet modules, you only have to do a git pull and a puppet apply on the instance to pick up new functionality (see the sketch below).  (Of course, at the end you should stand up and tear down via CloudFormation just to test end to end.)  You can also see how it’d be easy to have the manifest (logstash.pp here) be a parameter to the CloudFormation template, which would let you store your configuration for web servers, database servers, etc., in puppet as well.
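
The iteration loop on a running instance looks something like this (a minimal sketch based on the UserData above; ENVIRONMENT stands in for the CloudFormation Environment parameter, and you may need sudo if you’re not root):

# after the initial bootstrap has run once
cd repo
git pull
puppet apply --modulepath infra/puppet/modules "infra/puppet/manifests/${ENVIRONMENT}/logstash.pp"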

I’m happy with how flexible this solution is.  CloudFormation manages the machine creation as well as any other resources, puppet manages the software installed in those machines, and git allows you to maintain all that configuration in one place.


400 error on heroku git:clone


I’m working on an estimate for changes to a heroku hosted web application.

I was trying to run heroku git:clone --app appname after adding myself as a collaborator. I was running this on a new Vagrant box running Ubuntu.

However, I kept getting this error message:

vagrant@precise64:~$ heroku git:clone --app appname
Cloning from app 'appname'...
Cloning into 'appname'...
error: The requested URL returned error: 400 while accessing https://git.heroku.com/appname.git/info/refs
fatal: HTTP request failed

And I couldn’t understand why.

After some fiddling, I determined that first you need to have an ssh key generated:

vagrant@precise64:~$ ssh-keygen -t rsa
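
If the new key isn’t yet associated with your Heroku account, you may also need to upload it (as I recall, the toolbelt offers to do this for you the first time, so this step may be automatic):

vagrant@precise64:~$ heroku keys:add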

And then you can run:

vagrant@precise64:~$ heroku git:clone --app appname --ssh-git
Cloning from app 'appname'...
Cloning into 'appname'...
Warning: Permanently added the RSA host key for IP address 'ipaddress' to the list of known hosts.
Fetching repository, done.
remote: Counting objects: 902, done.
remote: Compressing objects: 100% (473/473), done.
remote: Total 902 (delta 433), reused 826 (delta 379)
Receiving objects: 100% (902/902), 28.06 MiB | 556 KiB/s, done.
Resolving deltas: 100% (433/433), done.

Hope this helps someone, somewhere.


Open Source, Consulting and Building SaaS Products


I was browsing Hacker News the other day and ran across this article, lamenting how difficult it is to support a company with an open source project, and that, insofar as one could, consulting generated far more revenue than selling SaaS services like hosting.  For the record, I’ve never touched LocomotiveCMS.  From a brief glance, it looks nice.

While I feel for them, I think that they have alternatives:

  • Sell premium support.  Right now, it appears the only way to get premium support is to host with them, and it seems that many clients are more interested in self-hosted solutions.  Makes sense–if you are a rails developer (the target market for this CMS) you already have a hosting solution.  But if premium support were offered separately, they could hire someone (possibly part time) less skilled than Didier, the primary developer, and have them take care of tier 1 support, while still offering a warm fuzzy feeling for harder problems, which would escalate to Didier.  Companies like to pay for that kind of service, even if they don’t always use it.  This strategy would also decrease the amount of revenue needed to hire someone to help Didier (customer service folks are less expensive than developers).
  • Sell an ebook (or a couple).  These are far easier to create and sell than a SaaS product.  (I use leanpub!)  It could be an ‘authoritative guide to LocomotiveCMS’ or just focus on one part.  Since Didier knows which questions he often answers for people who have paid him money, he’s probably got a very good idea of where the pain points are.
  • Someone suggested this in the comments, but a marketplace for plugins to LocomotiveCMS seems like a natural way to go.  Again, I don’t know that community, and marketplaces for CMSes can be hard to kick-start, but this is worth evaluating.
  • I’m sure there are others.  Here’s an exhaustive list of business models, courtesy of the AVC community, so if I were them, I’d review and see what was a fit.

In my comment on the HN post, I talk about how products often face a “round peg in an elliptical hole” problem. I meant that products often solve 80% of the problem for 80% of the users.  They also require users to change their processes, crystallizing them around the product.  Typically there’s just enough offset that people feel cognitive drag.  (Of course, the same thing usually happens with custom solutions; you just don’t know that until you are done.  Doh!)

Especially in crowded markets, like CMSes, it is far, far easier to sell enough hours to make a living customizing a solution than it is to sell enough products to make a living.  Brennan Dunn covers this ground well.  Every consulting company I’ve ever seen or been a part of, and every consultant I’ve ever known (except the ones who were contracting for one client and really were employees with more flexibility), dreams of transitioning from non-scalable consulting by the hour to scalable product sales.  One friend even had a name for it–the “von MacIntyre machine”, which would make money while he slept.

But it’s hard.


Python Minesweeper Programming Problem

At an interview, I was asked to build a python program, using TDD, which would output the results of a minesweeper game.  Not a fully functional game, just a small program that would take an array of bomb locations and print out a map of the board with all values exposed.

So, if you have a 3×3 board, and there is a bomb in each corner, it would print out something like this:

x 2 x
2 4 2
x 2 x

Or if you have a 3×3 board and there is only a bomb in the upper left corner, it would print out something like this:

x 1 0
1 1 0
0 0 0

I did not complete the task in the allotted time, but it was a fun programming exercise and, I hope, illuminating to the interviewers. I actually took it home and finished it up. Here’s the full text of the program:

class Board:
    def __init__(self, size=1, bomblocations=[]):
        self._size = size
        self._bomblocations = bomblocations

    def size(self):
        return self._size

    def _bombLocation(self, location):
        for onebomblocation in self._bomblocations:
            if onebomblocation == location:
                return 'x'

    def _isOnBoard(self, location):
        if location[0] < 0 or location[0] >= self._size:
            return False
        if location[1] < 0 or location[1] >= self._size:
            return False
        return True

    def whatsAt(self, location):
        if self._bombLocation(location) == 'x':
            return 'x'
        if not self._isOnBoard(location):
            return None
        return self._numberOfBombsAdjacent(location)

    def _numberOfBombsAdjacent(self, location):
        bombcount = 0
        currx = location[0]
        curry = location[1]
        # look at the 3x3 neighborhood around this square, skipping
        # anything that falls off the board
        for xincrement in [-1, 0, 1]:
            xtotest = currx + xincrement
            for yincrement in [-1, 0, 1]:
                ytotest = curry + yincrement
                if not self._isOnBoard([xtotest, ytotest]):
                    continue
                if self._bombLocation([xtotest, ytotest]) == 'x':
                    bombcount += 1
        return bombcount

    def printBoard(self):
        x = 0
        while x < self._size:
            y = 0
            while y < self._size:
                print self.whatsAt([x, y]),
                y += 1
            x += 1
            print

def main():
    board = Board(15, [[0, 1], [1, 2], [2, 4], [2, 5], [3, 5], [5, 5]])
    board.printBoard()

if __name__ == "__main__":
    main()

And the tests:

import unittest
import app

class TestApp(unittest.TestCase):

    def setUp(self):
        pass

    def test_board_creation(self):
        newboard = app.Board()
        self.assertIsNotNone(newboard)

    def test_default_board_size(self):
        newboard = app.Board()
        self.assertEqual(1, newboard.size())

    def test_constructor_board_size(self):
        newboard = app.Board(3)
        self.assertEqual(3, newboard.size())

    def test_board_with_bomb(self):
        newboard = app.Board(3, [[0, 0]])
        self.assertEqual('x', newboard.whatsAt([0, 0]))

    def test_board_with_n_bombs(self):
        newboard = app.Board(4, [[0, 0], [3, 3]])
        self.assertEqual('x', newboard.whatsAt([0, 0]))
        self.assertEqual('x', newboard.whatsAt([3, 3]))

    def test_board_with_bomb_check_other_spaces_separated_bombs(self):
        newboard = app.Board(4, [[0, 0], [3, 3]])
        self.assertEqual(1, newboard.whatsAt([0, 1]))
        self.assertEqual(1, newboard.whatsAt([1, 0]))
        self.assertEqual(1, newboard.whatsAt([1, 1]))
        self.assertEqual(0, newboard.whatsAt([1, 2]))
        self.assertEqual(0, newboard.whatsAt([2, 1]))
        self.assertEqual(1, newboard.whatsAt([3, 2]))
        self.assertEqual(1, newboard.whatsAt([2, 3]))
        self.assertEqual(1, newboard.whatsAt([2, 2]))

    def test_check_other_spaces_contiguous_bombs(self):
        newboard = app.Board(4, [[0, 1], [0, 0]])
        self.assertEqual(1, newboard.whatsAt([0, 2]))
        self.assertEqual(2, newboard.whatsAt([1, 0]))
        self.assertEqual(0, newboard.whatsAt([2, 1]))

    def test_off_the_board(self):
        newboard = app.Board(3, [[0, 0], [1, 2]])
        self.assertEqual(None, newboard.whatsAt([3, 3]))
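
To run the program and the tests (app.py comes from the import above; the test file name test_app.py is my assumption):

python app.py
python -m unittest test_app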

This was written in python 2.7, and reminded me of the pleasure of small, from-the-ground-up software (as opposed to gluing together libraries to achieve business objectives, which is what I do a lot of nowadays).


Building a System with IronMQ and Python


One of my most recent projects was writing a system to deliver real estate listing data to a content management system. The CMS was not in my control. Since the listing data source was bursty and I wasn’t sure how the CMS would handle the load, I decided to use a message queue, where the messages would have a JSON payload. Message queues are great at decoupling components of a system.

For the queue, I used IronMQ. The company was already using it, it has a free tier (up to 24 messages a second), the service has been stable and reliable, it has great language SDKs, and setting up a durable message queue is something I’d rather outsource. (I do wish Zapier supported it.) In other situations (when posting messages from mobile apps), we ran Varnish in front of IronMQ so that it could be replaced easily. In this case, we didn’t, because there were fewer moving pieces (it was server to server communication, and it would be easier to swap out IronMQ should that be required).

I wrote the bridge code from the listing database to the message queue in python. The shop was mostly Java and some python, and python seemed a better fit for a small ‘pull from here, push to there’ application. I used pytest for unit testing, jenkins to run the unit tests in a CI environment, and autopep8 for formatting. My colleague was a more experienced python programmer, so I was able to lean on him for questions. I didn’t find python hard to pick back up (I’d scripted in python a little, years ago), and it was a fun language to code in. Reminded me of perl w/r/t packages and quick developer feedback. I did miss Java’s dependency management, though (my colleague recommended virtualenv as a possible solution).

The JSON payload would allow developers writing the message consumer to use almost any language they wanted–any language if they used the IronMQ REST API rather than an SDK.
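
To make that concrete, posting a message is a single authenticated HTTP call (a minimal sketch, not code from the project; the host, project id, token, queue name and sample payload are all placeholders, and the endpoint shape is the v1-era REST API as I remember it, so check the docs):

# post a JSON payload to an IronMQ queue via the REST API
curl -s -X POST \
  -H "Authorization: OAuth $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"body": "{\"listing_id\": \"12345\", \"price\": 250000}"}]}' \
  "https://mq-aws-us-east-1.iron.io/1/projects/$PROJECT_ID/queues/listings/messages"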

I can’t share the code, but for this kind of problem, python was a great solution.  And I’ll reach for IronMQ any time I need a message queue.  This pair of technologies was quick to implement and easy to deploy; high performance wasn’t really a requirement, since the frequency of the listing delivery was the real bottleneck.


What a pleasurable way to learn a language!

This site was recommended to me, and I have to say, it is a fun way to become more familiar with the syntax of a language. There’s the journey aspect:

things are not what they appear to be: nor are they otherwise
your path thus far [...X______________________________________________] 19/280

and the fact that when you see something you want to investigate further, you just write another unit test:

  def test_slicing_arrays
    array = [:peanut, :butter, :and, :jelly]

    assert_equal [:peanut], array[0,1]
    assert_equal [:peanut,:butter], array[0,2]
    assert_equal [:and,:jelly], array[2,2]
    assert_equal [:and,:jelly], array[2,20]
    assert_equal [], array[4,0]
    assert_equal [], array[3,0] # my addition
    assert_equal [], array[4,100]
    assert_equal nil, array[5,0]
  end

Now, running through these koans certainly isn’t going to make me a Ruby expert, but I will have passing familiarity with the language and be ready to use it on my next small project.

Apparently I’ve been living under a rock, because there appear to be koans projects for quite a few languages: java, haskell, erlang (cue whatsapp reference), and even bash. I was, however, unable to find a koans package for assembler.


Older versions of Sinon.js don’t work with jquery 2.0

This is a quick hit, hopefully to help someone avoid spending the half day I just did.

The older versions of sinon.js, a helpful javascript testing tool which lets you mock up and stub out objects, do not work with jquery 2.0.  Even though 2.0 is API compatible with the 1.x series, apparently some different stuff happens under the covers.  This is an issue for me because a few months ago, I followed these instructions to set up our testing infrastructure, and used sinon.js version 1.4.2.  That worked fine with jquery 1.8.2, but when I upgraded everything, tests where I mocked up server calls failed–the backbone model’s parse method was never called.

The answer?  Use at least version 1.7.1 of sinon.js.


Running a Google Apps Script Once a Month

I needed a way to email a Google spreadsheet to my boss once a month, for some reporting purposes.  I could have put an entry in my calendar reminding me to do it, but I thought it would be a great time to try out the Google Docs scripting that I had read about for a year or two, and seen an AppSumo video about.  (I got the AppSumo video for free, from an ad on HARO.)

It was laughably easy to write the actual script (here’s a great set of tutorials).  The only rub was that Google doesn’t allow you to run scripts at monthly intervals, only hourly, daily or weekly.  A small bit of scripting got around that.

Here’s the final script (edited to remove sensitive data):

function myFunction() {
  var dayOfMonth = Utilities.formatDate(new Date(), "GMT", "dd");
  if (dayOfMonth === "05") { // formatDate returns a zero-padded string, so compare against "05"
    MailApp.sendEmail("email@example.com", "Spreadsheet Report Subject", 
'https://spreadsheets.google.com/a/mydomain.com/ccc?key='+SpreadsheetApp.getActiveSpreadsheet().getId());
  }
}

I set up a daily trigger for this script and installed it within the spreadsheet I needed to send.

I really really like Google Apps Script.  I think it has the power to be the VB of the web, in the way that VB made it easy to automate MS Office, reduce drudgery, and allow non developers to build business solutions.  It also ties together some really powerful tools–check out all the APIs you can access.

Once you let non-developers develop, which is what Google Apps Script does, you do run into some maintenance issues (versioning, sharing the code, testing), but the same is true with Excel macros, and solving those issues is for greater minds than mine.


Review of SkaDate dating software

I recently helped a client move an existing dating site from a custom ASP/MS-SQL system to an off-the-shelf PHP/MySQL platform.

The off the shelf software we ended up choosing was SkaDate.  I haven’t really found a good review of SkaDate out there, so I asked Max Chadwick to collaborate with me on a review.  (Max provided some design, system configuration and project management, and I focused on back end system setup and data migration.)  Updated 4/2: We used SkaDate 7.5, versions 1485 and 1550.

Note that we had a challenge not present for a typical SkaDate installation: migrating ~1100 user accounts (and mail messages) from one unknown system to another.  I had to learn two very different data models and map from one to the other.  On a fixed bid project.  Whoops.

Oh well, live and learn.

Skadate pluses:

  • Price: this is a big one.  You get a lot of features for only $350.
  • Technology: it is built on the LAMP stack, so there are a lot of developers out there who can help you extend the platform.
  • Support: they have a client site with some useful PDF documentation.
  • Had a defined and documented upgrade procedure (even though it was a hassle).
  • Changing look and feel was relatively easy; we went with one of the many predefined templates and only had to hack a little bit of CSS and a couple of images.
  • Caching: SkaDate caches php files and css.  Performance was reasonably snappy on a shared hosting account.
  • You get the source code.
  • Version control support: it wasn’t hard to find out which files/directories to pull into version control.
  • Support for 5-6 languages out of the box.  We didn’t use this, though.
  • Geographic features: SkaDate knows a lot about cities and where they are located, around the world.

SkaDate Minuses:

  • Support: they charged $95/month for support; I didn’t end up asking them for much help, but the times I did, they immediately wanted ssh access to the server (which tweaked me out).  I’m guessing that SkaDate might be a loss leader for ‘support services’.
  • In general the administrative interface was unintuitive and could use some work.
  • Intricate object and data model: lots of indirection, and because it is PHP, no IDE to help you unravel it.  I discovered this when asked to turn off a particular feature (that didn’t have an admin setting)–I’d have to hunt through 3-4 files to find out where a UI element was set up.
  • Secretive nature: they don’t really give you any documentation until you pay for it; however, they do provide a demo and when I emailed them and explained the situation (“I’m a developer and want enough information to do a bid for a project”) they responded with some of the documentation.
  • They had a new release midway through the project, and then a user found several of the php files had been hacked.  Not exactly confidence inspiring.
  • Initial configuration of the site was complicated: all the site features were turned on, and the site was pre-configured with specific payment/membership options. The tricky part was figuring out that turning a feature off wasn’t enough; its navigation also had to be disabled in another section of the admin area.
  • Setting up custom dating fields was cumbersome partly because of the poor interface but also because there were a number of dating fields already set up that needed to be removed.
  • The software uses dollar signs ($) in some of the automated directories.  This caused an issue with mod_security on one of the hosts we tried to use; go with one of their suggested hosting providers.

I think SkaDate is a good choice for a basic dating site if you need just the features in the demo, are willing to spend some time unraveling the administration UI, and are on a tight budget. Expect some bugs and frustrations, but hey, you only paid $350!
I would hesitate to use it as a platform for a more fully featured dating site until I’d reviewed the alternatives.

Final grade: B-



Using APIs to move time entries from FreshBooks to Harvest

I recently was working for a client who has their own time tracking system–they use Harvest.  They want me to enter the time I work for them into that system–they want more insight into my time use than a monthly invoice provides. However, I still use my own invoicing system, FreshBooks (more on that choice here), and will need to invoice them as well.  Before the days when APIs were common, or if either of these sites did not have an API, I would have had three equally unsavory choices:

  • Convince the client to use my system or at least access it for whatever data they needed
  • Send reports (spreadsheets) to the client from my system and let them process it
  • Enter my time in both places.  This option would have won, as I don’t like to inconvenience people who write me checks.

Luckily, both Harvest and FreshBooks provide APIs for time tracking (Harvest doco here, FreshBooks doco here). I was surprised at how similar the time tracking data formats were.  With the combination of curl, gnu date, sed, Perl and bash, I was able to write a small script (~80 lines) that

  • pulled down my time data for this client, for this week, from FreshBooks (note you have to enable API access to your account for this to work)
  • mapped it from the FreshBooks format to the Harvest format
  • then posted it to Harvest (a sketch of the two calls follows this list).
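
The two API calls at the heart of the script looked roughly like this (a minimal bash sketch, not my actual script; account names, credentials, ids and field formats are placeholders from memory of the era’s XML APIs, so check the docs linked above):

# pull time entries from FreshBooks (the API token goes in as the basic-auth username)
curl -s -u "$FRESHBOOKS_TOKEN:X" \
  -d '<request method="time_entry.list"><per_page>100</per_page></request>' \
  "https://youraccount.freshbooks.com/api/2.1/xml-in" > fb_entries.xml

# ... map project/task ids and reformat dates (sed/perl did this in the real script) ...

# post one mapped entry to Harvest
curl -s -u "you@example.com:password" \
  -H "Content-Type: application/xml" -H "Accept: application/xml" \
  -d '<request><notes>client work</notes><hours>1.5</hours><project_id type="integer">123</project_id><task_id type="integer">456</task_id><spent_at type="date">2010-06-14</spent_at></request>' \
  "https://yoursubdomain.harvestapp.com/daily/add"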

A couple of caveats:

  • I still log in to Harvest to submit my time for approval (I didn’t see a way to do that in the API documentation), but it’s a heck of a lot easier to press one button and submit a week’s worth of time than to do double entry.
  • I used similar project and task codes in both systems (or, more accurately, I set up the FreshBooks tasks and projects to map to the Harvest ones, since FreshBooks is what I had control over).  That mapping was probably the most tedious part of writing the script.

You can view my script here, or at least a sanitized version thereof.  It took about an hour and a half to do this. Double entry might have been quicker in the short term, but now I’m not worried about entry mistakes, and submitting my time every week is easy!  I could also have used XSLT to transform from one data format to the other, but the formats were so similar that it was easier to just parse the text.



