Skip to content

All posts by moore - 39. page

Settle the Routine, Focus on the Important

IMG_20141016_165308295
My Bike Lock

This is a picture of my bike lock.

There are many bike locks, but this one is mine.

I always put my bike lock on as seen.  My helmet with the lock running through the styrofoam (because, on the off chance someone wants to steal a bike helmet, they’d have to destroy mine to get it).  The lock running through both the front wheel and the frame, so that the wheel can’t be stolen–nothing sadder than a locked bike with a missing front wheel–quick release works well.  The actual lock (where I put my key) sheltered from the elements, both because the helmet protects it and because it is facing down.

After commuting on my bicycle hundreds of times and leaving it overnight a handful of times, I’ve determined this is the optimal lock procedure.  Not that I’m not open to new ideas–a few years ago I took my helmet with me, but then I saw someone lock their helmet like this and thought “what a wonderful idea!”.  But once I arrived at what seemed an optimal solution, I just put it on autopilot.  This frees up my mind to pursue other things.  It also means I have not analyzed the way I lock my bike in a deep way until I started to write this blog post, and once this is done, I won’t think about it for the foreseeable future.  The bike lock setup is settled.  I’m always looking for other aspects of my life to settle, while still being open to possible improvements (if you lock your bike differently, let me know!).

There’s a similar tension in software development.  On the one hand, I want to be open to new ideas, frameworks, concepts and solutions.  On the other hand, it can be easy to go after the newest ‘shiny thing’ every time, spin my wheels, and not accomplish what I need to accomplish.  One way to the latter issue is to make some decisions in my work life.  This can range from the trivial–I have used the same aliases file for over a decade–to foundational–I only work on the unix stack.

Sometimes settling constraints are imposed by the project–it’s a RoR app that needs to be upgraded, or there are already extensive java libraries that this application will use (and that can mean that you won’t get the gig or the job).  Time also can be a limiting factor–deadlines have a wonderful way of focusing you to work on the problem at hand, rather than running off to explore that new library.  It’s also important to realize that you can settle the trivial–what shell you work in, what OS you use, how you lock your bike–which lets you focus on the important–the app you are writing, the restaurant you biked to.

(Of course, what is trivial for some projects may be foundational for others.)

When you are working on a side project, sometimes settling on a technology can be hard–it is more exciting to explore that new XYZ rather than grind away at a bug or a new feature in an app that I’ve already written.  But, in the end, extending and shipping an existing app almost always is more rewarding.

We software developers live on a knife edge–on one side is irrelevance, though it may be profitable (COBOL), on the other side is flitting from new technology to new technology (whether it is Erlang, Haskell, NodeJS or something new) but never mastering them.

Fun stuff I’ve done

amusement park photo
Photo by Loozrboy

One of the companies I’ve met with wanted a bit more context around work I’ve done, so I wrote up some of the ‘greatest hits’ of tech work I’ve done in the past couple of years. It was so much fun, I thought I’d post it here so I’ll remember it in a few years.  This is just some of the tech highlights, and doesn’t include other things I’ve learned (managing folks, softer development skills, etc).

  • Colorado CSAs.info is a directory of Colorado CSAs. This site received 28k visits in 2013, and 26k visits so far in 2014. You can view the source. The source has some quick and dirty aspects, since this is a side project.
  • Home valuation processing software output. This was a fairly simple algorithm (linear regression among geographic datasets) but involved some interesting processing because of the number of records involved (1M+) and data sources. The graph shown here was generated by code I wrote. The content is human generated, however. I worked directly with the CEO on requirements for this v1 project.
  • I wrote an ebook about an aspect of mobile app development last year:
  • I built large chunks of COhomefinder (which has not been touched in 2 years, because the business has decided to move forward with an outsourced home search solution). The last code I touched was replacing google maps with mapquest. I also built a widget for featured home listings that could be dropped into other websites. You can see both on the search results page.
  • I wrote a hybrid mobile app that displays news and info about neighborhoods in the front range. This never really got much traction, unfortunately.

It’s fun to look back.

Python Minesweeper Programming Problem

At an interview, I was asked to build a python program, using TDD, which would output the results of a minesweeper game.  Not a fully functional game, just a small program that would take an array of bomb locations and print out a map of the board with all values exposed.

So, if you have a 3×3 board, and there is a bomb in each corner, it would print out something like this:

x 2 x
2 4 2
x 2 x

Or if you have a 3×3 board and there is only a bomb in the upper left corner, it would print out something like this:

x 1 0
1 1 0
0 0 0

I did not complete the task in the allotted time, but it was a fun programming exercise and I hope illuminating to the interviewers. I actually took it home and finished it up. Here’s the full text of the program:

class Board:
        def __init__(self, size = 1, bomblocations = []):
		self._size = size
		self._bomblocations = bomblocations
    	
	def size(self):
		return self._size
    	
	def _bombLocation(self,location):
                for onebomblocation in self._bomblocations:
                	if onebomblocation == location:
				return 'x'

	def _isOnBoard(self,location):
		if location[0] =self._size:
		    	return False
		if location[1] >=self._size:
		    	return False
		return True

	def whatsAt(self,location):
		if self._bombLocation(location) == 'x':
			return 'x'
		if not self._isOnBoard(location):
			return None
		return self._numberOfBombsAdjacent(location)

	def _numberOfBombsAdjacent(self,location):
		bombcount = 0
		# change x, then y
		currx = location[0] 
		curry = location[1] 
		for xincrement in [-1,0,1]:
			xtotest = currx + xincrement
			for yincrement in [-1,0,1]:
				ytotest = curry + yincrement
				#print 'testing: '+ str(xtotest) + ', '+str(ytotest)+ ', '+str(bombcount)
				if not self._isOnBoard([xtotest,ytotest]):
					continue	
				if self._bombLocation([xtotest,ytotest]) == 'x':
					bombcount += 1
		return bombcount

	def printBoard(self):
		x = 0
		while x < self._size:
			y = 0
			while y < self._size:
				print self.whatsAt([x,y]),
				y += 1
			x += 1
			print
				
def main():
	board = Board(15,[[0,1],[1,2],[2,4],[2,5],[3,5],[5,5]])
	board.printBoard()

if __name__ == "__main__": 
	main()

And the tests:

import unittest
import app

class TestApp(unittest.TestCase):

    def setUp(self):
    	pass

    def test_board_creation(self):

        newboard = app.Board()
        self.assertIsNotNone(newboard)

    def test_default_board_size(self):
     	newboard = app.Board()
        self.assertEqual(1, newboard.size())	

    def test_constructor_board_size(self):
     	newboard = app.Board(3)
        self.assertEqual(3, newboard.size())	

    def test_board_with_bomb(self):
     	newboard = app.Board(3,[[0,0]])
        self.assertEqual('x', newboard.whatsAt([0,0]))	

    def test_board_with_n_bombs(self):
     	newboard = app.Board(4,[[0,0],[3,3]])
        self.assertEqual('x', newboard.whatsAt([0,0]))	
        self.assertEqual('x', newboard.whatsAt([3,3]))	

    def test_board_with_bomb_check_other_spaces_separated_bombs(self):
     	newboard = app.Board(4,[[0,0],[3,3]])
        self.assertEqual(1, newboard.whatsAt([0,1]))	
        self.assertEqual(1, newboard.whatsAt([1,0]))	
        self.assertEqual(1, newboard.whatsAt([1,1]))	
        self.assertEqual(0, newboard.whatsAt([1,2]))	
        self.assertEqual(0, newboard.whatsAt([2,1]))	
        self.assertEqual(1, newboard.whatsAt([3,2]))	
        self.assertEqual(1, newboard.whatsAt([2,3]))	
        self.assertEqual(1, newboard.whatsAt([2,2]))	

    def test_check_other_spaces_contiguous_bombs(self):
     	newboard = app.Board(4,[[0,1],[0,0]])
        self.assertEqual(1, newboard.whatsAt([0,2]))	
        self.assertEqual(2, newboard.whatsAt([1,0]))	
        self.assertEqual(0, newboard.whatsAt([2,1]))	

    def test_off_the_board(self):
     	newboard = app.Board(3,[[0,0],[1,2]])
        self.assertEqual(None, newboard.whatsAt([3,3]))	

This was written in python 2.7, and reminded me of the pleasure of small, from the ground up software (as opposed to gluing together libraries to achieve business objectives, which is what I do a lot of nowadays).

Using JSONP For Angular Requests

screen photo
Photo by Neil T

I was writing an angular app (source) that was accessing the Best Buy API, just to play around and get more familiar with Angular.  All of my previous apps had been to APIs that I controlled, and could thus use CORS to set headers.  Obviously, not so with the Best Buy API.

Whoops.

Luckily Angular makes accessing data via JSONP almost exactly the same as accessing data via XMLHttpRequest/CORS.  Rather than use $http.get, you use $http.jsonp. You have the same promise returned, and can handle the results in the same way. I didn’t dive into error handling (but if Angular follows jQuery’s lead, it looks like there’s none), and obviously JSONP can only be used to read information, but the guts of injecting a script, etc, are all handled for you.

 

RSS Pick: Dion Almaer

dion almaer photo
Photo by marcosfernandez

I think that the RSS reader is such a fantastic invention. It lets me monitor many bloggers and news sites, and see new content.  This lets you have an eye on lots of writers, including some that haven’t written for a long time.  I’m going to be highlighting blogs that I follow, one per month.

The first is Dion Almaer’s, who, unfortunately, has moved most of his writing to Medium.  But Dion is a great technologist.  He currently is employed at WalmartLabs Mobile.  He’s written such gems as:

Your coding voice:

When people ask me about Java and why I don’t often write applications in it, my answer is not that I think “Java sucks”. I think the JVM is amazing technology, and there are a ton of fantastic APIs. Using Java is a great answer for many situations. However, the least amount of fun that I have had programming has been when using the Java language. It isn’t just that it feels frustratingly verbose, although that is part of it.

and Browsers are Finally Catching Up (in 2009):

But, the browsers are finally changing. The new crop come with technologies that show that the browser vendors are thinking about building a platform for desktop quality applications. The Chrome comic book was full of this.

Remember the Chrome Comic Book?

Dion, thanks for sharing your knowledge, please resurrect your blog!  (Dion, I know this is an old photo–feel free to send me a new one and I’ll update this post.)

Avoiding “Module ‘XXX’ is not available” Errors in AngularJS

headbang photo
Photo by Nirazilla

As I continue to build applications with AngularJS, I see how strong the ecosystem is.  While there aren’t quite as many plugins for Angular as there are for JQuery, many of the JQuery plugins have been wrapped up as Angular directives.

One of the issues I banged my head against a couple of times when using a new directive was how many places I had to modify code, and the totally non intuitive error messages that were displayed.

Here are the places you need to modify if you want to use a directive:

  • a module definition (your app or a controller).  You need to add the directive to the list of dependencies: var controllers = angular.module("controllers", [ "localytics.directives",'ngModal' ]);.
  • the index.html file. You need to add links to the javascript files and css that the directive uses. If you are using bower to manage these components, you can find the files under the directory managed by bower.
  • the karma.conf.js file. This file sets up the environment for your unit tests. You want to set up the files attribute to point to the javascript files you added to the index.html page above.

If you don’t add the correct module name to your dependency list, misspell it, or don’t add the javascript to the index.html or your karma configuration file, you will see this error message in your console:

Uncaught Error: [$injector:modulerr] Failed to instantiate module app due to:
Error: [$injector:modulerr] Failed to instantiate module controllers due to:
Error: [$injector:modulerr] Failed to instantiate module ngModall due to:
Error: [$injector:nomod] Module ‘ngModall’ is not available! You either misspelled the module name or forgot to load it. If registering a module ensure that you specify the dependencies as the second argument.
http://errors.angularjs.org/1.2.25/$injector/nomod?p0=ngModall
at http://192.168.0.200:8000/app/bower_components/angular/angular.js:78:12

And the app won’t start.

Hope this helps someone else avoid some head banging.

Cassandra Dev Day Denver

data photo
Photo by justgrimes

Most of my datastore experience is with RDBMS like mysql, oracle and postgresql (though I did work with some key value stores like berkleydb back in the day). So when a full day, free intro to Cassandra was offered, I jumped on it, even though it was in Centennial. You can view the schedule, speakers and talk synopses for the day. There were two tracks, beginner and advanced. Since I didn’t know anything about Cassandra, I followed the beginner track.

First, though, it was amazing how many people were there. The two main companies behind the talk, Pearson Education and DataStax, a vendor providing a commercial, supported version of Cassandra, ended up having to provide two overflow rooms, and it was still standing room only for some of the talks. Quite a nice turnout, and I think the sponsors were pleasantly shocked. I was also surprised by the number of folks from Boulder. I happened to sit next to two folks from Westminster and Superior, and ended up having a common friend or colleague with each. Small world.

I learned a ton about Cassandra, from its internals, to its topology (the ring’s the thing) to abstractions that let you query it (CQL, which is a subset of SQL) to data modeling to using the java driver, which makes accessing Cassandra almost as easy as using JDBC. While there are some SQL concepts that appear to map fairly well to Cassandra, I put quotes around them below to remind myself of the fact that a Cassandra ‘table’ isn’t the same as a RDBMS table, ditto for ‘row’, ‘primary key’ and other important concepts.

I think the biggest takeaway for me was that Cassandra is a “write many, read once” system. Because you can only query efficiently on one or two keys, if you have multiple queries, you want to write the data multiple times in a denormalized system, one ‘table’ for each query. Because of this, Cassandra shines in use cases where you are doing a lot of inserts, have known queries, and need speed and availability (sensor data was mentioned several times).

How does this actually work? Here’s an example (as best I understand it–here are some others from people who actually have experience using this technology):

If you have click stream data, in standard apache format, and you want to be able to have it stored in a database and highly available for a few specific queries, Cassandra might be a good choice. Here’s a line of my clickstream, from my blog:

157.55.39.40 - - [15/Oct/2014:08:03:30 -0600] "GET /wordpress/archives/date/2007/07 HTTP/1.1" 200 66258 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"

This is time based data, and has some valuable information, and some not so valuable information. What things might I be interested in querying on? Well, I might care about the user agent, request time, ip address, status code, or URL path requested. I probably don’t care about the HTTP method, HTTP version or the bytes served. But, for the sake of this example, let’s say my application needs to show the location of the most recent 1000 users for a given country, for a fancy widget for my website. I will use maxmind or a similar service for mapping ip addresses to country. That’s all I care about. (Yes, I know this is contrived–I had to revise this example a couple of times to make it fit with Cassandra’s usage model.) I would set up in this ‘table’ in Cassandra.


CREATE TABLE IF NOT EXISTS location (
country text,
time timestamp,
user_agent text,
status_code int,
path text,
PRIMARY KEY (text, time)
)

In this table, country is the partition key, and time is the clustering key. That means that this query: select location from location where country = 'USA' limit 1000; will be screaming fast. If I wanted to look at paths requested by time, I would not add an index to this table, but instead create another whole table, say request_path. Then my insertion code would write to both location and request_path. And then clients that wanted path information would use the specific table. Yup, denormalization is the name of the game.

This means that Cassandra has certain specific use cases, and that trying to use it as a general purpose data storage and query engine is foolish. Several presenters mentioned that Cassandra plus Apache Spark for general queries was a good solution.

Denormalization as standard operating procedure isn’t the only mind bending facet of Cassandra. Others:

  • the presenters also talked about the high availability and replication of Cassandra–you can actually configure it to be data center aware so that it automatically replicates data across different data centers.
  • For each keyspace (a set of tables, so similar to a schema), you specify how a replication factor–how many times each piece of data is stored.
  • For every read or write, you specify how many nodes must either agree or accept the data, respectively.
  • Adding nodes is the preferable way to deal with scale. You can add a node easily, and Cassandra will auto partition data and spread existing data across the new node.
  • The biggest Cassandra setup they mentioned was 75K nodes.
  • Each ‘row’ can have up to 2 billion records. If a row stretches across nodes, you’ll kill performance.
  • There’s a process called ‘compaction’ which is similar to Java’s garbage collection, and just like GC, you have to pay attention to how compaction works, because it will affect performance.

All in all, very interesting day, and I appreciated the experience. One more (interesting) tool to add to the toolbox.