Easyrec: a recommendation engine worth looking at

I love recommendation engines.  This is the software behind the “users who bought this also bought” recommendations that Amazon shows everywhere.

I love them because they are an easy way to leverage the wisdom of the crowd to help users.  They also get better the more data you feed into them, so once you set one up, it just makes your site better and better.

For a while, I’ve wanted to explore mahout as a recommendation engine solution, but felt intimidated by how much work integration would be.  Luckily, I did a bit of searching and turned up this stackoverflow question about java recommendation engines.

Looking at some of the alternatives, I dug up easyrec, an open source recommendation engine.  Rather than solving a couple of different machine learning problems like mahout does, easyrec focuses on recommendations.

It also has a javascript API (for both sending information and displaying recommendations) and a demo installation you can use on your site, so it is trivial to integrate into a website to see if it works for you.  I did run into an issue with the demo server, but a post to the forums got it resolved in a few days.

Easyrec has support for generating recommendations for more than one kind of item (so if you want to display different recommendations within specific categories of an ecommerce site, that is possible) and is self hostable in any java container (which is recommended if you are going to use it in any commercial capacity).  You can also build the recommendations off of the following actions: views, ratings, or purchases.

You can also customize easyrec with java plugins, though mahout definitely offers far more options for configuration.

I haven’t noticed any speed changes to my site with the javascript installed, though I’m sure adding some more remote javascript code didn’t speed up page rendering.  I noticed an uptick in time on site after I installed it (small, on the order of 5%).

If you have a set of items that are viewed together, using easyrec can leverage the wisdom of the crowds with not much effort on your part.  It’s not as powerful or configurable as alternatives, but it is drop dead simple to get started with.  It’s worth a look.
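
To make the “viewed together” idea concrete, here is a minimal sketch of item-to-item co-occurrence counting in Python.  This is not easyrec’s algorithm or API (I haven’t described either here); the function names and the sample view logs are made up for illustration.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(sessions):
    """Count how often each ordered pair of items shows up in the same session."""
    related = defaultdict(Counter)
    for items in sessions:
        # set() so an item viewed twice in one session only counts once
        for a, b in permutations(set(items), 2):
            related[a][b] += 1
    return related


def also_viewed(related, item, limit=3):
    """Return up to `limit` items most often seen together with `item`."""
    return [other for other, _ in related[item].most_common(limit)]


# Hypothetical view logs: one list of item ids per visitor.
sessions = [
    ["tent", "stove", "lantern"],
    ["tent", "stove"],
    ["tent", "sleeping-bag"],
    ["stove", "lantern"],
]

related = build_cooccurrence(sessions)
print(also_viewed(related, "tent", limit=1))
```

The more sessions you feed in, the more the counts separate the genuinely related items from the noise, which is why these engines get better with more data.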

My ODesk experience

A few months ago, I had a friend who mentioned that he was investigating ODesk to find help for his software project.  I’d heard of ODesk before and was immediately interested.  I have a directory of Colorado farm shares which requires a lot of data entry in the early months of each year, as I get updated information from farmers.  So, I thought I’d try ODesk and see if someone could help with this task.

Because this was my first time, I was cautious.  I worked with only one contractor, and only used about 17 hours of her time.  We worked off and on for about 3 months.  She was based in the Philippines, so everything was asynchronous.  We communicated only through the ODesk interface (which was not very good for a messaging system).

I chose her based on her hourly rate (pretty cheap), skillset (data entry) and reviews (very good).  I made the mistake of contacting her to apply for the job, but letting others apply as well, and in the space of 3 days had over 90 applicants for the position.

After selecting her, and her accepting my offer, I created a data entry account, and described what I wanted.  This was actually good for me, as it forced me to spell out in detail how to add, update or remove an entry, which is the start of the operations manual for my site.

After that, I managed each task I’d assigned to her through a separate email thread.  I did light review of her work after she told me she was done updating, and did go back and forth a couple of times over some of the tasks.  In general, she was very good at following instructions, and OK at taking some initiative (updating fields beyond what I specified, for example).  There were some bugs in my webapp that caused her some grief, and some updates I just did myself, as it was quicker to do them than to describe how to do them.

The variety of work you can get done via ODesk is crazy, and the overall ODesk process was very easy.  You just need to have a valid credit card.  If you are looking to start on a project right away, be aware that some lead time is required (they charge your card $10 to validate your account, and that takes some time to process).

Even though it didn’t save me a ton of time, it was a useful experiment and I’ll do it again next year.  For simple tasks that can be easily documented and outsourced, it’s a worthwhile option to explore.  Though be careful you don’t outsource too much!

JFreeChart for the win

[Image: regression line chart]

I wanted to make some charts showing linear regressions, but they needed to be images (so libraries like d3.js were out).  My first inclination was to use Google’s Image Charts API, but it has been deprecated.  I looked around and couldn’t find anything like it.

So, it looked like I’d need to code something up.  Some searching around turned up some great open source projects.  Using JFreeChart and Apache Commons Math, and some blog posts (this one about regression, this one about drawing dashed lines) and a post on StackOverflow about scatter plot basics, I was able to put together the above regression line chart in about half a day.
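
For the curious, the math behind a regression line is small.  Below is a rough Python sketch of an ordinary least squares fit and the two endpoints you would hand to a charting library to draw the line.  It is not the Java code I wrote with JFreeChart and Apache Commons Math; the function names and sample points are invented for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def line_endpoints(xs, slope, intercept):
    """Two points spanning the data range, enough to draw the fitted line."""
    lo, hi = min(xs), max(xs)
    return (lo, slope * lo + intercept), (hi, slope * hi + intercept)


# Made-up sample data lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)
print(line_endpoints(xs, slope, intercept))
```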

A tutorial for the Census Data API

I’ve been exploring APIs ever since attending the hackathon at Gluecon last year.  Since I work for a real estate firm, I’m interested in APIs with a geographic component.  I had heard that USA Today had APIs for census data.  They do.  I also wondered if the census bureau had any APIs available.  They do.  However, the USA Today APIs are much much friendlier.

But the USA Today APIs have some limitations.  Also, the data sets of these two APIs intersect, but neither is a subset of the other.  So, I thought it’d be useful to have a quick start for the census data API.

  1. First, register for an API key.  This only requires an email address and a name.  The key got sent to my spam folder, so make sure you check that if you don’t see the email about five minutes after you submit.
  2. Decide what type of information you want to find.  There are two sets of information–the 2010 census summary file or the American Community Survey (for the last five or six years).  Since the ACS is not available in the USA Today API, that’s what I’ll explore, but the methodology is similar for the summary file.
  3. The first place to look is the variables XML file showing you what data is available.  Here is the ACS variables file for the most recent five years, and here is the census summary variables file.  For a high level overview of what you can find, you’ll want to focus on the ‘concept’ tags.  You’ll see names like “C24070.  Industry by Class of Worker for the Civilian Employed Population 16 Years and Over” and “B24123.  Detailed Occupation by Median Earnings for the Full-Time, Year-Round Civilian Employed Female Population 16 Yrs and Over”.  While the Census has documents explaining what each of these mean, this website seems to have aggregated the metadata that will let you explore the types of data available.
  4. OK, you’ve found your interesting data.  I’m going to look at the concept “B08126.  MEANS OF TRANSPORTATION TO WORK BY INDUSTRY” for the rest of this example.  You need to pick a variable from the concept, so I’ll pick B08126_003E/Construction and some subsets: B08126_018E/Car, truck, or van – drove alone, B08126_033E/Car, truck, or van – carpooled, and B08126_048E/Public transportation (excluding taxicab). As you see, you can request multiple variables at once.
  5. Now you need to find the area that you want to look at.  The census developer site has examples which I found to be massively confusing when I first looked at them.  What you need to do is decide what level of geography you want, and then find the appropriate code.  Here are state codes.  If you want anything more detailed, you’ll probably want a FIPS code, which can identify anything from a state down to the smallest area for which Census data is gathered.  Here’s a list of valid geographic delineations.  You may have the information if you are interested in a county, but if all you have is a lat/lng (as if, say, you are looking at a property listing), you can look up the FIPS code.  If you need to get a lat/lng for a city, I’d suggest google.
  6. I’m going to look up Lafayette, CO, which google tells me is at 39.9936, -105.0892.  So, the FIPS request looks like http://data.fcc.gov/api/block/find?latitude=39.9936&longitude=-105.0892&showall=false&format=json and reveals that the FIPS code for this lat/lng is 080130608005010.
  7. A FIPS code is broken into sections.  The first two numbers are the state, so 08 is Colorado’s code.  The next three are the county, so 013 represents Boulder county.  The next six are the tract code, so 060800 is the tract that we are looking at (tracts “provide a stable set of geographic units for the presentation of decennial census data”).  You can see tracts on maps, and our specific tract [pdf].  After the tract comes the block, which is an even smaller subsection of homes.  The block is 5010.  Apparently FIPS codes are on their way out, in favor of GNIS codes.  I couldn’t find a lat/lng to GNIS code service, however.
  8. Now you know what area you are looking at.  Rather than the county, I want to examine census data based on the tract I’ve found (good old tract 060800).
  9. Next step is to build the URL for the API request.  Here’s where the examples come in handy.  For tract level data for the ACS, we look at this example, and see the format of the call: http://api.census.gov/data/2011/acs5?get=[the data code that you found above]&for=tract:[tract code or *]&in=state:[state code]+county:[county code]&key=[your key]
  10. Therefore, my API request looks like http://api.census.gov/data/2011/acs5?get=B08126_003E,B08126_033E,B08126_018E,B08126_048E&for=tract:060800&in=state:08+county:013&key=… and I get something like this: [["B08126_003E","B08126_033E","B08126_018E","B08126_048E","state","county","tract"],["309","57","188","0","08","013","060800"]] stating that there are 309 construction workers reported in this tract who answered the transportation question, and that no one took public transport, and more drove alone (188) than carpooled (57).
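
Steps 7 through 10 can be sketched in a few lines of Python.  This is only an illustration, assuming the 15 digit block level FIPS layout described above (2 state, 3 county, 6 tract, 4 block) and the URL format from the tract example.  The function names are mine, and YOUR_KEY is a placeholder for the key you registered for.

```python
from urllib.parse import urlencode


def split_block_fips(code):
    """Split a 15 digit block FIPS code into state, county, tract and block."""
    if len(code) != 15 or not code.isdigit():
        raise ValueError("expected a 15 digit FIPS block code")
    return {
        "state": code[0:2],
        "county": code[2:5],
        "tract": code[5:11],
        "block": code[11:15],
    }


def build_acs_url(variables, parts, api_key):
    """Build a tract level request URL for the 2011 ACS 5 year data."""
    query = urlencode(
        {
            "get": ",".join(variables),
            "for": "tract:" + parts["tract"],
            # the space is encoded as '+', matching the example URL format
            "in": "state:{} county:{}".format(parts["state"], parts["county"]),
            "key": api_key,
        },
        safe=":,",  # keep ':' and ',' readable instead of percent-encoding them
    )
    return "http://api.census.gov/data/2011/acs5?" + query


parts = split_block_fips("080130608005010")
variables = ["B08126_003E", "B08126_033E", "B08126_018E", "B08126_048E"]
print(parts)
print(build_acs_url(variables, parts, "YOUR_KEY"))
```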

Note that I ignored the margin of error fields, but they are there.  Now you have the data, you can use something like d3.js to display it.
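
Before charting, it helps to turn the response into something friendlier.  The response is a JSON array whose first row is the header, so a small sketch like the following (using the sample response from above, with labels taken from step 4) pairs each value with its variable name.  The helper name is made up, and real responses may contain missing values that this sketch does not handle.

```python
import json


def rows_to_records(payload):
    """Turn the header-row-plus-data-rows JSON into a list of dicts."""
    header, *rows = json.loads(payload)
    return [dict(zip(header, row)) for row in rows]


# The sample response from the tutorial, with plain quotes.
sample = (
    '[["B08126_003E","B08126_033E","B08126_018E","B08126_048E",'
    '"state","county","tract"],'
    '["309","57","188","0","08","013","060800"]]'
)

# Human readable labels for the variables chosen in step 4.
labels = {
    "B08126_003E": "construction, total",
    "B08126_018E": "drove alone",
    "B08126_033E": "carpooled",
    "B08126_048E": "public transportation",
}

record = rows_to_records(sample)[0]
# Values arrive as strings, so convert the estimates to integers.
counts = {label: int(record[var]) for var, label in labels.items()}
print(counts)
```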

Note that the TOS of the Census APIs don’t mention explicit limits but if you are planning to make a lot of calls, consider downloading the full dataset and using that instead.

Finally, you can sign up for the Census API developer forums, though this seems to be more of a feature request forum and less of a technical help arena.  And there doesn’t appear to be anonymous browsing of forums–you have to sign in to view anything.

Update 1/22: From the developer forums comes the comment that the forums are meant to be both technical support and feature requests.

RSS email campaigns

Email is one of the best ways to keep users coming back to your site.  I’d lay out the arguments, but Patrick McKenzie already has.  Go read that article if you doubt my statement.

If you generate a lot of content via your blog, then an email campaign that pulls from your RSS feeds is a great way to generate newsletter content.  If you have two sources of content–a content blog and a link blog, for example–you can have the newsletters pull from both RSS feeds and have them feed different sections of your newsletter.

I am a big fan of mailchimp as an email delivery service (they have a great free plan I’ve used multiple times), and was ready to roll up my sleeves and use their API to read from a set of RSS feeds and generate a newsletter from that feed.  But, luckily, they’ve already implemented RSS email campaigns.  If you want a monthly newsletter and are confident of new articles in your RSS feeds, you can schedule the newsletters out for a year, and know that you’ll be sending out topical fresh content every month.  (Note that if you don’t have new articles, the newsletter will still be sent if you don’t intervene.  And I can’t think of something that would make an internet company look dumber than to send the same content twice in an email newsletter.)
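
If you are worried about that stale newsletter, a small check run before each scheduled send can tell you whether to intervene.  Here is a rough Python sketch that lists feed items published since the last send.  It does not touch mailchimp’s API at all; the function name and the sample feed are invented, and it assumes a plain RSS 2.0 feed with pubDate elements.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def new_items_since(rss_xml, last_sent):
    """Return titles of feed items published after `last_sent`."""
    root = ET.fromstring(rss_xml)
    fresh = []
    for item in root.iter("item"):
        published = item.findtext("pubDate")
        if published is None:
            continue  # skip items without a date rather than guess
        if parsedate_to_datetime(published) > last_sent:
            fresh.append(item.findtext("title", default="(untitled)"))
    return fresh


# Made-up feed with one old and one new article.
sample_feed = """<rss version="2.0"><channel>
  <title>Example blog</title>
  <item><title>Old post</title><pubDate>Tue, 08 Jan 2013 10:00:00 +0000</pubDate></item>
  <item><title>New post</title><pubDate>Tue, 05 Feb 2013 10:00:00 +0000</pubDate></item>
</channel></rss>"""

last_sent = datetime(2013, 2, 1, tzinfo=timezone.utc)
fresh = new_items_since(sample_feed, last_sent)
print(fresh if fresh else "Nothing new: consider pausing this month's send")
```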

I’m aware of solutions like paper.li that allow you to aggregate multiple sources of content in a pretty package, but mailchimp seems to give you more control and can be used for non RSS email campaigns as well.

Take a look!

TimelineJS installation tips

TimelineJS is a JS library for making fantastic looking timelines.  The data that drives these timelines can be in JSON or in google spreadsheets.

I wanted to install this on a site I manage, but ran into a couple of issues.

First, there aren’t great installation instructions.  Basically, you need to download the source from github and then copy the ‘compiled’ directory wherever you want the files to go.

I got the files installed, then cribbed from the google spreadsheet example.

Then, I kept seeing this error in the javascript console:  ReferenceError: VMM is not defined.  (That is the Firefox error message–Chrome had one that was slightly different.)  Looking in the chrome javascript console revealed that some js and css files were not being downloaded–they were 404ing.  Adding the js and css config options with full paths to the timeline-min.js and timeline.css files fixed this error.

The last issue I ran into was that the div into which I was putting the timeline was not 100% of the height of the page.  I wanted other content around it.  But the timeline wasn’t showing up at all.  I fixed this issue by putting another div around the timeline containing div, and setting the outer div’s height explicitly.  Then I made the containing div’s height 100%.

Hope these tips help someone out.

Hackfest tips for companies with few developers

Last year, my company ran a hackfest.  This year, we are doing it again.  The company I work for, 8z Real Estate, is about 20% real software developers, though everyone at the company is familiar with software and technology.

How do we run a successful hackfest when only a few employees can build software?

  • Include everyone.  It will be a richer, more fun experience with more people.  Get executive buy in–I found the original ‘fed ex day’ post helpful in explaining the idea.
  • Set goals and expectations.  At a typical hackfest (or hackathon), running code is the goal.  For us, autonomy and exploration are more important.  In the announcement email we state: “the idea is to give everyone a chance to do something work related that they want to do, or try, or explore, but don’t have time to because of the hustle and bustle of work life.”
  • Reset deliverable expectations.  Running code is the deliverable at a typical hackfest, but at an event with many non technical attendees other deliverables should be embraced.  Spreadsheets, powerpoints, text documents, mockups, link galleries, images–these are all artifacts that capture an exploration.  They can also be referred to in the future.  (I don’t think a presentation without an artifact is a good idea, because of the lack of permanence, although I guess you could videotape it.)
  • Encourage developers to work on different teams.  Spreading the developer viewpoint and code writing ability across as many groups as possible is a good thing, as it will allow the groups to push their ideas further. That said, if a developer really wants to work on his or her own idea, don’t force them to join a team.
  • Make sure contractors feel welcome.  Because this isn’t a typical workday, it can be difficult to justify paying contractors to attend.  But a hackfest reinforces company culture and can make contractors feel part of the team.  We compromise by inviting contractors and paying them a mutually agreed upon reduced hourly rate.  If they are technical, this also adds to the pool of developers.
  • Have the hackfest onsite, preferably in one conference room. Especially for the first one, the hum of people working will be motivating and exciting.
  • Have the hackfest happen on one day.  Pick one that is slow–for real estate, that means closer to the year end holidays.
  • Plan for some ‘normal’ work to be done on the day of the hackfest.  We need to provide daily customer support, so on hackfest day we try to compress a full day’s work into a few hours, then shut off the phones.

Then, some general hackfest principles that I believe are true no matter what the attendees’ skillsets are.

  • Start with a timeboxed ideation whiteboard session.  This lets everyone see all the great ideas, and find what interests them.
  • Use the ideation session to head off any ‘typical work’ tasks that people suggest (‘I just have to verify 4 more bugs on the foobar widget’).
  • Let teams self organize, but encourage cross pollination between departments and teams.  A hackfest is a great way to build trust between people who don’t normally work together.  On the other hand, if someone is very passionate about an idea that no one else cares enough about to team up on, let that person pursue their passion.
  • Handling managers at a hackfest is a sticky subject.  On the one hand, there is benefit to treating them as another employee–they get the benefits of working with different people and ideas.  On the other hand, because of their job, they may (unintentionally!) warp the autonomy of the team.  Last year, the CEO worked alone, but all the other managers were treated as employees.  Discuss this issue, especially with higher level executives, before the day of the hackfest.
  • Order in lunch, which keeps the momentum going.

Any tips for a good hackfest, especially one attended by fewer developers than non technical people?

Trying out a habit

I have been wanting to practice meditation for the longest time.  Periodically, I would subscribe to newsletters, read articles, download apps (I love the Chakra Chime app), watch videos, and get fired up about the benefits.  Then I would meditate for one or two days, and then would have a tough day and fall into bed exhausted, meditation forgotten.  Having fallen off the wagon one day, it was easier to skip it the next day, then meditate the following day, then skip it the next three days, until I wasn’t meditating at all.

I mentioned this difficulty to Corey, a friend, and he recommended a different approach.  It has three components:

  1. A monthly calendar.  You can print one out from this site.  Write the activity at the top.  Put it by your bed.
  2. A sharpie.  Put it by your calendar.
  3. An agreement with yourself that no matter what, you’ll do what you want to do once a day.

Once you perform the activity, you can put a big fat X on that calendar.  I’ll tell you what, once you get four or five Xes, you start to gain some momentum.  Even when I’ve had some really long tiring days, I still want to keep the streak going, and the calendar provides that extra bit of motivation to do it.

I don’t know if I’ll continue to meditate once I’m done with the calendar, but at the least this method made it easy to try it out as a habit.  If you have a habit you’ve been wanting to try, but haven’t been able to make room in your life for, try Corey’s three step method.

 

Run through the finish

Make that last code change.

Write that last test.

Look in that document, rather than trying to remember what color the icon is supposed to be.

Write that documentation.

Look at your work with the eyes of a user.

I work in a small development department (2 developers plus a number of contractors) and I need to constantly remind myself to run through the finish.

I ran in high school and college (cross country and track) and at the very end, it is easy to let your guard down and coast.  After all, you’ve done almost all of the hard work.  And no one is really behind you.  And it hurts (oh yes, it hurts).  So, why not ease off a bit?

The problem is, there is someone behind you, and they have you in their cross hairs.  They have the incentive and the vision of their competition, but you have the lead.  Why give up any advantage?

Development isn’t painful, but it can be a slog.  Yes, yes, I’m sure there are shops that never have a slog.  But, for most mortals, there are requirements that are changed or were forgotten, tasks that are less fun than others, tweaks to the UI for the Nth time, vendor rigidity that never surfaced during the sales process, and other sundry annoyances.  But you as the developer own the final product.  You can choose to coast, since you did 99% of the work (and did it well), or you can choose to run through the finish.

Run through the finish.