Monthly Archives: September 2018

What makes you a better developer, working at an early stage startup or working in a team?

I gave a presentation at Boulder.rb last night about my experience being the technical co-founder of a startup for two years. After the presentation, someone asked a really interesting question: between working as a solo co-founder of a startup and working in a larger team at an established company, which experience makes you a better developer?

First, a digression. I often commute around town by bike. There are many benefits to doing so, but one I think about a lot is that being on a bike gives you the ability to move through streets more freely. Specifically, you can switch between acting like a car (riding on the road) and acting like a pedestrian (riding on the sidewalk). Used judiciously, this ability can get you places faster than either mode alone (hence, bike messengers).

In my mind, a true developer is like that. They can bounce between the world of software (and across the domains within it) and the world of business to solve problems in an efficient manner.

Back to the question at hand. I think the answer depends on what you mean by "better." Are you looking to gain:

  • customer empathy
  • ability to get stuff done through barriers of ignorance and resource constraints
  • a wide set of experience across a lot of different software-related domains (security, operations, UX, data modeling, requirements gathering, planning, bug fixing, etc.)

If gaining these skills makes you into the better kind of developer you want to be, you will be well served by being a technical co-founder or founding engineer (more thoughts about the distinction here).

If on the other hand, you are looking for:

  • deep knowledge of a smaller subset of the software world
  • the ability to design software for long-term maintainability and performance
  • experience working with a team of stakeholders, each with a different perspective on the problem you are solving

then you are seeking a place on a team, with process, code reviews, conference attendance and free snacks (most likely).

Who is a better developer? The person with experience working with (possibly leading) a team and deep knowledge of a subset of technology? Or the person who can be a jack of all trades and take a product from an idea to something customers will pay for?

I’m going to leave you with the canonical consultant’s answer: “It depends.” The former I’d call an expert programmer and the latter a true developer. They are both extremely valuable, but they are good fits for different company situations.

Amazon Alexa

I had a lot of fun working on a one-day ‘hackfest’ project with Amazon Alexa. I learned a lot about voice UX and Alexa implementation details. It’s an interesting platform, especially if you have broad brand recognition and can deliver valuable, high-level information via short chunks of text.

From my blog post on the Culture Foundry site:

The multi-step interaction is a bit clunky, but I think it’s a great way to avoid collisions between different skills. Basically, the user calls out an ‘invocation’ like ‘open color picker’. Interactions with Alexa after that are sent directly to that particular skill until an end point is reached in the interaction tree. Each of these interactions is triggered by a different voice command, and is handled by something called an ‘intent’. Intents can have multiple triggering commands (‘what is my favorite color’ vs ‘what is my color’, for example). There’s also lightweight, session-level storage while the entire invocation is occurring, which means you can easily pass data between intents without reaching out to more persistent data storage.

You can read the whole post over there.
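To make the invocation/intent flow concrete, here is a minimal sketch of a skill backend as an AWS Lambda handler operating on the raw Alexa request JSON (no SDK). The intent names and the session attribute key are hypothetical, not from the color picker skill described above; the envelope shape follows the Alexa Skills Kit request/response format.

def lambda_handler(event, context):
    # Route on the request type Alexa sends.
    request = event['request']
    session_attrs = event.get('session', {}).get('attributes') or {}

    if request['type'] == 'LaunchRequest':
        # The invocation ('open color picker') lands here; keep the session open.
        return build_response("What is your favorite color?", session_attrs, end_session=False)

    if request['type'] == 'IntentRequest':
        name = request['intent']['name']
        if name == 'MyColorIsIntent':  # hypothetical intent name
            color = request['intent']['slots']['Color']['value']
            session_attrs['favorite_color'] = color  # session-level storage
            return build_response("Got it, %s." % color, session_attrs, end_session=False)
        if name == 'WhatsMyColorIntent':  # hypothetical intent name
            color = session_attrs.get('favorite_color', 'unknown')
            return build_response("Your favorite color is %s." % color, session_attrs, end_session=True)

    return build_response("Sorry, I didn't catch that.", session_attrs, end_session=True)

def build_response(speech, session_attrs, end_session):
    # Standard Alexa response envelope; sessionAttributes is the lightweight
    # storage that round-trips between intents for the life of the invocation.
    return {
        'version': '1.0',
        'sessionAttributes': session_attrs,
        'response': {
            'outputSpeech': {'type': 'PlainText', 'text': speech},
            'shouldEndSession': end_session,
        },
    }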

Aborted Adventures with Amazon Athena and US PTO data

I was playing around recently with some data from the US Patent and Trademark Office, trying to load it into S3 and then query it with Athena. Athena is a serverless big data query engine with schema-on-read semantics. The data was not available in the AWS public dataset repo. Things didn’t go as well as planned. Here’s how I wanted them to go:

  1. download some data
  2. transform it into CSV (because Athena doesn’t currently support XML and I didn’t want to go full EMR, even though Hive supports XML)
  3. upload it to an S3 bucket
  4. create a table based on the data
  5. run some interesting queries using Athena
  6. possibly pull some of the data into Amazon Machine Learning to do some predictions
  7. possibly put some of the data in an S3 bucket as JSON and use DataTables to create a nice user interface

Like pretty much every development project I’ve ever been part of, there were surprises. What was different was that I had a fixed amount of time: since this was an exploratory project, I set a timebox. I didn’t complete much of what I wanted to get done, but I wanted to document what I did.

I was able to get through step 5 with a small portion of data (13k rows). I ended up working a lot on Windows because I didn’t want to boot up a Vagrant box. I spent a lot of time re-learning XSLT in order to pull the data I wanted out of the XML. I used a tool called xmlstarlet for this, which worked pretty well with the small dataset. Here’s the command I ran to pull out some of the attributes of the XML dataset (you can see that I also learned about batch file arguments):

xml sel -T -t -m //case-file -v "concat(serial-number,',',registration-number,',',case-file-header/registration-date,',',case-file-header/status-code,',',case-file-header/attorney-name)" -n %filename% > %outfile%

And here’s the Athena schema I created:


CREATE EXTERNAL TABLE trademark_csv (
serialnumber STRING,
registrationnumber STRING,
registrationdate STRING,
statuscode INT,
attorneyname STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
ESCAPED BY '\\'
LINES TERMINATED BY '\n'
LOCATION 's3://aml-mooreds/athena/trademark/';
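Once the table exists, step 5 is just SQL. Here’s a minimal sketch of running a query against it programmatically, assuming boto3; the query itself and the results bucket are made up for illustration (Athena writes query output to whatever S3 location you give it):

import time
import boto3

athena = boto3.client('athena', region_name='us-east-1')  # match your bucket's region

# Hypothetical example query: count trademark filings by status code.
QUERY = """
SELECT statuscode, count(*) AS filings
FROM trademark_csv
GROUP BY statuscode
ORDER BY filings DESC
"""

execution = athena.start_query_execution(
    QueryString=QUERY,
    QueryExecutionContext={'Database': 'default'},
    ResultConfiguration={'OutputLocation': 's3://my-results-bucket/athena/'},  # hypothetical bucket
)
query_id = execution['QueryExecutionId']

# Athena is asynchronous; poll until the query reaches a terminal state.
while True:
    status = athena.get_query_execution(QueryExecutionId=query_id)
    state = status['QueryExecution']['Status']['State']
    if state in ('SUCCEEDED', 'FAILED', 'CANCELLED'):
        break
    time.sleep(1)

if state == 'SUCCEEDED':
    results = athena.get_query_results(QueryExecutionId=query_id)
    for row in results['ResultSet']['Rows'][1:]:  # first row is the header
        print([col.get('VarCharValue') for col in row['Data']])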

After I had done the quick prototype, I foolishly moved on to downloading the full dataset. This caused some issues with disk storage and ended up taking a long time (the full dataset was ~300 files, from 500MB to 2GB in size, each containing about 150k records). I learned that I should have pulled down one large file and worked it through my entire process rather than automating each step as I went. For one thing, xmlstarlet hasn’t been updated for years, and I couldn’t find a Linux package. When I tried to compile it, it was looking for libxml, which was already installed on my EC2 instance. I didn’t bother to head further down this path. But I ran into a different issue: when I ran xmlstarlet against a 500MB uncompressed XML file, it completed, but any of the larger files caused it to give an ‘out of memory’ error. I saw one reference in the bug tracker, but it didn’t seem to apply.

So, back to the drawing board. Luckily, many languages have support for event-based parsing of XML. I was hoping to find a command line tool that could run XSLT in an event-based fashion so I could reuse some of my logic, but one doesn’t appear to exist (I found this interesting discussion and this one). Python seemed like it might work well.
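For the record, here’s roughly what that Python approach might have looked like, as a sketch using the standard library’s xml.etree.ElementTree.iterparse, which processes elements as they stream by instead of loading the whole file into memory. The field paths mirror the xmlstarlet command above; the rest is assumption:

import csv
import sys
import xml.etree.ElementTree as ET

# Paths relative to each case-file element, matching the xmlstarlet extraction.
FIELDS = ['serial-number',
          'registration-number',
          'case-file-header/registration-date',
          'case-file-header/status-code',
          'case-file-header/attorney-name']

def convert(xml_path, csv_path):
    with open(csv_path, 'w', newline='') as out:
        writer = csv.writer(out)
        # iterparse fires an 'end' event as each element closes, so we
        # never hold more than one case-file subtree in memory at a time.
        for event, elem in ET.iterparse(xml_path, events=('end',)):
            if elem.tag == 'case-file':
                writer.writerow([elem.findtext(f, default='') for f in FIELDS])
                elem.clear()  # release the subtree we just processed

if __name__ == '__main__':
    convert(sys.argv[1], sys.argv[2])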

Then I ran out of time. Oh well, maybe some other time. It is fun to think about how I could automate all of this. I was definitely seeing where Lambda functions and some other AWS features could have fit in nicely. I also think that using RDS might have made more sense than Athena, given the rate of update and the amount of data.

Lessons learned:

  • what works for 13k records won’t necessarily work when you have 10x, let alone 100x, that number
  • work through the entire pipeline with real world data before automating any part of it
  • use EC2 whenever you need to download a lot of data
  • make sure your buckets and Athena are in the same region. Mine weren’t, and there was no warning. That’s fine with small data, but it could have hurt financially if I’d succeeded in loading the whole dataset (see the sketch after this list for one way to check)
  • it can be fun to play around with this type of stuff, but having a timebox keeps you from going down the rabbit hole too far
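On the region lesson: here’s a quick way to check where a bucket actually lives, as a minimal sketch assuming boto3 (aml-mooreds is the bucket from the schema above):

import boto3

s3 = boto3.client('s3')
# get_bucket_location returns None for us-east-1, the region name otherwise.
location = s3.get_bucket_location(Bucket='aml-mooreds')['LocationConstraint']
print(location or 'us-east-1')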

Atoms and Bits

I found this post by Fred Wilson about atoms and bits to be a nice counterpoint to this post from a few years back by Chris Dixon. A software product has so many advantages over a business that delivers physical goods, whether in inventory, startup costs, or distribution. (Yes, if you build a mobile app, you have to pay the vig to Apple/Google, but you can also distribute at zero incremental cost to hundreds of millions of people.) From Fred’s post:

we should … understand that the timelines [for working on businesses that focus on atoms] will be longer and the road to adoption will be more challenging.

The fact is that atoms are just harder to work with than bits. However, that very difficulty can provide a moat around your business (“where there’s muck there’s brass”). If your business is bits, find another moat (branding, network effects, niche domination).