
Ever felt like your codebase was out of control?

I certainly have. A couple of times in my career, the combination of technical debt, a business model shift, and lack of time for a proper fix has left me feeling out of control.

But reading this post on Hacker News made me realize that it all could have been so, so much worse. A couple of “best ofs”:

To give you some examples, I originally came on as a contractor because they had some refactoring they wanted done. The entire system was home built (including the programming language) and there was a file size limit of 32,767 lines. They had many functions that were approaching this limit and they didn’t know what to do, so they hired me.

and:

Once upon a time, there was a search product and one of the data sources that it could search was a Solr/Lucene database. This should be no problem, since search is what Solr does. It should be as simple as passing the user’s query through to Solr and then reading the response. The problem was, it was important to know exactly which parts of any matched records were relevant to the search.


The Guy Before Me™ decided that the best way to implement this would be to split the user’s search into individual words, perform a separate search query through Solr’s HTTP API for each individual word, and then do a bunch of very clever and complex post-processing on the result sets to combine them into a single set of results.

and (last one, I promise):

At my first gig I teamed up with a guy responsible for a gigantic monolith written in Lua. Originally, the project started as a little script running in Nginx. Over the course of several years, it organically grew to epic proportions, by consuming and replacing every piece of software that it interfaced with – including Nginx.


There were two ingredients in the recipe for disaster. The first is that Lua comes “batteries excluded”: the standard library is minimalist and the community and set of available packages out there is small. That’s typically not an issue, as long as one uses Lua in the intended way: small scripts that extend existing programs with custom user logic (e.g. Nginx, Vim, World of Warcraft). The second is that Lua is a dynamic language: it’s dynamically typed, and practically everything can be overridden, monkey patched and hacked, down to the fundamental iterators that allow you to traverse data structures.

shivers. There, but for the grace of God.

AWS Advent Amazon Machine Learning Post

Last year I wrote about AWS Advent, which is an exploration of the vast reaches of AWS in the first 24 days of December. This year, I submitted a post for it. And that post is now up and available. I wrote about Machine Learning (ML) generally and Amazon Machine Learning (AML) specifically. From the post:

AML Models are immutable, as are data sources. If you need to incorporate ongoing data into your model, which is generally a good idea, you need to automate your datasource and model building process so they are repeatable. Then, when you have new data, you can rebuild your model, test it out, and then change which model is “in production” and making predictions.
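As a rough sketch of what that kind of automation can look like, here is an untested boto3 version of the rebuild loop. The IDs, S3 paths, and schema location are hypothetical, polling for the datasource and model to reach a COMPLETED state is omitted, and you would still review the evaluation before switching which model serves predictions.

import time
import boto3

ml = boto3.client("machinelearning")
today = time.strftime("%Y%m%d")

# datasources are immutable, so create a fresh one pointing at the latest data in S3
datasource_id = "ds-example-" + today
ml.create_data_source_from_s3(
    DataSourceId=datasource_id,
    DataSourceName="example data " + today,
    DataSpec={
        "DataLocationS3": "s3://my-bucket/aml/latest.csv",
        "DataSchemaLocationS3": "s3://my-bucket/aml/schema.json",
    },
    ComputeStatistics=True,  # required for a datasource used for training
)

# models are immutable too; build a new one from the new datasource
model_id = "ml-example-" + today
ml.create_ml_model(
    MLModelId=model_id,
    MLModelName="example model " + today,
    MLModelType="BINARY",  # or REGRESSION / MULTICLASS
    TrainingDataSourceId=datasource_id,
)

# evaluate the new model; only after checking the evaluation would you point
# your prediction code (or a realtime endpoint) at the new model id
ml.create_evaluation(
    EvaluationId=model_id + "-eval",
    EvaluationName="example evaluation " + today,
    MLModelId=model_id,
    EvaluationDataSourceId=datasource_id,  # ideally a held-out datasource
)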

Plenty more stuff in there about the ethics of ML and the various pieces of the AML pipeline. Check it out.


Moving a database driven side project to Netlify

I recently moved a side project to Netlify. Netlify, if you aren’t familiar with it, is a static file hosting service with a great workflow, free SSL certificates, and a built-in CDN. I made the move because the side project was a database driven site, but didn’t really use any other server side interactivity (no user generated content, etc). I haven’t invested much in the side project over the past couple of years, but it still gets tens of hits a day and is useful to a number of users. It’s a top hit on Google for certain keywords. I didn’t want to spend time updating the underlying application, but I wanted it to be more secure. The site is only updated over a month or so every year; it is read only for the other eleven months, as it provides information on a very seasonal service.

Moving to Netlify (or, frankly, any other static site provider) provides the following benefits:

  • extremely low cost (Netlify is free for the level of traffic I have).
  • faster browsing experience for the end user (due to being behind a CDN).
  • free SSL certificates, with no hassle of setting up letsencrypt myself.
  • indirection–the source site can now be anywhere. I could put it on an EC2 instance that I only boot up when I need to push a new build, or I could even host it locally. This means that I don’t need to worry about updating the source application.

I could have chosen to do this with S3/CloudFront/AWS Certificate Manager/Lambda (or using technologies from some other cloud provider). But even though I’m familiar with all the steps to do this, it would have been a lot of work. Netlify is free and has already done the grunt work of putting these technologies (or similar ones; I’m not familiar with the tech behind their service, but they do write about it a bit) together.

What steps did I take to move a database driven directory site to Netlify? Enough that I wanted to document this for my future self.

  • Change the DNS for the site to have a lower TTL. During the transition from your site to Netlify, users may see versions served from either the old site or the new site, and you want to minimize that.
  • Remove all forms and other server side interactivity from your site. You can replace them with JavaScript driven interactivity (like Disqus for comments). Netlify has some support for functions, but I didn’t explore that.
  • Set up Netlify using a default URL (they provide it, it is something like ‘foo-bar-123.netlify.com’). I deploy from Bitbucket. (I know some people hate on Bitbucket, and I don’t like all of their UI, but they are free for private repos, which is a win for me.) This let me understand how to deploy a simple one page site.
  • Download the site. I used wget: wget -mk http://mysite.com/ . The switches set up mirroring and convert all links to be relative.
  • Move the site to a subdirectory (I used web) and configure Netlify to deploy the HTML from that subdirectory. This lets you have other scripts outside of the webroot that can help you build the site.
  • Check in the site with the downloaded content and push it up to Bitbucket.
  • Watch the site be deployed to the Netlify default URL.
  • Access the site via SSL: ‘https://foo-bar-123.netlify.com’ and look in the console for “mixed active content” errors. Fix those as applicable. (If this was for a client, rather than a side project, I’d solve all of these.) Note that if the source site is http only, you may need to hardcode https URLs.
  • See that “pretty” html pages like ‘https://foo-bar-123.netlify.com/about’ are rendered as text.
  • Ask on Stackoverflow about the problem.
  • End up solving it by generating a _headers file after downloading the site. Update Stackoverflow question with the answer.
  • Test again that the site at the default URL looks good.
  • Update the webserver config and DNS for the database driven site. I wanted it to answer to both the old addresses (mysite.com and www.mysite.com) and a new address (generator.mysite.com).
  • Update DNS to point mysite.com and www.mysite.com to the Netlify site. Instructions here.
  • Wait for DNS to propagate. When it does, check to make sure that SSL works.
  • Add a password for generator.mysite.com
  • Test the download process with the password (the wget command changes to wget --user=USER --password=PASS -mk http://generator.mysite.com/).
  • Write a script to do the download process.
#!/bin/sh
# mirror the password-protected source site and convert links to relative
wget -mk --user=USER --password=PASS http://generator.mysite.com/
# wget writes into a directory named after the host
mv generator.mysite.com mysitecom
cd mysitecom
# list the extensionless files (the "pretty" URLs), relative to this directory
find `pwd` -type f |grep -v \\. |sed "s#`pwd`/##" > list
# build a Netlify _headers file so those pages are served as HTML
for i in `cat list`; do echo "/$i" >> _headers; echo "  Content-Type: text/html" >> _headers; done
rm list
cd ..
# swap the fresh snapshot into the web directory that Netlify deploys from
rm -rf web
mv mysitecom web
git add web
git commit -m "pull in latest from generator.mysite.com"
# could do a git push here as well if you wanted
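For reference, the _headers file generated by that loop ends up with one stanza per extensionless page, along these lines (the paths are just illustrative):

/about
  Content-Type: text/html
/contact
  Content-Type: text/html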

Now, any time that I make changes to the site, I need to run this script and push to Netlify. I could put it in cron or a scheduled lambda if I thought I was going to make changes often. For now, I’m content to run it manually. One downside is that I self-host my analytics code and that site is not SSL enabled. So I’ll need to make a change if I want to continue to get statistics.

So, in the end I have a fast, secure site that is hosted outside my infrastructure and that is easy to update. This process, while not trivial, was easy enough that it has me thinking about where else I could use it (this blog, for instance). Highly recommend.

Excited to be speaking at Develop Denver

I’m excited to be speaking at Develop Denver. This is a local conference with a wide variety of topics of interest to developers, designers and in general anyone who works in the interactive industry. From their website, they want to:

[bring] together developers, designers, strategists, and those looking to dive deeper into the interactive world for two days of hands on code & design talks.

I’ll be doing two presentations. The first is my talk on Amazon Machine Learning, which I’ve presented previously. The second is a lightning talk on the awk programming language. I’m excited to be presenting, but I’m also looking forward to interesting talks from other speakers, covering topics such as IoT, software development for the developing world, web scraping, APIs, OAuth, software development, and hiring practices. (That list is tilted toward my interest in development–there’s plenty for everyone.)

If you’re able to join, it’s happening in about two weeks in downtown Denver (Oct 18-19 in the RiNo district). Here’s the link for tickets, and here’s the agenda.

Fifteen years

Wow. 15 years ago today I wrote my first post. Thank you for reading!

I’ve been blogging for longer than I’ve done any other activity in my life. Longer than any job, any romantic relationship, any hobby.

And I can’t see stopping. Sure, I may use other outlets, but I’m trying to keep alive the streak of never letting a month go by without a post (darn you, November 2011!). I think that writing is an essential part of software development, and it’s been an essential part of my reflective process. I primarily blog for myself. But it is still fun when folks reach out to me and let me know my writing has helped them in some way. It’s also fun when I google a problem and realize that I’ve already written up an answer.

I encourage everyone I meet to blog. It is a fantastic way to accumulate credibility and expertise. (You never learn something as well as when you are trying to explain it to others.) It also gives me a chance to expound on some of my thoughts around jobs, business, and software development. I’m lucky that my work almost always is something interesting enough to write about, at least once a month. Maybe one of these days I’ll put together a ‘best of’ and make a PDF.

I also think it is important to blog at a place you control. Yes, Medium or Facebook make it easier to gain a following. But what they giveth, they can taketh away. I may only have limited engagement here on this blog, but what I write is mine and will never be anyone else’s.

Hard to believe that my blog will be able to drive next year. Again, thanks for reading.

What makes you a better developer, working at an early stage startup or working in a team?

I gave a presentation at Boulder.rb last night about my experience being the technical co-founder of a startup for two years. After the presentation, someone asked a really interesting question: between working as a solo co-founder of a startup and working in a larger team at an established company, which experience makes you a better developer?

First, a digression. I often commute around town by bike. There are many benefits to doing so, but one of the ones I think about a lot is that being on a bike gives you the ability to move through streets more freely. Specifically, you can switch between acting like a car (riding on the road) and acting like a pedestrian (riding on the sidewalk). Used judiciously, this ability can get you places faster than either mode alone (hence, bike messengers).

In my mind, a true developer is like that. They can bounce between the world of software (and across the domains within it) and the world of business to solve problems in an efficient manner.

Back to the question at hand. I think that the answer is based on what you mean by better. Are you looking to gain:

  • customer empathy
  • ability to get stuff done through barriers of ignorance and resource constraint
  • a wide set of experience across a lot of different software related domains (security, operations, ux, data modelling, requirements gathering, planning, bug fixing, etc)

If gaining these skills makes you into the better kind of developer you want to be, you will be well served by being a technical co-founder or founding engineer (more thoughts about the distinction here).

If, on the other hand, you are looking for:

  • deep knowledge of a smaller subset of the software world
  • the ability to design software for long term maintainability and performance
  • experience working with a team of stakeholders, each with a different perspective on the problem you are solving

then you are seeking a place on a team, with process, code reviews, conference attendance and free snacks (most likely).

Who is a better developer? The person with experience working with (possibly leading) a team and deep knowledge of a subset of technology? Or the person who can be a jack of all trades and take a product from an idea to something customers will pay for?

I’m going to leave you with the canonical consultant’s answer: “It depends.” The former I’d call an expert programmer and the latter a true developer. They are both extremely valuable, but they are good fits for different companies and situations.

Amazon Alexa

I had a lot of fun working on a one-day ‘hackfest’ project with Amazon Alexa. I learned a lot about voice UX and Alexa implementation details. It’s an interesting platform, especially if you have broad brand recognition and can deliver valuable, high-level information via short chunks of text.

From my blog post on the Culture Foundry site:

The multi-step interaction is a bit clunky, but I think it’s a great way to avoid collisions between different skills. Basically, the user calls out an ‘invocation’ like ‘open color picker’. Interactions with Alexa after that are sent directly to that particular skill until an end point is reached in the interaction tree. Each of these interactions is triggered by a different voice command, and is handled by something called an ‘intent’. Intents can have multiple triggering commands (‘what is my favorite color’ vs ‘what is my color’, for example). There’s also lightweight, session-level storage available while the entire invocation is occurring, which means you can easily pass data between intents without reaching out to a more persistent data store.
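To make that concrete, here’s a minimal, untested sketch of an Alexa skill handler as a Python Lambda function, using the color picker example from the quote. The intent names and the session attribute key are hypothetical, and a real skill would likely use the Alexa Skills Kit SDK rather than handling raw requests.

def lambda_handler(event, context):
    request = event["request"]
    # session attributes persist for the life of a single invocation
    attributes = event.get("session", {}).get("attributes", {}) or {}

    if request["type"] == "LaunchRequest":
        speech = "Welcome to color picker. Tell me your favorite color."
        end_session = False
    elif request["type"] == "IntentRequest" and request["intent"]["name"] == "MyColorIsIntent":
        color = request["intent"]["slots"]["Color"].get("value", "unknown")
        attributes["favoriteColor"] = color  # stash it in session storage for later intents
        speech = "Got it. Now ask me what your favorite color is."
        end_session = False
    elif request["type"] == "IntentRequest" and request["intent"]["name"] == "WhatsMyColorIntent":
        speech = "Your favorite color is " + attributes.get("favoriteColor", "a mystery")
        end_session = True
    else:
        speech = "Sorry, I didn't understand that."
        end_session = True

    return {
        "version": "1.0",
        "sessionAttributes": attributes,
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            "shouldEndSession": end_session,
        },
    }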

You can read the whole post over there.

Aborted Adventures with Amazon Athena and US PTO data

I was playing around recently with some data from the US Patent and Trademark Office, trying to load it into S3 and query it with Athena. Athena is a serverless big data query engine with schema-on-read semantics. The data was not available in the AWS public dataset repo. Things didn’t go as well as planned. Here’s how I wanted them to go:

  1. download some data
  2. transform it into CSV (because Athena doesn’t currently support XML and I didn’t want to go full EMR, even though Hive supports XML)
  3. upload it to s3 bucket
  4. create a table based on the data
  5. run some interesting queries using Athena
  6. possibly pull some of the data in Amazon Machine Learning to do some predictions
  7. possibly put some of the data in an s3 bucket as JSON and use datatables to create a nice user interface

Like pretty much every development project I’ve ever been part of, there were surprises. What was different was that I had a fixed amount of time: since this was an exploratory project, I set a timebox. I didn’t complete much of what I wanted to get done, but I wanted to document what I did.

I was able to get through step 5 with a small portion of the data (13k rows). I ended up working a lot on Windows because I didn’t want to boot up a Vagrant box. I spent a lot of time re-learning XSLT in order to pull the data I wanted out of the XML. I used a tool called xmlstarlet for this, which worked pretty well with the small dataset. Here’s the command I ran to pull out some of the attributes of the XML dataset (you can see that I also learned about batch file arguments):

xml sel -T -t -m //case-file -v "concat(serial-number,',',registration-number,',',case-file-header/registration-date,',',case-file-header/status-code,',',case-file-header/attorney-name)" -n %filename% > %outfile%

And here’s the Athena schema I created:


CREATE EXTERNAL TABLE trademark_csv (
serialnumber STRING,
registrationnumber STRING,
registrationdate STRING,
statuscode INT,
attorneyname STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
ESCAPED BY '\\'
LINES TERMINATED BY '\n'
LOCATION 's3://aml-mooreds/athena/trademark/';
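To give a flavor of step 5, here is a hedged sketch of running a query against that table with boto3. The query itself and the results location are made up, and the table is assumed to live in the default database; Athena writes query output to S3, so you poll for completion and then fetch the results.

import time
import boto3

athena = boto3.client("athena")

# kick off a query against the table defined above
execution = athena.start_query_execution(
    QueryString="SELECT statuscode, count(*) AS cnt "
                "FROM trademark_csv GROUP BY statuscode ORDER BY cnt DESC",
    QueryExecutionContext={"Database": "default"},
    ResultConfiguration={"OutputLocation": "s3://aml-mooreds/athena/results/"},
)
query_id = execution["QueryExecutionId"]

# poll until Athena finishes writing results to S3
while True:
    status = athena.get_query_execution(QueryExecutionId=query_id)
    state = status["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    for row in athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])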

After I had the quick prototype working, I foolishly moved on to downloading the full dataset. This caused some issues with disk storage and ended up taking a long time (the full dataset was ~300 files, from 500MB to 2GB in size, each containing about 150k records). I learned that I should have pulled down one large file and worked it through my entire process rather than automating each step as I went. For one thing, xmlstarlet hasn’t been updated in years, and I couldn’t find a Linux package. When I tried to compile it, it was looking for libxml, which was already installed on my EC2 instance. I didn’t bother to head further down this path. But I ran into a different issue. When I ran xmlstarlet against a 500MB uncompressed XML file, it completed. But any of the larger files caused it to give an ‘out of memory’ error. I saw one reference in the bug tracker, but it didn’t seem to apply.

So, back to the drawing board. Luckily, many languages have support for event-based parsing of XML. I was hoping to find a command line tool that could run XSLT so I could reuse some of my logic, but one doesn’t appear to exist (I found this interesting discussion and this one). Python seemed like it might work well.
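For the curious, a minimal, untested sketch of that event-based approach with the standard library’s xml.etree.ElementTree.iterparse might look like this. The element names are taken from the xmlstarlet command above, and the output columns match the Athena table.

import csv
import sys
import xml.etree.ElementTree as ET

# stream the file element by element instead of loading it all into memory
writer = csv.writer(sys.stdout)
for event, elem in ET.iterparse(sys.argv[1], events=("end",)):
    if elem.tag == "case-file":
        header = elem.find("case-file-header")
        writer.writerow([
            elem.findtext("serial-number"),
            elem.findtext("registration-number"),
            header.findtext("registration-date") if header is not None else None,
            header.findtext("status-code") if header is not None else None,
            header.findtext("attorney-name") if header is not None else None,
        ])
        elem.clear()  # free the processed subtree so memory use stays flat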

Then I ran out of time. Oh well, maybe some other time. It is fun to think about how I can automate all of this. I was definitely seeing where lambda functions and some other AWS features could have fit in nicely. I also think that using RDS might have made more sense than Athena, given the rate of update and the amount of data.

Lessons learned:

  • what works for 13k records won’t necessarily work when you have 10x, let alone 100x, that number
  • work through the entire pipeline with real world data before automating any part of it
  • use EC2 whenever you need to download a lot of data
  • make sure your S3 buckets and Athena are in the same region. Mine weren’t, and there was no warning. That’s fine with small data, but it could have hurt from a financial viewpoint if I’d been successful at loading the whole dataset
  • it can be fun to play around with this type of stuff, but having a timebox keeps you from going down the rabbit hole too far

Atoms and Bits

I found this post by Fred Wilson about atoms and bits to be a nice counterpoint to this post from a few years back by Chris Dixon. There are so many advantages that a software product has over a business that delivers physical goods, whether that is inventory, startup costs, or distribution. (Yes, if you build a mobile app, you have to pay the vig to Apple/Google, but you also can distribute at zero incremental cost to hundreds of millions of people.) From Fred’s post:

we should … understand that the timelines [for working on businesses that focus on atoms] will be longer and the road to adoption will be more challenging.

The fact is that atoms are just harder to work with than bits. However, that very difficulty can provide a moat around your business (“where there’s muck there’s brass”). If your business is bits, find another moat (branding, network effects, niche domination).

Going to GopherCon and Thoughts on Golang

I am excited to go to GopherCon this year. I’ve been maintaining a couple of codebases written in Go/golang. Some are smaller webhook and automation programs, but a couple are larger data processing systems–take this data from these two sources, munge it a bit, put it over here. (Unfortunately, this is all custom, not using an ETL toolkit like, say, ratchet).

I’ve found Go to be an interesting challenge. It’s C based, but there are a few wrinkles/idioms that I’ve enjoyed figuring out (and more that I’m learning).

Things that surprised me:

  • GOPATH and the need for a domain name in paths.
  • That you have to search for “golang” rather than “go” to find anything useful.
  • The fact that any file in a package can add functions to any struct (I think I have the terms correct, please forgive me if I don’t)
  • The lack of an editor that can do reference searching (“show me all places this function is called”). I think VS Code can do this, but I’ve downloaded it and the golang extension and can’t seem to figure it out. (This is likely my failing, not golang’s, but I was looking forward to coding in a static language for just this reason. Well, this and safe refactoring.)
  • The strictness. I’m actually pleasantly surprised by it (no unused variables seems like such a no-brainer!) but golang is quite opinionated in terms of language syntax.

I unfortunately haven’t had as much time to write golang as I planned when signing up for this conference, but I’m looking forward to meeting some other folks and the excitement that always happens when you attend a conference. In particular, I’m looking forward to “Go says WAT?” which is patterned after the famous WAT video and “From prototype to production: Lessons from Reddit’s ad platform”. Hope to see you there.