Amazon Alexa

I had a lot of fun working on a one day ‘hackfest’ project with Amazon Alexa. I learned a lot about voice UX and Alexa implementation details.It’s an interesting platform, especially if you have broad brand recognition and can deliver high level valuable information via short chunks of text.

From my blog post on the Culture Foundry site:

The multi step interaction is a bit clunky, but I think it’s a great way to avoid collisions between different skills. Basically, the user calls out an ‘invocation’ like ‘open color picker’. Interactions with Alexa after that are send directly to that particular skill until an end point is reached in the interaction tree. Each of these interactions is triggered by a different voice command, and is handled by something called an ‘intent’. Intents can have multiple triggering commands (‘what is my favorite color’ vs ‘what is my color’, for example). There’s also a lightweight, session level storage while the entire invocation is occurring, which means you can easily pass data between intents without reaching out to a more persistent data storage.

You can read the whole post over there.


Aborted Adventures with Amazon Athena and US PTO data

GoddessesI was playing around recently with some data (from the US Patent and Trademark Office), trying to import it into S3 and then to Athena. Athena is a serverless big data query engine with schema on read semantics. The data was not available on the AWS public dataset repo. Things didn’t go as well as planned. Here’s how I wanted them to go:

  1. download some data
  2. transform it into CSV (because Athena doesn’t support currently XML and I didn’t want to go full EMR, even though hive supports XML)
  3. upload it to s3 bucket
  4. create a table based on the data
  5. run some interesting queries using Athena
  6. possibly pull some of the data in Amazon Machine Learning to do some predictions
  7. possibly put some of the data in an s3 bucket as JSON and use datatables to create a nice user interface

Like pretty much every development project I’ve ever been part of, there were surprises. What was different is that I had a fixed amount of time since this was an exploratory project, I set a timebox. I didn’t complete much of what I wanted to get done, but wanted to document what I did.

I was able to get through step 5 with a small portion of data (13k rows). I ended up working a lot on windows because I didn’t want to boot up a vagrant box. I spent a lot of time re-learning XSLT in order to pull the data I wanted out of the XML. I used a tool called xmlstarlet for this, which worked pretty well with the small dataset. Here’s the command I ran to pull out some of the attributes of the XML dataset (you can see that I also learned about batch file arguments):

xml sel -T -t -m //case-file -v "concat(serial-number,',',registration-number,',',case-file-header/registration-date,'\n

,',case-file-header/status-code,',',case-file-header/attorney-name)" -n %filename% > %outfile%

And here’s the Athena schema I created:


CREATE EXTERNAL TABLE trademark_csv (
serialnumber STRING,
registrationnumber STRING,
registrationdate STRING,
statuscode INT,
attorneyname STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
ESCAPED BY '\\'
LINES TERMINATED BY '\n'
LOCATION 's3://aml-mooreds/athena/trademark/';

After I had done the quick prototype, I foolishly moved on to downloading the full dataset. This caused some issues with disk storage and ended up taking a long time (the full dataset was ~300 files from 500M to 2GB in size, each containing about 150k records). I learned that I should have pulled down one large file and worked it through my process rather than making automating each step as I went. For one, xmlstarlet hasn’t been updated for years, and I couldn’t find a linux package. When tried to compile it, it was looking for libxml, which was installed on my ec2 instance already. I didn’t bother to head further down this path. But I ran into a different issue. When I ran xmlstarlet against a 500MB uncompressed XML file, it completed. But any of the larger files caused it to give an ‘out of memory’ error. I saw one reference in the bugtracker, but it didn’t seem to apply.

So, back to the drawing board. Luckily, many languages have support for event based parsing of XML. I was hoping to find a command line tool that could run XSLT in order to reuse some of my logic, but it doesn’t appear to exist (found this interesting discussion and this one). python seemed like it might work well.

Then I ran out of time. Oh well, maybe some other time. It is fun to think about how I can automate all of this. I was definitely seeing where lambda functions and some other AWS features could have fit in nicely. I also think that using RDS might have made more sense than Athena, given the rate of update and the amount of data.

Lessons learned:

  • what works for 13k records won’t necessarily work when you have 10x, let along 100x, that number
  • work through the entire pipeline with real world data before automating any part of it
  • use EC2 whenever you need to download a lot of data
  • make sure your buckets and athena are in the same region. I wasn’t, and there was no warning. That’s fine with small data, but could have hurt from a financial viewpoint if I’d been successful at loading the whole dataset
  • it can be fun to play around with this type of stuff, but having a timebox keeps you from going down the rabbit hole too far

Atoms and Bits

Atom

I found this post by Fred Wilson about atoms and bits to be a nice counterpoint to this post from a few years back by Chris Dixon. There are so many advantages that a software product has over a business that delivers physical goods, whether that is inventory, startup costs, or distribution. (Yes, if you build a mobile app, you have to pay the vig to Apple/Google, but you also can distribute at zero incremental cost to hundreds of millions of people.) From Fred’s post:

we should … understand that the timelines [for working on businesses that focus on atoms] will be longer and the road to adoption will be more challenging.

The fact is that atoms are just harder to work with than bits. However, that very difficulty can provide a moat around your business (“where there’s muck there’s brass”). If your business is bits, find another moat (branding, network effects, niche domination).


Going to GopherCon and Thoughts on Golang

GopherI am excited to go to GopherCon this year. I’ve been maintaining a couple of codebases written in Go/golang. Some are smaller webhook and automation programs, but a couple are larger data processing systems–take this data from these two sources, munge it a bit, put it over here. (Unfortunately, this is all custom, not using an ETL toolkit like, say, ratchet).

I’ve found Go to be an interesting challenge. It’s C based, but there are a few wrinkles/idioms that I’ve enjoyed figuring out (and more that I’m learning).

Things that surprised me:

  • GOPATH and the need for a domain name in paths.
  • That you have to search on golang to find anything useful.
  • The fact that any file in a package can add functions to any struct (I think I have the terms correct, please forgive me if I don’t)
  • The lack of an editor that can do reference searching (“show me all places this function is called”). I think VS.Code can do this, but have downloaded it and the Golang extension and can’t seem to figure out it. (This is likely my failing, not golangs, but I was looking forward to coding in a static language for just this reason. Well, this and safe refactoring.)
  • The strictness. I’m actually pleasantly surprised by it (no unused variables seems like such a no-brainer!) but golang is quite opinionated in terms of language syntax.

I unfortunately haven’t had as much time to write golang as I planned when signing up for this conference, but I’m looking forward to meeting some other folks and the excitement that always happens when you attend a conference. In particular, I’m looking forward to “Go says WAT?” which is patterned after the famous WAT video and “From prototype to production: Lessons from Reddit’s ad platform”. Hope to see you there.


Easily extracting conversations from a slack group

Man slack liningSlack is an amazing productivity tool when used correctly. One of the primary uses I’ve seen is for open source projects to provide support (Craft CMS, OG-AWS) or for communities to be built (Techfriends, Denver Devs). If you don’t have the luxury of the owner of your slack being Slack’s VP of engineering, the costs of $x/month/user can cause these types of slacks to remain on the free plan.

Which means that you are limited to the last 10k of messages.

And that’s fine for the vast majority of messages. Sometimes, however a discussion is so good that it deserves to be indexed and shared, which means it needs to be pulled out of the Slack walled garden and onto the web (I also wrote about how to do this with the Facebook Group walled garden last year). Sometimes you might just want to save it beyond the 10k message limit for your own selfish reasons.

You can of course do this extraction manually (I did so here and here). But that’s a lot of work.

Another option is to use Zapier. The slack integration is trivial to set up, and has a number of options. From there you can push to a google spreadsheet (if you want to do further reification) or directly to WordPress (or any of the other integrations).

The nice part about this is that the Zapier slack integration is that you have a variety of options that can trigger the publishing of a message to a spreadsheet:

  • a post of a public message in a specific channel
  • a post of a public message in any channel
  • starring of a message by you
  • attachment of a certain reaction emoji (I picked a floppy disk) to a message, no matter who adds the emoji

I’ve just started doing this but am excited to have a low friction way to pull high value conversations out of slack. Slack is great for synchronous communication and easy discussion. When real knowledge drops, it should be shared with the future and anyone who can type into a search box. Do make sure to let folks know because there may be some expectation of privacy that you’ll want to respect.


Where I’m Writing

It’s been a bit quiet around here, but since I joined Culture Foundry I’ve been writing over on their site more. I’ve written on a number of topics, but this one about how to move files between different servers in order to scale a traditional CMS application horizontally, is one of my favs:

Compared to the other methods of syncing files, this works well with non cloud native applications. It also has the virtue of using old tested tools. You can use this system on prem or in the cloud, anywhere with SSH and rsync. This system works well with large numbers of small objects because rsync can be configured to only push new objects.

I’ll be splitting my blogging time between this site and the Culture Foundry site in the future. See you there.

 


Startup COOs: What do they do?

I really enjoyed this post about what startupo COOs do. It was interesting because it wasn’t just opinion, there was also data. I particularly enjoyed the evolution of the author’s role over time, and the percentage of COOs that owned various responsibilities. To be fair, it was a small set of respondents in one geographic area, but still interesting.

However, the money quote was:

I actually have [simple way] of explaining what I do, and I would sum it up this way: taking things off my CEO’s plate, and figuring out how to thoughtfully scale the company.

Of course every company’s different, but I’ve seen the pattern of a visionary and an operator in a couple of companies, and it’s powerful.


Rails Views Cached In Production Environment

Railroad tracksI was troubleshooting a data issue in a production environment. It wasn’t heroku, rather a rails environment hosted on AWS. It was Rails 4.2, ruby 2.2.3.

First off, it’s worth noting that there were two or three bugs that were commingled and causing issues for our client. A number of folks had spent a long time trying to troubleshoot the issue. At this point, I was tasked with taking a look and had access to all the environments. The problem only seemed to appear on production, and appeared to be a data issue. I was editing views directly on production to track down where the data issue appeared, as well as running queries on the production database and using the rails console to see what rails thought was happening. In other words, it was a hot mess. However, this debugging story isn’t the point of this post. Rather, I ran into the most peculiar situation and wanted to document it so that if I ever ran into it in the future, I would remember it.

Basically, I had a view that looked something like this:

text
<% cache('[key]') %>
other text
<% end %>

I changed text to be new text which included some useful debugging information. Debugged the problem and went on my merry way. The next day, early, I realized that I hadn’t changed it back, so logged back into prod and changed it back to text. Reloaded the page and didn’t see the change. What? Tried to clear the cache using the rails console and Rails.cache.delete(). No change.

After lots of googling, I realized that the view text, outside of cache tags, is cached in some other fashion. I finally figured out how to reset the cache by following these steps:

  • edit config/environments/production.rb
  • set config.cache_classes=false
  • restart passenger by touching tmp/restart.txt (see here for more on that)
  • reload the page, and now I could see text instead of new text
  • set config.cache_classes=true
  • restart passenger by touching tmp/restart.txt

This only happens when you both have a mutable production environment and are changing the view files in that environment. This won’t occur if you were using a platform like Heroku, or if you never troubleshot on production.


(Links To) Advice For Someone Selling a SaaS Business

Sold signI ran into someone at a meetup recently who’d built a SaaS that had a pretty decent MRR. Enough to support one person. Which is a huge achievement!

He was wondering what options he had to either grow or exit the business. This is something I’ve been reading about for a number of years, so I had some advice. I thought I’d write it down so that others could benefit (or chime in). These are resources I’ve found insightful.

This is a great first hand account by patio11 of selling a software business (it wasn’t SaaS, rather a one time digital product sale, but I think there are a number of common themes). He mentions the broker he uses, the due diligence process, and what you can do now to set yourself up for success (have separate accounts, for one).

I can’t even talk about SaaS without telling folks to raise their prices. It’s a reflex now. Amy Hoy has two great posts on this: grandfathering and new features (with a lot of communication mixed in). I experienced this myself at a previous startup, where we almost doubled our monthly subscription price in 18 months.

Finally, here’s an interesting post from a venture capitalist about how private equity is a new exit option for SaaS companies. In that vein, I chatted briefly with one such PE firm, SureSwift Capital, about part time work a year or so ago. I don’t know how they are to work with (the position wasn’t a fit) but at the time they were focused on acquiring SaaS companies with good MRR and helping them grow.


What’s New With Ruby?

Red Rubyish DiamondFor the past couple of months I’ve been doing a short segment at the beginning of the Boulder Ruby Meetup called “What’s happened in Rubyland?”

I basically look at 3-4 blogs and google searches and see what is happening. Of course, far more than what I can collate is happening (I don’t look at any major gem releases, for example) but this gives quick insight into major happenings.

Here are the past three editions. Enjoy the starkness of my presentation.

PS We’re always on the lookout for speakers. Let me or tweet the organizers if you’re interested.

 



© Moore Consulting, 2003-2017 +