I’m planning a personal challenge for the month of December: I’m going to write a blog post every Monday, Tuesday, Wednesday, Thursday and Friday of the month (except holidays). Most of the topics will be technical, but some may focus on leadership.
I’m publishing this post in late November to give you a ‘heads up’. Some of you receive my blog via email, and I wanted to warn you and give you the chance to unsubscribe if this flurry of posts doesn’t sound appealing.
So, I was looking to add a simple widget to collect net promoter scores (NPS), and I was astonished at the dearth of options for standalone NPS tracking. I assume that most customer relationship management (CRM) software has an NPS tracker built in, but we wanted nothing more than a simple standalone widget.
Two options turned up: Delighted and Murm. I did a quick spike to evaluate each tool. When I reviewed them, I couldn’t get Murm to work, and they didn’t respond to my customer support request. Delighted, on the other hand, did respond. They gave me access to web display, which was a beta feature at the time, and worked with us on price. It was trivial to install, though I’m still a bit unclear on how the widget decides when to display itself. Highly recommended.
The nice thing about having an NPS tracker on your website is that you get direct feedback from your users. This has led to numerous useful conversations and feature requests, because someone using our software for the first time brings fresh clarity to confusing features or parts of the user interface. Plus, it is a great number to track.
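For anyone new to the metric: an NPS survey asks users how likely they are to recommend your product on a 0 to 10 scale. Under the standard definition, scores of 9 or 10 are promoters, 0 through 6 are detractors, and the score is the percentage of promoters minus the percentage of detractors. Here is a minimal Python sketch of that calculation from raw responses. The function name and sample data are made up for illustration and have nothing to do with Delighted’s API.

```python
from typing import Iterable


def net_promoter_score(responses: Iterable[int]) -> float:
    """Return the NPS (from -100 to 100) for a set of 0-10 survey responses."""
    scores = list(responses)
    if not scores:
        raise ValueError("need at least one response")
    if any(s < 0 or s > 10 for s in scores):
        raise ValueError("responses must be between 0 and 10")
    promoters = sum(1 for s in scores if s >= 9)   # scores of 9 or 10
    detractors = sum(1 for s in scores if s <= 6)  # scores of 0 through 6
    # Passives (7 or 8) count toward the total but toward neither group.
    return 100.0 * (promoters - detractors) / len(scores)


if __name__ == "__main__":
    sample = [10, 9, 8, 7, 6, 3, 10, 9, 5, 10]
    print(net_promoter_score(sample))  # prints 20.0 (5 promoters, 3 detractors, 10 responses)
```

Because passives dilute both groups, the score moves when users shift between any of the three bands, which is part of what makes it a useful number to watch over time.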
When building models with Amazon Machine Learning (AML), there are three places where you can transform your data, and data transformation and representation matter a great deal for an effective model. I’d suggest watching about five minutes of this re:Invent video (from 29:14 on) to see how they used Redshift to transform purchase data from a format that AML had a hard time “understanding” into one that was “easier” for the system to grok.
The first place to transform your data is before it ever reaches the S3 bucket or Redshift cluster that an AML datasource reads from. You can preprocess the data with whatever technology you want (Redshift/SQL as above, EMR, bash, Python, etc.). Some sample transformations might be:
turning dates into integer values (from a given ‘day zero’)
condensing or expanding categorical values (for example, turning an age into an age group, so someone who is 31 goes into the 30-35 group)
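A minimal Python sketch of those two transformations follows. It assumes hypothetical input columns `signup_date` (an ISO-format date string) and `age`, and an arbitrary day zero of 1 January 2015. In practice you would write the output to CSV and upload it to S3 (or do the equivalent in SQL inside Redshift) before creating the datasource.

```python
from datetime import date
from typing import Dict, Iterable, Iterator

# Hypothetical "day zero"; pick whatever reference date suits your data.
DAY_ZERO = date(2015, 1, 1)


def date_to_int(d: date, day_zero: date = DAY_ZERO) -> int:
    """Days between day_zero and d (negative if d is earlier)."""
    return (d - day_zero).days


def age_group(age: int, width: int = 5) -> str:
    """Bucket an age into a fixed-width label, e.g. 31 -> '30-35'.

    The lower bound is inclusive and the upper bound exclusive,
    so 35 lands in '35-40'.
    """
    if age < 0:
        raise ValueError("age must be non-negative")
    low = (age // width) * width
    return f"{low}-{low + width}"


def preprocess(rows: Iterable[Dict[str, str]]) -> Iterator[Dict[str, object]]:
    """Turn raw rows (hypothetical columns) into model-friendly features."""
    for row in rows:
        yield {
            "signup_day": date_to_int(date.fromisoformat(row["signup_date"])),
            "age_group": age_group(int(row["age"])),
        }


if __name__ == "__main__":
    raw = [{"signup_date": "2015-01-31", "age": "31"}]
    print(list(preprocess(raw)))  # prints [{'signup_day': 30, 'age_group': '30-35'}]
```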
At this step you have tremendous flexibility, but it requires staging your data. That may be an issue depending on how much data you have, and may affect which technology you use to do the preprocessing.
The next place you can modify the data is at datasource creation. You can omit features, which could speed up processing, lower the total model size, and protect sensitive data. (Omitting features works only through the API, by providing your own schema with an ‘excludedAttributeNames’ value; the AWS console doesn’t support it.) In general, however, you do want to give AML as much data as you can.
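As a rough sketch, assuming the Amazon ML data schema JSON layout (`attributes` listing every column, `excludedAttributeNames` listing the ones to drop) and boto3’s `machinelearning` client, building a schema that omits a sensitive column might look like the Python below. The column names, target, S3 URI and datasource ID are all hypothetical, and only `build_schema` runs without AWS credentials.

```python
import json
from typing import Iterable, List, Sequence, Tuple


def build_schema(target: str,
                 attributes: Sequence[Tuple[str, str]],
                 excluded: Iterable[str] = ()) -> dict:
    """Build an AML data schema dict.

    attributes: (name, type) pairs for every CSV column, where type is one of
    BINARY, CATEGORICAL, NUMERIC or TEXT.
    excluded: column names the datasource should ignore.
    """
    excluded_list: List[str] = list(excluded)
    names = {name for name, _ in attributes}
    missing = set(excluded_list) - names
    if missing:
        raise ValueError(f"excluded names not in attributes: {sorted(missing)}")
    return {
        "version": "1.0",
        "targetAttributeName": target,
        "dataFormat": "CSV",
        "dataFileContainsHeader": True,
        "attributes": [
            {"attributeName": name, "attributeType": attr_type}
            for name, attr_type in attributes
        ],
        "excludedAttributeNames": excluded_list,
    }


def create_datasource(schema: dict, s3_uri: str, datasource_id: str) -> dict:
    """Create an S3-backed datasource (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so build_schema works without boto3

    client = boto3.client("machinelearning")
    return client.create_data_source_from_s3(
        DataSourceId=datasource_id,
        DataSpec={"DataLocationS3": s3_uri, "DataSchema": json.dumps(schema)},
        ComputeStatistics=True,
    )


if __name__ == "__main__":
    schema = build_schema(
        target="will_churn",
        attributes=[("customer_id", "CATEGORICAL"), ("ssn", "TEXT"),
                    ("age", "NUMERIC"), ("will_churn", "BINARY")],
        excluded=["ssn"],  # keep the sensitive column out of the model
    )
    print(json.dumps(schema, indent=2))
```

The validation in `build_schema` catches a typo in an excluded name locally, before AML sees the schema.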
You can also create multiple datasources that assign different data types to the same feature, as long as the feature’s values are valid for each type. The only kind of feature I know of that is valid in multiple AML data types is an integer, which, as long as it takes only a limited number of distinct values (like human age), can be represented as either a numeric value or a categorical value.
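Here is a minimal Python sketch of that idea, assuming the Amazon ML data schema JSON layout and hypothetical columns `age`, `plan` and `will_churn`. It produces two schemas that differ only in how `age` is typed. Each would be serialized with `json.dumps` and supplied as the schema for its own datasource pointing at the same S3 data.

```python
import copy
import json

# Hypothetical base schema with age typed as NUMERIC.
BASE_SCHEMA = {
    "version": "1.0",
    "targetAttributeName": "will_churn",
    "dataFormat": "CSV",
    "dataFileContainsHeader": True,
    "attributes": [
        {"attributeName": "age", "attributeType": "NUMERIC"},
        {"attributeName": "plan", "attributeType": "CATEGORICAL"},
        {"attributeName": "will_churn", "attributeType": "BINARY"},
    ],
}


def with_attribute_type(schema: dict, name: str, attr_type: str) -> dict:
    """Return a deep copy of schema with attribute `name` retyped to attr_type."""
    new_schema = copy.deepcopy(schema)
    for attr in new_schema["attributes"]:
        if attr["attributeName"] == name:
            attr["attributeType"] = attr_type
            return new_schema
    raise KeyError(name)


numeric_age_schema = json.dumps(BASE_SCHEMA)
categorical_age_schema = json.dumps(with_attribute_type(BASE_SCHEMA, "age", "CATEGORICAL"))
```

Training one model on each datasource and comparing their evaluations is a straightforward way to see which representation of the feature works better.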
The final place you can modify your data before the model sees it is the ML recipe. AML provides about ten functions that you can apply to your data as it is read from the datasource and fed to the model. You can also create intermediate representations and make them available to your model (for example, a lowercased copy of a string feature).
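Below is a rough Python sketch of a recipe, assuming the Amazon ML recipe JSON layout of `groups`, `assignments` and `outputs`, plus built-in functions such as `lowercase`, `no_punct`, `ngram`, `quantile_bin`, `cartesian` and `normalize`. The attribute names (`comment`, `age`, `income`, `country`) are hypothetical. The resulting JSON string would be passed as the recipe when creating the model (for example, the `Recipe` parameter of boto3’s `create_ml_model`). Nothing here validates the recipe; AML does that when the model is created.

```python
import json

# Hypothetical attributes: comment (TEXT), age and income (NUMERIC),
# country (CATEGORICAL).
recipe = {
    "groups": {
        # A named group of attributes that can be transformed together.
        "NUMERIC_INPUTS": "group(age, income)",
    },
    "assignments": {
        # Intermediate representations, referenced by name in outputs.
        "clean_comment": "lowercase(no_punct(comment))",
        "binned_age": "quantile_bin(age, 10)",
    },
    "outputs": [
        "ALL_CATEGORICAL",                # built-in group: pass categoricals through
        "clean_comment",
        "ngram(clean_comment, 2)",        # word bigrams from the cleaned text
        "cartesian(binned_age, country)",
        "normalize(NUMERIC_INPUTS)",
    ],
}

recipe_json = json.dumps(recipe, indent=2)

if __name__ == "__main__":
    print(recipe_json)
```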
A recipe lets you modify your data before the model sees it without requiring you to stage any source or destination data. However, the set of available transformations is relatively limited.
You can, of course, combine all three of these methods when building AML models for maximum flexibility. As always, it’s best to try different options and test the results.