Skip to content

Using a lambda function to make AML real time predictions

When you are making real time AML predictions against an endpoint, you can run the prediction code (sample code) locally.  However, leveraging AWS Lambda can let you build a system that accesses predictions without any servers.  This system will likely be cheaper and scale better than running on your own servers, and you can also trigger predictions on a wide variety of events without writing any polling code.

Here’s a scenario.  You have a model that predicts income level based on user data, which you are going to use for marketing purposes.  The user data is place on S3 by a different process at varying intervals. You want to process each record as it comes in and generate a prediction.  (If you didn’t care about near real time processing, you could run a periodic batch AML job.  This job could also be triggered by lambda.)

So, the plan is to set up a lambda function to monitor the S3 location and whenever a new object is added, run the prediction.

A record will obviously depend on what your model expects.  For one of the the models I built for my AML video course, the record looks like this (data format and details):

22, Local-gov, 108435, Masters, 14, Married-civ-spouse, Prof-specialty, Husband, White, Male, 0, 0, 80, United-States

You will need to enable a real time endpoint for your model.

You also need to create IAM policies which allow access to cloudwatch logs, readonly access to s3, and access to the AML model, and associate all of these with an IAM role which your lambda function can assume.  Here are the policies I have associated with my lambda function (the two describe policies can be found in my github repo):

You then create a lambda function which will be triggered when a new file is added to S3.  Here’s a SAML file which defines the trigger (you’ll have to update the reference to the role you created and your bucket name and path).  More about SAML.

Then, you need to write the lambda function which will pull the file content from S3 and run a prediction. Here’s that code.  It’s similar to prediction code that you might run locally, except how it gets the value string.  We read the values string from the S3 object on line 31.

This code is prototype quality only.  The sample code accesses the prediction and then writes to stdout.  That is fine for sample code, but in a real world scenario you’d obviously want to take further actions.  Perhaps have the lambda function update a database row, add another file to S3 or call an API. You’d also want to have some error handling in case the data in the S3 file wasn’t in the format the model expected.  You’d also want to lock down the S3 access allowed (the policy above allows readonly access to all S3 resources, which is not a good idea for production code).

Finally, S3 is one possible source of input, both others like SNS or Kinesis might be a better fit.  You could also tie into the AWS API Gateway and leverage the features of that service, including authentication, throttling and billing.  In that case, you’d want the lambda function to return the prediction value to the end user.

Using AWS Lambda to access your real time prediction AML endpoint allows you to make predictions against single records in near real time without running any infrastructure of your own.

How can you ignore the feed?

Joel Spolsky has a post up about how the design of software affects society, which has some great points about how ignoring Twitter and Facebook and other feeds of information that are constantly coming at him makes us happier. He’s not alone in thinking that the design of software intimately affects the people that use it. Here’s a great post from 2003 on social software by Clay Shirky.

Joel says:

I gave up on the feeds because they were making me angry. A lot of times I was angry because of politics, but even on non-political things, the feeds seemed like they were full of conflict and stress.

I can’t tell you how much happier I am without them. Am I the only one that hated reading feeds? Do they make everybody unhappy? And if they make people unhappy why are they so popular?

And then goes on to examine how these companies have leveraged human behavior and technology to keep us coming back.

I have had issues with this myself (I’m no snowflake).  For Facebook, it’s “I wonder what happened to <old acquaintance>?”.  For Twitter it’s “what are people talking about now?”.  What has worked for me?

First, take the applications off my phone.  The phone is ever present, and if I have access, I will look at the feed.  “Oh, I wonder who has posted something interesting to Twitter.”  Yes, I should limit my access to my phone too.

Then, I changed my password to something hard to remember, that I have to look up someplace (from a password manager).  This means that I can’t login on a whim, but have to take the extra step of looking up my password.

I use the applications for limited purposes, not for general entertainment.  For Twitter, I limit idle scrolling and really focus on the ability to communicate with anyone anywhere, as well as friends I have on Twitter.  Rather than logging in to post something, I’ve set up several zaps to push content from other sites to Twitter.  I also have stagnated at about 500 followers, so if you are looking to be a Twitter influencer, don’t ask me for advice.  For Facebook I’m even more careful.  I stay logged out of Facebook, and only login when I have specific tasks–share an article or contact someone for whom I have no email address.

I never allow the applications to send me notifications.  The emails they send to pull me back in are bad enough.

These platforms have tremendous value, but if I am not careful I get sucked in and waste time and brainpower.  There’s a great book, Feed, and a primary plot driver is how humans will act when we have access to the wonders of the Internet embedded in our brains.

It’s a dystopian novel.

Stripe Connect And Refunds Initiated by Connected Standard Accounts

We are using Stripe Connect to handle our payments.  Our sellers (kitchens) have standard accounts, which means they have full access to the Stripe dashboard (I believe these used to be referred to as ‘standalone accounts’).  We are using destination charges so charges run against the platform account and are then immediately transferred to the kitchens, less any application fees (this is not entirely accurate, but good enough for this post).  All well and good.

Some of the kitchens noticed they could refund a charge via their Stripe dashboard.  Refunds happen for a lot of reasons–maybe the food business was inadvertently charged or they had some issues in the kitchen.  So, the kitchens refunded, and the food businesses waited.  And waited.

Meanwhile, when The Food Corridor ran numbers looking at revenue, we saw aberrations.  There was extra money in our bank account.  Now, every business wants more revenue, but they tend to like to know the source.  So, we wanted to figure out where the extra revenue was coming from.

I looked into this and, after some investigation and emails with Stripe support, determined that the kitchens were refunding money to us.  However, it wasn’t flowing through to to the food business from whence it came.

Here is the flow of funds when a charge happens:

food business -> platform -> kitchen

When a refund happens, if initiated by the platform:

kitchen -> platform -> food business

When a refund happens, initiated by the kitchen

kitchen -> platform

Tracking down exactly which charges were being refunded by a kitchen was tedious, but easy to do if we knew which kitchen had performed the refund.  If we didn’t know that, we’d have to search through all the connected accounts for the matching charge.  It was far easier to contact Stripe directly and ask them to hunt it down with some internal tools.  Providing the date of the transfer, the amount and the id was helpful to Stripe support.

After we knew which kitchen had performed the refund, it was easy enough to find the charge, and then refund it to the customer, completing the loop.  Here’s the flow of funds for that:

platform -> food business

If you are using Stripe Connect and are using the destination charge method, and your sellers have standalone accounts, make sure they know they can’t issue refunds from their Stripe dashboards.

What can a senior developer do that a junior developer can’t do?

I saw this post about senior vs junior lawyers a while ago and it sparked some thoughts about the senior/junior divide.  I have also seen companies interested in hiring only senior developers. This is not a post hating on junior developers–the only path to a senior developer is starting as a junior developer.

Anyway, I think that senior developers will help you in a number of ways:

  • Senior developers have made their mistakes on someone else’s dime.  We all make mistakes.  One time I took 4 hours to figure out that the reason I couldn’t connect to an Amazon RDS instance was because the security group was too locked down.  But people integrate their mistakes into their experience and don’t make the same mistake twice.  When you hire a senior developer, you’re getting all their previous experience as well as their current effort.
  • Senior developers can own a larger portion of a project than junior can.  This can include non coding tasks like system setup, estimation, requirements definition and customer interaction.  This gives you leverage.
  • They are able to understand interactions between various pieces of your system.
  • They have an intuitive feel for performance impacts, again, probably because of previous mistakes.
  • They map your current solutions and problems into their existing knowledge: “Oh, this is how we handled caching at my <$JOB – 2>.”
  • They bring best practices from other companies (and can comment on worst practices).
  • They can get up to speed more quickly.
  • They can mentor other developers in areas of expertise.
  • They know the power of automation and knowledge sharing.
  • They have previously been confronted with difficult, complex problems and have an idea of how to attack new, difficult complex problems.
  • They understand that almost all problems are people problems, and that communication is a key part of any project’s success.

Junior developers, in my experience, bring value for the money as well.

  • They have more malleable viewpoints, especially with respect to tooling and technology.
  • They are typically very eager to learn.
  • When you are explaining systems to them, they can’t leverage existing knowledge.  This forces you to speak from first principles and may highlight erroneous assumptions.
  • Because they don’t have the knowledge of existing systems, they question why things are done a certain way.  Systems and processes evolve, but sometimes tasks are done only because they’ve “always been done that way” which isn’t necessarily the most efficient.  It’s hard to see that without the “eyes of a beginner”.
  • They are less cynical.

All good developers in my mind are:

  • Eager to learn.
  • Unwilling to say “that’s not my job”.
  • Accept ownership of the success of the project and the business.

If I had to sum up the differences between junior and senior developers in one word, it is would be “scope”.  A senior developer should be able to handle a wider scope of projects, responsibilities and problems than a junior developer.  How wide that scope is depends on the developer, their experience, and the problem space, but there are certain aspects of scope that are the same across domains (non coding tasks).

The Rails Low Level Cache

I think the Rails low level cache is the bees knees. It lets you store complicated values easily, manage TTLs, and improve the performance of your application.

I’d be remiss in stating that you shouldn’t cache until you need to.  Caches will introduce additional complexity into your application, make troubleshooting harder, and in general can be confusing.  How do you know if you need to introduce a cache?  Profile.  Using rack-mini-profiler is a great quick and dirty way to profile your code.

The rails low level cache is fairly simple to configure (especially if you’re using a PaaS like Heroku–you can drop in a managed memcached service easily).

From there, you need to build a cache key.  This is a string, up to 250 characters in length, that uniquely identifies whatever you’re trying to cache.  Basically anything that would cause a change in the contents of the cached object should be included in this key.  However, since you are going to calculate this key every time the value is requested, you don’t want the cache key to be expensive to calculate, otherwise the value of the cache will be degraded.  Good things for the key: timestamps, max updated_at values for a collection, user ids or roles.

Then, putting a value in the cache is as easy as:

def pull_from_cache
  cache_key = 'mykeyval'
  myval = Rails.cache.fetch(cache_key, expires_in: 1.hours) do
      @service.build_expensive_object()
  end
end

In the case above, the cached value will automatically expire in 1 hour.  If you don’t set the expiration time, then the cached value will eventually be removed via LRU.  How long that is depends on the size of your cache and what else you are putting in there.

If you want to test that something is being put in the cache, you can run a unit test and see how many times build_expensive_object() is called.

#...
object.service = @service
expect(@service).to receive(:build_expensive_object).once
object.pull_from_cache  
object.pull_from_cache
#...

In a production troubleshooting situation, you may need to evict something from the cache.  You can do so from the rails console by finding the cache key and running Rails.cache.delete('key').

Using the low level cache is a great way to push read heavy hot spots of your rails application (database queries or other complicated calculations) into memory and make your application faster.

Online teaching tips for synchronous classrooms

I’ve been teaching AWS courses for the past year or so.  Many have been with an online teaching environment.  This opens up the class to more people–there’s less cost in taking a course from your office or living room, as compared to flying an instructor out for an on-site.  However, this learning environment does have challenges.  Below is my set of best practices, some suggested by other instructors.

Pre class:

  • Set up your physical environment.  This includes making sure you have a fast internet connection, a room with no noise, and that your computer and audio equipment are set up.
  • Set up the virtual room.  Load the materials, set up any links or other notes.  I like to run virtual courses entirely with chat (audio conferences are really hard with 20 people) so I make a note about that.
  • Test your sound.  This includes having a friend login and listen to you beforehand.  This run through can help make sure your voice (which is your primary engagement tool) is accessible to your students.
  • Email a welcome message to all the students, 2-3 days before class starts.  Include when the class is happening, how to get materials, etc.  I’ve definitely had interactions with students from these emails that led to a better outcome for everyone.

During class:

  • Calculate your latency.  Ask an initial question that requires a response as soon as the question is asked.  Something easy like “where are you from?” or “how many years of AWS experience do you have?”  Note the latency and add it into the time you wait before asking for questions.
  • Ask for questions.   How often can vary based on previous AWS experience, but every 5-10 slides is a good place to start.
  • Answer questions honestly.  If you don’t know, say so.  But then say, “I’ll find out for you.”  (And then, of course, find out.)
  • Allow time for students to read the slide.  At least 15 seconds for each slide.
  • You, however, should not read the slide.
  • Draw or use the pointer tool to help engage the students and pull them into the material.
  • Find out what students want out of the class.  Try to angle some of the content toward those desires.  You may be constrained by knowledge or time or presentation material, but you can at least try.
  • Engage your students.  I like to make corny jokes (“have you heard the one about the two hard problems in computers science?“), refer back to technologies they mention having familiarity with, and talk about internet kitten pictures.
  • Remember your voice and energy are the only things keeping these students engaged.

After class:

  • Follow up on any loose ends.  This could be questions you didn’t get answered or more mundane items like how they can get a certificate of completion.  I had one student who couldn’t get access to the materials and it took a few weeks of bugging customer service reps across organizations before he did.  Not a lot of time on my end, but a big deal for him.

Note that I didn’t cover the content or particular technology at all.  They aren’t really relevant.

 

Useful Rails Gems: Pretender

I’m constantly amazed at how productive you can be with rails. It simply lets you work on typical webapp problems at a much higher level. At 8z, we had a web application and a customer support team. Occasionally the customer support person had to ‘impersonate’ a normal user to troubleshoot an issue. We built a piece of software that let them assume that role. (We called it ‘sudo‘, obviously.) It’s been a few years, but as I recall it was complicated and error prone, lived on a different domain and wasn’t fully functional.

I needed to add similar functionality to a rails web app, and was able to find a couple of gems that looked useful. I selected pretender, mostly on the basis of documentation and google search results placement. I followed the instructions, tweaked a few settings and was off to the races in about an hour.  (Note this isn’t a fair apples to apples comparison of the underlying technologies, due to the differences in available open source libraries between the mid 2000s and the late 2010s.)

Now, all this gem does is what it says it does. It lets a certain user or set of users in your application pretend to be another user. It doesn’t handle auditing or anything else you might want with an elevated privilege system.

But it does do what it does well.

Ask the hard questions

Do you know that moment which happens at almost every meeting or conference call, when a participant refers to a concept that you don’t quite understand?  That moment happens to me often, and has throughout my career.

“The marketing funnel is part of the sales process.”

“We can just flurbuzz the bazznod.”

“We have plenty of runway.”

That is the moment when you can ask the hard question.

“Why is that connected?”

“What does that mean?”

“I’m sorry, I don’t understand.  Could you repeat that?  How do you define runway?”

Asking these questions is important.  Otherwise you and the other parties will be talking past each other, which will only lead to pain down the line when the misunderstanding is crystallized in people, process or code.

I’ve been asking these kinds of questions my entire career.  When I worked at a web consulting company in the early 2000s, there were often company wide conference calls discussing our precarious financial state.  I got a reputation as “the question guy” because I wasn’t afraid of asking the awkward question, even of the CEO in front of the entire company.  I was interested in hearing as real of an answer (as they could share).

If you’re interested in asking hard questions, here are some tips:

  • Pay attention beforehand.  If you are asking a question about something that was just mentioned, or that you should have known, you won’t have credibility.
  • Don’t worry about looking dumb (except if you weren’t paying attention, see above).  If it is a fuzzy concept, chances are others in the room are wondering what it is.  Besides, the goal is to increase your knowledge.  Check your ego.
  • Ask the question from a place of humility and make it about you.  Maybe you just really don’t understand.  I always like the phrase: “I’m sorry, I don’t understand what you just said.”
  • Approach with positive intention.  Don’t ask gotcha questions or try to prove you are smarter than the speaker.
  • If the answer is a bit fluffy or you don’t understand it, ask a second time.  “Thanks, but I’m afraid I still don’t get it.  Could you explain it to me again?”
  • If the speaker doesn’t answer or hedges again, offer to take it offline.  Depending on who is in the meeting, you don’t want to waste everyone’s time.  But then make sure you follow up.
  • Recognize if the topic is sensitive (financial matters) you may not get a clear answer.  At that point, getting the speaker to define terms but perhaps omit numbers is a win.

Again, the goal here is not to ‘get’ the speaker, it’s to help get everyone on the same page.  Asking tough questions to pin down some of the nebulous concepts we work with every day can help everyone make better decisions.