Publish java artifacts to s3 using maven

If you don’t want to run a maven repository server like Nexus, you can use AWS S3 as the maven repository for your java artifacts.  You can make the repository public in the same way as you’d make a website public.  However, it’s more likely that you’ll want to make it private and authenticate using AWS IAM.  Here are some step by step instructions that appear useful.

(Note there’s another S3 maven wagon, but it does not appear to support authentication.)

Why do this?  It’s one more piece of infrastructure that you don’t have to maintain, update, and run.


Book Review: Beforelife

I really enjoyed Beforelife. It’s a whimsical stroll through a fantasy land that exists after death.  The main character, Ian, arrives in the afterlife with an unusual trait: perfect recollection of his life before death.  This leads him to be institutionalized, where he meets some madcap associates, including multiple people who think they are Napoleon and someone who thinks he was a dog.

The rest of the book is all about how this group helps Ian discover what’s special about him.

I enjoyed the plot twists, which I won’t outline because spoilers.  There are several other historical characters, some named (Socrates, Isaac Newton), others hinted at.  I also enjoyed the world construction.  The world after death is populated by immortals, with interesting ramifications for areas of study like science and social constructs like policing.

If you’re looking for a light hearted jaunt and you liked Douglas Adams or Terry Pratchett, you’ll enjoy Beforelife.

“The future is already here, but it’s only available as a managed AWS service”

This entire post about how Kubernetes could become the distributed operating system of choice is worth reading.  But one statement really struck me:

Well, as they say, the future is already here, but it’s only available as an AWS managed service.

The “they” in this is apparently not William Gibson, as I thought.  More details here.

For the past couple of years the cloud providers have matured and moved from offering infrastructure as a service (disk, compute) to platform as a service offerings (sqs, which is a managed message queue like activemq, or kinesis, a managed data ingestion system like kafka, etc).  Whenever you think about installing a proprietary or open source package, you should include the cloud provider offerings in your evaluation matrix.  Of course, the features you need may not be there, or the cost may be prohibitive, but including them in an evaluation makes sense because of the speed of deployment and the scaling available.

If you think a system architecture can benefit from a message queuing system, do you want to spend time setting up and maintaining such a system, or do you want to spin up an SQS queue in a few minutes?

And the cost may not be prohibitive, depending on the skillset of your internal team and your team’s desire to run such plumbing services.  It can be really hard to estimate running costs of infrastructure services, though you can estimate it by looking at internal teams and seeing similar services they run and how much money it takes.  The nice thing about cloud services is that the costs are very transparent.  The kinesis data streams pricing example walks through a scenario and concludes:

For $1.68 per day, we have a fully-managed streaming data infrastructure that enables us to continuously ingest 4MB of data per second, or 337GB of data per day in a reliable and elastic manner.
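As a quick sanity check on the throughput figure in that quote (not on the price, which depends on AWS’s rates), here is the conversion from 4MB per second to a daily volume. This is a small sketch that assumes the quote uses binary units (1GB = 1024MB); the method name is mine.

```ruby
# Convert a sustained ingest rate into a daily volume.
# Assumption: 1 GB = 1024 MB, which is what makes the quoted 337GB figure work out.
SECONDS_PER_DAY = 24 * 60 * 60

def daily_volume_gb(mb_per_second)
  mb_per_second * SECONDS_PER_DAY / 1024.0
end

puts daily_volume_gb(4) # 337.5
```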

Another AWS instructor made the point that AWS and other cloud services invert the running costs of IT infrastructure.  In a typical enterprise, the running costs of your data center and infrastructure are like an iceberg–10% is explicit (server costs, electricity, etc) and 90% is implicit (payroll, time spent upgrading and integrating systems).  In the cloud world those numbers are reversed and far more of your infrastructure cost is explicitly laid out for you and your organization.

Truly the future.

RSS Pick: Calculated Risk

If you are interested in keeping a finger on the pulse of national real estate, I can’t recommend Calculated Risk highly enough.  I’ve been following Bill McBride since the mid 2000s, when he was sounding the alarm about NINJA loans and subprime mortgages.  He was also anonymous for years, which lent a sense of mystery.  His co-blogger at the time, Tanta, wrote an especially illuminating series of posts on the mechanics of mortgages.  She makes it fascinating with her knowledge and tone.

His specialty is graphs and commentary on those graphs.  He covers a disparate set of data sets (the hotel industry, the trucking industry, restaurants, architectural billings, demographics), as well as more traditional ones (Fed reports, Case Shiller, housing starts).  He also posts a weekly schedule of upcoming economic reports and events.  I have two problems with the blog: there are a lot of ads on the site (which you can avoid by using an RSS reader), and he’s so prolific that you have to either commit a substantial amount of time or skim the data he’s made available.


Boulder Blockchain Meetup

I went to the Boulder Blockchain Meetup a few days ago. It was fascinating. The entire room was full to standing, and they went around and asked everyone to do a quick intro. Then we separated into three groups:

  • beginners
  • developers
  • everyone else

The beginners group, where I went, was about 10ish folks in a room discussing all different aspects of the blockchain, from who might be interested in using it to what a particular coin might be used for to ‘buying the dip’. I was surprised at how many non developers were there (40-60%). There was a lot of talk about ‘trading’ crypto currencies. To be honest, it felt a bit like the wild west, with plenty of interesting work and some scams all mixed together.

However it was interesting enough to me to take a deeper look into Ethereum (there are so many crypto currencies, but this seems like a good one to investigate, if you are a developer). This looked useful, as did this.

Finally, if you’d like a two minute intro into why this is worth investigating, here’s a video from the Meetup website (otherwise, you should totally check out the next meetup):

“Are you Twitter or Square?”

This question was asked recently by someone who has seen a lot more startups than I have. The context: coffee shop business applications. In this scenario, Square is absolutely essential–if you can’t take money, then you can’t serve customers (well, you can take cash, but you’ll lose a lot of business by not being able to take credit cards). Twitter, on the other hand, is mildly useful to help get more people in the door–if Twitter went away, the coffee would continue to flow.

For other types of businesses or users, the importance of these two apps is reversed. I don’t know many news journalists, but I’ve read that Twitter is absolutely crucial for keeping on top of the news and connecting with both audiences and newsmakers. Square being down wouldn’t matter at all to them (except as a story).

The point is not to praise Square or denigrate Twitter. The point is that when you are thinking about a service to sell, you should ask whether you are selling crucial functionality or something that is “nice to have”. If it is the former, selling will be easier and customers will be more loyal than if it is the latter.

I’m definitely not the first to talk about these ideas, see “Is your product a vitamin or a painkiller”, but I thought the Twitter/Square/coffee shop was a nice vignette and wanted to share.

Automating one off deployment tasks

When I am deploying a rails application, there are sometimes one off tasks that I need done at release time, but not before. I’ve handled such tasks in the past by:

  • adding a calendar entry (if I know when the release is happening)
  • adding a task or a story for the release and capturing the tasks there
  • writing a ‘release checklist’ document that I have to remember to check on release.

Depending on release frequency, application complexity and team size, the checklist may be small or it may have many tasks on it. Some tasks on a checklist can include:

  • sending a communication (an email or slack message) to the team detailing released features
  • restarting an external service to pick up configuration or code changes
  • notifying a customer that a bug they reported has been fixed
  • kicking off an external process via an API call now that a required dependency has been released

What these all have in common is that they:

  • are not regular occurrences; they don’t happen every deploy
  • are affecting entities (users or software) beyond code and database
  • may or may not require human interaction

after_party is a gem that helps with these tasks. This gem is similar in functionality to database migrations because each after_party task is run once per environment (this is done by checkpointing the task timestamp in the database). However, after_party tasks are arbitrary rake tasks, rather than the database manipulation DSL of migrations.

You add this call to your deployment scripts (the example is for heroku, feel free to replace heroku run with bundle exec):

heroku run rake after_party:run --app appname

This rake task will run every time, but if an after_party task has already been run and successfully recorded in the database, it will not be run again.
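To make the run-once behavior concrete, here is a minimal sketch of the checkpointing idea in plain Ruby. It is not the gem’s implementation: TaskLedger and run_pending are hypothetical names, an in-memory Set stands in for the database table, and the runner records completion here, whereas a generated after_party task creates its own record.

```ruby
require 'set'

# Stand-in for the database table where completed task versions are recorded.
# (Hypothetical class; the real gem stores records via its own model.)
class TaskLedger
  def initialize
    @completed = Set.new
  end

  def completed?(version)
    @completed.include?(version)
  end

  def record(version)
    @completed.add(version)
  end
end

# Run every pending task in version order, skipping anything already recorded.
# `tasks` maps a version timestamp string to a callable.
def run_pending(tasks, ledger)
  ran = []
  tasks.sort_by { |version, _task| version }.each do |version, task|
    next if ledger.completed?(version)

    task.call
    ledger.record(version) # only reached if the task did not raise
    ran << version
  end
  ran
end

ledger = TaskLedger.new
tasks = { '20180113164945' => -> { puts 'notify customer' } }
run_pending(tasks, ledger) # runs the task
run_pending(tasks, ledger) # runs nothing, the version is already recorded
```

Because the version is recorded only after the task returns, a task that raises stays pending and is retried on the next deploy.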

You can generate these task files: rails generate after_party:task my_task --description="desc"

Here’s what an after_party task looks like:

namespace :after_party do
  desc 'Deployment task: notify customer about bug fix'
  task notify_issue_111: :environment do
    puts "Running deploy task 'notify_issue_111'"

    TfcAdminNotesMailer.send_release_update_notification("customer X", "issue 111").deliver_now

    AfterParty::TaskRecord.create version: '20180113164945'
  end  # task :notify_issue_111
end  # namespace :after_party

This particular task sends a release update notification email to the customer service user reminding them that we should notify customer X that issue #111 has been resolved. I couldn’t figure out how to make a rake task send email directly, hence the ActionMailer. In general you will want to push all of your logic from the rake task to POROs or other testable objects.
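As an illustration of pushing logic out of the rake task, here is one possible PORO. ReleaseUpdateNotifier is a hypothetical class, not something from my app; it assumes the mailer responds to send_release_update_notification and returns something with deliver_now, as in the task above.

```ruby
# Hypothetical PORO that holds the logic the rake task would otherwise contain.
# The mailer is injected so the class can be tested without Rails or ActionMailer.
class ReleaseUpdateNotifier
  def initialize(mailer:)
    @mailer = mailer
  end

  # Sends one reminder per [customer, issue] pair and returns how many were sent.
  def notify(fixes)
    fixes.each do |customer, issue|
      @mailer.send_release_update_notification(customer, issue).deliver_now
    end
    fixes.size
  end
end
```

The rake task body then shrinks to a single call, something like ReleaseUpdateNotifier.new(mailer: TfcAdminNotesMailer).notify([["customer X", "issue 111"]]), and the class can be unit tested with a fake mailer.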

I could have sent the message directly to the customer rather than to customer service. However, I was worried that if a rollback happened, the customer might be informed incorrectly about the state of the application. I also thought it’d be a nice touchpoint, since issues are typically reported to customer service initially. The big win is that ten of these can be added over a period of weeks, and I could have gone on vacation during the release, and the customer update reminders would still be sent.

All is not puppies and rainbows, however. This gem doesn’t appear to be maintained. The last release was over two years ago, though there are forks that have been updated more recently. It works fine with my rails4 app. The main alternative that I’m aware of is hijacking a database migration and calling a rake task or ruby code in the ‘up’ migration clause. That seems non intuitive to me. I wouldn’t expect a database migration to touch anything outside of, well, the database.

Another thing to be aware of is that there is no rollback/roll forward functionality with after_party. Once a task is run, it won’t get run again (unless you run the rake task manually or modify the task_records database table).

This gem will help you automate one off deployment tasks, and I hope you enjoy it.

Founding Engineer or CTO, translated

If you want to re-read my post about the various technical roles a co-founder can take as a startup grows, this time in Russian, you can, here.  Thanks to Vlad for the translation.

Bonus!  Here’s discussion on Hacker News, including my favorite comment:

Having been on both sides of the situation a couple times, the distinction is pretty simple for me: do you have a seat on the board?

If you have a board seat, great, you’re a real founder/CTO!

If you’re not interested in the board seat or you’re aware that you don’t bring enough to the table to earn it, you’re a good candidate for a founding engineer. You should be happy!

If the other founder(s) want you to be a founder/CTO without the board seat, run! They’re just using the prestige of the title to pay you less, and will revoke it at the first convenience.


My “getting paid for the work” story

Consulting is about getting the work, doing the work, and getting paid for the work.

This is my “getting paid for the work” story.

I was a contractor helping build out an ecommerce site for a startup.  I had been introduced to this client by a colleague, and felt like I had a good relationship with the technical lead, “Bob”.  We were making progress on getting the site built out and I’d worked a couple of months with them–they were my primary client.  I believe I was billing semi-monthly.

One fall day, I got a note from “Bob” that he was leaving, and that I should send all my future invoices to “Joe”, from accounting.  I seem to recall that the project was over budget and was being shut down.  I had one outstanding invoice for about $4,000.

“Joe” wasn’t very interested in making me whole.  He probably was interested in trying to keep the company afloat and keep cash in the company’s pockets.  I was, however, interested in collecting that money.

I didn’t have much leverage since the project was shut down and my primary contact had moved on.  What I did have was persistence.  And I was also able to get “Joe”’s skype handle.

Every two weeks I would re-send the invoice, always with the same format:


I just wanted to send you this invoice for work I’ve done previously.  It was due on XX/XXXX.

Please let me know if you have any questions or concerns.


And then I’d ping “Joe” on skype to see if he had received the invoice.  Needless to say, it didn’t take long before “Joe” wasn’t on skype very often.  I still continued to send the invoice to his email address.

Every year I give holiday gifts to my clients as a way of saying “thank you”.  I gave a box of chocolates to the ecommerce startup that year. Even though they were stiffing me for thousands of dollars, I still appreciated the money they’d paid me and the work they’d let me do.

Within two weeks, I was paid in full.

Using a lambda function to make AML real time predictions

When you are making real time AML predictions against an endpoint, you can run the prediction code (sample code) locally.  However, leveraging AWS Lambda can let you build a system that accesses predictions without any servers.  This system will likely be cheaper and scale better than running on your own servers, and you can also trigger predictions on a wide variety of events without writing any polling code.

Here’s a scenario.  You have a model that predicts income level based on user data, which you are going to use for marketing purposes.  The user data is placed on S3 by a different process at varying intervals. You want to process each record as it comes in and generate a prediction.  (If you didn’t care about near real time processing, you could run a periodic batch AML job.  This job could also be triggered by lambda.)

So, the plan is to set up a lambda function to monitor the S3 location and whenever a new object is added, run the prediction.

A record will obviously depend on what your model expects.  For one of the models I built for my AML video course, the record looks like this (data format and details):

22, Local-gov, 108435, Masters, 14, Married-civ-spouse, Prof-specialty, Husband, White, Male, 0, 0, 80, United-States
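A real time prediction request takes the record as a map of attribute names to string values, so a line like the one above has to be paired with column names first. Here is a rough sketch in Ruby. The column names are an assumption (they follow the UCI Adult census data set, which this row resembles), and build_record is a hypothetical helper; use the attribute names from your own model’s schema.

```ruby
# Column names are an assumption: they follow the UCI Adult census data set,
# which the sample row resembles. Use the attribute names from your own schema.
FEATURE_NAMES = %w[
  age workclass fnlwgt education education-num marital-status occupation
  relationship race sex capital-gain capital-loss hours-per-week native-country
].freeze

# Turn one comma-separated line into a string-to-string hash.
# Raises if the line has the wrong number of fields.
def build_record(line)
  values = line.strip.split(',').map(&:strip)
  unless values.size == FEATURE_NAMES.size
    raise ArgumentError, "expected #{FEATURE_NAMES.size} fields, got #{values.size}"
  end

  FEATURE_NAMES.zip(values).to_h
end

record = build_record('22, Local-gov, 108435, Masters, 14, Married-civ-spouse, ' \
                      'Prof-specialty, Husband, White, Male, 0, 0, 80, United-States')
puts record['education'] # Masters
```

Checking the field count up front gives a clear error when a file on S3 isn’t in the format the model expects.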

You will need to enable a real time endpoint for your model.

You also need to create IAM policies which allow access to cloudwatch logs, readonly access to s3, and access to the AML model, and associate all of these with an IAM role which your lambda function can assume.  Here are the policies I have associated with my lambda function (the two describe policies can be found in my github repo):

You then create a lambda function which will be triggered when a new file is added to S3.  Here’s a SAM file which defines the trigger (you’ll have to update the reference to the role you created and your bucket name and path).  More about SAM.

Then, you need to write the lambda function which will pull the file content from S3 and run a prediction. Here’s that code.  It’s similar to prediction code that you might run locally, except how it gets the value string.  We read the values string from the S3 object on line 31.
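The linked code isn’t reproduced here. As a rough sketch of the same flow, written in Ruby and not the original code, the function below reads the bucket and key from the S3 event, fetches the object, and asks the endpoint for a prediction. The method names and the to_record callable are hypothetical, and the s3 and ml clients are injected; in a real function they would be Aws::S3::Client.new and Aws::MachineLearning::Client.new from the aws-sdk gems.

```ruby
require 'cgi'

# Pull the bucket and key out of an S3 "object created" event.
# Keys arrive URL-encoded (spaces become '+'), so decode them.
def s3_location(event)
  s3 = event.fetch('Records').first.fetch('s3')
  [s3.fetch('bucket').fetch('name'), CGI.unescape(s3.fetch('object').fetch('key'))]
end

# Read the object, build a record, and request a real time prediction.
# `to_record` is a hypothetical callable that turns the file body into the
# string-to-string hash your model expects.
def predict_for_event(event, s3:, ml:, model_id:, endpoint:, to_record:)
  bucket, key = s3_location(event)
  body = s3.get_object(bucket: bucket, key: key).body.read
  ml.predict(ml_model_id: model_id, record: to_record.call(body),
             predict_endpoint: endpoint).prediction
end
```

Injecting the clients keeps the function testable with simple fakes, and makes it easy to swap the S3 read for another input source later.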

This code is prototype quality only.  The sample code accesses the prediction and then writes to stdout.  That is fine for sample code, but in a real world scenario you’d obviously want to take further actions.  Perhaps have the lambda function update a database row, add another file to S3 or call an API. You’d also want to have some error handling in case the data in the S3 file wasn’t in the format the model expected.  You’d also want to lock down the S3 access allowed (the policy above allows readonly access to all S3 resources, which is not a good idea for production code).

Finally, S3 is one possible source of input, but others like SNS or Kinesis might be a better fit.  You could also tie into the AWS API Gateway and leverage the features of that service, including authentication, throttling and billing.  In that case, you’d want the lambda function to return the prediction value to the end user.

Using AWS Lambda to access your real time prediction AML endpoint allows you to make predictions against single records in near real time without running any infrastructure of your own.

© Moore Consulting, 2003-2017