Restoring a single table from an Amazon RDS backup

material-icon-1307676_640When you use SQL, how do you write delete statements at the database prompt?

A delete statement typically looks like this: delete from table_name where column_name = 'foo';. I usually write it in this order:

  1. delete
  2. delete where column_name = 'foo';
  3. delete from table_name where column_name = 'foo';

Even though this is a pain because you have to move back and forth (I really need to look into vi keybindings for mysql), it prevents you from making sending this command by accident: delete from table_name; which deletes all the data in your table.  (Another alternative is to never use the interactive client and always write out your delete statements in a file and run that file to delete data.)

But, recently, I did exactly that, because I forgot.  I deleted all the data from one table in our production database.  It was billing data, so rather important.  Luckily, I am using Amazon RDS and had set up backup retention.

I wanted to outline what I did to recover from this.

  • I took a deep breath.
  • I wrote a message on the slack channel documenting what had happened and the possible customer impact.
  • Depending on which data is removed, it’s possible you will want to put the application in maintenance mode and/or inform your customers of the issues.  What I deleted was used rarely enough that I didn’t have to take these steps.
  • I looked at how to restore an Amazon RDS backup.
  • I restored the missing data.
  • I communicated that things were back to normal to internal stakeholders.

Unfortunately, it wasn’t clear how to restore a single table.  I’m used to being able to download a .sql file and hand edit it, but that’s not an option.  Stackoverflow wasn’t super helpful.   But if there’s anytime you want clarity, it’s when you are restoring production data.  You don’t want to compound the problem by screwing up something else.

So, here’s how to restore a single table from an Amazon RDS backup:

  • Note the time just before you deleted the data.  (Another reason the slack message is nice.  chatops ftw.)
  • Start up another instance from that moment.  I named it something obvious like ‘has-data-from-tablename’.
  • Twiddle your thumbs anxiously while the new instance starts up.
  • The instance is put into your default security group (as of this writing) which probably doesn’t allow mysql access.  Make sure you modify this security group to allow access.
  • When the instance is up, do a dump of the table you need: mysqldump -t --ssl-ca=./amazon-rds-ca-cert.pem -u user -ppassword -h -P3306 database_name tablename > restore-table_name.sql; (-t omits the create database/table statements.)
  • If your table is has had writes since you deleted everything, you may need to manually pull down the current data from the production system and merge it into restore-table_name.sql; I was able to avoid this step.
  • Load the data using mysql mysql --ssl-ca=./amazon-rds-ca-cert.pem -u user -ppassword -h -P3306 database_name < restore-table_name.sql;
  • Review to make sure the data is correct.
  • Test the application.
  • Update the slack channel, and do any other notifications you need to (customers, internal contacts, etc).
  • Revoke the default security group access you allowed above.
  • Delete the ‘has-data-from-tablename’ instance.

Note this only works if you caught your mistake within the backup retention window. (Make sure you set that up.)  We aren’t multi AZ or clustered, so I’m not sure how that would affect things.

Happy deep breathing!

Bare minimum of ops tasks for heroku

Awesome, you are a CTO or founding engineer of a newborn startup.  You have an web app up on Heroku and someone is paying you money for it!  Nice job.

Now, you need to think about supporting it.  Heroku makes things way easier (no racking and stacking, no purchasing hardware, no configuring apache) but you still to set up some operations

Here is the bare minimum you need to do to make sure you can sleep at night.  (Based on a couple of years of heroku projects, and being really really cheap.)

  • Have a staging environment
    • You don’t want to push code direct to prod, do you?
    • This can be a free dyno, depending on the complexity of your app.
    • Pipelines are nice, as is preboot.
    • Cost: free
  • Have a one line deploy.
    • Or, if you like CD/CI, an automatic deploy or a one click deploy.  But make it really easy to deploy.
    • Have a deploy script that goes straight to production for emergencies.
    • Cost: free
  •  Backups
    • User data.  If you aren’t using a shared object store like S3, make sure you are doing a backup.
    • Database.  Both heroku postgresql and amazon RDS have point and click solutions.  All you have to do is set them up.  (Test them, at least once.)
    • Cost: freeish, depending on the solution.  But, user data is worth spending money on.
  • Alerting
    • Heroku has options if you are running professional dynos.
    • Uptimerobot is a great free third party service that will check ports every 5 minutes and has a variety of alert options.  If you want SMS, you have to pay for it, but it’s not outrageous.
    • Cost: free
  • Logging
    • Use a logging framework (like slf4j or the rails logger, and mark error conditions with a string that will be easy to search for.
    • Yes, you can use heroku logs but having a log management solution will make you much happier.  Plus, it’s free for 2 days of logfiles.
    • Set up alerts with papertrail as well.  These can be more granular.
    • Cost: free
  • Create a list of third party dependencies.
    • Sign up for status alerts from these.  If you have pro slack, you can have them push an email to a channel.  If you don’t, create an alias that receives them.  You want to be the person that tells your clients about outages, not the other way around.
    • Cost: free
  • Communication
    • Internal
      • a devops_alert slack channel is my preferred solutions.  All deploys and other alerts go there.
    • External
      • create a mailing list for your clients so you can inform them of issues easily.  Google groups is fine.  Don’t use an alias in your email–you’ll forget to add new clients.
      • do not use this mailing list for marketing purposes.
      • do make sure when you gain or lose clients you keep this up to date
    • Run through a disaster in your mind and make notes on how you would communicate the issue, both internally and externally.  How often do you update your team?  How often do you update your clients?  What about an internal issue (some of your code screwed up) vs an external issue.  This doesn’t need to be exhaustive, but thinking about it ahead of time and making some notes will help you in the crisis.
    • Cost: free

All of this is probably a four hour project, max.

But once this is done, you’ll rest easier at night, knowing you have what you need to troubleshoot and recover from production issues.

Heroku drains

drain photoSo, I’ve learned a lot more than I wanted to about heroku drains. These are sinks to which heroku applications can write.  After the logs are out of heroku, you analyze these logs just as you would in any other application living outside of a PaaS.  Logs are very useful to see long term trends, debug, etc.  (I’ve worked both on a rails3 app and a java spring/camel app that are deploying to heroku.)

Here are some things I’ve learned:

  • Heroku drains are well documented.
  • You want definitely want them for any production application, because only 1500 lines of heroku logs are retained at any one time.
  • They can go to either syslog (great for applications with a lot of other infrastructure) or https (great for applications without as much infrastructure support).
  • They can’t do any kind of authorization.
  • You can’t know what ip address the logs are coming from, so you can’t limit access by IP.
  • There are third party extensions you can pay for to avoid dealing with drains at all (I’ve heard good things about papertrail.)
  • You can use logstash to pull heroku logs from a syslog drain into elastic search.
  • There are numerous github projects that can drain to databases, etc.  There’s even one that, with echos of Ouroboros, drains to another heroku app.
  • Drains have intelligent behavior if your listener (or listeners) fails.  From heroku support: “The short answer is yes, the drain will drop logs when the sink is not responsive, but this isn’t really the full story. There are a number of undocumented limits and backoff retries that happen when a drain connection is lost.”  And then they go on to explain how the backoff behaviour happens.  I’m not going to cut and paste their entire answer because I assume it is undocumented for a reason (maybe it changes, maybe they don’t want to commit to supporting this behavior).  Ask them yourself 🙂
  • A simple drain can be as easy as <?php error_log(file_get_contents('php://input'), 3, "/var/log/logfile.log"); ?>, but make sure you rotate that log file.
  • You can use puppet to manage drains if you are bringing servers up and down, using the heroku toolbelt and CLI authentication.

If you are deploying anything beyond a toy app on heroku, don’t forget the ops folks and make sure you set up your drain!

Masterless puppet and CloudFormation

I’ve had some experience with CloudFormation in the past, and recently gained some puppet expertise.  I thought it’d be great to combine the two, working on a new project to set up the ELK stack for a client.

Basically, we are creating an ec2 instance (or a number of them) from a vanilla image using a CloudFormation template, doing a small amount of initialization via the UserData section and then using puppet to configure them further.  However, puppet is used in a masterless context, where the intelligence (of knowing which machine should be configured which way) isn’t in the manifest file, but rather in the code that checks out the modules and manifests. Here’s a great example of a project set up to use masterless puppet.

Before I dive into more details, other solutions I looked at included:

  • doing all the machine setup in UserData
    • This is a bad idea because it forces you to set up and tear down machines each time you want to make a configuration change.  Leads to a longer development cycle, especially at first.  Plus bash is great for small configurations, but when you have dependencies and other complexities, the scripts can get hairy.
  • pulling a bash script from s3/github in UserData
    • puppet is made for configuration management and handles more complexity than a bash script.  I’ll admit, I used puppet with an eye to the future when we had more machines and more types of machines.  I suppose you could do the same with bash, but puppet handles more of typical CM tasks, including setting up cron jobs, making sure services run, and deriving dependencies between services, files and artifacts.
  • using a different CM tool, like ansible or chef
    • I was familiar with puppet.  I imagine the same solution would work with other CM tools.
  • using a puppet master
    • This presentation convinced me to avoid setting up a puppet master.  Cattle not pets.
  • using cloud-init instead of UserData for initial setup
    • I tried.  I couldn’t figure out cloud-init, even with this great post.  It’s been a few months, so I’m afraid I don’t even remember what the issue was, but I remember this solution not working for me.
  • create an instance/AMI with all software installed
    • puppet allows for more flexibility, is quicker to setup, and allows you to manage your configuration in a VCS rather than a pile of different AMIs.
  • use a container instead of AMIs
    • isn’t docker the answer to everything? I didn’t choose this because I was entirely new to containerization and didn’t want to take the risk.

Since I’ve already outlined how the solution works, let’s dive into details.

Here’s the UserData section of the CloudFormation template:

          "Fn::Base64": {
            "Fn::Join": [
                "#!/bin/bash \n",
                "exec > /tmp/part-001.log 2>&1 \n",
                "date >> /etc/ \n",
                "yum install puppet -y \n",
                "yum install git -y \n",
                "aws --region us-west-2 s3 cp s3://s3bucket/auth-files/id_rsa/root/.ssh/id_rsa && chmod 600 /root/.ssh/id_rsa \n",
                "# connect once to github, so we know the host \n",
                "ssh -T -oStrictHostKeyChecking=no \n",
                "git clone \n",
                "puppet apply --modulepath repo/infra/puppet/modules pure-spider/infra/puppet/manifests/",
                { "Ref" : "Environment" },
                "/logstash.pp \n",
                "date >> /etc/\n"

So, we are using a bash script, but only for a little bit.  The second line (starting with exec) stores output into a logfile for debugging purposes.  We then store off the date and install puppet and git.  The aws command pulls down a private key stored in s3.  This instance has access to s3 because of an IAM setup elsewhere in the CloudFormation template–the access we have is read-only and the private key has already been added to our github repository.  Then we connect to github via ssh to ‘get to know the host’.  Then we clone the repository containing the infrastructure code.  Finally, we apply the manifest, which is partially determined by a parameter to the CloudFormation template.

This bash script will run on creation of the EC2 instance.  Once this script is solid, if you are testing adding additional puppet modules, you only have to do a git pull and puppet apply to add more functionality to the modules.  (Of course, at the end you should stand up and tear down via CloudFormation just to test end to end.)  You can also see how it’d be easy to have the logstash.conf file be a parameter to the CloudFormation template, which would let you store your configuration for web servers, database servers, etc, in puppet as well.

I’m happy with how flexible this solution is.  CloudFormation manages the machine creation as well as any other resources, puppet manages the software installed in those machines, and git allows you to maintain all that configuration in one place.

Gluecon 2015 takeaways

Is it too early to write a takeaway post before a conference is over? I hope not!

I’m definitely not trying to write an exhaustive overview of Gluecon 2015–for that, check out the agenda. For a flavor of the conversations, check out the twitter stream:

Here are some of my longer term takeaways:

  • Better not to try to attend every session. Make time to chat with random folks in the hallway, and to integrate other knowledge. I attended a bitcoin talk, then tried out the API. (I failed at it, but hey, it was fun to try.)
  • Talks on microservices were plentiful. Lots of challenges there, and the benefits were most clearly espoused by Adrian Cockroft: they make complexity explicit. But they aren’t a silver bullet and require a certain level of organizational and business model maturity before it makes sense.
  • Developer hiring is hard, and it will get worse before it gets better. Some solutions propose starting at the elementary school level with with tools like Scratch. I talked to a number of folks looking to hire, and at least one presenter mentioned that as well at the end of his talk. It’s not quite as bad as 2000 because the standards are still high, but I didn’t talk to anyone who said “we have all the developers we need”. Anecdata, indeed.
  • The Denver Boulder area is a small tech community–I had beers last night with two folks that were friends of friends, and both of them knew and were working with former colleagues of mine. Mind that when thinking of burning that bridge.

To conclude, I’m starting to see repeat folks at Gluecon and that’s exciting. It’s great to have such a thought provoking conference which looks at both the forest and the trees of large scale software development.

Thoughts on Amazon CloudFormation

cloud formation photo

Photo by eschipul

I recently set up Amazon CloudFormation for a fairly complicated application in AWS.  For those unfamiliar with this service, it allows you specify a number of AWS resources in a declarative way in a JSON document, create them all at once (it’s called a ‘stack’), manage them as one entity, and destroy them.  You are billed just as you would be if you created the resources by hand.  But it’s a versionable, replicable way to create resources.

The distributed application for which I was creating the stack had the following components:

  • queues (SQS)
  • databases (dynamodb, including secondary indices)
  • compute (EC2)
  • alarms (Cloudwatch)
  • storage (S3)
  • a VPC and Subnets
  • event logging (kinesis)
  • hadoop (Elastic Map Reduce)

The last four items were not configured by the CloudFormation template I wrote.  S3, VPC and subnets because I leveraged existing resources, and Kinesis and EMR because they are not supported by CloudFormation.  (Kinesis has some support, but CloudFormation doesn’t allow you to specify a name of a stream, which makes it pretty useless when you want to post or read from a specific stream.)  However, while it would be preferable to have everything specified in CloudFormation, partial stack creation was useful–I just documented the other requirements in the CloudFormation template–because:

  • resource configuration like queue timeouts, names, read throughput, etc can be applied uniformly–consistency is enforced.
  • the infrastructure is defined and documented in one place, allowing a new developer to get up to speed quickly.
  • tags can be applied uniformly.
  • CloudFormation supports parameters, so that you can preface every resource with a deployment environment specific variable (‘stage’, ‘dan-dev’, etc), or have different DynamoDB throughput for different deployments.
  • if different configuration needs to be tested, you can stand up a new stack in minutes and test it.
  • the template can be stored in your version control system, allowing someone to see how things changed over time.  Yay, commit logs!

There were some other possible benefits I just didn’t have time to explore fully before the project wound down.

  • autoscaling groups seemed like they’d be extremely useful.  These aren’t a CloudFormation only tool, but CloudFormation seems an ideal way to define and use them.
  • the ability to create and delete stacks opened up the possibility of creating developer specific environments for debugging issues.

If you are going to start with CloudFormation, I highly recommend setting up an initial environment by hand, and then running CloudFormer, a small application written by Amazon which reads from your existing AWS infrastructure and generates a CloudFormation template.  I used CloudFormer to create a template for everything in our AWS account, and then picked and chose what was pulled over to the new template.  There were a few issues with this though:

  • There was a bug in the CloudFormation documentation for DynamoDB schemas.  You want to use this syntax: "KeySchema": { "HashKeyElement": { "AttributeName": "attrname", "AttributeType": "S" }, ... }.  CloudFormer generated them correctly, however.
  • CloudFormer coerces names of some resources resources including VPCs and subnets to strings, and I had to back those out when I wanted to use existing resources.

Other than not being able to fully define an application (because of dependencies on unsupported AWS tools like Kinesis and EMR), what other downsides does CloudFormation have?

  • it locks you into AWS.  Openstack Heat is an alternative that works across clouds, or so I read.  And, really, once you decide on AWS, is a infrastructure creation script going to be the one thing that keeps you from moving?
  • it is tied to infrastructure creation (though there is resource by resource support for in place updates).  If you want to modify one queue setting, you have to tear down and create anew the entire stack.  I found this to be relatively quick (15 min or so).
  • you are still writing scripts in the UserData section of the EC2 definition to set up your server environment.

After this experience, and reviewing my thoughts above, I believe the sweet spot of CloudFormation is setting up dev and QA environments quickly, and documenting infrastructure choices when you are committed to AWS.

Observations on a Writing a Custom Report with Java, Quickbooks, Jasper Reports, Google Spreadsheets and Google Drive

A recently released project is using java and spring to pull data from quickbooks, a mysql database and google spreadsheets, munging the data in various ways, and using jasper reports and jfreechart to generate a good looking report and a CSV of transactions that will give our brokers weekly updates on how they are doing compared to their goals for the year. I then upload it to Google Drive and send an email notifying each realtor that they have a new file.  It’s always nice to release useful software–feels like a new day dawns.

A few observations from this project:

  • The tech was interesting, but it was actually more interesting to see how the needs of tech drove the business to ‘tighten up’ their processes. Whether that was making sure there was one place to look for user type data, or defining exactly what constituted achieving a goal, or making sure that any new realtors who joined created business goals (and committed them to writing), the binary nature of software forced the business (or, more accurately, people in the business) to make decisions. Just another example of business process crystallization. This also meant deferring some software development. Where the business couldn’t answer a question definitively, rather than force them to do so, we chose to defer development.
  • I’m glad that jasper reports makes it so easy to generate PDFs–you basically create an XML file (I was unable to find a spec, but there were plentiful examples) and then put tokens for dynamic content. Then you compile the XML file, give it a map of said tokens and values (which can be text, numbers, dates or images), and then export the object to PDF. Note that I was not using Jasper in a typical way–reporting from large amounts of similar data via a data connection–because I wanted different PDFs for each user. Perhaps there was a way to do this elegantly, but I was just trying to get stuff done. Creating a grid in jasper was interesting, though.
  • JFreechart had a very weird issue where on stage the graph labels were bolded and italicized, but not on production. Since we make every effort to keep these two environments in sync and they were running exactly the same code, this was a mystery. Finally solved it when we discovered java was different (same version, different vendors: openjdk vs sun java). Had been running this way for years. Oops.
  • Interacting with google spreadsheets is great for the business but a pain in the butt for developers. It’s great for our business because it is extremely easy for someone who is not a programmer to create a ‘database’ that is versioned, backed up, almost always accessible and helps them ‘get stuff done’ in a measured way. It’s a pain for developers because it is a ‘database’ and not a database–no referential integrity or data typing. Also, google provides cell based access and row based access, forcing you to choose. And the java libraries are old. Beats excel though–at least it is accessible from a server. We ended up writing a library to wrap Google’s Java SDK to make some common operations easier.
  • Pushing to google drive is interesting–I alluded to this in my last post but you have to be ready for failure. I ended up using a BlockingQueue for this–I throw files (a data structure defining the file, actually) to be uploaded on the queue, then consumers executing in a different thread each take one off, try to upload it, and if it fails, put it back on. I considered using a third party durable queue like IronMQ, but thought it was overkill.
  • Using the Quickbooks SDK, with all the accounting data exposed, lets you build some pretty powerful graphs that are useful to the business. But the docs are jumbled, with a lot of them aimed at developers who are building integrations to sell to Quickbooks users. Support is OK for standard operations, but for things like renewing your token, you have to drop down to the REST API (see my SO question) This article does a good job of outlining the various projects but as a dev you’ll have to ignore certain sets of information–never fun when getting up to speed.
  • We do a lot of backend processing and spring and maven and a custom assembler that generates a tarball when using ‘maven install’ have been great. I also finally figured out how to work with maven to use ‘release:prepare’ and ‘release:perform’ for releasing libraries, as opposed to going my own way, and that has made things much much easier. Learn your tools, folks!
  • I’m once again astounded by the richness of the java library ecosystem. There doesn’t seem to be very much that I can think of doing that doesn’t have at least one, and probably three, java implementations.

Review of Emergent One

A few weeks ago I sat down with some folks at Emergent One and got a demo of their product.  The reason I reached out to them is because I heard great things about their demo at GlueCon.  My company is considering building a mobile app (who isn’t, right!) and I thought that what EmergentOne offered was a great way to accelerate the server side development of that app.

We are slightly outside of their target market–our API would be solely for internal use of a small team, while it seems like they are aiming at companies who want to make an API available to a larger audience (either external or a larger internal development staff).

Regardless of target market, they have a slick product.  They have a self service web application which can generate APIs directly from database tables.  You allow the app select access to your database.  (I believe only mysql and postgresql are supported at the moment, but I know things are moving fast over there as well.)  You then work within the app to build an API based on the tables in your database.  You can have derived fields as well as fields that map to columns in your database directly.  You can filter your data (so if you only want to expose a subset of your data, you can create an endpoint that only displays that: “users over 40”, for example).  You can also add comments to fields.

After you define as many API endpoints as you want, you can manage access to them with application keys or usernames and passwords.  Automatic documentation with datatypes and whatever comments you have added is generated, and there is a developer portal where it is easy to play around with the APIs and see what you missed.

What they showed me is great, but this product is still in private beta.  That means there are some rough edges.  The biggest hole (a fix for which the demoer promised was coming very soon in their development plan) is that you can’t search against the API.  So, if you have an API exposing your pets table, you can create an endpoint to retrieve all pets, and you can retrieve one pet based on id.  You can create a ‘dogs-only’ endpoint, and get all dogs or one dog.  But you can’t query the dogs-only endpoint for dogs that weigh over 25 pounds and have short hair (or any other type of querying).  I only played with read-only APIs, so I’m not sure how the write-access APIs work.

There’s also always the issue of introducing another vendor into the system.  Since we are looking at this for mobile apps, performance is very important.  It seemed like the demoer was well aware of this issue.  He mentioned an SLA would be likely when they went public, and also talked about some of the steps they are taking to make sure their app uses indexes and other metadata about the tables being exposed to execute as quickly as possible.

I haven’t built an API with any of the other tools out there, so I can’t compare the ease of Emergent One with, say, tools like jboss resteasy that work with a java layer, or usergrid, which autogenerates an API but requires moving data into it.  But I can say that this was a very easy way to go database to API in less than an hour (with iptables troubleshooting mixed in!).  If they get searching right–making it easy to use and performant–this will be a fantastic product.

Using Google Apps to generate a simple site from a form

I was blue skying with a friend the other day and came up with checklist to build a simple canned website from a form submission, using Google Apps and Google Apps Scripting.

  1. User goes to a form.  (More on creating a form)
  2. User submits images, video links, text, email, background color, etc.
  3. That all goes into a google spreadsheet, which triggers the building of a site.  Here’s trigger info.
  4. You can automatically build a google site.  Here’s the api: site creation, page creation.  If you want your own domain name ( I think you can do that, but you’d need to integrate with a DNS service to set that up–this part bears a little more investigating.
  5. If desired, you can generate the QR code for the site.
  6. If you need to tell the form submitter that the site is created, you can email them a link to the QR code so they can print it out.
  7. You can even remove content from the site via cron (see triggers link up above), though you can’t automatically delete the site.
  8. It’s conceivable you could even use google checkout to charge users beforehand (though this is a bit less supported).

What’s great is that you don’t need to have any tool other than a browser to build this entire stack.  Google is truly creating a toolkit/macro language in the cloud.

Image Recognition Automation Fail

I had a friend who asked me to take a quick look at a business problem he was having.  He had a set of photos (of vinyl record albums) that he was looking at to identify the artist.  After finding the artist, he’d do some additional categorization work and then push the image and metadata to a ecommerce site he is running.

He wanted a way to more quickly identify the artist and album, preferably without his intervention.

The first thing I suggested was Mechanical Turk, as this seemed like a great example of a Human Interaction Task.  However, my friend tried this and found it to be more work (mostly proofing, I think) than it was help.

He also pointed out that Google Image Search does exactly what he wants.  You can post an image or URL to it (you have to use the camera in the search box), and it will give you back like images, best guess for the image matching, and links related to the image.  Pretty cool!

However, there is no API for the new Google Image Search.  Sure, you have the old, deprecated Images API, but it doesn’t have access to post an image or a URL, just keywords.  A bit of looking around revealed this StackOverflow post, which deconstructs the new Image Search parameters.  However, the deconstructed URL gives back javascript which needs to be executed by a browser.  I looked briefly at Selenium and webdriver to do that, but couldn’t figure it out.

I also looked at Kooaba’s API, but they didn’t get back to me when I signed up for a free developer account, and their API only covers books, CDs and DVDs.  Also from a StackOverflow post, I looked at MacroGlossa and IQEngines.  Neither of them seemed to work–MacroGlossa wanted a category (and, shocker, vinyl record albums was not a category) and IQEngines let me submit an image wasn’t successful in identifying it.

I had to admit I was defeated.

© Moore Consulting, 2003-2017 +