
Heroku drains

So, I’ve learned a lot more than I wanted to about heroku drains. These are sinks to which heroku forwards your application’s logs.  Once the logs are out of heroku, you can analyze them just as you would for any other application living outside of a PaaS.  Logs are very useful for seeing long term trends, debugging, etc.  (I’ve worked on both a rails3 app and a java spring/camel app that are deployed to heroku.)

Here are some things I’ve learned:

  • Heroku drains are well documented.
  • You definitely want them for any production application, because only 1500 lines of heroku logs are retained at any one time.
  • They can go to either syslog (great for applications with a lot of other infrastructure) or https (great for applications without as much infrastructure support).
  • They can’t do any kind of authorization.
  • You can’t know what IP address the logs are coming from, so you can’t limit access by IP.
  • There are third party extensions you can pay for to avoid dealing with drains at all (I’ve heard good things about papertrail.)
  • You can use logstash to pull heroku logs from a syslog drain into Elasticsearch.
  • There are numerous github projects that can drain to databases, etc.  There’s even one that, with echoes of Ouroboros, drains to another heroku app.
  • Drains have intelligent behavior if your listener (or listeners) fails.  From heroku support: “The short answer is yes, the drain will drop logs when the sink is not responsive, but this isn’t really the full story. There are a number of undocumented limits and backoff retries that happen when a drain connection is lost.”  And then they go on to explain how the backoff behaviour happens.  I’m not going to cut and paste their entire answer because I assume it is undocumented for a reason (maybe it changes, maybe they don’t want to commit to supporting this behavior).  Ask them yourself 🙂
  • A simple drain can be as easy as <?php error_log(file_get_contents('php://input'), 3, "/var/log/logfile.log"); ?>, but make sure you rotate that log file.
  • You can use puppet to manage drains if you are bringing servers up and down, using the heroku toolbelt and CLI authentication (see the CLI sketch just below).
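
To tie a few of these together, here’s roughly what wiring up a drain looks like with the heroku CLI. This is a minimal sketch: the app name and the listener host/port are placeholders, not a real setup.

# list existing drains for an app
heroku drains --app my-app
# add a syslog drain pointing at your own log infrastructure
heroku drains:add syslog://logs.example.com:5140 --app my-app
# or an https drain, if that fits your infrastructure better
heroku drains:add https://drain.example.com/logs --app my-app
# remove a drain you no longer want
heroku drains:remove syslog://logs.example.com:5140 --app my-app

Scripting these same commands is also how you’d have puppet (or any other tool) manage drains for you.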

If you are deploying anything beyond a toy app on heroku, don’t forget the ops folks and make sure you set up your drain!

“Wave a magic wand”

That was what a previous boss said when I would ask him about some particularly knotty, unwieldy issue. “What would the end solution look like if you could wave a magic wand and have it happen?”

For instance, when choosing a vendor to revamp the flagship website, don’t think about all the million details that need to be done to ensure this is successful. Don’t think about who has the best process. Certainly don’t think about the technical details of redirects, APIs and integrations. Instead, “wave a magic wand” and envision the end state–what does that look like? Who is using it? What do they do with it? What do you want to be able to do with the site? What do you want it to look like?

Or if an employee is unhappy in their role, ask them to “wave the magic wand” and talk about what role they’d rather be in. With no constraints you find out what really matters to them (or what they think really matters to them, to be more precise).

When you think about issues through this lens, you focus on the ends, not the means.  It lets you think about the goal and not the obstacles.

Of course, then you have to hunker down, determine if the goal is reachable, and if so, plan how to reach it. I like to think of this as projecting the vector of the ideal solution into the geometric plane of solutions that are possible to you or your organization–the vector may not lie in the plane, but you can get as close as possible.

“Waving a magic wand” elevates your thinking. It is a great way to think about how to solve a problem not using known methods and processes, but rather determining the ideal end goal and working backwards from there to the “hows”.

Masterless puppet and CloudFormation

Photo by nevil zaveri (thank you for 10 million+ views 🙂)

I’ve had some experience with CloudFormation in the past, and recently gained some puppet expertise.  I thought it’d be great to combine the two, working on a new project to set up the ELK stack for a client.

Basically, we are creating an EC2 instance (or a number of them) from a vanilla image using a CloudFormation template, doing a small amount of initialization via the UserData section and then using puppet to configure them further.  However, puppet is used in a masterless context, where the intelligence (of knowing which machine should be configured which way) isn’t in the manifest file, but rather in the code that checks out the modules and manifests. Here’s a great example of a project set up to use masterless puppet.

Before I dive into more details, other solutions I looked at included:

  • doing all the machine setup in UserData
    • This is a bad idea because it forces you to set up and tear down machines each time you want to make a configuration change.  Leads to a longer development cycle, especially at first.  Plus bash is great for small configurations, but when you have dependencies and other complexities, the scripts can get hairy.
  • pulling a bash script from s3/github in UserData
    • puppet is made for configuration management and handles more complexity than a bash script.  I’ll admit, I used puppet with an eye to the future, when we’d have more machines and more types of machines.  I suppose you could do the same with bash, but puppet handles more of the typical CM tasks, including setting up cron jobs, making sure services run, and deriving dependencies between services, files and artifacts.
  • using a different CM tool, like ansible or chef
    • I was familiar with puppet.  I imagine the same solution would work with other CM tools.
  • using a puppet master
    • This presentation convinced me to avoid setting up a puppet master.  Cattle not pets.
  • using cloud-init instead of UserData for initial setup
    • I tried.  I couldn’t figure out cloud-init, even with this great post.  It’s been a few months, so I’m afraid I don’t even remember what the issue was, but I remember this solution not working for me.
  • creating an instance/AMI with all software installed
    • puppet allows for more flexibility, is quicker to set up, and allows you to manage your configuration in a VCS rather than in a pile of different AMIs.
  • using a container instead of AMIs
    • isn’t docker the answer to everything? I didn’t choose this because I was entirely new to containerization and didn’t want to take the risk.

Since I’ve already outlined how the solution works, let’s dive into details.

Here’s the UserData section of the CloudFormation template:


          "Fn::Base64": {
            "Fn::Join": [
              "",
              [
                "#!/bin/bash \n",
                "exec > /tmp/part-001.log 2>&1 \n",
                "date >> /etc/provisioned.date \n",
                "yum install puppet -y \n",
                "yum install git -y \n",
                "aws --region us-west-2 s3 cp s3://s3bucket/auth-files/id_rsa /root/.ssh/id_rsa && chmod 600 /root/.ssh/id_rsa \n",
                "# connect once to github, so we know the host \n",
                "ssh -T -oStrictHostKeyChecking=no git@github.com \n",
                "git clone git@github.com:client/repo.git \n",
                "puppet apply --modulepath repo/infra/puppet/modules repo/infra/puppet/manifests/",
                { "Ref" : "Environment" },
                "/logstash.pp \n",
                "date >> /etc/provisioned.date\n"
              ]
            ]
          }

So, we are using a bash script, but only for a little bit.  The second line (starting with exec) stores output into a logfile for debugging purposes.  We then store off the date and install puppet and git.  The aws command pulls down a private key stored in s3.  This instance has access to s3 because of an IAM setup elsewhere in the CloudFormation template–the access we have is read-only, and the corresponding public key has already been added to our github repository.  Then we connect to github via ssh to ‘get to know the host’, and clone the repository containing the infrastructure code.  Finally, we apply the manifest, which is partially determined by a parameter to the CloudFormation template.

This bash script will run on creation of the EC2 instance.  Once this script is solid, if you are adding additional puppet modules, you only have to do a git pull and a puppet apply on the instance to pick up the new functionality.  (Of course, at the end you should stand up and tear down via CloudFormation just to test end to end.)  You can also see how it’d be easy to have the logstash.conf file be a parameter to the CloudFormation template, which would let you store your configuration for web servers, database servers, etc, in puppet as well.
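
For example, the update cycle on a running instance looks roughly like this. It’s a sketch based on the paths in the UserData above; “staging” stands in for the Environment parameter, and the working directory is assumed to be wherever the clone landed.

# refresh the infrastructure code
(cd repo && git pull)
# re-apply the manifest for this environment
puppet apply --modulepath repo/infra/puppet/modules repo/infra/puppet/manifests/staging/logstash.pp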

I’m happy with how flexible this solution is.  CloudFormation manages the machine creation as well as any other resources, puppet manages the software installed in those machines, and git allows you to maintain all that configuration in one place.

Making my Twitter feed richer with Zapier and hnrss

Photo by marek.sotak

I read Hacker News, a site for startups and technologies, and occasionally post as well.  A few months back, I realized that I wanted to tweet the items I post to HN as well.  While I could have whipped something up with the HN RSS feed and the Twitter API (which would probably have been easier than Twitversation), I decided to try Zapier (which I’ve loved for a while).  It was dead simple to set up a Zap reading from my HN RSS feed and posting to my Twitter feed.  It took about 10 minutes, and now I’ve doubled my posts to Twitter.

Of course, this misses out on one of the huge benefits of Twitter–the conversational nature of the app.  When my auto posts happen, I don’t have a chance to follow up, or to cc: the authors, etc.

However, the perfect is the enemy of the good, and I figured it was better to engage in Twitter haphazardly and imperfectly than not at all.

Good time had by all at the HN Meetup

Where can you talk about super-capacitors vs batteries, whether you should rewrite your app (hint, you shouldn’t), penetration testing of well known organizations, cost of living, and native vs cross platform mobile apps?  All while enjoying a cold drink and the best fried food the Dark Horse can offer?

At the Boulder Hacker News Meetup, that’s where.  We had our inaugural meetup today and had a good showing.  Developers, startup owners, FTEs, contractors, backend folks, front end devs and penetration testers all showed up, and, as the Meetup page suggests, ate, drank and chatted.

Hope to see you at the next one.

The power of SQL–WP edition

I’ve noticed a lot of comment spam lately, and had been dealing with it piecemeal. I have all comments moderated, so it wasn’t affecting the user experience, but I was getting email reminders to moderate lots of comments that were not English and/or advertising something.

WordPress lets you turn off comments for a post, but it is tedious through the GUI. So, I dove into the wordpress database, and ran this SQL query:

update wp_posts set comment_status = 'closed' where post_date < '2014-01-01';

This sets comments to closed for all posts from 2013 or earlier (over 1000 posts).  The power of the command line is the ability to repeat yourself effortlessly.  (The drawback of the command line is its obscurity.)

I should probably write a cron job to do that every year, as posts older than two years aren’t commented on much. To be honest, the comments section of ‘Dan Moore!’ isn’t used much at all–I know some blogs (AVC) have vibrant comments areas, but I never tried to foster a community.
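
A sketch of what that yearly cron job could look like, assuming a mysql credentials file and a database named wordpress (both placeholders), with the same query shifted to a rolling two year window:

# /etc/cron.d/close-old-comments -- close comments on old posts at 4am every January 1st
0 4 1 1 * root mysql --defaults-extra-file=/root/.my.cnf wordpress -e "update wp_posts set comment_status = 'closed' where post_date < date_sub(now(), interval 2 year);"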

Using basic authentication and Jetty realms to protect Apache Camel REST routes

So, I just finished up looking at authentication mechanisms for a REST service that runs on Camel for a client.  I was using Camel 2.15.2 and the default Jetty component, which is Jetty 8.  I was ready to use OAuth2 until I read this post, which makes the very valid point that unless you have difficulty exchanging a shared secret with your users, SSL + basic auth may be enough.  Since these APIs will be used by apps that are built for the same client, and there is no role based access, it seemed like SSL + basic auth would be good enough.

This example is different from a lot of other Camel security posts, which are all about securing routes within camel.  This example secures the external facing REST routes defined using the REST DSL. I’m pretty sure the spring security example could be modified to protect REST DSL routes as well.

So, easy enough.  I looked at the Camel Jetty security configuration docs, and locking things down with basic auth seemed straightforward.  But, they were out of date.  Here’s an updated version.  (I’ve applied for the ability to edit the Camel website and will update the docs when I can.)

First off, secure the rest configuration in your camel context:

<camelContext ...>
   <restConfiguration component="jetty" ...>
      <endpointProperty key="handlers" value="securityHandler"></endpointProperty>
   </restConfiguration>
...
</camelContext>

Also, make sure you have the camel-jetty module imported in your pom.xml (or build.gradle, etc).

Now, create the security context. The below is done in spring XML, but I’m sure it can be converted to Spring annotation configuration easily. This example uses a HashLoginService, which requires a properties file. I didn’t end up using it because this was POC code, but here’s how to configure the HashLoginService.

<!-- Jetty Security handling -->
<bean id="constraint" class="org.eclipse.jetty.util.security.Constraint"> <!-- this class's package changed -->
   <property name="name" value="BASIC"/>
   <property name="roles" value="XXXrolenameXXX"/>
   <property name="authenticate" value="true"/>
</bean>

<bean id="constraintMapping" class="org.eclipse.jetty.security.ConstraintMapping">
   <property name="constraint" ref="constraint"/>
   <property name="pathSpec" value="/*"/>
</bean>

<bean id="securityHandler" class="org.eclipse.jetty.security.ConstraintSecurityHandler">
   <property name="loginService">
      <bean class="org.eclipse.jetty.security.HashLoginService" />
   </property>
   <property name="authenticator">
      <bean class="org.eclipse.jetty.security.authentication.BasicAuthenticator"/>
   </property>
   <property name="constraintMappings">
      <list>
         <ref bean="constraintMapping"/>
      </list>
   </property>
</bean>

As mentioned above, I ended up writing a POC authentication class which had hardcoded values. This post was very helpful to me, as was looking through the jetty8 code using grepcode.

public class HardcodedLoginService implements LoginService {

    // matches what is in the constraint object in the spring config
    private final String[] ACCESS_ROLE = new String[] { "rolename" };
    ...

    @Override
    public UserIdentity login(String username, Object creds) {
        UserIdentity user = null;

        // HERE IS THE HARDCODING
        boolean validUser = "ralph".equals(username) && "s3cr3t".equals(creds);
        if (validUser) {
            Credential credential = (creds instanceof Credential) ? (Credential) creds : Credential.getCredential(creds.toString());

            Principal userPrincipal = new MappedLoginService.KnownUser(username, credential);
            Subject subject = new Subject();
            subject.getPrincipals().add(userPrincipal);
            subject.getPrivateCredentials().add(creds);
            subject.setReadOnly();
            user = identityService.newUserIdentity(subject, userPrincipal, ACCESS_ROLE);
            users.put(user.getUserPrincipal().getName(), true);
        }

        return user;
    }

    ...
}

Now, when you call your rest routes, you need to pass your username and password or the request will fail with a 401 error: curl -v --user ralph:s3cr3t http://localhost:8080/restroute

Note that the ACCESS_ROLE variable in the login class must match the roles property of the constraint object, or you’ll get this message:

Problem accessing /restroute. Reason:
!role

You can find a working example on github (with the hardcoded login provider).

Thanks to my client Katasi for letting me publish this work.

Gluecon 2015 takeaways

Is it too early to write a takeaway post before a conference is over? I hope not!

I’m definitely not trying to write an exhaustive overview of Gluecon 2015–for that, check out the agenda. For a flavor of the conversations, check out the twitter stream.


Here are some of my longer term takeaways:

  • Better not to try to attend every session. Make time to chat with random folks in the hallway, and to integrate other knowledge. I attended a bitcoin talk, then tried out the API. (I failed at it, but hey, it was fun to try.)
  • Talks on microservices were plentiful. Lots of challenges there, and the benefits were most clearly espoused by Adrian Cockcroft: they make complexity explicit. But they aren’t a silver bullet and require a certain level of organizational and business model maturity before they make sense.
  • Developer hiring is hard, and it will get worse before it gets better. Some solutions propose starting at the elementary school level with tools like Scratch. I talked to a number of folks looking to hire, and at least one presenter mentioned that as well at the end of his talk. It’s not quite as bad as 2000 because the standards are still high, but I didn’t talk to anyone who said “we have all the developers we need”. Anecdata, indeed.
  • The Denver Boulder area is a small tech community–I had beers last night with two folks that were friends of friends, and both of them knew and were working with former colleagues of mine. Mind that when thinking of burning that bridge.

To conclude, I’m starting to see repeat folks at Gluecon and that’s exciting. It’s great to have such a thought provoking conference which looks at both the forest and the trees of large scale software development.

Five rules for troubleshooting an unfamiliar system

Photo by Ken and Nyetta

A few weeks ago, I engaged with a client who had a real issue.  They sold a variety of goods via a website (if this was the 90s, they would have been called an ‘e-tailer’), and had been receiving intermittent double orders through their ecommerce system.  Some customers were charged two times for one order.  This led, as you can imagine, to very unhappy customers.  This had been happening for a while and, unfortunately, due to some external obstacles, internal staff were not available to investigate the issue–they had their hands full with an existing higher priority project.

I was called in to see if I could solve this issue.  I had absolutely no familiarity with the system.  But in less than ten hours of time, I was able to find the issue and resolve it.  How I approached the situation can be summed up in five rules:

Number one: define the problem.  Ask questions, and capture the answers.  What is the exact undesired behavior?  When is the undesired behavior happening?  What seems to trigger it?  When did it start?  Were there any changes that happened recently?  Does the client have reproduction steps?

I gathered as much information as I could, but kept it high level.  I asked for architecture and system diagrams.  For the history of the application.  For access to all systems that could possibly be relevant (this will save you time in the future).  For locations of log files, source repositories, configuration files.  For database credentials and credentials for third party systems like CC processors.  It is important at this time to resist the temptation to dive in–at this point the job is to get a high level understanding so I can be efficient in the next steps.

You will get speculation about what the solution is when you are asking about the problem.  Feel free to capture that, but don’t be influenced by it.

Number two–find the finish line.  After getting a clear definition of the problem, I looked in the orders database to find out if the double orders were showing up there.  They were, which was a clue as to which part of the system was malfunctioning, but more importantly it let me see the effectiveness of any changes I was making.  It also let the customer know the objective end goal, which can be important if this is a t&m project, and it let me know the end state to which I was headed–important for morale.  (BTW, don’t do fixed bids for this type of project–overruns will be unpleasant, and there will be overruns.)

I was able to write a SQL script to find double orders over a given time frame.  I ended up writing a script which emailed the results of this query to myself and the client nightly, as an easy way to track progress.  The results of this query were a quantifiable, objective measure of the problem.
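
The nightly report boiled down to something like the following, run from cron. This is only a sketch: the query lives in a separate file because the real schema isn’t mine to share, and the database name, credentials file, paths and addresses are all placeholders.

#!/bin/bash
# email the double-order report to me and the client
mysql --defaults-extra-file=/home/me/.my.cnf shopdb < /home/me/find_double_orders.sql | mail -s "Double order report" me@example.com client@example.com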

Number three–start where you are familiar.  I could have dived into the codebase, but due to my problem definition, I knew that there had been no changes to the checkout portion of the code base for years.  I was also unfamiliar with the particular software that managed the ecommerce site and could have wasted a lot of time getting up to speed on the control flow.  Instead, once I had the SQL query, I could find users that had been double charged, and look at their sessions in the web server logs.  I’ve been looking at apache http logs for over a decade and was very familiar with this piece of the system.

Number four–follow your nose. I followed a few of the user sessions using grep and noticed some weirdness in the logs.  There were an awful lot of messages that indicated the server had been restarted, and all the double orders I looked at had completed 5-6 seconds after the minute changed.  (It’s hard to define weirdness explicitly, which is why it behooved me to start with a portion of the system that I was experienced with–it made the “weirdness” more obvious.)  From here, I looked at why and how the server was being restarted regularly, and found an errant cron job which was restarting the server often enough that the ecommerce system was getting confused and double booking orders–once before the restart and once after.  This was easily fixed by commenting out the cron job.
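
For the curious, the log spelunking looked roughly like this. It’s a sketch, not the client’s actual commands: the log paths, the request parameter and the restart message will differ from system to system.

# pull one double-charged customer's requests out of the access logs
grep 'orderId=12345' /var/log/httpd/access_log* | less
# look for restart chatter in the error log around the same timestamps
grep 'resuming normal operations' /var/log/httpd/error_log | tail -20
# see whether anything restarts the web server on a schedule
crontab -l | grep -i httpd
ls /etc/cron.hourly/ /etc/cron.daily/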

Number five–know when to stop.  This ecommerce system obviously had a logic flaw–after all, restarting the web server shouldn’t cause an order to be entered twice, whether you restart it every hour or once a year.  I could have dug through the code to find that out.  But instead, I commented out the cron job, let the system run for a week or so and waited for more double orders.  There were none, indicating that the site was low traffic enough that whatever flaw was present didn’t get exercised often, if at all.  I confirmed with the client that this situation met his expectations of completeness, and called it good.

Being thrown into a new system, especially when troubleshooting, is a difficult task.  I am thankful the client was relatively responsive to my questions, and that the pressure, while present, wasn’t intense.  These five rules should help you in any troubleshooting situation.