

Step back and review the problem: SSL Edition

I spent the better part of a week recently getting an understanding of how SSL works and how to get client certificates, self-signed certificates and Java all playing nicely.  However, in the end, taking a big step back and examining the problem led to an entirely different solution.

The problem in question is access to a secure system from a heroku application.  The secure system is protected by an IP-based firewall, basic authentication over SSL, and client certificates signed by a non-standard certificate authority (the owner of the secure system).

We are using HttpClient from Apache for all HTTP communication, and there’s a nice example of setting this up (but, depending on your version of the client, make sure you call loadKeyMaterial too, as that is what ends up sending the client cert).  When I ran it from my local machine, after setting my IP to be one of the allowed static IPs, I saw a wide variety of errors.  Oh, there are many possible errors!  But finally, after making sure I could import the private key into the store, using the correct cipher/Java version combo, writing a stripped-down client that didn’t interface with any other parts of the system, learning about the javax.net.debug=ssl switch, making sure the public certificates were all in the store, and confirming that I had IP-based access to the secure system, I was able to watch the system go through a number of the steps outlined here:

[Image: SSL handshake message flow, from the JSSE Guide]
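
For reference, the keystore import and the debug switch mentioned above look roughly like this (a sketch; file names, passwords and aliases are placeholders):

# bundle the client certificate, its private key and the CA chain into a PKCS12 file
openssl pkcs12 -export -in client.crt -inkey client.key -certfile ca-chain.pem -name client -out client.p12 -password pass:changeit

# import that bundle into the Java keystore
keytool -importkeystore -srckeystore client.p12 -srcstoretype PKCS12 -srcstorepass changeit -destkeystore keystore.jks -deststorepass changeit

# run the stripped-down client with SSL handshake debugging turned on
java -Djavax.net.debug=ssl -jar stripped-down-client.jar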

But I kept seeing this exception: javax.net.ssl.SSLHandshakeException: Received fatal alert: handshake_failure. I couldn’t figure out how to get past it. From searching around, I found a number of different issues that all manifest with this error message, and “guess and check” wasn’t working. Interestingly, I saw this error only when running on my local machine with the allowed static IP. When I ran on a different computer that went through a proxy providing a static IP, the client certificate wasn’t being presented at all.

After consulting with the other organization and talking with members of the team about this roadblock, I took a step back and validated the basics. I read the JSSE Reference Guide. I used the openssl tools to verify that the chain of certificates was valid. (One trick here: openssl verify only accepts one -untrusted switch, so if you have multiple intermediate certs between your CA and the final cert, you need to combine their PEM files into one file.) I validated that the final certificate in the keystore matched the CSR and the private key. It turns out I had the wrong key, and that was the source of the handshake issue. Doh!
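
Those openssl checks looked something like this (a sketch; file names are placeholders):

# verify the chain; concatenate any intermediates into a single file for -untrusted
cat intermediate1.pem intermediate2.pem > intermediates.pem
openssl verify -CAfile root-ca.pem -untrusted intermediates.pem client.crt

# confirm the certificate, the CSR and the private key all share the same modulus
openssl x509 -noout -modulus -in client.crt | openssl md5
openssl req -noout -modulus -in client.csr | openssl md5
openssl rsa -noout -modulus -in client.key | openssl md5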

After I had done that, I took a look at the larger problem. Heroku doesn’t guarantee IP addresses, not even a range. There are add-on solutions (proximo, quotaguard, fixie) that do provide a static IP address, and that was our initial plan. However, all of these are proxy-based solutions. A quick search turns up the unpleasant reality that proxies can’t pass client certificates. The post I found talks about a reverse proxy, but the same logic applies to regular proxies. From the post:

Yes, because a client can only send its certificate by using encrypted and
SIGNED connection, and only the client can sign the certifikate (sic) so server
can trust it. The proxy does not know the clients private key, otherwise the
connection would not be secure (or not in the way most people know that).

SSL is made up to avoid man-in-the-middle attack, and the reverse proxy IS
the man-in-the-middls. Either you trust it (and accept what it sends) or
don’t use it.

All my work on having the Java code present the client certificate was a waste. Well, not a total waste, because now I understood the problem space in a way I hadn’t before, so I could perform far better searches.

I opened a support request with our proxy provider, but it was pretty clear from the internet, the support staff and the docs that this was a niche case. I don’t know if any static IP proxy providers support client certificates, but I wasn’t able to find one.

Instead, we were able to use an AWS Elastic IP and nginx to set up our own proxy. Since we controlled it, we could install the client certificate and key on it, and have the heroku instance connect to that proxy. (Tips for those instructions: make sure you download the openssl source, as nginx wants to compile it into the web server, and use at least version 1.9 of the community nginx.)
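
Building nginx against a local OpenSSL source tree looks roughly like this (a sketch; version numbers are illustrative):

# with the nginx and openssl source trees unpacked side by side
cd nginx-1.9.15
./configure --with-http_ssl_module --with-openssl=../openssl-1.0.2h
make && sudo make install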

So, I made some mistakes in this process, and in a personal retro, I thought it’d be worth talking about them.

  • I jumped into searching too quickly. SSL with private certs is a complicated system with a lot of moving pieces, and it would have been worth my time to look at an overview before diving into specific errors.
  • While I was focused on accessing the system from the Java code, there were hints about the proxy issue along the way. I didn’t consider the larger picture because I was too focused on solving the immediate issue.

How to get a YouTube channel as a podcast for $15/month

Want to create a podcast from a YouTube channel, but don’t have access to the original video or don’t have the time to set up a podcast?

Here’s how to set this up, in 6 easy steps.

  1. Find the YouTube channel URL, something like: https://www.youtube.com/channel/UCsjtSWdw9t1XlBnyrV7mAOQ
  2. Note the part after channel: UCsjtSWdw9t1XlBnyrV7mAOQ
  3. Download and install YouCast.  This is a Windows-only program, so if you don’t have access to Windows, or you want this accessible from anywhere without leaving your home PC on, I’d recommend signing up for Azure hosting.  You can have a decent server (an A0) for ~$15/month.  I just used a standard Windows 2008 server, but I did have to upgrade .NET.  This and this will be helpful for setup.
  4. Start up YouCast, and set the channel id to what you have above.  Select the audio output format, and generate the URL.  You’ll get a funky looking URL like http://hostname:22703/FeedService/GetUserFeed?userId=UCsjtSWdw9t1XlBnyrV7mAOQ&encoding=Audio&maxLength=0&isPopular=False
  5. Set YouCast up as a service; I used NSSM (see the sketch after this list).  Otherwise when your server reboots (as it occasionally will), you won’t have access to your podcast.
  6. Add the URL to your podcast catcher (I like Podcast Addict).
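
Registering YouCast as a service with NSSM is roughly a two-liner (a sketch; the install path is a placeholder and assumes that is where YouCast’s executable lives):

# from an administrator command prompt
nssm install YouCast "C:\tools\YouCast\YouCast.exe"
nssm start YouCast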

For bonus points, use a service like FeedBurner or RapidFeeds to capture stats about the podcast and make the URL nicer.

I did this for one of my favorite YouTube channels, the Startup Therapist.  (I’ve asked Jeff at least once to start a podcast, and I hope he does soon.)

Not sure exactly what the issue is, but even though the podcast RSS feed has all the episodes in it, I can only download three podcasts per channel at a time.  I’ve no idea why: whether it is a limit of Podcast Addict, YouTube, YouCast or some combination.  But apart from that, this solution works nicely.

How to maintain motivation when blogging

Another year slipped by! They seem to come faster and faster, just as promised by all the old men in the comic strips I read when growing up.

I recently had a couple of conversations about blogging: how to start, why to do it, how to maintain it. I thought I’d capture some of my responses.

After over twelve years of blogging (that’s correct, in 2016 my blog is a teenager!), here are the three reasons that I keep at it.

  • Writing crystallizes the mind. Writing a piece, especially a deep technical piece, clarifies my understanding of the problem (it’s similar to writing an email to the world, in some ways). Sometimes it will turn up items I hadn’t considered, or other questions to search on. It’s easy to hold a fuzzy concept in my mind, but when written down, the holes in my knowledge become more evident.
  • Writing builds credibility. I have received a number of business inquiries from my writing. (I suspect there’d be more if my blog were more focused. The excellent “How to start blogging” course from John Sonmez is worth signing up for. The number one thing for a successful blog is subject matter focus, but I have a hard time limiting myself to a single topic. Maybe I’m building credibility as a generalist?) And I’ve had a few people interview me for positions and mention they found this blog. It’s easy to say “I know technology XXX” in an interview or consulting situation, but I have found it to be powerful and credible to say “Ah yes, I’ve seen technology XXX before. I wrote a post about it six months ago. Let me send that to you.”
  • Writing helps others. I have had friends mention that they were looking for solutions for something and stumbled across my blog. In fact, I’ve been looking for solutions to issues myself and stumbled onto a post from my blog, so even my future self thanks me for blogging.  I don’t have many comments (real ones, at least. The spam, oh, the spam), but the ones that are left often thank me for helping them out. And I know I have been helped tremendously by posts written by others, so writing pays this help forward.

Of course, these reasons apply to almost all writing–whether magazine, comments on social networks, twitter, medium, answers on stack overflow or something else.  So why continue to write on “Dan Moore!”?  Well, I did try medium recently, and am relatively active on Twitter, HackerNews and StackOverflow, and slightly less active on other social sites like Reddit and Lobste.rs.  All these platforms are great, but my beef with all of them is the same–you are trading control for audience.  As long as I pay my hosting bill and keep my domain registered, my content will be ever-present.  In addition, my blog can weave all over the place as my available time and interests change.

If you blog, I’d love to hear your reasons for doing so.  If you don’t, I’d love to hear what is keeping you from it.

Year in review, aka what did I ship in 2015

What did I ship (or help ship) in 2015?

(I did this a few years ago, and then became an employee.  Though it is probably even more important to think about what you ship as an employee, it is easier to publicize it when I am a contractor.)

  • Rewrote my farm share directory to support multiple states, fixed numerous bugs, and added a new feature to let folks add reviews.
  • Sent seven newsletters for said farm share directory.
  • An email course to educate consumers about farm shares.
  • Helped take a video-to-structured-data project from failure to success.  I was brought in as a senior engineer and helped with porting an admin app from one environment to another, reviewing and fixing a Python program which took video and generated images, managing datasets for training, writing Java microservices around C libraries, documenting processes, and coordinating with an overseas team as needed.
  • Set up Secor to pull logging from Kafka to S3, as well as setting up Java processes to log to Kafka.
  • Helped integrate Activiti into a custom workflow engine, and promoted a test-first culture on the team I worked with.
  • Dropped in and helped troubleshoot an e-commerce system with which I was totally unfamiliar during an emergency.
  • Learned enough Ruby on Rails and Stripe to add an online order form to an existing Heroku website.
  • Helped build a backend system to monitor phone and car locations to prevent texting and driving.  My role on this small team varied between devops, Java development, QA, code review, defining process, and documentation.
  • Installed and tuned an ELK stack used for business intelligence and developer debugging.
  • Took my wife’s writing and turned it into a book (best surprise ever).
  • Wrote 34 blog posts.

Of course, there were other personal milestones too, like camping with the kiddos, getting solar installed, road trips, and date nights with the wife.  All in all, a great year.  Here’s to 2016.

When you encounter a PropertyReferenceException using SpringData and MongoDb

If you are using Spring Data and MongoDB, you can use magic methods called “derived queries”, which make writing simple queries very easy.

However, you may run into a “PropertyReferenceException: No property <name> found for type <type>” message, with an exception similar to the one below.

This means you have a typo in your derived query (a miscapitalized word, a misspelled word, etc.), so take a long look at that Repository interface.  For example, a method named something like findByModuleAndDate on a Trip repository requires Trip to have a resolvable “module” property; if no such property exists (or it is spelled or capitalized differently), Spring Data throws this exception while building the repository bean.


[mongod output] 10:09:58.421 [main] WARN o.s.c.a.AnnotationConfigApplicationContext [AbstractApplicationContext.java:487] - Exception encountered during context initialization - cancelling refresh attempt 
org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'tripDAO': Invocation of init method failed; nested exception is org.springframework.data.mapping.PropertyReferenceException: No property module found for type Trip!
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1574) ~[spring-beans-4.1.7.RELEASE.jar:4.1.7.RELEASE]
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:539) ~[spring-beans-4.1.7.RELEASE.jar:4.1.7.RELEASE]
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:476) ~[spring-beans-4.1.7.RELEASE.jar:4.1.7.RELEASE]
at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:303) ~[spring-beans-4.1.7.RELEASE.jar:4.1.7.RELEASE]
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:230) ~[spring-beans-4.1.7.RELEASE.jar:4.1.7.RELEASE]
at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:299) ~[spring-beans-4.1.7.RELEASE.jar:4.1.7.RELEASE]
at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:194) ~[spring-beans-4.1.7.RELEASE.jar:4.1.7.RELEASE]
at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:736) ~[spring-beans-4.1.7.RELEASE.jar:4.1.7.RELEASE]
at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:757) ~[spring-context-4.1.7.RELEASE.jar:4.1.7.RELEASE]
at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:480) ~[spring-context-4.1.7.RELEASE.jar:4.1.7.RELEASE]
at org.springframework.context.annotation.AnnotationConfigApplicationContext.<init>(AnnotationConfigApplicationContext.java:84) [spring-context-4.1.7.RELEASE.jar:4.1.7.RELEASE]
at com.katasi.decision_engine.AbstractMongoDbTest.createApplicationContext(AbstractMongoDbTest.java:49) [test-classes/:na] 
at org.apache.camel.testng.CamelSpringTestSupport.doPreSetup(CamelSpringTestSupport.java:71) [camel-testng-2.15.2.jar:2.15.2]
at com.katasi.decision_engine.processor.CreateTripTest.doPreSetup(CreateTripTest.java:66) [test-classes/:na]
at org.apache.camel.testng.CamelTestSupport.setUp(CamelTestSupport.java:215) [camel-testng-2.15.2.jar:2.15.2]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_31]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_31]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_31]
at java.lang.reflect.Method.invoke(Method.java:483) ~[na:1.8.0_31] 
at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:85) [testng-6.8.21.jar:na]
at org.testng.internal.Invoker.invokeConfigurationMethod(Invoker.java:552) [testng-6.8.21.jar:na]
at org.testng.internal.Invoker.invokeConfigurations(Invoker.java:215) [testng-6.8.21.jar:na]
at org.testng.internal.Invoker.invokeMethod(Invoker.java:636) [testng-6.8.21.jar:na]
at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:882) [testng-6.8.21.jar:na]
at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1189) [testng-6.8.21.jar:na]
at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:124) [testng-6.8.21.jar:na]
at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:108) [testng-6.8.21.jar:na]
at org.testng.TestRunner.privateRun(TestRunner.java:767) [testng-6.8.21.jar:na]
at org.testng.TestRunner.run(TestRunner.java:617) [testng-6.8.21.jar:na]
at org.testng.SuiteRunner.runTest(SuiteRunner.java:348) [testng-6.8.21.jar:na]
at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:343) [testng-6.8.21.jar:na]
at org.testng.SuiteRunner.privateRun(SuiteRunner.java:305) [testng-6.8.21.jar:na]
at org.testng.SuiteRunner.run(SuiteRunner.java:254) [testng-6.8.21.jar:na]
at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52) [testng-6.8.21.jar:na]
at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:86) [testng-6.8.21.jar:na]
at org.testng.TestNG.runSuitesSequentially(TestNG.java:1224) [testng-6.8.21.jar:na]
at org.testng.TestNG.runSuitesLocally(TestNG.java:1149) [testng-6.8.21.jar:na]
at org.testng.TestNG.run(TestNG.java:1057) [testng-6.8.21.jar:na]
at org.apache.maven.surefire.testng.TestNGExecutor.run(TestNGExecutor.java:77) [surefire-testng-2.12.4.jar:2.12.4]
at org.apache.maven.surefire.testng.TestNGDirectoryTestSuite.executeMulti(TestNGDirectoryTestSuite.java:159) [surefire-testng-2.12.4.jar:2.12.4]
at org.apache.maven.surefire.testng.TestNGDirectoryTestSuite.execute(TestNGDirectoryTestSuite.java:99) [surefire-testng-2.12.4.jar:2.12.4]
at org.apache.maven.surefire.testng.TestNGProvider.invoke(TestNGProvider.java:106) [surefire-testng-2.12.4.jar:2.12.4]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_31]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_31] 
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_31]
at java.lang.reflect.Method.invoke(Method.java:483) ~[na:1.8.0_31] 
at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189) [surefire-api-2.12.4.jar:2.12.4]
at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165) [surefire-booter-2.12.4.jar:2.12.4]
at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85) [surefire-booter-2.12.4.jar:2.12.4]
at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115) [surefire-booter-2.12.4.jar:2.12.4]
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75) [surefire-booter-2.12.4.jar:2.12.4]
Caused by: org.springframework.data.mapping.PropertyReferenceException: No property module found for type Trip!
at org.springframework.data.mapping.PropertyPath.<init>(PropertyPath.java:75) ~[spring-data-commons-1.10.2.RELEASE.jar:na]
at org.springframework.data.mapping.PropertyPath.create(PropertyPath.java:327) ~[spring-data-commons-1.10.2.RELEASE.jar:na]
at org.springframework.data.mapping.PropertyPath.create(PropertyPath.java:307) ~[spring-data-commons-1.10.2.RELEASE.jar:na]
at org.springframework.data.mapping.PropertyPath.from(PropertyPath.java:270) ~[spring-data-commons-1.10.2.RELEASE.jar:na]
at org.springframework.data.mapping.PropertyPath.from(PropertyPath.java:241) ~[spring-data-commons-1.10.2.RELEASE.jar:na]
at org.springframework.data.repository.query.parser.Part.<init>(Part.java:76) ~[spring-data-commons-1.10.2.RELEASE.jar:na]
at org.springframework.data.repository.query.parser.PartTree$OrPart.<init>(PartTree.java:235) ~[spring-data-commons-1.10.2.RELEASE.jar:na]
at org.springframework.data.repository.query.parser.PartTree$Predicate.buildTree(PartTree.java:373) ~[spring-data-commons-1.10.2.RELEASE.jar:na]
at org.springframework.data.repository.query.parser.PartTree$Predicate.<init>(PartTree.java:353) ~[spring-data-commons-1.10.2.RELEASE.jar:na]
at org.springframework.data.repository.query.parser.PartTree.<init>(PartTree.java:87) ~[spring-data-commons-1.10.2.RELEASE.jar:na]
at org.springframework.data.mongodb.repository.query.PartTreeMongoQuery.<init>(PartTreeMongoQuery.java:53) ~[spring-data-mongodb-1.7.2.RELEASE.jar:na]
at org.springframework.data.mongodb.repository.support.MongoRepositoryFactory$MongoQueryLookupStrategy.resolveQuery(MongoRepositoryFactory.java:128) ~[spring-data-mongodb-1.7.2.RELEASE.jar:na]
at org.springframework.data.repository.core.support.RepositoryFactorySupport$QueryExecutorMethodInterceptor.<init>(RepositoryFactorySupport.java:367) ~[spring-data-commons-1.10.2.RELEASE.jar:na]
at org.springframework.data.repository.core.support.RepositoryFactorySupport.getRepository(RepositoryFactorySupport.java:190) ~[spring-data-commons-1.10.2.RELEASE.jar:na]
at org.springframework.data.repository.core.support.RepositoryFactoryBeanSupport.initAndReturn(RepositoryFactoryBeanSupport.java:239) ~[spring-data-commons-1.10.2.RELEASE.jar:na]
at org.springframework.data.repository.core.support.RepositoryFactoryBeanSupport.afterPropertiesSet(RepositoryFactoryBeanSupport.java:225) ~[spring-data-commons-1.10.2.RELEASE.jar:na]
at org.springframework.data.mongodb.repository.support.MongoRepositoryFactoryBean.afterPropertiesSet(MongoRepositoryFactoryBean.java:108) ~[spring-data-mongodb-1.7.2.RELEASE.jar:na]
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1633) ~[spring-beans-4.1.7.RELEASE.jar:4.1.7.RELEASE]
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1570) ~[spring-beans-4.1.7.RELEASE.jar:4.1.7.RELEASE]
... 50 common frames omitted

Guide to Reindexing ElasticSearch data input with Logstash

I ran into an issue where I had set up logstash to load numeric data as strings. Later, when we wanted to do visualizations with that data, the numbers were off. So, I needed to re-index all the data.

Total pain; hope this guide helps.  (Here’s some additional ElasticSearch documentation: here and here.)

If you don’t care about your old data, just:

  • shut down logstash
  • deploy the new logstash filter (with mutates)
  • close all old indices (see the example just after this list)
  • turn on logstash
  • send some data through to logstash
  • refresh the fields in kibana (you’ll lose field popularity counts)
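
Closing an old index is a one-liner (a sketch; the index name is a placeholder):

# close an old index so kibana stops querying its old (string-typed) mapping
curl -XPOST 'http://localhost:9200/logstash-2015.09.22/_close'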

Now, if you do care about your old data, well, that’s a different story. Here are the steps I took:

First, modify the new logstash filter file to use mutate’s convert option, and deploy it. This takes care of the logstash indexes going forward, but will cause some kibana pain until you convert all the past indexes (because some indexes will have the fields as strings and others as numbers).

Install jq (https://stedolan.github.io/jq/manual/v1.4/), which will help you transform your data (jq is magic, I tell you).

Then, for each day/index you care about (logstash-2015.09.22 in this example), you want to follow these steps.


# get the current mapping
curl -XGET 'http://localhost:9200/logstash-2015.09.22/_mapping?pretty=1' > mapping

#back it up
cp mapping mapping.old

# edit mapping, change the types of the fields that are strings to long, float, or boolean.  I used vi

# create a new index with the new mapping 
curl -XPUT 'http://localhost:9200/logstash-2015.09.22-new/' -d @mapping

# find out how many rows there are.  If there are too many, you may want to use the scrolled search.  
# I handled indexes as big as 500k documents with the below approach
curl -XGET 'localhost:9200/logstash-2015.09.22/_count'

# if you are modifying an old index, no need to stop logstash, but if you are modifying an index with data currently going to it, you need to stop logstash at this step.

# change size below to be bigger than the count.
curl -XGET 'localhost:9200/logstash-2015.09.22/_search?size=250000'> logstash-2015.09.22.data.orig

# edit data, just get the array of docs without the metadata
sed 's/^[^[]*\[/[/' logstash-2015.09.22.data.orig |sed 's/..$//' > logstash-2015.09.22.data

# run jq to build a bulk insert compatible json file ( https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html )

# make sure to correct the _index value in the line below
jq -f jq.file logstash-2015.09.22.data | jq -c \
  '{ index: { _index: "logstash-2015.09.22-new", _type: "logs" } }, .' > toinsert

# where jq.file is the file below

# post the toinsert file to the new index
curl -s -XPOST localhost:9200/_bulk --data-binary "@toinsert"; echo

# NOTE: depending on the size of the toinsert file, you may need to split it up into multiple files using head and tail.  
# Make sure you don't split the metadata and data line (that is, each file should have an even number of lines), 
# and that files are all less than 1GB in size.
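# a sketch of how that might look (the chunk size is illustrative; keep it even so the
# action/document pairs stay together), posting each piece separately:
#   split -l 20000 toinsert toinsert.part.
#   for f in toinsert.part.*; do curl -s -XPOST localhost:9200/_bulk --data-binary "@$f"; echo; done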

# delete the old index
curl -XDELETE 'http://localhost:9200/logstash-2015.09.22'

# add a new alias with the old index's name and pointing to the new index
curl -XPOST localhost:9200/_aliases -d '
{
   "actions": [
       { "add": {
           "alias": "logstash-2015.09.22",
           "index": "logstash-2015.09.22-new"
       }}
   ]
}'

# restart logstash if you stopped it above.
sudo service logstash restart

# refresh fields in kibana--you'll lose popularity

Here’s the jq file which converts specified string fields to numeric and boolean fields.


#
# this is run with the jq tool for parsing and modifying json

# from https://github.com/stedolan/jq/issues/670
def translate_key(from;to):
  if type == "object" then . as $in
     | reduce keys[] as $key
         ( {};
       . + { (if $key == from then to else $key end)
             : $in[$key] | translate_key(from;to) } )
  elif type == "array" then map( translate_key(from;to) )
  else .
  end;

def turn_to_number(from):
  if type == "object" then . as $in
     | reduce keys[] as $key
         ( {};
       . + { ($key )
             : ( if $key == from then ($in[$key] | tonumber) else $in[$key] end ) } )
  else .
  end;

def turn_to_boolean(from):
  if type == "object" then . as $in
     | reduce keys[] as $key
         ( {};
       . + { ($key )
             : ( if $key == from then (if $in[$key] == "true" then true else false end ) else $in[$key] end ) } )
  else .
  end;

# for example, this converts any values with this field to numbers, and outputs the rest of the object unchanged
# run with jq -c -f jq.file
.[]|._source| turn_to_number("numfield")

Rinse, wash, repeat.

Kibana Visualizations that Change With Browser Reload

I ran into a weird problem with Kibana recently.  We are using the ELK stack to ingest some logs and do some analysis, and when the Kibana webapp was reloaded, it showed different results for certain visualizations, especially averages.  Not all of them, and the results were always close to the actual value, but when you see 4.6 one time and 4.35 two seconds later on a system under light load and for the exact same metric, it doesn’t inspire confidence in your analytics system.

I dove into the issue.  Using Chrome’s developer tools, I noticed that the visualizations that were most squirrely were loaded last.  That made me suspicious that there was some failure causing missing data, which caused the average to change. However, the browser API calls weren’t failing; they were succeeding.

I first looked in the Elasticsearch and Kibana configuration files to see if there were any easy timeout configuration values that I was missing, but I didn’t see any.

I then tried to narrow down the issue.  When it was originally noted, we had about 15 visualizations working on about a month’s worth of data.  After a fair bit of URL manipulation, I determined that the discrepancies appeared regularly when there were about 10 visualizations, or when I cut the data down to four hours’ worth.  This gave me more confidence in my theory that some kind of timeout or other resource constraint was the issue. But where was the issue?

I then looked in the ElasticSearch logs.  We have a mapping issue, related to a scripted field and outlined here, which caused a lot of white noise, but I did end up seeing an exception:


org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction$23@3c26b1f5
        at org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:62)
        at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
        at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
        at org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor.execute(EsThreadPoolExecutor.java:79)
        at org.elasticsearch.search.action.SearchServiceTransportAction.execute(SearchServiceTransportAction.java:551)
        at org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteQuery(SearchServiceTransportAction.java:228)
        at org.elasticsearch.action.search.type.TransportSearchCountAction$AsyncAction.sendExecuteFirstPhase(TransportSearchCountAction.java:71)
        at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:176)
        at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.start(TransportSearchTypeAction.java:158)
        at org.elasticsearch.action.search.type.TransportSearchCountAction.doExecute(TransportSearchCountAction.java:55)
        at org.elasticsearch.action.search.type.TransportSearchCountAction.doExecute(TransportSearchCountAction.java:45)
        at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:75)
        at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:108)
        at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:43)
        at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:75)
        at org.elasticsearch.action.search.TransportMultiSearchAction.doExecute(TransportMultiSearchAction.java:62)
        at org.elasticsearch.action.search.TransportMultiSearchAction.doExecute(TransportMultiSearchAction.java:39)
        at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:75)
        at org.elasticsearch.client.node.NodeClient.execute(NodeClient.java:98)
        at org.elasticsearch.client.FilterClient.execute(FilterClient.java:66)
        at org.elasticsearch.rest.BaseRestHandler$HeadersAndContextCopyClient.execute(BaseRestHandler.java:92)
        at org.elasticsearch.client.support.AbstractClient.multiSearch(AbstractClient.java:364)
        at org.elasticsearch.rest.action.search.RestMultiSearchAction.handleRequest(RestMultiSearchAction.java:66)
        at org.elasticsearch.rest.BaseRestHandler.handleRequest(BaseRestHandler.java:53)
        at org.elasticsearch.rest.RestController.executeHandler(RestController.java:225)
        at org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:170)
        at org.elasticsearch.http.HttpServer.internalDispatchRequest(HttpServer.java:121)
        at org.elasticsearch.http.HttpServer$Dispatcher.dispatchRequest(HttpServer.java:83)
        at org.elasticsearch.http.netty.NettyHttpServerTransport.dispatchRequest(NettyHttpServerTransport.java:329)
        at org.elasticsearch.http.netty.HttpRequestHandler.messageReceived(HttpRequestHandler.java:63)
        at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
        at org.elasticsearch.http.netty.pipelining.HttpPipeliningHandler.messageReceived(HttpPipeliningHandler.java:60)
        at org.elasticsearch.common.netty.channel.SimpleChannelHandler.handleUpstream(SimpleChannelHandler.java:88)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
        at org.elasticsearch.common.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:145)
        at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
        at org.elasticsearch.common.netty.handler.codec.http.HttpContentDecoder.messageReceived(HttpContentDecoder.java:108)
        at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
        at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
        at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
        at org.elasticsearch.common.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
        at org.elasticsearch.common.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
        at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
        at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:74)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
        at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
        at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
        at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
        at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
        at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
        at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)

That led me to this StackOverflow post, which in turn led me to run this command on my ES instance:


$ curl -XGET localhost:9200/_cat/thread_pool?v
host            ip           bulk.active bulk.queue bulk.rejected index.active index.queue index.rejected search.active search.queue search.rejected
ip-10-253-44-49 10.253.44.49           0          0             0            0           0              0             0            0               0
ip-10-253-44-49 10.253.44.49           0          0             0            0           0              0             0            0           31589

And as I ran that command repeatedly, I saw the search.rejected number getting larger and larger. Clearly I had a misconfiguration or limit around my search thread pool. After looking at the CPU, memory and I/O on the box, I could tell it wasn’t stressed, so I decided to increase the queue size for this pool. (I thought briefly about modifying the search thread pool size, but this article warned me off.)

This GH issue helped me understand how to modify the threadpool briefly so I could test the theory.
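
For reference, on Elasticsearch 1.x the search queue can be bumped temporarily through the cluster settings API; something like this (a sketch, with an illustrative queue size):

curl -XPUT 'http://localhost:9200/_cluster/settings' -d '
{
  "transient": {
    "threadpool.search.queue_size": 2000
  }
}'

The persistent equivalent is a threadpool.search.queue_size entry in elasticsearch.yml.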

After making this configuration change, search.rejected went to zero, and the visualization aberrations disappeared. I will modify the elasticsearch.yml file to make this persist across server restarts and re-provisions, but for now, the issue seems to be addressed.

The Deployment Age

If you haven’t read The Deployment Age (and its follow on post), you should go read it right now.

The premise is that we’re entering the deployment phase of the Internet and PC technology super cycle, where the technology will become far more integrated and invisible, and the chief means of financing will be internal company resources.  The focus will be on existing markets, not creating new ones, and on refinement rather than innovation.

If you work in technology and are interested in the big picture, it is worth a read:

Some things we’ve learned over the past 30 years–that novelty is more important than quality; that if you’re not disrupting yourself someone else will disrupt you; that entering new markets is more important than expanding existing markets; that technology has to be evangelized, not asked for by your customers–may no longer be true. Almost every company will continue to be managed as if these things were true, probably right up until they manage themselves out of business. There’s an old saying that generals are always fighting the last war, it’s not just generals, it’s everyone’s natural inclination.

Go read it: The Deployment Age.

Launching

Yesterday, I launched a partial rewrite of a long-running side project connecting people to farm shares.  Over the past few months I’d fixed a few pressing bugs and overhauled the software so that a site and data model that previously supported only cities and zips as geographic features now supported states as well.

For the past week I’d been hemming and hawing about when to release–fixing “just one more thing” or tweaking one last bit.

But yesterday I finally bit the bullet and released.  Of course, there were a couple of issues that I hadn’t addressed that only showed up in production.  But after squashing those bugs, I completed a few more tasks: moving over SEO (as best as I can) and social accounts, making sure that key features and content hadn’t been thrown overboard, and checking the logs to make sure that users are finding what they need.

Why was I so reluctant to release?  It’s a big step, shipping something, but in some ways it’s the only step that matters.  I was afraid that I’d forget something, screw things up, make a mistake.  And of course, I didn’t want to take a site that helped a couple thousand people a month find local food options that work for them and ruin it.  I wasn’t sure I’d have time to support the new site.  I wanted the new site to have as close to feature parity as possible.  I worked on the old site for five years, and a lot of features crept in (as well as external dependencies and corners of the site that Google and users had found, but I had forgotten about).

All good reasons to think before I leapt.

But you can only plan for so much.  At some point you just have to ship.

Heroku drains

So, I’ve learned a lot more than I wanted to about heroku drains. These are sinks to which heroku applications can write their logs.  Once the logs are out of heroku, you analyze them just as you would for any application living outside of a PaaS.  Logs are very useful for seeing long-term trends, debugging, etc.  (I’ve worked on both a Rails 3 app and a Java Spring/Camel app that deploy to heroku.)

Here are some things I’ve learned:

  • Heroku drains are well documented.
  • You definitely want them for any production application, because only 1500 lines of heroku logs are retained at any one time.
  • They can go to either syslog (great for applications with a lot of other infrastructure) or https (great for applications without as much infrastructure support).
  • They can’t do any kind of authorization.
  • You can’t know what IP address the logs will be coming from, so you can’t limit access by IP.
  • There are third party extensions you can pay for to avoid dealing with drains at all (I’ve heard good things about papertrail.)
  • You can use logstash to pull heroku logs from a syslog drain into elastic search.
  • There are numerous github projects that can drain to databases, etc.  There’s even one that, with echoes of Ouroboros, drains to another heroku app.
  • Drains have intelligent behavior if your listener (or listeners) fails.  From heroku support: “The short answer is yes, the drain will drop logs when the sink is not responsive, but this isn’t really the full story. There are a number of undocumented limits and backoff retries that happen when a drain connection is lost.”  And then they go on to explain how the backoff behaviour happens.  I’m not going to cut and paste their entire answer because I assume it is undocumented for a reason (maybe it changes, maybe they don’t want to commit to supporting this behavior).  Ask them yourself 🙂
  • A simple drain can be as easy as <?php error_log(file_get_contents('php://input'), 3, "/var/log/logfile.log"); ?>, but make sure you rotate that log file.
  • You can use puppet to manage drains if you are bringing servers up and down, using the heroku toolbelt and CLI authentication.

If you are deploying anything beyond a toy app on heroku, don’t forget the ops folks and make sure you set up your drain!
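
Setting one up is a single CLI command; a sketch (the app name and endpoint are placeholders):

# add an https drain (or use a syslog:// or syslog+tls:// URL for a syslog drain)
heroku drains:add https://logs.example.com/heroku-drain --app my-app

# list the drains configured for the app
heroku drains --app my-app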