How to maintain motivation when blogging

Another year slipped by! They seem to come faster and faster, just as promised by all the old men in the comic strips I read when growing up.

I recently had a couple of conversations about blogging: how to start, why to do it, how to maintain it. I thought I’d capture some of my responses.

After over twelve years of blogging (that’s correct, in 2016 my blog is a teenager!), here are the three reasons that I keep at it.

  • Writing crystallizes the mind. Writing a piece, especially a deep technical piece, clarifies my understanding of the problem (it’s similar to writing an email to the world, in some ways). Sometimes it will turn up items I hadn’t considered, or other questions to search on. It’s easy to hold a fuzzy concept in my mind, but when written down, the holes in my knowledge become more evident.
  • Writing builds credibility. I have received a number of business inquiries from my writing. (I suspect there’d be more if my blog were more focused. The excellent “How to start blogging” course from John Sonmez is worth signing up for.  The number one thing a successful blog needs is subject matter focus. But I have a hard time limiting myself to a single topic. Maybe I’m building credibility as a generalist?) And I’ve had a few people interview me for positions and mention they found this blog. It’s easy to say “I know technology XXX” in an interview or consulting situation, but I have found it to be far more powerful and credible to say “Ah yes, I’ve seen technology XXX before. I wrote a post about it six months ago. Let me send that to you.”
  • Writing helps others. I have had friends mention that they were looking for solutions for something and stumbled across my blog. In fact, I’ve been looking for solutions to issues myself and stumbled onto a post from my blog, so even my future self thanks me for blogging.  I don’t have many comments (real ones, at least. The spam, oh, the spam), but the ones that are left often thank me for helping them out. And I know I have been helped tremendously by posts written by others, so writing pays this help forward.

Of course, these reasons apply to almost all writing–whether in a magazine, comments on social networks, Twitter, Medium, answers on Stack Overflow or somewhere else.  So why continue to write on “Dan Moore!”?  Well, I did try Medium recently, and am relatively active on Twitter, Hacker News and Stack Overflow, and slightly less active on other social sites like Reddit and Lobste.rs.  All these platforms are great, but my beef with all of them is the same–you are trading control for audience.  As long as I pay my hosting bill and keep my domain registered, my content will be ever-present.  In addition, my blog can weave all over the place as my available time and interests change.

If you blog, I’d love to hear your reasons for doing so.  If you don’t, I’d love to hear what is keeping you from it.


Year in review, aka what did I ship in 2015

What did I ship (or help ship) in 2015?

(I did this a few years ago, and then became an employee.  Though it is probably even more important to think about what you ship as an employee, it is easier to publicize what you ship when you are a contractor.)

  • Rewrote my farm share directory to support multiple states, fixed numerous bugs, and added a new feature to let folks add reviews.
  • Sent seven newsletters for said farm share directory.
  • Created an email course to educate consumers about farm shares.
  • Helped take a video to structured data project from failure to success.  I was brought onto the team as a senior engineer and helped with porting an admin app from one environment to another, reviewing and fixing a Python program that took video and generated images, managing datasets for training, writing Java microservices around C libraries, documenting processes, and coordinating with an overseas team as needed.
  • Set up Secor to pull logs from Kafka to S3, and set up Java processes to log to Kafka.
  • Helped integrate Activiti into a custom workflow engine, and promoted a test first culture on the team I worked with.
  • Dropped in and helped troubleshoot an e-commerce system with which I was totally unfamiliar during an emergency.
  • Learned enough Ruby on Rails and Stripe to add an online order form to an existing Heroku website.
  • Helped build a backend system to monitor phone and car locations to prevent texting and driving.  My role on this small team varied between devops, Java development, QA, code review, process definition and documentation.
  • Installed and tuned an ELK stack used for business intelligence and developer debugging.
  • Took my wife’s writing and turned it into a book (best surprise ever).
  • Wrote 34 blog posts.

Of course, there were other personal milestones too, like camping with the kiddos, getting solar installed, road trips, and date nights with the wife.  All in all, a great year.  Here’s to 2016.


When you encounter a PropertyReferenceException using Spring Data and MongoDB

If you are using Spring Data and MongoDB, you can use magic methods called “derived queries”, which make writing simple queries very easy.

However, you may run into a “PropertyReferenceException: No property module found for <type>” message, with a stack trace similar to the one below.

This means you have a typo in your derived query (a miscapitalized or misspelled word, etc.), so take a long look at that Repository interface.


[mongod output] 10:09:58.421 [main] WARN o.s.c.a.AnnotationConfigApplicationContext [AbstractApplicationContext.java:487] - Exception encountered during context initialization - cancelling refresh attempt 
org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'tripDAO': Invocation of init method failed; nested exception is org.springframework.data.mapping.PropertyReferenceException: No property module found for type Trip!
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1574) ~[spring-beans-4.1.7.RELEASE.jar:4.1.7.RELEASE]
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:539) ~[spring-beans-4.1.7.RELEASE.jar:4.1.7.RELEASE]
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:476) ~[spring-beans-4.1.7.RELEASE.jar:4.1.7.RELEASE]
at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:303) ~[spring-beans-4.1.7.RELEASE.jar:4.1.7.RELEASE]
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:230) ~[spring-beans-4.1.7.RELEASE.jar:4.1.7.RELEASE]
at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:299) ~[spring-beans-4.1.7.RELEASE.jar:4.1.7.RELEASE]
at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:194) ~[spring-beans-4.1.7.RELEASE.jar:4.1.7.RELEASE]
at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:736) ~[spring-beans-4.1.7.RELEASE.jar:4.1.7.RELEASE]
at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:757) ~[spring-context-4.1.7.RELEASE.jar:4.1.7.RELEASE]
at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:480) ~[spring-context-4.1.7.RELEASE.jar:4.1.7.RELEASE]
at org.springframework.context.annotation.AnnotationConfigApplicationContext.<init>(AnnotationConfigApplicationContext.java:84) [spring-context-4.1.7.RELEASE.jar:4.1.7.RELEASE]
at com.katasi.decision_engine.AbstractMongoDbTest.createApplicationContext(AbstractMongoDbTest.java:49) [test-classes/:na] 
at org.apache.camel.testng.CamelSpringTestSupport.doPreSetup(CamelSpringTestSupport.java:71) [camel-testng-2.15.2.jar:2.15.2]
at com.katasi.decision_engine.processor.CreateTripTest.doPreSetup(CreateTripTest.java:66) [test-classes/:na]
at org.apache.camel.testng.CamelTestSupport.setUp(CamelTestSupport.java:215) [camel-testng-2.15.2.jar:2.15.2]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_31]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_31]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_31]
at java.lang.reflect.Method.invoke(Method.java:483) ~[na:1.8.0_31] 
at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:85) [testng-6.8.21.jar:na]
at org.testng.internal.Invoker.invokeConfigurationMethod(Invoker.java:552) [testng-6.8.21.jar:na]
at org.testng.internal.Invoker.invokeConfigurations(Invoker.java:215) [testng-6.8.21.jar:na]
at org.testng.internal.Invoker.invokeMethod(Invoker.java:636) [testng-6.8.21.jar:na]
at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:882) [testng-6.8.21.jar:na]
at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1189) [testng-6.8.21.jar:na]
at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:124) [testng-6.8.21.jar:na]
at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:108) [testng-6.8.21.jar:na]
at org.testng.TestRunner.privateRun(TestRunner.java:767) [testng-6.8.21.jar:na]
at org.testng.TestRunner.run(TestRunner.java:617) [testng-6.8.21.jar:na]
at org.testng.SuiteRunner.runTest(SuiteRunner.java:348) [testng-6.8.21.jar:na]
at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:343) [testng-6.8.21.jar:na]
at org.testng.SuiteRunner.privateRun(SuiteRunner.java:305) [testng-6.8.21.jar:na]
at org.testng.SuiteRunner.run(SuiteRunner.java:254) [testng-6.8.21.jar:na]
at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52) [testng-6.8.21.jar:na]
at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:86) [testng-6.8.21.jar:na]
at org.testng.TestNG.runSuitesSequentially(TestNG.java:1224) [testng-6.8.21.jar:na]
at org.testng.TestNG.runSuitesLocally(TestNG.java:1149) [testng-6.8.21.jar:na]
at org.testng.TestNG.run(TestNG.java:1057) [testng-6.8.21.jar:na]
at org.apache.maven.surefire.testng.TestNGExecutor.run(TestNGExecutor.java:77) [surefire-testng-2.12.4.jar:2.12.4]
at org.apache.maven.surefire.testng.TestNGDirectoryTestSuite.executeMulti(TestNGDirectoryTestSuite.java:159) [surefire-testng-2.12.4.jar:2.12.4]
at org.apache.maven.surefire.testng.TestNGDirectoryTestSuite.execute(TestNGDirectoryTestSuite.java:99) [surefire-testng-2.12.4.jar:2.12.4]
at org.apache.maven.surefire.testng.TestNGProvider.invoke(TestNGProvider.java:106) [surefire-testng-2.12.4.jar:2.12.4]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_31]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_31] 
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_31]
at java.lang.reflect.Method.invoke(Method.java:483) ~[na:1.8.0_31] 
at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189) [surefire-api-2.12.4.jar:2.12.4]
at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165) [surefire-booter-2.12.4.jar:2.12.4]
at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85) [surefire-booter-2.12.4.jar:2.12.4]
at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115) [surefire-booter-2.12.4.jar:2.12.4]
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75) [surefire-booter-2.12.4.jar:2.12.4]
Caused by: org.springframework.data.mapping.PropertyReferenceException: No property module found for type Trip!
at org.springframework.data.mapping.PropertyPath.<init>(PropertyPath.java:75) ~[spring-data-commons-1.10.2.RELEASE.jar:na]
at org.springframework.data.mapping.PropertyPath.create(PropertyPath.java:327) ~[spring-data-commons-1.10.2.RELEASE.jar:na]
at org.springframework.data.mapping.PropertyPath.create(PropertyPath.java:307) ~[spring-data-commons-1.10.2.RELEASE.jar:na]
at org.springframework.data.mapping.PropertyPath.from(PropertyPath.java:270) ~[spring-data-commons-1.10.2.RELEASE.jar:na]
at org.springframework.data.mapping.PropertyPath.from(PropertyPath.java:241) ~[spring-data-commons-1.10.2.RELEASE.jar:na]
at org.springframework.data.repository.query.parser.Part.<init>(Part.java:76) ~[spring-data-commons-1.10.2.RELEASE.jar:na]
at org.springframework.data.repository.query.parser.PartTree$OrPart.<init>(PartTree.java:235) ~[spring-data-commons-1.10.2.RELEASE.jar:na]
at org.springframework.data.repository.query.parser.PartTree$Predicate.buildTree(PartTree.java:373) ~[spring-data-commons-1.10.2.RELEASE.jar:na]
at org.springframework.data.repository.query.parser.PartTree$Predicate.<init>(PartTree.java:353) ~[spring-data-commons-1.10.2.RELEASE.jar:na]
at org.springframework.data.repository.query.parser.PartTree.<init>(PartTree.java:87) ~[spring-data-commons-1.10.2.RELEASE.jar:na]
at org.springframework.data.mongodb.repository.query.PartTreeMongoQuery.<init>(PartTreeMongoQuery.java:53) ~[spring-data-mongodb-1.7.2.RELEASE.jar:na]
at org.springframework.data.mongodb.repository.support.MongoRepositoryFactory$MongoQueryLookupStrategy.resolveQuery(MongoRepositoryFactory.java:128) ~[spring-data-mongodb-1.7.2.RELEASE.jar:na]
at org.springframework.data.repository.core.support.RepositoryFactorySupport$QueryExecutorMethodInterceptor.<init>(RepositoryFactorySupport.java:367) ~[spring-data-commons-1.10.2.RELEASE.jar:na]
at org.springframework.data.repository.core.support.RepositoryFactorySupport.getRepository(RepositoryFactorySupport.java:190) ~[spring-data-commons-1.10.2.RELEASE.jar:na]
at org.springframework.data.repository.core.support.RepositoryFactoryBeanSupport.initAndReturn(RepositoryFactoryBeanSupport.java:239) ~[spring-data-commons-1.10.2.RELEASE.jar:na]
at org.springframework.data.repository.core.support.RepositoryFactoryBeanSupport.afterPropertiesSet(RepositoryFactoryBeanSupport.java:225) ~[spring-data-commons-1.10.2.RELEASE.jar:na]
at org.springframework.data.mongodb.repository.support.MongoRepositoryFactoryBean.afterPropertiesSet(MongoRepositoryFactoryBean.java:108) ~[spring-data-mongodb-1.7.2.RELEASE.jar:na]
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1633) ~[spring-beans-4.1.7.RELEASE.jar:4.1.7.RELEASE]
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1570) ~[spring-beans-4.1.7.RELEASE.jar:4.1.7.RELEASE]
... 50 common frames omitted
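To make the failure concrete, here is a minimal sketch of a repository with this kind of typo.  The Trip field names here are hypothetical (my actual entity differs); the point is only the mismatch between the derived query method name and the entity’s properties:

```java
import java.util.List;
import org.springframework.data.mongodb.repository.MongoRepository;

// Hypothetical entity: note there is no "module" property on it
class Trip {
    private String mode;
    // getters and setters omitted
}

interface TripRepository extends MongoRepository<Trip, String> {
    // Spring Data parses "Module" out of the method name as a property
    // path on Trip; since no such property exists, context initialization
    // fails with: No property module found for type Trip!
    List<Trip> findByModule(String mode);

    // Fixed: the method name matches the actual field name
    List<Trip> findByMode(String mode);
}
```

Note that the exception is thrown at context startup, when the repository bean is created, not when the query method is first called.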

Guide to Reindexing ElasticSearch Data Ingested with Logstash

I ran into an issue where I had set up Logstash to load numeric data as strings. Later, when we wanted to do visualizations with it, they were off. So, I needed to re-index all the data.

Total pain, hope this guide helps.  (Here’s some additional ElasticSearch documentation: here and here.)

If you don’t care about your old data, just:

  • shut down Logstash
  • deploy the new Logstash filter (with the mutate changes)
  • close all old indices
  • turn Logstash back on
  • send some data through to Logstash
  • refresh fields in Kibana–you’ll lose field popularity data
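For instance, the quick path above might look something like this (the service commands and index pattern are assumptions about your install; adjust both to your environment):

```shell
# stop logstash (the service name/manager varies by install)
sudo service logstash stop

# deploy the updated filter config, then close all the old indices
# (adjust the index pattern to match yours)
curl -XPOST 'http://localhost:9200/logstash-2015.*/_close'

# start logstash again and let new, correctly-typed indices accumulate
sudo service logstash start
```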

Now, if you do care about your old data, well, that’s a different story. Here are the steps I took:

First, modify the new Logstash filter file using mutate, and deploy it. This takes care of the Logstash indexes going forward, but will cause some Kibana pain until you convert all the past indexes (because some indexes will have fields as strings and others as numbers).

Install jq: https://stedolan.github.io/jq/manual/v1.4/ which will help you transform your data (jq is magic, I tell you).

Then, for each day/index you care about (logstash-2015.09.22 in this example), you want to follow these steps.


# get the current mapping
curl -XGET 'http://localhost:9200/logstash-2015.09.22/_mapping?pretty=1' > mapping

#back it up
cp mapping mapping.old

# edit mapping, change the types of the fields that are strings to long, float, or boolean.  I used vi

# create a new index with the new mapping 
curl -XPUT 'http://localhost:9200/logstash-2015.09.22-new/' -d @mapping

# find out how many rows there are.  If there are too many, you may want to use the scrolled search.  
# I handled indexes as big as 500k documents with the below approach
curl -XGET 'localhost:9200/logstash-2015.09.22/_count'

# if you are modifying an old index, no need to stop logstash, but if you are modifying an index with data currently going to it, you need to stop logstash at this step.

# change size below to be bigger than the count.
curl -XGET 'localhost:9200/logstash-2015.09.22/_search?size=250000' > logstash-2015.09.22.data.orig

# edit data, just get the array of docs without the metadata
sed 's/^[^[]*\[/[/' logstash-2015.09.22.data.orig |sed 's/..$//' > logstash-2015.09.22.data

# run jq to build a bulk insert compatible json file ( https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html )

# make sure to correct the _index value in the line below
# (note: no backslash continuations inside the single quotes, or jq will
# see literal backslashes and fail to parse the program)
jq -f jq.file logstash-2015.09.22.data | jq -c '
{ index: { _index: "logstash-2015.09.22-new", _type: "logs" } },
.' > toinsert

# where jq.file is the file below

# post the toinsert file to the new index
curl -s -XPOST localhost:9200/_bulk --data-binary "@toinsert"; echo

# NOTE: depending on the size of the toinsert file, you may need to split it up into multiple files using head and tail.  
# Make sure you don't split the metadata and data line (that is, each file should have an even number of lines), 
# and that files are all less than 1GB in size.

# delete the old index
curl -XDELETE 'http://localhost:9200/logstash-2015.09.22'

# add a new alias with the old index's name and pointing to the new index
curl -XPOST localhost:9200/_aliases -d '
{
   "actions": [
       { "add": {
           "alias": "logstash-2015.09.22",
           "index": "logstash-2015.09.22-new"
       }}
   ]
}'

# restart logstash if you stopped it above.
sudo service logstash restart

# refresh fields in kibana--you'll lose field popularity data

Here’s the jq file which converts specified string fields to numeric and boolean fields.


#
# this is run with the jq tool for parsing and modifying json

# from https://github.com/stedolan/jq/issues/670
def translate_key(from;to):
  if type == "object" then . as $in
     | reduce keys[] as $key
         ( {};
       . + { (if $key == from then to else $key end)
             : $in[$key] | translate_key(from;to) } )
  elif type == "array" then map( translate_key(from;to) )
  else .
  end;

def turn_to_number(from):
  if type == "object" then . as $in
     | reduce keys[] as $key
         ( {};
       . + { ($key )
             : ( if $key == from then ($in[$key] | tonumber) else $in[$key] end ) } )
  else .
  end;

def turn_to_boolean(from):
  if type == "object" then . as $in
     | reduce keys[] as $key
         ( {};
       . + { ($key )
             : ( if $key == from then (if $in[$key] == "true" then true else false end ) else $in[$key] end ) } )
  else .
  end;

# for example, this converts any values with this field to numbers, and outputs the rest of the object unchanged
# run with jq -c -f  
.[]|._source| turn_to_number("numfield")

Rinse, wash, repeat.


Kibana Visualizations that Change With Browser Reload

I ran into a weird problem with Kibana recently.  We are using the ELK stack to ingest some logs and do some analysis, and when the Kibana webapp was reloaded, it showed different results for certain visualizations, especially averages.  Not all of them, and the results were always close to the actual value, but when you see 4.6 one time and 4.35 two seconds later on a system under light load and for the exact same metric, it doesn’t inspire confidence in your analytics system.

I dove into the issue.  Using the Chrome developer tools, I noticed that the visualizations that were most squirrely were loaded last.  That made me suspicious that some failure was causing missing data, which in turn changed the averages.  However, the browser API calls weren’t failing; they were succeeding.

I first looked in the Elastic and Kibana configuration files to see if there were any easy timeout configuration values I was missing.  But I didn’t see any.

I then tried to narrow down the issue.  When it was originally noted, we had about 15 visualizations working on about a month’s worth of data.  After a fair bit of URL manipulation, I determined that the discrepancies appeared regularly when there were about 10 visualizations, or when I cut the data down to four hours’ worth.  This gave me more confidence in my theory that some kind of timeout or other resource constraint was the issue. But where was the issue?

I then looked in the ElasticSearch logs.  We have a mapping issue, related to a scripted field and outlined here, which caused a lot of white noise, but I did end up seeing an exception:


org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction$23@3c26b1f5
        at org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:62)
        at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
        at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
        at org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor.execute(EsThreadPoolExecutor.java:79)
        at org.elasticsearch.search.action.SearchServiceTransportAction.execute(SearchServiceTransportAction.java:551)
        at org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteQuery(SearchServiceTransportAction.java:228)
        at org.elasticsearch.action.search.type.TransportSearchCountAction$AsyncAction.sendExecuteFirstPhase(TransportSearchCountAction.java:71)
        at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:176)
        at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.start(TransportSearchTypeAction.java:158)
        at org.elasticsearch.action.search.type.TransportSearchCountAction.doExecute(TransportSearchCountAction.java:55)
        at org.elasticsearch.action.search.type.TransportSearchCountAction.doExecute(TransportSearchCountAction.java:45)
        at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:75)
        at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:108)
        at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:43)
        at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:75)
        at org.elasticsearch.action.search.TransportMultiSearchAction.doExecute(TransportMultiSearchAction.java:62)
        at org.elasticsearch.action.search.TransportMultiSearchAction.doExecute(TransportMultiSearchAction.java:39)
        at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:75)
        at org.elasticsearch.client.node.NodeClient.execute(NodeClient.java:98)
        at org.elasticsearch.client.FilterClient.execute(FilterClient.java:66)
        at org.elasticsearch.rest.BaseRestHandler$HeadersAndContextCopyClient.execute(BaseRestHandler.java:92)
        at org.elasticsearch.client.support.AbstractClient.multiSearch(AbstractClient.java:364)
        at org.elasticsearch.rest.action.search.RestMultiSearchAction.handleRequest(RestMultiSearchAction.java:66)
        at org.elasticsearch.rest.BaseRestHandler.handleRequest(BaseRestHandler.java:53)
        at org.elasticsearch.rest.RestController.executeHandler(RestController.java:225)
        at org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:170)
        at org.elasticsearch.http.HttpServer.internalDispatchRequest(HttpServer.java:121)
        at org.elasticsearch.http.HttpServer$Dispatcher.dispatchRequest(HttpServer.java:83)
        at org.elasticsearch.http.netty.NettyHttpServerTransport.dispatchRequest(NettyHttpServerTransport.java:329)
        at org.elasticsearch.http.netty.HttpRequestHandler.messageReceived(HttpRequestHandler.java:63)
        at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
        at org.elasticsearch.http.netty.pipelining.HttpPipeliningHandler.messageReceived(HttpPipeliningHandler.java:60)
        at org.elasticsearch.common.netty.channel.SimpleChannelHandler.handleUpstream(SimpleChannelHandler.java:88)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
        at org.elasticsearch.common.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:145)
        at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
        at org.elasticsearch.common.netty.handler.codec.http.HttpContentDecoder.messageReceived(HttpContentDecoder.java:108)
        at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
        at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
        at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
        at org.elasticsearch.common.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
        at org.elasticsearch.common.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
        at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
        at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:74)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
        at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
        at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
        at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
        at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
        at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
        at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)

Which led me to this StackOverflow post.  Which led me to run this command on my ES instance:


$ curl -XGET localhost:9200/_cat/thread_pool?v
host            ip           bulk.active bulk.queue bulk.rejected index.active index.queue index.rejected search.active search.queue search.rejected
ip-10-253-44-49 10.253.44.49           0          0             0            0           0              0             0            0               0
ip-10-253-44-49 10.253.44.49           0          0             0            0           0              0             0            0           31589

And as I ran that command repeatedly, I saw the search.rejected number getting larger and larger. Clearly I had a misconfiguration/limit around my search thread pool. After looking at the CPU and memory and i/o on the box, I could tell it wasn’t stressed, so I decided to increase the queue size for this pool. (I thought briefly about modifying the search thread pool size, but this article warned me off.)

This GH issue helped me understand how to modify the threadpool briefly so I could test the theory.

After making this configuration change, search.rejected went to zero, and the visualization aberrations disappeared. I will modify the elasticsearch.yml file to make this persist across server restarts and re-provisions, but for now, the issue seems to be addressed.
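For this version of ElasticSearch (1.x), thread pool settings can be updated on a running cluster via the cluster settings API, which is handy for testing the theory before touching config files.  The queue size of 2000 below is just an illustrative value, not a recommendation:

```shell
# temporarily bump the search queue size (a transient setting is lost
# on full cluster restart)
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": { "threadpool.search.queue_size": 2000 }
}'
```

The equivalent persistent line in elasticsearch.yml would be threadpool.search.queue_size: 2000.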


The Deployment Age

If you haven’t read The Deployment Age (and its follow on post), you should go read it right now.

The premise is that, with the Internet and the PC, we’re entering the deployment phase of a technology super cycle, where the technology becomes far more integrated and invisible and the chief means of financing is internal company resources.  The focus will be on existing markets, not creating new ones, and on refinement rather than innovation.

If you work in technology and are interested in the big picture, it is worth a read:

Some things we’ve learned over the past 30 years–that novelty is more important than quality; that if you’re not disrupting yourself someone else will disrupt you; that entering new markets is more important than expanding existing markets; that technology has to be evangelized, not asked for by your customers–may no longer be true. Almost every company will continue to be managed as if these things were true, probably right up until they manage themselves out of business. There’s an old saying that generals are always fighting the last war, it’s not just generals, it’s everyone’s natural inclination.

Go read it: The Deployment Age.


Launching

Yesterday, I launched a partial rewrite of a long running side project connecting people to farm shares.  Over the past few months I’d fixed a few pressing bugs and overhauled the software so that a site and data model that previously supported only cities and zips as geographic features now supported states as well.

For the past week I’d been hemming and hawing about when to release–fixing “just one more thing” or tweaking one last bit.

But yesterday I finally bit the bullet and released.  Of course, there were a couple of issues that I hadn’t addressed that only showed up in production.  But after squashing those bugs, I completed a few more tasks: moving over SEO (as best as I can) and social accounts, making sure that key features and content hadn’t been thrown overboard, and checking the logs to make sure that users are finding what they need.

Why was I so reluctant to release?  It’s a big step, shipping something, but in some ways it’s the only step that matters.  I was afraid that I’d forget something, screw things up, make a mistake.  And of course, I didn’t want to take a site that helped a couple thousand people a month find local food options that work for them and ruin it.  I wasn’t sure I’d have time to support the new site.  I wanted the new site to have as close to feature parity as possible.  I worked on the old site for five years, and a lot of features crept in (as well as external dependencies, and corners of the site that Google and users have found but I had forgotten).

All good reasons to think before I leapt.

But you can only plan for so much.  At some point you just have to ship.


Heroku drains

So, I’ve learned a lot more than I wanted to about Heroku drains. These are the sinks to which Heroku forwards your application’s logs.  Once the logs are out of Heroku, you can analyze them just as you would for any application living outside of a PaaS.  Logs are very useful for spotting long-term trends, debugging, etc.  (I’ve worked on both a Rails 3 app and a Java Spring/Camel app that deploy to Heroku.)

Here are some things I’ve learned:

  • Heroku drains are well documented.
  • You definitely want them for any production application, because only 1500 lines of Heroku logs are retained at any one time.
  • They can go to either syslog (great for applications with a lot of other infrastructure) or HTTPS (great for applications without as much infrastructure support).
  • They can’t do any kind of authorization.
  • You can’t know which IP address the logs are coming from, so you can’t limit access by IP.
  • There are third-party add-ons you can pay for to avoid dealing with drains at all (I’ve heard good things about Papertrail).
  • You can use logstash to pull Heroku logs from a syslog drain into Elasticsearch.
  • There are numerous GitHub projects that can drain to databases, etc.  There’s even one that, with echoes of Ouroboros, drains to another Heroku app.
  • Drains have intelligent behavior if your listener (or listeners) fails.  From Heroku support: “The short answer is yes, the drain will drop logs when the sink is not responsive, but this isn’t really the full story. There are a number of undocumented limits and backoff retries that happen when a drain connection is lost.”  And then they go on to explain how the backoff behavior happens.  I’m not going to cut and paste their entire answer because I assume it is undocumented for a reason (maybe it changes, maybe they don’t want to commit to supporting this behavior).  Ask them yourself :)
  • A simple drain can be as easy as <?php error_log(file_get_contents('php://input'), 3, "/var/log/logfile.log"); ?>, but make sure you rotate that log file.
  • You can use puppet to manage drains if you are bringing servers up and down, using the heroku toolbelt and CLI authentication.
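
To expand on the logstash route: here’s a minimal config sketch. The port, the Elasticsearch host, and the plugin options are assumptions (they vary by logstash version), and note that Heroku’s syslog drains send octet-counted framed messages over TCP, so you will likely need a filter to parse out the actual log lines.

```conf
# Hypothetical logstash config for receiving a Heroku syslog drain.
# Point your drain at this host: heroku drains:add syslog://myhost:1514
input {
  tcp {
    port => 1514        # assumed port; open it in your firewall
    type => "heroku"
  }
}
# Heroku prefixes each message with an octet count (RFC 6587 framing),
# so add a grok/filter stage here to strip it and parse the syslog fields.
output {
  elasticsearch {
    hosts => ["localhost:9200"]   # assumed Elasticsearch location
  }
}
```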

If you are deploying anything beyond a toy app on Heroku, don’t forget the ops folks and make sure you set up your drain!


“Wave a magic wand”

That was what a previous boss said when I would ask him about some particularly knotty, unwieldy issue. “What would the end solution look like if you could wave a magic wand and have it happen?”

For instance, when choosing a vendor to revamp the flagship website, don’t think about all the million details that need to be done to ensure this is successful. Don’t think about who has the best process. Certainly don’t think about the technical details of redirects, APIs and integrations. Instead, “wave a magic wand” and envision the end state–what does that look like? Who is using it? What do they do with it? What do you want to be able to do with the site? What do you want it to look like?

Or if an employee is unhappy in their role, ask them to “wave the magic wand” and talk about what role they’d rather be in. With no constraints you find out what really matters to them (or what they think really matters to them, to be more precise).

When you think about issues through this lens, you focus on the ends, not the means.  It lets you think about the goal and not the obstacles.

Of course, then you have to hunker down, determine if the goal is reachable, and if so, plan how to reach it. I like to think of this as projecting the vector of the ideal solution into the geometric plane of solutions that are possible to you or your organization–the vector may not lie in the plane, but you can get as close as possible.

“Waving a magic wand” elevates your thinking. It is a great way to approach a problem not by reaching for known methods and processes, but by determining the ideal end goal and working backwards from there to the “hows”.


Masterless puppet and CloudFormation

I’ve had some experience with CloudFormation in the past, and recently gained some puppet expertise.  I thought it’d be great to combine the two, working on a new project to set up the ELK stack for a client.

Basically, we are creating an EC2 instance (or a number of them) from a vanilla image using a CloudFormation template, doing a small amount of initialization via the UserData section, and then using puppet to configure the instances further.  However, puppet is used in a masterless context, where the intelligence (knowing which machine should be configured which way) isn’t in the manifest file, but rather in the code that checks out the modules and manifests. Here’s a great example of a project set up to use masterless puppet.

Before I dive into more details, other solutions I looked at included:

  • doing all the machine setup in UserData
    • This is a bad idea because it forces you to set up and tear down machines each time you want to make a configuration change.  Leads to a longer development cycle, especially at first.  Plus bash is great for small configurations, but when you have dependencies and other complexities, the scripts can get hairy.
  • pulling a bash script from s3/github in UserData
    • puppet is made for configuration management and handles more complexity than a bash script.  I’ll admit, I used puppet with an eye to the future when we had more machines and more types of machines.  I suppose you could do the same with bash, but puppet handles more of the typical CM tasks, including setting up cron jobs, making sure services run, and deriving dependencies between services, files and artifacts.
  • using a different CM tool, like ansible or chef
    • I was familiar with puppet.  I imagine the same solution would work with other CM tools.
  • using a puppet master
    • This presentation convinced me to avoid setting up a puppet master.  Cattle, not pets.
  • using cloud-init instead of UserData for initial setup
    • I tried.  I couldn’t figure out cloud-init, even with this great post.  It’s been a few months, so I’m afraid I don’t even remember what the issue was, but I remember this solution not working for me.
  • create an instance/AMI with all software installed
    • puppet allows for more flexibility, is quicker to set up, and allows you to manage your configuration in a VCS rather than in a pile of different AMIs.
  • use a container instead of AMIs
    • Isn’t Docker the answer to everything? I didn’t choose this because I was entirely new to containerization and didn’t want to take the risk.

Since I’ve already outlined how the solution works, let’s dive into details.

Here’s the UserData section of the CloudFormation template:


          "Fn::Base64": {
            "Fn::Join": [
              "",
              [
                "#!/bin/bash \n",
                "exec > /tmp/part-001.log 2>&1 \n",
                "date >> /etc/provisioned.date \n",
                "yum install puppet -y \n",
                "yum install git -y \n",
                "aws --region us-west-2 s3 cp s3://s3bucket/auth-files/id_rsa /root/.ssh/id_rsa && chmod 600 /root/.ssh/id_rsa \n",
                "# connect once to github, so we know the host \n",
                "ssh -T -oStrictHostKeyChecking=no git@github.com \n",
                "git clone git@github.com:client/repo.git \n",
                "puppet apply --modulepath repo/infra/puppet/modules repo/infra/puppet/manifests/",
                { "Ref" : "Environment" },
                "/logstash.pp \n",
                "date >> /etc/provisioned.date \n"
              ]
            ]
          }

So, we are using a bash script, but only for a little bit.  The second line (starting with exec) stores output in a logfile for debugging purposes.  We then store off the date and install puppet and git.  The aws command pulls down a private key stored in S3.  This instance has access to S3 because of an IAM setup elsewhere in the CloudFormation template–the access we have is read-only, and the corresponding public key has already been added as a deploy key to our GitHub repository.  Then we connect to GitHub via ssh to ‘get to know the host’.  Then we clone the repository containing the infrastructure code.  Finally, we apply the manifest, which is partially determined by a parameter to the CloudFormation template.

This bash script will run on creation of the EC2 instance.  Once this script is solid, if you are adding additional puppet modules, you only have to do a git pull and a puppet apply to add more functionality.  (Of course, at the end you should stand up and tear down via CloudFormation just to test end to end.)  You can also see how it’d be easy to have the logstash.conf file be a parameter to the CloudFormation template, which would let you store your configuration for web servers, database servers, etc, in puppet as well.
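
As an aside, the nested Fn::Base64/Fn::Join structure is fiddly to hand-edit–every command needs its own quoted, “\n”-suffixed string, and the Ref has to be spliced in at just the right spot. Here’s a small, hypothetical Python sketch (the paths and manifest name echo the template above but are otherwise assumptions) that generates the fragment from a plain list of commands:

```python
import json

def user_data(commands, manifest="logstash.pp"):
    """Build the CloudFormation UserData fragment (Fn::Base64 of an
    Fn::Join) from a list of shell commands.  The puppet apply line is
    split so the Environment template parameter can be spliced into the
    manifest path, mirroring the hand-written template above."""
    parts = ["#!/bin/bash \n"]
    parts += [cmd + " \n" for cmd in commands]
    parts += [
        "puppet apply --modulepath repo/infra/puppet/modules repo/infra/puppet/manifests/",
        {"Ref": "Environment"},   # resolved by CloudFormation at stack creation
        "/" + manifest + " \n",
    ]
    return {"Fn::Base64": {"Fn::Join": ["", parts]}}

# Generate the fragment for the two install commands and print it as JSON,
# ready to paste into (or merge with) the CloudFormation template.
fragment = user_data(["yum install puppet -y", "yum install git -y"])
print(json.dumps(fragment, indent=2))
```

This keeps the list of provisioning commands readable in one place and lets a script assemble the template, rather than maintaining the quoting by hand.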

I’m happy with how flexible this solution is.  CloudFormation manages the machine creation as well as any other resources, puppet manages the software installed in those machines, and git allows you to maintain all that configuration in one place.



© Moore Consulting, 2003-2015