Google does spreadsheets

Check out spreadsheets.google.com. It’s a limited-time look at what javascript can do for a spreadsheet. I took a quick look and it seems to cover large chunks of what I use Excel or calc, the OpenOffice spreadsheet program, for. Here’s a quick tour of what I use such spreadsheet programs for, and what Google spreadsheet supports:

  • cut and paste, of text and formulas
  • control arrow movement and selection
  • formatting of cells
  • merging of cells and alignment of text in cells
  • undo/redo that goes at least 20 deep
  • sum/count
  • can freeze rows
  • share and save the spreadsheet
  • export to csv and xls

On the other hand, no:

  • dragging of cells to increment them (first cell is 45, next is 46, 47…).
  • using the arrows to select what goes into a formula–you can type in the range or use the mouse

Pretty decent for a web based application. And it does have one killer feature–updates are immediately propagated (I have never tried to do this with a modern version of Excel, so I don’t know if that’s standard behaviour). Snappy enough to use, at least on my relatively modern computer. I looked at the js source and it’s 55k of crazy javascript (Update, 6/9: This link is broken.). Wowsa.

I’ve never used wikicalc but it looks more full featured than Google spreadsheets. On the other hand, Google spreadsheets has a working beta version…

This and the acquisition of writely make me wonder if some folks are correct when they doubt that Google will release a software productivity suite. (More here.) Other interesting comments from Paul Kedrosky.

I know more than one person who absolutely depends on gmail for business functionality, which spooks me. And in some ways, I agree with Paul, it appears that Google “…takes a nuclear winter approach wherein it ruins markets by freezing them and then cutting revenues to zero.”

Personally, if I don’t pay for something, I’m always leery of it being taken away. Of course, if I pay, the service can also go away, but at least I have some more leverage with the company–after all, if they take the service away, they lose money.

Bloglines and SQL

I moved from my own personal RSS reader (coded in perl by yours truly) to Bloglines about a year ago. The main reason is that Bloglines did everything my homegrown reader did and was free (in $ and in time to maintain it).

But with over 1 billion articles served as of Jan 2006, I always wondered why Bloglines didn’t do more collaborative filtering. They do have a ‘related feeds’ tab, but it doesn’t seem all that smart (though it does seem to get somewhat better as you have more subscribers). I guess there are a number of possible reasons:

  • It’s easier to find feeds that look like they’d be worth reading (I have 180 feeds that I attempt to keep track of)
  • blogrolls provide much of this kind of filtering at the user level
  • privacy concerns?
  • No demand from users

But this article, one of a series about data management in well known web applications, gives another possible answer: the infrastructure isn’t set up for easy querying. Sayeth Mark Fletcher of bloglines:

As evidenced by our design, traditional database systems were not appropriate (or at least the best fit) for large parts of our system. There’s no trace of SQL anywhere (by definition we never do an ad hoc query, so why take the performance hit of a SQL front-end?), we resort to using external (to the databases at least) caches, and a majority of our data is stored in flat files.

Incidentally, all of the articles in the ‘Database War Stories’ series are worth reading.

Using Grids?

Tim Bray gives a great write up of Grid Infrastructure projects. But he still doesn’t answer Stephen’s question: what is it good for?

I think the question is especially relevant for on demand ‘batch grids’, to use Tim’s terms. A ‘service grid’ has uses that jump to mind immediately; scaling web serving content is one of them. But on demand batch grids (I built an extremely primitive one in college) are good for complicated processes that take a long time. I don’t see a lot of that in my current work–but I’m sure my physics professor would be happy to partake.

Verifying the state of an image download in a javascript event

Well, I was going to write a rant, explaining how as far as I could tell, there was no way to make sure an image was downloaded, or degrade gracefully if it wasn’t–within an event like onclick. But, it all boils down to the fact that there is no Thread.sleep() equivalent in javascript. See this for a fine explication or read on for an overview of what I tried that failed.

The only real way to wait in javascript is to use setTimeout (Mozilla docs, IE docs). The problem with setTimeout is that after calling it, your event handling code merrily continues to execute, and your setTimeout callback will not run until the event code has finished.
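
A minimal sketch of that ordering problem, runnable in any javascript engine that provides setTimeout. The onClick function and the log array are names made up for illustration, and no real image is involved.

```javascript
// Records the order in which things happen.
var log = [];

// Hypothetical click handler; stands in for any event handler.
function onClick() {
  log.push("handler start");
  // Ask for a check "later". setTimeout returns immediately;
  // it does not pause the handler.
  setTimeout(function () {
    log.push("timeout callback");
  }, 0);
  log.push("handler end");
}

onClick();
// At this point log holds only the two handler entries; the
// callback has been queued but has not run yet.
```

Even with a delay of zero, the callback only gets its turn after the handler returns, so the handler itself can never see the result of the check.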

The other way I thought of was to loop waiting for a specified number of seconds (like this). Unfortunately, in my tests, the javascript engine in IE6 doesn’t appear to be multithreaded, and while this wait code executes, the image is not being downloaded.
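
The busy-wait idea can be sketched like this. A zero-delay timer stands in for the image finishing its download (an assumption made so the sketch runs without a browser), and busyWait, arrived and arrivedAfterWait are hypothetical names. In an engine that runs one piece of script at a time, the queued callback cannot run while the loop spins.

```javascript
var arrived = false;

// Stand-in for "the image finished downloading".
setTimeout(function () {
  arrived = true;
}, 0);

// Spin for the given number of milliseconds without yielding.
function busyWait(ms) {
  var start = Date.now();
  while (Date.now() - start < ms) {
    // burn cpu; nothing else gets a turn
  }
}

busyWait(50);
// Still false: the callback never got a chance to run during the wait.
var arrivedAfterWait = arrived;
```

The wait burns time but learns nothing, which matches what I saw: the thing being waited for only happens once the waiting code gets out of the way.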

I did not try the modal window approach, or the java applet (which seems a bit like using a sledgehammer to hit a mosquito) outlined here, but I’m not sure that either of those is really production ready (I’m not alone).

Familiarity Breeds Content

With tools, at least.

James Governor raises an interesting question: Is Smalltalk Set for a Renaissance. He discusses some of the new things that are being built on this old language.

However, the most interesting thing to me is his comment, the title of this post, that ‘familiarity breeds content’ for tools. I touched on this 2 years ago, when I wrote why I thought struts would be around for a good while. Incidentally, history has borne me out–currently there are 1958 hits on dice for ‘struts’, compared to 58 for ‘webwork’ and 29 for ‘ruby on rails’. (Past performance is no guarantee of future results…)

Of course, that’s no judgement on the benefits of the tools; badly written cgi scripts are still around too. In fact, part of a developer’s job, I believe, is to at least play around with new tools and options that may make them more productive. The important takeaway is that, just as many users are reluctant to change office suites, even to upgrade, many developers have enough on their plates without learning new tools.

Book Review: What Just Happened

I recently read ‘What Just Happened’, by James Gleick. I’m a big fan of his–I read ‘Chaos’ years ago. That book covered the history of chaos theory; I was engrossed by the fluid writing and deft handling of such a tough subject.

‘What Just Happened’ is not such a book–rather than a coherent look at recent history, this book is a collection of stories spanning that time (from 1992 to 2001). From spam to bugs to online pornography to passwords to email forwards, Mr Gleick covers a number of issues that are still relevant for us today. I will say that the number of forwards I’ve gotten since I left college has fallen dramatically, but the amount of spam has not. The internet still ‘makes it all too easy to fling random illiterate drivel across the planet’.

There are also a number of neat historic references. There is a five page article about Y2K, written in Jan of 1999, where Mr Gleick was already saying that we had nothing to worry about come 1/1/2000. Another suggests ways to ‘make Microsoft safe for capitalism’, written just around the release of Windows 95. Remember when we thought we could count on the US government to deal with monopolists?

On a personal note, I have to link to Zia Consulting, because one of their principals was mentioned in this book; you could apparently page Bindu Wavell over the Internet in December 1995.

The format of this book makes it a nice bus read. None of the articles are longer than forty pages and many are a good deal shorter. Whether you nod your head in agreement with some of the issues covered that are still present, or are wistfully transported back to the days when you were still interested in checking the status of a Coke machine over the Internet, this book has its moments. If you enjoy pop tech at all, or if you’ve been caught up in the wave the Internet has created over the past 15 years, chances are you’ll enjoy this book.

“What Just Happened” at Amazon.

Jini and JavaSpaces at BJUG

I went to BJUG last week to see a presentation about Jini by someone from GigaSpaces. It was an intensely interesting presentation for a number of reasons. First off, I knew the presenter, Owen Taylor. About 6 years ago, I took a class from him, along with a few other people. The class covered BEA Weblogic and EJBs. I’ve attended (and given a couple of) technical presentations in my time, including some at conferences. I don’t think I’ve ever met someone who was more energetic and practiced at conveying hard concepts than Owen Taylor. Owen! Start blogging!

Another reason it was interesting is that Brian Pontarelli, an old friend, really likes Jini and has told me about some of his experiences. I actually looked into it when Bill DeHora published his entry two classic hardbacks. I downloaded Jini and JavaSpaces (Jini is the framework, JavaSpaces is the tuple space repository.) and started playing with it. The final reason that it was an interesting presentation is that JavaSpaces is something that I have never had a chance to use, and didn’t foresee using in the future. By the end of the presentation, I was convinced that this concept deserved more research, if nothing else.

What follows are my scribbled notes from that meeting, along with a smattering of other comments and thoughts regarding these technologies. More information is here, however no presentation artifacts are available, unfortunately.

The problem with distributed systems is that they move data around a lot. What you really want is for the processing and the data to be at most one step apart. Stored procedures do this, but you can’t change the logic easily.

Jini was originally developed for pervasive computing, but the focus of the presentation was on the enterprise applications that can be built based on that spec. This class of applications has some amazing features, including low latency, extremely high throughput and ‘100%’ uptime capability.

For that reason, many large institutions are looking at replacing or augmenting JEE (nee J2EE) applications with JavaSpace applications. He mentioned that GigaSpaces recruited him with the notion that a laptop could run 3 million events an hour. This kind of blew his mind.

JavaSpaces is the command pattern–code and data are distributed, based on Linda. Orbitz uses the technology and talks about 100% uptime. Anyplace where you are batching, you can now do it in real time. The key is to keep everything in memory and use replication for persistence, rather than disk. (Eventually you want to push it to disk, for reporting and auditing purposes, but you can do that asynchronously.) Databases tend to be used as a bus between in memory processes right now, and you can replace that with a JavaSpace.
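
To make the Linda idea a bit more concrete, here is a toy in-memory ‘space’ in javascript. It is a sketch of the write/take style of coordination only, not the JavaSpaces API: ToySpace and the entry fields are invented for illustration, and it has no leases, transactions, blocking or replication.

```javascript
// Toy in-memory "space": a sketch of the Linda idea, not the JavaSpaces API.
class ToySpace {
  constructor() {
    this.entries = [];
  }

  // Put an entry (a plain object) into the space.
  write(entry) {
    this.entries.push(entry);
  }

  // Remove and return the first entry matching the template, or null.
  // A template field must be equal to match; fields that are absent
  // from the template act as wildcards.
  take(template) {
    const index = this.entries.findIndex((entry) =>
      Object.keys(template).every((key) => entry[key] === template[key])
    );
    if (index === -1) return null;
    return this.entries.splice(index, 1)[0];
  }
}

const space = new ToySpace();
space.write({ type: "task", id: 1, payload: "resize image" });
space.write({ type: "task", id: 2, payload: "send email" });

// A worker takes any task, does the work, and writes a result back.
const task = space.take({ type: "task" });
space.write({ type: "result", id: task.id, status: "done" });

// Whoever asked for task 1 picks up its result by template.
const taskResult = space.take({ type: "result", id: 1 });
const remaining = space.entries.length; // task 2 is still waiting
```

The producer and the worker never talk to each other directly; the space sits between them, which is the role the paragraph above says databases often end up playing.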

Jini is composed of discrete objects that can run anywhere; more to the point, they don’t care where they run. It also expects failure, as opposed to many other technologies that simply assume that things will run correctly. Jini is a LAN based technology, though Owen mentioned that there are ways to turn it into a WAN technology and cited several examples. I am not competent to give a general overview of Jini–please check out this tutorial for more information.

One thing that really struck me is that all of the complexity that EJB and other JEE technologies hide (clustering, transaction management, thread management, lifecycle), JavaSpaces revels in. Owen actually mentioned that JavaSpaces brings skills that JEE developers currently rarely need to use, like threading and classloading, back into the toolbox, rather than depending on a vendor. That can be a plus and a minus, right? The whole point of not trusting servlet threading to a business developer is that it allows them to focus on the business logic. The problem with much of JEE is that it hasn’t done a very good job of doing this. Do you remember the ‘deployer’ role?

Jini has only interfaces; the named implementations are shipped around transparently. Ha ha, just like EJB remote calls are transparent. However, one very nice aspect of Jini is that when you register an implementation of an interface, you say how long the implementation is going to be available (the lease length). As a service provider, you can keep track of that lease and re-register yourself when it is nearly up. Of course, if the service is no longer available, for whatever reason, it is not provided to clients–there’s no need for the JVM to garbage collect. The clients do need to be a bit smart about things though.
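
The lease idea can be sketched in a few lines. This is not the Jini API: LeaseRegistry, its methods and the ‘pricing’ service are hypothetical, and the clock is passed in as a plain number so the behaviour is easy to follow.

```javascript
// Hypothetical lease-based registry; illustrates the idea only.
class LeaseRegistry {
  constructor() {
    this.entries = new Map(); // service name -> { impl, expiresAt }
  }

  // Register (or re-register) a service for leaseMs milliseconds from `now`.
  register(name, impl, leaseMs, now) {
    this.entries.set(name, { impl: impl, expiresAt: now + leaseMs });
  }

  // Hand out the implementation only while its lease is still valid.
  lookup(name, now) {
    const entry = this.entries.get(name);
    if (!entry) return null;
    if (now >= entry.expiresAt) {
      this.entries.delete(name); // lease lapsed: stop offering it
      return null;
    }
    return entry.impl;
  }
}

const registry = new LeaseRegistry();
const pricing = { quote: function () { return 42; } };

registry.register("pricing", pricing, 1000, 0);   // lease runs until t=1000
const early = registry.lookup("pricing", 500);     // found: lease still valid
registry.register("pricing", pricing, 1000, 900);  // provider renews near expiry
const renewed = registry.lookup("pricing", 1500);  // found: renewed until t=1900
const late = registry.lookup("pricing", 2000);     // null: nobody renewed again
```

A provider that dies simply stops renewing, and its entry disappears on its own. Clients still have to cope with a lookup that returns nothing, which is the ‘a bit smart’ part.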

As for licensing, version 2.0 has been released under an apache license, as opposed to the Sun Community Source License, which was the previous license. This should grow the jini.org community significantly.

Configuration of Jini takes place with a java syntax, which can be a bit confusing, since you don’t compile and execute it. The names of the services (reggie, webster) are a bit cutesy. Webster is the web server which serves implementation classes, but shouldn’t be used in a production environment. Use Tomcat.

Spring and JavaSpaces are complementary; work is in progress to integrate them and completion is expected in the next few months. GigaSpaces has scaled implementations (linearly!) to 2000 cpus on 500 machines….

At this point Owen began talking about various architectural patterns that could be used with Jini; he also covered some war stories. However, I didn’t take any notes–you’ll have to see him talk sometime.

Issues include (so my friend says) versioning. Owen mentioned that debugging isn’t a strong suit. And I did some parallel computing for my senior thesis so I know that splitting up problems so they can be parallelized is not always as easy as you’d like. However, the web paradigm is actually rather suited to parallelization, since you do have the request/response model. The problem is, as it so often is, state.

apachebench drops hits when the concurrency switch is used

I’ve used apachebench (or ab), a free load testing tool written in C and often distributed with the Apache Web Server, to load test a few sites. It’s simple to configure and gives tremendous throughput. (I was seeing about 4 million hits an hour over 1 gigabit ethernet. I saw about 10% of that from jmeter on the same machine; however, the tests that jmeter was running were definitely more complex.)

Apachebench is not perfect, though. The downsides are that you can only hit one url at a time (per ab process). And if you’re trying to load test the path through a system (“can we have folks login, do a search, view a product and logout”), you need to map that out in your shell script carefully. Apachebench has no way to create more complicated tests (like jmeter can). Of course, apachebench doesn’t pretend to be a system test tool–it just hits a set of urls as fast as it can, as hard as it can, just like a load tool should.

However, it would be nice to be able to compare hits received on the server side and the log file generated by apachebench; the numbers should reconcile, perhaps with some fudge factor for network errors. I have found that these numbers reconcile as long as you only have one client (-c 1, or the default). Once you start adding clients, the server records more hits than apachebench. This seems to be deterministic (that is, repeatable), and worked out to around 4500 extra requests for 80 million requests. As the number of clients approached 1, the discrepancy between the server and apachebench numbers decreased as well.
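
To put that discrepancy in proportion, here is the arithmetic as a small sketch. It assumes the 80 million figure is the count apachebench reported and that the server logged 4500 more; the function and variable names are made up.

```javascript
// Compare the server-side hit count with the count the load tool reported.
function discrepancy(serverHits, clientHits) {
  var extra = serverHits - clientHits;
  return {
    extra: extra,                         // hits the tool did not account for
    percent: (extra / clientHits) * 100   // the same, as a percentage
  };
}

// Assumed figures: 80,000,000 reported by apachebench, 4,500 more on the server.
var reconciliation = discrepancy(80004500, 80000000);
// reconciliation.extra is 4500; reconciliation.percent is roughly 0.0056
```

That is a small fraction of a percent, so it only matters when the counts have to match exactly.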

This offset happened with Tomcat 5 and Apache 2, so I don’t think that the issue is with the server–I think apachebench is at fault. I searched the httpd bug database but wasn’t able to find anything related. Just be aware that apachebench is very useful for generating large HTTP request loads, but if you need to reconcile for accuracy, skip the concurrency offered.