Large varchar columns can lead to huge ESRI exports when using ogr2ogr

I was recently using ogr2ogr to convert, on the fly, some data in a PostGIS database to other standard formats (ESRI and MapInfo). The ESRI export in particular had some problems: it took about 4 minutes to export a table with 11K rows and 37 columns, and it generated a 700M dbf file. That file was then zipped (with the other config files), and after around 6 more minutes was compressed to a 7M zip file, which was sent to the browser. Now, you can imagine how thrilled a user would be to wait 10 minutes for an export. Apache was timing out (the default timeout is 5 minutes) and I was at a loss as to how to address the performance issue.
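
For context, the exports were plain ogr2ogr calls against the PostGIS database. The commands below are a minimal sketch of that kind of invocation; the connection settings, output paths, and the parcels table name are all made up for illustration.

    # Hypothetical ESRI Shapefile export of one PostGIS table
    ogr2ogr -f "ESRI Shapefile" /tmp/export/parcels.shp \
        PG:"host=localhost dbname=gisdb user=gisuser" parcels

    # The same table exported to MapInfo
    ogr2ogr -f "MapInfo File" /tmp/export/parcels.tab \
        PG:"host=localhost dbname=gisdb user=gisuser" parcels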

I mentioned this to a colleague who has significantly more experience with GIS tools, and he pointed out that the source table had several varchar(4000) fields. Now, in PostgreSQL, "[i]f the string to be stored is shorter than the declared length … values of type character varying [varchar] will simply store the shorter string." But the ESRI export does not do that: each varchar(4000) field was padded out to its full declared width of 4000, even though none of the stored values came anywhere near that length.
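
To see which columns are declared that wide, a query against information_schema does the trick. This is just a sketch (the parcels table and gisdb database are made up), wrapped in psql to keep the examples in one place:

    # List the declared widths of the varchar columns in the source table
    psql gisdb -c "
        SELECT column_name, character_maximum_length
        FROM information_schema.columns
        WHERE table_name = 'parcels'
          AND data_type = 'character varying'
        ORDER BY character_maximum_length DESC;"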

The solution? A few simple select max(length(colname)) from table queries and alter table statements, and the varchar(4000) columns were shrunk down to sizes that matched the data. The dbf file decreased to 50M, uncompressed, and the entire zip file decreased to 5M. As you can guess, the download time was slashed.
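
Concretely, the fix looked something like this for each oversized column (parcels, owner_name, and the new width of 100 are made-up examples; pick a width comfortably above the measured maximum):

    # How long is the longest value actually stored in the column?
    psql gisdb -c "SELECT max(length(owner_name)) FROM parcels;"

    # Shrink the column; PostgreSQL will refuse if any existing value is longer
    psql gisdb -c "ALTER TABLE parcels ALTER COLUMN owner_name TYPE varchar(100);"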

Update 2/16: The kind members of the GDAL mailing list pointed me to a document listing all the limitations of the ESRI Shapefile driver for ogr2ogr. Check out the "Creation Issues" section.

[tags]PostGIS, ESRI, ogr2ogr[/tags]