Skip to content

Oracle AMIs for EC2

Many years ago, I did an internship with the database group at the company where I was working. I still have the printout on installing Oracle, and I remember it being at least 40 pages. There was a lot of voodoo with user accounts and kernel settings.

While I’ve worked with Oracle since, I haven’t been responsible for installing it; things may have become easier. But now, they definitely are easier. From the Amazon Developer Newsletter:

Oracle has produced four publicly-available Amazon EC2 AMIs with pre-installed and configured software for Enterprise, Standard or Express editions. In a matter of minutes, developers can have a fully configured Oracle Database computing environment running on Amazon EC2 that includes the web-based management tool Enterprise Manager Database Control and the web-based rapid development tool Applications Express (APEX).

[tags]ec2,oracle,the cloud[/tags]

unescaping a string with PL/SQL

I’ve written about PL/SQL before, but I’ve recently started working on a project that uses it heavily. Given the amount of code written for Oracle databases, I’m rather suprised that there’s not a PL/SQL Cookbook, where, like the Perl Cookbook and the Java Cookbook (more cookbooks from O’Reilly are listed here). There is an Oracle Cookbook, but based on a quick scan of Amazon, it’s is focused, as you’d expect, more on the database design than on PL/SQL programming. (Interestingly, there is a Oracle+PHP cookbook, and a PL/SQL sample code page but neither of those is quite what I’m looking for.)

The reason that I’d like a PL/SQL cookbook is that there are large sets of problems that routinely need to be solved in PL/SQL, but the language is so low level (though they just added some regex support in 10g; bravo!) that doing these routine tasks and making sure they’re correctly implemented can be difficult and tedious. This is especially true when it’s a programmer from a different language who’s used to higher levels of abstraction (like, for example, the good folks who author CPAN modules provide)–it’d be well worth my $70 to make sure that I never had to deal with a problem like, say, unescaping a string.

For that’s the problem I recently had to solve. Essentially, we have a string that looks like this: yellow,apple. This string represents two values, which need to be put in different places by splitting them up into ‘yellow’ and ‘apple’. All well and good until the possiblity of embedded commas arises, for it’s possible that the desired end values were ‘yellow,blue’ and ‘apple,banana’. The answer, of course, is to escape the commas on the way in (turning the second input into something like this: yellow:,blue,apple:,banana, and when processing to unescape those special characters (both the comma and the escape character, which in the example is the colon). That’s what these three functions do. They take a string like the above examples and parse it into a table, to be iterated over at your leisure.

/* ------------------- function splitit ------------------*/
FUNCTION splitit(p_str VARCHAR2, p_del VARCHAR2  := ',',p_idx PLS_INTEGER, p_esc VARCHAR2

:= ':')
RETURN INTEGER
IS
l_idx       PLS_INTEGER;
l_chars_before      VARCHAR2(32767);
l_escape_char       VARCHAR2(1) := p_esc;
l_chars_before_count        PLS_INTEGER := 0;
BEGIN
>
LOOP
l_idx := instr(p_str,p_del, p_idx);
IF l_idx > 0 then
WHILE substr(p_str, l_idx-l_chars_before_count-1, 1) = l_escape_char LOOP
l_chars_before_count := l_chars_before_count +1;
END LOOP;

IF mod(l_chars_before_count, 2) = 0 THEN
-- if chars_before_count is even, then we're at a segment boundary
RETURN l_idx;
ELSE
-- if odd, then we're at an escaped delimiter, want to move past
RETURN splitit(p_str, p_del, l_idx+1, p_esc);
END IF;
l_chars_before_count := 0;
ELSE
RETURN l_idx;
EXIT outer;
END IF;
END LOOP;
END splitit;
/* ------------------- function splitit ------------------*/

/* ------------------- function unescape ------------------*/

FUNCTION unescape(p_str VARCHAR2, p_del VARCHAR2 := ',', p_esc VARCHAR2 := ':')
RETURN VARCHAR2
IS
l_str VARCHAR2(32767);
BEGIN
l_str := replace(p_str, p_esc||p_del, p_del);
l_str := replace(l_str, p_esc||p_esc, p_esc);
RETURN l_str;
END unescape;
/* ------------------- function unescape ------------------*/

/* ------------------- function split ------------------*/

FUNCTION split(p_list VARCHAR2, p_del VARCHAR2 := ',')
RETURN split_tbl
IS
l_idx       PLS_INTEGER;
split_idx   PLS_INTEGER     := 0;
l_list      VARCHAR2(32767) := p_list;
l_chars_before      VARCHAR2(32767);
l_escape_char       VARCHAR2(1) := ':';
l_array split_tbl := split_tbl('','','','','','','','','','');
BEGIN
l_list := p_list;
LOOP
split_idx := split_idx + 1;
IF split_idx > 10 then
EXIT;
END IF;

l_idx := splitit(l_list, p_del, 1, l_escape_char);
IF l_idx > 0 then
l_array(split_idx) := unescape(substr(l_list,1,l_idx-1), p_del,

l_escape_char);
l_list := substr(l_list,l_idx+length(p_del));
ELSE
l_array(split_idx) := l_list;
EXIT;
END IF;
END LOOP;
RETURN l_array;
END split;
/* ------------------- function split ------------------*/

/* in the header file, split_tbl is defined */
TYPE split_tbl IS TABLE of varchar2(32767)

Not all of this code is mine–I built on a solution from a colleague. But I hope this saves one other person from the afternoon I just endured. And if you are a PL/SQL expert and care to critique this solution, please feel free.

Exchanging PostgreSQL for Oracle

I have a client who was building some commercial software on top of PostgreSQL. This plans to be a fairly high volume site, 1.8 million views/hour 500 hits a second. Most of the software seemed to work just fine, but they had some issues with Postgres. Specifically, the backup was failing and we couldn’t figure out why. Then, a few days ago, we saw this message:

ERROR: could not access status of transaction 1936028719
DETAIL: could not open file “/usr/local/postgres/data/pg_clog/0836”: No such file or directory

After a bit of searching, I saw two threads suggesting fixes, which ranged from deleting the offending row to recreating the entire database.

I suggested these to my client, and he thought about it for a couple of days and came up with a solution not suggested on these threads: move to Oracle. Oracle, whose licensing and pricing has been famously opaque, now has a pricing list available online, with prices for the Standard Edition One and Enterprise Edition versions of their database, as well as other software they sell. And my client decided that he could stomach paying for Oracle, given:

1. The prices aren’t too bad.
2. The amount of support and knowledgeable folks available for Oracle dwarfs the community of Postgres.
3. He just wants something to work. The value add of his company is in his service, not in the back end database (as long as it runs).

I can’t fault him for his decision. PostgreSQL is full featured, was probably responsible for Oracle becoming more transparent and reasonable in pricing, and I’ve used it in the past, but he’d had enough. It’s the same reason many folks have Macs or Windows when there is linux, which is a free tank that is “… invulnerable, and can drive across rocks and swamps at ninety miles an hour while getting a hundred miles to the gallon!”.

I’ll let you know how the migration goes.

sqlldr

I’ve been writing SQL*Loader scripts to load a fair bit of data into Oracle. I have a set of load tables with minimal constraints on them, into which SQL*Loader pushes the rows. Then I have written some PL/SQL which pulls from the load tables to the real database.

This architecture was chosen because the PL/SQL procedures can be written to allow incremental as well as full data loads. In the incremental case, it’s conceivable there there’d be a different way of pushing data over to the load tables (via ODBC or JMS, for example). In addition, the load tables can be denormalized, and you can put enough intelligence in the PL/SQL to turn your data structures into something at which a DBA won’t cringe.

Anyway, I thought I’d share a few tips, gleaned through the process. I’m definitely no SQL*Loader guru, but here are some useful links: the sqlldr FAQ, full of good information and recently updated, the Oracle Utilities page which does a great job of explaining all the options of SQL*Loader, and this case study which outlines internationalization with sqlldr. All very useful.

Two other tips: If you are loading delimited character data that is longer that 255 characters, you need to specify the length in your control file (for example, declaring it in the control file as char(4000)), or else you’ll get an aggravating error message warning that the data you’re loading is longer than the column in which you’re trying to load it. I spent some time looking very carefully at the load table trying to see what I was missing before I googled and found out that char fields do have default sizes in sqlldr control files.

And the bindsize and rows parameters are related, in terms of the amount of data that sqlldr can push into a table before it commits. You can make rows very very big, but if bindsize is too small (it defaults to 64k, apparently) the commits will happen sooner than they need to. For more explanation and other perforamance tips, see this page.

Overall, I’ve been very happy with how easy it is to load a fair bit of data, quickly (both in terms of load time and in development time) using sqlldr.

PL/SQL

I recently wrote a basic data transformation program using Java and PL/SQL. I hadn’t used PL/SQL (which is an Oracle-specific procedural language for stored procedures) since writing a basic data layer for my first professional project (a Yahoo! like application written in PL/SQL, perl and Story Server–don’t ask). Anyway, revisiting PL/SQL reminded me of some of the things I liked and disliked about that language.

I like:

Invalidation of dependencies. In PL/SQL, if package A (packages are simply arbitrary, hopefully logical, groups of procedures and functions) calls package B, A depends on B. If the signatures of B are recompiled (you can separate the signatures from the implementations) package A simply won’t run until you recompile it. This is something I really wish other languages would pick up, because it at least lets you know when something you depend on has changed out from under you.

I dislike:

The BEGIN and END blocks, which indicate boundaries for loops and if statements, are semantically no different than the { and } which I’ve grown to love in perl and Java. But for some reason, it takes me back to my pascal days and leaves a bit of a bad taste in my mouth.

I’m unsure of:

The idea of putting business logic in a database. Of course, schemas are intimately tied to the business layer (ask anyone trying to move to a different one) and anyone who pretends that switching databases in a java applications is a simple matter of changing a configuration file is smoking crack, but the putting chunks of business logic in the data layer introduces a few problems. Every different language that you use increases the complexity of a project–and to debug problems with the interface between them, you need to have someone who knows both. Also, stored procedures don’t fit very well into any of the object relational mapping tools and pretty much force you to use jdbc.