Manipulating tabular data with SQLite

stephan beal (stephan@s11n.net), 7 March 2006

========================================================================

System administrators and webmasters often have to work with "tables" of data, where we mean "table" in the classical Unix sense - tabular data files. Some of the more familiar examples include /etc/passwd, /etc/fstab, and /etc/hosts. Such data sets are normally relatively small and are well suited for use with a myriad of Unix tools like sed, cut, grep, awk, perl, etc. Aside from the classical tools, however, if you're a fan of tabular data, or simply can't avoid using it, SQLite may offer a powerful addition to your toolbox.

As an enlightening, though admittedly not terribly useful, example, let's create a full-fledged database from our /etc/passwd:

me@host:~> sqlite3 mypasswd
sqlite> create table pw (name,pass,uid,gid,gecos,home,shell);
sqlite> .separator :
sqlite> .import /etc/passwd pw
sqlite> select * from pw where name like '%oo%';
root|x|0|0|root|/root|/bin/bash

When we exit sqlite3 (with Ctrl-D) our database will be saved to the file mypasswd, which we can later load by simply calling 'sqlite3 mypasswd'. If we provide no filename on the command line then all work is done directly in memory, requiring no intermediary database file. SQLite is marketed as a "zero-administration database", and indeed it is: we don't have to set up any users, network ports, or anything else normally associated with setting up a database.
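
To get a feel for the in-memory mode, here is a small throwaway session (the table and values are invented purely for illustration); nothing from it is saved when we exit:

me@host:~> sqlite3
sqlite> create table scratch (n);
sqlite> insert into scratch values (42);
sqlite> select n*2 from scratch;
84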

As is typical for Unix tools, sqlite3 reads input from standard input and writes to standard output, so it can be used in command chains, like so:

echo "delete from pw where gid>200" | sqlite3 mypasswd

The passwd example is admittedly lame, but it makes a good starting example because that file is particularly well suited for import by sqlite3, not requiring any special data massaging by the user. Most systems don't have more than a few users, so most admins won't get much use out of a mypasswd database. They might, however, find it interesting to know that sqlite3 really shines when working with time/date data. Let's look at an example...

We have an arbitrary log file in which the input lines look something like:

[2006-02-18 11:21:30] Event text...

Now sqlite3 starts to look a lot more interesting. While the log is not in a format we can directly import into sqlite3, it is a good starting point. From there, a one-line perl script can strip out the extra brackets for us:

perl -ne \
     'next unless m|^\[(.+)\]\s+(.*)|; print $1."|0|".$2,"\n";' \
     < mylog > massaged

The astute reader may notice that the print statement uses the pipe character as a separator. We do this because that is sqlite3's default separator, though one can change it with the aptly-named -separator command-line argument or the built-in '.separator' command. You may also notice that we inserted a field with the value 0 into the data. We will see in a moment what that is for.
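
Given the sample log line above, the corresponding line in 'massaged' looks like this:

2006-02-18 11:21:30|0|Event text...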

Now that our data is massaged, we simply need a couple of lines of SQL to get it into a database. First, we create a table to hold the data:

sqlite3 mydb 'create table log(td,ts,msg)'

Those familiar with SQL will immediately notice the lack of data types in the CREATE TABLE statement. For SQL portability, SQLite does accept type information in CREATE TABLE statements, but it does not enforce it - a declared type only influences how SQLite prefers to store values in that column - so we are free to leave the types out.
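
If we prefer a more conventional look, a typed declaration along these lines (the type names here are purely illustrative) is accepted just as happily:

sqlite3 mydb 'create table log(td TEXT, ts INTEGER, msg TEXT)'

Either way, once we have our table, inserting data into it is trivial: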

sqlite3 mydb '.import massaged log'

That reads the file named 'massaged' into the 'log' table. sqlite3 is pedantic about imported data's structure: importing will fail if a record's field count does not match the number of columns in our log table (and there is no "force" option to continue loading on error, as MySQL has). This often makes it necessary to manually edit or normalize some forms of data before importing them.
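
A quick way to spot offending lines before importing - just one approach; any of the usual text tools will do - is something like:

# report any line that does not contain exactly three |-separated fields
awk -F'|' 'NF != 3 { print NR": "$0 }' massaged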

We now have our log in a database. Using SQL we can search it, edit it, and even transform it into other representations.
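
With a reasonably recent sqlite3 shell, for example, we can switch output modes and dump the whole table as CSV (the output filename, log.csv, is of course arbitrary):

me@host:~> sqlite3 mydb
sqlite> .mode csv
sqlite> .output log.csv
sqlite> select * from log;
sqlite> .output stdout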

Now let's update that extra 0 field we inserted earlier. We named the field 'ts', short for "timestamp" in the Unix sense of the word. sqlite3 comes with a flexible date/time formatting function, strftime(), which we can use to convert our 'td' values to Unix timestamps:

sqlite3 mydb 'update log set ts=strftime("%s",td)'
sqlite3 mydb 'select * from log where msg like "%erro%"'
2006-02-17 21:33:06|1140211986|An error...
2006-02-17 23:04:21|1140217461|Another error...
2006-02-17 23:25:42|1140218742|Something erroneous happened here.

Using the timestamp field, we can go beyond the bounds of the conventional table-manipulation tools and perform tricks like selecting a range of events based on their timestamps:

select * from log where
       ts > strftime("%s","2006-02-17 21:33:06") and
       ts < strftime("%s","2006-02-17 23:33:06")

Or a bit simpler:

select * from log where
       td > datetime("2006-02-17 21:33:06") and
       td < datetime("2006-02-17 23:33:06")

Or even simpler:

select * from log where
       td > "2006-02-17 21:33:06" and
       td < "2006-02-18 03:33:06"

And simpler still:

select * from log where
       td between '2006-02-16 00:00:00' and
       '2006-02-18 00:00:00';

Anyone familiar with databases might find it desirable to add a unique ID field to each record. In fact, there is no need to do so, because in every SQLite table we automatically get a field named ROWID. It does not normally show up in SELECT results, but we can force it to become visible by explicitly requesting it:

select rowid,* from log;

While we aren't likely to want to update records in a log database, for the sake of example let's suppose we want to automatically update our ts field when the td field is updated. As fate would have it, SQLite has support for triggers, so:

me@host:~> sqlite3 mydb
sqlite> CREATE TRIGGER t_update_ts AFTER UPDATE OF td ON log
   ...> FOR EACH ROW BEGIN
   ...> UPDATE log SET ts=strftime('%s',NEW.td) WHERE td=NEW.td;
   ...> END;

In this particular case we must take extra care to avoid unduly updating records other than the one the trigger applies to. Ideally our UPDATE statement would look like:

... WHERE NEW.rowid=OLD.rowid;

Unfortunately, an apparent SQLite bug prohibits us from referencing the rowid field from triggers unless we explicitly define that field when creating our table. To work around this, we limit our UPDATE to records with the same td value. The pseudorecord OLD provides the values a record had before the trigger fired and the pseudorecord NEW provides the values the record will contain after the trigger completes.
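
If we were designing the table from scratch, one way around this limitation - sketched here with a hypothetical log2 table - would be to declare an INTEGER PRIMARY KEY column, which SQLite treats as an alias for the rowid, and reference that column from the trigger:

sqlite> CREATE TABLE log2 (id INTEGER PRIMARY KEY, td, ts, msg);
sqlite> CREATE TRIGGER t_update_ts2 AFTER UPDATE OF td ON log2
   ...> FOR EACH ROW BEGIN
   ...> UPDATE log2 SET ts=strftime('%s',NEW.td) WHERE id=NEW.id;
   ...> END;

For our existing log table, though, the td-based WHERE clause does the job well enough.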

Let's see if our trigger works:

sqlite> update log set td='1970-01-01 00:00:14' where rowid=1;
sqlite> select td,ts from log where rowid=1;
1970-01-01 00:00:14|14

Since midnight on the 1st of January, 1970 is the start of the Unix epoch, we can surmise that 14 seconds past midnight equates to a ts value of 14, so the trigger appears to be in order.

Aside from triggers, SQLite has several other features - views, indexes, and transactions among them - which make it an enticing candidate for the utility box.

And here's one final trick which may seem strange but is quite useful: SQLite's in-memory support works with the ATTACH DATABASE SQL statement, so we can create multiple independent in-memory databases:

me@host:~/> sqlite3 MyAppLog.db
SQLite version 3.3.7
Enter ".help" for instructions
sqlite> attach database ':memory:' as mydb;
sqlite> .schema
CREATE TABLE log(ts,msg TEXT);
sqlite> create table mydb.log as select * from log;
sqlite> select * from mydb.log order by ts desc limit 5;
2007-01-18 17:08:24|mdb.LoadElements() read count: 0
2007-01-18 17:08:03|Start read-back...
2007-01-18 17:07:58|Writing complete. Wrote 21 records.
2007-01-18 17:07:56|Starting writing of 21 records...
2007-01-18 17:07:53|Delete all old entries...
sqlite> attach database ':memory:' as twodb;
sqlite> select * from twodb.log order by ts desc limit 5;
SQL error: no such table: twodb.log

With that last statement I wanted only to show that twodb is not merely a reference to mydb, which was attached using the same virtual filename ':memory:', but a physically separate database.

Maybe in a later article we will investigate how to make use of SQLite from C/C++. Until then, have fun querying /etc/passwd.