DuckDB

DuckDB is a fast in-process analytical database. This means it's highly optimised to deal with queries (rather than updates) and can work incredibly fast on very large datasets.

As more data is rendered in a Lume site, the build can slow down significantly. It may be beneficial under these circumstances to use DuckDB to access data. This is particularly true if the data in question is very large or if it is used to generate very large numbers of pages.

The lume-duck module provides some utilities to integrate DuckDB into Lume.

You can import the library by adding the following line to your _config.ts file.

import { duckDbLoader, resultTable } from "jsr:@dringtech/lume-duck";

The `duckDbLoader` dataLoader

The module includes a loader which converts .sql files in the site _data/ directories into callable functions which execute the included SQL. These are prepared statements which can be called with parameters to customise the calls. Some examples are given below.

To use the loader, add the following code to _config.ts.

site.loadData([".sql"], duckDbLoader());

You will also need to make sure that the DuckDB C/C++ library for your system has been installed. You can find out how to do this on the DuckDB installation documentation page. The library also includes a provisioner, which can be used as follows:

deno run -A jsr:@dringtech/lume-duck/provisioner

The `resultTable` filter

The filter presents query results as a text-formatted response table.

To use the loader, add the following code to _config.ts.

site.filter("resultTable", resultTable);

This can be used as follows:

{{ query() |> resultTable }}

Examples of this are shown below.

Examples

Example: `simple`

The page data item simple was instantiated from simple.sql. It contains the following query:

SELECT 1 AS number;

The query call simple(), returns the following result:

+--------+
| number |
+--------+
| 1      |
+--------+

Example: `params.incrementing`

The page data item params.incrementing was instantiated from params/incrementing.sql. It contains the following query:

SELECT ?::INTEGER AS number, ?::STRING AS string;

The query call params.incrementing(1, Test), returns the following result:

+--------+--------+
| number | string |
+--------+--------+
| 1      | Test   |
+--------+--------+

Example: `params.positional`

The page data item params.positional was instantiated from params/positional.sql. It contains the following query:

SELECT $2::STRING as second, $1::INTEGER as first;

The query call params.positional(1, Test), returns the following result:

+--------+-------+
| second | first |
+--------+-------+
| Test   | 1     |
+--------+-------+

Example: `local.csv`

The page data item local.csv was instantiated from local/csv.sql. It contains the following query:

SELECT name, id
FROM 'data/sample.csv'
ORDER BY name DESC;

The query call local.csv(), returns the following result:

+-------+----+
| name  | id |
+-------+----+
| Two   | 2  |
| Three | 3  |
| One   | 1  |
| Four  | 4  |
+-------+----+

Example: `local.parquet`

The page data item local.parquet was instantiated from local/parquet.sql. It contains the following query:

SELECT pq.*
FROM 'data/sample.parquet'
AS pq;

The query call local.parquet(), returns the following result:

+----+-------+
| id | name  |
+----+-------+
| 1  | One   |
| 2  | Two   |
| 3  | Three |
| 4  | Four  |
+----+-------+

Example: `filtered.parquet`

The page data item filtered.parquet was instantiated from filtered/parquet.sql. It contains the following query:

SELECT name
FROM 'data/sample.parquet'
WHERE id == ?;

The query call filtered.parquet(2), returns the following result:

+------+
| name |
+------+
| Two  |
+------+

Example: `filtered.auto`

The page data item filtered.auto was instantiated from filtered/auto.sql. It contains the following query:

SELECT id, name
FROM 'data/sample.csv'
WHERE id <= ?;

The query call filtered.auto(2n), returns the following result:

+----+------+
| id | name |
+----+------+
| 1  | One  |
| 2  | Two  |
+----+------+

Example: `filtered.cast`

The page data item filtered.cast was instantiated from filtered/cast.sql. It contains the following query:

SELECT id, name
FROM read_csv('data/sample.csv',
  columns = {
    'id': 'INTEGER',
    'name': 'VARCHAR'
  }
)
WHERE id <= ?;

The query call filtered.cast(2), returns the following result:

+----+------+
| id | name |
+----+------+
| 1  | One  |
| 2  | Two  |
+----+------+

Example: `remote.describe`

The page data item remote.describe was instantiated from remote/describe.sql. It contains the following query:

SELECT column_name, column_type
FROM (
  DESCRIBE SELECT *
  FROM read_parquet(
    'https://cdn.statically.io/gh/economic-analytics/edd/main/data/parquet/LMS.parquet'
  )
);

The query call remote.describe(), returns the following result:

+------------------+-------------+
| column_name      | column_type |
+------------------+-------------+
| dataset          | VARCHAR     |
| dates.date       | DATE        |
| dates.freq       | VARCHAR     |
| variable.code    | VARCHAR     |
| value            | DOUBLE      |
| variable.name    | VARCHAR     |
| variable.preunit | VARCHAR     |
| variable.unit    | VARCHAR     |
+------------------+-------------+

Example: `remote.reshape`

The PIVOT statement is not yet supported as a prepared statement by the DuckDB library that we're using. This is currently a workaround defined in remote.reshape.ts.

The query call remote.reshape(), returns the following result:

+---------------+------+------+------+
| period        | ZXDI | ZXDM | ZXDQ |
+---------------+------+------+------+
| 1577836800000 | 5.1  | 13.8 | 5.7  |
| 1580515200000 | 5    | 13.6 | 5.7  |
| 1583020800000 | 5    | 14.4 | 6.5  |
| 1585699200000 | 5.1  | 15.1 | 7.4  |
| 1588291200000 | 5    | 15.3 | 7.6  |
| 1590969600000 | 5    | 16   | 7.6  |
| 1593561600000 | 5    | 16.5 | 7.2  |
| 1596240000000 | 5.1  | 16.7 | 6.8  |
| 1598918400000 | 5.2  | 16.5 | 6.7  |
| 1601510400000 | 5.1  | 16.2 | 6.5  |
+---------------+------+------+------+

Hints and tips

When loading CSV files, it pays to use the read_csv function and explicitly set types for columns. The type inference tends to escalate: so integers are rendered as BIGINT (64-bit) rather than INTEGER (32-bit) values. The major issue with this is that any parameters will need to be provided as JavaScript BigInt numbers (e.g. 12n rather than 12). You can, also use SQL typecasting to deal with this, although it's likely to be less memory-efficient.

The location of data files loaded via SQL is relative to the project root (or at least, the location of the _config.ts file). These should be kept separate from the Lume data files, to prevent Lume loading them into memory and defeating the object of using DuckDB in the first place!

DuckDB

The duckDbLoader dataLoader

The resultTable filter

Examples

Example: simple

Example: params.incrementing

Example: params.positional

Example: local.csv

Example: local.parquet

Example: filtered.parquet

Example: filtered.auto

Example: filtered.cast

Example: remote.describe

Example: remote.reshape