DuckDB
DuckDB is a fast, in-process analytical database. This means it is highly optimised for queries (rather than updates) and can work incredibly fast on very large datasets.
As more data is rendered in a Lume site, the build can slow down significantly. It may be beneficial under these circumstances to use DuckDB to access data. This is particularly true if the data in question is very large or if it is used to generate very large numbers of pages.
The lume-duck module provides some utilities to integrate DuckDB into Lume.
You can import the library by adding the following line to your _config.ts file.
import { duckDbLoader, resultTable } from "jsr:@dringtech/lume-duck";
The duckDbLoader data loader
The module includes a loader which converts .sql files in the site _data/ directories into callable functions which execute the included SQL.
These are prepared statements, which can be called with parameters to customise the query.
Some examples are given below.
To use the loader, add the following code to _config.ts.
site.loadData([".sql"], duckDbLoader());
You will also need to make sure that the DuckDB C/C++ library for your system has been installed. You can find out how to do this on the DuckDB installation documentation page. The lume-duck module also includes a provisioner, which can be used as follows:
deno run -A jsr:@dringtech/lume-duck/provisioner
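With the loader registered, any .sql file under _data/ becomes a callable page data item. As a minimal sketch (the greeting.sql file name, query and parameter are illustrative, not part of the module), a file at _data/greeting.sql containing:
SELECT 'Hello, ' || ?::STRING AS greeting;
would be exposed to pages as a greeting() function: calling greeting("world") executes the prepared statement with "world" bound to the parameter. The examples below follow the same pattern.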
The resultTable filter
The resultTable filter presents query results as a text-formatted table.
To use the filter, add the following code to _config.ts.
site.filter("resultTable", resultTable);
This can be used as follows:
{{ query() |> resultTable }}
Examples of this are shown below.
Examples
Example: simple
The page data item simple was instantiated from simple.sql.
It contains the following query:
SELECT 1 AS number;
The query call simple() returns the following result:
+--------+
| number |
+--------+
| 1      |
+--------+
Example: params.incrementing
The page data item params.incrementing was instantiated from params/incrementing.sql.
It contains the following query:
SELECT ?::INTEGER AS number, ?::STRING AS string;
The query call params.incrementing(1, "Test") returns the following result:
+--------+--------+
| number | string |
+--------+--------+
| 1      | Test   |
+--------+--------+
Example: params.positional
The page data item params.positional was instantiated from params/positional.sql.
It contains the following query:
SELECT $2::STRING as second, $1::INTEGER as first;
The query call params.positional(1, "Test") returns the following result:
+--------+-------+
| second | first |
+--------+-------+
| Test   | 1     |
+--------+-------+
Example: local.csv
The page data item local.csv was instantiated from local/csv.sql.
It contains the following query:
SELECT name, id
FROM 'data/sample.csv'
ORDER BY name DESC;
The query call local.csv() returns the following result:
+-------+----+
| name  | id |
+-------+----+
| Two   | 2  |
| Three | 3  |
| One   | 1  |
| Four  | 4  |
+-------+----+
Example: local.parquet
The page data item local.parquet was instantiated from local/parquet.sql.
It contains the following query:
SELECT pq.*
FROM 'data/sample.parquet'
AS pq;
The query call local.parquet() returns the following result:
+----+-------+
| id | name  |
+----+-------+
| 1  | One   |
| 2  | Two   |
| 3  | Three |
| 4  | Four  |
+----+-------+
Example: filtered.parquet
The page data item filtered.parquet was instantiated from filtered/parquet.sql.
It contains the following query:
SELECT name
FROM 'data/sample.parquet'
WHERE id == ?;
The query call filtered.parquet(2) returns the following result:
+------+
| name |
+------+
| Two  |
+------+
Example: filtered.auto
The page data item filtered.auto was instantiated from filtered/auto.sql.
It contains the following query:
SELECT id, name
FROM 'data/sample.csv'
WHERE id <= ?;
The query call filtered.auto(2n) returns the following result:
+----+------+
| id | name |
+----+------+
| 1  | One  |
| 2  | Two  |
+----+------+
Example: filtered.cast
The page data item filtered.cast was instantiated from filtered/cast.sql.
It contains the following query:
SELECT id, name
FROM read_csv('data/sample.csv',
columns = {
'id': 'INTEGER',
'name': 'VARCHAR'
}
)
WHERE id <= ?;
The query call filtered.cast(2) returns the following result:
+----+------+
| id | name |
+----+------+
| 1  | One  |
| 2  | Two  |
+----+------+
Example: remote.describe
The page data item remote.describe was instantiated from remote/describe.sql.
It contains the following query:
SELECT column_name, column_type
FROM (
DESCRIBE SELECT *
FROM read_parquet(
'https://cdn.statically.io/gh/economic-analytics/edd/main/data/parquet/LMS.parquet'
)
);
The query call remote.describe() returns the following result:
+------------------+-------------+
| column_name      | column_type |
+------------------+-------------+
| dataset          | VARCHAR     |
| dates.date       | DATE        |
| dates.freq       | VARCHAR     |
| variable.code    | VARCHAR     |
| value            | DOUBLE      |
| variable.name    | VARCHAR     |
| variable.preunit | VARCHAR     |
| variable.unit    | VARCHAR     |
+------------------+-------------+
Example: remote.reshape
The PIVOT statement is not yet supported as a prepared statement by the DuckDB library that we're using.
This example is currently implemented as a workaround defined in remote.reshape.ts.
The query call remote.reshape() returns the following result:
+---------------+------+------+------+
| period        | ZXDI | ZXDM | ZXDQ |
+---------------+------+------+------+
| 1577836800000 | 5.1  | 13.8 | 5.7  |
| 1580515200000 | 5    | 13.6 | 5.7  |
| 1583020800000 | 5    | 14.4 | 6.5  |
| 1585699200000 | 5.1  | 15.1 | 7.4  |
| 1588291200000 | 5    | 15.3 | 7.6  |
| 1590969600000 | 5    | 16   | 7.6  |
| 1593561600000 | 5    | 16.5 | 7.2  |
| 1596240000000 | 5.1  | 16.7 | 6.8  |
| 1598918400000 | 5.2  | 16.5 | 6.7  |
| 1601510400000 | 5.1  | 16.2 | 6.5  |
+---------------+------+------+------+
Hints and tips
When loading CSV files, it pays to use the read_csv function and explicitly set types for columns. The type inference tends to escalate: integers are read as BIGINT (64-bit) rather than INTEGER (32-bit) values. The major issue with this is that any parameters will need to be provided as JavaScript BigInt numbers (e.g. 12n rather than 12). You can also use SQL typecasting to deal with this, although it's likely to be less memory-efficient.
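For instance, here is a sketch of the typecasting approach (re-using the sample CSV from the examples above; the assumption is that the explicit cast lets the parameter be passed as a plain number rather than a BigInt):
SELECT id, name
FROM 'data/sample.csv'
WHERE id <= ?::INTEGER;
The CSV columns are still inferred as BIGINT, which is why the read_csv approach shown in the filtered.cast example is likely to be more memory-efficient.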
The location of data files loaded via SQL is relative to the project root (or at least, the location of the _config.ts file). These should be kept separate from the Lume data files, to prevent Lume loading them into memory and defeating the object of using DuckDB in the first place!
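As an illustration of one possible layout (based on the examples above; the data/ directory name is just a convention), the SQL files live under _data/ while the files they read sit in a separate directory at the project root:
_config.ts
_data/
  local/
    csv.sql   (reads 'data/sample.csv')
data/
  sample.csv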