§ Tool · Analytical

DuckDB

An in-process analytical SQL database — "SQLite for analytics."

Stars: 38,260
Forks: 3,243
Latest release: v1.5.2 1mo ago
Last push: yesterday
License: MIT
First released: 2019

duckdb.org → duckdb/duckdb → Self-hostable · Free tier · C++

What it is

DuckDB is an in-process analytical database. Like SQLite, it runs as a library linked into your application — no server, no port. Unlike SQLite, it is columnar and vectorized, designed for analytical SQL over large local datasets. The most common pitch is “SQLite for analytics.”

It was created at CWI Amsterdam (the database research lab that also produced MonetDB) by Hannes Mühleisen and Mark Raasveldt, with first release in 2019 and v1.0 in June 2024. DuckDB Labs, the steward, is based in Amsterdam.

Why people use it

No server. pip install duckdb and run analytical SQL against your Pandas dataframes, Parquet files, CSVs, or remote S3 objects.
Reads everything natively. CSV, Parquet, JSON, Arrow, even direct Postgres or MySQL via extensions. No ingestion step for ad-hoc analysis.
Standard SQL. PostgreSQL-compatible-ish dialect. Window functions, CTEs, full analytical SQL.
Fast. Vectorized execution and columnar storage. For local analytical queries on data up to ~hundreds of GB, DuckDB is competitive with much heavier systems.
Embeddable everywhere. Python, R, JavaScript (including in the browser via Wasm), Rust, Go, Java — first-class bindings.
MIT licensed. Genuinely free, no commercial license clauses.

When to use DuckDB

Data analysis / data science where you currently use Pandas + numpy and want analytical SQL.
Local OLAP over Parquet, CSV, or Arrow files.
ETL pipelines where you don’t want to stand up a server.
Small-medium analytics replacing ClickHouse or Snowflake (think: anything that fits on one machine).
Embedded analytical engines inside applications (BI tools, dashboards, notebook engines).
Running SQL over remote S3 data lakes without spinning up a query engine.

When not to use DuckDB

Multi-user concurrent writes. DuckDB is single-process, single-writer.
Always-on services with thousands of QPS. Use a server-based system.
Petabyte-scale. DuckDB is single-node. Beyond your machine’s memory and disk, you need ClickHouse, BigQuery, or Snowflake.
OLTP. Columnar storage makes single-row inserts and updates relatively slow. Use Postgres.

Notable trade-offs

Single-process. No client/server protocol natively. MotherDuck (the commercial offering) adds remote access, but vanilla DuckDB is in-process.
Memory pressure. Large queries can OOM if intermediate state doesn’t fit. DuckDB has spilling-to-disk for many operators, but not all.
Newer than ClickHouse. Fewer “battle-tested at scale” production deployments. The 1.0 release in 2024 stabilized the format and API; older databases may need migration.
Schema-on-read is convenient but hides issues. Reading a CSV with auto-detection is great until your data has subtle type drift across files.

Ecosystem

MotherDuck. Commercial managed DuckDB with a hybrid local/cloud query model. Founded in 2022 by Jordan Tigani (formerly Google BigQuery).
DuckDB-Wasm. DuckDB compiled to WebAssembly. Runs in the browser, used in tools like Evidence, Observable, and Rill.
pgduckdb. DuckDB as a Postgres extension — query Parquet files from Postgres using DuckDB’s engine.
dbt, Dagster, Streamlit, Evidence. All have first-class DuckDB integrations.
Extensions. DuckDB has its own extension system: spatial, full-text search, JSON, ICU, HTTP filesystem (read directly from S3 / GCS / Azure Blob), and more.

§ 01 · Primary Use Cases

2 listed

Analytics

Aggregations and scans over large datasets.
Embedded

In-process database for mobile, desktop, or CLI apps.

§ 02 · Alternatives

See all →

Alternatives to DuckDB.

§ 03 · Often Used With

Companions, not replacements

§ 04 · Head-to-Head

6 pairs

Comparisons featuring DuckDB.

GitHub data refreshed yesterday. Tool description is editorial, written for the directory.

What it is

Why people use it

When to use DuckDB

When not to use DuckDB

Notable trade-offs

Ecosystem

Analytics

Embedded

Alternatives to DuckDB.

ClickHouse

TimescaleDB

SQLite

PostgreSQL

SQLite

Comparisons featuring DuckDB.