§ Tool · Analytical

ClickHouse

An open-source columnar database for real-time analytics on very large datasets.

Stars: 47,462
Forks: 8,403
Latest release: v26.1.12.23-stable 1w ago
Last push: yesterday
License: Apache 2
First released: 2016

Website: clickhouse.com · Repository: ClickHouse/ClickHouse · Self-hostable · Free tier · C++

What it is

ClickHouse is an open-source columnar database engineered for high-throughput analytical queries on large datasets. It was built at Yandex (the Russian search engine) starting around 2009 to power its web analytics product, Yandex.Metrica, and was open-sourced under the Apache 2.0 license in 2016. ClickHouse Inc., the commercial company behind it, was founded in the US in 2021.

The pitch: scan hundreds of millions to billions of rows per second, with 5–15× compression, on commodity hardware. The classic use cases are observability, ad tech, clickstream, and any “real-time dashboard over a firehose of events” workload.
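
To make the compression part of the pitch concrete, here is a minimal sketch of why storing each column separately tends to compress better than storing whole rows. It does not use ClickHouse: zlib stands in for the real codecs, the events are synthetic, and every function name is hypothetical. The timestamp column is delta-encoded before compression, a technique ClickHouse also offers as a column codec.

```python
import random
import struct
import zlib


def make_events(n, seed=0):
    """Synthetic events: (timestamp, country, status). All values are made up."""
    rng = random.Random(seed)
    countries = ["US", "DE", "IN", "BR"]
    statuses = [200, 200, 200, 404]
    return [
        (1_700_000_000 + i, rng.choice(countries), rng.choice(statuses))
        for i in range(n)
    ]


def row_layout_size(events):
    """Compressed size when each row's fields are stored next to each other."""
    raw = b"".join(
        struct.pack("<I2sH", ts, country.encode(), status)
        for ts, country, status in events
    )
    return len(zlib.compress(raw))


def column_layout_size(events):
    """Compressed size when each column is stored and compressed separately."""
    timestamps = [e[0] for e in events]
    # Delta-encode timestamps: keep the first value, then store differences.
    deltas = [timestamps[0]] + [b - a for a, b in zip(timestamps, timestamps[1:])]
    columns = [
        struct.pack(f"<{len(deltas)}I", *deltas),
        b"".join(e[1].encode() for e in events),
        struct.pack(f"<{len(events)}H", *(e[2] for e in events)),
    ]
    return sum(len(zlib.compress(col)) for col in columns)


if __name__ == "__main__":
    events = make_events(100_000)
    print("row layout:   ", row_layout_size(events), "bytes")
    print("column layout:", column_layout_size(events), "bytes")
```

The exact ratio depends entirely on the data; the sketch only shows the direction of the effect, not the 5–15× figure quoted above.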

Why people use it

Speed. Vectorized execution over compressed columnar data. Single-node ClickHouse on a modern machine routinely scans hundreds of millions of rows per second. A cluster can reach billions per second.
Compression. Typical compression ratios of 5–15× on real-world data. This makes storage cheap and lets more data fit in the page cache.
Cost. For many analytical workloads, ClickHouse is one to two orders of magnitude cheaper than Snowflake or BigQuery on a per-query basis.
Materialized views. Incremental refresh on insert. Aggregate views stay up to date without batch refresh jobs.
Streaming ingest. Kafka/Kinesis integrations, native streaming inserts, and a strong story for sub-second-fresh analytics.
Wide ecosystem. Connectors for dbt, Airflow, Grafana, Superset, and every popular BI tool.
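
The "incremental refresh on insert" idea behind materialized views can be sketched in a few lines. This is a plain-Python simulation, not ClickHouse code: the class names and the page-view schema are hypothetical. The point it illustrates is that the view only ever processes the newly inserted batch, yet its aggregate matches what a full rescan would produce.

```python
from collections import Counter, defaultdict


class DailyCountsView:
    """Stand-in for an aggregating materialized view: views per (day, url)."""

    def __init__(self):
        self.counts = defaultdict(int)

    def on_insert(self, batch):
        # Only the newly inserted rows are processed, never the whole table.
        for day, url in batch:
            self.counts[(day, url)] += 1


class PageViewsTable:
    """Stand-in for the source table that the view is attached to."""

    def __init__(self):
        self.rows = []
        self.views = []

    def insert(self, batch):
        self.rows.extend(batch)
        for view in self.views:
            view.on_insert(batch)


def full_recompute(table):
    """What a batch refresh job would do instead: rescan every row."""
    return Counter(table.rows)


if __name__ == "__main__":
    table = PageViewsTable()
    view = DailyCountsView()
    table.views.append(view)
    table.insert([("2024-01-01", "/a"), ("2024-01-01", "/b")])
    table.insert([("2024-01-01", "/a"), ("2024-01-02", "/a")])
    print(dict(view.counts) == dict(full_recompute(table)))
```

The trade-off this design implies is that the view does work on every insert, in exchange for reads that never need to touch the raw rows.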

When to use ClickHouse

Real-time analytics over event streams (clickstream, ad tech, observability, product analytics).
Time-series at scale beyond what TimescaleDB or InfluxDB can handle.
Replacement for Snowflake / BigQuery when query cost is the bottleneck.
Workloads needing streaming ingest plus sub-second analytical queries.
Logs and metrics platforms (Highlight, PostHog, Grafana Cloud all use ClickHouse internally).

When not to use ClickHouse

OLTP workloads. ClickHouse is not designed for single-row updates or point reads. Use Postgres.
Strong-consistency requirements. Replication is asynchronous; ZooKeeper or ClickHouse Keeper coordinates it, but you don’t get linearizable writes.
Small datasets. If your data fits in a Postgres or DuckDB instance, ClickHouse is operational overhead you don’t need.
High-concurrency point lookups. ClickHouse is optimized for analytical scans, not thousands of concurrent users hitting the same indexed rows.
Apps that need traditional ACID transactions across many rows. ClickHouse’s transaction support is limited.
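
One reason point reads are a poor fit is the sparse primary index used by ClickHouse's MergeTree tables: the index stores one entry per block of rows (a "granule", 8192 rows by default) rather than one entry per row. The sketch below is a simplified model of that idea, assuming unique, sorted integer keys; the class and method names are hypothetical and it ignores columns, compression, and disk I/O.

```python
import bisect


class SparseIndex:
    """Simplified model of a sparse index over sorted, unique keys."""

    def __init__(self, sorted_keys, granularity=8192):
        self.keys = sorted_keys
        self.granularity = granularity
        # One mark per granule: the first key of each block of rows.
        self.marks = sorted_keys[::granularity]

    def lookup(self, key):
        """Return (found, rows_scanned) for a single-key lookup."""
        # Find the last granule whose first key is <= the requested key.
        granule_no = bisect.bisect_right(self.marks, key) - 1
        if granule_no < 0:
            return False, 0
        start = granule_no * self.granularity
        granule = self.keys[start:start + self.granularity]
        # The index cannot narrow further, so the whole granule is scanned.
        return key in granule, len(granule)


if __name__ == "__main__":
    index = SparseIndex(list(range(0, 100_000, 2)))
    print(index.lookup(12_346))
```

In this model a lookup for one row still reads a full granule, which is cheap for a scan over millions of rows but wasteful when repeated for thousands of concurrent single-row requests.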

Notable trade-offs

SQL dialect quirks. ClickHouse SQL is mostly standard but has its own array functions, custom syntax for some operations, and quirks around NULL handling. Tools that assume Postgres or MySQL SQL won’t always work without adjustments.
Joins are weaker than in row stores. Distributed joins exist but are expensive. Schemas tend toward denormalization.
No traditional transactions. Atomic batch inserts, but no multi-statement transactions.
Replication is async. Eventually consistent. For zero data loss, careful configuration is required.
Operationally complex at scale. Multi-node ClickHouse with replication, sharding, and Keeper coordination has a real learning curve. Smaller deployments are simpler.
Schema evolution. Adding columns is free; modifying or dropping is more expensive than in row stores.
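
Because the batch insert is the unit of atomicity noted above, a common client-side pattern is to accumulate rows and send them as fewer, larger inserts. The following is a minimal sketch of such a buffer, not part of any ClickHouse client library: `InsertBuffer` is a hypothetical helper, and `sink` is an assumed callable standing in for whatever function actually performs the insert.

```python
from typing import Callable, List


class InsertBuffer:
    """Collects rows and hands them to `sink` in batches of up to max_rows."""

    def __init__(self, sink: Callable[[List[tuple]], None], max_rows: int = 10_000):
        self._sink = sink
        self._rows: List[tuple] = []
        self.max_rows = max_rows

    def add(self, row: tuple) -> None:
        self._rows.append(row)
        if len(self._rows) >= self.max_rows:
            self.flush()

    def flush(self) -> None:
        """Send whatever is buffered as one batch; does nothing when empty."""
        if not self._rows:
            return
        batch, self._rows = self._rows, []
        self._sink(batch)


if __name__ == "__main__":
    sent = []
    buffer = InsertBuffer(sent.append, max_rows=3)
    for i in range(7):
        buffer.add((i, f"event-{i}"))
    buffer.flush()  # send the final partial batch
    print([len(batch) for batch in sent])
```

A real deployment would also want a time-based flush and error handling around the sink; both are left out to keep the sketch short.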

Ecosystem

ClickHouse Cloud. The managed service from ClickHouse Inc. — serverless with separated compute/storage on S3.
Altinity. Commercial support, Altinity.Cloud, and a strong Kubernetes operator.
Tinybird. Real-time data API built on ClickHouse, popular with engineers who want to expose ClickHouse-backed APIs without the ops work.
chDB. An in-process ClickHouse, similar to DuckDB. Ships as a library.
Self-hosting. Docker images and Kubernetes operators (Altinity, official) make self-hosting realistic. Production deployments still need real ops investment.

§ 01 · Primary Use Cases

3 listed

Analytics

Aggregations and scans over large datasets.

Time-series

Metrics, IoT, financial ticks, monitoring data.

Log storage

High-throughput append-mostly event ingestion.

§ 02 · Alternatives

Alternatives to ClickHouse.

DuckDB

TimescaleDB

§ 03 · Often Used With

Companions, not replacements

PostgreSQL

The world's most advanced open-source relational database.

21k ★

§ 04 · Head-to-Head

6 pairs

Comparisons featuring ClickHouse.

GitHub data refreshed yesterday. Tool description is editorial, written for the directory.
