ZSTD: The Best Compression Algorithm

ZSTD Compression: A Brief Overview

ZSTD is a fast, high-compression algorithm designed for a wide range of use cases. It offers several advantages over older compression methods, making it an attractive choice for many applications.

Advantages of ZSTD:

  • Speed: ZSTD is renowned for its fast decompression and compression speeds, making it suitable for real-time applications where latency is a concern.
  • High Compression Ratio: It achieves excellent compression ratios, reducing file sizes significantly and saving storage space.
  • Versatility: ZSTD can be used effectively for various data types and sizes, from small metadata to large datasets.
  • Low Memory Usage: The algorithm is designed to be memory-efficient, making it suitable for environments with limited resources.
  • Scalability: ZSTD supports 22 compression levels (1 through 22), allowing you to trade speed for compression ratio based on your specific needs. Lower levels compress faster; higher levels produce smaller output.

Use Cases for ZSTD:

  • Data Archiving: ZSTD is ideal for long-term data storage, reducing the amount of storage space required.
  • Network Transmission: It can be used to compress data before transmission over networks, improving bandwidth utilization.
  • Database Compression: ZSTD can be employed to compress data within databases such as Doris DB, enhancing performance and reducing storage costs.
  • Real-time Applications: Its speed and low latency make it suitable for applications like real-time data analysis.
  • Cloud Storage: ZSTD can be used to compress data before storing it in cloud storage services, reducing storage costs.

For measured results, see "Compression Comparison Benchmarks: zstd vs brotli vs pigz vs bzip2 vs xz".

In summary, ZSTD’s speed, compression ratio, versatility, and efficiency make it a valuable tool for a wide range of applications. Its ability to balance performance and compression makes it a popular choice for modern data management and transmission tasks.

Version 1.1

Since Doris DB version 1.1, all queries are executed by the vectorized execution engine by default, and performance is 3-5 times higher than in the previous version. ZSTD was added alongside the original LZ4 compression, further improving the data compression ratio. The release also fixed many performance issues and greatly improved system stability.

The default compression, LZ4, is marginally faster than ZSTD but less flexible. To use ZSTD instead, set the "compression" property in the PROPERTIES clause of the CREATE TABLE statement:

CREATE TABLE t1 (
    id int(11) COMMENT "",
    value varchar(8) COMMENT ""
)
DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 10
PROPERTIES (
    "compression" = "zstd"
);

To see the properties currently set on a table, run SHOW CREATE TABLE with the table's name:

show create table <tablename>;

...
PROPERTIES (
    "replication_allocation" = "tag.location.default: 1",
    "min_load_replica_num" = "-1",
    "is_being_synced" = "false",
    "colocate_with" = "lineitem_orders",
    "storage_medium" = "hdd",
    "storage_format" = "V2",
    "inverted_index_storage_format" = "V1",
    "light_schema_change" = "true",
    "disable_auto_compaction" = "false",
    "enable_single_replica_compaction" = "false",
    "group_commit_interval_ms" = "10000",
    "group_commit_data_bytes" = "134217728"
);
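
If an existing table was created with the default compression, one possible way to move its data to ZSTD is to copy it into a new table. This is only a sketch: it assumes the compression property is fixed when the table is created, which this article does not confirm, and the table name t1_zstd is hypothetical. The schema mirrors the t1 example above.

```sql
-- Hypothetical target table: same schema as t1, created with ZSTD compression.
CREATE TABLE t1_zstd (
    id int(11) COMMENT "",
    value varchar(8) COMMENT ""
)
DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 10
PROPERTIES (
    "compression" = "zstd"
);

-- Copy the existing rows; they are written using the new table's settings.
INSERT INTO t1_zstd SELECT id, value FROM t1;

-- Inspect the properties of the new table.
SHOW CREATE TABLE t1_zstd;
```

Comparing the on-disk size of the two tables after the copy shows how much ZSTD saves for your own data, which is a better guide than generic benchmarks.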

If you would like to know more about the origins of ZSTD, this Facebook blog has plenty of information. Happy reading!