
Boost Ceph block performance

Ryan Meredith | July 2018

Boost Ceph block performance with RHEL 7.5, Ceph Luminous, and the Micron 9200 MAX NVMe SSD

Hi everyone,

Usually a point release in an OS or storage solution is no big deal, but this time is different. I tested Red Hat Enterprise Linux 7.5 and Ceph Luminous 12.2.5, both point releases over the versions used in my previous blog on Bluestore vs. Filestore performance, and found a surprising improvement in block performance.

4KB random write IOPS performance increases by 12%, average latency decreases by 10%, and 99.99% tail latency decreases by 24%.

4KB random read IOPS and average latency are similar, and 99.99% tail latency decreases by 20% to 43%.

4KB Random Block Workload | Read IOPS | Write IOPS | Read Avg. Latency | Write Avg. Latency | Read 99.99% Latency | Write 99.99% Latency
RHEL 7.4 + Ceph 12.2.4 | 2.10 million | 453K | 1.6ms | 7.1ms | 251ms | 89ms
RHEL 7.5 + Ceph 12.2.5 | 2.20 million | 495K | 1.4ms | 6.5ms | 194ms | 67ms

This solution is optimized for block performance. Random small block testing using the Rados Block Driver in Linux saturates the Intel Xeon Platinum 8168 (Purley) processors in the dual-socket storage nodes.

With 4 storage nodes and 10 drives per storage node, this architecture has a usable storage capacity of 232TB that can be scaled out by adding additional 1U storage nodes.

Reference Design – Hardware

SuperMicro switches, monitor nodes, and storage nodes

Test Results and Analysis

Ceph Test Methodology

Ceph Luminous (12.2.4 and 12.2.5) is configured with Bluestore with 2 OSDs per Micron 9200 MAX NVMe SSD. RocksDB and WAL data are stored on the same partition as data.

There are 10 drives per storage node and 2 OSDs per drive, for 80 total OSDs with 232TB of usable capacity.
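For readers who want to reproduce a similar layout, here's a minimal sketch of how two colocated-Bluestore OSDs could be carved out of each 9200 MAX with LVM and ceph-volume. The device path and volume group name are placeholders, and the reference architecture may have split the drives differently:

```python
#!/usr/bin/env python3
"""Sketch: provision two Bluestore OSDs per NVMe SSD with colocated RocksDB/WAL.

Assumptions (not taken from this blog): the drive appears as /dev/nvme0n1 and
the Luminous ceph-volume tool is available on the storage node. Splitting the
drive with LVM is one common way to get two OSDs per device.
"""
import subprocess

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

DEVICE = "/dev/nvme0n1"   # hypothetical device name
VG = "ceph-nvme0"         # hypothetical volume group name

# One volume group spanning the drive, split into two equal logical volumes,
# one per OSD.
run(["pvcreate", DEVICE])
run(["vgcreate", VG, DEVICE])
for lv in ("osd0", "osd1"):
    run(["lvcreate", "-l", "50%VG", "-n", lv, VG])
    # No --block.db / --block.wal arguments: RocksDB and the WAL stay on the
    # same Bluestore data volume, matching the configuration described above.
    run(["ceph-volume", "lvm", "create", "--bluestore", "--data", f"{VG}/{lv}"])
```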

The Ceph storage pool tested was created with 8192 placement groups and 2x replication. Performance was tested with 100 RBD images at 75GB each, providing 7.5TB of data on a 2x replicated pool (15TB of total data).
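If you want to mirror this setup, a minimal sketch might look like the following. The pool and image names are hypothetical; the blog doesn't list the exact commands we used.

```python
#!/usr/bin/env python3
"""Sketch: recreate the test pool and RBD images described above.

Assumes admin access to the cluster and a pool named "rbd_bench" (hypothetical).
Note: 8192 PGs at 2x replication across 80 OSDs works out to roughly
8192 * 2 / 80 ≈ 205 PGs per OSD, so the Luminous default mon_max_pg_per_osd
limit may need to be raised before pool creation succeeds.
"""
import subprocess

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

POOL = "rbd_bench"   # hypothetical pool name
PGS = "8192"
IMAGES = 100
IMAGE_SIZE = "75G"

run(["ceph", "osd", "pool", "create", POOL, PGS, PGS, "replicated"])
run(["ceph", "osd", "pool", "set", POOL, "size", "2"])
run(["ceph", "osd", "pool", "application", "enable", POOL, "rbd"])

# 100 images x 75GB = 7.5TB of data, 15TB after 2x replication.
for i in range(IMAGES):
    run(["rbd", "create", f"{POOL}/image-{i:03d}", "--size", IMAGE_SIZE])
```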

4KB random block performance was measured using FIO against the Rados Block Driver. We used 10 load generation servers (dual-CPU Xeon servers with 50GbE networking) and ran multiple FIO processes per load generation server. Each FIO process accessed a unique RBD image, and FIO processes were distributed evenly across the 10 load generation servers. For example, the 100 FIO client test used 10 FIO processes per load generation server.

We are CPU-limited in all tests, even with 2x Intel 8168 CPUs per storage node. All tests were run 3 times for 10 minutes each, with a 5-minute ramp-up per test.
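To illustrate the load-generation layout described in the two paragraphs above, here's a rough sketch of fanning FIO processes out across the load-generation servers with fio's librbd ioengine. Hostnames, pool/image names, queue depth, and SSH-based launching are assumptions for illustration; only the 10-minute runtime and 5-minute ramp-up come from our methodology.

```python
#!/usr/bin/env python3
"""Sketch: fan 100 FIO processes out across 10 load-generation servers.

Each FIO process targets a unique RBD image through fio's librbd ioengine.
This example runs the 4KB random write workload; the random read tests would
swap --rw=randwrite for --rw=randread.
"""
import subprocess

SERVERS = [f"loadgen{n:02d}" for n in range(1, 11)]   # hypothetical hostnames
POOL = "rbd_bench"                                    # hypothetical pool name
CLIENTS = 100                                         # total FIO processes
IODEPTH = 32                                          # example queue depth

def fio_cmd(image):
    return (
        "fio --name=randwrite --ioengine=rbd --clientname=admin "
        f"--pool={POOL} --rbdname={image} "
        f"--rw=randwrite --bs=4k --iodepth={IODEPTH} "
        "--time_based --ramp_time=300 --runtime=600 --output-format=json"
    )

# Spread clients evenly: with 100 clients and 10 servers, 10 processes each.
procs = []
for i in range(CLIENTS):
    server = SERVERS[i % len(SERVERS)]
    image = f"image-{i:03d}"
    procs.append(subprocess.Popen(["ssh", server, fio_cmd(image)]))

for p in procs:
    p.wait()
```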

RBD FIO 4KB Random Write Performance: RHEL 7.4 + Ceph 12.2.4 vs. RHEL 7.5 + Ceph 12.2.5

 

Blue bar graph showing 4KB random write IOPS + average latency

RHEL 7.5 + Ceph Luminous 12.2.5 provides a 12% increase in IOPS and a 10% decrease in average latency.

Blue bar graph showing 4KB random write IOPS + tail latency

Tail latency is improved with RHEL 7.5 and Ceph Luminous 12.2.5, decreasing by 25% at 100 FIO clients.

RBD FIO 4KB Random Read Performance: RHEL 7.4 + Ceph 12.2.4 vs. RHEL 7.5 + Ceph 12.2.5

Blue bar graph showing 4KB random read IOPS + average latency

4KB random read performance is similar between RHEL 7.4 + Ceph Luminous 12.2.4 and RHEL 7.5 + Ceph Luminous 12.2.5. There's a slight increase in IOPS, with a maximum of 2.23 million IOPS.

Blue bar graph showing Ceph Luminous 4KB random read performance

Tail latency is improved with RHEL 7.5 and Ceph Luminous 12.2.5, decreasing by 43% at queue depth 16 and 23% at queue depth 32.

Would You Like to Know More?

Ceph + the Micron 9200 MAX NVMe SSD on the Intel Purley platform is super fast. The latest reference architecture for Micron Accelerated Ceph Storage Solutions is available now. I presented details about the reference architecture and other Ceph tuning and performance topics during my session at OpenStack Summit 2018. A recording of my talk can be found here.

Have additional questions about our testing or methodology? Email us at ssd@shopping-wonder.com.

Director, Storage Solutions Architecture

Ryan Meredith

Ryan Meredith is director of Data Center Workload Engineering for Micron's Storage Business Unit, testing new technologies to help build Micron's thought leadership and awareness in fields like AI and NVMe-oF/TCP, along with all-flash software-defined storage technologies.