Why do you only test 2x replication!?!
There are reasons SSD guys like me usually test Ceph with 2x replication: SSDs are more reliable than spinners, performance is better with 2x, etc. But what if you absolutely need 3x replication minimum? How does that impact performance on our super-fast all-NVMe Ceph reference architecture? I'm glad you asked.
This blog is a quick performance review of our new Intel® Purley-based Ceph RA featuring our fastest NVMe drive, the Micron 9200 MAX (6.4TB).
Our new reference architecture uses Red Hat Ceph Storage 3.0, which is based on Ceph Luminous (12.2.1). Testing in the RA is limited to FileStore performance since that is the currently supported storage engine for RHCS 3.0.
Performance is impacted exactly as one would expect when comparing 2x replication to 3x. 4KB random write IOPS decrease by about 35%, reads stay exactly the same, and 70/30 IOPS decrease by around 25%.
| Block Workload | 2x Replication IOPS | 3x Replication IOPS | 2x Replication Average Latency | 3x Replication Average Latency |
|---|---|---|---|---|
| 4KB Random Read | 2 million | 2 million | 1.6 ms | 1.6 ms |
| 4KB Random Write | 363,000 | 237,000 | 5.3 ms | 8.1 ms |
| 4KB 70/30 R/W | 781,000 | 577,000 | 1.4 ms read / 3.5 ms write | 1.7 ms read / 5.4 ms write |
This solution is optimized for block performance. Random small-block testing using the RADOS Block Device (RBD) driver in Linux saturates the Intel Xeon Platinum 8168 (Purley) processors in a 2-socket storage node.
With 10 drives per storage node, this architecture has a usable storage capacity of 232TB that can be scaled out by adding additional 1U storage nodes.
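The quoted capacity can be sanity-checked with a little arithmetic. The 80 OSDs described below, at 2 OSDs per drive and 10 drives per node, imply 4 storage nodes and 40 drives; 40 × 6.4 TB is 256 TB (decimal), which is roughly 232 TiB, matching the quoted 232TB figure before replication overhead. A minimal check, assuming that interpretation of the units:

```python
# Sanity-check the quoted 232TB capacity figure.
# Assumption: 4 nodes is implied by 80 OSDs / (2 OSDs per drive * 10 drives per node).
TB = 1e12      # drive vendors quote decimal terabytes
TiB = 2**40    # Ceph reports binary units

drives = 4 * 10
raw_tib = drives * 6.4 * TB / TiB
print(f"raw capacity: {raw_tib:.1f} TiB")                 # ~232.8 -- matches the quoted 232TB
print(f"usable at 2x: {raw_tib/2:.1f} TiB, at 3x: {raw_tib/3:.1f} TiB")
```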
Reference Design – Hardware
Test Results and Analysis
Ceph Test Methodology
Ceph is configured using FileStore with 2 Object Storage Daemons (OSDs) per Micron 9200 MAX NVMe SSD. A 20GB journal was used for each OSD. With 10 drives per storage node and 2 OSDs per drive, Ceph has 80 total OSDs with 232TB of usable capacity.
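As a sketch, the FileStore and journal sizing described above maps to a ceph.conf fragment along these lines. This is illustrative only, not the RA's actual configuration; only the objectstore choice and the 20GB journal come from the text:

```ini
# Illustrative ceph.conf fragment -- not the RA's published config.
[osd]
osd_objectstore = filestore
osd_journal_size = 20480    ; journal size in MB, i.e. 20 GB per OSD
```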
The Ceph pools tested were created with 8192 placement groups. The 2x replicated pool in Red Hat Ceph 3.0 is tested with 100 RBD images at 75GB each, providing 7.5TB of data on a 2x replicated pool (15TB of total data).
The 3x replicated pool in Red Hat Ceph 3.0 is tested with 100 RBD images at 50GB each, providing 5TB of data on a 3x replicated pool (15TB of total data).
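Note that the two image sizes were chosen so that both pools occupy the same on-disk footprint once replication is included. A quick check of that sizing:

```python
# Both test pools were sized to write the same 15 TB to disk.
def footprint_tb(images: int, image_gb: int, replication: int) -> float:
    """Total on-disk footprint in TB, including replica copies."""
    return images * image_gb * replication / 1000

print(footprint_tb(100, 75, 2))  # 2x pool: 7.5 TB of data -> 15.0 TB on disk
print(footprint_tb(100, 50, 3))  # 3x pool: 5 TB of data   -> 15.0 TB on disk
```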
4KB random block performance was measured using the FIO synthetic load generation tool against the RADOS Block Device (RBD) driver.
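For reference, an FIO job of this kind typically looks like the fragment below, using FIO's librbd engine. The pool, client, and image names here are placeholders, and the queue-depth/runtime values are illustrative; the post does not publish the RA's actual job files:

```ini
; Hypothetical fio job file using the librbd engine -- names and
; tunables are placeholders, not the RA's actual parameters.
[global]
ioengine=rbd
clientname=admin
pool=rbdpool
rbdname=image00
rw=randread
bs=4k
iodepth=32
direct=1
time_based=1
runtime=600

[rbd-4k-randread]
```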
RBD FIO 4KB Random Read Performance
4KB Random read performance is essentially identical between a 2x and 3x replicated pool.
RBD FIO 4KB Random Write Performance
With 3x replication, write IOPS are reduced by ~35% compared to a 2x replicated pool, and average latency increases by a similar margin.
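The measured drop tracks the extra replica write closely: each client write now lands on three OSDs instead of two, so with the same CPU-bound backend you would expect roughly 1/3 fewer IOPS. A quick check against the measured numbers:

```python
# Expected vs. measured IOPS impact of moving writes from 2x to 3x replication.
expected_drop = 1 - 2 / 3                 # 3 backend writes instead of 2 -> ~33%
measured_drop = 1 - 237_000 / 363_000     # numbers from the table above
print(f"expected: {expected_drop:.1%}, measured: {measured_drop:.1%}")
```

The measured ~35% is slightly worse than the ~33% replication-traffic ceiling, consistent with the storage-node CPUs already being saturated.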
4KB write performance hits an optimal mix of IOPS and latency at 60 FIO clients: 363k IOPS at 5.3 ms average latency on a 2x replicated pool, and 237k IOPS at 8.1 ms average latency on a 3x replicated pool. At this point, the average CPU utilization on the Ceph storage nodes is over 90%, limiting performance.
RBD FIO 4KB Random 70% Read / 30% Write Performance
The 70/30 random R/W workload's IOPS decrease by about 25% when going from a 2x replicated pool to a 3x replicated pool. Read latencies are close, only slightly higher on the 3x pool. Write latencies are more than 50% higher on the 3x pool.
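Checking those claims against the table above:

```python
# 70/30 workload numbers from the results table above.
iops_drop = 1 - 577_000 / 781_000   # ~26%, i.e. "about 25%"
read_lat_up = 1.7 / 1.4 - 1         # ~21% higher reads on 3x
write_lat_up = 5.4 / 3.5 - 1        # ~54% higher writes on 3x, i.e. "50%+"
print(f"IOPS drop: {iops_drop:.1%}")
print(f"read latency up: {read_lat_up:.0%}, write latency up: {write_lat_up:.0%}")
```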
Want to know more?
RHCS 3.0 + the Micron 9200 MAX NVMe SSD on the Intel Purley platform is super fast. See the newly published Micron / Red Hat / Supermicro Reference Architecture. I will present our RA and other Ceph tuning and performance topics during my session at OpenStack Summit 2018. More to come. Stay tuned!
Have additional questions about our testing or methodology? Leave a comment below, or email us at ssd@shopping-wonder.com.