How to Configure a High-Availability Ceph Storage Cluster using NVMe Dedicated Servers

Learn how to deploy a High-Availability Ceph storage cluster across three bare-metal NVMe dedicated servers. Build unbreakable, hyper-fast enterprise storage.

Illustration of a high-availability Ceph storage cluster with three interconnected NVMe servers

For enterprise databases, virtualization platforms (like Proxmox), and massive AI datasets, a single point of failure is unacceptable. If a physical drive dies and takes your database offline, your business stops.

The industry standard for unbreakable, scalable, and hyper-fast redundancy is Ceph. Ceph is a distributed storage platform that replicates your data across multiple physical servers. If a drive—or an entire server—goes up in smoke, the cluster heals itself automatically with zero downtime.

However, because Ceph constantly replicates data across the network, deploying it on slow SATA drives or 1Gbps networks results in crippling latency. In this tutorial, we will show DevOps engineers how to link three iDatam bare-metal servers to create an ultra-fast High-Availability (HA) Ceph cluster.

By running this on our Storage Dedicated Servers equipped with PCIe Gen 5 NVMe drives and unmetered 100Gbps internal network uplinks, you ensure that replication traffic never becomes the bottleneck: the network can move data as fast as the drives can write it.

What You'll Learn

  • How to prepare hostnames and a private backend network across three bare-metal servers

  • How to install cephadm and bootstrap the first Monitor and Manager daemons

  • How to join the remaining servers and provision their NVMe drives as OSDs

  • How to verify that the finished cluster reports HEALTH_OK

Prerequisites

To build a true High-Availability cluster, you need:

  • Three (3) Bare-Metal Servers: We will name them ceph-server1, ceph-server2, and ceph-server3.

  • Two Network Interfaces per Server: One for public internet access, and one connected to a high-speed (10Gbps or 100Gbps) private backend network for cluster replication.

  • Empty NVMe Drives: At least one raw, unformatted NVMe drive on each server (e.g., /dev/nvme1n1) dedicated purely to Ceph (see the quick check after this list).

  • Ubuntu 22.04 LTS or 24.04 LTS installed on the primary OS drive.
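
Before you begin, confirm on each server that the spare drive really is raw and unpartitioned, as required above. A minimal check, assuming the spare drive appears as /dev/nvme1n1 (adjust the device name to match your hardware):

bash

# The FSTYPE and MOUNTPOINT columns should be empty for the Ceph drive
lsblk -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINT /dev/nvme1n1

If the drive carries an old partition table or filesystem, wipe it first with sudo wipefs --all /dev/nvme1n1 (destructive, so double-check the device name).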

Step 1: Configure Hostnames and Internal Networking

Ceph relies heavily on hostname resolution across the internal network. Connect to all three servers and update their /etc/hosts files.

On all three servers, edit the hosts file:

bash

sudo nano /etc/hosts
                                

Add the private backend IPs of all your servers. It should look like this:

plaintext

10.0.0.11 ceph-server1
10.0.0.12 ceph-server2
10.0.0.13 ceph-server3
                                

Verify that the servers can ping each other via hostname on the private network:

bash

ping -c 3 ceph-server2
                                

Step 2: Install Docker and Cephadm on Server 1

Modern Ceph deployments use cephadm, which deploys Ceph components as containerized services for easier upgrades and management. We will bootstrap the cluster from ceph-server1. Because every daemon runs in a container, Docker (or Podman) and Python 3 must also be installed on ceph-server2 and ceph-server3 before they can join the cluster.

Log into ceph-server1 and install the prerequisites:

bash

sudo apt update && sudo apt upgrade -y
sudo apt install curl docker.io python3 -y
                                

Next, fetch the cephadm standalone script and make it executable:

bash

curl --silent --remote-name --location https://github.com/ceph/ceph/raw/quincy/src/cephadm/cephadm
chmod +x cephadm
sudo ./cephadm add-repo --release quincy
sudo apt update
sudo apt install cephadm ceph-common -y
                                

Step 3: Bootstrap the Ceph Cluster

Now, we initialize the cluster on ceph-server1. You must specify the private IP address of ceph-server1 so Ceph knows which network to use for cluster communication.

bash

sudo cephadm bootstrap --mon-ip 10.0.0.11
                                

This process takes a few minutes. It deploys the initial Monitor (MON) and Manager (MGR) daemons. When it finishes, the terminal will output a success message containing the URL for the Ceph Dashboard and your auto-generated admin password. Save these credentials!
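
The tail of the output looks roughly like this (illustrative only; your URL and generated password will differ):

plaintext

Ceph Dashboard is now available at:
             URL: https://ceph-server1:8443/
            User: admin
        Password: <generated-password>
Bootstrap complete.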

Step 4: Copy SSH Keys and Add Servers 2 & 3

For cephadm to deploy services to ceph-server2 and ceph-server3, it needs passwordless SSH access.

During the bootstrap, cephadm generated a public SSH key. View it and copy it to the other servers:

bash

sudo cat /etc/ceph/ceph.pub
                                

(Copy the output, then log into server 2 and server 3, and add this key to their /root/.ssh/authorized_keys files).
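
For example, on ceph-server2 and ceph-server3 (as root, pasting the full contents of ceph.pub in place of the placeholder):

bash

# Ensure the .ssh directory exists with correct permissions, then append the key
mkdir -p /root/.ssh && chmod 700 /root/.ssh
echo "<paste contents of /etc/ceph/ceph.pub here>" >> /root/.ssh/authorized_keys
chmod 600 /root/.ssh/authorized_keys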

Alternatively, from ceph-server1, you can use ssh-copy-id:

bash

sudo ssh-copy-id -f -i /etc/ceph/ceph.pub root@ceph-server2
sudo ssh-copy-id -f -i /etc/ceph/ceph.pub root@ceph-server3
                                

Now, tell the cluster to adopt the new servers:

bash

sudo ceph orch host add ceph-server2 10.0.0.12
sudo ceph orch host add ceph-server3 10.0.0.13
                                

Verify the servers are attached:

bash

sudo ceph orch host ls
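
With all three servers enrolled, the listing should look roughly like this (addresses and labels are illustrative):

plaintext

HOST          ADDR       LABELS  STATUS
ceph-server1  10.0.0.11  _admin
ceph-server2  10.0.0.12
ceph-server3  10.0.0.13
3 hosts in cluster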
                                

Step 5: Provision the NVMe Drives as OSDs

Your cluster is online, but it has no storage capacity. We need to tell Ceph to use the empty, raw NVMe drives on each server as Object Storage Daemons (OSDs).

First, list all available storage devices across the cluster to find the exact drive paths:

bash

sudo ceph orch device ls
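
Drives that Ceph can consume are reported with AVAILABLE set to Yes; the output resembles the following (sizes and columns vary by version and hardware):

plaintext

HOST          PATH          TYPE  SIZE   AVAILABLE  REJECT REASONS
ceph-server1  /dev/nvme1n1  ssd   3.84T  Yes
ceph-server2  /dev/nvme1n1  ssd   3.84T  Yes
ceph-server3  /dev/nvme1n1  ssd   3.84T  Yes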
                                

Assuming your raw NVMe drive is /dev/nvme1n1 on all three servers, you can add them individually:

bash

sudo ceph orch daemon add osd ceph-server1:/dev/nvme1n1
sudo ceph orch daemon add osd ceph-server2:/dev/nvme1n1
sudo ceph orch daemon add osd ceph-server3:/dev/nvme1n1
                                

(Alternatively, you can tell Ceph to automatically consume all available, unformatted drives using sudo ceph orch apply osd --all-available-devices, but explicit assignment is safer for enterprise setups).
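
Before moving on, you can confirm that one OSD came up on each host and is marked up and in (IDs and weights will differ on your hardware):

bash

sudo ceph osd tree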

Step 6: Verify Cluster Health

Check the status of your new High-Availability NVMe cluster:

bash

sudo ceph -s
                                

You should see health: HEALTH_OK. You now have a 3-server distributed storage cluster that can withstand the loss of an entire server without losing a single byte of data.
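
A healthy status report looks roughly like this (daemon names and counts are illustrative):

plaintext

  cluster:
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ceph-server1,ceph-server2,ceph-server3
    mgr: ceph-server1.abcdef (active), standbys: ceph-server2.ghijkl
    osd: 3 osds: 3 up, 3 in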

The Network Bottleneck Warning

Ceph is incredibly powerful, but it generates massive amounts of backend "east-west" network traffic. Every time you write a file to the cluster, Ceph instantly copies it to the other servers over your internal network.

If you attempt this on a standard 1Gbps or 10Gbps shared cloud network, the replication delay will crush your database performance.
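
You can measure this effect directly with the benchmark tool that ships with Ceph; a minimal sketch using a throwaway pool named bench-test (note that deleting pools requires the mon_allow_pool_delete option to be enabled):

bash

# Create a temporary pool, run a 30-second replicated write benchmark, then clean up
sudo ceph osd pool create bench-test 32
sudo rados bench -p bench-test 30 write --no-cleanup
sudo rados -p bench-test cleanup
sudo ceph osd pool delete bench-test bench-test --yes-i-really-really-mean-it

On a saturated or shared network, the write bandwidth and average latency reported here degrade long before the NVMe drives themselves become the limit.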

iDatam 100Gbps Dedicated Servers

To achieve the sub-millisecond latency required for production databases and VM hosting, deploy your cluster on iDatam's 100Gbps Dedicated Servers. We provide unmetered, non-blocking internal network fabrics so your Ceph cluster can replicate data as fast as your PCIe Gen 5 NVMe drives can write it.

Discover iDatam Dedicated Server Locations

iDatam servers are available around the world, providing diverse options for hosting websites. Each region offers unique advantages, making it easier to choose a location that best suits your specific hosting needs.
