How to Configure a High-Availability Ceph Storage Cluster using NVMe Dedicated Servers

Learn how to deploy a High-Availability Ceph storage cluster across three bare-metal NVMe dedicated servers. Build unbreakable, hyper-fast enterprise storage.

Illustration of a high-availability Ceph storage cluster with three interconnected NVMe servers

For enterprise databases, virtualization platforms (like Proxmox), and massive AI datasets, a single point of failure is unacceptable. If a physical drive dies and takes your database offline, your business stops.

The industry standard for unbreakable, scalable, and hyper-fast redundancy is Ceph. Ceph is a distributed storage platform that replicates your data across multiple physical servers. If a drive—or an entire server—goes up in smoke, the cluster heals itself automatically with zero downtime.

However, because Ceph constantly replicates data across the network, deploying it on slow SATA drives or 1Gbps networks results in crippling latency. In this tutorial, we will show DevOps engineers how to link three iDatam bare-metal servers to create an ultra-fast High-Availability (HA) Ceph cluster.

By running this on our Storage Dedicated Servers equipped with PCIe Gen 5 NVMe drives and unmetered 100Gbps internal network uplinks, you ensure that replication traffic never becomes the bottleneck: the network can move data as fast as the drives can write it.

What You'll Learn

  • How to prepare hostnames and a private backend network across three bare-metal servers

  • How to install cephadm and bootstrap the first Monitor and Manager daemons

  • How to join the remaining servers and provision their NVMe drives as OSDs

  • How to verify that the finished cluster reports HEALTH_OK

Prerequisites

To build a true High-Availability cluster, you need:

  • Three (3) Bare-Metal Servers: We will name them ceph-server1, ceph-server2, and ceph-server3.

  • Two Network Interfaces per Server: One for public internet access, and one connected to a high-speed (10Gbps or 100Gbps) private backend network for cluster replication.

  • Empty NVMe Drives: At least one raw, unformatted NVMe drive on each server (e.g., /dev/nvme1n1) dedicated purely to Ceph (see the quick check after this list).

  • Ubuntu 22.04 LTS or 24.04 LTS installed on the primary OS drive.
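
Before you begin, confirm on each server that the spare drive really is raw and unpartitioned, as required above. A minimal check, assuming the spare drive appears as /dev/nvme1n1 (adjust the device name to match your hardware):

bash

# The FSTYPE and MOUNTPOINT columns should be empty for the Ceph drive
lsblk -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINT /dev/nvme1n1

If the drive carries an old partition table or filesystem, wipe it first with sudo wipefs --all /dev/nvme1n1 (destructive, so double-check the device name).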

Step 1: Configure Hostnames and Internal Networking

Ceph relies heavily on hostname resolution across the internal network. Connect to all three servers and update their /etc/hosts files.

On all three servers, edit the hosts file:

bash

sudo nano /etc/hosts
                                

Add the private backend IPs of all your servers. It should look like this:

plaintext

10.0.0.11 ceph-server1
10.0.0.12 ceph-server2
10.0.0.13 ceph-server3
                                

Verify that the servers can ping each other via hostname on the private network:

bash

ping -c 3 ceph-server2
                                

Step 2: Install Docker and Cephadm on Server 1

Modern Ceph deployments use cephadm, which deploys Ceph components as containerized services for easier upgrades and management. We will bootstrap the cluster from ceph-server1. Because every daemon runs in a container, Docker (or Podman) and Python 3 must also be installed on ceph-server2 and ceph-server3 before they can join the cluster.

Log into ceph-server1 and install the prerequisites:

bash

sudo apt update && sudo apt upgrade -y
sudo apt install curl docker.io python3 -y
                                

Next, fetch the cephadm standalone script and make it executable:

bash

curl --silent --remote-name --location https://github.com/ceph/ceph/raw/quincy/src/cephadm/cephadm
chmod +x cephadm
sudo ./cephadm add-repo --release quincy
sudo apt update
sudo apt install cephadm ceph-common -y
                                

Step 3: Bootstrap the Ceph Cluster

Now, we initialize the cluster on ceph-server1. You must specify the private IP address of ceph-server1 so Ceph knows which network to use for cluster communication.

bash

sudo cephadm bootstrap --mon-ip 10.0.0.11
                                

This process takes a few minutes. It deploys the initial Monitor (MON) and Manager (MGR) daemons. When it finishes, the terminal will output a success message containing the URL for the Ceph Dashboard and your auto-generated admin password. Save these credentials!
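
The tail of the output looks roughly like this (illustrative only; your URL and generated password will differ):

plaintext

Ceph Dashboard is now available at:
             URL: https://ceph-server1:8443/
            User: admin
        Password: <generated-password>
Bootstrap complete.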

Step 4: Copy SSH Keys and Add Servers 2 & 3

For cephadm to deploy services to ceph-server2 and ceph-server3, it needs passwordless SSH access.

During the bootstrap, cephadm generated a public SSH key. View it and copy it to the other servers:

bash

sudo cat /etc/ceph/ceph.pub
                                

(Copy the output, then log into server 2 and server 3, and add this key to their /root/.ssh/authorized_keys files).
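
For example, on ceph-server2 and ceph-server3 (as root, pasting the full contents of ceph.pub in place of the placeholder):

bash

# Ensure the .ssh directory exists with correct permissions, then append the key
mkdir -p /root/.ssh && chmod 700 /root/.ssh
echo "<paste contents of /etc/ceph/ceph.pub here>" >> /root/.ssh/authorized_keys
chmod 600 /root/.ssh/authorized_keys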

Alternatively, from ceph-server1, you can use ssh-copy-id:

bash

sudo ssh-copy-id -f -i /etc/ceph/ceph.pub root@ceph-server2
sudo ssh-copy-id -f -i /etc/ceph/ceph.pub root@ceph-server3
                                

Now, tell the cluster to adopt the new servers:

bash

sudo ceph orch host add ceph-server2 10.0.0.12
sudo ceph orch host add ceph-server3 10.0.0.13
                                

Verify the servers are attached:

bash

sudo ceph orch host ls
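
With all three servers enrolled, the listing should look roughly like this (addresses and labels are illustrative):

plaintext

HOST          ADDR       LABELS  STATUS
ceph-server1  10.0.0.11  _admin
ceph-server2  10.0.0.12
ceph-server3  10.0.0.13
3 hosts in cluster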
                                

Step 5: Provision the NVMe Drives as OSDs

Your cluster is online, but it has no storage capacity. We need to tell Ceph to use the empty, raw NVMe drives on each server as Object Storage Daemons (OSDs).

First, list all available storage devices across the cluster to find the exact drive paths:

bash

sudo ceph orch device ls
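
Drives that Ceph can consume are reported with AVAILABLE set to Yes; the output resembles the following (sizes and columns vary by version and hardware):

plaintext

HOST          PATH          TYPE  SIZE   AVAILABLE  REJECT REASONS
ceph-server1  /dev/nvme1n1  ssd   3.84T  Yes
ceph-server2  /dev/nvme1n1  ssd   3.84T  Yes
ceph-server3  /dev/nvme1n1  ssd   3.84T  Yes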
                                

Assuming your raw NVMe drive is /dev/nvme1n1 on all three servers, you can add them individually:

bash

sudo ceph orch daemon add osd ceph-server1:/dev/nvme1n1
sudo ceph orch daemon add osd ceph-server2:/dev/nvme1n1
sudo ceph orch daemon add osd ceph-server3:/dev/nvme1n1
                                

(Alternatively, you can tell Ceph to automatically consume all available, unformatted drives using sudo ceph orch apply osd --all-available-devices, but explicit assignment is safer for enterprise setups).
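
Before moving on, you can confirm that one OSD came up on each host and is marked up and in (IDs and weights will differ on your hardware):

bash

sudo ceph osd tree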

Step 6: Verify Cluster Health

Check the status of your new High-Availability NVMe cluster:

bash

sudo ceph -s
                                

You should see health: HEALTH_OK. You now have a 3-server distributed storage cluster that can withstand the loss of an entire server without losing a single byte of data.
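
A healthy status report looks roughly like this (daemon names and counts are illustrative):

plaintext

  cluster:
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ceph-server1,ceph-server2,ceph-server3
    mgr: ceph-server1.abcdef (active), standbys: ceph-server2.ghijkl
    osd: 3 osds: 3 up, 3 in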

The Network Bottleneck Warning

Ceph is incredibly powerful, but it generates massive amounts of backend "east-west" network traffic. Every time you write a file to the cluster, Ceph instantly copies it to the other servers over your internal network.

If you attempt this on a standard 1Gbps or 10Gbps shared cloud network, the replication delay will crush your database performance.
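
You can measure this effect directly with the benchmark tool that ships with Ceph; a minimal sketch using a throwaway pool named bench-test (note that deleting pools requires the mon_allow_pool_delete option to be enabled):

bash

# Create a temporary pool, run a 30-second replicated write benchmark, then clean up
sudo ceph osd pool create bench-test 32
sudo rados bench -p bench-test 30 write --no-cleanup
sudo rados -p bench-test cleanup
sudo ceph osd pool delete bench-test bench-test --yes-i-really-really-mean-it

On a saturated or shared network, the write bandwidth and average latency reported here degrade long before the NVMe drives themselves become the limit.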

iDatam 100Gbps Dedicated Servers

To achieve the sub-millisecond latency required for production databases and VM hosting, deploy your cluster on iDatam's 100Gbps Dedicated Servers. We provide unmetered, non-blocking internal network fabrics so your Ceph cluster can replicate data as fast as your PCIe Gen 5 NVMe drives can write it.

Discover iDatam Dedicated Server Locations

iDatam servers are available around the world, providing diverse options for hosting websites. Each region offers unique advantages, making it easier to choose a location that best suits your specific hosting needs.
