Heterogeneous AI Homelab Guides

A collection of practical build notes, setup guides, and performance experiments for local AI/ML infrastructure.

This repository documents real-world home-lab cluster setups for training, fine-tuning, inference, distributed systems testing, high-speed networking, storage fabrics, and heterogeneous hardware experimentation.

The current focus is on:

Dell Pro Max GB10 / NVIDIA GB10 cluster setups
Apple Mac Studio Ultra cluster setups
RoCE / RDMA networking
MikroTik 200G / 400G switching
NCCL, MPI, and multi-node communication
Shared NVMe over RDMA
Cross-vendor AI workload placement
KV-cache prefill, caching, and serving experiments

These guides are written from working lab configurations, including the mistakes, caveats, and performance traps discovered along the way.

Purpose

The goal of this project is to build a self-contained local AI infrastructure lab for training, inference, networking, storage, and distributed systems research.

This lab combines different hardware platforms, including Apple Mac Studio Ultra systems and Dell Pro Max GB10 / NVIDIA GB10 systems, to evaluate how heterogeneous compute can be used effectively for real AI workloads.

The aim is not only to benchmark hardware but also to understand how mixed-vendor systems behave under practical workloads:

Distributed training
Fine-tuning
Inference serving
KV-cache prefill
KV-cache reuse and caching
Shared NVMe over RDMA
RoCE networking
Mixed precision and quantization strategies
Cross-vendor workload placement
Local reproducible AI infrastructure

Longer term, this work is about building automation that understands hardware bottlenecks and can optimize workload placement across different systems. That includes deciding where to prefill, where to cache, where to serve, and how to mix precision across weights, activations, and KV cache.

Current Guides

Dell Pro Max GB10 / NVIDIA GB10 Cluster

A setup guide for building a GB10 cluster using ConnectX-7 networking, RoCE, NCCL, OpenMPI, and MikroTik switching.

Covered topics include:

2-node direct-connect setup
2x2 cluster layout
4-node switched fabric setup
ConnectX-7 static IP configuration
Duplicate MAC workaround on GB10 nodes
RoCE device mapping
NCCL rail pinning
OpenMPI launch configuration
MikroTik CRS switch configuration
QSFP-DD 400G to 2x200G breakout behavior
MTU 9000 and L2MTU configuration
VLAN/PVID access-port fabric isolation
Hardware offload validation using the MikroTik H flag
iperf3 validation
NCCL all-reduce validation

Guide:

guides/gb10-cluster-setup.md

Why These Notes Exist

A lot of AI infrastructure documentation assumes cloud clusters, vendor reference architectures, or fully homogeneous systems.

This repo is focused on the messy but useful middle ground:

Local hardware
Mixed vendors
Real switches
Real cabling
Real firmware quirks
Real performance debugging
Repeatable experiments
Configurations that are small enough to own but complex enough to teach useful lessons

Some findings are simple but important. For example:

Link up does not mean the switch is forwarding in hardware.
On MikroTik CRS switches, setting hw=yes on a bridge port is not enough; confirm the port actually shows the H (hardware-offload) flag.
Separate software bridges can silently cap performance.
A single hardware-offloaded bridge with untagged VLAN/PVID access ports can preserve L2 isolation while keeping traffic in the ASIC.
400G QSFP-DD breakout cables may expose multiple logical 200G interfaces.
The host IPs used to launch an NCCL job do not guarantee which interfaces NCCL selects for its data path.
NCCL_SOCKET_IFNAME, NCCL_IB_HCA, GID index, and OpenMPI interface pinning all matter.
Some GB10 interfaces may expose duplicate MAC addresses and need explicit locally administered MAC overrides.
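
The duplicate-MAC finding above relies on locally administered addresses. As a minimal sketch (the function names are hypothetical and not taken from the guides), such an address can be derived by setting the locally administered bit and clearing the multicast bit in the first octet. How the address is then applied to an interface is outside the scope of this sketch.

```python
import secrets
from typing import Optional


def locally_administered_mac(seed_bytes: Optional[bytes] = None) -> str:
    """Return a unicast, locally administered MAC address string.

    seed_bytes is optional; when omitted, six random bytes are used.
    """
    raw = bytearray(seed_bytes if seed_bytes is not None else secrets.token_bytes(6))
    if len(raw) != 6:
        raise ValueError("a MAC address needs exactly 6 bytes")
    raw[0] |= 0b00000010  # set the locally administered (U/L) bit
    raw[0] &= 0b11111110  # clear the multicast (I/G) bit, so the address is unicast
    return ":".join(f"{b:02x}" for b in raw)


def is_locally_administered(mac: str) -> bool:
    """Check the two low bits of the first octet of a colon-separated MAC."""
    first = int(mac.split(":")[0], 16)
    return bool(first & 0b10) and not (first & 0b01)


if __name__ == "__main__":
    mac = locally_administered_mac()
    print(mac, is_locally_administered(mac))
```

Each node would need its own distinct override, so random generation should be followed by a uniqueness check across the fabric.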

Repository Layout

Suggested layout:

.
├── README.md
├── guides/
│   ├── gb10-cluster-setup.md
│   ├── mac-studio-ultra-cluster-setup.md
│   └── shared-nvme-rdma-setup.md
├── scripts/
│   ├── nccl/
│   ├── iperf/
│   └── diagnostics/
├── configs/
│   ├── netplan/
│   ├── mikrotik/
│   └── hosts/
├── results/
│   ├── iperf/
│   ├── nccl/
│   └── notes/
└── diagrams/

Tested Hardware

Hardware currently tested or in progress:

Dell Pro Max GB10 / NVIDIA GB10 systems
NVIDIA ConnectX-7 networking
MikroTik CRS812-8DS-2DQ-2DDQ switch
QSFP-DD 400G to 2x200G QSFP56 DAC breakout cables
Dell 400G DAC QSFP56 cables
Apple Mac Studio Ultra systems

This list will expand as more configurations are validated.

Validation Philosophy

Each setup should be validated in layers:

Physical link state
Interface naming and MAC mapping
MTU and jumbo frame validation
Switch forwarding and hardware offload
Raw TCP throughput with iperf3
RDMA / RoCE visibility
GID index validation
NCCL transport selection
NCCL correctness, with tests reporting #wrong 0
Application-level workload testing

The intent is to avoid guessing. Each layer should prove the next layer is worth debugging.
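
The layered approach above can be sketched as a small runner that stops at the first failing layer, so later layers are never debugged on top of a broken one. This is an illustrative sketch only: `run_layers` and the placeholder checks are hypothetical, and real checks would wrap whatever commands each layer needs.

```python
from typing import Callable, List, Optional, Tuple

Check = Tuple[str, Callable[[], bool]]


def run_layers(layers: List[Check]) -> Tuple[List[str], Optional[str]]:
    """Run checks in order and stop at the first failure.

    Returns (names of layers that passed, name of the failing layer or None).
    """
    passed: List[str] = []
    for name, check in layers:
        try:
            ok = bool(check())
        except Exception:
            ok = False  # a crashing check counts as a failed layer
        if not ok:
            return passed, name
        passed.append(name)
    return passed, None


if __name__ == "__main__":
    # Placeholder checks; replace each lambda with a real probe.
    layers: List[Check] = [
        ("physical link state", lambda: True),
        ("MTU and jumbo frames", lambda: True),
        ("switch hardware offload", lambda: False),
        ("raw TCP throughput", lambda: True),
    ]
    passed, failed = run_layers(layers)
    print("passed:", passed)
    print("stop and debug:", failed)
```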

Example Performance Milestones

Example milestones from the GB10 work:

200G links detected on ConnectX-7 ports
MTU 9000 jumbo ping passing end-to-end
MikroTik switch ports running with MTU 9000 and L2MTU 9570
Switch fabric forwarding in hardware with the H flag
Switched fabric traffic reaching roughly 190 Gbps or more per active 200G logical port
NCCL using RoCE instead of socket fallback
NCCL rings connecting successfully
NCCL tests completing with #wrong 0
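
Two of the milestones above involve simple arithmetic that is easy to get wrong. The sketch below assumes IPv4 with no header options and an ICMP echo test; the helper names are hypothetical. It shows the largest ping payload that fits in a given MTU without fragmentation, and what fraction of a 200G line rate a 190 Gbps measurement represents.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo header


def max_icmp_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


def link_utilization(measured_gbps: float, line_rate_gbps: float) -> float:
    """Measured throughput as a fraction of the nominal line rate."""
    return measured_gbps / line_rate_gbps


if __name__ == "__main__":
    print(max_icmp_payload(9000))                # 8972
    print(f"{link_utilization(190, 200):.0%}")   # 95%
```

With Linux iputils ping, the resulting payload size is what would be passed to `-s` together with `-M do` to forbid fragmentation.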

Coming Guides

Planned or in-progress guides:

Apple Mac Studio Ultra cluster setup
GB10 four-node NCCL and RoCE tuning
Shared NVMe over RDMA for AI cache experiments
KV-cache prefill and serving across different hardware vendors
Cross-vendor inference pipeline experiments
Mixed precision strategy testing
Workload placement automation
Local benchmark reproducibility framework

Caveats

These guides are not vendor-certified reference architectures.

They are practical working notes from real lab setups. Hardware, firmware, drivers, operating systems, and switch software versions can change behavior.

Before copying any configuration into your own environment:

Export your current switch configuration.
Back up your node network configuration.
Use console access when changing switch VLAN filtering.
Validate one link or rail at a time.
Confirm hardware offload before trusting throughput numbers.
Do not assume interface names are identical across systems.
Do not assume GID indexes are identical across rails or nodes.
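
Because interface names and GID indexes can differ per node and per rail, one way to avoid copy-paste mistakes is to keep them in an explicit per-node table and generate the NCCL environment from it. The sketch below is an assumption-laden illustration: `RailConfig`, `nccl_env`, the node names, device names, and GID values are all hypothetical placeholders, not the lab's actual settings. `NCCL_IB_GID_INDEX` is the NCCL variable for the GID index; the guides only refer to it as "GID index".

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class RailConfig:
    socket_ifname: str  # network interface for NCCL_SOCKET_IFNAME
    ib_hca: str         # RDMA device for NCCL_IB_HCA
    gid_index: int      # GID table index for NCCL_IB_GID_INDEX


def nccl_env(rail: RailConfig) -> Dict[str, str]:
    """Build the NCCL environment variables for one node's rail."""
    if rail.gid_index < 0:
        raise ValueError("GID index must be non-negative")
    return {
        "NCCL_SOCKET_IFNAME": rail.socket_ifname,
        "NCCL_IB_HCA": rail.ib_hca,
        "NCCL_IB_GID_INDEX": str(rail.gid_index),
    }


# Hypothetical per-node values; look these up on each node, never copy them.
RAILS = {
    "node-a": RailConfig("enp1s0f0np0", "rocep1s0f0", 3),
    "node-b": RailConfig("enp1s0f1np1", "rocep1s0f1", 1),
}

if __name__ == "__main__":
    for node, rail in RAILS.items():
        exports = " ".join(f"{k}={v}" for k, v in nccl_env(rail).items())
        print(f"{node}: {exports}")
```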

Contributions

Issues, corrections, test results, and additional hardware notes are welcome.

Useful contributions include:

Confirmed working configurations
Failure cases and fixes
Switch configuration examples
RoCE / NCCL tuning notes
Performance results with hardware details
Diagrams and topology maps
Reproducible benchmark scripts

When sharing results, include:

Hardware model
NIC model and firmware
Switch model and RouterOS / firmware version
Cable type
OS version
Kernel version
NCCL version
CUDA version
Test command
Relevant environment variables
Throughput and correctness output
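
The checklist above could be captured as a structured record so that shared results are comparable. This is only a sketch of one possible shape: the `ResultRecord` class, its field names, and `missing_fields` are hypothetical and not a format the repository defines.

```python
import json
from dataclasses import asdict, dataclass, field, fields
from typing import Dict, List


@dataclass
class ResultRecord:
    hardware_model: str = ""
    nic_model: str = ""
    nic_firmware: str = ""
    switch_model: str = ""
    switch_os_version: str = ""  # RouterOS / firmware version
    cable_type: str = ""
    os_version: str = ""
    kernel_version: str = ""
    nccl_version: str = ""
    cuda_version: str = ""
    test_command: str = ""
    env: Dict[str, str] = field(default_factory=dict)
    output: str = ""  # throughput and correctness output, pasted verbatim


def missing_fields(record: ResultRecord) -> List[str]:
    """Names of fields that are still empty (an empty env also counts)."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


if __name__ == "__main__":
    record = ResultRecord(
        hardware_model="Dell Pro Max GB10",
        switch_model="MikroTik CRS812-8DS-2DQ-2DDQ",
    )
    print(json.dumps(asdict(record), indent=2))
    print("still missing:", missing_fields(record))
```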

Disclaimer

Use these configurations at your own risk.

High-speed networking, RDMA, switch VLAN filtering, firmware changes, and storage fabric experiments can break connectivity or cause data loss if applied incorrectly.

These guides are intended for experienced operators, builders, and researchers working in controlled lab environments.
