RISC-V Performance Optimization Guide

This guide covers build-time and runtime tuning of XMRig on RISC-V systems.

Build Optimizations

Compiler Flags Applied Automatically

The CMake build now applies aggressive RISC-V-specific optimizations:

# RISC-V ISA with extensions
-march=rv64gcv_zba_zbb_zbc_zbs

# Aggressive compiler optimizations
-funroll-loops              # Unroll loops for ILP (instruction-level parallelism)
-fomit-frame-pointer        # Free up the frame pointer register (s0/fp) for general use
-fno-common                 # Better code generation for global variables
-finline-functions          # Inline more functions for better cache locality
-ffast-math                 # Relaxed FP semantics (safe for mining)
-flto                       # Link-time optimization for cross-module inlining

# Release build additions
-minline-atomics            # Inline atomic operations for faster synchronization

Optimal Build Command

mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make -j$(nproc)

Expected build time: 5-15 minutes depending on CPU

Runtime Optimizations

1. Memory Configuration (Most Important)

Enable huge pages to reduce TLB misses and fragmentation:

Enable 2MB Huge Pages

# Calculate required huge pages (1 page = 2MB)
# For 2 GB dataset: 1024 pages
# For cache + dataset: 1536 pages minimum
sudo sysctl -w vm.nr_hugepages=2048
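
The page counts in the comments above come from dividing the memory to cover by the page size and rounding up. A minimal sketch of that arithmetic follows; pages_needed is a hypothetical helper name, not part of XMRig, and 3072 MiB is simply what the guide's 1536-page minimum corresponds to:

```shell
# pages_needed MEM_MIB [PAGE_MIB]
# Hypothetical helper: number of huge pages needed to cover MEM_MIB,
# rounded up to a whole page. PAGE_MIB defaults to 2 (2MB huge pages).
pages_needed() {
    mem_mib=$1
    page_mib=${2:-2}
    echo $(( (mem_mib + page_mib - 1) / page_mib ))
}

pages_needed 2048        # 2 GB dataset, 2MB pages      -> 1024
pages_needed 3072        # 3072 MiB (guide's minimum)   -> 1536
pages_needed 2048 1024   # 2 GB dataset, 1GB pages      -> 2
```

Setting vm.nr_hugepages somewhat above the computed value, as the command above does with 2048, leaves room for per-thread scratchpads.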

Verify:

grep HugePages /proc/meminfo
# Expected: HugePages_Free should be close to nr_hugepages
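
To turn the check above into a single number, the two counters can be read from /proc/meminfo and compared. This is a sketch only; hugepages_free_pct is a hypothetical helper name, and the optional file argument exists just so the function can be tried against a saved copy of /proc/meminfo:

```shell
# hugepages_free_pct [MEMINFO_FILE]
# Hypothetical helper: prints HugePages_Free as a whole-number percentage
# of HugePages_Total, or 0 if no huge pages are configured.
hugepages_free_pct() {
    awk '/^HugePages_Total:/ { total = $2 }
         /^HugePages_Free:/  { free = $2 }
         END { if (total > 0) printf "%d\n", free * 100 / total; else print 0 }' \
        "${1:-/proc/meminfo}"
}
```

Before the miner starts, a value near 100 means the reserved pages are still available; a much lower value suggests another process already holds them.
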

Enable 1GB Huge Pages

# Run provided helper script
sudo ./scripts/enable_1gb_pages.sh

# Verify 1GB pages are available
cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
# Should be: >= 1 (one 1GB page)

Update config.json:

{
    "cpu": {
        "huge-pages": true
    },
    "randomx": {
        "1gb-pages": true
    }
}

2. RandomX Mode Selection

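| Mode  | Memory | Init Time | Throughput | Recommendation                  |
|-------|--------|-----------|------------|---------------------------------|
| light | 256 MB | 10 sec    | Low        | Testing, resource-constrained   |
| fast  | 2 GB   | 2-5 min*  | High       | Production (with huge pages)    |
| auto  | 2 GB   | Varies    | High       | Default (uses fast if possible) |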

*With optimizations; can be 30+ minutes without huge pages

For RISC-V, use fast mode with huge pages enabled.

3. Dataset Initialization Threads

Optimal thread count = 60-75% of CPU cores (leaves headroom for OS/other tasks)

{
    "randomx": {
        "init": 4
    }
}
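
As a rough illustration of the 60-75% rule, the value for "init" can be derived from the core count. The sketch below uses two thirds of the cores, rounded to the nearest integer, which is one assumption that falls inside that range for typical core counts; init_threads is a hypothetical helper name:

```shell
# init_threads [CORES]
# Hypothetical helper: about two thirds of the cores (rounded to nearest),
# never less than 1. CORES defaults to the output of nproc.
init_threads() {
    cores=${1:-$(nproc)}
    n=$(( (cores * 2 + 1) / 3 ))
    if [ "$n" -lt 1 ]; then n=1; fi
    echo "$n"
}

init_threads 8   # -> 5 (62.5% of 8 cores)
init_threads 4   # -> 3 (75% of 4 cores)
```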

Or let XMRig auto-detect the thread count (detection rewritten for RISC-V):

{
    "randomx": {
        "init": -1
    }
}

4. CPU Affinity (Optional)

Pin threads to specific cores for better cache locality:

{
    "cpu": {
        "rx/0": [
            { "threads": 1, "affinity": 0 },
            { "threads": 1, "affinity": 1 },
            { "threads": 1, "affinity": 2 },
            { "threads": 1, "affinity": 3 }
        ]
    }
}

5. CPU Governor (Linux)

Set to performance mode for maximum throughput:

# Check current governor
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

# Set to performance (requires root)
echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

# Verify
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
# Should output: performance
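
The verification above reads only cpu0. On boards where cores can be governed separately, it may be worth listing every core. A minimal sketch, assuming the standard cpufreq sysfs layout; list_governors is a hypothetical helper name and the directory argument is only there so it can be pointed at a test tree:

```shell
# list_governors [SYSFS_CPU_DIR]
# Hypothetical helper: prints "<cpu> <governor>" for every core that
# exposes a cpufreq scaling_governor file.
list_governors() {
    base=${1:-/sys/devices/system/cpu}
    for f in "$base"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -r "$f" ] || continue      # no cpufreq support or no match
        cpu=${f#"$base"/}            # cpuN/cpufreq/scaling_governor
        cpu=${cpu%%/*}               # cpuN
        echo "$cpu $(cat "$f")"
    done
}
```

Any line that does not end in performance points at a core that is still on another governor.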

Configuration Examples

Minimum (Testing)

{
    "randomx": {
        "mode": "light"
    },
    "cpu": {
        "huge-pages": false
    }
}

Intermediate (auto mode with huge pages)

{
    "randomx": {
        "mode": "auto",
        "init": 4,
        "1gb-pages": true
    },
    "cpu": {
        "huge-pages": true,
        "priority": 2
    }
}

Maximum Performance (Production)

{
    "randomx": {
        "mode": "fast",
        "init": -1,
        "1gb-pages": true,
        "scratchpad_prefetch_mode": 1
    },
    "cpu": {
        "huge-pages": true,
        "priority": 3,
        "yield": false
    }
}

CLI Equivalents

# Light mode
./xmrig --randomx-mode=light

# Fast mode with 4 init threads
./xmrig --randomx-mode=fast --randomx-init=4

# Benchmark
./xmrig --bench=1M --algo=rx/0

# Benchmark Wownero variant (1 MB scratchpad)
./xmrig --bench=1M --algo=rx/wow

# Mine to pool
./xmrig -o pool.example.com:3333 -u YOUR_WALLET -p x

Performance Diagnostics

Check if Vector Extensions are Detected

Look for the FEATURES line in the startup output:

 * CPU:       ky,x60 (uarch ky,x1)
 * FEATURES:  rv64imafdcv zba zbb zbc zbs
  • v: Vector extension (RVV) ✓
  • zba, zbb, zbc, zbs: Bit manipulation ✓
  • If missing, make sure build used -march=rv64gcv_zba_zbb_zbc_zbs
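
When checking many boards, the FEATURES string can be tested in a script instead of by eye. The sketch below assumes the layout shown above, with a base ISA token first and the multi-letter extensions as separate words; has_feature is a hypothetical helper name:

```shell
# has_feature "FEATURES_STRING" EXT
# Hypothetical helper: prints "yes" if EXT appears in a FEATURES string
# such as "rv64imafdcv zba zbb zbc zbs", otherwise "no".
has_feature() {
    features=$1
    ext=$2
    base=${features%% *}      # first token, e.g. rv64imafdcv
    letters=${base#rv??}      # drop "rv64"/"rv32" so its "v" is not counted
    case $ext in
        ?)  # single-letter extension such as "v"
            case $letters in
                *"$ext"*) echo yes; return 0 ;;
            esac
            ;;
        *)  # multi-letter extension such as "zbb": must match a whole word
            for tok in $features; do
                if [ "$tok" = "$ext" ]; then echo yes; return 0; fi
            done
            ;;
    esac
    echo no
}

has_feature "rv64imafdcv zba zbb zbc zbs" v     # -> yes
has_feature "rv64imafdc zba" v                  # -> no
```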

Verify Huge Pages at Runtime

# Run xmrig with --bench=1M and check output
./xmrig --bench=1M

# Look for line like:
# HUGE PAGES   100%  1 / 1 (1024 MB)
  • Should show 100% for dataset AND threads
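  • If less, increase vm.nr_hugepages; if the kernel still cannot allocate enough pages because memory is fragmented, add the setting to /etc/sysctl.conf and reboot (a value set only with sysctl -w does not survive a reboot)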
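
The huge-page check can also be scripted against a saved log. The sketch below assumes the log contains lines of the form shown above, the words "huge pages" followed by a percentage; the exact wording may differ between XMRig versions, and hugepages_status is a hypothetical helper name:

```shell
# hugepages_status  (reads miner output on stdin)
# Hypothetical helper: prints "ok" if every "huge pages NN%" line reports
# 100%, "partial" if any reports less, "unknown" if no such line is found.
hugepages_status() {
    awk '{ line = tolower($0) }
         line ~ /huge pages +[0-9]+%/ {
             seen = 1
             if (line !~ /huge pages +100%/) partial = 1
         }
         END {
             if (!seen) print "unknown"
             else if (partial) print "partial"
             else print "ok"
         }'
}
```

For example, after saving a run with ./xmrig --bench=1M > bench.log 2>&1, running hugepages_status < bench.log summarizes the result.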

Monitor Performance

# Run benchmark multiple times to find stable hashrate
./xmrig --bench=1M --algo=rx/0
./xmrig --bench=10M --algo=rx/0
./xmrig --bench=100M --algo=rx/0

# Check system load and memory during mining
while true; do free -h; grep HugePages /proc/meminfo; sleep 2; done

Expected Performance

Hardware: Orange Pi RV2 (Ky X1, 8 cores @ ~1.5 GHz)

| Config              | Mode | Hashrate   | Init Time |
|---------------------|------|------------|-----------|
| Scalar (baseline)   | fast | 30 H/s     | 10 min    |
| Scalar + huge pages | fast | 33 H/s     | 2 min     |
| RVV (if enabled)    | fast | 70-100 H/s | 3 min     |

Actual results depend on CPU frequency, memory speed, and load

Troubleshooting

Long Initialization Times (30+ minutes)

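Cause: Huge pages are not enabled, or the system is swapping.

Solution:

  1. Enable huge pages: sudo sysctl -w vm.nr_hugepages=2048
  2. If the kernel cannot allocate that many pages because memory is fragmented, add vm.nr_hugepages=2048 to /etc/sysctl.conf and reboot (a value set only with sysctl -w is lost on reboot)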
  3. Reduce mining threads to free memory
  4. Check available memory: free -h

Low Hashrate (50% of expected)

Cause: CPU governor set to power-save, no huge pages, high contention

Solution:

  1. Set governor to performance: echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
  2. Enable huge pages
  3. Reduce number of mining threads
  4. Check system load: top or htop

Dataset Init Crashes or Hangs

Cause: Insufficient memory, corrupted huge pages

Solution:

  1. Disable huge pages temporarily: set huge-pages: false in config
  2. Reduce mining threads
  3. Reboot and re-enable huge pages
  4. Try light mode: --randomx-mode=light

Out of Memory During Benchmark

Cause: Not enough RAM for dataset + cache + threads

Solution:

  1. Use light mode: --randomx-mode=light
  2. Reduce mining threads: --threads=1
  3. Increase available memory (kill other processes)
  4. Check: free -h before mining
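
The free -h check can be automated before launching the miner. The sketch below reads MemAvailable from /proc/meminfo and suggests a mode; the 3072 MiB default threshold is an assumption taken from this guide's 1536-page minimum for cache plus dataset, and both function names are hypothetical:

```shell
# mem_available_mib [MEMINFO_FILE]
# Hypothetical helper: prints MemAvailable in whole MiB.
mem_available_mib() {
    awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}

# suggest_mode [MEMINFO_FILE] [NEED_MIB]
# Hypothetical helper: prints "fast" if at least NEED_MIB (default 3072)
# is available, otherwise "light".
suggest_mode() {
    need_mib=${2:-3072}
    avail_mib=$(mem_available_mib "$1")
    if [ "${avail_mib:-0}" -ge "$need_mib" ]; then
        echo fast
    else
        echo light
    fi
}
```

A possible use is ./xmrig --randomx-mode="$(suggest_mode)", which falls back to light mode when memory is tight.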

Advanced Tuning

Vector Length (VLEN) Detection

RISC-V vector extension variable length (VLEN) affects performance:

# Check VLEN on your CPU
cat /proc/cpuinfo | grep vlen

# Expected values:
# - 128 bits (16 bytes) = minimum
# - 256 bits (32 bytes) = common
# - 512 bits (64 bytes) = high performance

Larger VLEN generally means better performance for vectorized operations.

Prefetch Optimization

The code automatically optimizes memory prefetching for RISC-V:

scratchpad_prefetch_mode: 0 = disabled (slowest)
scratchpad_prefetch_mode: 1 = prefetch.r (default, recommended)
scratchpad_prefetch_mode: 2 = prefetch.w (experimental)

Memory Bandwidth Saturation

If experiencing memory bandwidth saturation (high latency):

  1. Reduce mining threads
  2. Run fewer threads so that each one gets a larger share of the L2/L3 cache
  3. Cache QoS (cache_qos: true) targets x86 CPUs such as AMD Ryzen and is not expected to help on RISC-V

Building with Custom Flags

To build with custom RISC-V flags:

mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release \
      -DCMAKE_C_FLAGS="-march=rv64gcv_zba_zbb_zbc_zbs -O3 -funroll-loops -fomit-frame-pointer" \
      -DCMAKE_CXX_FLAGS="-march=rv64gcv_zba_zbb_zbc_zbs -O3 -funroll-loops -fomit-frame-pointer" \
      ..
make -j$(nproc)

Future Optimizations

  • Zbk* (crypto) support detection and usage
  • Optimal VLEN-aware algorithm selection
  • Per-core memory affinity (NUMA support)
  • Dynamic thread count adjustment based on thermals
  • Cross-compile optimizations for various RISC-V cores

For further optimization, enable RVV intrinsics by replacing sse2rvv.h with sse2rvv_optimized.h in the build.