mirror of
https://github.com/xmrig/xmrig.git
synced 2025-12-07 07:55:04 -05:00
366 lines
8.5 KiB
Markdown
366 lines
8.5 KiB
Markdown
# RISC-V Performance Optimization Guide
|
|
|
|
This guide provides comprehensive instructions for optimizing XMRig on RISC-V architectures.
|
|
|
|
## Build Optimizations
|
|
|
|
### Compiler Flags Applied Automatically
|
|
|
|
The CMake build now applies aggressive RISC-V-specific optimizations:
|
|
|
|
```cmake
|
|
# RISC-V ISA with extensions
|
|
-march=rv64gcv_zba_zbb_zbc_zbs
|
|
|
|
# Aggressive compiler optimizations
|
|
-funroll-loops # Unroll loops for ILP (instruction-level parallelism)
|
|
-fomit-frame-pointer # Free up frame pointer register (RISC-V has limited registers)
|
|
-fno-common # Better code generation for global variables
|
|
-finline-functions # Inline more functions for better cache locality
|
|
-ffast-math # Relaxed FP semantics (safe for mining)
|
|
-flto # Link-time optimization for cross-module inlining
|
|
|
|
# Release build additions
|
|
-minline-atomics # Inline atomic operations for faster synchronization
|
|
```
|
|
|
|
### Optimal Build Command
|
|
|
|
```bash
|
|
mkdir build && cd build
|
|
cmake -DCMAKE_BUILD_TYPE=Release ..
|
|
make -j$(nproc)
|
|
```
|
|
|
|
**Expected build time**: 5-15 minutes depending on CPU
|
|
|
|
## Runtime Optimizations
|
|
|
|
### 1. Memory Configuration (Most Important)
|
|
|
|
Enable huge pages to reduce TLB misses and fragmentation:
|
|
|
|
#### Enable 2MB Huge Pages
|
|
```bash
|
|
# Calculate required huge pages (1 page = 2MB)
|
|
# For 2 GB dataset: 1024 pages
|
|
# For cache + dataset: 1536 pages minimum
|
|
sudo sysctl -w vm.nr_hugepages=2048
|
|
```
|
|
|
|
Verify:
|
|
```bash
|
|
grep HugePages /proc/meminfo
|
|
# Expected: HugePages_Free should be close to nr_hugepages
|
|
```
|
|
|
|
#### Enable 1GB Huge Pages (Optional but Recommended)
|
|
|
|
```bash
|
|
# Run provided helper script
|
|
sudo ./scripts/enable_1gb_pages.sh
|
|
|
|
# Verify 1GB pages are available
|
|
cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
|
|
# Should be: >= 1 (one 1GB page)
|
|
```
|
|
|
|
Update config.json:
|
|
```json
|
|
{
|
|
"cpu": {
|
|
"huge-pages": true
|
|
},
|
|
"randomx": {
|
|
"1gb-pages": true
|
|
}
|
|
}
|
|
```
|
|
|
|
### 2. RandomX Mode Selection
|
|
|
|
| Mode | Memory | Init Time | Throughput | Recommendation |
|
|
|------|--------|-----------|-----------|-----------------|
|
|
| **light** | 256 MB | 10 sec | Low | Testing, resource-constrained |
|
|
| **fast** | 2 GB | 2-5 min* | High | Production (with huge pages) |
|
|
| **auto** | 2 GB | Varies | High | Default (uses fast if possible) |
|
|
|
|
*With optimizations; can be 30+ minutes without huge pages
|
|
|
|
**For RISC-V, use fast mode with huge pages enabled.**
|
|
|
|
### 3. Dataset Initialization Threads
|
|
|
|
Optimal thread count = 60-75% of CPU cores (leaves headroom for OS/other tasks)
|
|
|
|
```json
|
|
{
|
|
"randomx": {
|
|
"init": 4
|
|
}
|
|
}
|
|
```
|
|
|
|
Or auto-detect (rewritten for RISC-V):
|
|
```json
|
|
{
|
|
"randomx": {
|
|
"init": -1
|
|
}
|
|
}
|
|
```
|
|
|
|
### 4. CPU Affinity (Optional)
|
|
|
|
Pin threads to specific cores for better cache locality:
|
|
|
|
```json
|
|
{
|
|
"cpu": {
|
|
"rx/0": [
|
|
{ "threads": 1, "affinity": 0 },
|
|
{ "threads": 1, "affinity": 1 },
|
|
{ "threads": 1, "affinity": 2 },
|
|
{ "threads": 1, "affinity": 3 }
|
|
]
|
|
}
|
|
}
|
|
```
|
|
|
|
### 5. CPU Governor (Linux)
|
|
|
|
Set to performance mode for maximum throughput:
|
|
|
|
```bash
|
|
# Check current governor
|
|
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
|
|
|
|
# Set to performance (requires root)
|
|
echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
|
|
|
|
# Verify
|
|
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
|
|
# Should output: performance
|
|
```
|
|
|
|
## Configuration Examples
|
|
|
|
### Minimum (Testing)
|
|
```json
|
|
{
|
|
"randomx": {
|
|
"mode": "light"
|
|
},
|
|
"cpu": {
|
|
"huge-pages": false
|
|
}
|
|
}
|
|
```
|
|
|
|
### Recommended (Balanced)
|
|
```json
|
|
{
|
|
"randomx": {
|
|
"mode": "auto",
|
|
"init": 4,
|
|
"1gb-pages": true
|
|
},
|
|
"cpu": {
|
|
"huge-pages": true,
|
|
"priority": 2
|
|
}
|
|
}
|
|
```
|
|
|
|
### Maximum Performance (Production)
|
|
```json
|
|
{
|
|
"randomx": {
|
|
"mode": "fast",
|
|
"init": -1,
|
|
"1gb-pages": true,
|
|
"scratchpad_prefetch_mode": 1
|
|
},
|
|
"cpu": {
|
|
"huge-pages": true,
|
|
"priority": 3,
|
|
"yield": false
|
|
}
|
|
}
|
|
```
|
|
|
|
## CLI Equivalents
|
|
|
|
```bash
|
|
# Light mode
|
|
./xmrig --randomx-mode=light
|
|
|
|
# Fast mode with 4 init threads
|
|
./xmrig --randomx-mode=fast --randomx-init=4
|
|
|
|
# Benchmark
|
|
./xmrig --bench=1M --algo=rx/0
|
|
|
|
# Benchmark Wownero variant (1 MB scratchpad)
|
|
./xmrig --bench=1M --algo=rx/wow
|
|
|
|
# Mine to pool
|
|
./xmrig -o pool.example.com:3333 -u YOUR_WALLET -p x
|
|
```
|
|
|
|
## Performance Diagnostics
|
|
|
|
### Check if Vector Extensions are Detected
|
|
|
|
Look for `FEATURES:` line in output:
|
|
```
|
|
* CPU: ky,x60 (uarch ky,x1)
|
|
* FEATURES: rv64imafdcv zba zbb zbc zbs
|
|
```
|
|
|
|
- `v`: Vector extension (RVV) ✓
|
|
- `zba`, `zbb`, `zbc`, `zbs`: Bit manipulation ✓
|
|
- If missing, make sure build used `-march=rv64gcv_zba_zbb_zbc_zbs`
|
|
|
|
### Verify Huge Pages at Runtime
|
|
|
|
```bash
|
|
# Run xmrig with --bench=1M and check output
|
|
./xmrig --bench=1M
|
|
|
|
# Look for line like:
|
|
# HUGE PAGES 100% 1 / 1 (1024 MB)
|
|
```
|
|
|
|
- Should show 100% for dataset AND threads
|
|
- If less, increase `vm.nr_hugepages` and reboot
|
|
|
|
### Monitor Performance
|
|
|
|
```bash
|
|
# Run benchmark multiple times to find stable hashrate
|
|
./xmrig --bench=1M --algo=rx/0
|
|
./xmrig --bench=10M --algo=rx/0
|
|
./xmrig --bench=100M --algo=rx/0
|
|
|
|
# Check system load and memory during mining
|
|
while true; do free -h; grep HugePages /proc/meminfo; sleep 2; done
|
|
```
|
|
|
|
## Expected Performance
|
|
|
|
### Hardware: Orange Pi RV2 (Ky X1, 8 cores @ ~1.5 GHz)
|
|
|
|
| Config | Mode | Hashrate | Init Time |
|
|
|--------|------|----------|-----------|
|
|
| Scalar (baseline) | fast | 30 H/s | 10 min |
|
|
| Scalar + huge pages | fast | 33 H/s | 2 min |
|
|
| RVV (if enabled) | fast | 70-100 H/s | 3 min |
|
|
|
|
*Actual results depend on CPU frequency, memory speed, and load*
|
|
|
|
## Troubleshooting
|
|
|
|
### Long Initialization Times (30+ minutes)
|
|
|
|
**Cause**: Huge pages not enabled, system using swap
|
|
**Solution**:
|
|
1. Enable huge pages: `sudo sysctl -w vm.nr_hugepages=2048`
|
|
2. Reboot: `sudo reboot`
|
|
3. Reduce mining threads to free memory
|
|
4. Check available memory: `free -h`
|
|
|
|
### Low Hashrate (50% of expected)
|
|
|
|
**Cause**: CPU governor set to power-save, no huge pages, high contention
|
|
**Solution**:
|
|
1. Set governor to performance: `echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor`
|
|
2. Enable huge pages
|
|
3. Reduce number of mining threads
|
|
4. Check system load: `top` or `htop`
|
|
|
|
### Dataset Init Crashes or Hangs
|
|
|
|
**Cause**: Insufficient memory, corrupted huge pages
|
|
**Solution**:
|
|
1. Disable huge pages temporarily: set `huge-pages: false` in config
|
|
2. Reduce mining threads
|
|
3. Reboot and re-enable huge pages
|
|
4. Try light mode: `--randomx-mode=light`
|
|
|
|
### Out of Memory During Benchmark
|
|
|
|
**Cause**: Not enough RAM for dataset + cache + threads
|
|
**Solution**:
|
|
1. Use light mode: `--randomx-mode=light`
|
|
2. Reduce mining threads: `--threads=1`
|
|
3. Increase available memory (kill other processes)
|
|
4. Check: `free -h` before mining
|
|
|
|
## Advanced Tuning
|
|
|
|
### Vector Length (VLEN) Detection
|
|
|
|
RISC-V vector extension variable length (VLEN) affects performance:
|
|
|
|
```bash
|
|
# Check VLEN on your CPU
|
|
cat /proc/cpuinfo | grep vlen
|
|
|
|
# Expected values:
|
|
# - 128 bits (16 bytes) = minimum
|
|
# - 256 bits (32 bytes) = common
|
|
# - 512 bits (64 bytes) = high performance
|
|
```
|
|
|
|
Larger VLEN generally means better performance for vectorized operations.
|
|
|
|
### Prefetch Optimization
|
|
|
|
The code automatically optimizes memory prefetching for RISC-V:
|
|
|
|
```
|
|
scratchpad_prefetch_mode: 0 = disabled (slowest)
|
|
scratchpad_prefetch_mode: 1 = prefetch.r (default, recommended)
|
|
scratchpad_prefetch_mode: 2 = prefetch.w (experimental)
|
|
```
|
|
|
|
### Memory Bandwidth Saturation
|
|
|
|
If experiencing memory bandwidth saturation (high latency):
|
|
|
|
1. Reduce mining threads
|
|
2. Increase L2/L3 cache by mining fewer threads per core
|
|
3. Enable cache QoS (AMD Ryzen): `cache_qos: true`
|
|
|
|
## Building with Custom Flags
|
|
|
|
To build with custom RISC-V flags:
|
|
|
|
```bash
|
|
mkdir build && cd build
|
|
cmake -DCMAKE_BUILD_TYPE=Release \
|
|
-DCMAKE_C_FLAGS="-march=rv64gcv_zba_zbb_zbc_zbs -O3 -funroll-loops -fomit-frame-pointer" \
|
|
..
|
|
make -j$(nproc)
|
|
```
|
|
|
|
## Future Optimizations
|
|
|
|
- [ ] Zbk* (crypto) support detection and usage
|
|
- [ ] Optimal VLEN-aware algorithm selection
|
|
- [ ] Per-core memory affinity (NUMA support)
|
|
- [ ] Dynamic thread count adjustment based on thermals
|
|
- [ ] Cross-compile optimizations for various RISC-V cores
|
|
|
|
## References
|
|
|
|
- [RISC-V Vector Extension Spec](https://github.com/riscv/riscv-v-spec)
|
|
- [RISC-V Bit Manipulation Spec](https://github.com/riscv/riscv-bitmanip)
|
|
- [RISC-V Crypto Spec](https://github.com/riscv/riscv-crypto)
|
|
- [XMRig Documentation](https://xmrig.com/docs)
|
|
|
|
---
|
|
|
|
For further optimization, enable RVV intrinsics by replacing `sse2rvv.h` with `sse2rvv_optimized.h` in the build.
|