mirror of https://github.com/xmrig/xmrig.git synced 2026-04-17 04:59:28 -04:00

Compare commits


53 Commits

Author SHA1 Message Date
XMRig
4f58a7afff v6.26.0-dev 2026-03-25 23:58:45 +07:00
xmrig
806cfc3f4d Merge pull request #3790 from SChernykh/dev
Fix arm64 builds (attempt number 2)
2026-03-03 18:27:10 +07:00
SChernykh
84352c71ca Fix arm64 builds (attempt number 2) 2026-03-03 12:19:28 +01:00
xmrig
27d535d00f Merge pull request #3789 from SChernykh/dev
Fixed clang arm64 builds
2026-03-03 14:55:10 +07:00
SChernykh
9d296c7f02 Fixed clang arm64 builds 2026-03-03 08:36:24 +01:00
xmrig
3b4e38ba18 Merge pull request #3785 from SChernykh/dev
Don't reset nonce during donation rounds (possible fix for #3669)
2026-02-28 00:55:39 +07:00
SChernykh
8b33d2494b Don't reset nonce during donation rounds (possible fix for #3669) 2026-02-27 18:52:37 +01:00
xmrig
c534c669cb Merge pull request #3784 from Willie169/master
Fix OpenCL address-space mismatch in keccak_f800_round
2026-02-27 11:24:26 +07:00
Willie Shen
d7f7094c45 Apply uint32_t st[25] in keccak_f800 2026-02-26 19:18:06 +08:00
Willie Shen
14dcd36296 Fix OpenCL address-space mismatch in keccak_f800_round 2026-02-26 16:34:37 +08:00
xmrig
a935641274 Merge pull request #3783 from SChernykh/dev
Fixed initial dataset prefetch for RandomX v2
2026-02-16 03:44:35 +07:00
SChernykh
976a08efb4 Fixed initial dataset prefetch for RandomX v2 2026-02-15 21:35:54 +01:00
xmrig
17aa97e543 Merge pull request #3782 from SChernykh/dev
RandomX v2: don't update dataset when switching to/from it
2026-02-15 20:54:01 +07:00
SChernykh
48b29fd68b RandomX v2: don't update dataset when switching to/from it 2026-02-13 13:34:11 +01:00
XMRig
6454a0abf2 Removed unused -P command line option. 2026-02-09 14:50:42 +07:00
xmrig
8464d474d4 Merge pull request #3778 from SChernykh/dev
ARM64 fixes
2026-02-04 14:33:37 +07:00
SChernykh
26ee1cd291 ARM64 fixes
- Added a check for free memory before enabling NUMA
- Removed duplicate AES tables from the ARM64 JIT compiler

Fixes #3729 and #3777
2026-02-04 00:07:16 +01:00
xmrig
d9b39e8c32 Merge pull request #3776 from SChernykh/dev
Sync changes with xmrig-proxy
2026-02-02 18:24:25 +07:00
SChernykh
05dd6dcc40 Sync changes with xmrig-proxy 2026-02-02 12:11:01 +01:00
xmrig
316a3673bc Merge pull request #3775 from SChernykh/dev
RandomX v2: added `commitment` field to stratum submit message
2026-02-01 19:57:41 +07:00
SChernykh
42c943c03f RandomX v2: added commitment field to stratum submit message 2026-02-01 12:32:58 +01:00
xmrig
c2c8080783 Merge pull request #3774 from SChernykh/dev
RandomX v2 (RISC-V)
2026-02-01 12:16:08 +07:00
SChernykh
d82d7f3f20 RandomX v2 (RISC-V) 2026-01-31 21:50:38 +01:00
xmrig
a189d84fcd Merge pull request #3772 from SChernykh/dev
RandomX v2 (ARM64)
2026-01-31 23:34:33 +07:00
SChernykh
cb6001945e RandomX v2 (ARM64) 2026-01-31 17:30:35 +01:00
xmrig
f16a06eb67 Merge pull request #3769 from SChernykh/dev
RandomX v2 (initial support)
2026-01-30 22:09:57 +07:00
SChernykh
9d71358f46 RandomX v2 + commitments 2026-01-30 16:07:25 +01:00
xmrig
5a80c65d31 Merge pull request #3765 from oxyzenQ/dev-typos
fix: cross typos detail below:
2026-01-22 12:57:13 +07:00
rezky_nightky
67cc6cfd1c fix: cross typos detail below:
What I did
Repository rules / exclusions
I didn’t find repo-specific spelling tooling already in place (no existing codespell config).
Given this is a C/C++ repo with vendored sources, I treated src/3rdparty/** as third-party and excluded it from typo fixing (and encoded that in the new .codespellrc).
Typos fixed (first-party only)
Docs
CHANGELOG.md: perfomance -> performance
doc/API.md: optionaly -> optionally, Offical -> Official
doc/BENCHMARK.md: parameteres -> parameters
doc/CPU.md: loosing -> losing, instuctions -> instructions
doc/CHANGELOG_OLD.md: multiple obvious text typos like Breaked -> Broken, singal -> signal, previos -> previous, secons -> seconds, automaticaly -> automatically, perfomance -> performance
Code comments / doc comments (safe text-only changes)
src/base/crypto/sha3.cpp: Inteface -> Interface (comment banner)
src/backend/opencl/cl/cn/cryptonight.cl: performe -> perform, crashs -> crashes (comments)
src/backend/opencl/cl/kawpow/kawpow.cl: regsters -> registers, intial -> initial (comments)
src/crypto/randomx/aes_hash.cpp: intial -> initial (comment)
src/crypto/randomx/intrin_portable.h: cant -> can't (comment)
src/crypto/randomx/randomx.h: intialization -> initialization (doc comment)
src/crypto/cn/c_jh.c: intital -> initial (comment)
src/crypto/cn/skein_port.h: varaiable -> variable (comment)
src/backend/opencl/cl/cn/wolf-skein.cl: Build-in -> Built-in (comment)
What I intentionally did NOT change
Anything under src/3rdparty/** (vendored).
A few remaining codespell hits are either:
Upstream/embedded sources we excluded (groestl256.cl, jh.cl contain Projet)
Potentially valid identifier/name (Carmel CPU codename)
Low-risk token in codegen comments (vor inside an instruction comment)
These are handled via ignore rules in .codespellrc instead of modifying code.

Added: .codespellrc
Created /.codespellrc with:

skip entries for vendored / embedded upstream areas:
./src/3rdparty
./src/crypto/ghostrider
./src/crypto/randomx/blake2
./src/crypto/cn/sse2neon.h
./src/backend/opencl/cl/cn/groestl256.cl
./src/backend/opencl/cl/cn/jh.cl
ignore-words-list for:
Carmel
vor
Verification
codespell . --config ./.codespellrc now exits clean (exit code 0).

Signed-off-by: rezky_nightky <with.rezky@gmail.com>
2026-01-21 22:36:59 +07:00
XMRig
db24bf5154 Revert "Merge branch 'pr3764' into dev"
This reverts commit 0d9a372e49, reversing
changes made to 1a04bf2904.
2026-01-21 21:32:51 +07:00
XMRig
0d9a372e49 Merge branch 'pr3764' into dev 2026-01-21 21:27:41 +07:00
XMRig
c1e3d386fe Merge branch 'master' of https://github.com/oxyzenQ/xmrig into pr3764 2026-01-21 21:27:11 +07:00
rezky_nightky
5ca4828255 feat: stability improvements, see detail below
Key stability improvements made (deterministic + bounded)
1) Bounded memory usage in long-running stats
Fixed unbounded growth in NetworkState latency tracking:
Replaced std::vector<uint16_t> m_latency + push_back() with a fixed-size ring buffer (kLatencyWindow = 1024) and explicit counters.
Median latency computation now operates on at most 1024 samples, preventing memory growth and avoiding performance cliffs from ever-growing copies/sorts.
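A minimal sketch of the bounded-window idea described above (class and member names here are illustrative stand-ins, not the actual NetworkState code):

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Fixed-size latency window: old samples are overwritten in place, so
// memory stays bounded no matter how long the miner runs.
class LatencyWindow {
public:
    static constexpr std::size_t kLatencyWindow = 1024;

    void add(uint16_t ms) {
        m_samples[m_pos] = ms;
        m_pos = (m_pos + 1) % kLatencyWindow;
        if (m_count < kLatencyWindow) {
            ++m_count;
        }
    }

    // Median over at most kLatencyWindow samples; the copy-and-sort cost
    // is bounded because the copy never exceeds the window size.
    uint16_t median() const {
        if (m_count == 0) {
            return 0;
        }
        std::vector<uint16_t> sorted(m_samples.begin(), m_samples.begin() + m_count);
        std::sort(sorted.begin(), sorted.end());
        return sorted[m_count / 2];
    }

private:
    std::array<uint16_t, kLatencyWindow> m_samples {};
    std::size_t m_pos   = 0;
    std::size_t m_count = 0;
};
```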
2) Prevent crash/UAF on shutdown + more predictable teardown
Controller shutdown ordering (Controller::stop()):
Now stops m_miner before destroying m_network.
This reduces chances of worker threads submitting results into a network listener that’s already destroyed.
Thread teardown hardening (backend/common/Thread.h):
Destructor now checks std::thread::joinable() before join().
Avoids std::terminate() if a thread object exists but never started due to early exit/error paths.
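The joinable() guard can be sketched as a small wrapper (a simplified stand-in for the pattern; the real class in backend/common/Thread.h carries more state):

```cpp
#include <thread>
#include <utility>

// Destructor joins only if the thread actually started. Destroying a
// joinable std::thread without join()/detach(), or calling join() on a
// non-joinable one, would otherwise invoke std::terminate().
class Thread {
public:
    Thread() = default;
    explicit Thread(std::thread t) : m_thread(std::move(t)) {}

    ~Thread() {
        if (m_thread.joinable()) {
            m_thread.join();
        }
    }

private:
    std::thread m_thread;
};
```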
3) Fixed real leaks (including executable memory)
Executable memory leak fixed (crypto/cn/CnCtx.cpp):
CnCtx::create() allocates executable memory for generated_code via VirtualMemory::allocateExecutableMemory(0x4000, ...).
Previously CnCtx::release() only _mm_free()’d the struct, leaking the executable mapping.
Now CnCtx::release() frees generated_code before freeing the ctx.
GPU verification leak fixed (net/JobResults.cpp):
In getResults() (GPU result verification), a cryptonight_ctx was created via CnCtx::create() but never released.
Added CnCtx::release(ctx, 1).
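The shape of the leak fix, reduced to a hypothetical create/release pair (plain malloc stands in for the executable mapping; the real code uses VirtualMemory and _mm_malloc/_mm_free):

```cpp
#include <cstdlib>

// release() must free every resource create() allocated, in reverse
// order, before freeing the struct itself.
struct Ctx {
    void *generated_code = nullptr;   // stand-in for the executable mapping
};

Ctx *ctx_create() {
    auto *ctx = static_cast<Ctx *>(std::malloc(sizeof(Ctx)));
    ctx->generated_code = std::malloc(0x4000);   // real code allocates executable memory
    return ctx;
}

void ctx_release(Ctx *ctx) {
    if (ctx == nullptr) {
        return;
    }
    std::free(ctx->generated_code);   // previously leaked
    std::free(ctx);
}
```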
4) JobResults: bounded queues + backpressure + safe shutdown semantics
The old JobResults could:

enqueue unlimited std::list items (m_results, m_bundles) → unbounded RAM,
call uv_queue_work per async batch → unbounded libuv threadpool backlog,
delete handler directly while worker threads might still submit → potential crash/UAF.
Changes made:

Hard queue limits:
kMaxQueuedResults = 4096
kMaxQueuedBundles = 256
Excess is dropped (bounded behavior under load).
Async coalescing:
Only one pending async notification at a time (m_pendingAsync), reducing eventfd/uv wake storms.
Bounded libuv work scheduling:
Only one uv_queue_work is scheduled at a time (m_workScheduled), preventing CPU starvation and unpredictable backlog.
Safe shutdown:
JobResults::stop() now detaches global handler first, then calls handler->stop().
Shutdown detaches m_listener, clears queues, and defers deletion until in-flight work is done.
Defensive bound on GPU result count:
Clamp count to 0xFF inside JobResults as well, not just in the caller, to guard against corrupted kernels/drivers.
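The bounded-queue-plus-coalesced-wakeup pattern above can be sketched as follows (illustrative only; the real JobResults uses libuv async handles and uv_queue_work, which are omitted here):

```cpp
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>

// Bounded producer queue: pushes beyond the limit are dropped, and only
// one caller at a time "owns" the pending wakeup notification.
template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t limit) : m_limit(limit) {}

    // Returns false when the item is dropped because the queue is full.
    bool push(T item) {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_items.size() >= m_limit) {
            return false;                      // bounded behavior under load
        }
        m_items.push_back(std::move(item));
        return true;
    }

    // True only for the caller that should schedule the single wakeup.
    bool claimWakeup() {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_pending) {
            return false;                      // a wakeup is already in flight
        }
        m_pending = true;
        return true;
    }

    // Consumer drains everything and re-arms the wakeup flag.
    std::deque<T> drain() {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_pending = false;
        std::deque<T> out;
        out.swap(m_items);
        return out;
    }

private:
    const std::size_t m_limit;
    std::deque<T>     m_items;
    bool              m_pending = false;
    std::mutex        m_mutex;
};
```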
5) Idempotent cleanup
VirtualMemory::destroy() now sets pool = nullptr after delete:
prevents accidental double-delete on repeated teardown paths.
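The idempotent-cleanup pattern in miniature (hypothetical names; the real member lives in VirtualMemory):

```cpp
// Null the pointer after delete so a second destroy() call is a no-op
// rather than a double-delete.
struct Pool {
    int placeholder = 0;
};

static Pool *pool = nullptr;

void destroy() {
    delete pool;       // delete on nullptr is already safe
    pool = nullptr;    // makes repeated teardown paths idempotent
}
```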
Verification performed
codespell . --config ./.codespellrc: clean
CMake configure + build completed successfully (Release build)

Signed-off-by: rezky_nightky <with.rezky@gmail.com>
2026-01-21 21:22:43 +07:00
XMRig
1a04bf2904 Merge branch 'pr3762' into dev 2026-01-21 21:22:34 +07:00
XMRig
5feb764b27 Merge branch 'fix-keepalive-timer' of https://github.com/HashVault/vltrig into pr3762 2026-01-21 21:21:48 +07:00
rezky_nightky
cb7511507f fix: cross typos detail below:
What I did
Repository rules / exclusions
I didn’t find repo-specific spelling tooling already in place (no existing codespell config).
Given this is a C/C++ repo with vendored sources, I treated src/3rdparty/** as third-party and excluded it from typo fixing (and encoded that in the new .codespellrc).
Typos fixed (first-party only)
Docs
CHANGELOG.md: perfomance -> performance
doc/API.md: optionaly -> optionally, Offical -> Official
doc/BENCHMARK.md: parameteres -> parameters
doc/CPU.md: loosing -> losing, instuctions -> instructions
doc/CHANGELOG_OLD.md: multiple obvious text typos like Breaked -> Broken, singal -> signal, previos -> previous, secons -> seconds, automaticaly -> automatically, perfomance -> performance
Code comments / doc comments (safe text-only changes)
src/base/crypto/sha3.cpp: Inteface -> Interface (comment banner)
src/backend/opencl/cl/cn/cryptonight.cl: performe -> perform, crashs -> crashes (comments)
src/backend/opencl/cl/kawpow/kawpow.cl: regsters -> registers, intial -> initial (comments)
src/crypto/randomx/aes_hash.cpp: intial -> initial (comment)
src/crypto/randomx/intrin_portable.h: cant -> can't (comment)
src/crypto/randomx/randomx.h: intialization -> initialization (doc comment)
src/crypto/cn/c_jh.c: intital -> initial (comment)
src/crypto/cn/skein_port.h: varaiable -> variable (comment)
src/backend/opencl/cl/cn/wolf-skein.cl: Build-in -> Built-in (comment)
What I intentionally did NOT change
Anything under src/3rdparty/** (vendored).
A few remaining codespell hits are either:
Upstream/embedded sources we excluded (groestl256.cl, jh.cl contain Projet)
Potentially valid identifier/name (Carmel CPU codename)
Low-risk token in codegen comments (vor inside an instruction comment)
These are handled via ignore rules in .codespellrc instead of modifying code.

Added: .codespellrc
Created /.codespellrc with:

skip entries for vendored / embedded upstream areas:
./src/3rdparty
./src/crypto/ghostrider
./src/crypto/randomx/blake2
./src/crypto/cn/sse2neon.h
./src/backend/opencl/cl/cn/groestl256.cl
./src/backend/opencl/cl/cn/jh.cl
ignore-words-list for:
Carmel
vor
Verification
codespell . --config ./.codespellrc now exits clean (exit code 0).

Signed-off-by: rezky_nightky <with.rezky@gmail.com>
2026-01-21 20:14:59 +07:00
HashVault
6e6eab1763 Fix keepalive timer logic
- Reset timer on send instead of receive (pool needs to know we're alive)
- Remove timer disable after first ping to enable continuous keepalives
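The reset-on-send semantics can be sketched with a tiny bookkeeping class (illustrative; the real client drives this from a libuv timer):

```cpp
#include <cstdint>

// Keepalive is armed from the last *send*, so the pool keeps hearing
// from us even when it has nothing to send back, and it keeps firing
// instead of being disabled after the first ping.
class Keepalive {
public:
    explicit Keepalive(uint64_t intervalMs) : m_interval(intervalMs) {}

    void onSend(uint64_t nowMs)    { m_lastSend = nowMs; }   // reset on send, not on receive
    bool due(uint64_t nowMs) const { return nowMs - m_lastSend >= m_interval; }

private:
    uint64_t m_interval;
    uint64_t m_lastSend = 0;
};
```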
2026-01-20 14:39:06 +03:00
xmrig
f35f9d7241 Merge pull request #3759 from SChernykh/dev
Optimized VAES code
2026-01-17 21:55:01 +07:00
SChernykh
45d0a15c98 Optimized VAES code
Use only 1 mask instead of 2
2026-01-16 20:43:35 +01:00
xmrig
f4845cbd68 Merge pull request #3758 from SChernykh/dev
RandomX: added VAES-512 support for Zen5
2026-01-16 19:07:09 +07:00
SChernykh
ed80a8a828 RandomX: added VAES-512 support for Zen5
+0.1-0.2% hashrate improvement.
2026-01-16 13:04:40 +01:00
xmrig
9e5492eecc Merge pull request #3757 from SChernykh/dev
Improved RISC-V code
2026-01-15 19:51:57 +07:00
SChernykh
e41b28ef78 Improved RISC-V code 2026-01-15 12:48:55 +01:00
xmrig
1bd59129c4 Merge pull request #3750 from SChernykh/dev
RISC-V: use vector hardware AES instead of scalar
2026-01-01 15:43:36 +07:00
SChernykh
8ccf7de304 RISC-V: use vector hardware AES instead of scalar 2025-12-31 23:37:55 +01:00
xmrig
30ffb9cb27 Merge pull request #3749 from SChernykh/dev
RISC-V: detect and use hardware AES
2025-12-30 14:13:44 +07:00
SChernykh
d3a84c4b52 RISC-V: detect and use hardware AES 2025-12-29 22:10:07 +01:00
xmrig
eb49237aaa Merge pull request #3748 from SChernykh/dev
RISC-V: auto-detect and use vector code for all RandomX AES functions
2025-12-28 13:12:50 +07:00
SChernykh
e1efd3dc7f RISC-V: auto-detect and use vector code for all RandomX AES functions 2025-12-27 21:30:14 +01:00
xmrig
e3d0135708 Merge pull request #3746 from SChernykh/dev
RISC-V: vectorized RandomX main loop
2025-12-27 18:40:47 +07:00
SChernykh
f661e1eb30 RISC-V: vectorized RandomX main loop 2025-12-26 22:11:39 +01:00
XMRig
99488751f1 v6.25.1-dev 2025-12-23 20:53:43 +07:00
XMRig
5fb0321c84 Merge branch 'master' into dev 2025-12-23 20:53:11 +07:00
86 changed files with 10048 additions and 6670 deletions

.codespellrc

@@ -0,0 +1,3 @@
[codespell]
skip = ./src/3rdparty,./src/crypto/ghostrider,./src/crypto/randomx/blake2,./src/crypto/cn/sse2neon.h,./src/backend/opencl/cl/cn/groestl256.cl,./src/backend/opencl/cl/cn/jh.cl
ignore-words-list = Carmel,vor

CHANGELOG.md

@@ -1,3 +1,17 @@
# v6.26.0
- [#3769](https://github.com/xmrig/xmrig/pull/3769), [#3772](https://github.com/xmrig/xmrig/pull/3772), [#3774](https://github.com/xmrig/xmrig/pull/3774), [#3775](https://github.com/xmrig/xmrig/pull/3775), [#3776](https://github.com/xmrig/xmrig/pull/3776), [#3782](https://github.com/xmrig/xmrig/pull/3782), [#3783](https://github.com/xmrig/xmrig/pull/3783) **Added support for RandomX v2.**
- [#3746](https://github.com/xmrig/xmrig/pull/3746) RISC-V: vectorized RandomX main loop.
- [#3748](https://github.com/xmrig/xmrig/pull/3748) RISC-V: auto-detect and use vector code for all RandomX AES functions.
- [#3749](https://github.com/xmrig/xmrig/pull/3749) RISC-V: detect and use hardware AES.
- [#3750](https://github.com/xmrig/xmrig/pull/3750) RISC-V: use vector hardware AES instead of scalar.
- [#3757](https://github.com/xmrig/xmrig/pull/3757) RISC-V: Fixed scratchpad prefetch, removed an unnecessary instruction.
- [#3758](https://github.com/xmrig/xmrig/pull/3758) RandomX: added VAES-512 support for Zen5.
- [#3759](https://github.com/xmrig/xmrig/pull/3759) RandomX: Optimized VAES code.
- [#3762](https://github.com/xmrig/xmrig/pull/3762) Fixed keepalive timer logic.
- [#3778](https://github.com/xmrig/xmrig/pull/3778) RandomX: ARM64 fixes.
- [#3784](https://github.com/xmrig/xmrig/pull/3784) Fixed OpenCL address-space mismatch in `keccak_f800_round`.
- [#3785](https://github.com/xmrig/xmrig/pull/3785) Don't reset nonce during donation rounds.
# v6.25.0
- [#3680](https://github.com/xmrig/xmrig/pull/3680) Added `armv8l` to the list of 32-bit ARM targets.
- [#3708](https://github.com/xmrig/xmrig/pull/3708) Minor Aarch64 JIT changes (better instruction selection, don't emit instructions that add 0, etc).
@@ -160,7 +174,7 @@
# v6.16.2
- [#2751](https://github.com/xmrig/xmrig/pull/2751) Fixed crash on CPUs supporting VAES and running GCC-compiled xmrig.
- [#2761](https://github.com/xmrig/xmrig/pull/2761) Fixed broken auto-tuning in GCC Windows build.
- [#2771](https://github.com/xmrig/xmrig/issues/2771) Fixed environment variables support for GhostRider and KawPow.
- [#2769](https://github.com/xmrig/xmrig/pull/2769) Performance fixes:
- Fixed several performance bottlenecks introduced in v6.16.1.
- Fixed overall GCC-compiled build performance, it's the same speed as MSVC build now.
@@ -468,7 +482,7 @@
- Compiler for Windows gcc builds updated to v10.1.
# v5.11.1
-- [#1652](https://github.com/xmrig/xmrig/pull/1652) Up to 1% RandomX perfomance improvement on recent AMD CPUs.
+- [#1652](https://github.com/xmrig/xmrig/pull/1652) Up to 1% RandomX performance improvement on recent AMD CPUs.
- [#1306](https://github.com/xmrig/xmrig/issues/1306) Fixed possible double connection to a pool.
- [#1654](https://github.com/xmrig/xmrig/issues/1654) Fixed build with LibreSSL.
@@ -574,9 +588,9 @@
- Added automatic huge pages configuration on Linux if use the miner with root privileges.
- **Added [automatic Intel prefetchers configuration](https://xmrig.com/docs/miner/randomx-optimization-guide#intel-specific-optimizations) on Linux.**
- Added new option `wrmsr` in `randomx` object with command line equivalent `--randomx-wrmsr=6`.
- [#1396](https://github.com/xmrig/xmrig/pull/1396) [#1401](https://github.com/xmrig/xmrig/pull/1401) New performance optimizations for Ryzen CPUs.
- [#1385](https://github.com/xmrig/xmrig/issues/1385) Added `max-threads-hint` option support for RandomX dataset initialization threads.
- [#1386](https://github.com/xmrig/xmrig/issues/1386) Added `priority` option support for RandomX dataset initialization threads.
- For official builds all dependencies (libuv, hwloc, openssl) updated to recent versions.
- Windows `msvc` builds now use Visual Studio 2019 instead of 2017.
@@ -622,7 +636,7 @@ This release based on 4.x.x series and include all features from v4.6.2-beta, ch
- Removed command line option `--http-enabled`, HTTP API enabled automatically if any other `--http-*` option provided.
- [#1172](https://github.com/xmrig/xmrig/issues/1172) **Added OpenCL mining backend.**
- [#268](https://github.com/xmrig/xmrig-amd/pull/268) [#270](https://github.com/xmrig/xmrig-amd/pull/270) [#271](https://github.com/xmrig/xmrig-amd/pull/271) [#273](https://github.com/xmrig/xmrig-amd/pull/273) [#274](https://github.com/xmrig/xmrig-amd/pull/274) [#1171](https://github.com/xmrig/xmrig/pull/1171) Added RandomX support for OpenCL, thanks [@SChernykh](https://github.com/SChernykh).
- Algorithm `cn/wow` removed, as no longer alive.
# Previous versions
[doc/CHANGELOG_OLD.md](doc/CHANGELOG_OLD.md)


@@ -51,42 +51,105 @@ if (XMRIG_RISCV)
# default build uses the RV64GC baseline
set(RVARCH "rv64gc")
enable_language(ASM)
try_run(RANDOMX_VECTOR_RUN_FAIL
RANDOMX_VECTOR_COMPILE_OK
${CMAKE_CURRENT_BINARY_DIR}/
${CMAKE_CURRENT_SOURCE_DIR}/src/crypto/randomx/tests/riscv64_vector.s
COMPILE_DEFINITIONS "-march=rv64gcv")
if (RANDOMX_VECTOR_COMPILE_OK AND NOT RANDOMX_VECTOR_RUN_FAIL)
set(RVARCH_V ON)
message(STATUS "RISC-V vector extension detected")
else()
set(RVARCH_V OFF)
endif()
try_run(RANDOMX_ZICBOP_RUN_FAIL
RANDOMX_ZICBOP_COMPILE_OK
${CMAKE_CURRENT_BINARY_DIR}/
${CMAKE_CURRENT_SOURCE_DIR}/src/crypto/randomx/tests/riscv64_zicbop.s
COMPILE_DEFINITIONS "-march=rv64gc_zicbop")
if (RANDOMX_ZICBOP_COMPILE_OK AND NOT RANDOMX_ZICBOP_RUN_FAIL)
set(RVARCH_ZICBOP ON)
message(STATUS "RISC-V zicbop extension detected")
else()
set(RVARCH_ZICBOP OFF)
endif()
try_run(RANDOMX_ZBA_RUN_FAIL
RANDOMX_ZBA_COMPILE_OK
${CMAKE_CURRENT_BINARY_DIR}/
${CMAKE_CURRENT_SOURCE_DIR}/src/crypto/randomx/tests/riscv64_zba.s
COMPILE_DEFINITIONS "-march=rv64gc_zba")
if (RANDOMX_ZBA_COMPILE_OK AND NOT RANDOMX_ZBA_RUN_FAIL)
set(RVARCH_ZBA ON)
message(STATUS "RISC-V zba extension detected")
else()
set(RVARCH_ZBA OFF)
endif()
try_run(RANDOMX_ZBB_RUN_FAIL
RANDOMX_ZBB_COMPILE_OK
${CMAKE_CURRENT_BINARY_DIR}/
${CMAKE_CURRENT_SOURCE_DIR}/src/crypto/randomx/tests/riscv64_zbb.s
COMPILE_DEFINITIONS "-march=rv64gc_zbb")
if (RANDOMX_ZBB_COMPILE_OK AND NOT RANDOMX_ZBB_RUN_FAIL)
set(RVARCH_ZBB ON)
message(STATUS "RISC-V zbb extension detected")
else()
set(RVARCH_ZBB OFF)
endif()
try_run(RANDOMX_ZVKB_RUN_FAIL
RANDOMX_ZVKB_COMPILE_OK
${CMAKE_CURRENT_BINARY_DIR}/
${CMAKE_CURRENT_SOURCE_DIR}/src/crypto/randomx/tests/riscv64_zvkb.s
COMPILE_DEFINITIONS "-march=rv64gcv_zvkb")
if (RANDOMX_ZVKB_COMPILE_OK AND NOT RANDOMX_ZVKB_RUN_FAIL)
set(RVARCH_ZVKB ON)
message(STATUS "RISC-V zvkb extension detected")
else()
set(RVARCH_ZVKB OFF)
endif()
try_run(RANDOMX_ZVKNED_RUN_FAIL
RANDOMX_ZVKNED_COMPILE_OK
${CMAKE_CURRENT_BINARY_DIR}/
${CMAKE_CURRENT_SOURCE_DIR}/src/crypto/randomx/tests/riscv64_zvkned.s
COMPILE_DEFINITIONS "-march=rv64gcv_zvkned")
if (RANDOMX_ZVKNED_COMPILE_OK AND NOT RANDOMX_ZVKNED_RUN_FAIL)
set(RVARCH_ZVKNED ON)
message(STATUS "RISC-V zvkned extension detected")
else()
set(RVARCH_ZVKNED OFF)
endif()
# for native builds, enable Zba and Zbb if supported by the CPU
if(ARCH STREQUAL "native")
enable_language(ASM)
try_run(RANDOMX_VECTOR_RUN_FAIL
RANDOMX_VECTOR_COMPILE_OK
${CMAKE_CURRENT_BINARY_DIR}/
${CMAKE_CURRENT_SOURCE_DIR}/src/crypto/randomx/tests/riscv64_vector.s
COMPILE_DEFINITIONS "-march=rv64gcv_zicbop")
if (RANDOMX_VECTOR_COMPILE_OK AND NOT RANDOMX_VECTOR_RUN_FAIL)
set(RVARCH "${RVARCH}v_zicbop")
add_definitions(-DXMRIG_RVV_ENABLED)
message(STATUS "RISC-V vector extension detected")
if (ARCH STREQUAL "native")
if (RVARCH_V)
set(RVARCH "${RVARCH}v")
endif()
try_run(RANDOMX_ZBA_RUN_FAIL
RANDOMX_ZBA_COMPILE_OK
${CMAKE_CURRENT_BINARY_DIR}/
${CMAKE_CURRENT_SOURCE_DIR}/src/crypto/randomx/tests/riscv64_zba.s
COMPILE_DEFINITIONS "-march=rv64gc_zba")
if (RANDOMX_ZBA_COMPILE_OK AND NOT RANDOMX_ZBA_RUN_FAIL)
if (RVARCH_ZICBOP)
set(RVARCH "${RVARCH}_zicbop")
endif()
if (RVARCH_ZBA)
set(RVARCH "${RVARCH}_zba")
message(STATUS "RISC-V zba extension detected")
endif()
try_run(RANDOMX_ZBB_RUN_FAIL
RANDOMX_ZBB_COMPILE_OK
${CMAKE_CURRENT_BINARY_DIR}/
${CMAKE_CURRENT_SOURCE_DIR}/src/crypto/randomx/tests/riscv64_zbb.s
COMPILE_DEFINITIONS "-march=rv64gc_zbb")
if (RANDOMX_ZBB_COMPILE_OK AND NOT RANDOMX_ZBB_RUN_FAIL)
if (RVARCH_ZBB)
set(RVARCH "${RVARCH}_zbb")
message(STATUS "RISC-V zbb extension detected")
endif()
if (RVARCH_ZVKB)
set(RVARCH "${RVARCH}_zvkb")
endif()
if (RVARCH_ZVKNED)
set(RVARCH "${RVARCH}_zvkned")
endif()
endif()


@@ -86,10 +86,33 @@ if (WITH_RANDOMX)
src/crypto/randomx/jit_compiler_rv64_vector_static.S
src/crypto/randomx/jit_compiler_rv64.cpp
src/crypto/randomx/jit_compiler_rv64_vector.cpp
src/crypto/randomx/aes_hash_rv64_vector.cpp
src/crypto/randomx/aes_hash_rv64_zvkned.cpp
)
# cheat because cmake and ccache hate each other
set_property(SOURCE src/crypto/randomx/jit_compiler_rv64_static.S PROPERTY LANGUAGE C)
set_property(SOURCE src/crypto/randomx/jit_compiler_rv64_vector_static.S PROPERTY LANGUAGE C)
set(RV64_VECTOR_FILE_ARCH "rv64gcv")
if (ARCH STREQUAL "native")
if (RVARCH_ZICBOP)
set(RV64_VECTOR_FILE_ARCH "${RV64_VECTOR_FILE_ARCH}_zicbop")
endif()
if (RVARCH_ZBA)
set(RV64_VECTOR_FILE_ARCH "${RV64_VECTOR_FILE_ARCH}_zba")
endif()
if (RVARCH_ZBB)
set(RV64_VECTOR_FILE_ARCH "${RV64_VECTOR_FILE_ARCH}_zbb")
endif()
if (RVARCH_ZVKB)
set(RV64_VECTOR_FILE_ARCH "${RV64_VECTOR_FILE_ARCH}_zvkb")
endif()
endif()
set_source_files_properties(src/crypto/randomx/jit_compiler_rv64_vector_static.S PROPERTIES COMPILE_FLAGS "-march=${RV64_VECTOR_FILE_ARCH}_zvkned")
set_source_files_properties(src/crypto/randomx/aes_hash_rv64_vector.cpp PROPERTIES COMPILE_FLAGS "-O3 -march=${RV64_VECTOR_FILE_ARCH}")
set_source_files_properties(src/crypto/randomx/aes_hash_rv64_zvkned.cpp PROPERTIES COMPILE_FLAGS "-O3 -march=${RV64_VECTOR_FILE_ARCH}_zvkned")
else()
list(APPEND SOURCES_CRYPTO
src/crypto/randomx/jit_compiler_fallback.cpp
@@ -167,6 +190,15 @@ if (WITH_RANDOMX)
list(APPEND HEADERS_CRYPTO src/crypto/rx/Profiler.h)
list(APPEND SOURCES_CRYPTO src/crypto/rx/Profiler.cpp)
endif()
if (WITH_VAES)
set(SOURCES_CRYPTO "${SOURCES_CRYPTO}" src/crypto/randomx/aes_hash_vaes512.cpp)
if (CMAKE_C_COMPILER_ID MATCHES MSVC)
set_source_files_properties(src/crypto/randomx/aes_hash_vaes512.cpp PROPERTIES COMPILE_FLAGS "/arch:AVX512")
elseif (CMAKE_C_COMPILER_ID MATCHES GNU OR CMAKE_C_COMPILER_ID MATCHES Clang)
set_source_files_properties(src/crypto/randomx/aes_hash_vaes512.cpp PROPERTIES COMPILE_FLAGS "-mavx512f -mvaes")
endif()
endif()
else()
remove_definitions(/DXMRIG_ALGO_RANDOMX)
endif()

doc/API.md

@@ -1,8 +1,8 @@
# HTTP API
-If you want use HTTP API you need enable it (`"enabled": true,`) then choice `port` and optionaly `host`. API not available if miner built without HTTP support (`-DWITH_HTTP=OFF`).
+If you want use HTTP API you need enable it (`"enabled": true,`) then choice `port` and optionally `host`. API not available if miner built without HTTP support (`-DWITH_HTTP=OFF`).
-Offical HTTP client for API: http://workers.xmrig.info/
+Official HTTP client for API: http://workers.xmrig.info/
Example configuration:

doc/BENCHMARK.md

@@ -17,7 +17,7 @@ Double check that you see `Huge pages 100%` both for dataset and for all threads
### Benchmark with custom config
-You can run benchmark with any configuration you want. Just start without command line parameteres, use regular config.json and add `"benchmark":"1M",` on the next line after pool url.
+You can run benchmark with any configuration you want. Just start without command line parameters, use regular config.json and add `"benchmark":"1M",` on the next line after pool url.
# Stress test
@@ -26,4 +26,4 @@ You can also run continuous stress-test that is as close to the real RandomX min
xmrig --stress
xmrig --stress -a rx/wow
```
This will require Internet connection and will run indefinitely.

doc/CHANGELOG_OLD.md

@@ -57,7 +57,7 @@
# v4.0.0-beta
- [#1172](https://github.com/xmrig/xmrig/issues/1172) **Added OpenCL mining backend.**
- [#268](https://github.com/xmrig/xmrig-amd/pull/268) [#270](https://github.com/xmrig/xmrig-amd/pull/270) [#271](https://github.com/xmrig/xmrig-amd/pull/271) [#273](https://github.com/xmrig/xmrig-amd/pull/273) [#274](https://github.com/xmrig/xmrig-amd/pull/274) [#1171](https://github.com/xmrig/xmrig/pull/1171) Added RandomX support for OpenCL, thanks [@SChernykh](https://github.com/SChernykh).
- Algorithm `cn/wow` removed, as no longer alive.
# v3.2.0
- Added per pool option `coin` with single possible value `monero` for pools without algorithm negotiation, for upcoming Monero fork.
@@ -103,7 +103,7 @@
- [#1105](https://github.com/xmrig/xmrig/issues/1105) Improved auto configuration for `cn-pico` algorithm.
- Added commands `pause` and `resume` via JSON RPC 2.0 API (`POST /json_rpc`).
- Added command line option `--export-topology` for export hwloc topology to a XML file.
-- Breaked backward compatibility with previous configs and command line, `variant` option replaced to `algo`, global option `algo` removed, all CPU related settings moved to `cpu` object.
+- Broken backward compatibility with previous configs and command line, `variant` option replaced to `algo`, global option `algo` removed, all CPU related settings moved to `cpu` object.
- Options `av`, `safe` and `max-cpu-usage` removed.
- Algorithm `cn/msr` renamed to `cn/fast`.
- Algorithm `cn/xtl` removed.
@@ -122,7 +122,7 @@
- [#1092](https://github.com/xmrig/xmrig/issues/1092) Fixed crash if wrong CPU affinity used.
- [#1103](https://github.com/xmrig/xmrig/issues/1103) Improved auto configuration for RandomX for CPUs where L2 cache is limiting factor.
- [#1105](https://github.com/xmrig/xmrig/issues/1105) Improved auto configuration for `cn-pico` algorithm.
- [#1106](https://github.com/xmrig/xmrig/issues/1106) Fixed `hugepages` field in summary API.
- Added alternative short format for CPU threads.
- Changed format for CPU threads with intensity above 1.
- Name for reference RandomX configuration changed to `rx/test` to avoid potential conflicts in future.
@@ -150,7 +150,7 @@
- [#1050](https://github.com/xmrig/xmrig/pull/1050) Added RandomXL algorithm for [Loki](https://loki.network/), algorithm name used by miner is `randomx/loki` or `rx/loki`.
- Added [flexible](https://github.com/xmrig/xmrig/blob/evo/doc/CPU.md) multi algorithm configuration.
- Added unlimited switching between incompatible algorithms, all mining options can be changed in runtime.
-- Breaked backward compatibility with previous configs and command line, `variant` option replaced to `algo`, global option `algo` removed, all CPU related settings moved to `cpu` object.
+- Broken backward compatibility with previous configs and command line, `variant` option replaced to `algo`, global option `algo` removed, all CPU related settings moved to `cpu` object.
- Options `av`, `safe` and `max-cpu-usage` removed.
- Algorithm `cn/msr` renamed to `cn/fast`.
- Algorithm `cn/xtl` removed.
@@ -183,7 +183,7 @@
- [#314](https://github.com/xmrig/xmrig-proxy/issues/314) Added donate over proxy feature.
- Added new option `donate-over-proxy`.
- Added real graceful exit.
# v2.14.4
- [#992](https://github.com/xmrig/xmrig/pull/992) Fixed compilation with Clang 3.5.
- [#1012](https://github.com/xmrig/xmrig/pull/1012) Fixed compilation with Clang 9.0.
@@ -250,7 +250,7 @@
# v2.8.1
- [#768](https://github.com/xmrig/xmrig/issues/768) Fixed build with Visual Studio 2015.
- [#769](https://github.com/xmrig/xmrig/issues/769) Fixed regression, some ANSI escape sequences was in log with disabled colors.
- [#777](https://github.com/xmrig/xmrig/issues/777) Better report about pool connection issues.
- Simplified checks for ASM auto detection, only AES support necessary.
- Added missing options to `--help` output.
@@ -259,7 +259,7 @@
- Added global and per thread option `"asm"` and command line equivalent.
- **[#758](https://github.com/xmrig/xmrig/issues/758) Added SSL/TLS support for secure connections to pools.**
- Added per pool options `"tls"` and `"tls-fingerprint"` and command line equivalents.
- [#767](https://github.com/xmrig/xmrig/issues/767) Added config autosave feature, same with GPU miners.
- [#245](https://github.com/xmrig/xmrig-proxy/issues/245) Fixed API ID collision when run multiple miners on same machine.
- [#757](https://github.com/xmrig/xmrig/issues/757) Fixed send buffer overflow.
@@ -346,7 +346,7 @@
# v2.4.4
- Added libmicrohttpd version to --version output.
-- Fixed bug in singal handler, in some cases miner wasn't shutdown properly.
+- Fixed bug in signal handler, in some cases miner wasn't shutdown properly.
- Fixed recent MSVC 2017 version detection.
- [#279](https://github.com/xmrig/xmrig/pull/279) Fixed build on some macOS versions.
@@ -359,7 +359,7 @@
# v2.4.2
- [#60](https://github.com/xmrig/xmrig/issues/60) Added FreeBSD support, thanks [vcambur](https://github.com/vcambur).
- [#153](https://github.com/xmrig/xmrig/issues/153) Fixed issues with dwarfpool.com.
# v2.4.1
- [#147](https://github.com/xmrig/xmrig/issues/147) Fixed comparability with monero-stratum.
@@ -371,7 +371,7 @@
- [#101](https://github.com/xmrig/xmrig/issues/101) Fixed MSVC 2017 (15.3) compile time version detection.
- [#108](https://github.com/xmrig/xmrig/issues/108) Silently ignore invalid values for `donate-level` option.
- [#111](https://github.com/xmrig/xmrig/issues/111) Fixed build without AEON support.
# v2.3.1
- [#68](https://github.com/xmrig/xmrig/issues/68) Fixed compatibility with Docker containers, was nothing print on console.
@@ -398,7 +398,7 @@
# v2.1.0
- [#40](https://github.com/xmrig/xmrig/issues/40)
Improved miner shutdown, fixed crash on exit for Linux and OS X.
- Fixed, login request contained malformed JSON if the username or password had special characters, for example `\`.
- [#220](https://github.com/fireice-uk/xmr-stak-cpu/pull/220) Better support for Round Robin DNS, IP address is now always chosen randomly instead of being stuck on the first one.
- Changed donation address, new [xmrig-proxy](https://github.com/xmrig/xmrig-proxy) is coming soon.
@@ -418,16 +418,16 @@ Improved miner shutdown, fixed crash on exit for Linux and OS X.
- Fixed Windows XP support.
- Fixed regression, option `--no-color` did not fully disable colored output.
- Show resolved pool IP address in miner output.
# v1.0.1
- Fix broken software AES implementation, the app crashed if the CPU did not support AES-NI, only version 1.0.0 affected.
# v1.0.0
- Miner completely rewritten in C++ with libuv.
- This version should be fully compatible (except the config file) with previous versions, many new nice features will come in next versions.
- This is still beta. If you find a regression, stability or performance issue, or have an idea for a new feature, please feel free to open a new [issue](https://github.com/xmrig/xmrig/issues/new).
- Added new option `--print-time=N`, print hashrate report every N seconds.
- New hashrate reports, by default every 60 seconds.
- Added Microsoft Visual C++ 2015 and 2017 support.
- Removed dependency on libcurl.
- To compile this version from source please switch to [dev](https://github.com/xmrig/xmrig/tree/dev) branch.
@@ -440,7 +440,7 @@ Improved miner shutdown, fixed crash on exit for Linux and OS X.
- Fixed gcc 7.1 support.
# v0.8.1
- Added nicehash support, detects automatically by pool URL, for example `cryptonight.eu.nicehash.com:3355` or manually via option `--nicehash`.
# v0.8.0
- Added double hash mode, also known as lower power mode. `--av=2` and `--av=4`.

View File

@@ -124,7 +124,7 @@ Force enable (`true`) or disable (`false`) hardware AES support. Default value `
Mining threads priority, value from `1` (lowest priority) to `5` (highest possible priority). Default value `null` means the miner doesn't change thread priority at all. Setting priority higher than `2` can make your PC unresponsive.
#### `memory-pool` (since v4.3.0)
Use a continuous, persistent memory block for mining threads, useful for preserving huge pages allocation while switching algorithms. Possible values: `false` (feature disabled, by default), `true`, or a specific count of 2 MB huge pages. It helps to avoid losing huge pages for scratchpads when the RandomX dataset is updated and mining threads restart after 2-3 days of mining.
#### `yield` (since v5.1.1)
Prefer better system response/stability `true` (default value) or maximum hashrate `false`.
@@ -133,7 +133,7 @@ Prefer system better system response/stability `true` (default value) or maximum
Enable/configure or disable ASM optimizations. Possible values: `true`, `false`, `"intel"`, `"ryzen"`, `"bulldozer"`.
#### `argon2-impl` (since v3.1.0)
Allows overriding the automatically detected Argon2 implementation; this option was added mostly for debug purposes, default value `null` means autodetect. This is used in RandomX dataset initialization and also in some other mining algorithms. Other possible values: `"x86_64"`, `"SSE2"`, `"SSSE3"`, `"XOP"`, `"AVX2"`, `"AVX-512F"`. Manual selection has no safeguards - if your CPU doesn't support the required instructions, the miner will crash.
#### `astrobwt-max-size`
AstroBWT algorithm: skip hashes with large stage 2 size, default: `550`, min: `400`, max: `1200`. The optimal value depends on your CPU/GPU.
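For reference, the CPU options documented above might be combined in `config.json` roughly like this — a minimal sketch with illustrative values, assuming the keys live under the `cpu` object as in the bundled default config:

```json
{
  "cpu": {
    "priority": null,
    "memory-pool": true,
    "yield": true,
    "asm": true,
    "argon2-impl": null,
    "astrobwt-max-size": 550
  }
}
```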

View File

@@ -89,11 +89,16 @@ static void print_cpu(const Config *)
{
const auto info = Cpu::info();
Log::print(GREEN_BOLD(" * ") WHITE_BOLD("%-13s%s (%zu)") " %s %s%sAES%s",
"CPU",
info->brand(),
info->packages(),
ICpuInfo::is64bit() ? GREEN_BOLD("64-bit") : RED_BOLD("32-bit"),
#ifdef XMRIG_RISCV
info->hasRISCV_Vector() ? GREEN_BOLD_S "RVV " : RED_BOLD_S "-RVV ",
#else
"",
#endif
info->hasAES() ? GREEN_BOLD_S : RED_BOLD_S "-",
info->isVM() ? RED_BOLD_S " VM" : ""
);

View File

@@ -48,6 +48,24 @@ static const std::map<int, std::map<uint32_t, uint64_t> > hashCheck = {
{ 9000000U, 0x323935102AB6B45CULL },
{ 10000000U, 0xB5231262E2792B26ULL }
}},
{ Algorithm::RX_V2, {
# ifndef NDEBUG
{ 10000U, 0x57d2051d099613a4ULL },
{ 20000U, 0x0bae0155cc797f01ULL },
# endif
{ 250000U, 0x18cf741a71484072ULL },
{ 500000U, 0xcd8c3e6ec31b2faeULL },
{ 1000000U, 0x88d6b8fb70cd479dULL },
{ 2000000U, 0x0e16828d236a1a63ULL },
{ 3000000U, 0x2739bdd0f25b83a6ULL },
{ 4000000U, 0x32f42d9006d2d34bULL },
{ 5000000U, 0x16d9c6286cb82251ULL },
{ 6000000U, 0x1f916ae19d6bcf07ULL },
{ 7000000U, 0x1f474f99a873948fULL },
{ 8000000U, 0x8d67e0ddf05476bbULL },
{ 9000000U, 0x3ebf37dcd5c4a215ULL },
{ 10000000U, 0x7efbddff3f30fb74ULL }
}},
{ Algorithm::RX_WOW, {
# ifndef NDEBUG
{ 10000U, 0x6B0918757100B338ULL },
@@ -88,6 +106,24 @@ static const std::map<int, std::map<uint32_t, uint64_t> > hashCheck1T = {
{ 9000000U, 0xC6D39EF59213A07CULL },
{ 10000000U, 0x95E6BAE68DD779CDULL }
}},
{ Algorithm::RX_V2, {
# ifndef NDEBUG
{ 10000, 0x90eb7c07cd9e0d90ULL },
{ 20000, 0x6523a3658d7d9930ULL },
# endif
{ 250000, 0xf83b6d9d355ee5b1ULL },
{ 500000, 0xbea3c1bf1465e9abULL },
{ 1000000, 0x9e16f7cb56b366e1ULL },
{ 2000000, 0x3b5e671f47e15e55ULL },
{ 3000000, 0xec5819c180df03e2ULL },
{ 4000000, 0x19d31b498f86aad4ULL },
{ 5000000, 0x2487626c75cd12ccULL },
{ 6000000, 0xa323a25a5286c39aULL },
{ 7000000, 0xa123b100f3104dfcULL },
{ 8000000, 0x602db9d83bfa0ddcULL },
{ 9000000, 0x98da909e579765ddULL },
{ 10000000, 0x3a45b7247cec9895ULL }
}},
{ Algorithm::RX_WOW, {
# ifndef NDEBUG
{ 10000U, 0x9EC1B9B8C8C7F082ULL },

View File

@@ -256,7 +256,10 @@ void xmrig::CpuWorker<N>::start()
# ifdef XMRIG_ALGO_RANDOMX
bool first = true;
alignas(64) uint64_t tempHash[8] = {};
size_t prev_job_size = 0;
alignas(64) uint8_t prev_job[Job::kMaxBlobSize] = {};
# endif
while (!Nonce::isOutdated(Nonce::CPU, m_job.sequence())) {
@@ -297,6 +300,11 @@ void xmrig::CpuWorker<N>::start()
job.generateMinerSignature(m_job.blob(), job.size(), miner_signature_ptr);
}
randomx_calculate_hash_first(m_vm, tempHash, m_job.blob(), job.size());
if (RandomX_CurrentConfig.Tweak_V2_COMMITMENT) {
prev_job_size = job.size();
memcpy(prev_job, m_job.blob(), prev_job_size);
}
}
if (!nextRound()) {
@@ -307,7 +315,15 @@ void xmrig::CpuWorker<N>::start()
memcpy(miner_signature_saved, miner_signature_ptr, sizeof(miner_signature_saved));
job.generateMinerSignature(m_job.blob(), job.size(), miner_signature_ptr);
}
randomx_calculate_hash_next(m_vm, tempHash, m_job.blob(), job.size(), m_hash);
if (RandomX_CurrentConfig.Tweak_V2_COMMITMENT) {
memcpy(m_commitment, m_hash, RANDOMX_HASH_SIZE);
randomx_calculate_commitment(prev_job, prev_job_size, m_hash, m_hash);
prev_job_size = job.size();
memcpy(prev_job, m_job.blob(), prev_job_size);
}
}
else
# endif
@@ -347,8 +363,20 @@ void xmrig::CpuWorker<N>::start()
}
else
# endif
if (value < job.target()) {
uint8_t* extra_data = nullptr;
if (job.algorithm().family() == Algorithm::RANDOM_X) {
if (RandomX_CurrentConfig.Tweak_V2_COMMITMENT) {
extra_data = m_commitment;
}
else if (job.hasMinerSignature()) {
extra_data = miner_signature_saved;
}
}
JobResults::submit(job, current_job_nonces[i], m_hash + (i * 32), extra_data);
}
}
m_count += N;
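The commitment hunk above has a subtlety: `randomx_calculate_hash_next` returns the hash of the blob passed to the *previous* call, so the V2 commitment must be computed over the saved `prev_job` blob, not the current one. A rough Python sketch of that ordering — `rx_hash` and `rx_commitment` are stand-ins built on `hashlib.blake2b`, NOT the real RandomX primitives:

```python
import hashlib

def rx_hash(blob: bytes) -> bytes:
    # Placeholder for the RandomX hash (NOT the real algorithm).
    return hashlib.blake2b(blob, digest_size=32).digest()

def rx_commitment(blob: bytes, h: bytes) -> bytes:
    # Placeholder for randomx_calculate_commitment (NOT the real algorithm).
    return hashlib.blake2b(blob + h, digest_size=32).digest()

def pipelined(blobs):
    """Mimic the hash_first/hash_next pipeline: each 'next' step yields the
    hash of the previously submitted blob, so the commitment is paired with
    the saved previous blob rather than the one just submitted."""
    results = []
    prev = blobs[0]                 # ~ randomx_calculate_hash_first(prev)
    for blob in blobs[1:]:
        h = rx_hash(prev)           # ~ hash_next returns prev's hash
        results.append((prev, h, rx_commitment(prev, h)))
        prev = blob
    h = rx_hash(prev)               # final hash from the analogous 'last' call
    results.append((prev, h, rx_commitment(prev, h)))
    return results
```

Each result tuple keeps the blob, its hash, and the commitment over that same blob — which is exactly why the C++ code copies `m_job.blob()` into `prev_job` before moving on.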

View File

@@ -83,6 +83,7 @@ private:
void allocateCnCtx();
void consumeJob();
alignas(8) uint8_t m_commitment[N * 32]{ 0 };
alignas(8) uint8_t m_hash[N * 32]{ 0 };
const Algorithm m_algorithm;
const Assembly m_assembly;

View File

@@ -85,6 +85,7 @@ public:
FLAG_POPCNT,
FLAG_CAT_L3,
FLAG_VM,
FLAG_RISCV_VECTOR,
FLAG_MAX
};
@@ -109,6 +110,7 @@ public:
virtual bool hasOneGbPages() const = 0;
virtual bool hasXOP() const = 0;
virtual bool isVM() const = 0;
virtual bool hasRISCV_Vector() const = 0;
virtual bool jccErratum() const = 0;
virtual const char *backend() const = 0;
virtual const char *brand() const = 0;

View File

@@ -58,8 +58,8 @@
namespace xmrig {
constexpr size_t kCpuFlagsSize = 16;
static const std::array<const char *, kCpuFlagsSize> flagNames = { "aes", "vaes", "avx", "avx2", "avx512f", "bmi2", "osxsave", "pdpe1gb", "sse2", "ssse3", "sse4.1", "xop", "popcnt", "cat_l3", "vm", "rvv" };
static_assert(kCpuFlagsSize == ICpuInfo::FLAG_MAX, "kCpuFlagsSize and FLAG_MAX mismatch");

View File

@@ -52,6 +52,7 @@ protected:
inline bool hasOneGbPages() const override { return has(FLAG_PDPE1GB); }
inline bool hasXOP() const override { return has(FLAG_XOP); }
inline bool isVM() const override { return has(FLAG_VM); }
inline bool hasRISCV_Vector() const override { return has(FLAG_RISCV_VECTOR); }
inline bool jccErratum() const override { return m_jccErratum; }
inline const char *brand() const override { return m_brand; }
inline const std::vector<int32_t> &units() const override { return m_units; }

View File

@@ -34,7 +34,7 @@ namespace xmrig {
extern String cpu_name_riscv();
extern bool has_riscv_vector();
extern bool has_riscv_aes();
} // namespace xmrig
@@ -55,8 +55,11 @@ xmrig::BasicCpuInfo::BasicCpuInfo() :
strncpy(m_brand, name.data(), sizeof(m_brand) - 1);
}
// Check for vector extensions
m_flags.set(FLAG_RISCV_VECTOR, has_riscv_vector());
// Check for AES extensions (Zknd/Zkne)
m_flags.set(FLAG_AES, has_riscv_aes());
// RISC-V typically supports 1GB huge pages
m_flags.set(FLAG_PDPE1GB, std::ifstream("/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages").good());

View File

@@ -32,9 +32,9 @@ struct riscv_cpu_desc
String isa;
String uarch;
bool has_vector = false;
bool has_aes = false;
inline bool isReady() const { return !isa.isNull(); }
};
static bool lookup_riscv(char *line, const char *pattern, String &value)
@@ -81,22 +81,32 @@ static bool read_riscv_cpuinfo(riscv_cpu_desc *desc)
lookup_riscv(buf, "model name", desc->model);
if (lookup_riscv(buf, "isa", desc->isa)) {
desc->isa.toLower();
for (const String& s : desc->isa.split('_')) {
const char* p = s.data();
const size_t n = s.size();
if ((s.size() > 4) && (memcmp(p, "rv64", 4) == 0)) {
for (size_t i = 4; i < n; ++i) {
if (p[i] == 'v') {
desc->has_vector = true;
break;
}
}
}
else if (s == "zve64d") {
desc->has_vector = true;
}
else if ((s == "zvkn") || (s == "zvknc") || (s == "zvkned") || (s == "zvkng")){
desc->has_aes = true;
}
}
}
lookup_riscv(buf, "uarch", desc->uarch);
if (desc->isReady()) {
break;
}
}
@@ -128,11 +138,11 @@ bool has_riscv_vector()
return false;
}
bool has_riscv_aes()
{
riscv_cpu_desc desc;
if (read_riscv_cpuinfo(&desc)) {
return desc.has_aes;
}
return false;
}
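The ISA-string token rules used by `read_riscv_cpuinfo` above (single-letter `v` in the `rv64…` base token, the `zve64d` extension, and the `zvkn*` vector-crypto family) can be summarized in a short Python sketch that mirrors the C++ logic; this is illustrative only, not the miner's actual detection code:

```python
def parse_riscv_isa(isa: str):
    """Return (has_vector, has_aes) from a /proc/cpuinfo ISA string,
    following the same token rules as the C++ above."""
    has_vector = False
    has_aes = False
    for tok in isa.lower().split('_'):
        if tok.startswith('rv64') and len(tok) > 4:
            # Single-letter extensions follow the base ISA, e.g. rv64imafdcv.
            if 'v' in tok[4:]:
                has_vector = True
        elif tok == 'zve64d':
            has_vector = True
        elif tok in ('zvkn', 'zvknc', 'zvkned', 'zvkng'):
            has_aes = True
    return has_vector, has_aes
```

Note that bit-manipulation tokens such as `zba`/`zbb`/`zbs` deliberately match nothing: they are not crypto extensions.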

View File

@@ -19,6 +19,7 @@
#define ALGO_CN_PICO_TLO 0x63120274
#define ALGO_CN_UPX2 0x63110200
#define ALGO_RX_0 0x72151200
#define ALGO_RX_V2 0x72151202
#define ALGO_RX_WOW 0x72141177
#define ALGO_RX_ARQMA 0x72121061
#define ALGO_RX_SFX 0x72151273

View File

@@ -706,7 +706,7 @@ __kernel void cn2(__global uint4 *Scratchpad, __global ulong *states, __global u
}
# if (ALGO_FAMILY == FAMILY_CN_HEAVY)
/* Also left over threads perform this loop.
* The left over thread results will be ignored
*/
#pragma unroll 16
@@ -1005,7 +1005,7 @@ __kernel void Groestl(__global ulong *states, __global uint *BranchBuf, __global
ulong State[8] = { 0UL, 0UL, 0UL, 0UL, 0UL, 0UL, 0UL, 0x0001000000000000UL };
ulong H[8], M[8];
// BUG: AMD driver 19.7.X crashes if this is written as loop
// Thx AMD for so bad software
{
((ulong8 *)M)[0] = vload8(0, states);

File diff suppressed because it is too large

View File

@@ -10,7 +10,7 @@
#else
# define STATIC
/* taken from https://www.khronos.org/registry/OpenCL/extensions/amd/cl_amd_media_ops.txt
* Built-in Function
* uintn amd_bitalign (uintn src0, uintn src1, uintn src2)
* Description
* dst.s0 = (uint) (((((long)src0.s0) << 32) | (long)src1.s0) >> (src2.s0 & 31))

View File

@@ -74,10 +74,10 @@ void keccak_f800_round(uint32_t st[25], const int r)
// Keccak - implemented as a variant of SHAKE
// The width is 800, with a bitrate of 576, a capacity of 224, and no padding
// Only need 64 bits of output for mining
void keccak_f800(uint32_t st[25])
{
// Complete all 22 rounds as a separate impl to
// evaluate only first 8 words is wasteful of registers
for (int r = 0; r < 22; r++) {
keccak_f800_round(st, r);
}
@@ -181,7 +181,7 @@ __kernel void progpow_search(__global dag_t const* g_dag, __global uint* job_blo
for (int i = 10; i < 25; i++)
state[i] = ravencoin_rndc[i-10];
// Run initial keccak round
keccak_f800(state);
for (int i = 0; i < 8; i++)

View File

@@ -2,7 +2,7 @@
namespace xmrig {
static const char kawpow_cl[5947] = {
0x23,0x69,0x66,0x64,0x65,0x66,0x20,0x63,0x6c,0x5f,0x63,0x6c,0x61,0x6e,0x67,0x5f,0x73,0x74,0x6f,0x72,0x61,0x67,0x65,0x5f,0x63,0x6c,0x61,0x73,0x73,0x5f,0x73,0x70,
0x65,0x63,0x69,0x66,0x69,0x65,0x72,0x73,0x0a,0x23,0x70,0x72,0x61,0x67,0x6d,0x61,0x20,0x4f,0x50,0x45,0x4e,0x43,0x4c,0x20,0x45,0x58,0x54,0x45,0x4e,0x53,0x49,0x4f,
0x4e,0x20,0x63,0x6c,0x5f,0x63,0x6c,0x61,0x6e,0x67,0x5f,0x73,0x74,0x6f,0x72,0x61,0x67,0x65,0x5f,0x63,0x6c,0x61,0x73,0x73,0x5f,0x73,0x70,0x65,0x63,0x69,0x66,0x69,
@@ -77,118 +77,118 @@ static const char kawpow_cl[5944] = {
0x29,0x0a,0x73,0x74,0x5b,0x6a,0x2b,0x69,0x5d,0x20,0x5e,0x3d,0x20,0x28,0x7e,0x62,0x63,0x5b,0x28,0x69,0x2b,0x31,0x29,0x20,0x25,0x20,0x35,0x5d,0x29,0x26,0x62,0x63,
0x5b,0x28,0x69,0x2b,0x32,0x29,0x20,0x25,0x20,0x35,0x5d,0x3b,0x0a,0x7d,0x0a,0x73,0x74,0x5b,0x30,0x5d,0x20,0x5e,0x3d,0x20,0x6b,0x65,0x63,0x63,0x61,0x6b,0x66,0x5f,
0x72,0x6e,0x64,0x63,0x5b,0x72,0x5d,0x3b,0x0a,0x7d,0x0a,0x76,0x6f,0x69,0x64,0x20,0x6b,0x65,0x63,0x63,0x61,0x6b,0x5f,0x66,0x38,0x30,0x30,0x28,0x75,0x69,0x6e,0x74,
0x33,0x32,0x5f,0x74,0x2a,0x20,0x73,0x74,0x29,0x0a,0x7b,0x0a,0x66,0x6f,0x72,0x20,0x28,0x69,0x6e,0x74,0x20,0x72,0x3d,0x30,0x3b,0x20,0x72,0x3c,0x32,0x32,0x3b,0x20,
0x72,0x2b,0x2b,0x29,0x20,0x7b,0x0a,0x6b,0x65,0x63,0x63,0x61,0x6b,0x5f,0x66,0x38,0x30,0x30,0x5f,0x72,0x6f,0x75,0x6e,0x64,0x28,0x73,0x74,0x2c,0x72,0x29,0x3b,0x0a,
0x7d,0x0a,0x7d,0x0a,0x23,0x64,0x65,0x66,0x69,0x6e,0x65,0x20,0x66,0x6e,0x76,0x31,0x61,0x28,0x68,0x2c,0x20,0x64,0x29,0x20,0x28,0x68,0x20,0x3d,0x20,0x28,0x68,0x20,
0x5e,0x20,0x64,0x29,0x20,0x2a,0x20,0x46,0x4e,0x56,0x5f,0x50,0x52,0x49,0x4d,0x45,0x29,0x0a,0x74,0x79,0x70,0x65,0x64,0x65,0x66,0x20,0x73,0x74,0x72,0x75,0x63,0x74,
0x0a,0x7b,0x0a,0x75,0x69,0x6e,0x74,0x33,0x32,0x5f,0x74,0x20,0x7a,0x2c,0x77,0x2c,0x6a,0x73,0x72,0x2c,0x6a,0x63,0x6f,0x6e,0x67,0x3b,0x0a,0x7d,0x20,0x6b,0x69,0x73,
0x73,0x39,0x39,0x5f,0x74,0x3b,0x0a,0x75,0x69,0x6e,0x74,0x33,0x32,0x5f,0x74,0x20,0x6b,0x69,0x73,0x73,0x39,0x39,0x28,0x6b,0x69,0x73,0x73,0x39,0x39,0x5f,0x74,0x2a,
0x20,0x73,0x74,0x29,0x0a,0x7b,0x0a,0x73,0x74,0x2d,0x3e,0x7a,0x3d,0x33,0x36,0x39,0x36,0x39,0x2a,0x28,0x73,0x74,0x2d,0x3e,0x7a,0x26,0x36,0x35,0x35,0x33,0x35,0x29,
0x2b,0x28,0x73,0x74,0x2d,0x3e,0x7a,0x3e,0x3e,0x31,0x36,0x29,0x3b,0x0a,0x73,0x74,0x2d,0x3e,0x77,0x3d,0x31,0x38,0x30,0x30,0x30,0x2a,0x28,0x73,0x74,0x2d,0x3e,0x77,
0x26,0x36,0x35,0x35,0x33,0x35,0x29,0x2b,0x28,0x73,0x74,0x2d,0x3e,0x77,0x3e,0x3e,0x31,0x36,0x29,0x3b,0x0a,0x75,0x69,0x6e,0x74,0x33,0x32,0x5f,0x74,0x20,0x4d,0x57,
0x43,0x3d,0x28,0x28,0x73,0x74,0x2d,0x3e,0x7a,0x3c,0x3c,0x31,0x36,0x29,0x2b,0x73,0x74,0x2d,0x3e,0x77,0x29,0x3b,0x0a,0x73,0x74,0x2d,0x3e,0x6a,0x73,0x72,0x20,0x5e,
0x3d,0x20,0x28,0x73,0x74,0x2d,0x3e,0x6a,0x73,0x72,0x3c,0x3c,0x31,0x37,0x29,0x3b,0x0a,0x73,0x74,0x2d,0x3e,0x6a,0x73,0x72,0x20,0x5e,0x3d,0x20,0x28,0x73,0x74,0x2d,
0x3e,0x6a,0x73,0x72,0x3e,0x3e,0x31,0x33,0x29,0x3b,0x0a,0x73,0x74,0x2d,0x3e,0x6a,0x73,0x72,0x20,0x5e,0x3d,0x20,0x28,0x73,0x74,0x2d,0x3e,0x6a,0x73,0x72,0x3c,0x3c,
0x35,0x29,0x3b,0x0a,0x73,0x74,0x2d,0x3e,0x6a,0x63,0x6f,0x6e,0x67,0x3d,0x36,0x39,0x30,0x36,0x39,0x2a,0x73,0x74,0x2d,0x3e,0x6a,0x63,0x6f,0x6e,0x67,0x2b,0x31,0x32,
0x33,0x34,0x35,0x36,0x37,0x3b,0x0a,0x72,0x65,0x74,0x75,0x72,0x6e,0x20,0x28,0x28,0x4d,0x57,0x43,0x5e,0x73,0x74,0x2d,0x3e,0x6a,0x63,0x6f,0x6e,0x67,0x29,0x2b,0x73,
0x74,0x2d,0x3e,0x6a,0x73,0x72,0x29,0x3b,0x0a,0x7d,0x0a,0x76,0x6f,0x69,0x64,0x20,0x66,0x69,0x6c,0x6c,0x5f,0x6d,0x69,0x78,0x28,0x6c,0x6f,0x63,0x61,0x6c,0x20,0x75,
0x69,0x6e,0x74,0x33,0x32,0x5f,0x74,0x2a,0x20,0x73,0x65,0x65,0x64,0x2c,0x75,0x69,0x6e,0x74,0x33,0x32,0x5f,0x74,0x20,0x6c,0x61,0x6e,0x65,0x5f,0x69,0x64,0x2c,0x75,
0x69,0x6e,0x74,0x33,0x32,0x5f,0x74,0x2a,0x20,0x6d,0x69,0x78,0x29,0x0a,0x7b,0x0a,0x75,0x69,0x6e,0x74,0x33,0x32,0x5f,0x74,0x20,0x66,0x6e,0x76,0x5f,0x68,0x61,0x73,
0x68,0x3d,0x46,0x4e,0x56,0x5f,0x4f,0x46,0x46,0x53,0x45,0x54,0x5f,0x42,0x41,0x53,0x49,0x53,0x3b,0x0a,0x6b,0x69,0x73,0x73,0x39,0x39,0x5f,0x74,0x20,0x73,0x74,0x3b,
0x0a,0x73,0x74,0x2e,0x7a,0x3d,0x66,0x6e,0x76,0x31,0x61,0x28,0x66,0x6e,0x76,0x5f,0x68,0x61,0x73,0x68,0x2c,0x73,0x65,0x65,0x64,0x5b,0x30,0x5d,0x29,0x3b,0x0a,0x73,
0x74,0x2e,0x77,0x3d,0x66,0x6e,0x76,0x31,0x61,0x28,0x66,0x6e,0x76,0x5f,0x68,0x61,0x73,0x68,0x2c,0x73,0x65,0x65,0x64,0x5b,0x31,0x5d,0x29,0x3b,0x0a,0x73,0x74,0x2e,
0x6a,0x73,0x72,0x3d,0x66,0x6e,0x76,0x31,0x61,0x28,0x66,0x6e,0x76,0x5f,0x68,0x61,0x73,0x68,0x2c,0x6c,0x61,0x6e,0x65,0x5f,0x69,0x64,0x29,0x3b,0x0a,0x73,0x74,0x2e,
0x6a,0x63,0x6f,0x6e,0x67,0x3d,0x66,0x6e,0x76,0x31,0x61,0x28,0x66,0x6e,0x76,0x5f,0x68,0x61,0x73,0x68,0x2c,0x6c,0x61,0x6e,0x65,0x5f,0x69,0x64,0x29,0x3b,0x0a,0x23,
0x70,0x72,0x61,0x67,0x6d,0x61,0x20,0x75,0x6e,0x72,0x6f,0x6c,0x6c,0x0a,0x66,0x6f,0x72,0x20,0x28,0x69,0x6e,0x74,0x20,0x69,0x3d,0x30,0x3b,0x20,0x69,0x3c,0x50,0x52,
0x4f,0x47,0x50,0x4f,0x57,0x5f,0x52,0x45,0x47,0x53,0x3b,0x20,0x69,0x2b,0x2b,0x29,0x0a,0x6d,0x69,0x78,0x5b,0x69,0x5d,0x3d,0x6b,0x69,0x73,0x73,0x39,0x39,0x28,0x26,
0x73,0x74,0x29,0x3b,0x0a,0x7d,0x0a,0x74,0x79,0x70,0x65,0x64,0x65,0x66,0x20,0x73,0x74,0x72,0x75,0x63,0x74,0x0a,0x7b,0x0a,0x75,0x69,0x6e,0x74,0x33,0x32,0x5f,0x74,
0x20,0x75,0x69,0x6e,0x74,0x33,0x32,0x73,0x5b,0x50,0x52,0x4f,0x47,0x50,0x4f,0x57,0x5f,0x4c,0x41,0x4e,0x45,0x53,0x5d,0x3b,0x0a,0x7d,0x20,0x73,0x68,0x75,0x66,0x66,
0x6c,0x65,0x5f,0x74,0x3b,0x0a,0x74,0x79,0x70,0x65,0x64,0x65,0x66,0x20,0x73,0x74,0x72,0x75,0x63,0x74,0x0a,0x7b,0x0a,0x75,0x69,0x6e,0x74,0x33,0x32,0x5f,0x74,0x20,
0x75,0x69,0x6e,0x74,0x33,0x32,0x73,0x5b,0x33,0x32,0x2f,0x73,0x69,0x7a,0x65,0x6f,0x66,0x28,0x75,0x69,0x6e,0x74,0x33,0x32,0x5f,0x74,0x29,0x5d,0x3b,0x0a,0x7d,0x20,
0x68,0x61,0x73,0x68,0x33,0x32,0x5f,0x74,0x3b,0x0a,0x23,0x69,0x66,0x20,0x50,0x4c,0x41,0x54,0x46,0x4f,0x52,0x4d,0x20,0x21,0x3d,0x20,0x4f,0x50,0x45,0x4e,0x43,0x4c,
0x5f,0x50,0x4c,0x41,0x54,0x46,0x4f,0x52,0x4d,0x5f,0x4e,0x56,0x49,0x44,0x49,0x41,0x20,0x0a,0x5f,0x5f,0x61,0x74,0x74,0x72,0x69,0x62,0x75,0x74,0x65,0x5f,0x5f,0x28,
0x28,0x72,0x65,0x71,0x64,0x5f,0x77,0x6f,0x72,0x6b,0x5f,0x67,0x72,0x6f,0x75,0x70,0x5f,0x73,0x69,0x7a,0x65,0x28,0x47,0x52,0x4f,0x55,0x50,0x5f,0x53,0x49,0x5a,0x45,
0x2c,0x31,0x2c,0x31,0x29,0x29,0x29,0x0a,0x23,0x65,0x6e,0x64,0x69,0x66,0x0a,0x5f,0x5f,0x6b,0x65,0x72,0x6e,0x65,0x6c,0x20,0x76,0x6f,0x69,0x64,0x20,0x70,0x72,0x6f,
0x67,0x70,0x6f,0x77,0x5f,0x73,0x65,0x61,0x72,0x63,0x68,0x28,0x5f,0x5f,0x67,0x6c,0x6f,0x62,0x61,0x6c,0x20,0x64,0x61,0x67,0x5f,0x74,0x20,0x63,0x6f,0x6e,0x73,0x74,
0x2a,0x20,0x67,0x5f,0x64,0x61,0x67,0x2c,0x5f,0x5f,0x67,0x6c,0x6f,0x62,0x61,0x6c,0x20,0x75,0x69,0x6e,0x74,0x2a,0x20,0x6a,0x6f,0x62,0x5f,0x62,0x6c,0x6f,0x62,0x2c,
0x75,0x6c,0x6f,0x6e,0x67,0x20,0x74,0x61,0x72,0x67,0x65,0x74,0x2c,0x75,0x69,0x6e,0x74,0x20,0x68,0x61,0x63,0x6b,0x5f,0x66,0x61,0x6c,0x73,0x65,0x2c,0x76,0x6f,0x6c,
0x61,0x74,0x69,0x6c,0x65,0x20,0x5f,0x5f,0x67,0x6c,0x6f,0x62,0x61,0x6c,0x20,0x75,0x69,0x6e,0x74,0x2a,0x20,0x72,0x65,0x73,0x75,0x6c,0x74,0x73,0x2c,0x76,0x6f,0x6c,
0x61,0x74,0x69,0x6c,0x65,0x20,0x5f,0x5f,0x67,0x6c,0x6f,0x62,0x61,0x6c,0x20,0x75,0x69,0x6e,0x74,0x2a,0x20,0x73,0x74,0x6f,0x70,0x29,0x0a,0x7b,0x0a,0x63,0x6f,0x6e,
0x73,0x74,0x20,0x75,0x69,0x6e,0x74,0x33,0x32,0x5f,0x74,0x20,0x6c,0x69,0x64,0x3d,0x67,0x65,0x74,0x5f,0x6c,0x6f,0x63,0x61,0x6c,0x5f,0x69,0x64,0x28,0x30,0x29,0x3b,
0x0a,0x63,0x6f,0x6e,0x73,0x74,0x20,0x75,0x69,0x6e,0x74,0x33,0x32,0x5f,0x74,0x20,0x67,0x69,0x64,0x3d,0x67,0x65,0x74,0x5f,0x67,0x6c,0x6f,0x62,0x61,0x6c,0x5f,0x69,
0x64,0x28,0x30,0x29,0x3b,0x0a,0x69,0x66,0x28,0x73,0x74,0x6f,0x70,0x5b,0x30,0x5d,0x29,0x20,0x7b,0x0a,0x69,0x66,0x28,0x6c,0x69,0x64,0x3d,0x3d,0x30,0x29,0x20,0x7b,
0x0a,0x61,0x74,0x6f,0x6d,0x69,0x63,0x5f,0x69,0x6e,0x63,0x28,0x73,0x74,0x6f,0x70,0x2b,0x31,0x29,0x3b,0x0a,0x7d,0x0a,0x72,0x65,0x74,0x75,0x72,0x6e,0x3b,0x0a,0x7d,
0x0a,0x5f,0x5f,0x6c,0x6f,0x63,0x61,0x6c,0x20,0x73,0x68,0x75,0x66,0x66,0x6c,0x65,0x5f,0x74,0x20,0x73,0x68,0x61,0x72,0x65,0x5b,0x48,0x41,0x53,0x48,0x45,0x53,0x5f,
0x50,0x45,0x52,0x5f,0x47,0x52,0x4f,0x55,0x50,0x5d,0x3b,0x0a,0x5f,0x5f,0x6c,0x6f,0x63,0x61,0x6c,0x20,0x75,0x69,0x6e,0x74,0x33,0x32,0x5f,0x74,0x20,0x63,0x5f,0x64,
0x61,0x67,0x5b,0x50,0x52,0x4f,0x47,0x50,0x4f,0x57,0x5f,0x43,0x41,0x43,0x48,0x45,0x5f,0x57,0x4f,0x52,0x44,0x53,0x5d,0x3b,0x0a,0x63,0x6f,0x6e,0x73,0x74,0x20,0x75,
0x69,0x6e,0x74,0x33,0x32,0x5f,0x74,0x20,0x6c,0x61,0x6e,0x65,0x5f,0x69,0x64,0x3d,0x6c,0x69,0x64,0x26,0x28,0x50,0x52,0x4f,0x47,0x50,0x4f,0x57,0x5f,0x4c,0x41,0x4e,
0x45,0x53,0x2d,0x31,0x29,0x3b,0x0a,0x63,0x6f,0x6e,0x73,0x74,0x20,0x75,0x69,0x6e,0x74,0x33,0x32,0x5f,0x74,0x20,0x67,0x72,0x6f,0x75,0x70,0x5f,0x69,0x64,0x3d,0x6c,
0x69,0x64,0x2f,0x50,0x52,0x4f,0x47,0x50,0x4f,0x57,0x5f,0x4c,0x41,0x4e,0x45,0x53,0x3b,0x0a,0x66,0x6f,0x72,0x20,0x28,0x75,0x69,0x6e,0x74,0x33,0x32,0x5f,0x74,0x20,
0x77,0x6f,0x72,0x64,0x3d,0x6c,0x69,0x64,0x2a,0x50,0x52,0x4f,0x47,0x50,0x4f,0x57,0x5f,0x44,0x41,0x47,0x5f,0x4c,0x4f,0x41,0x44,0x53,0x3b,0x20,0x77,0x6f,0x72,0x64,
0x3c,0x50,0x52,0x4f,0x47,0x50,0x4f,0x57,0x5f,0x43,0x41,0x43,0x48,0x45,0x5f,0x57,0x4f,0x52,0x44,0x53,0x3b,0x20,0x77,0x6f,0x72,0x64,0x2b,0x3d,0x47,0x52,0x4f,0x55,
0x50,0x5f,0x53,0x49,0x5a,0x45,0x2a,0x50,0x52,0x4f,0x47,0x50,0x4f,0x57,0x5f,0x44,0x41,0x47,0x5f,0x4c,0x4f,0x41,0x44,0x53,0x29,0x0a,0x7b,0x0a,0x64,0x61,0x67,0x5f,
0x74,0x20,0x6c,0x6f,0x61,0x64,0x3d,0x67,0x5f,0x64,0x61,0x67,0x5b,0x77,0x6f,0x72,0x64,0x2f,0x50,0x52,0x4f,0x47,0x50,0x4f,0x57,0x5f,0x44,0x41,0x47,0x5f,0x4c,0x4f,
0x41,0x44,0x53,0x5d,0x3b,0x0a,0x66,0x6f,0x72,0x20,0x28,0x69,0x6e,0x74,0x20,0x69,0x3d,0x30,0x3b,0x20,0x69,0x3c,0x50,0x52,0x4f,0x47,0x50,0x4f,0x57,0x5f,0x44,0x41,
0x47,0x5f,0x4c,0x4f,0x41,0x44,0x53,0x3b,0x20,0x69,0x2b,0x2b,0x29,0x0a,0x63,0x5f,0x64,0x61,0x67,0x5b,0x77,0x6f,0x72,0x64,0x2b,0x69,0x5d,0x3d,0x6c,0x6f,0x61,0x64,
0x2e,0x73,0x5b,0x69,0x5d,0x3b,0x0a,0x7d,0x0a,0x75,0x69,0x6e,0x74,0x33,0x32,0x5f,0x74,0x20,0x68,0x61,0x73,0x68,0x5f,0x73,0x65,0x65,0x64,0x5b,0x32,0x5d,0x3b,0x20,
0x0a,0x68,0x61,0x73,0x68,0x33,0x32,0x5f,0x74,0x20,0x64,0x69,0x67,0x65,0x73,0x74,0x3b,0x20,0x0a,0x75,0x69,0x6e,0x74,0x33,0x32,0x5f,0x74,0x20,0x73,0x74,0x61,0x74,
0x65,0x32,0x5b,0x38,0x5d,0x3b,0x0a,0x7b,0x0a,0x75,0x69,0x6e,0x74,0x33,0x32,0x5f,0x74,0x20,0x73,0x74,0x61,0x74,0x65,0x5b,0x32,0x35,0x5d,0x3b,0x20,0x0a,0x66,0x6f,
0x72,0x20,0x28,0x69,0x6e,0x74,0x20,0x69,0x3d,0x30,0x3b,0x20,0x69,0x3c,0x31,0x30,0x3b,0x20,0x69,0x2b,0x2b,0x29,0x0a,0x73,0x74,0x61,0x74,0x65,0x5b,0x69,0x5d,0x3d,
0x6a,0x6f,0x62,0x5f,0x62,0x6c,0x6f,0x62,0x5b,0x69,0x5d,0x3b,0x0a,0x73,0x74,0x61,0x74,0x65,0x5b,0x38,0x5d,0x3d,0x67,0x69,0x64,0x3b,0x0a,0x66,0x6f,0x72,0x20,0x28,
0x69,0x6e,0x74,0x20,0x69,0x3d,0x31,0x30,0x3b,0x20,0x69,0x3c,0x32,0x35,0x3b,0x20,0x69,0x2b,0x2b,0x29,0x0a,0x73,0x74,0x61,0x74,0x65,0x5b,0x69,0x5d,0x3d,0x72,0x61,
0x76,0x65,0x6e,0x63,0x6f,0x69,0x6e,0x5f,0x72,0x6e,0x64,0x63,0x5b,0x69,0x2d,0x31,0x30,0x5d,0x3b,0x0a,0x6b,0x65,0x63,0x63,0x61,0x6b,0x5f,0x66,0x38,0x30,0x30,0x28,
0x73,0x74,0x61,0x74,0x65,0x29,0x3b,0x0a,0x66,0x6f,0x72,0x20,0x28,0x69,0x6e,0x74,0x20,0x69,0x3d,0x30,0x3b,0x20,0x69,0x3c,0x38,0x3b,0x20,0x69,0x2b,0x2b,0x29,0x0a,
0x73,0x74,0x61,0x74,0x65,0x32,0x5b,0x69,0x5d,0x3d,0x73,0x74,0x61,0x74,0x65,0x5b,0x69,0x5d,0x3b,0x0a,0x7d,0x0a,0x23,0x70,0x72,0x61,0x67,0x6d,0x61,0x20,0x75,0x6e,
0x72,0x6f,0x6c,0x6c,0x20,0x31,0x0a,0x66,0x6f,0x72,0x20,0x28,0x75,0x69,0x6e,0x74,0x33,0x32,0x5f,0x74,0x20,0x68,0x3d,0x30,0x3b,0x20,0x68,0x3c,0x50,0x52,0x4f,0x47,
0x50,0x4f,0x57,0x5f,0x4c,0x41,0x4e,0x45,0x53,0x3b,0x20,0x68,0x2b,0x2b,0x29,0x0a,0x7b,0x0a,0x75,0x69,0x6e,0x74,0x33,0x32,0x5f,0x74,0x20,0x6d,0x69,0x78,0x5b,0x50,
0x52,0x4f,0x47,0x50,0x4f,0x57,0x5f,0x52,0x45,0x47,0x53,0x5d,0x3b,0x0a,0x69,0x66,0x28,0x6c,0x61,0x6e,0x65,0x5f,0x69,0x64,0x3d,0x3d,0x68,0x29,0x20,0x7b,0x0a,0x73,
0x68,0x61,0x72,0x65,0x5b,0x67,0x72,0x6f,0x75,0x70,0x5f,0x69,0x64,0x5d,0x2e,0x75,0x69,0x6e,0x74,0x33,0x32,0x73,0x5b,0x30,0x5d,0x3d,0x73,0x74,0x61,0x74,0x65,0x32,
0x5b,0x30,0x5d,0x3b,0x0a,0x73,0x68,0x61,0x72,0x65,0x5b,0x67,0x72,0x6f,0x75,0x70,0x5f,0x69,0x64,0x5d,0x2e,0x75,0x69,0x6e,0x74,0x33,0x32,0x73,0x5b,0x31,0x5d,0x3d,
0x73,0x74,0x61,0x74,0x65,0x32,0x5b,0x31,0x5d,0x3b,0x0a,0x7d,0x0a,0x62,0x61,0x72,0x72,0x69,0x65,0x72,0x28,0x43,0x4c,0x4b,0x5f,0x4c,0x4f,0x43,0x41,0x4c,0x5f,0x4d,
0x45,0x4d,0x5f,0x46,0x45,0x4e,0x43,0x45,0x29,0x3b,0x0a,0x66,0x69,0x6c,0x6c,0x5f,0x6d,0x69,0x78,0x28,0x73,0x68,0x61,0x72,0x65,0x5b,0x67,0x72,0x6f,0x75,0x70,0x5f,
0x69,0x64,0x5d,0x2e,0x75,0x69,0x6e,0x74,0x33,0x32,0x73,0x2c,0x6c,0x61,0x6e,0x65,0x5f,0x69,0x64,0x2c,0x6d,0x69,0x78,0x29,0x3b,0x0a,0x23,0x70,0x72,0x61,0x67,0x6d,
0x61,0x20,0x75,0x6e,0x72,0x6f,0x6c,0x6c,0x20,0x32,0x0a,0x66,0x6f,0x72,0x20,0x28,0x75,0x69,0x6e,0x74,0x33,0x32,0x5f,0x74,0x20,0x6c,0x6f,0x6f,0x70,0x3d,0x30,0x3b,
0x20,0x6c,0x6f,0x6f,0x70,0x3c,0x50,0x52,0x4f,0x47,0x50,0x4f,0x57,0x5f,0x43,0x4e,0x54,0x5f,0x44,0x41,0x47,0x3b,0x20,0x2b,0x2b,0x6c,0x6f,0x6f,0x70,0x29,0x0a,0x7b,
0x0a,0x69,0x66,0x28,0x6c,0x61,0x6e,0x65,0x5f,0x69,0x64,0x3d,0x3d,0x28,0x6c,0x6f,0x6f,0x70,0x20,0x25,0x20,0x50,0x52,0x4f,0x47,0x50,0x4f,0x57,0x5f,0x4c,0x41,0x4e,
0x45,0x53,0x29,0x29,0x0a,0x73,0x68,0x61,0x72,0x65,0x5b,0x30,0x5d,0x2e,0x75,0x69,0x6e,0x74,0x33,0x32,0x73,0x5b,0x67,0x72,0x6f,0x75,0x70,0x5f,0x69,0x64,0x5d,0x3d,
0x6d,0x69,0x78,0x5b,0x30,0x5d,0x3b,0x0a,0x62,0x61,0x72,0x72,0x69,0x65,0x72,0x28,0x43,0x4c,0x4b,0x5f,0x4c,0x4f,0x43,0x41,0x4c,0x5f,0x4d,0x45,0x4d,0x5f,0x46,0x45,
0x4e,0x43,0x45,0x29,0x3b,0x0a,0x75,0x69,0x6e,0x74,0x33,0x32,0x5f,0x74,0x20,0x6f,0x66,0x66,0x73,0x65,0x74,0x3d,0x73,0x68,0x61,0x72,0x65,0x5b,0x30,0x5d,0x2e,0x75,
0x69,0x6e,0x74,0x33,0x32,0x73,0x5b,0x67,0x72,0x6f,0x75,0x70,0x5f,0x69,0x64,0x5d,0x3b,0x0a,0x6f,0x66,0x66,0x73,0x65,0x74,0x20,0x25,0x3d,0x20,0x50,0x52,0x4f,0x47,
0x50,0x4f,0x57,0x5f,0x44,0x41,0x47,0x5f,0x45,0x4c,0x45,0x4d,0x45,0x4e,0x54,0x53,0x3b,0x0a,0x6f,0x66,0x66,0x73,0x65,0x74,0x3d,0x6f,0x66,0x66,0x73,0x65,0x74,0x2a,
0x50,0x52,0x4f,0x47,0x50,0x4f,0x57,0x5f,0x4c,0x41,0x4e,0x45,0x53,0x2b,0x28,0x6c,0x61,0x6e,0x65,0x5f,0x69,0x64,0x5e,0x6c,0x6f,0x6f,0x70,0x29,0x20,0x25,0x20,0x50,
0x52,0x4f,0x47,0x50,0x4f,0x57,0x5f,0x4c,0x41,0x4e,0x45,0x53,0x3b,0x0a,0x64,0x61,0x67,0x5f,0x74,0x20,0x64,0x61,0x74,0x61,0x5f,0x64,0x61,0x67,0x3d,0x67,0x5f,0x64,
0x61,0x67,0x5b,0x6f,0x66,0x66,0x73,0x65,0x74,0x5d,0x3b,0x0a,0x69,0x66,0x28,0x68,0x61,0x63,0x6b,0x5f,0x66,0x61,0x6c,0x73,0x65,0x29,0x20,0x62,0x61,0x72,0x72,0x69,
0x65,0x72,0x28,0x43,0x4c,0x4b,0x5f,0x4c,0x4f,0x43,0x41,0x4c,0x5f,0x4d,0x45,0x4d,0x5f,0x46,0x45,0x4e,0x43,0x45,0x29,0x3b,0x0a,0x75,0x69,0x6e,0x74,0x33,0x32,0x5f,
0x74,0x20,0x64,0x61,0x74,0x61,0x3b,0x0a,0x58,0x4d,0x52,0x49,0x47,0x5f,0x49,0x4e,0x43,0x4c,0x55,0x44,0x45,0x5f,0x50,0x52,0x4f,0x47,0x50,0x4f,0x57,0x5f,0x52,0x41,
0x4e,0x44,0x4f,0x4d,0x5f,0x4d,0x41,0x54,0x48,0x0a,0x69,0x66,0x28,0x68,0x61,0x63,0x6b,0x5f,0x66,0x61,0x6c,0x73,0x65,0x29,0x20,0x62,0x61,0x72,0x72,0x69,0x65,0x72,
0x28,0x43,0x4c,0x4b,0x5f,0x4c,0x4f,0x43,0x41,0x4c,0x5f,0x4d,0x45,0x4d,0x5f,0x46,0x45,0x4e,0x43,0x45,0x29,0x3b,0x0a,0x58,0x4d,0x52,0x49,0x47,0x5f,0x49,0x4e,0x43,
0x4c,0x55,0x44,0x45,0x5f,0x50,0x52,0x4f,0x47,0x50,0x4f,0x57,0x5f,0x44,0x41,0x54,0x41,0x5f,0x4c,0x4f,0x41,0x44,0x53,0x0a,0x7d,0x0a,0x75,0x69,0x6e,0x74,0x33,0x32,
0x5f,0x74,0x20,0x6d,0x69,0x78,0x5f,0x68,0x61,0x73,0x68,0x3d,0x46,0x4e,0x56,0x5f,0x4f,0x46,0x46,0x53,0x45,0x54,0x5f,0x42,0x41,0x53,0x49,0x53,0x3b,0x0a,0x23,0x70,
0x72,0x61,0x67,0x6d,0x61,0x20,0x75,0x6e,0x72,0x6f,0x6c,0x6c,0x0a,0x66,0x6f,0x72,0x20,0x28,0x69,0x6e,0x74,0x20,0x69,0x3d,0x30,0x3b,0x20,0x69,0x3c,0x50,0x52,0x4f,
0x47,0x50,0x4f,0x57,0x5f,0x52,0x45,0x47,0x53,0x3b,0x20,0x69,0x2b,0x2b,0x29,0x0a,0x66,0x6e,0x76,0x31,0x61,0x28,0x6d,0x69,0x78,0x5f,0x68,0x61,0x73,0x68,0x2c,0x6d,
0x69,0x78,0x5b,0x69,0x5d,0x29,0x3b,0x0a,0x68,0x61,0x73,0x68,0x33,0x32,0x5f,0x74,0x20,0x64,0x69,0x67,0x65,0x73,0x74,0x5f,0x74,0x65,0x6d,0x70,0x3b,0x0a,0x66,0x6f,
0x72,0x20,0x28,0x69,0x6e,0x74,0x20,0x69,0x3d,0x30,0x3b,0x20,0x69,0x3c,0x38,0x3b,0x20,0x69,0x2b,0x2b,0x29,0x0a,0x64,0x69,0x67,0x65,0x73,0x74,0x5f,0x74,0x65,0x6d,
0x70,0x2e,0x75,0x69,0x6e,0x74,0x33,0x32,0x73,0x5b,0x69,0x5d,0x3d,0x46,0x4e,0x56,0x5f,0x4f,0x46,0x46,0x53,0x45,0x54,0x5f,0x42,0x41,0x53,0x49,0x53,0x3b,0x0a,0x73,
0x68,0x61,0x72,0x65,0x5b,0x67,0x72,0x6f,0x75,0x70,0x5f,0x69,0x64,0x5d,0x2e,0x75,0x69,0x6e,0x74,0x33,0x32,0x73,0x5b,0x6c,0x61,0x6e,0x65,0x5f,0x69,0x64,0x5d,0x3d,
0x6d,0x69,0x78,0x5f,0x68,0x61,0x73,0x68,0x3b,0x0a,0x62,0x61,0x72,0x72,0x69,0x65,0x72,0x28,0x43,0x4c,0x4b,0x5f,0x4c,0x4f,0x43,0x41,0x4c,0x5f,0x4d,0x45,0x4d,0x5f,
0x46,0x45,0x4e,0x43,0x45,0x29,0x3b,0x0a,0x23,0x70,0x72,0x61,0x67,0x6d,0x61,0x20,0x75,0x6e,0x72,0x6f,0x6c,0x6c,0x0a,0x66,0x6f,0x72,0x20,0x28,0x69,0x6e,0x74,0x20,
0x69,0x3d,0x30,0x3b,0x20,0x69,0x3c,0x50,0x52,0x4f,0x47,0x50,0x4f,0x57,0x5f,0x4c,0x41,0x4e,0x45,0x53,0x3b,0x20,0x69,0x2b,0x2b,0x29,0x0a,0x66,0x6e,0x76,0x31,0x61,
0x28,0x64,0x69,0x67,0x65,0x73,0x74,0x5f,0x74,0x65,0x6d,0x70,0x2e,0x75,0x69,0x6e,0x74,0x33,0x32,0x73,0x5b,0x69,0x20,0x25,0x20,0x38,0x5d,0x2c,0x73,0x68,0x61,0x72,
0x65,0x5b,0x67,0x72,0x6f,0x75,0x70,0x5f,0x69,0x64,0x5d,0x2e,0x75,0x69,0x6e,0x74,0x33,0x32,0x73,0x5b,0x69,0x5d,0x29,0x3b,0x0a,0x69,0x66,0x28,0x68,0x3d,0x3d,0x6c,
0x61,0x6e,0x65,0x5f,0x69,0x64,0x29,0x0a,0x64,0x69,0x67,0x65,0x73,0x74,0x3d,0x64,0x69,0x67,0x65,0x73,0x74,0x5f,0x74,0x65,0x6d,0x70,0x3b,0x0a,0x7d,0x0a,0x75,0x69,
0x6e,0x74,0x36,0x34,0x5f,0x74,0x20,0x72,0x65,0x73,0x75,0x6c,0x74,0x3b,0x0a,0x7b,0x0a,0x75,0x69,0x6e,0x74,0x33,0x32,0x5f,0x74,0x20,0x73,0x74,0x61,0x74,0x65,0x5b,
0x32,0x35,0x5d,0x3d,0x7b,0x30,0x78,0x30,0x7d,0x3b,0x20,0x0a,0x66,0x6f,0x72,0x20,0x28,0x69,0x6e,0x74,0x20,0x69,0x3d,0x30,0x3b,0x20,0x69,0x3c,0x38,0x3b,0x20,0x69,
0x2b,0x2b,0x29,0x0a,0x73,0x74,0x61,0x74,0x65,0x5b,0x69,0x5d,0x3d,0x73,0x74,0x61,0x74,0x65,0x32,0x5b,0x69,0x5d,0x3b,0x0a,0x66,0x6f,0x72,0x20,0x28,0x69,0x6e,0x74,
0x20,0x69,0x3d,0x38,0x3b,0x20,0x69,0x3c,0x31,0x36,0x3b,0x20,0x69,0x2b,0x2b,0x29,0x0a,0x73,0x74,0x61,0x74,0x65,0x5b,0x69,0x5d,0x3d,0x64,0x69,0x67,0x65,0x73,0x74,
0x2e,0x75,0x69,0x6e,0x74,0x33,0x32,0x73,0x5b,0x69,0x2d,0x38,0x5d,0x3b,0x0a,0x66,0x6f,0x72,0x20,0x28,0x69,0x6e,0x74,0x20,0x69,0x3d,0x31,0x36,0x3b,0x20,0x69,0x3c,
0x32,0x35,0x3b,0x20,0x69,0x2b,0x2b,0x29,0x0a,0x73,0x74,0x61,0x74,0x65,0x5b,0x69,0x5d,0x3d,0x72,0x61,0x76,0x65,0x6e,0x63,0x6f,0x69,0x6e,0x5f,0x72,0x6e,0x64,0x63,
0x5b,0x69,0x2d,0x31,0x36,0x5d,0x3b,0x0a,0x6b,0x65,0x63,0x63,0x61,0x6b,0x5f,0x66,0x38,0x30,0x30,0x28,0x73,0x74,0x61,0x74,0x65,0x29,0x3b,0x0a,0x75,0x69,0x6e,0x74,
0x36,0x34,0x5f,0x74,0x20,0x72,0x65,0x73,0x3d,0x28,0x75,0x69,0x6e,0x74,0x36,0x34,0x5f,0x74,0x29,0x73,0x74,0x61,0x74,0x65,0x5b,0x31,0x5d,0x3c,0x3c,0x33,0x32,0x7c,
0x73,0x74,0x61,0x74,0x65,0x5b,0x30,0x5d,0x3b,0x0a,0x72,0x65,0x73,0x75,0x6c,0x74,0x3d,0x61,0x73,0x5f,0x75,0x6c,0x6f,0x6e,0x67,0x28,0x61,0x73,0x5f,0x75,0x63,0x68,
0x61,0x72,0x38,0x28,0x72,0x65,0x73,0x29,0x2e,0x73,0x37,0x36,0x35,0x34,0x33,0x32,0x31,0x30,0x29,0x3b,0x0a,0x7d,0x0a,0x69,0x66,0x28,0x72,0x65,0x73,0x75,0x6c,0x74,
0x3c,0x3d,0x74,0x61,0x72,0x67,0x65,0x74,0x29,0x0a,0x7b,0x0a,0x2a,0x73,0x74,0x6f,0x70,0x3d,0x31,0x3b,0x0a,0x63,0x6f,0x6e,0x73,0x74,0x20,0x75,0x69,0x6e,0x74,0x20,
0x6b,0x3d,0x61,0x74,0x6f,0x6d,0x69,0x63,0x5f,0x69,0x6e,0x63,0x28,0x72,0x65,0x73,0x75,0x6c,0x74,0x73,0x29,0x2b,0x31,0x3b,0x0a,0x69,0x66,0x28,0x6b,0x3c,0x3d,0x31,
0x35,0x29,0x0a,0x72,0x65,0x73,0x75,0x6c,0x74,0x73,0x5b,0x6b,0x5d,0x3d,0x67,0x69,0x64,0x3b,0x0a,0x7d,0x0a,0x7d,0x0a,0x00
0x33,0x32,0x5f,0x74,0x20,0x73,0x74,0x5b,0x32,0x35,0x5d,0x29,0x0a,0x7b,0x0a,0x66,0x6f,0x72,0x20,0x28,0x69,0x6e,0x74,0x20,0x72,0x3d,0x30,0x3b,0x20,0x72,0x3c,0x32,
0x32,0x3b,0x20,0x72,0x2b,0x2b,0x29,0x20,0x7b,0x0a,0x6b,0x65,0x63,0x63,0x61,0x6b,0x5f,0x66,0x38,0x30,0x30,0x5f,0x72,0x6f,0x75,0x6e,0x64,0x28,0x73,0x74,0x2c,0x72,
0x29,0x3b,0x0a,0x7d,0x0a,0x7d,0x0a,0x23,0x64,0x65,0x66,0x69,0x6e,0x65,0x20,0x66,0x6e,0x76,0x31,0x61,0x28,0x68,0x2c,0x20,0x64,0x29,0x20,0x28,0x68,0x20,0x3d,0x20,
0x28,0x68,0x20,0x5e,0x20,0x64,0x29,0x20,0x2a,0x20,0x46,0x4e,0x56,0x5f,0x50,0x52,0x49,0x4d,0x45,0x29,0x0a,0x74,0x79,0x70,0x65,0x64,0x65,0x66,0x20,0x73,0x74,0x72,
0x75,0x63,0x74,0x0a,0x7b,0x0a,0x75,0x69,0x6e,0x74,0x33,0x32,0x5f,0x74,0x20,0x7a,0x2c,0x77,0x2c,0x6a,0x73,0x72,0x2c,0x6a,0x63,0x6f,0x6e,0x67,0x3b,0x0a,0x7d,0x20,
0x6b,0x69,0x73,0x73,0x39,0x39,0x5f,0x74,0x3b,0x0a,0x75,0x69,0x6e,0x74,0x33,0x32,0x5f,0x74,0x20,0x6b,0x69,0x73,0x73,0x39,0x39,0x28,0x6b,0x69,0x73,0x73,0x39,0x39,
0x5f,0x74,0x2a,0x20,0x73,0x74,0x29,0x0a,0x7b,0x0a,0x73,0x74,0x2d,0x3e,0x7a,0x3d,0x33,0x36,0x39,0x36,0x39,0x2a,0x28,0x73,0x74,0x2d,0x3e,0x7a,0x26,0x36,0x35,0x35,
0x33,0x35,0x29,0x2b,0x28,0x73,0x74,0x2d,0x3e,0x7a,0x3e,0x3e,0x31,0x36,0x29,0x3b,0x0a,0x73,0x74,0x2d,0x3e,0x77,0x3d,0x31,0x38,0x30,0x30,0x30,0x2a,0x28,0x73,0x74,
0x2d,0x3e,0x77,0x26,0x36,0x35,0x35,0x33,0x35,0x29,0x2b,0x28,0x73,0x74,0x2d,0x3e,0x77,0x3e,0x3e,0x31,0x36,0x29,0x3b,0x0a,0x75,0x69,0x6e,0x74,0x33,0x32,0x5f,0x74,
0x20,0x4d,0x57,0x43,0x3d,0x28,0x28,0x73,0x74,0x2d,0x3e,0x7a,0x3c,0x3c,0x31,0x36,0x29,0x2b,0x73,0x74,0x2d,0x3e,0x77,0x29,0x3b,0x0a,0x73,0x74,0x2d,0x3e,0x6a,0x73,
0x72,0x20,0x5e,0x3d,0x20,0x28,0x73,0x74,0x2d,0x3e,0x6a,0x73,0x72,0x3c,0x3c,0x31,0x37,0x29,0x3b,0x0a,0x73,0x74,0x2d,0x3e,0x6a,0x73,0x72,0x20,0x5e,0x3d,0x20,0x28,
0x73,0x74,0x2d,0x3e,0x6a,0x73,0x72,0x3e,0x3e,0x31,0x33,0x29,0x3b,0x0a,0x73,0x74,0x2d,0x3e,0x6a,0x73,0x72,0x20,0x5e,0x3d,0x20,0x28,0x73,0x74,0x2d,0x3e,0x6a,0x73,
0x72,0x3c,0x3c,0x35,0x29,0x3b,0x0a,0x73,0x74,0x2d,0x3e,0x6a,0x63,0x6f,0x6e,0x67,0x3d,0x36,0x39,0x30,0x36,0x39,0x2a,0x73,0x74,0x2d,0x3e,0x6a,0x63,0x6f,0x6e,0x67,
0x2b,0x31,0x32,0x33,0x34,0x35,0x36,0x37,0x3b,0x0a,0x72,0x65,0x74,0x75,0x72,0x6e,0x20,0x28,0x28,0x4d,0x57,0x43,0x5e,0x73,0x74,0x2d,0x3e,0x6a,0x63,0x6f,0x6e,0x67,
0x29,0x2b,0x73,0x74,0x2d,0x3e,0x6a,0x73,0x72,0x29,0x3b,0x0a,0x7d,0x0a,0x76,0x6f,0x69,0x64,0x20,0x66,0x69,0x6c,0x6c,0x5f,0x6d,0x69,0x78,0x28,0x6c,0x6f,0x63,0x61,
0x6c,0x20,0x75,0x69,0x6e,0x74,0x33,0x32,0x5f,0x74,0x2a,0x20,0x73,0x65,0x65,0x64,0x2c,0x75,0x69,0x6e,0x74,0x33,0x32,0x5f,0x74,0x20,0x6c,0x61,0x6e,0x65,0x5f,0x69,
0x64,0x2c,0x75,0x69,0x6e,0x74,0x33,0x32,0x5f,0x74,0x2a,0x20,0x6d,0x69,0x78,0x29,0x0a,0x7b,0x0a,0x75,0x69,0x6e,0x74,0x33,0x32,0x5f,0x74,0x20,0x66,0x6e,0x76,0x5f,
0x68,0x61,0x73,0x68,0x3d,0x46,0x4e,0x56,0x5f,0x4f,0x46,0x46,0x53,0x45,0x54,0x5f,0x42,0x41,0x53,0x49,0x53,0x3b,0x0a,0x6b,0x69,0x73,0x73,0x39,0x39,0x5f,0x74,0x20,
0x73,0x74,0x3b,0x0a,0x73,0x74,0x2e,0x7a,0x3d,0x66,0x6e,0x76,0x31,0x61,0x28,0x66,0x6e,0x76,0x5f,0x68,0x61,0x73,0x68,0x2c,0x73,0x65,0x65,0x64,0x5b,0x30,0x5d,0x29,
0x3b,0x0a,0x73,0x74,0x2e,0x77,0x3d,0x66,0x6e,0x76,0x31,0x61,0x28,0x66,0x6e,0x76,0x5f,0x68,0x61,0x73,0x68,0x2c,0x73,0x65,0x65,0x64,0x5b,0x31,0x5d,0x29,0x3b,0x0a,
0x73,0x74,0x2e,0x6a,0x73,0x72,0x3d,0x66,0x6e,0x76,0x31,0x61,0x28,0x66,0x6e,0x76,0x5f,0x68,0x61,0x73,0x68,0x2c,0x6c,0x61,0x6e,0x65,0x5f,0x69,0x64,0x29,0x3b,0x0a,
0x73,0x74,0x2e,0x6a,0x63,0x6f,0x6e,0x67,0x3d,0x66,0x6e,0x76,0x31,0x61,0x28,0x66,0x6e,0x76,0x5f,0x68,0x61,0x73,0x68,0x2c,0x6c,0x61,0x6e,0x65,0x5f,0x69,0x64,0x29,
0x3b,0x0a,0x23,0x70,0x72,0x61,0x67,0x6d,0x61,0x20,0x75,0x6e,0x72,0x6f,0x6c,0x6c,0x0a,0x66,0x6f,0x72,0x20,0x28,0x69,0x6e,0x74,0x20,0x69,0x3d,0x30,0x3b,0x20,0x69,
0x3c,0x50,0x52,0x4f,0x47,0x50,0x4f,0x57,0x5f,0x52,0x45,0x47,0x53,0x3b,0x20,0x69,0x2b,0x2b,0x29,0x0a,0x6d,0x69,0x78,0x5b,0x69,0x5d,0x3d,0x6b,0x69,0x73,0x73,0x39,
0x39,0x28,0x26,0x73,0x74,0x29,0x3b,0x0a,0x7d,0x0a,0x74,0x79,0x70,0x65,0x64,0x65,0x66,0x20,0x73,0x74,0x72,0x75,0x63,0x74,0x0a,0x7b,0x0a,0x75,0x69,0x6e,0x74,0x33,
0x32,0x5f,0x74,0x20,0x75,0x69,0x6e,0x74,0x33,0x32,0x73,0x5b,0x50,0x52,0x4f,0x47,0x50,0x4f,0x57,0x5f,0x4c,0x41,0x4e,0x45,0x53,0x5d,0x3b,0x0a,0x7d,0x20,0x73,0x68,
0x75,0x66,0x66,0x6c,0x65,0x5f,0x74,0x3b,0x0a,0x74,0x79,0x70,0x65,0x64,0x65,0x66,0x20,0x73,0x74,0x72,0x75,0x63,0x74,0x0a,0x7b,0x0a,0x75,0x69,0x6e,0x74,0x33,0x32,
0x5f,0x74,0x20,0x75,0x69,0x6e,0x74,0x33,0x32,0x73,0x5b,0x33,0x32,0x2f,0x73,0x69,0x7a,0x65,0x6f,0x66,0x28,0x75,0x69,0x6e,0x74,0x33,0x32,0x5f,0x74,0x29,0x5d,0x3b,
0x0a,0x7d,0x20,0x68,0x61,0x73,0x68,0x33,0x32,0x5f,0x74,0x3b,0x0a,0x23,0x69,0x66,0x20,0x50,0x4c,0x41,0x54,0x46,0x4f,0x52,0x4d,0x20,0x21,0x3d,0x20,0x4f,0x50,0x45,
0x4e,0x43,0x4c,0x5f,0x50,0x4c,0x41,0x54,0x46,0x4f,0x52,0x4d,0x5f,0x4e,0x56,0x49,0x44,0x49,0x41,0x20,0x0a,0x5f,0x5f,0x61,0x74,0x74,0x72,0x69,0x62,0x75,0x74,0x65,
0x5f,0x5f,0x28,0x28,0x72,0x65,0x71,0x64,0x5f,0x77,0x6f,0x72,0x6b,0x5f,0x67,0x72,0x6f,0x75,0x70,0x5f,0x73,0x69,0x7a,0x65,0x28,0x47,0x52,0x4f,0x55,0x50,0x5f,0x53,
0x49,0x5a,0x45,0x2c,0x31,0x2c,0x31,0x29,0x29,0x29,0x0a,0x23,0x65,0x6e,0x64,0x69,0x66,0x0a,0x5f,0x5f,0x6b,0x65,0x72,0x6e,0x65,0x6c,0x20,0x76,0x6f,0x69,0x64,0x20,
0x70,0x72,0x6f,0x67,0x70,0x6f,0x77,0x5f,0x73,0x65,0x61,0x72,0x63,0x68,0x28,0x5f,0x5f,0x67,0x6c,0x6f,0x62,0x61,0x6c,0x20,0x64,0x61,0x67,0x5f,0x74,0x20,0x63,0x6f,
0x6e,0x73,0x74,0x2a,0x20,0x67,0x5f,0x64,0x61,0x67,0x2c,0x5f,0x5f,0x67,0x6c,0x6f,0x62,0x61,0x6c,0x20,0x75,0x69,0x6e,0x74,0x2a,0x20,0x6a,0x6f,0x62,0x5f,0x62,0x6c,
0x6f,0x62,0x2c,0x75,0x6c,0x6f,0x6e,0x67,0x20,0x74,0x61,0x72,0x67,0x65,0x74,0x2c,0x75,0x69,0x6e,0x74,0x20,0x68,0x61,0x63,0x6b,0x5f,0x66,0x61,0x6c,0x73,0x65,0x2c,
0x76,0x6f,0x6c,0x61,0x74,0x69,0x6c,0x65,0x20,0x5f,0x5f,0x67,0x6c,0x6f,0x62,0x61,0x6c,0x20,0x75,0x69,0x6e,0x74,0x2a,0x20,0x72,0x65,0x73,0x75,0x6c,0x74,0x73,0x2c,
0x76,0x6f,0x6c,0x61,0x74,0x69,0x6c,0x65,0x20,0x5f,0x5f,0x67,0x6c,0x6f,0x62,0x61,0x6c,0x20,0x75,0x69,0x6e,0x74,0x2a,0x20,0x73,0x74,0x6f,0x70,0x29,0x0a,0x7b,0x0a,
0x63,0x6f,0x6e,0x73,0x74,0x20,0x75,0x69,0x6e,0x74,0x33,0x32,0x5f,0x74,0x20,0x6c,0x69,0x64,0x3d,0x67,0x65,0x74,0x5f,0x6c,0x6f,0x63,0x61,0x6c,0x5f,0x69,0x64,0x28,
0x30,0x29,0x3b,0x0a,0x63,0x6f,0x6e,0x73,0x74,0x20,0x75,0x69,0x6e,0x74,0x33,0x32,0x5f,0x74,0x20,0x67,0x69,0x64,0x3d,0x67,0x65,0x74,0x5f,0x67,0x6c,0x6f,0x62,0x61,
0x6c,0x5f,0x69,0x64,0x28,0x30,0x29,0x3b,0x0a,0x69,0x66,0x28,0x73,0x74,0x6f,0x70,0x5b,0x30,0x5d,0x29,0x20,0x7b,0x0a,0x69,0x66,0x28,0x6c,0x69,0x64,0x3d,0x3d,0x30,
0x29,0x20,0x7b,0x0a,0x61,0x74,0x6f,0x6d,0x69,0x63,0x5f,0x69,0x6e,0x63,0x28,0x73,0x74,0x6f,0x70,0x2b,0x31,0x29,0x3b,0x0a,0x7d,0x0a,0x72,0x65,0x74,0x75,0x72,0x6e,
0x3b,0x0a,0x7d,0x0a,0x5f,0x5f,0x6c,0x6f,0x63,0x61,0x6c,0x20,0x73,0x68,0x75,0x66,0x66,0x6c,0x65,0x5f,0x74,0x20,0x73,0x68,0x61,0x72,0x65,0x5b,0x48,0x41,0x53,0x48,
0x45,0x53,0x5f,0x50,0x45,0x52,0x5f,0x47,0x52,0x4f,0x55,0x50,0x5d,0x3b,0x0a,0x5f,0x5f,0x6c,0x6f,0x63,0x61,0x6c,0x20,0x75,0x69,0x6e,0x74,0x33,0x32,0x5f,0x74,0x20,
0x63,0x5f,0x64,0x61,0x67,0x5b,0x50,0x52,0x4f,0x47,0x50,0x4f,0x57,0x5f,0x43,0x41,0x43,0x48,0x45,0x5f,0x57,0x4f,0x52,0x44,0x53,0x5d,0x3b,0x0a,0x63,0x6f,0x6e,0x73,
0x74,0x20,0x75,0x69,0x6e,0x74,0x33,0x32,0x5f,0x74,0x20,0x6c,0x61,0x6e,0x65,0x5f,0x69,0x64,0x3d,0x6c,0x69,0x64,0x26,0x28,0x50,0x52,0x4f,0x47,0x50,0x4f,0x57,0x5f,
0x4c,0x41,0x4e,0x45,0x53,0x2d,0x31,0x29,0x3b,0x0a,0x63,0x6f,0x6e,0x73,0x74,0x20,0x75,0x69,0x6e,0x74,0x33,0x32,0x5f,0x74,0x20,0x67,0x72,0x6f,0x75,0x70,0x5f,0x69,
0x64,0x3d,0x6c,0x69,0x64,0x2f,0x50,0x52,0x4f,0x47,0x50,0x4f,0x57,0x5f,0x4c,0x41,0x4e,0x45,0x53,0x3b,0x0a,0x66,0x6f,0x72,0x20,0x28,0x75,0x69,0x6e,0x74,0x33,0x32,
0x5f,0x74,0x20,0x77,0x6f,0x72,0x64,0x3d,0x6c,0x69,0x64,0x2a,0x50,0x52,0x4f,0x47,0x50,0x4f,0x57,0x5f,0x44,0x41,0x47,0x5f,0x4c,0x4f,0x41,0x44,0x53,0x3b,0x20,0x77,
0x6f,0x72,0x64,0x3c,0x50,0x52,0x4f,0x47,0x50,0x4f,0x57,0x5f,0x43,0x41,0x43,0x48,0x45,0x5f,0x57,0x4f,0x52,0x44,0x53,0x3b,0x20,0x77,0x6f,0x72,0x64,0x2b,0x3d,0x47,
0x52,0x4f,0x55,0x50,0x5f,0x53,0x49,0x5a,0x45,0x2a,0x50,0x52,0x4f,0x47,0x50,0x4f,0x57,0x5f,0x44,0x41,0x47,0x5f,0x4c,0x4f,0x41,0x44,0x53,0x29,0x0a,0x7b,0x0a,0x64,
0x61,0x67,0x5f,0x74,0x20,0x6c,0x6f,0x61,0x64,0x3d,0x67,0x5f,0x64,0x61,0x67,0x5b,0x77,0x6f,0x72,0x64,0x2f,0x50,0x52,0x4f,0x47,0x50,0x4f,0x57,0x5f,0x44,0x41,0x47,
0x5f,0x4c,0x4f,0x41,0x44,0x53,0x5d,0x3b,0x0a,0x66,0x6f,0x72,0x20,0x28,0x69,0x6e,0x74,0x20,0x69,0x3d,0x30,0x3b,0x20,0x69,0x3c,0x50,0x52,0x4f,0x47,0x50,0x4f,0x57,
0x5f,0x44,0x41,0x47,0x5f,0x4c,0x4f,0x41,0x44,0x53,0x3b,0x20,0x69,0x2b,0x2b,0x29,0x0a,0x63,0x5f,0x64,0x61,0x67,0x5b,0x77,0x6f,0x72,0x64,0x2b,0x69,0x5d,0x3d,0x6c,
0x6f,0x61,0x64,0x2e,0x73,0x5b,0x69,0x5d,0x3b,0x0a,0x7d,0x0a,0x75,0x69,0x6e,0x74,0x33,0x32,0x5f,0x74,0x20,0x68,0x61,0x73,0x68,0x5f,0x73,0x65,0x65,0x64,0x5b,0x32,
0x5d,0x3b,0x20,0x0a,0x68,0x61,0x73,0x68,0x33,0x32,0x5f,0x74,0x20,0x64,0x69,0x67,0x65,0x73,0x74,0x3b,0x20,0x0a,0x75,0x69,0x6e,0x74,0x33,0x32,0x5f,0x74,0x20,0x73,
0x74,0x61,0x74,0x65,0x32,0x5b,0x38,0x5d,0x3b,0x0a,0x7b,0x0a,0x75,0x69,0x6e,0x74,0x33,0x32,0x5f,0x74,0x20,0x73,0x74,0x61,0x74,0x65,0x5b,0x32,0x35,0x5d,0x3b,0x20,
0x0a,0x66,0x6f,0x72,0x20,0x28,0x69,0x6e,0x74,0x20,0x69,0x3d,0x30,0x3b,0x20,0x69,0x3c,0x31,0x30,0x3b,0x20,0x69,0x2b,0x2b,0x29,0x0a,0x73,0x74,0x61,0x74,0x65,0x5b,
0x69,0x5d,0x3d,0x6a,0x6f,0x62,0x5f,0x62,0x6c,0x6f,0x62,0x5b,0x69,0x5d,0x3b,0x0a,0x73,0x74,0x61,0x74,0x65,0x5b,0x38,0x5d,0x3d,0x67,0x69,0x64,0x3b,0x0a,0x66,0x6f,
0x72,0x20,0x28,0x69,0x6e,0x74,0x20,0x69,0x3d,0x31,0x30,0x3b,0x20,0x69,0x3c,0x32,0x35,0x3b,0x20,0x69,0x2b,0x2b,0x29,0x0a,0x73,0x74,0x61,0x74,0x65,0x5b,0x69,0x5d,
0x3d,0x72,0x61,0x76,0x65,0x6e,0x63,0x6f,0x69,0x6e,0x5f,0x72,0x6e,0x64,0x63,0x5b,0x69,0x2d,0x31,0x30,0x5d,0x3b,0x0a,0x6b,0x65,0x63,0x63,0x61,0x6b,0x5f,0x66,0x38,
0x30,0x30,0x28,0x73,0x74,0x61,0x74,0x65,0x29,0x3b,0x0a,0x66,0x6f,0x72,0x20,0x28,0x69,0x6e,0x74,0x20,0x69,0x3d,0x30,0x3b,0x20,0x69,0x3c,0x38,0x3b,0x20,0x69,0x2b,
0x2b,0x29,0x0a,0x73,0x74,0x61,0x74,0x65,0x32,0x5b,0x69,0x5d,0x3d,0x73,0x74,0x61,0x74,0x65,0x5b,0x69,0x5d,0x3b,0x0a,0x7d,0x0a,0x23,0x70,0x72,0x61,0x67,0x6d,0x61,
0x20,0x75,0x6e,0x72,0x6f,0x6c,0x6c,0x20,0x31,0x0a,0x66,0x6f,0x72,0x20,0x28,0x75,0x69,0x6e,0x74,0x33,0x32,0x5f,0x74,0x20,0x68,0x3d,0x30,0x3b,0x20,0x68,0x3c,0x50,
0x52,0x4f,0x47,0x50,0x4f,0x57,0x5f,0x4c,0x41,0x4e,0x45,0x53,0x3b,0x20,0x68,0x2b,0x2b,0x29,0x0a,0x7b,0x0a,0x75,0x69,0x6e,0x74,0x33,0x32,0x5f,0x74,0x20,0x6d,0x69,
0x78,0x5b,0x50,0x52,0x4f,0x47,0x50,0x4f,0x57,0x5f,0x52,0x45,0x47,0x53,0x5d,0x3b,0x0a,0x69,0x66,0x28,0x6c,0x61,0x6e,0x65,0x5f,0x69,0x64,0x3d,0x3d,0x68,0x29,0x20,
0x7b,0x0a,0x73,0x68,0x61,0x72,0x65,0x5b,0x67,0x72,0x6f,0x75,0x70,0x5f,0x69,0x64,0x5d,0x2e,0x75,0x69,0x6e,0x74,0x33,0x32,0x73,0x5b,0x30,0x5d,0x3d,0x73,0x74,0x61,
0x74,0x65,0x32,0x5b,0x30,0x5d,0x3b,0x0a,0x73,0x68,0x61,0x72,0x65,0x5b,0x67,0x72,0x6f,0x75,0x70,0x5f,0x69,0x64,0x5d,0x2e,0x75,0x69,0x6e,0x74,0x33,0x32,0x73,0x5b,
0x31,0x5d,0x3d,0x73,0x74,0x61,0x74,0x65,0x32,0x5b,0x31,0x5d,0x3b,0x0a,0x7d,0x0a,0x62,0x61,0x72,0x72,0x69,0x65,0x72,0x28,0x43,0x4c,0x4b,0x5f,0x4c,0x4f,0x43,0x41,
0x4c,0x5f,0x4d,0x45,0x4d,0x5f,0x46,0x45,0x4e,0x43,0x45,0x29,0x3b,0x0a,0x66,0x69,0x6c,0x6c,0x5f,0x6d,0x69,0x78,0x28,0x73,0x68,0x61,0x72,0x65,0x5b,0x67,0x72,0x6f,
0x75,0x70,0x5f,0x69,0x64,0x5d,0x2e,0x75,0x69,0x6e,0x74,0x33,0x32,0x73,0x2c,0x6c,0x61,0x6e,0x65,0x5f,0x69,0x64,0x2c,0x6d,0x69,0x78,0x29,0x3b,0x0a,0x23,0x70,0x72,
0x61,0x67,0x6d,0x61,0x20,0x75,0x6e,0x72,0x6f,0x6c,0x6c,0x20,0x32,0x0a,0x66,0x6f,0x72,0x20,0x28,0x75,0x69,0x6e,0x74,0x33,0x32,0x5f,0x74,0x20,0x6c,0x6f,0x6f,0x70,
0x3d,0x30,0x3b,0x20,0x6c,0x6f,0x6f,0x70,0x3c,0x50,0x52,0x4f,0x47,0x50,0x4f,0x57,0x5f,0x43,0x4e,0x54,0x5f,0x44,0x41,0x47,0x3b,0x20,0x2b,0x2b,0x6c,0x6f,0x6f,0x70,
0x29,0x0a,0x7b,0x0a,0x69,0x66,0x28,0x6c,0x61,0x6e,0x65,0x5f,0x69,0x64,0x3d,0x3d,0x28,0x6c,0x6f,0x6f,0x70,0x20,0x25,0x20,0x50,0x52,0x4f,0x47,0x50,0x4f,0x57,0x5f,
0x4c,0x41,0x4e,0x45,0x53,0x29,0x29,0x0a,0x73,0x68,0x61,0x72,0x65,0x5b,0x30,0x5d,0x2e,0x75,0x69,0x6e,0x74,0x33,0x32,0x73,0x5b,0x67,0x72,0x6f,0x75,0x70,0x5f,0x69,
0x64,0x5d,0x3d,0x6d,0x69,0x78,0x5b,0x30,0x5d,0x3b,0x0a,0x62,0x61,0x72,0x72,0x69,0x65,0x72,0x28,0x43,0x4c,0x4b,0x5f,0x4c,0x4f,0x43,0x41,0x4c,0x5f,0x4d,0x45,0x4d,
0x5f,0x46,0x45,0x4e,0x43,0x45,0x29,0x3b,0x0a,0x75,0x69,0x6e,0x74,0x33,0x32,0x5f,0x74,0x20,0x6f,0x66,0x66,0x73,0x65,0x74,0x3d,0x73,0x68,0x61,0x72,0x65,0x5b,0x30,
0x5d,0x2e,0x75,0x69,0x6e,0x74,0x33,0x32,0x73,0x5b,0x67,0x72,0x6f,0x75,0x70,0x5f,0x69,0x64,0x5d,0x3b,0x0a,0x6f,0x66,0x66,0x73,0x65,0x74,0x20,0x25,0x3d,0x20,0x50,
0x52,0x4f,0x47,0x50,0x4f,0x57,0x5f,0x44,0x41,0x47,0x5f,0x45,0x4c,0x45,0x4d,0x45,0x4e,0x54,0x53,0x3b,0x0a,0x6f,0x66,0x66,0x73,0x65,0x74,0x3d,0x6f,0x66,0x66,0x73,
0x65,0x74,0x2a,0x50,0x52,0x4f,0x47,0x50,0x4f,0x57,0x5f,0x4c,0x41,0x4e,0x45,0x53,0x2b,0x28,0x6c,0x61,0x6e,0x65,0x5f,0x69,0x64,0x5e,0x6c,0x6f,0x6f,0x70,0x29,0x20,
0x25,0x20,0x50,0x52,0x4f,0x47,0x50,0x4f,0x57,0x5f,0x4c,0x41,0x4e,0x45,0x53,0x3b,0x0a,0x64,0x61,0x67,0x5f,0x74,0x20,0x64,0x61,0x74,0x61,0x5f,0x64,0x61,0x67,0x3d,
0x67,0x5f,0x64,0x61,0x67,0x5b,0x6f,0x66,0x66,0x73,0x65,0x74,0x5d,0x3b,0x0a,0x69,0x66,0x28,0x68,0x61,0x63,0x6b,0x5f,0x66,0x61,0x6c,0x73,0x65,0x29,0x20,0x62,0x61,
0x72,0x72,0x69,0x65,0x72,0x28,0x43,0x4c,0x4b,0x5f,0x4c,0x4f,0x43,0x41,0x4c,0x5f,0x4d,0x45,0x4d,0x5f,0x46,0x45,0x4e,0x43,0x45,0x29,0x3b,0x0a,0x75,0x69,0x6e,0x74,
0x33,0x32,0x5f,0x74,0x20,0x64,0x61,0x74,0x61,0x3b,0x0a,0x58,0x4d,0x52,0x49,0x47,0x5f,0x49,0x4e,0x43,0x4c,0x55,0x44,0x45,0x5f,0x50,0x52,0x4f,0x47,0x50,0x4f,0x57,
0x5f,0x52,0x41,0x4e,0x44,0x4f,0x4d,0x5f,0x4d,0x41,0x54,0x48,0x0a,0x69,0x66,0x28,0x68,0x61,0x63,0x6b,0x5f,0x66,0x61,0x6c,0x73,0x65,0x29,0x20,0x62,0x61,0x72,0x72,
0x69,0x65,0x72,0x28,0x43,0x4c,0x4b,0x5f,0x4c,0x4f,0x43,0x41,0x4c,0x5f,0x4d,0x45,0x4d,0x5f,0x46,0x45,0x4e,0x43,0x45,0x29,0x3b,0x0a,0x58,0x4d,0x52,0x49,0x47,0x5f,
0x49,0x4e,0x43,0x4c,0x55,0x44,0x45,0x5f,0x50,0x52,0x4f,0x47,0x50,0x4f,0x57,0x5f,0x44,0x41,0x54,0x41,0x5f,0x4c,0x4f,0x41,0x44,0x53,0x0a,0x7d,0x0a,0x75,0x69,0x6e,
0x74,0x33,0x32,0x5f,0x74,0x20,0x6d,0x69,0x78,0x5f,0x68,0x61,0x73,0x68,0x3d,0x46,0x4e,0x56,0x5f,0x4f,0x46,0x46,0x53,0x45,0x54,0x5f,0x42,0x41,0x53,0x49,0x53,0x3b,
0x0a,0x23,0x70,0x72,0x61,0x67,0x6d,0x61,0x20,0x75,0x6e,0x72,0x6f,0x6c,0x6c,0x0a,0x66,0x6f,0x72,0x20,0x28,0x69,0x6e,0x74,0x20,0x69,0x3d,0x30,0x3b,0x20,0x69,0x3c,
0x50,0x52,0x4f,0x47,0x50,0x4f,0x57,0x5f,0x52,0x45,0x47,0x53,0x3b,0x20,0x69,0x2b,0x2b,0x29,0x0a,0x66,0x6e,0x76,0x31,0x61,0x28,0x6d,0x69,0x78,0x5f,0x68,0x61,0x73,
0x68,0x2c,0x6d,0x69,0x78,0x5b,0x69,0x5d,0x29,0x3b,0x0a,0x68,0x61,0x73,0x68,0x33,0x32,0x5f,0x74,0x20,0x64,0x69,0x67,0x65,0x73,0x74,0x5f,0x74,0x65,0x6d,0x70,0x3b,
0x0a,0x66,0x6f,0x72,0x20,0x28,0x69,0x6e,0x74,0x20,0x69,0x3d,0x30,0x3b,0x20,0x69,0x3c,0x38,0x3b,0x20,0x69,0x2b,0x2b,0x29,0x0a,0x64,0x69,0x67,0x65,0x73,0x74,0x5f,
0x74,0x65,0x6d,0x70,0x2e,0x75,0x69,0x6e,0x74,0x33,0x32,0x73,0x5b,0x69,0x5d,0x3d,0x46,0x4e,0x56,0x5f,0x4f,0x46,0x46,0x53,0x45,0x54,0x5f,0x42,0x41,0x53,0x49,0x53,
0x3b,0x0a,0x73,0x68,0x61,0x72,0x65,0x5b,0x67,0x72,0x6f,0x75,0x70,0x5f,0x69,0x64,0x5d,0x2e,0x75,0x69,0x6e,0x74,0x33,0x32,0x73,0x5b,0x6c,0x61,0x6e,0x65,0x5f,0x69,
0x64,0x5d,0x3d,0x6d,0x69,0x78,0x5f,0x68,0x61,0x73,0x68,0x3b,0x0a,0x62,0x61,0x72,0x72,0x69,0x65,0x72,0x28,0x43,0x4c,0x4b,0x5f,0x4c,0x4f,0x43,0x41,0x4c,0x5f,0x4d,
0x45,0x4d,0x5f,0x46,0x45,0x4e,0x43,0x45,0x29,0x3b,0x0a,0x23,0x70,0x72,0x61,0x67,0x6d,0x61,0x20,0x75,0x6e,0x72,0x6f,0x6c,0x6c,0x0a,0x66,0x6f,0x72,0x20,0x28,0x69,
0x6e,0x74,0x20,0x69,0x3d,0x30,0x3b,0x20,0x69,0x3c,0x50,0x52,0x4f,0x47,0x50,0x4f,0x57,0x5f,0x4c,0x41,0x4e,0x45,0x53,0x3b,0x20,0x69,0x2b,0x2b,0x29,0x0a,0x66,0x6e,
0x76,0x31,0x61,0x28,0x64,0x69,0x67,0x65,0x73,0x74,0x5f,0x74,0x65,0x6d,0x70,0x2e,0x75,0x69,0x6e,0x74,0x33,0x32,0x73,0x5b,0x69,0x20,0x25,0x20,0x38,0x5d,0x2c,0x73,
0x68,0x61,0x72,0x65,0x5b,0x67,0x72,0x6f,0x75,0x70,0x5f,0x69,0x64,0x5d,0x2e,0x75,0x69,0x6e,0x74,0x33,0x32,0x73,0x5b,0x69,0x5d,0x29,0x3b,0x0a,0x69,0x66,0x28,0x68,
0x3d,0x3d,0x6c,0x61,0x6e,0x65,0x5f,0x69,0x64,0x29,0x0a,0x64,0x69,0x67,0x65,0x73,0x74,0x3d,0x64,0x69,0x67,0x65,0x73,0x74,0x5f,0x74,0x65,0x6d,0x70,0x3b,0x0a,0x7d,
0x0a,0x75,0x69,0x6e,0x74,0x36,0x34,0x5f,0x74,0x20,0x72,0x65,0x73,0x75,0x6c,0x74,0x3b,0x0a,0x7b,0x0a,0x75,0x69,0x6e,0x74,0x33,0x32,0x5f,0x74,0x20,0x73,0x74,0x61,
0x74,0x65,0x5b,0x32,0x35,0x5d,0x3d,0x7b,0x30,0x78,0x30,0x7d,0x3b,0x20,0x0a,0x66,0x6f,0x72,0x20,0x28,0x69,0x6e,0x74,0x20,0x69,0x3d,0x30,0x3b,0x20,0x69,0x3c,0x38,
0x3b,0x20,0x69,0x2b,0x2b,0x29,0x0a,0x73,0x74,0x61,0x74,0x65,0x5b,0x69,0x5d,0x3d,0x73,0x74,0x61,0x74,0x65,0x32,0x5b,0x69,0x5d,0x3b,0x0a,0x66,0x6f,0x72,0x20,0x28,
0x69,0x6e,0x74,0x20,0x69,0x3d,0x38,0x3b,0x20,0x69,0x3c,0x31,0x36,0x3b,0x20,0x69,0x2b,0x2b,0x29,0x0a,0x73,0x74,0x61,0x74,0x65,0x5b,0x69,0x5d,0x3d,0x64,0x69,0x67,
0x65,0x73,0x74,0x2e,0x75,0x69,0x6e,0x74,0x33,0x32,0x73,0x5b,0x69,0x2d,0x38,0x5d,0x3b,0x0a,0x66,0x6f,0x72,0x20,0x28,0x69,0x6e,0x74,0x20,0x69,0x3d,0x31,0x36,0x3b,
0x20,0x69,0x3c,0x32,0x35,0x3b,0x20,0x69,0x2b,0x2b,0x29,0x0a,0x73,0x74,0x61,0x74,0x65,0x5b,0x69,0x5d,0x3d,0x72,0x61,0x76,0x65,0x6e,0x63,0x6f,0x69,0x6e,0x5f,0x72,
0x6e,0x64,0x63,0x5b,0x69,0x2d,0x31,0x36,0x5d,0x3b,0x0a,0x6b,0x65,0x63,0x63,0x61,0x6b,0x5f,0x66,0x38,0x30,0x30,0x28,0x73,0x74,0x61,0x74,0x65,0x29,0x3b,0x0a,0x75,
0x69,0x6e,0x74,0x36,0x34,0x5f,0x74,0x20,0x72,0x65,0x73,0x3d,0x28,0x75,0x69,0x6e,0x74,0x36,0x34,0x5f,0x74,0x29,0x73,0x74,0x61,0x74,0x65,0x5b,0x31,0x5d,0x3c,0x3c,
0x33,0x32,0x7c,0x73,0x74,0x61,0x74,0x65,0x5b,0x30,0x5d,0x3b,0x0a,0x72,0x65,0x73,0x75,0x6c,0x74,0x3d,0x61,0x73,0x5f,0x75,0x6c,0x6f,0x6e,0x67,0x28,0x61,0x73,0x5f,
0x75,0x63,0x68,0x61,0x72,0x38,0x28,0x72,0x65,0x73,0x29,0x2e,0x73,0x37,0x36,0x35,0x34,0x33,0x32,0x31,0x30,0x29,0x3b,0x0a,0x7d,0x0a,0x69,0x66,0x28,0x72,0x65,0x73,
0x75,0x6c,0x74,0x3c,0x3d,0x74,0x61,0x72,0x67,0x65,0x74,0x29,0x0a,0x7b,0x0a,0x2a,0x73,0x74,0x6f,0x70,0x3d,0x31,0x3b,0x0a,0x63,0x6f,0x6e,0x73,0x74,0x20,0x75,0x69,
0x6e,0x74,0x20,0x6b,0x3d,0x61,0x74,0x6f,0x6d,0x69,0x63,0x5f,0x69,0x6e,0x63,0x28,0x72,0x65,0x73,0x75,0x6c,0x74,0x73,0x29,0x2b,0x31,0x3b,0x0a,0x69,0x66,0x28,0x6b,
0x3c,0x3d,0x31,0x35,0x29,0x0a,0x72,0x65,0x73,0x75,0x6c,0x74,0x73,0x5b,0x6b,0x5d,0x3d,0x67,0x69,0x64,0x3b,0x0a,0x7d,0x0a,0x7d,0x0a,0x00
};
} // namespace xmrig

File diff suppressed because it is too large


@@ -77,6 +77,7 @@ const char *Algorithm::kCN_UPX2 = "cn/upx2";
#ifdef XMRIG_ALGO_RANDOMX
const char *Algorithm::kRX = "rx";
const char *Algorithm::kRX_0 = "rx/0";
const char *Algorithm::kRX_V2 = "rx/2";
const char *Algorithm::kRX_WOW = "rx/wow";
const char *Algorithm::kRX_ARQ = "rx/arq";
const char *Algorithm::kRX_GRAFT = "rx/graft";
@@ -143,6 +144,7 @@ static const std::map<uint32_t, const char *> kAlgorithmNames = {
# ifdef XMRIG_ALGO_RANDOMX
ALGO_NAME(RX_0),
ALGO_NAME(RX_V2),
ALGO_NAME(RX_WOW),
ALGO_NAME(RX_ARQ),
ALGO_NAME(RX_GRAFT),
@@ -253,6 +255,8 @@ static const std::map<const char *, Algorithm::Id, aliasCompare> kAlgorithmAlias
ALGO_ALIAS(RX_0, "rx/test"),
ALGO_ALIAS(RX_0, "randomx"),
ALGO_ALIAS(RX_0, "rx"),
ALGO_ALIAS_AUTO(RX_V2), ALGO_ALIAS(RX_V2, "randomx/v2"),
ALGO_ALIAS(RX_V2, "rx/v2"),
ALGO_ALIAS_AUTO(RX_WOW), ALGO_ALIAS(RX_WOW, "randomx/wow"),
ALGO_ALIAS(RX_WOW, "randomwow"),
ALGO_ALIAS_AUTO(RX_ARQ), ALGO_ALIAS(RX_ARQ, "randomx/arq"),
@@ -350,7 +354,7 @@ std::vector<xmrig::Algorithm> xmrig::Algorithm::all(const std::function<bool(con
CN_HEAVY_0, CN_HEAVY_TUBE, CN_HEAVY_XHV,
CN_PICO_0, CN_PICO_TLO,
CN_UPX2,
RX_0, RX_WOW, RX_ARQ, RX_GRAFT, RX_SFX, RX_YADA,
RX_0, RX_V2, RX_WOW, RX_ARQ, RX_GRAFT, RX_SFX, RX_YADA,
AR2_CHUKWA, AR2_CHUKWA_V2, AR2_WRKZ,
KAWPOW_RVN,
GHOSTRIDER_RTM


@@ -73,6 +73,7 @@ public:
CN_GR_5 = 0x63120105, // "cn/turtle-lite" GhostRider
GHOSTRIDER_RTM = 0x6c150000, // "ghostrider" GhostRider
RX_0 = 0x72151200, // "rx/0" RandomX (reference configuration).
RX_V2 = 0x72151202, // "rx/2" RandomX (Monero v2).
RX_WOW = 0x72141177, // "rx/wow" RandomWOW (Wownero).
RX_ARQ = 0x72121061, // "rx/arq" RandomARQ (Arqma).
RX_GRAFT = 0x72151267, // "rx/graft" RandomGRAFT (Graft).
@@ -139,6 +140,7 @@ public:
# ifdef XMRIG_ALGO_RANDOMX
static const char *kRX;
static const char *kRX_0;
static const char* kRX_V2;
static const char *kRX_WOW;
static const char *kRX_ARQ;
static const char *kRX_GRAFT;


@@ -48,7 +48,7 @@
#define KECCAK_ROUNDS 24
/* *************************** Public Inteface ************************ */
/* *************************** Public Interface ************************ */
/* For Init or Reset call these: */
sha3_return_t


@@ -81,7 +81,7 @@ xmrig::Client::Client(int id, const char *agent, IClientListener *listener) :
BaseClient(id, listener),
m_agent(agent),
m_sendBuf(1024),
m_tempBuf(256)
m_tempBuf(320)
{
m_reader.setListener(this);
m_key = m_storage.add(this);
@@ -199,6 +199,7 @@ int64_t xmrig::Client::submit(const JobResult &result)
char *nonce = m_tempBuf.data();
char *data = m_tempBuf.data() + 16;
char *signature = m_tempBuf.data() + 88;
char *commitment = m_tempBuf.data() + 224;
Cvt::toHex(nonce, sizeof(uint32_t) * 2 + 1, reinterpret_cast<const uint8_t *>(&result.nonce), sizeof(uint32_t));
Cvt::toHex(data, 65, result.result(), 32);
@@ -206,6 +207,10 @@ int64_t xmrig::Client::submit(const JobResult &result)
if (result.minerSignature()) {
Cvt::toHex(signature, 129, result.minerSignature(), 64);
}
if (result.commitment()) {
Cvt::toHex(commitment, 65, result.commitment(), 32);
}
# endif
Document doc(kObjectType);
@@ -227,6 +232,16 @@ int64_t xmrig::Client::submit(const JobResult &result)
}
# endif
# ifndef XMRIG_PROXY_PROJECT
if (result.commitment()) {
params.AddMember("commitment", StringRef(commitment), allocator);
}
# else
if (result.commitment) {
params.AddMember("commitment", StringRef(result.commitment), allocator);
}
# endif
if (has<EXT_ALGO>() && result.algorithm.isValid()) {
params.AddMember("algo", StringRef(result.algorithm.name()), allocator);
}
@@ -554,6 +569,7 @@ int64_t xmrig::Client::send(size_t size)
}
m_expire = Chrono::steadyMSecs() + kResponseTimeout;
startTimeout();
return m_sequence++;
}
@@ -661,8 +677,6 @@ void xmrig::Client::onClose()
void xmrig::Client::parse(char *line, size_t len)
{
startTimeout();
LOG_DEBUG("[%s] received (%d bytes): \"%.*s\"", url(), len, static_cast<int>(len), line);
if (len < 22 || line[0] != '{') {
@@ -857,8 +871,6 @@ void xmrig::Client::parseResponse(int64_t id, const rapidjson::Value &result, co
void xmrig::Client::ping()
{
send(snprintf(m_sendBuf.data(), m_sendBuf.size(), "{\"id\":%" PRId64 ",\"jsonrpc\":\"2.0\",\"method\":\"keepalived\",\"params\":{\"id\":\"%s\"}}\n", m_sequence, m_rpcId.data()));
m_keepAlive = 0;
}


@@ -410,6 +410,7 @@ bool xmrig::DaemonClient::parseJob(const rapidjson::Value &params, int *code)
m_blocktemplate.offset(BlockTemplate::TX_EXTRA_NONCE_OFFSET) - k,
m_blocktemplate.txExtraNonce().size(),
m_blocktemplate.minerTxMerkleTreeBranch(),
m_blocktemplate.minerTxMerkleTreePath(),
m_blocktemplate.outputType() == 3
);
# endif


@@ -269,6 +269,7 @@ void xmrig::Job::copy(const Job &other)
m_minerTxExtraNonceOffset = other.m_minerTxExtraNonceOffset;
m_minerTxExtraNonceSize = other.m_minerTxExtraNonceSize;
m_minerTxMerkleTreeBranch = other.m_minerTxMerkleTreeBranch;
m_minerTxMerkleTreePath = other.m_minerTxMerkleTreePath;
m_hasViewTag = other.m_hasViewTag;
# else
memcpy(m_ephPublicKey, other.m_ephPublicKey, sizeof(m_ephPublicKey));
@@ -325,6 +326,7 @@ void xmrig::Job::move(Job &&other)
m_minerTxExtraNonceOffset = other.m_minerTxExtraNonceOffset;
m_minerTxExtraNonceSize = other.m_minerTxExtraNonceSize;
m_minerTxMerkleTreeBranch = std::move(other.m_minerTxMerkleTreeBranch);
m_minerTxMerkleTreePath = other.m_minerTxMerkleTreePath;
m_hasViewTag = other.m_hasViewTag;
# else
memcpy(m_ephPublicKey, other.m_ephPublicKey, sizeof(m_ephPublicKey));
@@ -349,7 +351,7 @@ void xmrig::Job::setSpendSecretKey(const uint8_t *key)
}
void xmrig::Job::setMinerTx(const uint8_t *begin, const uint8_t *end, size_t minerTxEphPubKeyOffset, size_t minerTxPubKeyOffset, size_t minerTxExtraNonceOffset, size_t minerTxExtraNonceSize, const Buffer &minerTxMerkleTreeBranch, bool hasViewTag)
void xmrig::Job::setMinerTx(const uint8_t *begin, const uint8_t *end, size_t minerTxEphPubKeyOffset, size_t minerTxPubKeyOffset, size_t minerTxExtraNonceOffset, size_t minerTxExtraNonceSize, const Buffer &minerTxMerkleTreeBranch, uint32_t minerTxMerkleTreePath, bool hasViewTag)
{
m_minerTxPrefix.assign(begin, end);
m_minerTxEphPubKeyOffset = minerTxEphPubKeyOffset;
@@ -357,6 +359,7 @@ void xmrig::Job::setMinerTx(const uint8_t *begin, const uint8_t *end, size_t min
m_minerTxExtraNonceOffset = minerTxExtraNonceOffset;
m_minerTxExtraNonceSize = minerTxExtraNonceSize;
m_minerTxMerkleTreeBranch = minerTxMerkleTreeBranch;
m_minerTxMerkleTreePath = minerTxMerkleTreePath;
m_hasViewTag = hasViewTag;
}
@@ -401,7 +404,7 @@ void xmrig::Job::generateHashingBlob(String &blob) const
{
uint8_t root_hash[32];
const uint8_t* p = m_minerTxPrefix.data();
BlockTemplate::calculateRootHash(p, p + m_minerTxPrefix.size(), m_minerTxMerkleTreeBranch, root_hash);
BlockTemplate::calculateRootHash(p, p + m_minerTxPrefix.size(), m_minerTxMerkleTreeBranch, m_minerTxMerkleTreePath, root_hash);
uint64_t root_hash_offset = nonceOffset() + nonceSize();


@@ -121,7 +121,7 @@ public:
inline bool hasViewTag() const { return m_hasViewTag; }
void setSpendSecretKey(const uint8_t* key);
void setMinerTx(const uint8_t* begin, const uint8_t* end, size_t minerTxEphPubKeyOffset, size_t minerTxPubKeyOffset, size_t minerTxExtraNonceOffset, size_t minerTxExtraNonceSize, const Buffer& minerTxMerkleTreeBranch, bool hasViewTag);
void setMinerTx(const uint8_t* begin, const uint8_t* end, size_t minerTxEphPubKeyOffset, size_t minerTxPubKeyOffset, size_t minerTxExtraNonceOffset, size_t minerTxExtraNonceSize, const Buffer& minerTxMerkleTreeBranch, uint32_t minerTxMerkleTreePath, bool hasViewTag);
void setViewTagInMinerTx(uint8_t view_tag);
void setExtraNonceInMinerTx(uint32_t extra_nonce);
void generateSignatureData(String& signatureData, uint8_t& view_tag) const;
@@ -179,6 +179,7 @@ private:
size_t m_minerTxExtraNonceOffset = 0;
size_t m_minerTxExtraNonceSize = 0;
Buffer m_minerTxMerkleTreeBranch;
uint32_t m_minerTxMerkleTreePath = 0;
bool m_hasViewTag = false;
# else
// Miner signatures


@@ -48,69 +48,98 @@ void xmrig::BlockTemplate::calculateMinerTxHash(const uint8_t *prefix_begin, con
}
void xmrig::BlockTemplate::calculateRootHash(const uint8_t *prefix_begin, const uint8_t *prefix_end, const Buffer &miner_tx_merkle_tree_branch, uint8_t *root_hash)
void xmrig::BlockTemplate::calculateRootHash(const uint8_t *prefix_begin, const uint8_t *prefix_end, const Buffer &miner_tx_merkle_tree_branch, uint32_t miner_tx_merkle_tree_path, uint8_t *root_hash)
{
calculateMinerTxHash(prefix_begin, prefix_end, root_hash);
for (size_t i = 0; i < miner_tx_merkle_tree_branch.size(); i += kHashSize) {
const size_t depth = miner_tx_merkle_tree_branch.size() / kHashSize;
for (size_t d = 0; d < depth; ++d) {
uint8_t h[kHashSize * 2];
memcpy(h, root_hash, kHashSize);
memcpy(h + kHashSize, miner_tx_merkle_tree_branch.data() + i, kHashSize);
const uint32_t t = (miner_tx_merkle_tree_path >> (depth - d - 1)) & 1;
memcpy(h + kHashSize * t, root_hash, kHashSize);
memcpy(h + kHashSize * (t ^ 1), miner_tx_merkle_tree_branch.data() + d * kHashSize, kHashSize);
keccak(h, kHashSize * 2, root_hash, kHashSize);
}
}
void xmrig::BlockTemplate::calculateMerkleTreeHash()
void xmrig::BlockTemplate::calculateMerkleTreeHash(uint32_t index)
{
m_minerTxMerkleTreeBranch.clear();
m_minerTxMerkleTreePath = 0;
const uint64_t count = m_numHashes + 1;
const size_t count = m_hashes.size() / kHashSize;
const uint8_t *h = m_hashes.data();
if (count == 1) {
if (count == 1) {
memcpy(m_rootHash, h, kHashSize);
}
else if (count == 2) {
m_minerTxMerkleTreeBranch.insert(m_minerTxMerkleTreeBranch.end(), h + kHashSize, h + kHashSize * 2);
}
else if (count == 2) {
keccak(h, kHashSize * 2, m_rootHash, kHashSize);
}
else {
size_t i = 0;
size_t j = 0;
size_t cnt = 0;
for (i = 0, cnt = 1; cnt <= count; ++i, cnt <<= 1) {}
m_minerTxMerkleTreeBranch.reserve(1);
m_minerTxMerkleTreeBranch.insert(m_minerTxMerkleTreeBranch.end(), h + kHashSize * (index ^ 1), h + kHashSize * ((index ^ 1) + 1));
m_minerTxMerkleTreePath = static_cast<uint32_t>(index);
}
else {
uint8_t h2[kHashSize];
memcpy(h2, h + kHashSize * index, kHashSize);
cnt >>= 1;
size_t cnt = 1, proof_max_size = 0;
do {
cnt <<= 1;
++proof_max_size;
} while (cnt <= count);
cnt >>= 1;
m_minerTxMerkleTreeBranch.reserve(kHashSize * (i - 1));
m_minerTxMerkleTreeBranch.reserve(proof_max_size);
Buffer ints(cnt * kHashSize);
memcpy(ints.data(), h, (cnt * 2 - count) * kHashSize);
for (i = cnt * 2 - count, j = cnt * 2 - count; j < cnt; i += 2, ++j) {
if (i == 0) {
m_minerTxMerkleTreeBranch.insert(m_minerTxMerkleTreeBranch.end(), h + kHashSize, h + kHashSize * 2);
}
keccak(h + i * kHashSize, kHashSize * 2, ints.data() + j * kHashSize, kHashSize);
}
const size_t k = cnt * 2 - count;
memcpy(ints.data(), h, k * kHashSize);
while (cnt > 2) {
cnt >>= 1;
for (i = 0, j = 0; j < cnt; i += 2, ++j) {
if (i == 0) {
m_minerTxMerkleTreeBranch.insert(m_minerTxMerkleTreeBranch.end(), ints.data() + kHashSize, ints.data() + kHashSize * 2);
}
keccak(ints.data() + i * kHashSize, kHashSize * 2, ints.data() + j * kHashSize, kHashSize);
}
}
for (size_t i = k, j = k; j < cnt; i += 2, ++j) {
keccak(h + i * kHashSize, kHashSize * 2, ints.data() + j * kHashSize, kHashSize);
m_minerTxMerkleTreeBranch.insert(m_minerTxMerkleTreeBranch.end(), ints.data() + kHashSize, ints.data() + kHashSize * 2);
keccak(ints.data(), kHashSize * 2, m_rootHash, kHashSize);
}
if (memcmp(h + i * kHashSize, h2, kHashSize) == 0) {
m_minerTxMerkleTreeBranch.insert(m_minerTxMerkleTreeBranch.end(), h + kHashSize * (i + 1), h + kHashSize * (i + 2));
memcpy(h2, ints.data() + j * kHashSize, kHashSize);
}
else if (memcmp(h + (i + 1) * kHashSize, h2, kHashSize) == 0) {
m_minerTxMerkleTreeBranch.insert(m_minerTxMerkleTreeBranch.end(), h + kHashSize * i, h + kHashSize * (i + 1));
memcpy(h2, ints.data() + j * kHashSize, kHashSize);
m_minerTxMerkleTreePath = 1;
}
}
while (cnt >= 2) {
cnt >>= 1;
for (size_t i = 0, j = 0; j < cnt; i += 2, ++j) {
uint8_t tmp[kHashSize];
keccak(ints.data() + i * kHashSize, kHashSize * 2, tmp, kHashSize);
if (memcmp(ints.data() + i * kHashSize, h2, kHashSize) == 0) {
m_minerTxMerkleTreeBranch.insert(m_minerTxMerkleTreeBranch.end(), ints.data() + kHashSize * (i + 1), ints.data() + kHashSize * (i + 2));
memcpy(h2, tmp, kHashSize);
m_minerTxMerkleTreePath <<= 1;
}
else if (memcmp(ints.data() + (i + 1) * kHashSize, h2, kHashSize) == 0) {
m_minerTxMerkleTreeBranch.insert(m_minerTxMerkleTreeBranch.end(), ints.data() + kHashSize * i, ints.data() + kHashSize * (i + 1));
memcpy(h2, tmp, kHashSize);
m_minerTxMerkleTreePath = (m_minerTxMerkleTreePath << 1) | 1;
}
memcpy(ints.data() + j * kHashSize, tmp, kHashSize);
}
}
memcpy(m_rootHash, ints.data(), kHashSize);
}
}
@@ -375,16 +404,42 @@ bool xmrig::BlockTemplate::parse(bool hashes)
ar(m_numHashes);
if (hashes) {
m_hashes.resize((m_numHashes + 1) * kHashSize);
calculateMinerTxHash(blob(MINER_TX_PREFIX_OFFSET), blob(MINER_TX_PREFIX_END_OFFSET), m_hashes.data());
// FCMP++ layout:
//
// index 0 fcmp_pp_n_tree_layers + 31 zero bytes
// index 1 fcmp_pp_tree_root
// index 2 coinbase transaction hash
// index 3+ other transaction hashes
//
// pre-FCMP++ layout:
//
// index 0 coinbase transaction hash
// index 1+ other transaction hashes
//
const uint32_t coinbase_tx_index = is_fcmp_pp ? 2 : 0;
m_hashes.clear();
m_hashes.resize((coinbase_tx_index + m_numHashes + 1) * kHashSize);
uint8_t* data = m_hashes.data() + coinbase_tx_index * kHashSize;
calculateMinerTxHash(blob(MINER_TX_PREFIX_OFFSET), blob(MINER_TX_PREFIX_END_OFFSET), data);
for (uint64_t i = 1; i <= m_numHashes; ++i) {
Span h;
ar(h, kHashSize);
memcpy(m_hashes.data() + i * kHashSize, h.data(), kHashSize);
memcpy(data + i * kHashSize, h.data(), kHashSize);
}
calculateMerkleTreeHash();
if (is_fcmp_pp) {
ar(m_FCMPTreeLayers);
ar(m_FCMPTreeRoot);
m_hashes[0] = m_FCMPTreeLayers;
memcpy(m_hashes.data() + kHashSize, m_FCMPTreeRoot, kHashSize);
}
calculateMerkleTreeHash(coinbase_tx_index);
}
return true;

View File
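The Merkle branch and path produced by `calculateMerkleTreeHash` above can be consumed by walking the branch bottom-up and using one path bit per level to decide which side the running hash sits on. This is a minimal, self-contained sketch of that traversal; the `combine` function is a stand-in for the real `keccak(left || right)` compression, and the bit ordering (bottom level in the most significant used bit, matching `m_minerTxMerkleTreePath <<= 1` per level) is an assumption of this sketch, not a statement about the upstream wire format.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

using Hash = std::array<uint8_t, 32>;

// Stand-in for keccak(left || right): any deterministic 2-to-1 compression
// is enough to illustrate the traversal order (FNV-1a here, illustration only).
static Hash combine(const Hash &l, const Hash &r)
{
    Hash out{};
    uint64_t h = 1469598103934665603ULL;
    auto mix = [&h](const Hash &x) {
        for (uint8_t b : x) { h ^= b; h *= 1099511628211ULL; }
    };
    mix(l);
    mix(r);
    std::memcpy(out.data(), &h, sizeof(h));
    return out;
}

// Recompute the root from a leaf, its sibling branch (bottom level first)
// and a path bitmask: the bit for branch[i] tells whether the running hash
// is the right (1) or left (0) input at that level.
static Hash rootFromBranch(Hash leaf, const std::vector<Hash> &branch, uint32_t path)
{
    for (size_t i = 0; i < branch.size(); ++i) {
        const bool leafIsRight = (path >> (branch.size() - 1 - i)) & 1;
        leaf = leafIsRight ? combine(branch[i], leaf) : combine(leaf, branch[i]);
    }
    return leaf;
}
```

A verifier holding only the coinbase transaction hash, the branch, and the path can recompute the root without seeing the other transaction hashes.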

@@ -93,6 +93,7 @@ public:
inline uint64_t numHashes() const { return m_numHashes; }
inline const Buffer &hashes() const { return m_hashes; }
inline const Buffer &minerTxMerkleTreeBranch() const { return m_minerTxMerkleTreeBranch; }
inline uint32_t minerTxMerkleTreePath() const { return m_minerTxMerkleTreePath; }
inline const uint8_t *rootHash() const { return m_rootHash; }
inline Buffer generateHashingBlob() const
@@ -104,13 +105,13 @@ public:
}
static void calculateMinerTxHash(const uint8_t *prefix_begin, const uint8_t *prefix_end, uint8_t *hash);
static void calculateRootHash(const uint8_t *prefix_begin, const uint8_t *prefix_end, const Buffer &miner_tx_merkle_tree_branch, uint8_t *root_hash);
static void calculateRootHash(const uint8_t *prefix_begin, const uint8_t *prefix_end, const Buffer &miner_tx_merkle_tree_branch, uint32_t miner_tx_merkle_tree_path, uint8_t *root_hash);
bool parse(const Buffer &blocktemplate, const Coin &coin, bool hashes = kCalcHashes);
bool parse(const char *blocktemplate, size_t size, const Coin &coin, bool hashes);
bool parse(const rapidjson::Value &blocktemplate, const Coin &coin, bool hashes = kCalcHashes);
bool parse(const String &blocktemplate, const Coin &coin, bool hashes = kCalcHashes);
void calculateMerkleTreeHash();
void calculateMerkleTreeHash(uint32_t index);
void generateHashingBlob(Buffer &out) const;
private:
@@ -147,9 +148,12 @@ private:
uint64_t m_numHashes = 0;
Buffer m_hashes;
Buffer m_minerTxMerkleTreeBranch;
uint32_t m_minerTxMerkleTreePath = 0;
uint8_t m_rootHash[kHashSize]{};
uint8_t m_carrotViewTag[3]{};
uint8_t m_janusAnchor[16]{};
uint8_t m_FCMPTreeLayers = 0;
uint8_t m_FCMPTreeRoot[kHashSize]{};
};

View File

@@ -59,6 +59,7 @@
# include "crypto/rx/Profiler.h"
# include "crypto/rx/Rx.h"
# include "crypto/rx/RxConfig.h"
# include "crypto/rx/RxAlgo.h"
#endif
@@ -556,11 +557,12 @@ void xmrig::Miner::setJob(const Job &job, bool donate)
}
# ifdef XMRIG_ALGO_RANDOMX
if (job.algorithm().family() == Algorithm::RANDOM_X && !Rx::isReady(job)) {
if (job.algorithm().family() == Algorithm::RANDOM_X) {
if (d_ptr->algorithm != job.algorithm()) {
stop();
RxAlgo::apply(job.algorithm());
}
else {
else if (!Rx::isReady(job)) {
Nonce::pause(true);
Nonce::touch();
}
@@ -572,6 +574,7 @@ void xmrig::Miner::setJob(const Job &job, bool donate)
mutex.lock();
const uint8_t index = donate ? 1 : 0;
const bool same_job_index = d_ptr->job.index() == index;
d_ptr->reset = !(d_ptr->job.index() == 1 && index == 0 && d_ptr->userJobId == job.id());
@@ -591,7 +594,8 @@ void xmrig::Miner::setJob(const Job &job, bool donate)
const bool ready = d_ptr->initRX();
// Always reset nonce on RandomX dataset change
if (!ready) {
// Except for switching to/from donation
if (!ready && same_job_index) {
d_ptr->reset = true;
}
# else

View File

@@ -1,19 +1,8 @@
/* XMRig
* Copyright (c) 2018-2025 SChernykh <https://github.com/SChernykh>
* Copyright (c) 2016-2025 XMRig <https://github.com/xmrig>, <support@xmrig.com>
* Copyright (c) 2018-2026 SChernykh <https://github.com/SChernykh>
* Copyright (c) 2016-2026 XMRig <https://github.com/xmrig>, <support@xmrig.com>
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program. If not, see <http://www.gnu.org/licenses/>.
* SPDX-License-Identifier: GPL-3.0-or-later
*/
#pragma once
@@ -31,7 +20,7 @@
namespace xmrig {
static const char short_options[] = "a:c:kBp:Px:r:R:s:t:T:o:u:O:v:l:Sx:46";
static const char short_options[] = "a:c:kBp:x:r:R:s:t:T:o:u:O:v:l:S46";
static const option options[] = {

View File

@@ -235,7 +235,7 @@ static HashReturn Init(hashState *state, int hashbitlen)
/*initialize the initial hash value of JH*/
state->hashbitlen = hashbitlen;
/*load the intital hash value into state*/
/*load the initial hash value into state*/
switch (hashbitlen)
{
case 224: memcpy(state->x,JH224_H0,128); break;

View File

@@ -48,7 +48,7 @@
multiple of size / 8)
ptr_cast(x,size) casts a pointer to a pointer to a
varaiable of length 'size' bits
variable of length 'size' bits
*/
#define ui_type(size) uint##size##_t

View File

@@ -38,6 +38,13 @@ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#include "crypto/randomx/common.hpp"
#include "crypto/rx/Profiler.h"
#include "backend/cpu/Cpu.h"
#ifdef XMRIG_RISCV
#include "crypto/randomx/aes_hash_rv64_vector.hpp"
#include "crypto/randomx/aes_hash_rv64_zvkned.hpp"
#endif
#define AES_HASH_1R_STATE0 0xd7983aad, 0xcc82db47, 0x9fa856de, 0x92b52c0d
#define AES_HASH_1R_STATE1 0xace78057, 0xf59e125a, 0x15c7b798, 0x338d996e
#define AES_HASH_1R_STATE2 0xe8a07ce4, 0x5079506b, 0xae62c7d0, 0x6a770017
@@ -59,14 +66,27 @@ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Hashing throughput: >20 GiB/s per CPU core with hardware AES
*/
template<int softAes>
void hashAes1Rx4(const void *input, size_t inputSize, void *hash) {
void hashAes1Rx4(const void *input, size_t inputSize, void *hash)
{
#ifdef XMRIG_RISCV
if (xmrig::Cpu::info()->hasAES()) {
hashAes1Rx4_zvkned(input, inputSize, hash);
return;
}
if (xmrig::Cpu::info()->hasRISCV_Vector()) {
hashAes1Rx4_RVV(input, inputSize, hash);
return;
}
#endif
const uint8_t* inptr = (uint8_t*)input;
const uint8_t* inputEnd = inptr + inputSize;
rx_vec_i128 state0, state1, state2, state3;
rx_vec_i128 in0, in1, in2, in3;
//intial state
//initial state
state0 = rx_set_int_vec_i128(AES_HASH_1R_STATE0);
state1 = rx_set_int_vec_i128(AES_HASH_1R_STATE1);
state2 = rx_set_int_vec_i128(AES_HASH_1R_STATE2);
@@ -127,7 +147,20 @@ template void hashAes1Rx4<true>(const void *input, size_t inputSize, void *hash)
calls to this function.
*/
template<int softAes>
void fillAes1Rx4(void *state, size_t outputSize, void *buffer) {
void fillAes1Rx4(void *state, size_t outputSize, void *buffer)
{
#ifdef XMRIG_RISCV
if (xmrig::Cpu::info()->hasAES()) {
fillAes1Rx4_zvkned(state, outputSize, buffer);
return;
}
if (xmrig::Cpu::info()->hasRISCV_Vector()) {
fillAes1Rx4_RVV(state, outputSize, buffer);
return;
}
#endif
const uint8_t* outptr = (uint8_t*)buffer;
const uint8_t* outputEnd = outptr + outputSize;
@@ -171,7 +204,20 @@ static constexpr randomx::Instruction inst{ 0xFF, 7, 7, 0xFF, 0xFFFFFFFFU };
alignas(16) static const randomx::Instruction inst_mask[2] = { inst, inst };
template<int softAes>
void fillAes4Rx4(void *state, size_t outputSize, void *buffer) {
void fillAes4Rx4(void *state, size_t outputSize, void *buffer)
{
#ifdef XMRIG_RISCV
if (xmrig::Cpu::info()->hasAES()) {
fillAes4Rx4_zvkned(state, outputSize, buffer);
return;
}
if (xmrig::Cpu::info()->hasRISCV_Vector()) {
fillAes4Rx4_RVV(state, outputSize, buffer);
return;
}
#endif
const uint8_t* outptr = (uint8_t*)buffer;
const uint8_t* outputEnd = outptr + outputSize;
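The hunks above all follow the same runtime-dispatch pattern: probe CPU features once and fall through from the most specialized kernel (Zvkned hardware AES) to the plain vector one, then to the scalar path. A minimal sketch of that ordering, with hypothetical probe flags and kernel names standing in for `xmrig::Cpu::info()->hasAES()` / `hasRISCV_Vector()` and the `*_zvkned` / `*_RVV` implementations:

```cpp
#include <cstddef>

// Stand-ins for runtime hwcap/CPUID probes; the real code queries
// xmrig::Cpu::info() instead of globals.
static bool g_hasAES    = false;
static bool g_hasVector = false;

using FillFn = int (*)(std::size_t);

// Hypothetical kernel variants; return values only identify which one ran.
static int fillGeneric(std::size_t) { return 0; }
static int fillVector (std::size_t) { return 1; }
static int fillAESVec (std::size_t) { return 2; }

// Prefer the most specialized implementation, mirroring the
// hasAES() -> hasRISCV_Vector() -> scalar fallthrough in the diff.
static FillFn selectFill()
{
    if (g_hasAES)    return fillAESVec;
    if (g_hasVector) return fillVector;
    return fillGeneric;
}
```

Because the probes are cheap and the result never changes at runtime, the checks can also be hoisted into a function pointer chosen once at startup.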
@@ -235,134 +281,33 @@ void fillAes4Rx4(void *state, size_t outputSize, void *buffer) {
template void fillAes4Rx4<true>(void *state, size_t outputSize, void *buffer);
template void fillAes4Rx4<false>(void *state, size_t outputSize, void *buffer);
#if defined(XMRIG_RISCV) && defined(XMRIG_RVV_ENABLED)
static constexpr uint32_t AES_HASH_1R_STATE02[8] = { 0x92b52c0d, 0x9fa856de, 0xcc82db47, 0xd7983aad, 0x6a770017, 0xae62c7d0, 0x5079506b, 0xe8a07ce4 };
static constexpr uint32_t AES_HASH_1R_STATE13[8] = { 0x338d996e, 0x15c7b798, 0xf59e125a, 0xace78057, 0x630a240c, 0x07ad828d, 0x79a10005, 0x7e994948 };
static constexpr uint32_t AES_GEN_1R_KEY02[8] = { 0x6daca553, 0x62716609, 0xdbb5552b, 0xb4f44917, 0x3f1262f1, 0x9f947ec6, 0xf4c0794f, 0x3e20e345 };
static constexpr uint32_t AES_GEN_1R_KEY13[8] = { 0x6d7caf07, 0x846a710d, 0x1725d378, 0x0da1dc4e, 0x6aef8135, 0xb1ba317c, 0x16314c88, 0x49169154 };
static constexpr uint32_t AES_HASH_1R_XKEY00[8] = { 0xf6fa8389, 0x8b24949f, 0x90dc56bf, 0x06890201, 0xf6fa8389, 0x8b24949f, 0x90dc56bf, 0x06890201 };
static constexpr uint32_t AES_HASH_1R_XKEY11[8] = { 0x61b263d1, 0x51f4e03c, 0xee1043c6, 0xed18f99b, 0x61b263d1, 0x51f4e03c, 0xee1043c6, 0xed18f99b };
static constexpr uint32_t AES_HASH_STRIDE[8] = { 0, 4, 8, 12, 32, 36, 40, 44 };
#ifdef XMRIG_VAES
void hashAndFillAes1Rx4_VAES512(void *scratchpad, size_t scratchpadSize, void *hash, void* fill_state);
#endif
template<int softAes, int unroll>
void hashAndFillAes1Rx4(void *scratchpad, size_t scratchpadSize, void *hash, void* fill_state) {
void hashAndFillAes1Rx4(void *scratchpad, size_t scratchpadSize, void *hash, void* fill_state)
{
PROFILE_SCOPE(RandomX_AES);
uint8_t* scratchpadPtr = (uint8_t*)scratchpad;
const uint8_t* scratchpadEnd = scratchpadPtr + scratchpadSize;
vuint32m1_t hash_state02 = __riscv_vle32_v_u32m1(AES_HASH_1R_STATE02, 8);
vuint32m1_t hash_state13 = __riscv_vle32_v_u32m1(AES_HASH_1R_STATE13, 8);
const vuint32m1_t key02 = __riscv_vle32_v_u32m1(AES_GEN_1R_KEY02, 8);
const vuint32m1_t key13 = __riscv_vle32_v_u32m1(AES_GEN_1R_KEY13, 8);
const vuint32m1_t stride = __riscv_vle32_v_u32m1(AES_HASH_STRIDE, 8);
vuint32m1_t fill_state02 = __riscv_vluxei32_v_u32m1((uint32_t*)fill_state + 0, stride, 8);
vuint32m1_t fill_state13 = __riscv_vluxei32_v_u32m1((uint32_t*)fill_state + 4, stride, 8);
const vuint8m1_t lutenc_index0 = __riscv_vle8_v_u8m1(lutEncIndex[0], 32);
const vuint8m1_t lutenc_index1 = __riscv_vle8_v_u8m1(lutEncIndex[1], 32);
const vuint8m1_t lutenc_index2 = __riscv_vle8_v_u8m1(lutEncIndex[2], 32);
const vuint8m1_t lutenc_index3 = __riscv_vle8_v_u8m1(lutEncIndex[3], 32);
const vuint8m1_t& lutdec_index0 = lutenc_index0;
const vuint8m1_t lutdec_index1 = __riscv_vle8_v_u8m1(lutDecIndex[1], 32);
const vuint8m1_t& lutdec_index2 = lutenc_index2;
const vuint8m1_t lutdec_index3 = __riscv_vle8_v_u8m1(lutDecIndex[3], 32);
//process 64 bytes at a time in 4 lanes
while (scratchpadPtr < scratchpadEnd) {
#define HASH_STATE(k) \
hash_state02 = softaes_vector_double(hash_state02, __riscv_vluxei32_v_u32m1((uint32_t*)scratchpadPtr + k * 16 + 0, stride, 8), lutenc_index0, lutenc_index1, lutenc_index2, lutenc_index3, lutEnc0, lutEnc1, lutEnc2, lutEnc3); \
hash_state13 = softaes_vector_double(hash_state13, __riscv_vluxei32_v_u32m1((uint32_t*)scratchpadPtr + k * 16 + 4, stride, 8), lutdec_index0, lutdec_index1, lutdec_index2, lutdec_index3, lutDec0, lutDec1, lutDec2, lutDec3);
#define FILL_STATE(k) \
fill_state02 = softaes_vector_double(fill_state02, key02, lutdec_index0, lutdec_index1, lutdec_index2, lutdec_index3, lutDec0, lutDec1, lutDec2, lutDec3); \
fill_state13 = softaes_vector_double(fill_state13, key13, lutenc_index0, lutenc_index1, lutenc_index2, lutenc_index3, lutEnc0, lutEnc1, lutEnc2, lutEnc3); \
__riscv_vsuxei32_v_u32m1((uint32_t*)scratchpadPtr + k * 16 + 0, stride, fill_state02, 8); \
__riscv_vsuxei32_v_u32m1((uint32_t*)scratchpadPtr + k * 16 + 4, stride, fill_state13, 8);
switch (softAes) {
case 0:
HASH_STATE(0);
HASH_STATE(1);
FILL_STATE(0);
FILL_STATE(1);
scratchpadPtr += 128;
break;
default:
switch (unroll) {
case 4:
HASH_STATE(0);
FILL_STATE(0);
HASH_STATE(1);
FILL_STATE(1);
HASH_STATE(2);
FILL_STATE(2);
HASH_STATE(3);
FILL_STATE(3);
scratchpadPtr += 64 * 4;
break;
case 2:
HASH_STATE(0);
FILL_STATE(0);
HASH_STATE(1);
FILL_STATE(1);
scratchpadPtr += 64 * 2;
break;
default:
HASH_STATE(0);
FILL_STATE(0);
scratchpadPtr += 64;
break;
}
break;
}
#undef HASH_STATE
#undef FILL_STATE
#ifdef XMRIG_RISCV
if (xmrig::Cpu::info()->hasAES()) {
hashAndFillAes1Rx4_zvkned(scratchpad, scratchpadSize, hash, fill_state);
return;
}
if (xmrig::Cpu::info()->hasRISCV_Vector()) {
hashAndFillAes1Rx4_RVV(scratchpad, scratchpadSize, hash, fill_state);
return;
}
#endif
__riscv_vsuxei32_v_u32m1((uint32_t*)fill_state + 0, stride, fill_state02, 8);
__riscv_vsuxei32_v_u32m1((uint32_t*)fill_state + 4, stride, fill_state13, 8);
//two extra rounds to achieve full diffusion
const vuint32m1_t xkey00 = __riscv_vle32_v_u32m1(AES_HASH_1R_XKEY00, 8);
const vuint32m1_t xkey11 = __riscv_vle32_v_u32m1(AES_HASH_1R_XKEY11, 8);
hash_state02 = softaes_vector_double(hash_state02, xkey00, lutenc_index0, lutenc_index1, lutenc_index2, lutenc_index3, lutEnc0, lutEnc1, lutEnc2, lutEnc3);
hash_state13 = softaes_vector_double(hash_state13, xkey00, lutdec_index0, lutdec_index1, lutdec_index2, lutdec_index3, lutDec0, lutDec1, lutDec2, lutDec3);
hash_state02 = softaes_vector_double(hash_state02, xkey11, lutenc_index0, lutenc_index1, lutenc_index2, lutenc_index3, lutEnc0, lutEnc1, lutEnc2, lutEnc3);
hash_state13 = softaes_vector_double(hash_state13, xkey11, lutdec_index0, lutdec_index1, lutdec_index2, lutdec_index3, lutDec0, lutDec1, lutDec2, lutDec3);
//output hash
__riscv_vsuxei32_v_u32m1((uint32_t*)hash + 0, stride, hash_state02, 8);
__riscv_vsuxei32_v_u32m1((uint32_t*)hash + 4, stride, hash_state13, 8);
}
#else // defined(XMRIG_RISCV) && defined(XMRIG_RVV_ENABLED)
template<int softAes, int unroll>
void hashAndFillAes1Rx4(void *scratchpad, size_t scratchpadSize, void *hash, void* fill_state) {
PROFILE_SCOPE(RandomX_AES);
#ifdef XMRIG_VAES
if (xmrig::Cpu::info()->arch() == xmrig::ICpuInfo::ARCH_ZEN5) {
hashAndFillAes1Rx4_VAES512(scratchpad, scratchpadSize, hash, fill_state);
return;
}
#endif
uint8_t* scratchpadPtr = (uint8_t*)scratchpad;
const uint8_t* scratchpadEnd = scratchpadPtr + scratchpadSize;
@@ -500,7 +445,6 @@ void hashAndFillAes1Rx4(void *scratchpad, size_t scratchpadSize, void *hash, voi
rx_store_vec_i128((rx_vec_i128*)hash + 2, hash_state2);
rx_store_vec_i128((rx_vec_i128*)hash + 3, hash_state3);
}
#endif // defined(XMRIG_RISCV) && defined(XMRIG_RVV_ENABLED)
template void hashAndFillAes1Rx4<0,2>(void* scratchpad, size_t scratchpadSize, void* hash, void* fill_state);
template void hashAndFillAes1Rx4<1,1>(void* scratchpad, size_t scratchpadSize, void* hash, void* fill_state);
@@ -512,43 +456,54 @@ hashAndFillAes1Rx4_impl* softAESImpl = &hashAndFillAes1Rx4<1,1>;
void SelectSoftAESImpl(size_t threadsCount)
{
constexpr uint64_t test_length_ms = 100;
const std::array<hashAndFillAes1Rx4_impl *, 4> impl = {
&hashAndFillAes1Rx4<1,1>,
&hashAndFillAes1Rx4<2,1>,
&hashAndFillAes1Rx4<2,2>,
&hashAndFillAes1Rx4<2,4>,
};
size_t fast_idx = 0;
double fast_speed = 0.0;
for (size_t run = 0; run < 3; ++run) {
for (size_t i = 0; i < impl.size(); ++i) {
const double t1 = xmrig::Chrono::highResolutionMSecs();
std::vector<uint32_t> count(threadsCount, 0);
std::vector<std::thread> threads;
for (size_t t = 0; t < threadsCount; ++t) {
threads.emplace_back([&, t]() {
std::vector<uint8_t> scratchpad(10 * 1024);
alignas(16) uint8_t hash[64] = {};
alignas(16) uint8_t state[64] = {};
do {
(*impl[i])(scratchpad.data(), scratchpad.size(), hash, state);
++count[t];
} while (xmrig::Chrono::highResolutionMSecs() - t1 < test_length_ms);
});
}
uint32_t total = 0;
for (size_t t = 0; t < threadsCount; ++t) {
threads[t].join();
total += count[t];
}
const double t2 = xmrig::Chrono::highResolutionMSecs();
const double speed = total * 1e3 / (t2 - t1);
if (speed > fast_speed) {
fast_idx = i;
fast_speed = speed;
}
}
}
softAESImpl = impl[fast_idx];
}
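`SelectSoftAESImpl` above picks an implementation empirically: run each candidate for a fixed time budget and keep the one that completes the most iterations. A simplified, single-threaded sketch of that selection loop (the real code spawns `threadsCount` threads and repeats the sweep three times):

```cpp
#include <array>
#include <chrono>
#include <cstddef>

// Time each candidate for `budgetMs` and return the index of the one that
// completed the most iterations -- a micro-benchmark-driven dispatch.
template <typename Fn, std::size_t N>
std::size_t selectFastest(const std::array<Fn, N> &candidates, long budgetMs = 10)
{
    using clock = std::chrono::steady_clock;
    std::size_t best = 0;
    unsigned long long bestCount = 0;
    for (std::size_t i = 0; i < N; ++i) {
        unsigned long long count = 0;
        const auto t1 = clock::now();
        do {
            candidates[i]();
            ++count;
        } while (clock::now() - t1 < std::chrono::milliseconds(budgetMs));
        if (count > bestCount) {
            bestCount = count;
            best = i;
        }
    }
    return best;
}
```

Repeating the sweep several times, as the real code does, smooths out scheduler noise before committing to a function pointer.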

View File

@@ -0,0 +1,272 @@
/*
Copyright (c) 2025 SChernykh <https://github.com/SChernykh>
Copyright (c) 2025 XMRig <support@xmrig.com>
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of the copyright holder nor the
names of its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include <riscv_vector.h>
#include "crypto/randomx/soft_aes.h"
#include "crypto/randomx/randomx.h"
static FORCE_INLINE vuint32m1_t softaes_vector_double(
vuint32m1_t in,
vuint32m1_t key,
vuint8m1_t i0, vuint8m1_t i1, vuint8m1_t i2, vuint8m1_t i3,
const uint32_t* lut0, const uint32_t* lut1, const uint32_t *lut2, const uint32_t* lut3)
{
const vuint8m1_t in8 = __riscv_vreinterpret_v_u32m1_u8m1(in);
const vuint32m1_t index0 = __riscv_vreinterpret_v_u8m1_u32m1(__riscv_vrgather_vv_u8m1(in8, i0, 32));
const vuint32m1_t index1 = __riscv_vreinterpret_v_u8m1_u32m1(__riscv_vrgather_vv_u8m1(in8, i1, 32));
const vuint32m1_t index2 = __riscv_vreinterpret_v_u8m1_u32m1(__riscv_vrgather_vv_u8m1(in8, i2, 32));
const vuint32m1_t index3 = __riscv_vreinterpret_v_u8m1_u32m1(__riscv_vrgather_vv_u8m1(in8, i3, 32));
vuint32m1_t s0 = __riscv_vluxei32_v_u32m1(lut0, __riscv_vsll_vx_u32m1(index0, 2, 8), 8);
vuint32m1_t s1 = __riscv_vluxei32_v_u32m1(lut1, __riscv_vsll_vx_u32m1(index1, 2, 8), 8);
vuint32m1_t s2 = __riscv_vluxei32_v_u32m1(lut2, __riscv_vsll_vx_u32m1(index2, 2, 8), 8);
vuint32m1_t s3 = __riscv_vluxei32_v_u32m1(lut3, __riscv_vsll_vx_u32m1(index3, 2, 8), 8);
s0 = __riscv_vxor_vv_u32m1(s0, s1, 8);
s2 = __riscv_vxor_vv_u32m1(s2, s3, 8);
s0 = __riscv_vxor_vv_u32m1(s0, s2, 8);
return __riscv_vxor_vv_u32m1(s0, key, 8);
}
static constexpr uint32_t AES_HASH_1R_STATE02[8] = { 0x92b52c0d, 0x9fa856de, 0xcc82db47, 0xd7983aad, 0x6a770017, 0xae62c7d0, 0x5079506b, 0xe8a07ce4 };
static constexpr uint32_t AES_HASH_1R_STATE13[8] = { 0x338d996e, 0x15c7b798, 0xf59e125a, 0xace78057, 0x630a240c, 0x07ad828d, 0x79a10005, 0x7e994948 };
static constexpr uint32_t AES_GEN_1R_KEY02[8] = { 0x6daca553, 0x62716609, 0xdbb5552b, 0xb4f44917, 0x3f1262f1, 0x9f947ec6, 0xf4c0794f, 0x3e20e345 };
static constexpr uint32_t AES_GEN_1R_KEY13[8] = { 0x6d7caf07, 0x846a710d, 0x1725d378, 0x0da1dc4e, 0x6aef8135, 0xb1ba317c, 0x16314c88, 0x49169154 };
static constexpr uint32_t AES_HASH_1R_XKEY00[8] = { 0xf6fa8389, 0x8b24949f, 0x90dc56bf, 0x06890201, 0xf6fa8389, 0x8b24949f, 0x90dc56bf, 0x06890201 };
static constexpr uint32_t AES_HASH_1R_XKEY11[8] = { 0x61b263d1, 0x51f4e03c, 0xee1043c6, 0xed18f99b, 0x61b263d1, 0x51f4e03c, 0xee1043c6, 0xed18f99b };
static constexpr uint32_t AES_HASH_STRIDE_X2[8] = { 0, 4, 8, 12, 32, 36, 40, 44 };
static constexpr uint32_t AES_HASH_STRIDE_X4[8] = { 0, 4, 8, 12, 64, 68, 72, 76 };
#define lutEnc0 lutEnc[0]
#define lutEnc1 lutEnc[1]
#define lutEnc2 lutEnc[2]
#define lutEnc3 lutEnc[3]
#define lutDec0 lutDec[0]
#define lutDec1 lutDec[1]
#define lutDec2 lutDec[2]
#define lutDec3 lutDec[3]
void hashAes1Rx4_RVV(const void *input, size_t inputSize, void *hash) {
const uint8_t* inptr = (const uint8_t*)input;
const uint8_t* inputEnd = inptr + inputSize;
//initial state
vuint32m1_t state02 = __riscv_vle32_v_u32m1(AES_HASH_1R_STATE02, 8);
vuint32m1_t state13 = __riscv_vle32_v_u32m1(AES_HASH_1R_STATE13, 8);
const vuint32m1_t stride = __riscv_vle32_v_u32m1(AES_HASH_STRIDE_X2, 8);
const vuint8m1_t lutenc_index0 = __riscv_vle8_v_u8m1(lutEncIndex[0], 32);
const vuint8m1_t lutenc_index1 = __riscv_vle8_v_u8m1(lutEncIndex[1], 32);
const vuint8m1_t lutenc_index2 = __riscv_vle8_v_u8m1(lutEncIndex[2], 32);
const vuint8m1_t lutenc_index3 = __riscv_vle8_v_u8m1(lutEncIndex[3], 32);
const vuint8m1_t& lutdec_index0 = lutenc_index0;
const vuint8m1_t lutdec_index1 = __riscv_vle8_v_u8m1(lutDecIndex[1], 32);
const vuint8m1_t& lutdec_index2 = lutenc_index2;
const vuint8m1_t lutdec_index3 = __riscv_vle8_v_u8m1(lutDecIndex[3], 32);
//process 64 bytes at a time in 4 lanes
while (inptr < inputEnd) {
state02 = softaes_vector_double(state02, __riscv_vluxei32_v_u32m1((uint32_t*)inptr + 0, stride, 8), lutenc_index0, lutenc_index1, lutenc_index2, lutenc_index3, lutEnc0, lutEnc1, lutEnc2, lutEnc3);
state13 = softaes_vector_double(state13, __riscv_vluxei32_v_u32m1((uint32_t*)inptr + 4, stride, 8), lutdec_index0, lutdec_index1, lutdec_index2, lutdec_index3, lutDec0, lutDec1, lutDec2, lutDec3);
inptr += 64;
}
//two extra rounds to achieve full diffusion
const vuint32m1_t xkey00 = __riscv_vle32_v_u32m1(AES_HASH_1R_XKEY00, 8);
const vuint32m1_t xkey11 = __riscv_vle32_v_u32m1(AES_HASH_1R_XKEY11, 8);
state02 = softaes_vector_double(state02, xkey00, lutenc_index0, lutenc_index1, lutenc_index2, lutenc_index3, lutEnc0, lutEnc1, lutEnc2, lutEnc3);
state13 = softaes_vector_double(state13, xkey00, lutdec_index0, lutdec_index1, lutdec_index2, lutdec_index3, lutDec0, lutDec1, lutDec2, lutDec3);
state02 = softaes_vector_double(state02, xkey11, lutenc_index0, lutenc_index1, lutenc_index2, lutenc_index3, lutEnc0, lutEnc1, lutEnc2, lutEnc3);
state13 = softaes_vector_double(state13, xkey11, lutdec_index0, lutdec_index1, lutdec_index2, lutdec_index3, lutDec0, lutDec1, lutDec2, lutDec3);
//output hash
__riscv_vsuxei32_v_u32m1((uint32_t*)hash + 0, stride, state02, 8);
__riscv_vsuxei32_v_u32m1((uint32_t*)hash + 4, stride, state13, 8);
}
void fillAes1Rx4_RVV(void *state, size_t outputSize, void *buffer) {
const uint8_t* outptr = (uint8_t*)buffer;
const uint8_t* outputEnd = outptr + outputSize;
const vuint32m1_t key02 = __riscv_vle32_v_u32m1(AES_GEN_1R_KEY02, 8);
const vuint32m1_t key13 = __riscv_vle32_v_u32m1(AES_GEN_1R_KEY13, 8);
const vuint32m1_t stride = __riscv_vle32_v_u32m1(AES_HASH_STRIDE_X2, 8);
vuint32m1_t state02 = __riscv_vluxei32_v_u32m1((uint32_t*)state + 0, stride, 8);
vuint32m1_t state13 = __riscv_vluxei32_v_u32m1((uint32_t*)state + 4, stride, 8);
const vuint8m1_t lutenc_index0 = __riscv_vle8_v_u8m1(lutEncIndex[0], 32);
const vuint8m1_t lutenc_index1 = __riscv_vle8_v_u8m1(lutEncIndex[1], 32);
const vuint8m1_t lutenc_index2 = __riscv_vle8_v_u8m1(lutEncIndex[2], 32);
const vuint8m1_t lutenc_index3 = __riscv_vle8_v_u8m1(lutEncIndex[3], 32);
const vuint8m1_t& lutdec_index0 = lutenc_index0;
const vuint8m1_t lutdec_index1 = __riscv_vle8_v_u8m1(lutDecIndex[1], 32);
const vuint8m1_t& lutdec_index2 = lutenc_index2;
const vuint8m1_t lutdec_index3 = __riscv_vle8_v_u8m1(lutDecIndex[3], 32);
while (outptr < outputEnd) {
state02 = softaes_vector_double(state02, key02, lutdec_index0, lutdec_index1, lutdec_index2, lutdec_index3, lutDec0, lutDec1, lutDec2, lutDec3);
state13 = softaes_vector_double(state13, key13, lutenc_index0, lutenc_index1, lutenc_index2, lutenc_index3, lutEnc0, lutEnc1, lutEnc2, lutEnc3);
__riscv_vsuxei32_v_u32m1((uint32_t*)outptr + 0, stride, state02, 8);
__riscv_vsuxei32_v_u32m1((uint32_t*)outptr + 4, stride, state13, 8);
outptr += 64;
}
__riscv_vsuxei32_v_u32m1((uint32_t*)state + 0, stride, state02, 8);
__riscv_vsuxei32_v_u32m1((uint32_t*)state + 4, stride, state13, 8);
}
void fillAes4Rx4_RVV(void *state, size_t outputSize, void *buffer) {
const uint8_t* outptr = (uint8_t*)buffer;
const uint8_t* outputEnd = outptr + outputSize;
const vuint32m1_t stride4 = __riscv_vle32_v_u32m1(AES_HASH_STRIDE_X4, 8);
const vuint32m1_t key04 = __riscv_vluxei32_v_u32m1((uint32_t*)(RandomX_CurrentConfig.fillAes4Rx4_Key + 0), stride4, 8);
const vuint32m1_t key15 = __riscv_vluxei32_v_u32m1((uint32_t*)(RandomX_CurrentConfig.fillAes4Rx4_Key + 1), stride4, 8);
const vuint32m1_t key26 = __riscv_vluxei32_v_u32m1((uint32_t*)(RandomX_CurrentConfig.fillAes4Rx4_Key + 2), stride4, 8);
const vuint32m1_t key37 = __riscv_vluxei32_v_u32m1((uint32_t*)(RandomX_CurrentConfig.fillAes4Rx4_Key + 3), stride4, 8);
const vuint32m1_t stride = __riscv_vle32_v_u32m1(AES_HASH_STRIDE_X2, 8);
vuint32m1_t state02 = __riscv_vluxei32_v_u32m1((uint32_t*)state + 0, stride, 8);
vuint32m1_t state13 = __riscv_vluxei32_v_u32m1((uint32_t*)state + 4, stride, 8);
const vuint8m1_t lutenc_index0 = __riscv_vle8_v_u8m1(lutEncIndex[0], 32);
const vuint8m1_t lutenc_index1 = __riscv_vle8_v_u8m1(lutEncIndex[1], 32);
const vuint8m1_t lutenc_index2 = __riscv_vle8_v_u8m1(lutEncIndex[2], 32);
const vuint8m1_t lutenc_index3 = __riscv_vle8_v_u8m1(lutEncIndex[3], 32);
const vuint8m1_t& lutdec_index0 = lutenc_index0;
const vuint8m1_t lutdec_index1 = __riscv_vle8_v_u8m1(lutDecIndex[1], 32);
const vuint8m1_t& lutdec_index2 = lutenc_index2;
const vuint8m1_t lutdec_index3 = __riscv_vle8_v_u8m1(lutDecIndex[3], 32);
while (outptr < outputEnd) {
state02 = softaes_vector_double(state02, key04, lutdec_index0, lutdec_index1, lutdec_index2, lutdec_index3, lutDec0, lutDec1, lutDec2, lutDec3);
state13 = softaes_vector_double(state13, key04, lutenc_index0, lutenc_index1, lutenc_index2, lutenc_index3, lutEnc0, lutEnc1, lutEnc2, lutEnc3);
state02 = softaes_vector_double(state02, key15, lutdec_index0, lutdec_index1, lutdec_index2, lutdec_index3, lutDec0, lutDec1, lutDec2, lutDec3);
state13 = softaes_vector_double(state13, key15, lutenc_index0, lutenc_index1, lutenc_index2, lutenc_index3, lutEnc0, lutEnc1, lutEnc2, lutEnc3);
state02 = softaes_vector_double(state02, key26, lutdec_index0, lutdec_index1, lutdec_index2, lutdec_index3, lutDec0, lutDec1, lutDec2, lutDec3);
state13 = softaes_vector_double(state13, key26, lutenc_index0, lutenc_index1, lutenc_index2, lutenc_index3, lutEnc0, lutEnc1, lutEnc2, lutEnc3);
state02 = softaes_vector_double(state02, key37, lutdec_index0, lutdec_index1, lutdec_index2, lutdec_index3, lutDec0, lutDec1, lutDec2, lutDec3);
state13 = softaes_vector_double(state13, key37, lutenc_index0, lutenc_index1, lutenc_index2, lutenc_index3, lutEnc0, lutEnc1, lutEnc2, lutEnc3);
__riscv_vsuxei32_v_u32m1((uint32_t*)outptr + 0, stride, state02, 8);
__riscv_vsuxei32_v_u32m1((uint32_t*)outptr + 4, stride, state13, 8);
outptr += 64;
}
}
void hashAndFillAes1Rx4_RVV(void *scratchpad, size_t scratchpadSize, void *hash, void* fill_state) {
uint8_t* scratchpadPtr = (uint8_t*)scratchpad;
const uint8_t* scratchpadEnd = scratchpadPtr + scratchpadSize;
vuint32m1_t hash_state02 = __riscv_vle32_v_u32m1(AES_HASH_1R_STATE02, 8);
vuint32m1_t hash_state13 = __riscv_vle32_v_u32m1(AES_HASH_1R_STATE13, 8);
const vuint32m1_t key02 = __riscv_vle32_v_u32m1(AES_GEN_1R_KEY02, 8);
const vuint32m1_t key13 = __riscv_vle32_v_u32m1(AES_GEN_1R_KEY13, 8);
const vuint32m1_t stride = __riscv_vle32_v_u32m1(AES_HASH_STRIDE_X2, 8);
vuint32m1_t fill_state02 = __riscv_vluxei32_v_u32m1((uint32_t*)fill_state + 0, stride, 8);
vuint32m1_t fill_state13 = __riscv_vluxei32_v_u32m1((uint32_t*)fill_state + 4, stride, 8);
const vuint8m1_t lutenc_index0 = __riscv_vle8_v_u8m1(lutEncIndex[0], 32);
const vuint8m1_t lutenc_index1 = __riscv_vle8_v_u8m1(lutEncIndex[1], 32);
const vuint8m1_t lutenc_index2 = __riscv_vle8_v_u8m1(lutEncIndex[2], 32);
const vuint8m1_t lutenc_index3 = __riscv_vle8_v_u8m1(lutEncIndex[3], 32);
const vuint8m1_t& lutdec_index0 = lutenc_index0;
const vuint8m1_t lutdec_index1 = __riscv_vle8_v_u8m1(lutDecIndex[1], 32);
const vuint8m1_t& lutdec_index2 = lutenc_index2;
const vuint8m1_t lutdec_index3 = __riscv_vle8_v_u8m1(lutDecIndex[3], 32);
//process 64 bytes at a time in 4 lanes
while (scratchpadPtr < scratchpadEnd) {
#define HASH_STATE(k) \
hash_state02 = softaes_vector_double(hash_state02, __riscv_vluxei32_v_u32m1((uint32_t*)scratchpadPtr + k * 16 + 0, stride, 8), lutenc_index0, lutenc_index1, lutenc_index2, lutenc_index3, lutEnc0, lutEnc1, lutEnc2, lutEnc3); \
hash_state13 = softaes_vector_double(hash_state13, __riscv_vluxei32_v_u32m1((uint32_t*)scratchpadPtr + k * 16 + 4, stride, 8), lutdec_index0, lutdec_index1, lutdec_index2, lutdec_index3, lutDec0, lutDec1, lutDec2, lutDec3);
#define FILL_STATE(k) \
fill_state02 = softaes_vector_double(fill_state02, key02, lutdec_index0, lutdec_index1, lutdec_index2, lutdec_index3, lutDec0, lutDec1, lutDec2, lutDec3); \
fill_state13 = softaes_vector_double(fill_state13, key13, lutenc_index0, lutenc_index1, lutenc_index2, lutenc_index3, lutEnc0, lutEnc1, lutEnc2, lutEnc3); \
__riscv_vsuxei32_v_u32m1((uint32_t*)scratchpadPtr + k * 16 + 0, stride, fill_state02, 8); \
__riscv_vsuxei32_v_u32m1((uint32_t*)scratchpadPtr + k * 16 + 4, stride, fill_state13, 8);
HASH_STATE(0);
HASH_STATE(1);
FILL_STATE(0);
FILL_STATE(1);
scratchpadPtr += 128;
}
#undef HASH_STATE
#undef FILL_STATE
__riscv_vsuxei32_v_u32m1((uint32_t*)fill_state + 0, stride, fill_state02, 8);
__riscv_vsuxei32_v_u32m1((uint32_t*)fill_state + 4, stride, fill_state13, 8);
//two extra rounds to achieve full diffusion
const vuint32m1_t xkey00 = __riscv_vle32_v_u32m1(AES_HASH_1R_XKEY00, 8);
const vuint32m1_t xkey11 = __riscv_vle32_v_u32m1(AES_HASH_1R_XKEY11, 8);
hash_state02 = softaes_vector_double(hash_state02, xkey00, lutenc_index0, lutenc_index1, lutenc_index2, lutenc_index3, lutEnc0, lutEnc1, lutEnc2, lutEnc3);
hash_state13 = softaes_vector_double(hash_state13, xkey00, lutdec_index0, lutdec_index1, lutdec_index2, lutdec_index3, lutDec0, lutDec1, lutDec2, lutDec3);
hash_state02 = softaes_vector_double(hash_state02, xkey11, lutenc_index0, lutenc_index1, lutenc_index2, lutenc_index3, lutEnc0, lutEnc1, lutEnc2, lutEnc3);
hash_state13 = softaes_vector_double(hash_state13, xkey11, lutdec_index0, lutdec_index1, lutdec_index2, lutdec_index3, lutDec0, lutDec1, lutDec2, lutDec3);
//output hash
__riscv_vsuxei32_v_u32m1((uint32_t*)hash + 0, stride, hash_state02, 8);
__riscv_vsuxei32_v_u32m1((uint32_t*)hash + 4, stride, hash_state13, 8);
}


@@ -0,0 +1,35 @@
/*
Copyright (c) 2025 SChernykh <https://github.com/SChernykh>
Copyright (c) 2025 XMRig <support@xmrig.com>
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of the copyright holder nor the
names of its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#pragma once
void hashAes1Rx4_RVV(const void *input, size_t inputSize, void *hash);
void fillAes1Rx4_RVV(void *state, size_t outputSize, void *buffer);
void fillAes4Rx4_RVV(void *state, size_t outputSize, void *buffer);
void hashAndFillAes1Rx4_RVV(void *scratchpad, size_t scratchpadSize, void *hash, void* fill_state);


@@ -0,0 +1,199 @@
/*
Copyright (c) 2025 SChernykh <https://github.com/SChernykh>
Copyright (c) 2025 XMRig <support@xmrig.com>
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of the copyright holder nor the
names of its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include "crypto/randomx/aes_hash.hpp"
#include "crypto/randomx/randomx.h"
#include "crypto/rx/Profiler.h"
#include <riscv_vector.h>
static FORCE_INLINE vuint32m1_t aesenc_zvkned(vuint32m1_t a, vuint32m1_t b) { return __riscv_vaesem_vv_u32m1(a, b, 8); }
static FORCE_INLINE vuint32m1_t aesdec_zvkned(vuint32m1_t a, vuint32m1_t b, vuint32m1_t zero) { return __riscv_vxor_vv_u32m1(__riscv_vaesdm_vv_u32m1(a, zero, 8), b, 8); }
static constexpr uint32_t AES_HASH_1R_STATE02[8] = { 0x92b52c0d, 0x9fa856de, 0xcc82db47, 0xd7983aad, 0x6a770017, 0xae62c7d0, 0x5079506b, 0xe8a07ce4 };
static constexpr uint32_t AES_HASH_1R_STATE13[8] = { 0x338d996e, 0x15c7b798, 0xf59e125a, 0xace78057, 0x630a240c, 0x07ad828d, 0x79a10005, 0x7e994948 };
static constexpr uint32_t AES_GEN_1R_KEY02[8] = { 0x6daca553, 0x62716609, 0xdbb5552b, 0xb4f44917, 0x3f1262f1, 0x9f947ec6, 0xf4c0794f, 0x3e20e345 };
static constexpr uint32_t AES_GEN_1R_KEY13[8] = { 0x6d7caf07, 0x846a710d, 0x1725d378, 0x0da1dc4e, 0x6aef8135, 0xb1ba317c, 0x16314c88, 0x49169154 };
static constexpr uint32_t AES_HASH_1R_XKEY00[8] = { 0xf6fa8389, 0x8b24949f, 0x90dc56bf, 0x06890201, 0xf6fa8389, 0x8b24949f, 0x90dc56bf, 0x06890201 };
static constexpr uint32_t AES_HASH_1R_XKEY11[8] = { 0x61b263d1, 0x51f4e03c, 0xee1043c6, 0xed18f99b, 0x61b263d1, 0x51f4e03c, 0xee1043c6, 0xed18f99b };
static constexpr uint32_t AES_HASH_STRIDE_X2[8] = { 0, 4, 8, 12, 32, 36, 40, 44 };
static constexpr uint32_t AES_HASH_STRIDE_X4[8] = { 0, 4, 8, 12, 64, 68, 72, 76 };
void hashAes1Rx4_zvkned(const void *input, size_t inputSize, void *hash)
{
const uint8_t* inptr = (const uint8_t*)input;
const uint8_t* inputEnd = inptr + inputSize;
//initial state
vuint32m1_t state02 = __riscv_vle32_v_u32m1(AES_HASH_1R_STATE02, 8);
vuint32m1_t state13 = __riscv_vle32_v_u32m1(AES_HASH_1R_STATE13, 8);
const vuint32m1_t stride = __riscv_vle32_v_u32m1(AES_HASH_STRIDE_X2, 8);
const vuint32m1_t zero = {};
//process 64 bytes at a time in 4 lanes
while (inptr < inputEnd) {
state02 = aesenc_zvkned(state02, __riscv_vluxei32_v_u32m1((uint32_t*)inptr + 0, stride, 8));
state13 = aesdec_zvkned(state13, __riscv_vluxei32_v_u32m1((uint32_t*)inptr + 4, stride, 8), zero);
inptr += 64;
}
//two extra rounds to achieve full diffusion
const vuint32m1_t xkey00 = __riscv_vle32_v_u32m1(AES_HASH_1R_XKEY00, 8);
const vuint32m1_t xkey11 = __riscv_vle32_v_u32m1(AES_HASH_1R_XKEY11, 8);
state02 = aesenc_zvkned(state02, xkey00);
state13 = aesdec_zvkned(state13, xkey00, zero);
state02 = aesenc_zvkned(state02, xkey11);
state13 = aesdec_zvkned(state13, xkey11, zero);
//output hash
__riscv_vsuxei32_v_u32m1((uint32_t*)hash + 0, stride, state02, 8);
__riscv_vsuxei32_v_u32m1((uint32_t*)hash + 4, stride, state13, 8);
}
void fillAes1Rx4_zvkned(void *state, size_t outputSize, void *buffer)
{
const uint8_t* outptr = (uint8_t*)buffer;
const uint8_t* outputEnd = outptr + outputSize;
const vuint32m1_t key02 = __riscv_vle32_v_u32m1(AES_GEN_1R_KEY02, 8);
const vuint32m1_t key13 = __riscv_vle32_v_u32m1(AES_GEN_1R_KEY13, 8);
const vuint32m1_t stride = __riscv_vle32_v_u32m1(AES_HASH_STRIDE_X2, 8);
const vuint32m1_t zero = {};
vuint32m1_t state02 = __riscv_vluxei32_v_u32m1((uint32_t*)state + 0, stride, 8);
vuint32m1_t state13 = __riscv_vluxei32_v_u32m1((uint32_t*)state + 4, stride, 8);
while (outptr < outputEnd) {
state02 = aesdec_zvkned(state02, key02, zero);
state13 = aesenc_zvkned(state13, key13);
__riscv_vsuxei32_v_u32m1((uint32_t*)outptr + 0, stride, state02, 8);
__riscv_vsuxei32_v_u32m1((uint32_t*)outptr + 4, stride, state13, 8);
outptr += 64;
}
__riscv_vsuxei32_v_u32m1((uint32_t*)state + 0, stride, state02, 8);
__riscv_vsuxei32_v_u32m1((uint32_t*)state + 4, stride, state13, 8);
}
void fillAes4Rx4_zvkned(void *state, size_t outputSize, void *buffer)
{
const uint8_t* outptr = (uint8_t*)buffer;
const uint8_t* outputEnd = outptr + outputSize;
const vuint32m1_t stride4 = __riscv_vle32_v_u32m1(AES_HASH_STRIDE_X4, 8);
const vuint32m1_t key04 = __riscv_vluxei32_v_u32m1((uint32_t*)(RandomX_CurrentConfig.fillAes4Rx4_Key + 0), stride4, 8);
const vuint32m1_t key15 = __riscv_vluxei32_v_u32m1((uint32_t*)(RandomX_CurrentConfig.fillAes4Rx4_Key + 1), stride4, 8);
const vuint32m1_t key26 = __riscv_vluxei32_v_u32m1((uint32_t*)(RandomX_CurrentConfig.fillAes4Rx4_Key + 2), stride4, 8);
const vuint32m1_t key37 = __riscv_vluxei32_v_u32m1((uint32_t*)(RandomX_CurrentConfig.fillAes4Rx4_Key + 3), stride4, 8);
const vuint32m1_t stride = __riscv_vle32_v_u32m1(AES_HASH_STRIDE_X2, 8);
const vuint32m1_t zero = {};
vuint32m1_t state02 = __riscv_vluxei32_v_u32m1((uint32_t*)state + 0, stride, 8);
vuint32m1_t state13 = __riscv_vluxei32_v_u32m1((uint32_t*)state + 4, stride, 8);
while (outptr < outputEnd) {
state02 = aesdec_zvkned(state02, key04, zero);
state13 = aesenc_zvkned(state13, key04);
state02 = aesdec_zvkned(state02, key15, zero);
state13 = aesenc_zvkned(state13, key15);
state02 = aesdec_zvkned(state02, key26, zero);
state13 = aesenc_zvkned(state13, key26);
state02 = aesdec_zvkned(state02, key37, zero);
state13 = aesenc_zvkned(state13, key37);
__riscv_vsuxei32_v_u32m1((uint32_t*)outptr + 0, stride, state02, 8);
__riscv_vsuxei32_v_u32m1((uint32_t*)outptr + 4, stride, state13, 8);
outptr += 64;
}
}
void hashAndFillAes1Rx4_zvkned(void *scratchpad, size_t scratchpadSize, void *hash, void* fill_state)
{
uint8_t* scratchpadPtr = (uint8_t*)scratchpad;
const uint8_t* scratchpadEnd = scratchpadPtr + scratchpadSize;
vuint32m1_t hash_state02 = __riscv_vle32_v_u32m1(AES_HASH_1R_STATE02, 8);
vuint32m1_t hash_state13 = __riscv_vle32_v_u32m1(AES_HASH_1R_STATE13, 8);
const vuint32m1_t key02 = __riscv_vle32_v_u32m1(AES_GEN_1R_KEY02, 8);
const vuint32m1_t key13 = __riscv_vle32_v_u32m1(AES_GEN_1R_KEY13, 8);
const vuint32m1_t stride = __riscv_vle32_v_u32m1(AES_HASH_STRIDE_X2, 8);
const vuint32m1_t zero = {};
vuint32m1_t fill_state02 = __riscv_vluxei32_v_u32m1((uint32_t*)fill_state + 0, stride, 8);
vuint32m1_t fill_state13 = __riscv_vluxei32_v_u32m1((uint32_t*)fill_state + 4, stride, 8);
//process 64 bytes at a time in 4 lanes
while (scratchpadPtr < scratchpadEnd) {
hash_state02 = aesenc_zvkned(hash_state02, __riscv_vluxei32_v_u32m1((uint32_t*)scratchpadPtr + 0, stride, 8));
hash_state13 = aesdec_zvkned(hash_state13, __riscv_vluxei32_v_u32m1((uint32_t*)scratchpadPtr + 4, stride, 8), zero);
fill_state02 = aesdec_zvkned(fill_state02, key02, zero);
fill_state13 = aesenc_zvkned(fill_state13, key13);
__riscv_vsuxei32_v_u32m1((uint32_t*)scratchpadPtr + 0, stride, fill_state02, 8);
__riscv_vsuxei32_v_u32m1((uint32_t*)scratchpadPtr + 4, stride, fill_state13, 8);
scratchpadPtr += 64;
}
__riscv_vsuxei32_v_u32m1((uint32_t*)fill_state + 0, stride, fill_state02, 8);
__riscv_vsuxei32_v_u32m1((uint32_t*)fill_state + 4, stride, fill_state13, 8);
//two extra rounds to achieve full diffusion
const vuint32m1_t xkey00 = __riscv_vle32_v_u32m1(AES_HASH_1R_XKEY00, 8);
const vuint32m1_t xkey11 = __riscv_vle32_v_u32m1(AES_HASH_1R_XKEY11, 8);
hash_state02 = aesenc_zvkned(hash_state02, xkey00);
hash_state13 = aesdec_zvkned(hash_state13, xkey00, zero);
hash_state02 = aesenc_zvkned(hash_state02, xkey11);
hash_state13 = aesdec_zvkned(hash_state13, xkey11, zero);
//output hash
__riscv_vsuxei32_v_u32m1((uint32_t*)hash + 0, stride, hash_state02, 8);
__riscv_vsuxei32_v_u32m1((uint32_t*)hash + 4, stride, hash_state13, 8);
}


@@ -0,0 +1,35 @@
/*
Copyright (c) 2025 SChernykh <https://github.com/SChernykh>
Copyright (c) 2025 XMRig <support@xmrig.com>
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of the copyright holder nor the
names of its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#pragma once
void hashAes1Rx4_zvkned(const void *input, size_t inputSize, void *hash);
void fillAes1Rx4_zvkned(void *state, size_t outputSize, void *buffer);
void fillAes4Rx4_zvkned(void *state, size_t outputSize, void *buffer);
void hashAndFillAes1Rx4_zvkned(void *scratchpad, size_t scratchpadSize, void *hash, void* fill_state);


@@ -0,0 +1,148 @@
/*
Copyright (c) 2018-2019, tevador <tevador@gmail.com>
Copyright (c) 2026 XMRig <support@xmrig.com>
Copyright (c) 2026 SChernykh <https://github.com/SChernykh>
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of the copyright holder nor the
names of its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include <cstddef>
#include <cstdint>
#include <immintrin.h>
#define REVERSE_4(A, B, C, D) D, C, B, A
alignas(64) static const uint32_t AES_HASH_1R_STATE[] = {
REVERSE_4(0xd7983aad, 0xcc82db47, 0x9fa856de, 0x92b52c0d),
REVERSE_4(0xace78057, 0xf59e125a, 0x15c7b798, 0x338d996e),
REVERSE_4(0xe8a07ce4, 0x5079506b, 0xae62c7d0, 0x6a770017),
REVERSE_4(0x7e994948, 0x79a10005, 0x07ad828d, 0x630a240c)
};
alignas(64) static const uint32_t AES_GEN_1R_KEY[] = {
REVERSE_4(0xb4f44917, 0xdbb5552b, 0x62716609, 0x6daca553),
REVERSE_4(0x0da1dc4e, 0x1725d378, 0x846a710d, 0x6d7caf07),
REVERSE_4(0x3e20e345, 0xf4c0794f, 0x9f947ec6, 0x3f1262f1),
REVERSE_4(0x49169154, 0x16314c88, 0xb1ba317c, 0x6aef8135)
};
alignas(64) static const uint32_t AES_HASH_1R_XKEY0[] = {
REVERSE_4(0x06890201, 0x90dc56bf, 0x8b24949f, 0xf6fa8389),
REVERSE_4(0x06890201, 0x90dc56bf, 0x8b24949f, 0xf6fa8389),
REVERSE_4(0x06890201, 0x90dc56bf, 0x8b24949f, 0xf6fa8389),
REVERSE_4(0x06890201, 0x90dc56bf, 0x8b24949f, 0xf6fa8389)
};
alignas(64) static const uint32_t AES_HASH_1R_XKEY1[] = {
REVERSE_4(0xed18f99b, 0xee1043c6, 0x51f4e03c, 0x61b263d1),
REVERSE_4(0xed18f99b, 0xee1043c6, 0x51f4e03c, 0x61b263d1),
REVERSE_4(0xed18f99b, 0xee1043c6, 0x51f4e03c, 0x61b263d1),
REVERSE_4(0xed18f99b, 0xee1043c6, 0x51f4e03c, 0x61b263d1)
};
void hashAndFillAes1Rx4_VAES512(void *scratchpad, size_t scratchpadSize, void *hash, void* fill_state)
{
uint8_t* scratchpadPtr = (uint8_t*)scratchpad;
const uint8_t* scratchpadEnd = scratchpadPtr + scratchpadSize;
const __m512i fill_key = _mm512_load_si512(AES_GEN_1R_KEY);
const __m512i initial_hash_state = _mm512_load_si512(AES_HASH_1R_STATE);
const __m512i initial_fill_state = _mm512_load_si512(fill_state);
constexpr uint8_t mask = 0b11001100;
// enc_data[0] = hash_state[0]
// enc_data[1] = fill_state[1]
// enc_data[2] = hash_state[2]
// enc_data[3] = fill_state[3]
__m512i enc_data = _mm512_mask_blend_epi64(mask, initial_hash_state, initial_fill_state);
// dec_data[0] = fill_state[0]
// dec_data[1] = hash_state[1]
// dec_data[2] = fill_state[2]
// dec_data[3] = hash_state[3]
__m512i dec_data = _mm512_mask_blend_epi64(mask, initial_fill_state, initial_hash_state);
constexpr int PREFETCH_DISTANCE = 7168;
const uint8_t* prefetchPtr = scratchpadPtr + PREFETCH_DISTANCE;
scratchpadEnd -= PREFETCH_DISTANCE;
for (const uint8_t* p = scratchpadPtr; p < prefetchPtr; p += 256) {
_mm_prefetch((const char*)(p + 0), _MM_HINT_T0);
_mm_prefetch((const char*)(p + 64), _MM_HINT_T0);
_mm_prefetch((const char*)(p + 128), _MM_HINT_T0);
_mm_prefetch((const char*)(p + 192), _MM_HINT_T0);
}
for (int i = 0; i < 2; ++i) {
while (scratchpadPtr < scratchpadEnd) {
const __m512i scratchpad_data = _mm512_load_si512(scratchpadPtr);
// enc_key[0] = scratchpad_data[0]
// enc_key[1] = fill_key[1]
// enc_key[2] = scratchpad_data[2]
// enc_key[3] = fill_key[3]
enc_data = _mm512_aesenc_epi128(enc_data, _mm512_mask_blend_epi64(mask, scratchpad_data, fill_key));
// dec_key[0] = fill_key[0]
// dec_key[1] = scratchpad_data[1]
// dec_key[2] = fill_key[2]
// dec_key[3] = scratchpad_data[3]
dec_data = _mm512_aesdec_epi128(dec_data, _mm512_mask_blend_epi64(mask, fill_key, scratchpad_data));
// fill_state[0] = dec_data[0]
// fill_state[1] = enc_data[1]
// fill_state[2] = dec_data[2]
// fill_state[3] = enc_data[3]
_mm512_store_si512(scratchpadPtr, _mm512_mask_blend_epi64(mask, dec_data, enc_data));
_mm_prefetch((const char*)prefetchPtr, _MM_HINT_T0);
scratchpadPtr += 64;
prefetchPtr += 64;
}
prefetchPtr = (const uint8_t*) scratchpad;
scratchpadEnd += PREFETCH_DISTANCE;
}
_mm512_store_si512(fill_state, _mm512_mask_blend_epi64(mask, dec_data, enc_data));
//two extra rounds to achieve full diffusion
const __m512i xkey0 = _mm512_load_si512(AES_HASH_1R_XKEY0);
const __m512i xkey1 = _mm512_load_si512(AES_HASH_1R_XKEY1);
enc_data = _mm512_aesenc_epi128(enc_data, xkey0);
dec_data = _mm512_aesdec_epi128(dec_data, xkey0);
enc_data = _mm512_aesenc_epi128(enc_data, xkey1);
dec_data = _mm512_aesdec_epi128(dec_data, xkey1);
//output hash
_mm512_store_si512(hash, _mm512_mask_blend_epi64(mask, enc_data, dec_data));
// Just in case: clear the upper YMM/ZMM state to avoid AVX-SSE transition penalties
_mm256_zeroupper();
}
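The enc/dec interleaving above hinges on `_mm512_mask_blend_epi64` with `mask = 0b11001100`. Modeling that intrinsic on eight scalar 64-bit lanes (a sketch; the helper name is made up for illustration) shows that a set mask bit takes the lane from the second operand, so 128-bit blocks 1 and 3 come from `fill_state` while blocks 0 and 2 come from `hash_state`, exactly as the inline comments state:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Scalar model of _mm512_mask_blend_epi64: bit i of the mask selects 64-bit
// lane i from b (bit set) or from a (bit clear). With mask = 0b11001100,
// lanes 2,3,6,7 -- i.e. 128-bit blocks 1 and 3 -- come from b.
static std::array<uint64_t, 8> mask_blend_epi64(uint8_t mask,
                                                const std::array<uint64_t, 8>& a,
                                                const std::array<uint64_t, 8>& b)
{
    std::array<uint64_t, 8> out{};
    for (int i = 0; i < 8; ++i) {
        out[i] = ((mask >> i) & 1) ? b[i] : a[i];
    }
    return out;
}
```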


@@ -1,5 +1,5 @@
;# save VM register values
add rsp, 40
add rsp, 248
pop rcx
mov qword ptr [rcx+0], r8
mov qword ptr [rcx+8], r9


@@ -0,0 +1,30 @@
mov rcx, [rsp+24]
mov qword ptr [rcx+0], r8
mov qword ptr [rcx+8], r9
mov qword ptr [rcx+16], r10
mov qword ptr [rcx+24], r11
mov qword ptr [rcx+32], r12
mov qword ptr [rcx+40], r13
mov qword ptr [rcx+48], r14
mov qword ptr [rcx+56], r15
mov rcx, [rsp+16]
aesenc xmm0, xmm4
aesdec xmm1, xmm4
aesenc xmm2, xmm4
aesdec xmm3, xmm4
aesenc xmm0, xmm5
aesdec xmm1, xmm5
aesenc xmm2, xmm5
aesdec xmm3, xmm5
aesenc xmm0, xmm6
aesdec xmm1, xmm6
aesenc xmm2, xmm6
aesdec xmm3, xmm6
aesenc xmm0, xmm7
aesdec xmm1, xmm7
aesenc xmm2, xmm7
aesdec xmm3, xmm7
movapd xmmword ptr [rcx+0], xmm0
movapd xmmword ptr [rcx+16], xmm1
movapd xmmword ptr [rcx+32], xmm2
movapd xmmword ptr [rcx+48], xmm3


@@ -0,0 +1,196 @@
mov rcx, [rsp+24]
mov qword ptr [rcx+0], r8
mov qword ptr [rcx+8], r9
mov qword ptr [rcx+16], r10
mov qword ptr [rcx+24], r11
mov qword ptr [rcx+32], r12
mov qword ptr [rcx+40], r13
mov qword ptr [rcx+48], r14
mov qword ptr [rcx+56], r15
movapd xmmword ptr [rsp+40], xmm0
movapd xmmword ptr [rsp+56], xmm1
movapd xmmword ptr [rsp+72], xmm2
movapd xmmword ptr [rsp+88], xmm3
movapd xmmword ptr [rsp+104], xmm4
movapd xmmword ptr [rsp+120], xmm5
movapd xmmword ptr [rsp+136], xmm6
movapd xmmword ptr [rsp+152], xmm7
mov [rsp+168], rax
mov [rsp+176], rbx
mov [rsp+184], rdx
mov [rsp+192], rsi
mov [rsp+200], rdi
mov [rsp+208], rbp
mov [rsp+216], r8
mov [rsp+224], r9
mov r8, [rsp+232] ;# aes_lut_enc
mov r9, [rsp+240] ;# aes_lut_dec
movapd xmm12, xmmword ptr [rsp-8] ;# "call" will overwrite IMUL_RCP's data on stack, so save it
lea rsi, [rsp+104]
lea rdi, [rsp+40]
call soft_aes_enc
lea rdi, [rsp+56]
call soft_aes_dec
lea rdi, [rsp+72]
call soft_aes_enc
lea rdi, [rsp+88]
call soft_aes_dec
lea rsi, [rsp+120]
lea rdi, [rsp+40]
call soft_aes_enc
lea rdi, [rsp+56]
call soft_aes_dec
lea rdi, [rsp+72]
call soft_aes_enc
lea rdi, [rsp+88]
call soft_aes_dec
lea rsi, [rsp+136]
lea rdi, [rsp+40]
call soft_aes_enc
lea rdi, [rsp+56]
call soft_aes_dec
lea rdi, [rsp+72]
call soft_aes_enc
lea rdi, [rsp+88]
call soft_aes_dec
lea rsi, [rsp+152]
lea rdi, [rsp+40]
call soft_aes_enc
lea rdi, [rsp+56]
call soft_aes_dec
lea rdi, [rsp+72]
call soft_aes_enc
lea rdi, [rsp+88]
call soft_aes_dec
movapd xmmword ptr [rsp-8], xmm12
jmp soft_aes_end
soft_aes_enc:
mov eax, dword ptr [rsi+0]
mov ebx, dword ptr [rsi+4]
mov ecx, dword ptr [rsi+8]
mov edx, dword ptr [rsi+12]
movzx ebp, byte ptr [rdi+0]
xor eax, dword ptr [r8+rbp*4]
movzx ebp, byte ptr [rdi+1]
xor edx, dword ptr [r8+rbp*4+1024]
movzx ebp, byte ptr [rdi+2]
xor ecx, dword ptr [r8+rbp*4+2048]
movzx ebp, byte ptr [rdi+3]
xor ebx, dword ptr [r8+rbp*4+3072]
movzx ebp, byte ptr [rdi+4]
xor ebx, dword ptr [r8+rbp*4]
movzx ebp, byte ptr [rdi+5]
xor eax, dword ptr [r8+rbp*4+1024]
movzx ebp, byte ptr [rdi+6]
xor edx, dword ptr [r8+rbp*4+2048]
movzx ebp, byte ptr [rdi+7]
xor ecx, dword ptr [r8+rbp*4+3072]
movzx ebp, byte ptr [rdi+8]
xor ecx, dword ptr [r8+rbp*4]
movzx ebp, byte ptr [rdi+9]
xor ebx, dword ptr [r8+rbp*4+1024]
movzx ebp, byte ptr [rdi+10]
xor eax, dword ptr [r8+rbp*4+2048]
movzx ebp, byte ptr [rdi+11]
xor edx, dword ptr [r8+rbp*4+3072]
movzx ebp, byte ptr [rdi+12]
xor edx, dword ptr [r8+rbp*4]
movzx ebp, byte ptr [rdi+13]
xor ecx, dword ptr [r8+rbp*4+1024]
movzx ebp, byte ptr [rdi+14]
xor ebx, dword ptr [r8+rbp*4+2048]
movzx ebp, byte ptr [rdi+15]
xor eax, dword ptr [r8+rbp*4+3072]
mov dword ptr [rdi+0], eax
mov dword ptr [rdi+4], ebx
mov dword ptr [rdi+8], ecx
mov dword ptr [rdi+12], edx
ret
soft_aes_dec:
mov eax, dword ptr [rsi+0]
mov ebx, dword ptr [rsi+4]
mov ecx, dword ptr [rsi+8]
mov edx, dword ptr [rsi+12]
movzx ebp, byte ptr [rdi+0]
xor eax, dword ptr [r9+rbp*4]
movzx ebp, byte ptr [rdi+1]
xor ebx, dword ptr [r9+rbp*4+1024]
movzx ebp, byte ptr [rdi+2]
xor ecx, dword ptr [r9+rbp*4+2048]
movzx ebp, byte ptr [rdi+3]
xor edx, dword ptr [r9+rbp*4+3072]
movzx ebp, byte ptr [rdi+4]
xor ebx, dword ptr [r9+rbp*4]
movzx ebp, byte ptr [rdi+5]
xor ecx, dword ptr [r9+rbp*4+1024]
movzx ebp, byte ptr [rdi+6]
xor edx, dword ptr [r9+rbp*4+2048]
movzx ebp, byte ptr [rdi+7]
xor eax, dword ptr [r9+rbp*4+3072]
movzx ebp, byte ptr [rdi+8]
xor ecx, dword ptr [r9+rbp*4]
movzx ebp, byte ptr [rdi+9]
xor edx, dword ptr [r9+rbp*4+1024]
movzx ebp, byte ptr [rdi+10]
xor eax, dword ptr [r9+rbp*4+2048]
movzx ebp, byte ptr [rdi+11]
xor ebx, dword ptr [r9+rbp*4+3072]
movzx ebp, byte ptr [rdi+12]
xor edx, dword ptr [r9+rbp*4]
movzx ebp, byte ptr [rdi+13]
xor eax, dword ptr [r9+rbp*4+1024]
movzx ebp, byte ptr [rdi+14]
xor ebx, dword ptr [r9+rbp*4+2048]
movzx ebp, byte ptr [rdi+15]
xor ecx, dword ptr [r9+rbp*4+3072]
mov dword ptr [rdi+0], eax
mov dword ptr [rdi+4], ebx
mov dword ptr [rdi+8], ecx
mov dword ptr [rdi+12], edx
ret
soft_aes_end:
mov rax, [rsp+168]
mov rbx, [rsp+176]
mov rcx, [rsp+16]
mov rdx, [rsp+184]
mov rsi, [rsp+192]
mov rdi, [rsp+200]
mov rbp, [rsp+208]
mov r8, [rsp+216]
mov r9, [rsp+224]
movapd xmm0, xmmword ptr [rsp+40]
movapd xmm1, xmmword ptr [rsp+56]
movapd xmm2, xmmword ptr [rsp+72]
movapd xmm3, xmmword ptr [rsp+88]
movapd xmmword ptr [rcx+0], xmm0
movapd xmmword ptr [rcx+16], xmm1
movapd xmmword ptr [rcx+32], xmm2
movapd xmmword ptr [rcx+48], xmm3
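The soft_aes_enc round above is the classic T-table formulation of an AES round: each 32-bit output word XORs a key word with four table entries selected by single state bytes along the ShiftRows diagonals, the four 256-entry tables sitting 1024 bytes apart. A hedged scalar sketch of that indexing pattern (table contents here are caller-supplied placeholders, not the real AES tables):

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// One T-table AES-style round mirroring the byte/table pattern of
// soft_aes_enc above:
//   out[c] = key[c] ^ T0[s[i0]] ^ T1[s[i1]] ^ T2[s[i2]] ^ T3[s[i3]]
// where the byte indices walk the ShiftRows diagonals (0,5,10,15 for
// column 0, and so on). `lut` stands in for the four tables that the
// assembly addresses at offsets 0, 1024, 2048, and 3072 from r8.
static std::array<uint32_t, 4> ttable_round(const uint8_t s[16],
                                            const std::array<uint32_t, 4>& key,
                                            const uint32_t lut[4][256])
{
    static const uint8_t idx[4][4] = {
        { 0, 5, 10, 15 }, { 4, 9, 14, 3 }, { 8, 13, 2, 7 }, { 12, 1, 6, 11 }
    };
    std::array<uint32_t, 4> out = key;
    for (int c = 0; c < 4; ++c) {
        for (int t = 0; t < 4; ++t) {
            out[c] ^= lut[t][s[idx[c][t]]];
        }
    }
    return out;
}
```

soft_aes_dec follows the same shape with the mirrored (inverse ShiftRows) diagonals and the decryption tables at r9.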


@@ -13,12 +13,6 @@
mov rbp, qword ptr [rsi] ;# "mx", "ma"
mov rdi, qword ptr [rsi+8] ;# uint8_t* dataset
;# dataset prefetch for the first iteration of the main loop
mov rax, rbp
shr rax, 32
and eax, RANDOMX_DATASET_BASE_MASK
prefetchnta byte ptr [rdi+rax]
mov rsi, rdx ;# uint8_t* scratchpad
mov rax, rbp


@@ -25,12 +25,6 @@
mov rbp, qword ptr [rdx] ;# "mx", "ma"
mov rdi, qword ptr [rdx+8] ;# uint8_t* dataset
;# dataset prefetch for the first iteration of the main loop
mov rax, rbp
shr rax, 32
and eax, RANDOMX_DATASET_BASE_MASK
prefetchnta byte ptr [rdi+rax]
mov rsi, r8 ;# uint8_t* scratchpad
mov rbx, r9 ;# loop counter


@@ -0,0 +1,16 @@
mov ecx, ebp ;# ecx = ma
and ecx, RANDOMX_DATASET_BASE_MASK
xor r8, qword ptr [rdi+rcx]
xor rbp, rax ;# modify "ma"
mov edx, ebp ;# edx = "ma"
ror rbp, 32 ;# swap "ma" and "mx"
and edx, RANDOMX_DATASET_BASE_MASK
prefetchnta byte ptr [rdi+rdx]
xor r9, qword ptr [rdi+rcx+8]
xor r10, qword ptr [rdi+rcx+16]
xor r11, qword ptr [rdi+rcx+24]
xor r12, qword ptr [rdi+rcx+32]
xor r13, qword ptr [rdi+rcx+40]
xor r14, qword ptr [rdi+rcx+48]
xor r15, qword ptr [rdi+rcx+56]
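The snippet above keeps "ma" in the low 32 bits and "mx" in the high 32 bits of one register, reads the dataset cache line at the masked "ma" offset, then `ror rbp, 32` swaps the halves so the just-prefetched "mx" becomes the next read address. A minimal sketch of that packing (helper names are illustrative):

```cpp
#include <cassert>
#include <cstdint>

// "mx" lives in the high 32 bits, "ma" in the low 32 bits of one 64-bit
// register; rotating by 32 swaps the two roles, as `ror rbp, 32` does above.
static uint64_t pack_mx_ma(uint32_t mx, uint32_t ma)
{
    return (uint64_t(mx) << 32) | ma;
}

static uint64_t swap_ma_mx(uint64_t packed)
{
    return (packed >> 32) | (packed << 32); // rotr64(packed, 32)
}
```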


@@ -225,7 +225,10 @@ namespace randomx {
}
static void exe_CFROUND(RANDOMX_EXE_ARGS) {
rx_set_rounding_mode(rotr64(*ibc.isrc, static_cast<uint32_t>(ibc.imm)) % 4);
uint64_t isrc = rotr64(*ibc.isrc, ibc.imm);
if (!RandomX_CurrentConfig.Tweak_V2_CFROUND || ((isrc & 60) == 0)) {
rx_set_rounding_mode(isrc % 4);
}
}
static void exe_ISTORE(RANDOMX_EXE_ARGS) {
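The updated exe_CFROUND above only changes the FP rounding mode when the v2 tweak is off, or when bits 2–5 of the rotated source are all clear (`isrc & 60`). A stand-alone sketch of that gating (function names here are illustrative, not the source's):

```cpp
#include <cassert>
#include <cstdint>

// 64-bit rotate right, mirroring the interpreter's rotr64 helper.
static uint64_t rotr64(uint64_t v, unsigned c)
{
    c &= 63;
    return c ? ((v >> c) | (v << (64 - c))) : v;
}

// Returns the new rounding mode (0..3), or -1 when the v2 tweak suppresses
// the update because one of bits 2..5 of the rotated source is set.
// tweakV2 stands in for RandomX_CurrentConfig.Tweak_V2_CFROUND.
static int cfround_mode(uint64_t src, unsigned imm, bool tweakV2)
{
    const uint64_t isrc = rotr64(src, imm);
    if (!tweakV2 || ((isrc & 60) == 0)) {
        return static_cast<int>(isrc % 4);
    }
    return -1;
}
```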


@@ -41,7 +41,7 @@ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#define RANDOMX_DATASET_MAX_SIZE 2181038080
// Increase it if some configs use larger programs
#define RANDOMX_PROGRAM_MAX_SIZE 280
#define RANDOMX_PROGRAM_MAX_SIZE 384
// Increase it if some configs use larger scratchpad
#define RANDOMX_SCRATCHPAD_L3_MAX_SIZE 2097152


@@ -174,7 +174,7 @@ FORCE_INLINE void rx_set_rounding_mode(uint32_t mode) {
_mm_setcsr(rx_mxcsr_default | (mode << 13));
}
#elif defined(__PPC64__) && defined(__ALTIVEC__) && defined(__VSX__) //sadly only POWER7 and newer will be able to use SIMD acceleration. Earlier processors cant use doubles or 64 bit integers with SIMD
#elif defined(__PPC64__) && defined(__ALTIVEC__) && defined(__VSX__) //sadly only POWER7 and newer will be able to use SIMD acceleration. Earlier processors can't use doubles or 64 bit integers with SIMD
#include <cstdint>
#include <stdexcept>
#include <cstdlib>
@@ -420,7 +420,7 @@ inline void* rx_aligned_alloc(size_t size, size_t align) {
# define rx_aligned_free(a) free(a)
#endif
inline void rx_prefetch_nta(void* ptr) {
inline void rx_prefetch_nta(const void* ptr) {
asm volatile ("prfm pldl1strm, [%0]\n" : : "r" (ptr));
}
@@ -577,8 +577,13 @@ inline void* rx_aligned_alloc(size_t size, size_t align) {
# define rx_aligned_free(a) free(a)
#endif
#if defined(__GNUC__) && (!defined(__clang__) || __has_builtin(__builtin_prefetch))
#define rx_prefetch_nta(x) __builtin_prefetch((x), 0, 0)
#define rx_prefetch_t0(x) __builtin_prefetch((x), 0, 3)
#else
#define rx_prefetch_nta(x)
#define rx_prefetch_t0(x)
#endif
FORCE_INLINE rx_vec_f128 rx_load_vec_f128(const double* pd) {
rx_vec_f128 x;


@@ -34,6 +34,7 @@ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#include "crypto/randomx/reciprocal.h"
#include "crypto/randomx/superscalar.hpp"
#include "crypto/randomx/virtual_memory.hpp"
#include "crypto/randomx/soft_aes.h"
static bool hugePagesJIT = false;
static int optimizedDatasetInit = -1;
@@ -114,7 +115,7 @@ JitCompilerA64::~JitCompilerA64()
freePagedMemory(code, allocatedSize);
}
void JitCompilerA64::generateProgram(Program& program, ProgramConfiguration& config, uint32_t)
void JitCompilerA64::generateProgram(Program& program, ProgramConfiguration& config, uint32_t flags)
{
if (!allocatedSize) {
allocate(CodeSize);
@@ -125,6 +126,8 @@ void JitCompilerA64::generateProgram(Program& program, ProgramConfiguration& con
}
#endif
vm_flags = flags;
uint32_t codePos = MainLoopBegin + 4;
uint32_t mask = ((RandomX_CurrentConfig.Log2_ScratchpadL3 - 7) << 10);
@@ -156,9 +159,9 @@ void JitCompilerA64::generateProgram(Program& program, ProgramConfiguration& con
emit32(ARMV8A::B | (offset / 4), code, codePos);
mask = ((RandomX_CurrentConfig.Log2_DatasetBaseSize - 7) << 10);
// and w20, w9, CacheLineAlignMask
// and w20, w20, CacheLineAlignMask
codePos = (((uint8_t*)randomx_program_aarch64_cacheline_align_mask1) - ((uint8_t*)randomx_program_aarch64));
emit32(0x121A0000 | 20 | (9 << 5) | mask, code, codePos);
emit32(0x121A0000 | 20 | (20 << 5) | mask, code, codePos);
// and w10, w10, CacheLineAlignMask
codePos = (((uint8_t*)randomx_program_aarch64_cacheline_align_mask2) - ((uint8_t*)randomx_program_aarch64));
@@ -169,6 +172,43 @@ void JitCompilerA64::generateProgram(Program& program, ProgramConfiguration& con
codePos = ((uint8_t*)randomx_program_aarch64_update_spMix1) - ((uint8_t*)randomx_program_aarch64);
emit32(ARMV8A::EOR | 10 | (IntRegMap[config.readReg0] << 5) | (IntRegMap[config.readReg1] << 16), code, codePos);
codePos = ((uint8_t*)randomx_program_aarch64_v2_FE_mix) - ((uint8_t*)randomx_program_aarch64);
// Enable RandomX v2 AES tweak
if (RandomX_CurrentConfig.Tweak_V2_AES) {
if (flags & RANDOMX_FLAG_HARD_AES) {
// Disable the jump to RandomX v1 FE mix code by writing "movi v28.4s, 0" instruction
emit32(0x4F00041C, code, codePos);
}
else {
// Jump to RandomX v2 FE mix soft AES code by writing "b randomx_program_aarch64_v2_FE_mix_soft_aes" instruction
uint32_t offset = (uint8_t*)randomx_program_aarch64_v2_FE_mix_soft_aes - (uint8_t*)randomx_program_aarch64_v2_FE_mix;
emit32(ARMV8A::B | (offset / 4), code, codePos);
offset = (uint8_t*)randomx_program_aarch64_aes_lut_pointers - (uint8_t*)randomx_program_aarch64;
*(uint64_t*)(code + offset + 0) = (uint64_t) &lutEnc[0][0];
*(uint64_t*)(code + offset + 8) = (uint64_t) &lutDec[0][0];
}
}
else {
// Restore the jump to RandomX v1 FE mix code
const uint32_t offset = (uint8_t*)randomx_program_aarch64_v1_FE_mix - (uint8_t*)randomx_program_aarch64_v2_FE_mix;
emit32(ARMV8A::B | (offset / 4), code, codePos);
}
// Apply v2 prefetch tweak
if (RandomX_CurrentConfig.Tweak_V2_PREFETCH) {
uint32_t dst = (((uint8_t*)randomx_program_aarch64_vm_instructions_end) - ((uint8_t*)randomx_program_aarch64));
uint32_t src = (((uint8_t*)randomx_program_aarch64_vm_instructions_end_v2) - ((uint8_t*)randomx_program_aarch64));
memcpy(code + dst, code + src, 16);
}
else {
uint32_t dst = (((uint8_t*)randomx_program_aarch64_vm_instructions_end) - ((uint8_t*)randomx_program_aarch64));
uint32_t src = (((uint8_t*)randomx_program_aarch64_vm_instructions_end_v1) - ((uint8_t*)randomx_program_aarch64));
memcpy(code + dst, code + src, 16);
}
# ifndef XMRIG_OS_APPLE
xmrig::VirtualMemory::flushInstructionCache(reinterpret_cast<char*>(code + MainLoopBegin), codePos - MainLoopBegin);
# endif
@@ -210,19 +250,56 @@ void JitCompilerA64::generateProgramLight(Program& program, ProgramConfiguration
// eor w20, config.readReg2, config.readReg3
emit32(ARMV8A::EOR32 | 20 | (IntRegMap[config.readReg2] << 5) | (IntRegMap[config.readReg3] << 16), code, codePos);
// Apply v2 prefetch tweak
if (RandomX_CurrentConfig.Tweak_V2_PREFETCH) {
uint32_t dst = (((uint8_t*)randomx_program_aarch64_vm_instructions_end_light_tweak) - ((uint8_t*)randomx_program_aarch64));
uint32_t src = (((uint8_t*)randomx_program_aarch64_vm_instructions_end_light_v2) - ((uint8_t*)randomx_program_aarch64));
memcpy(code + dst, code + src, 8);
}
else {
uint32_t dst = (((uint8_t*)randomx_program_aarch64_vm_instructions_end_light_tweak) - ((uint8_t*)randomx_program_aarch64));
uint32_t src = (((uint8_t*)randomx_program_aarch64_vm_instructions_end_light_v1) - ((uint8_t*)randomx_program_aarch64));
memcpy(code + dst, code + src, 8);
}
// Jump back to the main loop
const uint32_t offset = (((uint8_t*)randomx_program_aarch64_vm_instructions_end_light) - ((uint8_t*)randomx_program_aarch64)) - codePos;
emit32(ARMV8A::B | (offset / 4), code, codePos);
// and w2, w9, CacheLineAlignMask
// and w2, w2, CacheLineAlignMask
codePos = (((uint8_t*)randomx_program_aarch64_light_cacheline_align_mask) - ((uint8_t*)randomx_program_aarch64));
emit32(0x121A0000 | 2 | (9 << 5) | ((RandomX_CurrentConfig.Log2_DatasetBaseSize - 7) << 10), code, codePos);
emit32(0x121A0000 | 2 | (2 << 5) | ((RandomX_CurrentConfig.Log2_DatasetBaseSize - 7) << 10), code, codePos);
// Update spMix1
// eor x10, config.readReg0, config.readReg1
codePos = ((uint8_t*)randomx_program_aarch64_update_spMix1) - ((uint8_t*)randomx_program_aarch64);
emit32(ARMV8A::EOR | 10 | (IntRegMap[config.readReg0] << 5) | (IntRegMap[config.readReg1] << 16), code, codePos);
codePos = ((uint8_t*)randomx_program_aarch64_v2_FE_mix) - ((uint8_t*)randomx_program_aarch64);
// Enable RandomX v2 AES tweak
if (RandomX_CurrentConfig.Tweak_V2_AES) {
if (vm_flags & RANDOMX_FLAG_HARD_AES) {
// Disable the jump to the RandomX v1 FE mix code by writing a "movi v28.4s, 0" instruction
emit32(0x4F00041C, code, codePos);
}
else {
// Jump to the RandomX v2 FE mix soft AES code by writing a "b randomx_program_aarch64_v2_FE_mix_soft_aes" instruction
uint32_t offset = (uint8_t*)randomx_program_aarch64_v2_FE_mix_soft_aes - (uint8_t*)randomx_program_aarch64_v2_FE_mix;
emit32(ARMV8A::B | (offset / 4), code, codePos);
offset = (uint8_t*)randomx_program_aarch64_aes_lut_pointers - (uint8_t*)randomx_program_aarch64;
*(uint64_t*)(code + offset + 0) = (uint64_t) &lutEnc[0][0];
*(uint64_t*)(code + offset + 8) = (uint64_t) &lutDec[0][0];
}
}
else {
// Restore the jump to RandomX v1 FE mix code
const uint32_t offset = (uint8_t*)randomx_program_aarch64_v1_FE_mix - (uint8_t*)randomx_program_aarch64_v2_FE_mix;
emit32(ARMV8A::B | (offset / 4), code, codePos);
}
// Apply dataset offset
codePos = ((uint8_t*)randomx_program_aarch64_light_dataset_offset) - ((uint8_t*)randomx_program_aarch64);
@@ -1035,20 +1112,20 @@ void JitCompilerA64::h_CFROUND(Instruction& instr, uint32_t& codePos)
constexpr uint32_t tmp_reg = 20;
constexpr uint32_t fpcr_tmp_reg = 8;
if (instr.getImm32() & 63)
{
// ror tmp_reg, src, imm
emit32(ARMV8A::ROR_IMM | tmp_reg | (src << 5) | ((instr.getImm32() & 63) << 10) | (src << 16), code, k);
// bfi fpcr_tmp_reg, tmp_reg, 40, 2
emit32(0xB3580400 | fpcr_tmp_reg | (tmp_reg << 5), code, k);
}
else // no rotation
{
// bfi fpcr_tmp_reg, src, 40, 2
emit32(0xB3580400 | fpcr_tmp_reg | (src << 5), code, k);
if (RandomX_CurrentConfig.Tweak_V2_CFROUND) {
// tst tmp_reg, 60
emit32(0xF27E0E9F, code, k);
// bne next
emit32(0x54000081, code, k);
}
// bfi fpcr_tmp_reg, tmp_reg, 40, 2
emit32(0xB3580400 | fpcr_tmp_reg | (tmp_reg << 5), code, k);
// rbit tmp_reg, fpcr_tmp_reg
emit32(0xDAC00000 | tmp_reg | (fpcr_tmp_reg << 5), code, k);
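The `bfi ..., 40, 2` followed by `rbit` above works because RBIT moves bit i to bit 63-i: a 2-bit field inserted at bits [41:40] lands in bits [23:22] after the reversal, which is where FPCR.RMode lives. A small software model of that bit arithmetic (generic C, not the emitted code):

```c
#include <stdint.h>

/* Software model of the AArch64 RBIT instruction: bit i moves to bit 63-i.
 * Inserting a 2-bit rounding-mode field at bits [41:40] and bit-reversing
 * therefore drops it into bits [23:22], i.e. FPCR.RMode. */
static uint64_t rbit64(uint64_t x)
{
    uint64_t r = 0;
    for (int i = 0; i < 64; ++i)
        r |= ((x >> i) & 1u) << (63 - i);
    return r;
}
```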


@@ -83,6 +83,7 @@ namespace randomx {
uint32_t literalPos;
uint32_t num32bitLiterals = 0;
size_t allocatedSize = 0;
uint32_t vm_flags = 0;
void allocate(size_t size);


@@ -31,7 +31,7 @@
#define DECL(x) x
#endif
.arch armv8-a
.arch armv8-a+crypto
.text
.global DECL(randomx_program_aarch64)
.global DECL(randomx_program_aarch64_main_loop)
@@ -41,9 +41,18 @@
.global DECL(randomx_program_aarch64_cacheline_align_mask1)
.global DECL(randomx_program_aarch64_cacheline_align_mask2)
.global DECL(randomx_program_aarch64_update_spMix1)
.global DECL(randomx_program_aarch64_v2_FE_mix)
.global DECL(randomx_program_aarch64_v1_FE_mix)
.global DECL(randomx_program_aarch64_v2_FE_mix_soft_aes)
.global DECL(randomx_program_aarch64_aes_lut_pointers)
.global DECL(randomx_program_aarch64_vm_instructions_end_light)
.global DECL(randomx_program_aarch64_vm_instructions_end_light_tweak)
.global DECL(randomx_program_aarch64_light_cacheline_align_mask)
.global DECL(randomx_program_aarch64_light_dataset_offset)
.global DECL(randomx_program_aarch64_vm_instructions_end_v1)
.global DECL(randomx_program_aarch64_vm_instructions_end_v2)
.global DECL(randomx_program_aarch64_vm_instructions_end_light_v1)
.global DECL(randomx_program_aarch64_vm_instructions_end_light_v2)
.global DECL(randomx_init_dataset_aarch64)
.global DECL(randomx_init_dataset_aarch64_end)
.global DECL(randomx_calc_dataset_item_aarch64)
@@ -242,8 +251,8 @@ DECL(randomx_program_aarch64_main_loop):
# Execute VM instructions
DECL(randomx_program_aarch64_vm_instructions):
# 16 KB buffer for generated instructions
.fill 4096,4,0
# 24 KB buffer for generated instructions
.fill 6144,4,0
literal_x0: .fill 1,8,0
literal_x11: .fill 1,8,0
@@ -285,17 +294,19 @@ DECL(randomx_program_aarch64_vm_instructions_end):
eor x9, x9, x20
# Calculate dataset pointer for dataset prefetch
mov w20, w9
# mx <-> ma
ror x9, x9, 32
DECL(randomx_program_aarch64_cacheline_align_mask1):
# Actual mask will be inserted by JIT compiler
and x20, x9, 1
and x20, x20, 1
add x20, x20, x1
# Prefetch dataset data
prfm pldl2strm, [x20]
# mx <-> ma
ror x9, x9, 32
DECL(randomx_program_aarch64_cacheline_align_mask2):
# Actual mask will be inserted by JIT compiler
and x10, x10, 1
@@ -326,12 +337,93 @@ DECL(randomx_program_aarch64_update_spMix1):
stp x12, x13, [x17, 32]
stp x14, x15, [x17, 48]
# xor group F and group E registers
# RandomX v2 AES tweak (mix group F and group E registers using AES)
DECL(randomx_program_aarch64_v2_FE_mix):
# Jump to the v1 FE mix code if we're running RandomX v1
# The JIT compiler will overwrite this with "movi v28.4s, 0" (set v28 to all zeros) for RandomX v2 with hardware AES,
# or with "b randomx_program_aarch64_v2_FE_mix_soft_aes" for RandomX v2 with soft AES
b DECL(randomx_program_aarch64_v1_FE_mix)
# f0 = aesenc(f0, e0), f1 = aesdec(f1, e0), f2 = aesenc(f2, e0), f3 = aesdec(f3, e0)
aese v16.16b, v28.16b
aesd v17.16b, v28.16b
aese v18.16b, v28.16b
aesd v19.16b, v28.16b
aesmc v16.16b, v16.16b
aesimc v17.16b, v17.16b
aesmc v18.16b, v18.16b
aesimc v19.16b, v19.16b
eor v16.16b, v16.16b, v20.16b
eor v17.16b, v17.16b, v20.16b
eor v18.16b, v18.16b, v20.16b
eor v19.16b, v19.16b, v20.16b
# f0 = aesenc(f0, e1), f1 = aesdec(f1, e1), f2 = aesenc(f2, e1), f3 = aesdec(f3, e1)
aese v16.16b, v28.16b
aesd v17.16b, v28.16b
aese v18.16b, v28.16b
aesd v19.16b, v28.16b
aesmc v16.16b, v16.16b
aesimc v17.16b, v17.16b
aesmc v18.16b, v18.16b
aesimc v19.16b, v19.16b
eor v16.16b, v16.16b, v21.16b
eor v17.16b, v17.16b, v21.16b
eor v18.16b, v18.16b, v21.16b
eor v19.16b, v19.16b, v21.16b
# f0 = aesenc(f0, e2), f1 = aesdec(f1, e2), f2 = aesenc(f2, e2), f3 = aesdec(f3, e2)
aese v16.16b, v28.16b
aesd v17.16b, v28.16b
aese v18.16b, v28.16b
aesd v19.16b, v28.16b
aesmc v16.16b, v16.16b
aesimc v17.16b, v17.16b
aesmc v18.16b, v18.16b
aesimc v19.16b, v19.16b
eor v16.16b, v16.16b, v22.16b
eor v17.16b, v17.16b, v22.16b
eor v18.16b, v18.16b, v22.16b
eor v19.16b, v19.16b, v22.16b
# f0 = aesenc(f0, e3), f1 = aesdec(f1, e3), f2 = aesenc(f2, e3), f3 = aesdec(f3, e3)
aese v16.16b, v28.16b
aesd v17.16b, v28.16b
aese v18.16b, v28.16b
aesd v19.16b, v28.16b
aesmc v16.16b, v16.16b
aesimc v17.16b, v17.16b
aesmc v18.16b, v18.16b
aesimc v19.16b, v19.16b
eor v16.16b, v16.16b, v23.16b
eor v17.16b, v17.16b, v23.16b
eor v18.16b, v18.16b, v23.16b
eor v19.16b, v19.16b, v23.16b
# Skip v1 FE mix code because we already did v2 FE mix
b randomx_program_aarch64_FE_store
DECL(randomx_program_aarch64_v1_FE_mix):
eor v16.16b, v16.16b, v20.16b
eor v17.16b, v17.16b, v21.16b
eor v18.16b, v18.16b, v22.16b
eor v19.16b, v19.16b, v23.16b
randomx_program_aarch64_FE_store:
# Store FP registers to scratchpad (spAddr0)
stp q16, q17, [x16, 0]
stp q18, q19, [x16, 32]
@@ -376,6 +468,13 @@ DECL(randomx_program_aarch64_vm_instructions_end_light):
stp x0, x1, [sp, 64]
stp x2, x30, [sp, 80]
lsr x2, x9, 32
DECL(randomx_program_aarch64_light_cacheline_align_mask):
# Actual mask will be inserted by JIT compiler
and w2, w2, 1
DECL(randomx_program_aarch64_vm_instructions_end_light_tweak):
# mx ^= r[readReg2] ^ r[readReg3];
eor x9, x9, x20
@@ -388,10 +487,6 @@ DECL(randomx_program_aarch64_vm_instructions_end_light):
# x1 -> pointer to output
mov x1, sp
DECL(randomx_program_aarch64_light_cacheline_align_mask):
# Actual mask will be inserted by JIT compiler
and w2, w9, 1
# x2 -> item number
lsr x2, x2, 6
@@ -409,6 +504,237 @@ DECL(randomx_program_aarch64_light_dataset_offset):
b DECL(randomx_program_aarch64_xor_with_dataset_line)
DECL(randomx_program_aarch64_vm_instructions_end_v1):
lsr x10, x9, 32
eor x9, x9, x20
mov w20, w9
ror x9, x9, 32
DECL(randomx_program_aarch64_vm_instructions_end_v2):
lsr x10, x9, 32
ror x9, x9, 32
eor x9, x9, x20
mov w20, w9
DECL(randomx_program_aarch64_vm_instructions_end_light_v1):
eor x9, x9, x20
ror x9, x9, 32
DECL(randomx_program_aarch64_vm_instructions_end_light_v2):
ror x9, x9, 32
eor x9, x9, x20
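In the v1/v2 epilogue variants above, `ror x9, x9, 32` exchanges the two 32-bit halves of x9, which is how the "mx" and "ma" counters packed into one register are swapped; the v2 ordering only changes where the swap happens relative to the XOR. A minimal model of the rotate:

```c
#include <stdint.h>

/* 64-bit rotate right; r must be in 1..63 (r == 0 would shift by 64,
 * which is undefined in C). A rotate by 32 simply swaps the two 32-bit
 * halves, exchanging the "mx" and "ma" values held in x9. */
static uint64_t ror64(uint64_t x, unsigned r)
{
    return (x >> r) | (x << (64 - r));
}
```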
DECL(randomx_program_aarch64_v2_FE_mix_soft_aes):
sub sp, sp, 176
stp x0, x1, [sp]
stp x2, x3, [sp, 16]
stp x4, x5, [sp, 32]
stp x6, x7, [sp, 48]
stp x8, x9, [sp, 64]
stp x10, x11, [sp, 80]
stp x12, x13, [sp, 96]
stp x14, x15, [sp, 112]
stp x16, x30, [sp, 128]
stp q0, q1, [sp, 144]
adr x19, DECL(randomx_program_aarch64_aes_lut_pointers)
ldp x19, x20, [x19]
# f0 = aesenc(f0, e0), f0 = aesenc(f0, e1), f0 = aesenc(f0, e2), f0 = aesenc(f0, e3)
mov v0.16b, v16.16b
mov v1.16b, v20.16b
bl randomx_soft_aesenc
mov v1.16b, v21.16b
bl randomx_soft_aesenc
mov v1.16b, v22.16b
bl randomx_soft_aesenc
mov v1.16b, v23.16b
bl randomx_soft_aesenc
mov v16.16b, v0.16b
# f1 = aesdec(f1, e0), f1 = aesdec(f1, e1), f1 = aesdec(f1, e2), f1 = aesdec(f1, e3)
mov v0.16b, v17.16b
mov v1.16b, v20.16b
bl randomx_soft_aesdec
mov v1.16b, v21.16b
bl randomx_soft_aesdec
mov v1.16b, v22.16b
bl randomx_soft_aesdec
mov v1.16b, v23.16b
bl randomx_soft_aesdec
mov v17.16b, v0.16b
# f2 = aesenc(f2, e0), f2 = aesenc(f2, e1), f2 = aesenc(f2, e2), f2 = aesenc(f2, e3)
mov v0.16b, v18.16b
mov v1.16b, v20.16b
bl randomx_soft_aesenc
mov v1.16b, v21.16b
bl randomx_soft_aesenc
mov v1.16b, v22.16b
bl randomx_soft_aesenc
mov v1.16b, v23.16b
bl randomx_soft_aesenc
mov v18.16b, v0.16b
# f3 = aesdec(f3, e0), f3 = aesdec(f3, e1), f3 = aesdec(f3, e2), f3 = aesdec(f3, e3)
mov v0.16b, v19.16b
mov v1.16b, v20.16b
bl randomx_soft_aesdec
mov v1.16b, v21.16b
bl randomx_soft_aesdec
mov v1.16b, v22.16b
bl randomx_soft_aesdec
mov v1.16b, v23.16b
bl randomx_soft_aesdec
mov v19.16b, v0.16b
ldp x0, x1, [sp]
ldp x2, x3, [sp, 16]
ldp x4, x5, [sp, 32]
ldp x6, x7, [sp, 48]
ldp x8, x9, [sp, 64]
ldp x10, x11, [sp, 80]
ldp x12, x13, [sp, 96]
ldp x14, x15, [sp, 112]
ldp x16, x30, [sp, 128]
ldp q0, q1, [sp, 144]
add sp, sp, 176
b randomx_program_aarch64_FE_store
randomx_soft_aesenc:
umov w4, v0.b[5]
umov w1, v0.b[10]
umov w12, v0.b[15]
umov w9, v0.b[9]
umov w2, v0.b[14]
umov w11, v0.b[3]
umov w5, v0.b[0]
umov w16, v0.b[4]
add x4, x4, 256
add x1, x1, 512
add x12, x12, 768
umov w3, v0.b[13]
umov w8, v0.b[2]
umov w7, v0.b[7]
add x9, x9, 256
add x2, x2, 512
add x11, x11, 768
ldr w10, [x19, x4, lsl 2]
ldr w15, [x19, x5, lsl 2]
umov w13, v0.b[8]
ldr w14, [x19, x12, lsl 2]
umov w6, v0.b[1]
ldr w1, [x19, x1, lsl 2]
eor w10, w10, w15
ldr w2, [x19, x2, lsl 2]
umov w5, v0.b[6]
ldr w9, [x19, x9, lsl 2]
umov w4, v0.b[11]
ldr w12, [x19, x16, lsl 2]
eor w1, w1, w14
ldr w11, [x19, x11, lsl 2]
eor w1, w1, w10
add x8, x8, 512
add x3, x3, 256
add x7, x7, 768
eor w9, w9, w12
fmov s28, w1
eor w1, w2, w11
umov w10, v0.b[12]
eor w1, w1, w9
ldr w3, [x19, x3, lsl 2]
add x6, x6, 256
ldr w9, [x19, x13, lsl 2]
ins v28.s[1], w1
ldr w2, [x19, x8, lsl 2]
add x5, x5, 512
ldr w7, [x19, x7, lsl 2]
add x4, x4, 768
eor w1, w3, w9
ldr w3, [x19, x6, lsl 2]
eor w2, w2, w7
ldr w6, [x19, x10, lsl 2]
eor w2, w2, w1
ldr w1, [x19, x5, lsl 2]
ldr w0, [x19, x4, lsl 2]
eor w3, w3, w6
ins v28.s[2], w2
eor w0, w1, w0
eor w0, w0, w3
ins v28.s[3], w0
eor v0.16b, v1.16b, v28.16b
ret
randomx_soft_aesdec:
umov w1, v0.b[10]
umov w3, v0.b[7]
umov w12, v0.b[13]
umov w2, v0.b[14]
umov w9, v0.b[11]
umov w11, v0.b[1]
umov w4, v0.b[0]
umov w16, v0.b[4]
add x3, x3, 768
add x1, x1, 512
add x12, x12, 256
umov w8, v0.b[5]
umov w6, v0.b[2]
umov w7, v0.b[15]
add x9, x9, 768
add x2, x2, 512
add x11, x11, 256
ldr w15, [x20, x3, lsl 2]
ldr w10, [x20, x4, lsl 2]
umov w13, v0.b[8]
ldr w14, [x20, x12, lsl 2]
umov w5, v0.b[9]
ldr w1, [x20, x1, lsl 2]
umov w3, v0.b[6]
ldr w12, [x20, x9, lsl 2]
umov w4, v0.b[3]
ldr w9, [x20, x16, lsl 2]
eor w1, w1, w15
ldr w2, [x20, x2, lsl 2]
eor w10, w10, w14
ldr w11, [x20, x11, lsl 2]
eor w1, w1, w10
add x8, x8, 256
add x6, x6, 512
add x7, x7, 768
eor w2, w2, w12
fmov s28, w1
eor w1, w9, w11
eor w1, w2, w1
umov w9, v0.b[12]
ldr w2, [x20, x13, lsl 2]
add x5, x5, 256
ldr w8, [x20, x8, lsl 2]
ins v28.s[1], w1
ldr w6, [x20, x6, lsl 2]
add x3, x3, 512
ldr w7, [x20, x7, lsl 2]
add x4, x4, 768
eor w2, w2, w8
ldr w1, [x20, x9, lsl 2]
eor w6, w6, w7
ldr w3, [x20, x3, lsl 2]
eor w2, w2, w6
ldr w4, [x20, x4, lsl 2]
ldr w5, [x20, x5, lsl 2]
ins v28.s[2], w2
eor w0, w1, w5
eor w1, w3, w4
eor w0, w0, w1
ins v28.s[3], w0
eor v0.16b, v1.16b, v28.16b
ret
DECL(randomx_program_aarch64_aes_lut_pointers):
.fill 2, 8, 0
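The umov/ldr/eor pattern in randomx_soft_aesenc above gathers one state byte per ShiftRows position and XORs four table entries per output column; the four 256-entry tables sit back to back, hence the +256/+512/+768 index offsets added before each load. A simplified model of one column gather (table contents below are placeholders for the test, not the real AES T-tables):

```c
#include <stdint.h>

/* Simplified model of the soft-AES column gather: four 256-entry tables
 * laid out contiguously, one lookup per ShiftRows byte, XORed together.
 * For aesenc, column 0 reads state bytes 0, 5, 10, 15. */
static uint32_t soft_column(const uint8_t s[16], const uint32_t T[1024],
                            int i0, int i1, int i2, int i3)
{
    return T[s[i0]] ^ T[256 + s[i1]] ^ T[512 + s[i2]] ^ T[768 + s[i3]];
}
```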
# Input parameters


@@ -38,9 +38,18 @@ extern "C" {
void randomx_program_aarch64_cacheline_align_mask1();
void randomx_program_aarch64_cacheline_align_mask2();
void randomx_program_aarch64_update_spMix1();
void randomx_program_aarch64_v2_FE_mix();
void randomx_program_aarch64_v1_FE_mix();
void randomx_program_aarch64_v2_FE_mix_soft_aes();
void randomx_program_aarch64_aes_lut_pointers();
void randomx_program_aarch64_vm_instructions_end_light();
void randomx_program_aarch64_vm_instructions_end_light_tweak();
void randomx_program_aarch64_light_cacheline_align_mask();
void randomx_program_aarch64_light_dataset_offset();
void randomx_program_aarch64_vm_instructions_end_v1();
void randomx_program_aarch64_vm_instructions_end_v2();
void randomx_program_aarch64_vm_instructions_end_light_v1();
void randomx_program_aarch64_vm_instructions_end_light_v2();
void randomx_init_dataset_aarch64();
void randomx_init_dataset_aarch64_end();
void randomx_calc_dataset_item_aarch64();


@@ -30,6 +30,7 @@ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#include <cstring>
#include <climits>
#include <cassert>
#include "backend/cpu/Cpu.h"
#include "crypto/randomx/jit_compiler_rv64.hpp"
#include "crypto/randomx/jit_compiler_rv64_static.hpp"
#include "crypto/randomx/jit_compiler_rv64_vector.h"
@@ -38,6 +39,7 @@ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#include "crypto/randomx/program.hpp"
#include "crypto/randomx/reciprocal.h"
#include "crypto/randomx/virtual_memory.hpp"
#include "crypto/randomx/soft_aes.h"
#include "crypto/common/VirtualMemory.h"
@@ -252,10 +254,12 @@ namespace randomx {
static const uint8_t* codePrologue = (uint8_t*)&randomx_riscv64_prologue;
static const uint8_t* codeLoopBegin = (uint8_t*)&randomx_riscv64_loop_begin;
static const uint8_t* codeDataRead = (uint8_t*)&randomx_riscv64_data_read;
static const uint8_t* codeDataRead2 = (uint8_t*)&randomx_riscv64_data_read_v2_tweak;
static const uint8_t* codeDataReadLight = (uint8_t*)&randomx_riscv64_data_read_light;
static const uint8_t* codeDataReadLight1 = (uint8_t*)&randomx_riscv64_data_read_light_v1;
static const uint8_t* codeDataReadLight2 = (uint8_t*)&randomx_riscv64_data_read_light_v2;
static const uint8_t* codeFixLoopCall = (uint8_t*)&randomx_riscv64_fix_loop_call;
static const uint8_t* codeSpadStore = (uint8_t*)&randomx_riscv64_spad_store;
static const uint8_t* codeSpadStoreHardAes = (uint8_t*)&randomx_riscv64_spad_store_hardaes;
static const uint8_t* codeSpadStoreSoftAes = (uint8_t*)&randomx_riscv64_spad_store_softaes;
static const uint8_t* codeLoopEnd = (uint8_t*)&randomx_riscv64_loop_end;
static const uint8_t* codeFixContinueLoop = (uint8_t*)&randomx_riscv64_fix_continue_loop;
@@ -271,9 +275,13 @@ namespace randomx {
static const int32_t sizeDataInit = codePrologue - codeDataInit;
static const int32_t sizePrologue = codeLoopBegin - codePrologue;
static const int32_t sizeLoopBegin = codeDataRead - codeLoopBegin;
static const int32_t sizeDataRead = codeDataReadLight - codeDataRead;
static const int32_t sizeDataReadLight = codeSpadStore - codeDataReadLight;
static const int32_t sizeSpadStore = codeSpadStoreHardAes - codeSpadStore;
static const int32_t sizeDataRead = codeDataRead2 - codeDataRead;
static const int32_t sizeDataRead2 = codeDataReadLight - codeDataRead2;
static const int32_t sizeDataReadLight = codeDataReadLight1 - codeDataReadLight;
static const int32_t sizeDataReadLight1 = codeDataReadLight2 - codeDataReadLight1;
static const int32_t sizeDataReadLight2 = codeFixLoopCall - codeDataReadLight2;
static const int32_t sizeFixLoopCall = codeSpadStore - codeFixLoopCall;
static const int32_t sizeSpadStore = codeSpadStoreSoftAes - codeSpadStore;
static const int32_t sizeSpadStoreSoftAes = codeLoopEnd - codeSpadStoreSoftAes;
static const int32_t sizeLoopEnd = codeEpilogue - codeLoopEnd;
static const int32_t sizeEpilogue = codeSoftAes - codeEpilogue;
@@ -283,7 +291,6 @@ namespace randomx {
static const int32_t sizeSshPrefetch = codeSshEnd - codeSshPrefetch;
static const int32_t offsetFixDataCall = codeFixDataCall - codeDataInit;
static const int32_t offsetFixLoopCall = codeFixLoopCall - codeDataReadLight;
static const int32_t offsetFixContinueLoop = codeFixContinueLoop - codeLoopEnd;
static const int32_t LoopTopPos = LiteralPoolSize + sizeDataInit + sizePrologue;
@@ -478,8 +485,15 @@ namespace randomx {
static void emitProgramPrefix(CompilerState& state, Program& prog, ProgramConfiguration& pcfg) {
state.codePos = RandomXCodePos;
state.rcpCount = 0;
state.emitAt(LiteralPoolOffset + sizeLiterals, pcfg.eMask[0]);
state.emitAt(LiteralPoolOffset + sizeLiterals + 8, pcfg.eMask[1]);
if (RandomX_CurrentConfig.Tweak_V2_AES) {
state.emitAt(LiteralPoolOffset + sizeLiterals + 16, (uint64_t) &lutEnc[2][0]);
state.emitAt(LiteralPoolOffset + sizeLiterals + 24, (uint64_t) &lutDec[2][0]);
}
for (unsigned i = 0; i < RegistersCount; ++i) {
state.registerUsage[i] = -1;
}
@@ -492,7 +506,13 @@ namespace randomx {
}
static void emitProgramSuffix(CompilerState& state, ProgramConfiguration& pcfg) {
state.emit(codeSpadStore, sizeSpadStore);
if (RandomX_CurrentConfig.Tweak_V2_AES) {
state.emit(codeSpadStoreSoftAes, sizeSpadStoreSoftAes);
}
else {
state.emit(codeSpadStore, sizeSpadStore);
}
int32_t fixPos = state.codePos;
state.emit(codeLoopEnd, sizeLoopEnd);
//xor x26, x{readReg0}, x{readReg1}
@@ -501,6 +521,10 @@ namespace randomx {
//j LoopTop
emitJump(state, 0, fixPos, LoopTopPos);
state.emit(codeEpilogue, sizeEpilogue);
if (RandomX_CurrentConfig.Tweak_V2_AES) {
state.emit(codeSoftAes, sizeSoftAes);
}
}
static void generateSuperscalarCode(CodeBuffer& buf, Instruction isn, bool lastLiteral) {
@@ -621,13 +645,22 @@ namespace randomx {
//jal x1, SuperscalarHash
emitJump(state, ReturnReg, LiteralPoolSize + offsetFixDataCall, SuperScalarHashOffset);
vectorCodeSize = ((uint8_t*)randomx_riscv64_vector_sshash_end) - ((uint8_t*)randomx_riscv64_vector_sshash_begin);
vectorCode = static_cast<uint8_t*>(allocExecutableMemory(vectorCodeSize, hugePagesJIT && hugePagesEnable));
if (xmrig::Cpu::info()->hasRISCV_Vector()) {
vectorCodeSize = ((uint8_t*)randomx_riscv64_vector_code_end) - ((uint8_t*)randomx_riscv64_vector_code_begin);
vectorCode = static_cast<uint8_t*>(allocExecutableMemory(vectorCodeSize, hugePagesJIT && hugePagesEnable));
if (vectorCode) {
memcpy(vectorCode, reinterpret_cast<uint8_t*>(randomx_riscv64_vector_code_begin), vectorCodeSize);
entryProgramVector = vectorCode + (((uint8_t*)randomx_riscv64_vector_program_begin) - ((uint8_t*)randomx_riscv64_vector_code_begin));
}
}
}
JitCompilerRV64::~JitCompilerRV64() {
freePagedMemory(state.code, CodeSize);
freePagedMemory(vectorCode, vectorCodeSize);
if (vectorCode) {
freePagedMemory(vectorCode, vectorCodeSize);
}
}
void JitCompilerRV64::enableWriting() const
@@ -649,16 +682,29 @@ namespace randomx {
}
void JitCompilerRV64::generateProgram(Program& prog, ProgramConfiguration& pcfg, uint32_t) {
if (vectorCode) {
generateProgramVectorRV64(vectorCode, prog, pcfg, inst_map, nullptr, 0);
return;
}
emitProgramPrefix(state, prog, pcfg);
int32_t fixPos = state.codePos;
state.emit(codeDataRead, sizeDataRead);
//xor x8, x{readReg2}, x{readReg3}
state.emitAt(fixPos, rvi(rv64::XOR, Tmp1Reg, regR(pcfg.readReg2), regR(pcfg.readReg3)));
int32_t fixPos2 = state.codePos;
state.emit(codeDataRead2, sizeDataRead2);
state.emitAt(fixPos2, (uint16_t)(RandomX_CurrentConfig.Tweak_V2_PREFETCH ? 0x1402 : 0x0001));
emitProgramSuffix(state, pcfg);
clearCache(state);
}
void JitCompilerRV64::generateProgramLight(Program& prog, ProgramConfiguration& pcfg, uint32_t datasetOffset) {
if (vectorCode) {
generateProgramVectorRV64(vectorCode, prog, pcfg, inst_map, entryDataInit, datasetOffset);
return;
}
emitProgramPrefix(state, prog, pcfg);
int32_t fixPos = state.codePos;
state.emit(codeDataReadLight, sizeDataReadLight);
@@ -671,7 +717,14 @@ namespace randomx {
state.emitAt(fixPos + 4, rv64::LUI | (uimm << 12) | rvrd(Tmp2Reg));
//addi x9, x9, {limm}
state.emitAt(fixPos + 8, rvi(rv64::ADDI, Tmp2Reg, Tmp2Reg, limm));
fixPos += offsetFixLoopCall;
if (RandomX_CurrentConfig.Tweak_V2_PREFETCH) {
state.emit(codeDataReadLight2, sizeDataReadLight2);
}
else {
state.emit(codeDataReadLight1, sizeDataReadLight1);
}
fixPos = state.codePos;
state.emit(codeFixLoopCall, sizeFixLoopCall);
//jal x1, SuperscalarHash
emitJump(state, ReturnReg, fixPos, SuperScalarHashOffset);
emitProgramSuffix(state, pcfg);
@@ -680,9 +733,9 @@ namespace randomx {
template<size_t N>
void JitCompilerRV64::generateSuperscalarHash(SuperscalarProgram(&programs)[N]) {
if (optimizedDatasetInit > 0) {
entryDataInitOptimized = generateDatasetInitVectorRV64(vectorCode, vectorCodeSize, programs, RandomX_ConfigurationBase::CacheAccesses);
return;
if (vectorCode) {
entryDataInitVector = generateDatasetInitVectorRV64(vectorCode, programs, RandomX_ConfigurationBase::CacheAccesses);
// No return here because we also need the scalar dataset init function for the light mode
}
state.codePos = SuperScalarHashOffset;
@@ -722,10 +775,6 @@ namespace randomx {
template void JitCompilerRV64::generateSuperscalarHash(SuperscalarProgram(&)[RANDOMX_CACHE_MAX_ACCESSES]);
DatasetInitFunc* JitCompilerRV64::getDatasetInitFunc() {
return (DatasetInitFunc*)((optimizedDatasetInit > 0) ? entryDataInitOptimized : entryDataInit);
}
void JitCompilerRV64::v1_IADD_RS(HANDLER_ARGS) {
state.registerUsage[isn.dst] = i;
int shift = isn.getModShift();
@@ -1159,10 +1208,22 @@ namespace randomx {
//c.or x8, x9
state.emit(rvc(rv64::C_OR, Tmp1Reg + OffsetXC, Tmp2Reg + OffsetXC));
#endif
if (RandomX_CurrentConfig.Tweak_V2_CFROUND) {
//andi x9, x8, 240
state.emit(rvi(rv64::ANDI, Tmp2Reg, Tmp1Reg, 240));
//c.bnez x9, +12
state.emit(uint16_t(0xE491));
}
//c.andi x8, 12
state.emit(rvc(rv64::C_ANDI, Tmp1Reg + OffsetXC, 12));
}
else {
if (RandomX_CurrentConfig.Tweak_V2_CFROUND) {
//andi x9, x{src}, 240
state.emit(rvi(rv64::ANDI, Tmp2Reg, regR(isn.src), 240));
//c.bnez x9, +14
state.emit(uint16_t(0xE499));
}
//and x8, x{src}, 12
state.emit(rvi(rv64::ANDI, Tmp1Reg, regR(isn.src), 12));
}
@@ -1183,5 +1244,6 @@ namespace randomx {
void JitCompilerRV64::v1_NOP(HANDLER_ARGS) {
}
InstructionGeneratorRV64 JitCompilerRV64::engine[256] = {};
alignas(64) InstructionGeneratorRV64 JitCompilerRV64::engine[256] = {};
alignas(64) uint8_t JitCompilerRV64::inst_map[256] = {};
}


@@ -90,9 +90,11 @@ namespace randomx {
void generateDatasetInitCode() {}
ProgramFunc* getProgramFunc() {
return (ProgramFunc*)entryProgram;
return (ProgramFunc*)(vectorCode ? entryProgramVector : entryProgram);
}
DatasetInitFunc* getDatasetInitFunc() {
return (DatasetInitFunc*)(vectorCode ? entryDataInitVector : entryDataInit);
}
DatasetInitFunc* getDatasetInitFunc();
uint8_t* getCode() {
return state.code;
}
@@ -102,15 +104,17 @@ namespace randomx {
void enableExecution() const;
static InstructionGeneratorRV64 engine[256];
static uint8_t inst_map[256];
private:
CompilerState state;
uint8_t* vectorCode;
size_t vectorCodeSize;
uint8_t* vectorCode = nullptr;
size_t vectorCodeSize = 0;
void* entryDataInit;
void* entryDataInitOptimized;
void* entryProgram;
void* entryDataInit = nullptr;
void* entryDataInitVector = nullptr;
void* entryProgram = nullptr;
void* entryProgramVector = nullptr;
public:
static void v1_IADD_RS(HANDLER_ARGS);


@@ -40,10 +40,12 @@ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.global DECL(randomx_riscv64_prologue)
.global DECL(randomx_riscv64_loop_begin)
.global DECL(randomx_riscv64_data_read)
.global DECL(randomx_riscv64_data_read_v2_tweak)
.global DECL(randomx_riscv64_data_read_light)
.global DECL(randomx_riscv64_data_read_light_v1)
.global DECL(randomx_riscv64_data_read_light_v2)
.global DECL(randomx_riscv64_fix_loop_call)
.global DECL(randomx_riscv64_spad_store)
.global DECL(randomx_riscv64_spad_store_hardaes)
.global DECL(randomx_riscv64_spad_store_softaes)
.global DECL(randomx_riscv64_loop_end)
.global DECL(randomx_riscv64_fix_continue_loop)
@@ -408,7 +410,9 @@ DECL(randomx_riscv64_data_read):
slli x8, x8, 32
srli x8, x8, 32
#endif
/* update "mx" */
DECL(randomx_riscv64_data_read_v2_tweak):
slli x8, x8, 32 /* JIT compiler will replace it with "nop" for RandomX v1 */
/* update "mp" */
xor x25, x25, x8
/* read dataset and update registers */
ld x8, 0(x7)
@@ -456,13 +460,22 @@ DECL(randomx_riscv64_data_read_light):
slli x25, x25, 32
or x25, x25, x31
#endif
DECL(randomx_riscv64_data_read_light_v1):
slli x8, x8, 32
/* update "mx" */
/* update "mp" */
xor x25, x25, x8
/* the next dataset item */
and x7, x25, x1
srli x7, x7, 6
add x7, x7, x9
DECL(randomx_riscv64_data_read_light_v2):
/* the next dataset item */
and x7, x25, x1
srli x7, x7, 6
add x7, x7, x9
and x8, x8, x1
/* update "mp" */
xor x25, x25, x8
DECL(randomx_riscv64_fix_loop_call):
jal superscalar_hash /* JIT compiler will adjust the offset */
xor x16, x16, x8
@@ -536,9 +549,6 @@ DECL(randomx_riscv64_spad_store):
sd x30, 56(x26)
fmv.d.x f7, x30
DECL(randomx_riscv64_spad_store_hardaes):
nop /* not implemented */
DECL(randomx_riscv64_spad_store_softaes):
/* store integer registers */
sd x16, 0(x27)


@@ -36,10 +36,12 @@ extern "C" {
void randomx_riscv64_prologue();
void randomx_riscv64_loop_begin();
void randomx_riscv64_data_read();
void randomx_riscv64_data_read_v2_tweak();
void randomx_riscv64_data_read_light();
void randomx_riscv64_data_read_light_v1();
void randomx_riscv64_data_read_light_v2();
void randomx_riscv64_fix_loop_call();
void randomx_riscv64_spad_store();
void randomx_riscv64_spad_store_hardaes();
void randomx_riscv64_spad_store_softaes();
void randomx_riscv64_loop_end();
void randomx_riscv64_fix_continue_loop();


@@ -33,19 +33,22 @@ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#include "crypto/randomx/jit_compiler_rv64_vector_static.h"
#include "crypto/randomx/reciprocal.h"
#include "crypto/randomx/superscalar.hpp"
#include "crypto/randomx/program.hpp"
#include "crypto/randomx/soft_aes.h"
#include "backend/cpu/Cpu.h"
namespace randomx {
#define ADDR(x) ((uint8_t*) &(x))
#define DIST(x, y) (ADDR(y) - ADDR(x))
void* generateDatasetInitVectorRV64(uint8_t* buf, size_t buf_size, SuperscalarProgram* programs, size_t num_programs)
#define JUMP(offset) (0x6F | (((offset) & 0x7FE) << 20) | (((offset) & 0x800) << 9) | ((offset) & 0xFF000))
void* generateDatasetInitVectorRV64(uint8_t* buf, SuperscalarProgram* programs, size_t num_programs)
{
memcpy(buf, reinterpret_cast<void*>(randomx_riscv64_vector_sshash_begin), buf_size);
uint8_t* p = buf + DIST(randomx_riscv64_vector_code_begin, randomx_riscv64_vector_sshash_generated_instructions);
uint8_t* p = buf + DIST(randomx_riscv64_vector_sshash_begin, randomx_riscv64_vector_sshash_generated_instructions);
uint8_t* literals = buf + DIST(randomx_riscv64_vector_sshash_begin, randomx_riscv64_vector_sshash_imul_rcp_literals);
uint8_t* literals = buf + DIST(randomx_riscv64_vector_code_begin, randomx_riscv64_vector_sshash_imul_rcp_literals);
uint8_t* cur_literal = literals;
for (size_t i = 0; i < num_programs; ++i) {
@@ -76,10 +79,16 @@ void* generateDatasetInitVectorRV64(uint8_t* buf, size_t buf_size, SuperscalarPr
break;
case SuperscalarInstructionType::IADD_RS:
// 57 39 00 96 vsll.vi v18, v0, 0
// 57 00 09 02 vadd.vv v0, v0, v18
EMIT(0x96003957 | (modShift << 15) | (src << 20));
EMIT(0x02090057 | (dst << 7) | (dst << 20));
if (modShift == 0) {
// 57 00 00 02 vadd.vv v0, v0, v0
EMIT(0x02000057 | (dst << 7) | (src << 15) | (dst << 20));
}
else {
// 57 39 00 96 vsll.vi v18, v0, 0
// 57 00 09 02 vadd.vv v0, v0, v18
EMIT(0x96003957 | (modShift << 15) | (src << 20));
EMIT(0x02090057 | (dst << 7) | (dst << 20));
}
break;
case SuperscalarInstructionType::IMUL_R:
@@ -89,6 +98,10 @@ void* generateDatasetInitVectorRV64(uint8_t* buf, size_t buf_size, SuperscalarPr
case SuperscalarInstructionType::IROR_C:
{
#ifdef __riscv_zvkb
// 57 30 00 52 vror.vi v0, v0, 0
EMIT(0x52003057 | (dst << 7) | (dst << 20) | ((imm32 & 31) << 15) | ((imm32 & 32) << 21));
#else // __riscv_zvkb
const uint32_t shift_right = imm32 & 63;
const uint32_t shift_left = 64 - shift_right;
@@ -116,6 +129,7 @@ void* generateDatasetInitVectorRV64(uint8_t* buf, size_t buf_size, SuperscalarPr
// 57 00 20 2B vor.vv v0, v18, v0
EMIT(0x2B200057 | (dst << 7) | (dst << 15));
#endif // __riscv_zvkb
}
break;
@@ -126,7 +140,7 @@ void* generateDatasetInitVectorRV64(uint8_t* buf, size_t buf_size, SuperscalarPr
// 9B 82 02 00 addiw x5, x5, 0
// 57 C0 02 02 vadd.vx v0, v0, x5
EMIT(0x000002B7 | ((imm32 + ((imm32 & 0x800) << 1)) & 0xFFFFF000));
EMIT(0x0002829B | ((imm32 & 0x00000FFF)) << 20);
EMIT(0x0002829B | ((imm32 & 0x00000FFF) << 20));
EMIT(0x0202C057 | (dst << 7) | (dst << 20));
break;
@@ -137,7 +151,7 @@ void* generateDatasetInitVectorRV64(uint8_t* buf, size_t buf_size, SuperscalarPr
// 9B 82 02 00 addiw x5, x5, 0
// 57 C0 02 2E vxor.vx v0, v0, x5
EMIT(0x000002B7 | ((imm32 + ((imm32 & 0x800) << 1)) & 0xFFFFF000));
EMIT(0x0002829B | ((imm32 & 0x00000FFF)) << 20);
EMIT(0x0002829B | ((imm32 & 0x00000FFF) << 20));
EMIT(0x2E02C057 | (dst << 7) | (dst << 20));
break;
@@ -175,33 +189,725 @@ void* generateDatasetInitVectorRV64(uint8_t* buf, size_t buf_size, SuperscalarPr
break;
default:
break;
UNREACHABLE;
}
}
// Step 6
k = DIST(randomx_riscv64_vector_sshash_xor, randomx_riscv64_vector_sshash_set_cache_index);
k = DIST(randomx_riscv64_vector_sshash_xor, randomx_riscv64_vector_sshash_end);
memcpy(p, reinterpret_cast<void*>(randomx_riscv64_vector_sshash_xor), k);
p += k;
// Step 7
// Step 7. Set cacheIndex to the value of the register that has the longest dependency chain in the SuperscalarHash function executed in step 5.
if (i + 1 < num_programs) {
memcpy(p, reinterpret_cast<uint8_t*>(randomx_riscv64_vector_sshash_set_cache_index) + programs[i].getAddressRegister() * 4, 4);
// vmv.v.v v9, v0 + programs[i].getAddressRegister()
const uint32_t t = 0x5E0004D7 + (static_cast<uint32_t>(programs[i].getAddressRegister()) << 15);
memcpy(p, &t, 4);
p += 4;
}
}
// Emit "J randomx_riscv64_vector_sshash_generated_instructions_end" instruction
const uint8_t* e = buf + DIST(randomx_riscv64_vector_sshash_begin, randomx_riscv64_vector_sshash_generated_instructions_end);
const uint32_t k = e - p;
const uint32_t j = 0x6F | ((k & 0x7FE) << 20) | ((k & 0x800) << 9) | (k & 0xFF000);
const uint8_t* e = buf + DIST(randomx_riscv64_vector_code_begin, randomx_riscv64_vector_sshash_generated_instructions_end);
const uint32_t j = JUMP(e - p);
memcpy(p, &j, 4);
char* result = (char*)(buf + DIST(randomx_riscv64_vector_code_begin, randomx_riscv64_vector_sshash_dataset_init));
#ifdef __GNUC__
__builtin___clear_cache((char*) buf, (char*)(buf + buf_size));
__builtin___clear_cache(result, (char*)(buf + DIST(randomx_riscv64_vector_sshash_begin, randomx_riscv64_vector_sshash_end)));
#endif
return buf + DIST(randomx_riscv64_vector_sshash_begin, randomx_riscv64_vector_sshash_dataset_init);
return result;
}
#define emit16(value) { const uint16_t t = value; memcpy(p, &t, 2); p += 2; }
#define emit32(value) { const uint32_t t = value; memcpy(p, &t, 4); p += 4; }
#define emit64(value) { const uint64_t t = value; memcpy(p, &t, 8); p += 8; }
#define emit_data(arr) { memcpy(p, arr, sizeof(arr)); p += sizeof(arr); }
static void imm_to_x5(uint32_t imm, uint8_t*& p)
{
const uint32_t imm_hi = (imm + ((imm & 0x800) << 1)) & 0xFFFFF000U;
const uint32_t imm_lo = imm & 0x00000FFFU;
if (imm_hi == 0) {
// li x5, imm_lo
emit32(0x00000293 + (imm_lo << 20));
return;
}
if (imm_lo == 0) {
// lui x5, imm_hi
emit32(0x000002B7 + imm_hi);
return;
}
if (imm_hi < (32 << 12)) {
//c.lui x5, imm_hi
emit16(0x6281 + (imm_hi >> 10));
}
else {
// lui x5, imm_hi
emit32(0x000002B7 + imm_hi);
}
// addiw x5, x5, imm_lo
emit32(0x0002829B | (imm_lo << 20));
}
static void loadFromScratchpad(uint32_t src, uint32_t dst, uint32_t mod, uint32_t imm, uint8_t*& p)
{
if (src == dst) {
imm &= RandomX_CurrentConfig.ScratchpadL3Mask_Calculated;
if (imm <= 2047) {
// ld x5, imm(x12)
emit32(0x00063283 | (imm << 20));
}
else if (imm <= 2047 * 2) {
// addi x5, x12, 2047
emit32(0x7FF60293);
// ld x5, (imm - 2047)(x5)
emit32(0x0002B283 | ((imm - 2047) << 20));
}
else {
// lui x5, imm & 0xFFFFF000U
emit32(0x000002B7 | ((imm + ((imm & 0x800) << 1)) & 0xFFFFF000U));
// c.add x5, x12
emit16(0x92B2);
// ld x5, (imm & 0xFFF)(x5)
emit32(0x0002B283 | ((imm & 0xFFF) << 20));
}
return;
}
uint32_t shift = 32;
uint32_t mask_reg;
if ((mod & 3) == 0) {
shift -= RandomX_CurrentConfig.Log2_ScratchpadL2;
mask_reg = 17;
}
else {
shift -= RandomX_CurrentConfig.Log2_ScratchpadL1;
mask_reg = 16;
}
imm = static_cast<uint32_t>(static_cast<int32_t>(imm << shift) >> shift);
// 0-0x7FF and 0xFFFFF800-0xFFFFFFFF fit into 12 bits (a single addi instruction)
if (imm - 0xFFFFF800U < 0x1000U) {
// addi x5, x20 + src, imm
emit32(0x000A0293 + (src << 15) + (imm << 20));
}
else {
imm_to_x5(imm, p);
// c.add x5, x20 + src
emit16(0x92D2 + (src << 2));
}
// and x5, x5, mask_reg
emit32(0x0002F2B3 + (mask_reg << 20));
// c.add x5, x12
emit16(0x92B2);
// ld x5, 0(x5)
emit32(0x0002B283);
}
void* generateProgramVectorRV64(uint8_t* buf, Program& prog, ProgramConfiguration& pcfg, const uint8_t (&inst_map)[256], void* entryDataInitScalar, uint32_t datasetOffset)
{
uint64_t* params = (uint64_t*)(buf + DIST(randomx_riscv64_vector_code_begin, randomx_riscv64_vector_program_params));
params[0] = RandomX_CurrentConfig.ScratchpadL1_Size - 8;
params[1] = RandomX_CurrentConfig.ScratchpadL2_Size - 8;
params[2] = RandomX_CurrentConfig.ScratchpadL3_Size - 8;
params[3] = RandomX_CurrentConfig.DatasetBaseSize - 64;
params[4] = (1 << RandomX_ConfigurationBase::JumpBits) - 1;
const bool hasAES = xmrig::Cpu::info()->hasAES();
if (RandomX_CurrentConfig.Tweak_V2_AES && !hasAES) {
params[5] = (uint64_t) &lutEnc[2][0];
params[6] = (uint64_t) &lutDec[2][0];
params[7] = (uint64_t) lutEncIndex;
params[8] = (uint64_t) lutDecIndex;
uint32_t* p1 = (uint32_t*)(buf + DIST(randomx_riscv64_vector_code_begin, randomx_riscv64_vector_program_v2_soft_aes_init));
// Restore vsetivli zero, 4, e32, m1, ta, ma
*p1 = 0xCD027057;
}
else {
uint32_t* p1 = (uint32_t*)(buf + DIST(randomx_riscv64_vector_code_begin, randomx_riscv64_vector_program_v2_soft_aes_init));
// Emit "J randomx_riscv64_vector_program_main_loop" instruction
*p1 = JUMP(DIST(randomx_riscv64_vector_program_v2_soft_aes_init, randomx_riscv64_vector_program_main_loop));
}
uint64_t* imul_rcp_literals = (uint64_t*)(buf + DIST(randomx_riscv64_vector_code_begin, randomx_riscv64_vector_program_imul_rcp_literals));
uint64_t* cur_literal = imul_rcp_literals;
uint32_t* spaddr_xor = (uint32_t*)(buf + DIST(randomx_riscv64_vector_code_begin, randomx_riscv64_vector_program_main_loop_spaddr_xor));
uint32_t* spaddr_xor2 = (uint32_t*)(buf + DIST(randomx_riscv64_vector_code_begin, randomx_riscv64_vector_program_scratchpad_prefetch));
uint32_t* mx_xor = (uint32_t*)(buf + DIST(randomx_riscv64_vector_code_begin, randomx_riscv64_vector_program_main_loop_mx_xor));
uint32_t* mx_xor_light = (uint32_t*)(buf + DIST(randomx_riscv64_vector_code_begin, randomx_riscv64_vector_program_main_loop_mx_xor_light_mode));
*spaddr_xor = 0x014A47B3 + (pcfg.readReg0 << 15) + (pcfg.readReg1 << 20); // xor x15, readReg0, readReg1
*spaddr_xor2 = 0x014A42B3 + (pcfg.readReg0 << 15) + (pcfg.readReg1 << 20); // xor x5, readReg0, readReg1
const uint32_t mx_xor_value = 0x014A42B3 + (pcfg.readReg2 << 15) + (pcfg.readReg3 << 20); // xor x5, readReg2, readReg3
*mx_xor = mx_xor_value;
*mx_xor_light = mx_xor_value;
// "slli x5, x5, 32" for RandomX v2, "nop" for RandomX v1
const uint16_t mp_reg_value = RandomX_CurrentConfig.Tweak_V2_PREFETCH ? 0x1282 : 0x0001;
memcpy(((uint8_t*)mx_xor) + 8, &mp_reg_value, sizeof(mp_reg_value));
memcpy(((uint8_t*)mx_xor_light) + 8, &mp_reg_value, sizeof(mp_reg_value));
// "srli x5, x14, 32" for RandomX v2, "srli x5, x14, 0" for RandomX v1
const uint32_t mp_reg_value2 = RandomX_CurrentConfig.Tweak_V2_PREFETCH ? 0x02075293 : 0x00075293;
memcpy(((uint8_t*)mx_xor) + 14, &mp_reg_value2, sizeof(mp_reg_value2));
if (entryDataInitScalar) {
void* light_mode_data = buf + DIST(randomx_riscv64_vector_code_begin, randomx_riscv64_vector_program_main_loop_light_mode_data);
const uint64_t data[2] = { reinterpret_cast<uint64_t>(entryDataInitScalar), datasetOffset };
memcpy(light_mode_data, &data, sizeof(data));
}
uint8_t* p = (uint8_t*)(buf + DIST(randomx_riscv64_vector_code_begin, randomx_riscv64_vector_program_main_loop_instructions));
// 57C8025E vmv.v.x v16, x5
// 57A9034B vsext.vf2 v18, v16
// 5798214B vfcvt.f.x.v v16, v18
static constexpr uint8_t group_f_convert[] = {
0x57, 0xC8, 0x02, 0x5E, 0x57, 0xA9, 0x03, 0x4B, 0x57, 0x98, 0x21, 0x4B
};
// 57080627 vand.vv v16, v16, v12
// 5788062B vor.vv v16, v16, v13
static constexpr uint8_t group_e_post_process[] = { 0x57, 0x08, 0x06, 0x27, 0x57, 0x88, 0x06, 0x2B };
uint8_t* last_modified[RegistersCount] = { p, p, p, p, p, p, p, p };
for (uint32_t i = 0, n = prog.getSize(); i < n; ++i) {
Instruction instr = prog(i);
uint32_t src = instr.src % RegistersCount;
uint32_t dst = instr.dst % RegistersCount;
const uint32_t shift = instr.getModShift();
uint32_t imm = instr.getImm32();
const uint32_t mod = instr.mod;
switch (static_cast<InstructionType>(inst_map[instr.opcode])) {
case InstructionType::IADD_RS:
if (shift == 0) {
// c.add x20 + dst, x20 + src
emit16(0x9A52 + (src << 2) + (dst << 7));
}
else {
#ifdef __riscv_zba
// sh{shift}add x20 + dst, x20 + src, x20 + dst
emit32(0x214A0A33 + (shift << 13) + (dst << 7) + (src << 15) + (dst << 20));
#else // __riscv_zba
// slli x5, x20 + src, shift
emit32(0x000A1293 + (src << 15) + (shift << 20));
// c.add x20 + dst, x5
emit16(0x9A16 + (dst << 7));
#endif // __riscv_zba
}
if (dst == RegisterNeedsDisplacement) {
imm_to_x5(imm, p);
// c.add x20 + dst, x5
emit16(0x9A16 + (dst << 7));
}
last_modified[dst] = p;
break;
case InstructionType::IADD_M:
loadFromScratchpad(src, dst, mod, imm, p);
// c.add x20 + dst, x5
emit16(0x9A16 + (dst << 7));
last_modified[dst] = p;
break;
case InstructionType::ISUB_R:
if (src != dst) {
// sub x20 + dst, x20 + dst, x20 + src
emit32(0x414A0A33 + (dst << 7) + (dst << 15) + (src << 20));
}
else {
imm_to_x5(-imm, p);
// c.add x20 + dst, x5
emit16(0x9A16 + (dst << 7));
}
last_modified[dst] = p;
break;
case InstructionType::ISUB_M:
loadFromScratchpad(src, dst, mod, imm, p);
// sub x20 + dst, x20 + dst, x5
emit32(0x405A0A33 + (dst << 7) + (dst << 15));
last_modified[dst] = p;
break;
case InstructionType::IMUL_R:
if (src != dst) {
// mul x20 + dst, x20 + dst, x20 + src
emit32(0x034A0A33 + (dst << 7) + (dst << 15) + (src << 20));
}
else {
imm_to_x5(imm, p);
// mul x20 + dst, x20 + dst, x5
emit32(0x025A0A33 + (dst << 7) + (dst << 15));
}
last_modified[dst] = p;
break;
case InstructionType::IMUL_M:
loadFromScratchpad(src, dst, mod, imm, p);
// mul x20 + dst, x20 + dst, x5
emit32(0x025A0A33 + (dst << 7) + (dst << 15));
last_modified[dst] = p;
break;
case InstructionType::IMULH_R:
// mulhu x20 + dst, x20 + dst, x20 + src
emit32(0x034A3A33 + (dst << 7) + (dst << 15) + (src << 20));
last_modified[dst] = p;
break;
case InstructionType::IMULH_M:
loadFromScratchpad(src, dst, mod, imm, p);
// mulhu x20 + dst, x20 + dst, x5
emit32(0x025A3A33 + (dst << 7) + (dst << 15));
last_modified[dst] = p;
break;
case InstructionType::ISMULH_R:
// mulh x20 + dst, x20 + dst, x20 + src
emit32(0x034A1A33 + (dst << 7) + (dst << 15) + (src << 20));
last_modified[dst] = p;
break;
case InstructionType::ISMULH_M:
loadFromScratchpad(src, dst, mod, imm, p);
// mulh x20 + dst, x20 + dst, x5
emit32(0x025A1A33 + (dst << 7) + (dst << 15));
last_modified[dst] = p;
break;
case InstructionType::IMUL_RCP:
if (!isZeroOrPowerOf2(imm)) {
const uint64_t offset = (cur_literal - imul_rcp_literals) * 8;
*(cur_literal++) = randomx_reciprocal_fast(imm);
static constexpr uint32_t rcp_regs[26] = {
/* Integer */ 8, 10, 28, 29, 30, 31,
/* Float */ 0, 1, 2, 3, 4, 5, 6, 7, 10, 11, 12, 13, 14, 15, 16, 17, 28, 29, 30, 31
};
if (offset < 6 * 8) {
// mul x20 + dst, x20 + dst, rcp_reg
emit32(0x020A0A33 + (dst << 7) + (dst << 15) + (rcp_regs[offset / 8] << 20));
}
else if (offset < 26 * 8) {
// fmv.x.d x5, rcp_reg
emit32(0xE20002D3 + (rcp_regs[offset / 8] << 15));
// mul x20 + dst, x20 + dst, x5
emit32(0x025A0A33 + (dst << 7) + (dst << 15));
}
else {
// ld x5, offset(x18)
emit32(0x00093283 + (offset << 20));
// mul x20 + dst, x20 + dst, x5
emit32(0x025A0A33 + (dst << 7) + (dst << 15));
}
last_modified[dst] = p;
}
break;
case InstructionType::INEG_R:
// sub x20 + dst, x0, x20 + dst
emit32(0x41400A33 + (dst << 7) + (dst << 20));
last_modified[dst] = p;
break;
case InstructionType::IXOR_R:
if (src != dst) {
// xor x20 + dst, x20 + dst, x20 + src
emit32(0x014A4A33 + (dst << 7) + (dst << 15) + (src << 20));
}
else {
imm_to_x5(imm, p);
// xor x20 + dst, x20 + dst, x5
emit32(0x005A4A33 + (dst << 7) + (dst << 15));
}
last_modified[dst] = p;
break;
case InstructionType::IXOR_M:
loadFromScratchpad(src, dst, mod, imm, p);
// xor x20 + dst, x20 + dst, x5
emit32(0x005A4A33 + (dst << 7) + (dst << 15));
last_modified[dst] = p;
break;
#ifdef __riscv_zbb
case InstructionType::IROR_R:
if (src != dst) {
// ror x20 + dst, x20 + dst, x20 + src
emit32(0x614A5A33 + (dst << 7) + (dst << 15) + (src << 20));
}
else {
// rori x20 + dst, x20 + dst, imm
emit32(0x600A5A13 + (dst << 7) + (dst << 15) + ((imm & 63) << 20));
}
last_modified[dst] = p;
break;
case InstructionType::IROL_R:
if (src != dst) {
// rol x20 + dst, x20 + dst, x20 + src
emit32(0x614A1A33 + (dst << 7) + (dst << 15) + (src << 20));
}
else {
// rori x20 + dst, x20 + dst, -imm
emit32(0x600A5A13 + (dst << 7) + (dst << 15) + ((-imm & 63) << 20));
}
last_modified[dst] = p;
break;
#else // __riscv_zbb
case InstructionType::IROR_R:
if (src != dst) {
// sub x5, x0, x20 + src
emit32(0x414002B3 + (src << 20));
// srl x6, x20 + dst, x20 + src
emit32(0x014A5333 + (dst << 15) + (src << 20));
// sll x20 + dst, x20 + dst, x5
emit32(0x005A1A33 + (dst << 7) + (dst << 15));
// or x20 + dst, x20 + dst, x6
emit32(0x006A6A33 + (dst << 7) + (dst << 15));
}
else {
// srli x5, x20 + dst, imm
emit32(0x000A5293 + (dst << 15) + ((imm & 63) << 20));
// slli x6, x20 + dst, -imm
emit32(0x000A1313 + (dst << 15) + ((-imm & 63) << 20));
// or x20 + dst, x5, x6
emit32(0x0062EA33 + (dst << 7));
}
last_modified[dst] = p;
break;
case InstructionType::IROL_R:
if (src != dst) {
// sub x5, x0, x20 + src
emit32(0x414002B3 + (src << 20));
// sll x6, x20 + dst, x20 + src
emit32(0x014A1333 + (dst << 15) + (src << 20));
// srl x20 + dst, x20 + dst, x5
emit32(0x005A5A33 + (dst << 7) + (dst << 15));
// or x20 + dst, x20 + dst, x6
emit32(0x006A6A33 + (dst << 7) + (dst << 15));
}
else {
// srli x5, x20 + dst, -imm
emit32(0x000A5293 + (dst << 15) + ((-imm & 63) << 20));
// slli x6, x20 + dst, imm
emit32(0x000A1313 + (dst << 15) + ((imm & 63) << 20));
// or x20 + dst, x5, x6
emit32(0x0062EA33 + (dst << 7));
}
last_modified[dst] = p;
break;
#endif // __riscv_zbb
case InstructionType::ISWAP_R:
if (src != dst) {
// c.mv x5, x20 + dst
emit16(0x82D2 + (dst << 2));
// c.mv x20 + dst, x20 + src
emit16(0x8A52 + (src << 2) + (dst << 7));
// c.mv x20 + src, x5
emit16(0x8A16 + (src << 7));
last_modified[src] = p;
last_modified[dst] = p;
}
break;
case InstructionType::FSWAP_R:
// vmv.x.s x5, v0 + dst
emit32(0x420022D7 + (dst << 20));
// vslide1down.vx v0 + dst, v0 + dst, x5
emit32(0x3E02E057 + (dst << 7) + (dst << 20));
break;
case InstructionType::FADD_R:
src %= RegisterCountFlt;
dst %= RegisterCountFlt;
// vfadd.vv v0 + dst, v0 + dst, v8 + src
emit32(0x02041057 + (dst << 7) + (src << 15) + (dst << 20));
break;
case InstructionType::FADD_M:
dst %= RegisterCountFlt;
loadFromScratchpad(src, RegistersCount, mod, imm, p);
emit_data(group_f_convert);
// vfadd.vv v0 + dst, v0 + dst, v16
emit32(0x02081057 + (dst << 7) + (dst << 20));
break;
case InstructionType::FSUB_R:
src %= RegisterCountFlt;
dst %= RegisterCountFlt;
// vfsub.vv v0 + dst, v0 + dst, v8 + src
emit32(0x0A041057 + (dst << 7) + (src << 15) + (dst << 20));
break;
case InstructionType::FSUB_M:
dst %= RegisterCountFlt;
loadFromScratchpad(src, RegistersCount, mod, imm, p);
emit_data(group_f_convert);
// vfsub.vv v0 + dst, v0 + dst, v16
emit32(0x0A081057 + (dst << 7) + (dst << 20));
break;
case InstructionType::FSCAL_R:
dst %= RegisterCountFlt;
// vxor.vv v0 + dst, v0 + dst, v14
emit32(0x2E070057 + (dst << 7) + (dst << 20));
break;
case InstructionType::FMUL_R:
src %= RegisterCountFlt;
dst %= RegisterCountFlt;
// vfmul.vv v4 + dst, v4 + dst, v8 + src
emit32(0x92441257 + (dst << 7) + (src << 15) + (dst << 20));
break;
case InstructionType::FDIV_M:
dst %= RegisterCountFlt;
loadFromScratchpad(src, RegistersCount, mod, imm, p);
emit_data(group_f_convert);
emit_data(group_e_post_process);
// vfdiv.vv v0 + dst, v0 + dst, v16
emit32(0x82481257 + (dst << 7) + (dst << 20));
break;
case InstructionType::FSQRT_R:
dst %= RegisterCountFlt;
// vfsqrt.v v4 + dst, v4 + dst
emit32(0x4E401257 + (dst << 7) + (dst << 20));
break;
case InstructionType::CBRANCH:
{
const uint32_t shift = (mod >> 4) + RandomX_ConfigurationBase::JumpOffset;
imm |= (1UL << shift);
if (RandomX_ConfigurationBase::JumpOffset > 0 || shift > 0) {
imm &= ~(1UL << (shift - 1));
}
// slli x6, x7, shift
// x6 = branchMask
emit32(0x00039313 + (shift << 20));
// x5 = imm
imm_to_x5(imm, p);
// c.add x20 + dst, x5
emit16(0x9A16 + (dst << 7));
// and x5, x20 + dst, x6
emit32(0x006A72B3 + (dst << 15));
const int offset = static_cast<int>(last_modified[dst] - p);
if (offset >= -4096) {
// beqz x5, offset
const uint32_t k = static_cast<uint32_t>(offset);
emit32(0x80028063 | ((k & 0x1E) << 7) | ((k & 0x7E0) << 20) | ((k & 0x800) >> 4));
}
else {
// bnez x5, 8
emit32(0x00029463);
// j offset
const uint32_t k = static_cast<uint32_t>(offset - 4);
emit32(0x8000006F | ((k & 0x7FE) << 20) | ((k & 0x800) << 9) | (k & 0xFF000));
}
for (uint32_t j = 0; j < RegistersCount; ++j) {
last_modified[j] = p;
}
}
break;
case InstructionType::CFROUND:
if ((imm - 1) & 63) {
#ifdef __riscv_zbb
// rori x5, x20 + src, imm - 1
emit32(0x600A5293 + (src << 15) + (((imm - 1) & 63) << 20));
#else // __riscv_zbb
// srli x5, x20 + src, imm - 1
emit32(0x000A5293 + (src << 15) + (((imm - 1) & 63) << 20));
// slli x6, x20 + src, 1 - imm
emit32(0x000A1313 + (src << 15) + (((1 - imm) & 63) << 20));
// or x5, x5, x6
emit32(0x0062E2B3);
#endif // __riscv_zbb
if (RandomX_CurrentConfig.Tweak_V2_CFROUND) {
// andi x6, x5, 120
emit32(0x0782F313);
// bnez x6, +24
emit32(0x00031C63);
}
// andi x5, x5, 6
emit32(0x0062F293);
}
else {
if (RandomX_CurrentConfig.Tweak_V2_CFROUND) {
// andi x6, x20 + src, 120
emit32(0x078A7313 + (src << 15));
// bnez x6, +24
emit32(0x00031C63);
}
// andi x5, x20 + src, 6
emit32(0x006A7293 + (src << 15));
}
// li x6, 01111000b
// x6 = CFROUND lookup table
emit32(0x07800313);
// srl x5, x6, x5
emit32(0x005352B3);
// andi x5, x5, 3
emit32(0x0032F293);
// csrw frm, x5
emit32(0x00229073);
break;
case InstructionType::ISTORE:
{
uint32_t mask_reg;
uint32_t shift = 32;
if ((mod >> 4) >= 14) {
shift -= RandomX_CurrentConfig.Log2_ScratchpadL3;
mask_reg = 1; // x1 = L3 mask
}
else {
if ((mod & 3) == 0) {
shift -= RandomX_CurrentConfig.Log2_ScratchpadL2;
mask_reg = 17; // x17 = L2 mask
}
else {
shift -= RandomX_CurrentConfig.Log2_ScratchpadL1;
mask_reg = 16; // x16 = L1 mask
}
}
imm = static_cast<uint32_t>(static_cast<int32_t>(imm << shift) >> shift);
imm_to_x5(imm, p);
// c.add x5, x20 + dst
emit16(0x92D2 + (dst << 2));
// and x5, x5, x0 + mask_reg
emit32(0x0002F2B3 + (mask_reg << 20));
// c.add x5, x12
emit16(0x92B2);
// sd x20 + src, 0(x5)
emit32(0x0142B023 + (src << 20));
}
break;
case InstructionType::NOP:
break;
default:
UNREACHABLE;
}
}
const uint8_t* e;
if (entryDataInitScalar) {
// Emit "J randomx_riscv64_vector_program_main_loop_instructions_end_light_mode" instruction
e = buf + DIST(randomx_riscv64_vector_code_begin, randomx_riscv64_vector_program_main_loop_instructions_end_light_mode);
}
else {
// Emit "J randomx_riscv64_vector_program_main_loop_instructions_end" instruction
e = buf + DIST(randomx_riscv64_vector_code_begin, randomx_riscv64_vector_program_main_loop_instructions_end);
}
emit32(JUMP(e - p));
if (RandomX_CurrentConfig.Tweak_V2_AES) {
uint32_t* p1 = (uint32_t*)(buf + DIST(randomx_riscv64_vector_code_begin, randomx_riscv64_vector_program_main_loop_fe_mix));
if (hasAES) {
// Restore vsetivli zero, 4, e32, m1, ta, ma
*p1 = 0xCD027057;
}
else {
// Emit "J randomx_riscv64_vector_program_main_loop_fe_mix_v2_soft_aes" instruction
*p1 = JUMP(DIST(randomx_riscv64_vector_program_main_loop_fe_mix, randomx_riscv64_vector_program_main_loop_fe_mix_v2_soft_aes));
}
}
else {
uint32_t* p1 = (uint32_t*)(buf + DIST(randomx_riscv64_vector_code_begin, randomx_riscv64_vector_program_main_loop_fe_mix));
// Emit "J randomx_riscv64_vector_program_main_loop_fe_mix_v1" instruction
*p1 = JUMP(DIST(randomx_riscv64_vector_program_main_loop_fe_mix, randomx_riscv64_vector_program_main_loop_fe_mix_v1));
}
#ifdef __GNUC__
char* p1 = (char*)(buf + DIST(randomx_riscv64_vector_code_begin, randomx_riscv64_vector_program_params));
char* p2 = (char*)(buf + DIST(randomx_riscv64_vector_code_begin, randomx_riscv64_vector_program_end));
__builtin___clear_cache(p1, p2);
#endif
return buf + DIST(randomx_riscv64_vector_code_begin, randomx_riscv64_vector_program_begin);
}
} // namespace randomx


@@ -36,7 +36,10 @@ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
namespace randomx {
class SuperscalarProgram;
struct ProgramConfiguration;
class Program;
void* generateDatasetInitVectorRV64(uint8_t* buf, size_t buf_size, SuperscalarProgram* programs, size_t num_programs);
void* generateDatasetInitVectorRV64(uint8_t* buf, SuperscalarProgram* programs, size_t num_programs);
void* generateProgramVectorRV64(uint8_t* buf, Program& prog, ProgramConfiguration& pcfg, const uint8_t (&inst_map)[256], void* entryDataInitScalar, uint32_t datasetOffset);
} // namespace randomx


@@ -46,9 +46,14 @@ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.text
.option arch, rv64gcv_zicbop
#ifndef __riscv_v
#error This file requires rv64gcv
#endif
.option pic
.global DECL(randomx_riscv64_vector_code_begin)
.global DECL(randomx_riscv64_vector_sshash_begin)
.global DECL(randomx_riscv64_vector_sshash_imul_rcp_literals)
.global DECL(randomx_riscv64_vector_sshash_dataset_init)
@@ -56,11 +61,35 @@ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.global DECL(randomx_riscv64_vector_sshash_generated_instructions_end)
.global DECL(randomx_riscv64_vector_sshash_cache_prefetch)
.global DECL(randomx_riscv64_vector_sshash_xor)
.global DECL(randomx_riscv64_vector_sshash_set_cache_index)
.global DECL(randomx_riscv64_vector_sshash_end)
.global DECL(randomx_riscv64_vector_program_params)
.global DECL(randomx_riscv64_vector_program_imul_rcp_literals)
.global DECL(randomx_riscv64_vector_program_begin)
.global DECL(randomx_riscv64_vector_program_v2_soft_aes_init)
.global DECL(randomx_riscv64_vector_program_main_loop)
.global DECL(randomx_riscv64_vector_program_main_loop_instructions)
.global DECL(randomx_riscv64_vector_program_main_loop_instructions_end)
.global DECL(randomx_riscv64_vector_program_main_loop_mx_xor)
.global DECL(randomx_riscv64_vector_program_main_loop_spaddr_xor)
.global DECL(randomx_riscv64_vector_program_main_loop_fe_mix)
.global DECL(randomx_riscv64_vector_program_main_loop_light_mode_data)
.global DECL(randomx_riscv64_vector_program_main_loop_instructions_end_light_mode)
.global DECL(randomx_riscv64_vector_program_main_loop_mx_xor_light_mode)
.global DECL(randomx_riscv64_vector_program_scratchpad_prefetch)
.global DECL(randomx_riscv64_vector_program_main_loop_fe_mix_v1)
.global DECL(randomx_riscv64_vector_program_main_loop_fe_mix_v2_soft_aes)
.global DECL(randomx_riscv64_vector_program_end)
.global DECL(randomx_riscv64_vector_code_end)
.balign 8
DECL(randomx_riscv64_vector_code_begin):
DECL(randomx_riscv64_vector_sshash_begin):
sshash_constant_0: .dword 6364136223846793005
@@ -104,8 +133,7 @@ v19 = dataset item store offsets
DECL(randomx_riscv64_vector_sshash_dataset_init):
// Process 4 64-bit values at a time
li x5, 4
vsetvli x5, x5, e64, m1, ta, ma
vsetivli zero, 4, e64, m1, ta, ma
// Load cache->memory pointer
ld x10, (x10)
@@ -182,7 +210,6 @@ DECL(randomx_riscv64_vector_sshash_generated_instructions):
// Step 4. randomx_riscv64_vector_sshash_cache_prefetch
// Step 5. SuperscalarHash[i]
// Step 6. randomx_riscv64_vector_sshash_xor
// Step 7. randomx_riscv64_vector_sshash_set_cache_index
//
// Above steps will be repeated RANDOMX_CACHE_ACCESSES times
.fill RANDOMX_CACHE_ACCESSES * 2048, 4, 0
@@ -228,22 +255,38 @@ DECL(randomx_riscv64_vector_sshash_cache_prefetch):
// Prefetch element 0
vmv.x.s x5, v9
#ifdef __riscv_zicbop
prefetch.r (x5)
#else
ld x5, (x5)
#endif
// Prefetch element 1
vslidedown.vi v18, v9, 1
vmv.x.s x5, v18
#ifdef __riscv_zicbop
prefetch.r (x5)
#else
ld x5, (x5)
#endif
// Prefetch element 2
vslidedown.vi v18, v9, 2
vmv.x.s x5, v18
#ifdef __riscv_zicbop
prefetch.r (x5)
#else
ld x5, (x5)
#endif
// Prefetch element 3
vslidedown.vi v18, v9, 3
vmv.x.s x5, v18
#ifdef __riscv_zicbop
prefetch.r (x5)
#else
ld x5, (x5)
#endif
// v9 = byte offset into cache->memory
vsub.vx v9, v9, x10
@@ -281,16 +324,767 @@ DECL(randomx_riscv64_vector_sshash_xor):
vluxei64.v v18, (x5), v9
vxor.vv v7, v7, v18
// Step 7. Set cacheIndex to the value of the register that has the longest dependency chain in the SuperscalarHash function executed in step 5.
DECL(randomx_riscv64_vector_sshash_set_cache_index):
// JIT compiler will pick a single instruction reading from the required register
vmv.v.v v9, v0
vmv.v.v v9, v1
vmv.v.v v9, v2
vmv.v.v v9, v3
vmv.v.v v9, v4
vmv.v.v v9, v5
vmv.v.v v9, v6
vmv.v.v v9, v7
DECL(randomx_riscv64_vector_sshash_end):
/*
Reference: https://github.com/tevador/RandomX/blob/master/doc/specs.md#46-vm-execution
C declarations:
struct RegisterFile {
uint64_t r[8];
double f[4][2];
double e[4][2];
double a[4][2];
};
struct MemoryRegisters {
uint32_t mx, ma;
uint8_t* memory; // dataset (fast mode) or cache (light mode)
};
void ProgramFunc(RegisterFile* reg, MemoryRegisters* mem, uint8_t* scratchpad, uint64_t iterations);
Register layout
---------------
x0 = zero
x1 = scratchpad L3 mask
x2 = stack pointer
x3 = global pointer (unused)
x4 = thread pointer (unused)
x5 = temporary
x6 = temporary
x7 = branch mask (unshifted)
x8 = frame pointer, also 64-bit literal inside the loop
x9 = scratchpad L3 mask (64-byte aligned)
x10 = RegisterFile* reg, also 64-bit literal inside the loop
x11 = MemoryRegisters* mem, then dataset/cache pointer
x12 = scratchpad
x13 = iterations
x14 = mx, ma (always stored with dataset mask applied)
x15 = spAddr0, spAddr1
x16 = scratchpad L1 mask
x17 = scratchpad L2 mask
x18 = IMUL_RCP literals pointer
x19 = dataset mask
x20-x27 = r0-r7
x28-x31 = 64-bit literals
f0-f7 = 64-bit literals
f10-f17 = 64-bit literals
f28-f31 = 64-bit literals
v0-v3 = f0-f3
v4-v7 = e0-e3
v8-v11 = a0-a3
v12 = E 'and' mask = 0x00ffffffffffffff'00ffffffffffffff
v13 = E 'or' mask = 0x3*00000000******'3*00000000******
v14 = scale mask = 0x80f0000000000000'80f0000000000000
v15 = all zeroes
v16 = temporary
v17 = unused
v18 = temporary
v19 = unused
v20 = randomx_aes_lut_enc_index[0]
v21 = randomx_aes_lut_enc_index[1]
v22 = randomx_aes_lut_enc_index[2]
v23 = randomx_aes_lut_enc_index[3]
v24 = randomx_aes_lut_dec_index[0]
v25 = randomx_aes_lut_dec_index[1]
v26 = randomx_aes_lut_dec_index[2]
v27 = randomx_aes_lut_dec_index[3]
v28-v31 = temporary in aesenc_soft/aesdec_soft
*/
.balign 8
DECL(randomx_riscv64_vector_program_params):
// JIT compiler will adjust these values for different RandomX variants
randomx_masks: .dword 16376, 262136, 2097144, 2147483584, 255
randomx_aes_lut_enc_ptr: .dword 0
randomx_aes_lut_dec_ptr: .dword 0
randomx_aes_lut_enc_index_ptr: .dword 0
randomx_aes_lut_dec_index_ptr: .dword 0
DECL(randomx_riscv64_vector_program_imul_rcp_literals):
imul_rcp_literals: .fill RANDOMX_PROGRAM_MAX_SIZE, 8, 0
DECL(randomx_riscv64_vector_program_begin):
addi sp, sp, -112
sd x8, 96(sp) // save old frame pointer
addi x8, sp, 112 // setup new frame pointer
sd x1, 104(sp) // save return address
// Save callee-saved registers
sd x9, 0(sp)
sd x18, 8(sp)
sd x19, 16(sp)
sd x20, 24(sp)
sd x21, 32(sp)
sd x22, 40(sp)
sd x23, 48(sp)
sd x24, 56(sp)
sd x25, 64(sp)
sd x26, 72(sp)
sd x27, 80(sp)
// Save x10 as it will be used as an IMUL_RCP literal
sd x10, 88(sp)
// Load mx, ma and dataset pointer
ld x14, (x11)
ld x11, 8(x11)
// Initialize spAddr0-spAddr1
mv x15, x14
// Set registers r0-r7 to zero
li x20, 0
li x21, 0
li x22, 0
li x23, 0
li x24, 0
li x25, 0
li x26, 0
li x27, 0
// Load masks
lla x5, randomx_masks
ld x16, 0(x5)
ld x17, 8(x5)
ld x1, 16(x5)
ld x19, 24(x5)
ld x7, 32(x5)
addi x9, x1, -56
// Set vector registers to 2x64 bit
vsetivli zero, 2, e64, m1, ta, ma
// Apply dataset mask to mx, ma
slli x5, x19, 32
or x5, x5, x19
and x14, x14, x5
// Load group A registers
addi x5, x10, 192
vle64.v v8, (x5)
addi x5, x10, 208
vle64.v v9, (x5)
addi x5, x10, 224
vle64.v v10, (x5)
addi x5, x10, 240
vle64.v v11, (x5)
// Load E 'and' mask
vmv.v.i v12, -1
vsrl.vi v12, v12, 8
// Load E 'or' mask (stored in reg.f[0])
addi x5, x10, 64
vle64.v v13, (x5)
// Load scale mask
lui x5, 0x80f00
slli x5, x5, 32
vmv.v.x v14, x5
// IMUL_RCP literals pointer
lla x18, imul_rcp_literals
// Load IMUL_RCP literals
ld x8, 0(x18)
ld x10, 8(x18)
ld x28, 16(x18)
ld x29, 24(x18)
ld x30, 32(x18)
ld x31, 40(x18)
fld f0, 48(x18)
fld f1, 56(x18)
fld f2, 64(x18)
fld f3, 72(x18)
fld f4, 80(x18)
fld f5, 88(x18)
fld f6, 96(x18)
fld f7, 104(x18)
fld f10, 112(x18)
fld f11, 120(x18)
fld f12, 128(x18)
fld f13, 136(x18)
fld f14, 144(x18)
fld f15, 152(x18)
fld f16, 160(x18)
fld f17, 168(x18)
fld f28, 176(x18)
fld f29, 184(x18)
fld f30, 192(x18)
fld f31, 200(x18)
// Set v15 to zero
vxor.vv v15, v15, v15
DECL(randomx_riscv64_vector_program_v2_soft_aes_init):
// JIT compiler will place a jump to the main loop here if needed
// Load randomx_aes_lut_enc_index/randomx_aes_lut_dec_index
vsetivli zero, 4, e32, m1, ta, ma
lla x5, randomx_aes_lut_enc_index_ptr
ld x5, (x5)
vle32.v v20, (x5)
addi x6, x5, 32
vle32.v v21, (x6)
addi x6, x5, 64
vle32.v v22, (x6)
addi x6, x5, 96
vle32.v v23, (x6)
lla x5, randomx_aes_lut_dec_index_ptr
ld x5, (x5)
vle32.v v24, (x5)
addi x6, x5, 32
vle32.v v25, (x6)
addi x6, x5, 64
vle32.v v26, (x6)
addi x6, x5, 96
vle32.v v27, (x6)
vsetivli zero, 2, e64, m1, ta, ma
DECL(randomx_riscv64_vector_program_main_loop):
and x5, x15, x9 // x5 = spAddr0 & 64-byte aligned L3 mask
add x5, x5, x12 // x5 = &scratchpad[spAddr0 & 64-byte aligned L3 mask]
// read a 64-byte line from scratchpad (indexed by spAddr0) and XOR it with r0-r7
ld x6, 0(x5)
xor x20, x20, x6
ld x6, 8(x5)
xor x21, x21, x6
ld x6, 16(x5)
xor x22, x22, x6
ld x6, 24(x5)
xor x23, x23, x6
ld x6, 32(x5)
xor x24, x24, x6
ld x6, 40(x5)
xor x25, x25, x6
ld x6, 48(x5)
xor x26, x26, x6
ld x6, 56(x5)
xor x27, x27, x6
srli x5, x15, 32 // x5 = spAddr1
and x5, x5, x9 // x5 = spAddr1 & 64-byte aligned L3 mask
add x5, x5, x12 // x5 = &scratchpad[spAddr1 & 64-byte aligned L3 mask]
// read a 64-byte line from scratchpad (indexed by spAddr1) and initialize f0-f3, e0-e3 registers
// Set vector registers to 2x32 bit
vsetivli zero, 2, e32, m1, ta, ma
// load f0
vle32.v v16, (x5)
vfwcvt.f.x.v v0, v16
// load f1
addi x6, x5, 8
vle32.v v1, (x6)
// Use v16 as an intermediate register because vfwcvt only accepts even-numbered destination registers here
vfwcvt.f.x.v v16, v1
vmv1r.v v1, v16
// load f2
addi x6, x5, 16
vle32.v v16, (x6)
vfwcvt.f.x.v v2, v16
// load f3
addi x6, x5, 24
vle32.v v3, (x6)
vfwcvt.f.x.v v16, v3
vmv1r.v v3, v16
// load e0
addi x6, x5, 32
vle32.v v16, (x6)
vfwcvt.f.x.v v4, v16
// load e1
addi x6, x5, 40
vle32.v v5, (x6)
vfwcvt.f.x.v v16, v5
vmv1r.v v5, v16
// load e2
addi x6, x5, 48
vle32.v v16, (x6)
vfwcvt.f.x.v v6, v16
// load e3
addi x6, x5, 56
vle32.v v7, (x6)
vfwcvt.f.x.v v16, v7
vmv1r.v v7, v16
// Set vector registers back to 2x64 bit
vsetivli zero, 2, e64, m1, ta, ma
// post-process e0-e3
vand.vv v4, v4, v12
vand.vv v5, v5, v12
vand.vv v6, v6, v12
vand.vv v7, v7, v12
vor.vv v4, v4, v13
vor.vv v5, v5, v13
vor.vv v6, v6, v13
vor.vv v7, v7, v13
DECL(randomx_riscv64_vector_program_main_loop_instructions):
// Generated by JIT compiler
// FDIV_M can generate up to 50 bytes of code (round it up to 52 - a multiple of 4)
// +32 bytes for the scratchpad prefetch and the final jump instruction
.fill RANDOMX_PROGRAM_MAX_SIZE * 52 + 32, 1, 0
DECL(randomx_riscv64_vector_program_main_loop_instructions_end):
// Calculate dataset pointer for dataset read
// Do it here to break the false dependency on readReg2 and readReg3 (see below)
srli x6, x14, 32 // x6 = ma & dataset mask
DECL(randomx_riscv64_vector_program_main_loop_mx_xor):
xor x5, x24, x26 // x5 = readReg2 ^ readReg3 (JIT compiler will substitute the actual registers)
and x5, x5, x19 // x5 = (readReg2 ^ readReg3) & dataset mask
slli x5, x5, 32 // JIT compiler will replace it with "nop" for v1
xor x14, x14, x5 // mp ^= (readReg2 ^ readReg3) & dataset mask
srli x5, x14, 32 // JIT compiler will replace it with "srli x5, x14, 0" for v1
and x5, x5, x19 // x5 = mp & dataset mask
add x5, x5, x11 // x5 = &dataset[mp & dataset mask]
#ifdef __riscv_zicbop
prefetch.r (x5)
#else
ld x5, (x5)
#endif
add x5, x6, x11 // x5 = &dataset[ma & dataset mask]
// read a 64-byte line from dataset and XOR it with r0-r7
ld x6, 0(x5)
xor x20, x20, x6
ld x6, 8(x5)
xor x21, x21, x6
ld x6, 16(x5)
xor x22, x22, x6
ld x6, 24(x5)
xor x23, x23, x6
ld x6, 32(x5)
xor x24, x24, x6
ld x6, 40(x5)
xor x25, x25, x6
ld x6, 48(x5)
xor x26, x26, x6
ld x6, 56(x5)
xor x27, x27, x6
DECL(randomx_riscv64_vector_program_scratchpad_prefetch):
xor x5, x20, x22 // spAddr0-spAddr1 = readReg0 ^ readReg1 (JIT compiler will substitute the actual registers)
srli x6, x5, 32 // x6 = spAddr1
and x5, x5, x9 // x5 = spAddr0 & 64-byte aligned L3 mask
and x6, x6, x9 // x6 = spAddr1 & 64-byte aligned L3 mask
c.add x5, x12 // x5 = &scratchpad[spAddr0 & 64-byte aligned L3 mask]
c.add x6, x12 // x6 = &scratchpad[spAddr1 & 64-byte aligned L3 mask]
#ifdef __riscv_zicbop
prefetch.r (x5)
prefetch.r (x6)
#else
ld x5, (x5)
ld x6, (x6)
#endif
// swap mx <-> ma
#ifdef __riscv_zbb
rori x14, x14, 32
#else
srli x5, x14, 32
slli x14, x14, 32
or x14, x14, x5
#endif
srli x5, x15, 32 // x5 = spAddr1
and x5, x5, x9 // x5 = spAddr1 & 64-byte aligned L3 mask
add x5, x5, x12 // x5 = &scratchpad[spAddr1 & 64-byte aligned L3 mask]
// store registers r0-r7 to the scratchpad
sd x20, 0(x5)
sd x21, 8(x5)
sd x22, 16(x5)
sd x23, 24(x5)
sd x24, 32(x5)
sd x25, 40(x5)
sd x26, 48(x5)
sd x27, 56(x5)
and x5, x15, x9 // x5 = spAddr0 & 64-byte aligned L3 mask
add x5, x5, x12 // x5 = &scratchpad[spAddr0 & 64-byte aligned L3 mask]
DECL(randomx_riscv64_vector_program_main_loop_spaddr_xor):
xor x15, x20, x22 // spAddr0-spAddr1 = readReg0 ^ readReg1 (JIT compiler will substitute the actual registers)
// store registers f0-f3 to the scratchpad (f0-f3 are first combined with e0-e3)
// v2 FE mix code is the main code path
// JIT compiler will place a jump to v1 or v2 soft AES code here if needed
DECL(randomx_riscv64_vector_program_main_loop_fe_mix):
vsetivli zero, 4, e32, m1, ta, ma
// f0 = aesenc(f0, e0), f1 = aesdec(f1, e0), f2 = aesenc(f2, e0), f3 = aesdec(f3, e0)
vaesem.vv v0, v4
vaesdm.vv v1, v15
vaesem.vv v2, v4
vaesdm.vv v3, v15
vxor.vv v1, v1, v4
vxor.vv v3, v3, v4
// f0 = aesenc(f0, e1), f1 = aesdec(f1, e1), f2 = aesenc(f2, e1), f3 = aesdec(f3, e1)
vaesem.vv v0, v5
vaesdm.vv v1, v15
vaesem.vv v2, v5
vaesdm.vv v3, v15
vxor.vv v1, v1, v5
vxor.vv v3, v3, v5
// f0 = aesenc(f0, e2), f1 = aesdec(f1, e2), f2 = aesenc(f2, e2), f3 = aesdec(f3, e2)
vaesem.vv v0, v6
vaesdm.vv v1, v15
vaesem.vv v2, v6
vaesdm.vv v3, v15
vxor.vv v1, v1, v6
vxor.vv v3, v3, v6
// f0 = aesenc(f0, e3), f1 = aesdec(f1, e3), f2 = aesenc(f2, e3), f3 = aesdec(f3, e3)
vaesem.vv v0, v7
vaesdm.vv v1, v15
vaesem.vv v2, v7
vaesdm.vv v3, v15
vxor.vv v1, v1, v7
vxor.vv v3, v3, v7
vsetivli zero, 2, e64, m1, ta, ma
randomx_riscv64_vector_program_main_loop_fe_store:
vse64.v v0, (x5)
addi x6, x5, 16
vse64.v v1, (x6)
addi x6, x5, 32
vse64.v v2, (x6)
addi x6, x5, 48
vse64.v v3, (x6)
addi x13, x13, -1
beqz x13, randomx_riscv64_vector_program_main_loop_end
j randomx_riscv64_vector_program_main_loop
randomx_riscv64_vector_program_main_loop_end:
// Restore x8 and x10
addi x8, sp, 112
ld x10, 88(sp)
// Store integer registers
sd x20, 0(x10)
sd x21, 8(x10)
sd x22, 16(x10)
sd x23, 24(x10)
sd x24, 32(x10)
sd x25, 40(x10)
sd x26, 48(x10)
sd x27, 56(x10)
// Store FP registers
addi x5, x10, 64
vse64.v v0, (x5)
addi x5, x10, 80
vse64.v v1, (x5)
addi x5, x10, 96
vse64.v v2, (x5)
addi x5, x10, 112
vse64.v v3, (x5)
addi x5, x10, 128
vse64.v v4, (x5)
addi x5, x10, 144
vse64.v v5, (x5)
addi x5, x10, 160
vse64.v v6, (x5)
addi x5, x10, 176
vse64.v v7, (x5)
// Restore callee-saved registers
ld x9, 0(sp)
ld x18, 8(sp)
ld x19, 16(sp)
ld x20, 24(sp)
ld x21, 32(sp)
ld x22, 40(sp)
ld x23, 48(sp)
ld x24, 56(sp)
ld x25, 64(sp)
ld x26, 72(sp)
ld x27, 80(sp)
ld x8, 96(sp) // old frame pointer
ld x1, 104(sp) // return address
addi sp, sp, 112
ret
DECL(randomx_riscv64_vector_program_main_loop_light_mode_data):
// 1) Pointer to the scalar dataset init function
// 2) Dataset offset
.dword 0, 0
DECL(randomx_riscv64_vector_program_main_loop_instructions_end_light_mode):
// Calculate dataset pointer for dataset read
// Do it here to break a false dependency on readReg2 and readReg3 (see below)
srli x6, x14, 32 // x6 = ma & dataset mask
DECL(randomx_riscv64_vector_program_main_loop_mx_xor_light_mode):
xor x5, x24, x26 // x5 = readReg2 ^ readReg3 (JIT compiler will substitute the actual registers)
and x5, x5, x19 // x5 = (readReg2 ^ readReg3) & dataset mask
slli x5, x5, 32 // JIT compiler will replace it with "nop" for v1
xor x14, x14, x5 // mx ^= (readReg2 ^ readReg3) & dataset mask
// Save all registers modified when calling dataset_init_scalar_func_ptr
addi sp, sp, -192
// bytes [0, 127] - saved registers
// bytes [128, 191] - output buffer
sd x1, 0(sp)
sd x7, 16(sp)
sd x10, 24(sp)
sd x11, 32(sp)
sd x12, 40(sp)
sd x13, 48(sp)
sd x14, 56(sp)
sd x15, 64(sp)
sd x16, 72(sp)
sd x17, 80(sp)
sd x28, 88(sp)
sd x29, 96(sp)
sd x30, 104(sp)
sd x31, 112(sp)
// set up randomx_riscv64_vector_sshash_dataset_init's parameters
// x10 = pointer to pointer to cache memory
// pointer to cache memory was saved in "sd x11, 32(sp)", so x10 = sp + 32
addi x10, sp, 32
// x11 = output buffer (64 bytes)
addi x11, sp, 128
// x12 = start block
lla x5, randomx_riscv64_vector_program_main_loop_light_mode_data
ld x12, 8(x5)
add x12, x12, x6
srli x12, x12, 6
// x13 = end block
addi x13, x12, 1
ld x5, 0(x5)
jalr x1, 0(x5)
// restore registers
ld x1, 0(sp)
ld x7, 16(sp)
ld x10, 24(sp)
ld x11, 32(sp)
ld x12, 40(sp)
ld x13, 48(sp)
ld x14, 56(sp)
ld x15, 64(sp)
ld x16, 72(sp)
ld x17, 80(sp)
ld x28, 88(sp)
ld x29, 96(sp)
ld x30, 104(sp)
ld x31, 112(sp)
// read a 64-byte line from dataset and XOR it with r0-r7
ld x5, 128(sp)
xor x20, x20, x5
ld x5, 136(sp)
xor x21, x21, x5
ld x5, 144(sp)
xor x22, x22, x5
ld x5, 152(sp)
xor x23, x23, x5
ld x5, 160(sp)
xor x24, x24, x5
ld x5, 168(sp)
xor x25, x25, x5
ld x5, 176(sp)
xor x26, x26, x5
ld x5, 184(sp)
xor x27, x27, x5
addi sp, sp, 192
j randomx_riscv64_vector_program_scratchpad_prefetch
DECL(randomx_riscv64_vector_program_main_loop_fe_mix_v1):
vxor.vv v0, v0, v4
vxor.vv v1, v1, v5
vxor.vv v2, v2, v6
vxor.vv v3, v3, v7
j randomx_riscv64_vector_program_main_loop_fe_store
/*
aesenc middle round
x5 = pointer to the aesenc LUT (tables at x5-2048, x5-1024, x5, x5+1024)
\input = input and return value
\key = round key
*/
.macro aesenc_soft input, key
vsetivli zero, 16, e8, m1, ta, ma
vrgather.vv v28, \input, v20
vrgather.vv v29, \input, v21
vrgather.vv v30, \input, v22
vrgather.vv v31, \input, v23
vsetivli zero, 4, e32, m1, ta, ma
vsll.vi v28, v28, 2
vsll.vi v29, v29, 2
vsll.vi v30, v30, 2
vsll.vi v31, v31, 2
addi x6, x5, -2048
vluxei32.v v28, (x6), v28
addi x6, x5, -1024
vluxei32.v v29, (x6), v29
vluxei32.v v30, (x5), v30
addi x6, x5, 1024
vluxei32.v v31, (x6), v31
vxor.vv v28, v28, v29
vxor.vv v30, v30, v31
vxor.vv \input, v28, v30
vxor.vv \input, \input, \key
.endm
/*
aesdec middle round
x5 = pointer to the aesdec LUT (tables at x5-2048, x5-1024, x5, x5+1024)
\input = input and return value
\key = round key
*/
.macro aesdec_soft input, key
vsetivli zero, 16, e8, m1, ta, ma
vrgather.vv v28, \input, v24
vrgather.vv v29, \input, v25
vrgather.vv v30, \input, v26
vrgather.vv v31, \input, v27
vsetivli zero, 4, e32, m1, ta, ma
vsll.vi v28, v28, 2
vsll.vi v29, v29, 2
vsll.vi v30, v30, 2
vsll.vi v31, v31, 2
addi x6, x5, -2048
vluxei32.v v28, (x6), v28
addi x6, x5, -1024
vluxei32.v v29, (x6), v29
vluxei32.v v30, (x5), v30
addi x6, x5, 1024
vluxei32.v v31, (x6), v31
vxor.vv v28, v28, v29
vxor.vv v30, v30, v31
vxor.vv \input, v28, v30
vxor.vv \input, \input, \key
.endm
DECL(randomx_riscv64_vector_program_main_loop_fe_mix_v2_soft_aes):
// save x5
vmv.s.x v16, x5
lla x5, randomx_aes_lut_enc_ptr
ld x5, (x5)
// f0 = aesenc(f0, e0), f0 = aesenc(f0, e1), f0 = aesenc(f0, e2), f0 = aesenc(f0, e3)
aesenc_soft v0, v4
aesenc_soft v0, v5
aesenc_soft v0, v6
aesenc_soft v0, v7
// f2 = aesenc(f2, e0), f2 = aesenc(f2, e1), f2 = aesenc(f2, e2), f2 = aesenc(f2, e3)
aesenc_soft v2, v4
aesenc_soft v2, v5
aesenc_soft v2, v6
aesenc_soft v2, v7
lla x5, randomx_aes_lut_dec_ptr
ld x5, (x5)
// f1 = aesdec(f1, e0), f1 = aesdec(f1, e1), f1 = aesdec(f1, e2), f1 = aesdec(f1, e3)
aesdec_soft v1, v4
aesdec_soft v1, v5
aesdec_soft v1, v6
aesdec_soft v1, v7
// f3 = aesdec(f3, e0), f3 = aesdec(f3, e1), f3 = aesdec(f3, e2), f3 = aesdec(f3, e3)
aesdec_soft v3, v4
aesdec_soft v3, v5
aesdec_soft v3, v6
aesdec_soft v3, v7
// Set vector registers back to 2x64 bit
vsetivli zero, 2, e64, m1, ta, ma
// restore x5
vmv.x.s x5, v16
j randomx_riscv64_vector_program_main_loop_fe_store
DECL(randomx_riscv64_vector_program_end):
DECL(randomx_riscv64_vector_code_end):


@@ -42,6 +42,8 @@ extern "C" {
struct randomx_cache;
void randomx_riscv64_vector_code_begin();
void randomx_riscv64_vector_sshash_begin();
void randomx_riscv64_vector_sshash_imul_rcp_literals();
void randomx_riscv64_vector_sshash_dataset_init(struct randomx_cache* cache, uint8_t* output_buf, uint32_t startBlock, uint32_t endBlock);
@@ -50,9 +52,29 @@ void randomx_riscv64_vector_sshash_generated_instructions();
void randomx_riscv64_vector_sshash_generated_instructions_end();
void randomx_riscv64_vector_sshash_cache_prefetch();
void randomx_riscv64_vector_sshash_xor();
void randomx_riscv64_vector_sshash_set_cache_index();
void randomx_riscv64_vector_sshash_end();
void randomx_riscv64_vector_program_params();
void randomx_riscv64_vector_program_imul_rcp_literals();
void randomx_riscv64_vector_program_begin();
void randomx_riscv64_vector_program_v2_soft_aes_init();
void randomx_riscv64_vector_program_main_loop();
void randomx_riscv64_vector_program_main_loop_instructions();
void randomx_riscv64_vector_program_main_loop_instructions_end();
void randomx_riscv64_vector_program_main_loop_mx_xor();
void randomx_riscv64_vector_program_main_loop_spaddr_xor();
void randomx_riscv64_vector_program_main_loop_fe_mix();
void randomx_riscv64_vector_program_main_loop_light_mode_data();
void randomx_riscv64_vector_program_main_loop_instructions_end_light_mode();
void randomx_riscv64_vector_program_main_loop_mx_xor_light_mode();
void randomx_riscv64_vector_program_end();
void randomx_riscv64_vector_program_scratchpad_prefetch();
void randomx_riscv64_vector_program_main_loop_fe_mix_v1();
void randomx_riscv64_vector_program_main_loop_fe_mix_v2_soft_aes();
void randomx_riscv64_vector_code_end();
#if defined(__cplusplus)
}
#endif


@@ -41,6 +41,7 @@ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#include "crypto/randomx/reciprocal.h"
#include "crypto/randomx/superscalar.hpp"
#include "crypto/randomx/virtual_memory.hpp"
#include "crypto/randomx/soft_aes.h"
#include "crypto/rx/Profiler.h"
#ifdef XMRIG_FIX_RYZEN
@@ -116,6 +117,7 @@ namespace randomx {
#define codeLoopLoadXOP ADDR(randomx_program_loop_load_xop)
#define codeProgramStart ADDR(randomx_program_start)
#define codeReadDataset ADDR(randomx_program_read_dataset)
#define codeReadDatasetV2 ADDR(randomx_program_read_dataset_v2)
#define codeReadDatasetLightSshInit ADDR(randomx_program_read_dataset_sshash_init)
#define codeReadDatasetLightSshFin ADDR(randomx_program_read_dataset_sshash_fin)
#define codeDatasetInit ADDR(randomx_dataset_init)
@@ -125,6 +127,8 @@ namespace randomx {
#define codeDatasetInitAVX2SshLoad ADDR(randomx_dataset_init_avx2_ssh_load)
#define codeDatasetInitAVX2SshPrefetch ADDR(randomx_dataset_init_avx2_ssh_prefetch)
#define codeLoopStore ADDR(randomx_program_loop_store)
#define codeLoopStoreHardAES ADDR(randomx_program_loop_store_hard_aes)
#define codeLoopStoreSoftAES ADDR(randomx_program_loop_store_soft_aes)
#define codeLoopEnd ADDR(randomx_program_loop_end)
#define codeEpilogue ADDR(randomx_program_epilogue)
#define codeProgramEnd ADDR(randomx_program_end)
@@ -136,10 +140,13 @@ namespace randomx {
#define prologueSize (codeLoopBegin - codePrologue)
#define loopLoadSize (codeLoopLoadXOP - codeLoopLoad)
#define loopLoadXOPSize (codeProgramStart - codeLoopLoadXOP)
#define readDatasetSize (codeReadDatasetLightSshInit - codeReadDataset)
#define readDatasetSize (codeReadDatasetV2 - codeReadDataset)
#define readDatasetV2Size (codeReadDatasetLightSshInit - codeReadDatasetV2)
#define readDatasetLightInitSize (codeReadDatasetLightSshFin - codeReadDatasetLightSshInit)
#define readDatasetLightFinSize (codeLoopStore - codeReadDatasetLightSshFin)
#define loopStoreSize (codeLoopEnd - codeLoopStore)
#define loopStoreSize (codeLoopStoreHardAES - codeLoopStore)
#define loopStoreHardAESSize (codeLoopStoreSoftAES - codeLoopStoreHardAES)
#define loopStoreSoftAESSize (codeLoopEnd - codeLoopStoreSoftAES)
#define datasetInitSize (codeDatasetInitAVX2Prologue - codeDatasetInit)
#define datasetInitAVX2PrologueSize (codeDatasetInitAVX2LoopEnd - codeDatasetInitAVX2Prologue)
#define datasetInitAVX2LoopEndSize (codeDatasetInitAVX2Epilogue - codeDatasetInitAVX2LoopEnd)
@@ -223,6 +230,8 @@ namespace randomx {
JitCompilerX86::JitCompilerX86(bool hugePagesEnable, bool optimizedInitDatasetEnable) {
BranchesWithin32B = xmrig::Cpu::info()->jccErratum();
hasAES = xmrig::Cpu::info()->hasAES();
hasAVX = xmrig::Cpu::info()->hasAVX();
hasAVX2 = xmrig::Cpu::info()->hasAVX2();
@@ -341,7 +350,14 @@ namespace randomx {
vm_flags = flags;
generateProgramPrologue(prog, pcfg);
emit(codeReadDataset, readDatasetSize, code, codePos);
if (RandomX_CurrentConfig.Tweak_V2_PREFETCH) {
emit(codeReadDatasetV2, readDatasetV2Size, code, codePos);
}
else {
emit(codeReadDataset, readDatasetSize, code, codePos);
}
generateProgramEpilogue(prog, pcfg);
}
@@ -424,8 +440,15 @@ namespace randomx {
void JitCompilerX86::generateProgramPrologue(Program& prog, ProgramConfiguration& pcfg) {
codePos = ADDR(randomx_program_prologue_first_load) - ADDR(randomx_program_prologue);
*(uint32_t*)(code + codePos + 4) = RandomX_CurrentConfig.ScratchpadL3Mask64_Calculated;
*(uint32_t*)(code + codePos + 14) = RandomX_CurrentConfig.ScratchpadL3Mask64_Calculated;
if (RandomX_CurrentConfig.Tweak_V2_AES && !hasAES) {
*(uint64_t*)(code + codePos + 9) = reinterpret_cast<uint64_t>(lutEnc);
*(uint64_t*)(code + codePos + 27) = reinterpret_cast<uint64_t>(lutDec);
}
*(uint32_t*)(code + codePos + 47) = RandomX_CurrentConfig.ScratchpadL3Mask64_Calculated;
*(uint32_t*)(code + codePos + 57) = RandomX_CurrentConfig.ScratchpadL3Mask64_Calculated;
if (hasAVX) {
uint32_t* p = (uint32_t*)(code + codePos + 61);
*p = (*p & 0xFF000000U) | 0x0077F8C5U; // vzeroupper
@@ -476,8 +499,21 @@ namespace randomx {
*(uint64_t*)(code + codePos) = 0xc03349c08b49ull + (static_cast<uint64_t>(pcfg.readReg0) << 16) + (static_cast<uint64_t>(pcfg.readReg1) << 40);
codePos += 6;
emit(RandomX_CurrentConfig.codePrefetchScratchpadTweaked, RandomX_CurrentConfig.codePrefetchScratchpadTweakedSize, code, codePos);
memcpy(code + codePos, codeLoopStore, loopStoreSize);
codePos += loopStoreSize;
if (RandomX_CurrentConfig.Tweak_V2_AES) {
if (hasAES) {
memcpy(code + codePos, codeLoopStoreHardAES, loopStoreHardAESSize);
codePos += loopStoreHardAESSize;
}
else {
memcpy(code + codePos, codeLoopStoreSoftAES, loopStoreSoftAESSize);
codePos += loopStoreSoftAESSize;
}
}
else {
memcpy(code + codePos, codeLoopStore, loopStoreSize);
codePos += loopStoreSize;
}
if (BranchesWithin32B) {
const uint32_t branch_begin = static_cast<uint32_t>(codePos);
@@ -1307,7 +1343,7 @@ namespace randomx {
uint8_t* const p = code;
int32_t t = prevCFROUND;
if (t > prevFPOperation) {
if ((t > prevFPOperation) && !RandomX_CurrentConfig.Tweak_V2_CFROUND) {
if (vm_flags & RANDOMX_FLAG_AMD) {
memcpy(p + t, NOP26, 26);
}
@@ -1326,14 +1362,38 @@ namespace randomx {
*(uint32_t*)(p + pos + 3) = 0x00C8C148 + (rotate << 24);
if (vm_flags & RANDOMX_FLAG_AMD) {
*(uint64_t*)(p + pos + 7) = 0x742024443B0CE083ULL;
*(uint64_t*)(p + pos + 15) = 0x8900EB0414AE0F0AULL;
*(uint32_t*)(p + pos + 23) = 0x202444;
pos += 26;
if (RandomX_CurrentConfig.Tweak_V2_CFROUND) {
*(uint32_t*)(p + pos + 7) = 0x1375F0A8;
pos += 11;
}
else {
pos += 7;
}
*(uint64_t*)(p + pos) = 0x742024443B0CE083ULL;
*(uint64_t*)(p + pos + 8) = 0x8900EB0414AE0F0AULL;
*(uint32_t*)(p + pos + 16) = 0x202444;
pos += 19;
}
else {
*(uint64_t*)(p + pos + 7) = 0x0414AE0F0CE083ULL;
pos += 14;
pos += 7;
if (RandomX_CurrentConfig.Tweak_V2_CFROUND) {
if (BranchesWithin32B) {
const uint32_t branch_begin = pos & 31;
// If the jump crosses or touches 32-byte boundary, align it
if (branch_begin >= 28) {
const uint32_t alignment_size = 32 - branch_begin;
emit(NOPX[alignment_size - 1], alignment_size, code, pos);
}
}
*(uint32_t*)(p + pos) = 0x0775F0A8;
pos += 4;
}
*(uint64_t*)(p + pos) = 0x0414AE0F0CE083ULL;
pos += 7;
}
codePos = pos;
@@ -1343,7 +1403,7 @@ namespace randomx {
uint8_t* const p = code;
int32_t t = prevCFROUND;
if (t > prevFPOperation) {
if ((t > prevFPOperation) && !RandomX_CurrentConfig.Tweak_V2_CFROUND){
if (vm_flags & RANDOMX_FLAG_AMD) {
memcpy(p + t, NOP25, 25);
}
@@ -1361,14 +1421,38 @@ namespace randomx {
*(uint64_t*)(p + pos) = 0xC0F0FBC3C4ULL | (src << 32) | (rotate << 40);
if (vm_flags & RANDOMX_FLAG_AMD) {
*(uint64_t*)(p + pos + 6) = 0x742024443B0CE083ULL;
*(uint64_t*)(p + pos + 14) = 0x8900EB0414AE0F0AULL;
*(uint32_t*)(p + pos + 22) = 0x202444;
pos += 25;
if (RandomX_CurrentConfig.Tweak_V2_CFROUND) {
*(uint32_t*)(p + pos + 6) = 0x1375F0A8;
pos += 10;
}
else {
pos += 6;
}
*(uint64_t*)(p + pos) = 0x742024443B0CE083ULL;
*(uint64_t*)(p + pos + 8) = 0x8900EB0414AE0F0AULL;
*(uint32_t*)(p + pos + 16) = 0x202444;
pos += 19;
}
else {
*(uint64_t*)(p + pos + 6) = 0x0414AE0F0CE083ULL;
pos += 13;
pos += 6;
if (RandomX_CurrentConfig.Tweak_V2_CFROUND) {
if (BranchesWithin32B) {
const uint32_t branch_begin = pos & 31;
// If the jump crosses or touches 32-byte boundary, align it
if (branch_begin >= 28) {
const uint32_t alignment_size = 32 - branch_begin;
emit(NOPX[alignment_size - 1], alignment_size, code, pos);
}
}
*(uint32_t*)(p + pos) = 0x0775F0A8;
pos += 4;
}
*(uint64_t*)(p + pos) = 0x0414AE0F0CE083ULL;
pos += 7;
}
codePos = pos;


@@ -97,6 +97,7 @@ namespace randomx {
# endif
bool BranchesWithin32B = false;
bool hasAES;
bool hasAVX;
bool hasAVX2;
bool initDatasetAVX2;


@@ -48,9 +48,12 @@
.global DECL(randomx_program_loop_load_xop)
.global DECL(randomx_program_start)
.global DECL(randomx_program_read_dataset)
.global DECL(randomx_program_read_dataset_v2)
.global DECL(randomx_program_read_dataset_sshash_init)
.global DECL(randomx_program_read_dataset_sshash_fin)
.global DECL(randomx_program_loop_store)
.global DECL(randomx_program_loop_store_hard_aes)
.global DECL(randomx_program_loop_store_soft_aes)
.global DECL(randomx_program_loop_end)
.global DECL(randomx_dataset_init)
.global DECL(randomx_dataset_init_avx2_prologue)
@@ -101,19 +104,23 @@ DECL(randomx_program_prologue):
movapd xmm15, xmmword ptr [scaleMask+rip]
DECL(randomx_program_prologue_first_load):
sub rsp, 248
mov rdx, 0x1111111111111111
mov [rsp+232], rdx ;# aes_lut_enc
mov rdx, 0x1111111111111111
mov [rsp+240], rdx ;# aes_lut_dec
mov rdx, rax
and eax, RANDOMX_SCRATCHPAD_MASK
ror rdx, 32
and edx, RANDOMX_SCRATCHPAD_MASK
sub rsp, 40
nop
nop
nop
mov dword ptr [rsp], 0x9FC0
mov dword ptr [rsp+4], 0xBFC0
mov dword ptr [rsp+8], 0xDFC0
mov dword ptr [rsp+12], 0xFFC0
mov dword ptr [rsp+32], -1
nop
nop
nop
jmp DECL(randomx_program_imul_rcp_store)
.balign 64
@@ -139,6 +146,9 @@ DECL(randomx_program_start):
DECL(randomx_program_read_dataset):
#include "asm/program_read_dataset.inc"
DECL(randomx_program_read_dataset_v2):
#include "asm/program_read_dataset_v2.inc"
DECL(randomx_program_read_dataset_sshash_init):
#include "asm/program_read_dataset_sshash_init.inc"
@@ -148,6 +158,12 @@ DECL(randomx_program_read_dataset_sshash_fin):
DECL(randomx_program_loop_store):
#include "asm/program_loop_store.inc"
DECL(randomx_program_loop_store_hard_aes):
#include "asm/program_loop_store_hard_aes.inc"
DECL(randomx_program_loop_store_soft_aes):
#include "asm/program_loop_store_soft_aes.inc"
DECL(randomx_program_loop_end):
nop


@@ -39,6 +39,7 @@ PUBLIC randomx_program_loop_load
PUBLIC randomx_program_loop_load_xop
PUBLIC randomx_program_start
PUBLIC randomx_program_read_dataset
PUBLIC randomx_program_read_dataset_v2
PUBLIC randomx_program_read_dataset_sshash_init
PUBLIC randomx_program_read_dataset_sshash_fin
PUBLIC randomx_dataset_init
@@ -48,6 +49,8 @@ PUBLIC randomx_dataset_init_avx2_epilogue
PUBLIC randomx_dataset_init_avx2_ssh_load
PUBLIC randomx_dataset_init_avx2_ssh_prefetch
PUBLIC randomx_program_loop_store
PUBLIC randomx_program_loop_store_hard_aes
PUBLIC randomx_program_loop_store_soft_aes
PUBLIC randomx_program_loop_end
PUBLIC randomx_program_epilogue
PUBLIC randomx_sshash_load
@@ -90,19 +93,23 @@ randomx_program_prologue PROC
randomx_program_prologue ENDP
randomx_program_prologue_first_load PROC
sub rsp, 248
mov rdx, 01111111111111111h
mov [rsp+232], rdx ;# aes_lut_enc
mov rdx, 01111111111111111h
mov [rsp+240], rdx ;# aes_lut_dec
mov rdx, rax
and eax, RANDOMX_SCRATCHPAD_MASK
ror rdx, 32
and edx, RANDOMX_SCRATCHPAD_MASK
sub rsp, 40
nop
nop
nop
mov dword ptr [rsp], 9FC0h
mov dword ptr [rsp+4], 0BFC0h
mov dword ptr [rsp+8], 0DFC0h
mov dword ptr [rsp+12], 0FFC0h
mov dword ptr [rsp+32], -1
nop
nop
nop
jmp randomx_program_imul_rcp_store
randomx_program_prologue_first_load ENDP
@@ -135,6 +142,10 @@ randomx_program_read_dataset PROC
include asm/program_read_dataset.inc
randomx_program_read_dataset ENDP
randomx_program_read_dataset_v2 PROC
include asm/program_read_dataset_v2.inc
randomx_program_read_dataset_v2 ENDP
randomx_program_read_dataset_sshash_init PROC
include asm/program_read_dataset_sshash_init.inc
randomx_program_read_dataset_sshash_init ENDP
@@ -147,6 +158,14 @@ randomx_program_loop_store PROC
include asm/program_loop_store.inc
randomx_program_loop_store ENDP
randomx_program_loop_store_hard_aes PROC
include asm/program_loop_store_hard_aes.inc
randomx_program_loop_store_hard_aes ENDP
randomx_program_loop_store_soft_aes PROC
include asm/program_loop_store_soft_aes.inc
randomx_program_loop_store_soft_aes ENDP
randomx_program_loop_end PROC
nop
randomx_program_loop_end ENDP


@@ -40,9 +40,12 @@ extern "C" {
void randomx_program_loop_load_xop();
void randomx_program_start();
void randomx_program_read_dataset();
void randomx_program_read_dataset_v2();
void randomx_program_read_dataset_sshash_init();
void randomx_program_read_dataset_sshash_fin();
void randomx_program_loop_store();
void randomx_program_loop_store_hard_aes();
void randomx_program_loop_store_soft_aes();
void randomx_program_loop_end();
void randomx_dataset_init();
void randomx_dataset_init_avx2_prologue();


@@ -50,6 +50,17 @@ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#include <cassert>
#include "crypto/rx/Profiler.h"
#include "base/net/stratum/Job.h"
RandomX_ConfigurationMoneroV2::RandomX_ConfigurationMoneroV2()
{
ProgramSize = 384;
Tweak_V2_CFROUND = 1;
Tweak_V2_AES = 1;
Tweak_V2_PREFETCH = 1;
Tweak_V2_COMMITMENT = 1;
}
RandomX_ConfigurationWownero::RandomX_ConfigurationWownero()
{
@@ -150,6 +161,10 @@ RandomX_ConfigurationBase::RandomX_ConfigurationBase()
, RANDOMX_FREQ_CFROUND(1)
, RANDOMX_FREQ_ISTORE(16)
, RANDOMX_FREQ_NOP(0)
, Tweak_V2_CFROUND(0)
, Tweak_V2_AES(0)
, Tweak_V2_PREFETCH(0)
, Tweak_V2_COMMITMENT(0)
{
fillAes4Rx4_Key[0] = rx_set_int_vec_i128(0x99e5d23f, 0x2f546d2b, 0xd1833ddb, 0x6421aadd);
fillAes4Rx4_Key[1] = rx_set_int_vec_i128(0xa5dfcde5, 0x06f79d53, 0xb6913f55, 0xb20e3450);
@@ -282,7 +297,10 @@ typedef void(randomx::JitCompilerX86::* InstructionGeneratorX86_2)(const randomx
Log2_ScratchpadL2 = Log2(ScratchpadL2_Size);
Log2_ScratchpadL3 = Log2(ScratchpadL3_Size);
#define JIT_HANDLE(x, prev) randomx::JitCompilerRV64::engine[k] = &randomx::JitCompilerRV64::v1_##x
#define JIT_HANDLE(x, prev) do { \
randomx::JitCompilerRV64::engine[k] = &randomx::JitCompilerRV64::v1_##x; \
randomx::JitCompilerRV64::inst_map[k] = static_cast<uint8_t>(randomx::InstructionType::x); \
} while (0)
#else
#define JIT_HANDLE(x, prev)
@@ -364,6 +382,7 @@ typedef void(randomx::JitCompilerX86::* InstructionGeneratorX86_2)(const randomx
}
RandomX_ConfigurationMonero RandomX_MoneroConfig;
RandomX_ConfigurationMoneroV2 RandomX_MoneroConfigV2;
RandomX_ConfigurationWownero RandomX_WowneroConfig;
RandomX_ConfigurationArqma RandomX_ArqmaConfig;
RandomX_ConfigurationGraft RandomX_GraftConfig;
@@ -611,4 +630,11 @@ extern "C" {
machine->hashAndFill(output, tempHash);
}
void randomx_calculate_commitment(const void* input, size_t inputSize, const void* hash_in, void* com_out) {
uint8_t buf[xmrig::Job::kMaxBlobSize + RANDOMX_HASH_SIZE];
memcpy(buf, input, inputSize);
memcpy(buf + inputSize, hash_in, RANDOMX_HASH_SIZE);
rx_blake2b_wrapper::run(com_out, RANDOMX_HASH_SIZE, buf, inputSize + RANDOMX_HASH_SIZE);
}
}


@@ -125,6 +125,11 @@ struct RandomX_ConfigurationBase
rx_vec_i128 fillAes4Rx4_Key[8];
uint32_t Tweak_V2_CFROUND : 1;
uint32_t Tweak_V2_AES : 1;
uint32_t Tweak_V2_PREFETCH : 1;
uint32_t Tweak_V2_COMMITMENT : 1;
uint8_t codeSshPrefetchTweaked[20];
uint8_t codePrefetchScratchpadTweaked[28];
uint32_t codePrefetchScratchpadTweakedSize;
@@ -143,6 +148,7 @@ struct RandomX_ConfigurationBase
};
struct RandomX_ConfigurationMonero : public RandomX_ConfigurationBase {};
struct RandomX_ConfigurationMoneroV2 : public RandomX_ConfigurationBase { RandomX_ConfigurationMoneroV2(); };
struct RandomX_ConfigurationWownero : public RandomX_ConfigurationBase { RandomX_ConfigurationWownero(); };
struct RandomX_ConfigurationArqma : public RandomX_ConfigurationBase { RandomX_ConfigurationArqma(); };
struct RandomX_ConfigurationGraft : public RandomX_ConfigurationBase { RandomX_ConfigurationGraft(); };
@@ -150,6 +156,7 @@ struct RandomX_ConfigurationSafex : public RandomX_ConfigurationBase { RandomX_C
struct RandomX_ConfigurationYada : public RandomX_ConfigurationBase { RandomX_ConfigurationYada(); };
extern RandomX_ConfigurationMonero RandomX_MoneroConfig;
extern RandomX_ConfigurationMoneroV2 RandomX_MoneroConfigV2;
extern RandomX_ConfigurationWownero RandomX_WowneroConfig;
extern RandomX_ConfigurationArqma RandomX_ArqmaConfig;
extern RandomX_ConfigurationGraft RandomX_GraftConfig;
@@ -231,7 +238,7 @@ RANDOMX_EXPORT unsigned long randomx_dataset_item_count(void);
*
* @param dataset is a pointer to a previously allocated randomx_dataset structure. Must not be NULL.
* @param cache is a pointer to a previously allocated and initialized randomx_cache structure. Must not be NULL.
* @param startItem is the item number where intialization should start.
* @param startItem is the item number where initialization should start.
* @param itemCount is the number of items that should be initialized.
*/
RANDOMX_EXPORT void randomx_init_dataset(randomx_dataset *dataset, randomx_cache *cache, unsigned long startItem, unsigned long itemCount);
@@ -318,6 +325,17 @@ RANDOMX_EXPORT void randomx_calculate_hash(randomx_vm *machine, const void *inpu
RANDOMX_EXPORT void randomx_calculate_hash_first(randomx_vm* machine, uint64_t (&tempHash)[8], const void* input, size_t inputSize);
RANDOMX_EXPORT void randomx_calculate_hash_next(randomx_vm* machine, uint64_t (&tempHash)[8], const void* nextInput, size_t nextInputSize, void* output);
/**
* Calculate a RandomX commitment from a RandomX hash and its input.
*
* @param input is a pointer to memory that was hashed. Must not be NULL.
* @param inputSize is the number of bytes in the input.
* @param hash_in is the output from randomx_calculate_hash* (RANDOMX_HASH_SIZE bytes).
* @param com_out is a pointer to memory where the commitment will be stored. Must not
* be NULL and at least RANDOMX_HASH_SIZE bytes must be available for writing.
*/
RANDOMX_EXPORT void randomx_calculate_commitment(const void* input, size_t inputSize, const void* hash_in, void* com_out);
#if defined(__cplusplus)
}
#endif


@@ -29,15 +29,8 @@ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#include "crypto/randomx/soft_aes.h"
alignas(64) uint32_t lutEnc0[256];
alignas(64) uint32_t lutEnc1[256];
alignas(64) uint32_t lutEnc2[256];
alignas(64) uint32_t lutEnc3[256];
alignas(64) uint32_t lutDec0[256];
alignas(64) uint32_t lutDec1[256];
alignas(64) uint32_t lutDec2[256];
alignas(64) uint32_t lutDec3[256];
alignas(64) uint32_t lutEnc[4][256];
alignas(64) uint32_t lutDec[4][256];
alignas(64) uint8_t lutEncIndex[4][32];
alignas(64) uint8_t lutDecIndex[4][32];
@@ -102,10 +95,10 @@ static struct SAESInitializer
p[2] = s;
p[3] = mul_gf2(s, 3);
lutEnc0[i] = w; w = (w << 8) | (w >> 24);
lutEnc1[i] = w; w = (w << 8) | (w >> 24);
lutEnc2[i] = w; w = (w << 8) | (w >> 24);
lutEnc3[i] = w;
lutEnc[0][i] = w; w = (w << 8) | (w >> 24);
lutEnc[1][i] = w; w = (w << 8) | (w >> 24);
lutEnc[2][i] = w; w = (w << 8) | (w >> 24);
lutEnc[3][i] = w;
s = sbox_reverse[i];
p[0] = mul_gf2(s, 0xe);
@@ -113,10 +106,10 @@ static struct SAESInitializer
p[2] = mul_gf2(s, 0xd);
p[3] = mul_gf2(s, 0xb);
lutDec0[i] = w; w = (w << 8) | (w >> 24);
lutDec1[i] = w; w = (w << 8) | (w >> 24);
lutDec2[i] = w; w = (w << 8) | (w >> 24);
lutDec3[i] = w;
lutDec[0][i] = w; w = (w << 8) | (w >> 24);
lutDec[1][i] = w; w = (w << 8) | (w >> 24);
lutDec[2][i] = w; w = (w << 8) | (w >> 24);
lutDec[3][i] = w;
}
memset(lutEncIndex, -1, sizeof(lutEncIndex));


@@ -32,14 +32,8 @@ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#include <stdint.h>
#include "crypto/randomx/intrin_portable.h"
extern uint32_t lutEnc0[256];
extern uint32_t lutEnc1[256];
extern uint32_t lutEnc2[256];
extern uint32_t lutEnc3[256];
extern uint32_t lutDec0[256];
extern uint32_t lutDec1[256];
extern uint32_t lutDec2[256];
extern uint32_t lutDec3[256];
extern uint32_t lutEnc[4][256];
extern uint32_t lutDec[4][256];
extern uint8_t lutEncIndex[4][32];
extern uint8_t lutDecIndex[4][32];
@@ -52,25 +46,25 @@ FORCE_INLINE rx_vec_i128 aesenc<1>(rx_vec_i128 in, rx_vec_i128 key) {
volatile uint8_t s[16];
memcpy((void*) s, &in, 16);
uint32_t s0 = lutEnc0[s[ 0]];
uint32_t s1 = lutEnc0[s[ 4]];
uint32_t s2 = lutEnc0[s[ 8]];
uint32_t s3 = lutEnc0[s[12]];
uint32_t s0 = lutEnc[0][s[ 0]];
uint32_t s1 = lutEnc[0][s[ 4]];
uint32_t s2 = lutEnc[0][s[ 8]];
uint32_t s3 = lutEnc[0][s[12]];
s0 ^= lutEnc1[s[ 5]];
s1 ^= lutEnc1[s[ 9]];
s2 ^= lutEnc1[s[13]];
s3 ^= lutEnc1[s[ 1]];
s0 ^= lutEnc[1][s[ 5]];
s1 ^= lutEnc[1][s[ 9]];
s2 ^= lutEnc[1][s[13]];
s3 ^= lutEnc[1][s[ 1]];
s0 ^= lutEnc2[s[10]];
s1 ^= lutEnc2[s[14]];
s2 ^= lutEnc2[s[ 2]];
s3 ^= lutEnc2[s[ 6]];
s0 ^= lutEnc[2][s[10]];
s1 ^= lutEnc[2][s[14]];
s2 ^= lutEnc[2][s[ 2]];
s3 ^= lutEnc[2][s[ 6]];
s0 ^= lutEnc3[s[15]];
s1 ^= lutEnc3[s[ 3]];
s2 ^= lutEnc3[s[ 7]];
s3 ^= lutEnc3[s[11]];
s0 ^= lutEnc[3][s[15]];
s1 ^= lutEnc[3][s[ 3]];
s2 ^= lutEnc[3][s[ 7]];
s3 ^= lutEnc[3][s[11]];
return rx_xor_vec_i128(rx_set_int_vec_i128(s3, s2, s1, s0), key);
}
@@ -80,25 +74,25 @@ FORCE_INLINE rx_vec_i128 aesdec<1>(rx_vec_i128 in, rx_vec_i128 key) {
volatile uint8_t s[16];
memcpy((void*) s, &in, 16);
uint32_t s0 = lutDec0[s[ 0]];
uint32_t s1 = lutDec0[s[ 4]];
uint32_t s2 = lutDec0[s[ 8]];
uint32_t s3 = lutDec0[s[12]];
uint32_t s0 = lutDec[0][s[ 0]];
uint32_t s1 = lutDec[0][s[ 4]];
uint32_t s2 = lutDec[0][s[ 8]];
uint32_t s3 = lutDec[0][s[12]];
s0 ^= lutDec1[s[13]];
s1 ^= lutDec1[s[ 1]];
s2 ^= lutDec1[s[ 5]];
s3 ^= lutDec1[s[ 9]];
s0 ^= lutDec[1][s[13]];
s1 ^= lutDec[1][s[ 1]];
s2 ^= lutDec[1][s[ 5]];
s3 ^= lutDec[1][s[ 9]];
s0 ^= lutDec2[s[10]];
s1 ^= lutDec2[s[14]];
s2 ^= lutDec2[s[ 2]];
s3 ^= lutDec2[s[ 6]];
s0 ^= lutDec[2][s[10]];
s1 ^= lutDec[2][s[14]];
s2 ^= lutDec[2][s[ 2]];
s3 ^= lutDec[2][s[ 6]];
s0 ^= lutDec3[s[ 7]];
s1 ^= lutDec3[s[11]];
s2 ^= lutDec3[s[15]];
s3 ^= lutDec3[s[ 3]];
s0 ^= lutDec[3][s[ 7]];
s1 ^= lutDec[3][s[11]];
s2 ^= lutDec[3][s[15]];
s3 ^= lutDec[3][s[ 3]];
return rx_xor_vec_i128(rx_set_int_vec_i128(s3, s2, s1, s0), key);
}
@@ -113,10 +107,10 @@ FORCE_INLINE rx_vec_i128 aesenc<2>(rx_vec_i128 in, rx_vec_i128 key) {
s3 = rx_vec_i128_x(in);
rx_vec_i128 out = rx_set_int_vec_i128(
(lutEnc0[s0 & 0xff] ^ lutEnc1[(s3 >> 8) & 0xff] ^ lutEnc2[(s2 >> 16) & 0xff] ^ lutEnc3[s1 >> 24]),
(lutEnc0[s1 & 0xff] ^ lutEnc1[(s0 >> 8) & 0xff] ^ lutEnc2[(s3 >> 16) & 0xff] ^ lutEnc3[s2 >> 24]),
(lutEnc0[s2 & 0xff] ^ lutEnc1[(s1 >> 8) & 0xff] ^ lutEnc2[(s0 >> 16) & 0xff] ^ lutEnc3[s3 >> 24]),
(lutEnc0[s3 & 0xff] ^ lutEnc1[(s2 >> 8) & 0xff] ^ lutEnc2[(s1 >> 16) & 0xff] ^ lutEnc3[s0 >> 24])
(lutEnc[0][s0 & 0xff] ^ lutEnc[1][(s3 >> 8) & 0xff] ^ lutEnc[2][(s2 >> 16) & 0xff] ^ lutEnc[3][s1 >> 24]),
(lutEnc[0][s1 & 0xff] ^ lutEnc[1][(s0 >> 8) & 0xff] ^ lutEnc[2][(s3 >> 16) & 0xff] ^ lutEnc[3][s2 >> 24]),
(lutEnc[0][s2 & 0xff] ^ lutEnc[1][(s1 >> 8) & 0xff] ^ lutEnc[2][(s0 >> 16) & 0xff] ^ lutEnc[3][s3 >> 24]),
(lutEnc[0][s3 & 0xff] ^ lutEnc[1][(s2 >> 8) & 0xff] ^ lutEnc[2][(s1 >> 16) & 0xff] ^ lutEnc[3][s0 >> 24])
);
return rx_xor_vec_i128(out, key);
@@ -132,10 +126,10 @@ FORCE_INLINE rx_vec_i128 aesdec<2>(rx_vec_i128 in, rx_vec_i128 key) {
s3 = rx_vec_i128_x(in);
rx_vec_i128 out = rx_set_int_vec_i128(
-(lutDec0[s0 & 0xff] ^ lutDec1[(s1 >> 8) & 0xff] ^ lutDec2[(s2 >> 16) & 0xff] ^ lutDec3[s3 >> 24]),
-(lutDec0[s1 & 0xff] ^ lutDec1[(s2 >> 8) & 0xff] ^ lutDec2[(s3 >> 16) & 0xff] ^ lutDec3[s0 >> 24]),
-(lutDec0[s2 & 0xff] ^ lutDec1[(s3 >> 8) & 0xff] ^ lutDec2[(s0 >> 16) & 0xff] ^ lutDec3[s1 >> 24]),
-(lutDec0[s3 & 0xff] ^ lutDec1[(s0 >> 8) & 0xff] ^ lutDec2[(s1 >> 16) & 0xff] ^ lutDec3[s2 >> 24])
+(lutDec[0][s0 & 0xff] ^ lutDec[1][(s1 >> 8) & 0xff] ^ lutDec[2][(s2 >> 16) & 0xff] ^ lutDec[3][s3 >> 24]),
+(lutDec[0][s1 & 0xff] ^ lutDec[1][(s2 >> 8) & 0xff] ^ lutDec[2][(s3 >> 16) & 0xff] ^ lutDec[3][s0 >> 24]),
+(lutDec[0][s2 & 0xff] ^ lutDec[1][(s3 >> 8) & 0xff] ^ lutDec[2][(s0 >> 16) & 0xff] ^ lutDec[3][s1 >> 24]),
+(lutDec[0][s3 & 0xff] ^ lutDec[1][(s0 >> 8) & 0xff] ^ lutDec[2][(s1 >> 16) & 0xff] ^ lutDec[3][s2 >> 24])
);
return rx_xor_vec_i128(out, key);
@@ -150,32 +144,3 @@ template<>
FORCE_INLINE rx_vec_i128 aesdec<0>(rx_vec_i128 in, rx_vec_i128 key) {
return rx_aesdec_vec_i128(in, key);
}
-#if defined(XMRIG_RISCV) && defined(XMRIG_RVV_ENABLED)
-#include <riscv_vector.h>
-FORCE_INLINE vuint32m1_t softaes_vector_double(
-vuint32m1_t in,
-vuint32m1_t key,
-vuint8m1_t i0, vuint8m1_t i1, vuint8m1_t i2, vuint8m1_t i3,
-const uint32_t* lut0, const uint32_t* lut1, const uint32_t *lut2, const uint32_t* lut3)
-{
-const vuint8m1_t in8 = __riscv_vreinterpret_v_u32m1_u8m1(in);
-const vuint32m1_t index0 = __riscv_vreinterpret_v_u8m1_u32m1(__riscv_vrgather_vv_u8m1(in8, i0, 32));
-const vuint32m1_t index1 = __riscv_vreinterpret_v_u8m1_u32m1(__riscv_vrgather_vv_u8m1(in8, i1, 32));
-const vuint32m1_t index2 = __riscv_vreinterpret_v_u8m1_u32m1(__riscv_vrgather_vv_u8m1(in8, i2, 32));
-const vuint32m1_t index3 = __riscv_vreinterpret_v_u8m1_u32m1(__riscv_vrgather_vv_u8m1(in8, i3, 32));
-vuint32m1_t s0 = __riscv_vluxei32_v_u32m1(lut0, __riscv_vsll_vx_u32m1(index0, 2, 8), 8);
-vuint32m1_t s1 = __riscv_vluxei32_v_u32m1(lut1, __riscv_vsll_vx_u32m1(index1, 2, 8), 8);
-vuint32m1_t s2 = __riscv_vluxei32_v_u32m1(lut2, __riscv_vsll_vx_u32m1(index2, 2, 8), 8);
-vuint32m1_t s3 = __riscv_vluxei32_v_u32m1(lut3, __riscv_vsll_vx_u32m1(index3, 2, 8), 8);
-s0 = __riscv_vxor_vv_u32m1(s0, s1, 8);
-s2 = __riscv_vxor_vv_u32m1(s2, s3, 8);
-s0 = __riscv_vxor_vv_u32m1(s0, s2, 8);
-return __riscv_vxor_vv_u32m1(s0, key, 8);
-}
-#endif // defined(XMRIG_RISCV) && defined(XMRIG_RVV_ENABLED)


@@ -1,12 +1,10 @@
-/* RISC-V - test if the vector extension and prefetch instruction are present */
+/* RISC-V - test if the vector extension is present */
.text
-.option arch, rv64gcv_zicbop
+.option arch, rv64gcv
.global main
main:
lla x5, main
-prefetch.r (x5)
li x5, 4
vsetvli x6, x5, e64, m1, ta, ma
vxor.vv v0, v0, v0


@@ -0,0 +1,11 @@
+/* RISC-V - test if the prefetch instruction is present */
+.text
+.option arch, rv64gc_zicbop
+.global main
+main:
+lla x5, main
+prefetch.r (x5)
+mv x10, x0
+ret


@@ -0,0 +1,13 @@
+/* RISC-V - test if the vector bit manipulation extension is present */
+.text
+.option arch, rv64gcv_zvkb
+.global main
+main:
+vsetivli zero, 8, e32, m1, ta, ma
+vror.vv v0, v0, v0
+vror.vx v0, v0, x5
+vror.vi v0, v0, 1
+li x10, 0
+ret


@@ -0,0 +1,12 @@
+/* RISC-V - test if the vector AES (Zvkned) extension is present */
+.text
+.option arch, rv64gcv_zvkned
+.global main
+main:
+vsetivli zero, 8, e32, m1, ta, ma
+vaesem.vv v0, v0
+vaesdm.vv v0, v0
+li x10, 0
+ret


@@ -58,9 +58,20 @@ namespace randomx {
void CompiledVm<softAes>::execute() {
PROFILE_SCOPE(RandomX_JIT_execute);
-# ifdef XMRIG_ARM
+# if defined(XMRIG_ARM) || defined(XMRIG_RISCV)
memcpy(reg.f, config.eMask, sizeof(config.eMask));
# endif
+const uint8_t* p = mem.memory;
+// dataset prefetch for the first iteration of the main loop
+rx_prefetch_nta(p + (mem.ma & (RandomX_ConfigurationBase::DatasetBaseSize - 64)));
+// dataset prefetch for the second iteration of the main loop (RandomX v2)
+if (RandomX_CurrentConfig.Tweak_V2_PREFETCH) {
+rx_prefetch_nta(p + (mem.mx & (RandomX_ConfigurationBase::DatasetBaseSize - 64)));
+}
compiler.getProgramFunc()(reg, mem, scratchpad, RandomX_CurrentConfig.ProgramIterations);
}
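The warm-up prefetch added above can be sketched in isolation. `kDatasetSize`, `warmupPrefetch`, and the `v2` flag are illustrative stand-ins (the real code uses `DatasetBaseSize`, `rx_prefetch_nta`, and `Tweak_V2_PREFETCH`):

```cpp
#include <cstdint>

// Stand-in size; the real RandomX dataset base size is 2 GiB (a power of
// two, which is what makes the "- 64" mask below work).
constexpr uint64_t kDatasetSize = 1ULL << 20;
constexpr uint64_t kLineMask    = kDatasetSize - 64; // 64-byte line aligned

// Warm the cache for the first one or two iterations of the main loop
// before the JIT-compiled program starts: ma indexes the first dataset
// read, mx the second (only prefetched under the v2 tweak).
inline void warmupPrefetch(const uint8_t* dataset, uint64_t ma, uint64_t mx, bool v2)
{
    __builtin_prefetch(dataset + (ma & kLineMask), 0, 0);     // iteration 1
    if (v2) {
        __builtin_prefetch(dataset + (mx & kLineMask), 0, 0); // iteration 2
    }
}
```

Since `kDatasetSize` is a power of two, `addr & (kDatasetSize - 64)` both wraps the offset into the dataset and drops the low 6 bits, yielding a 64-byte-aligned line address in one mask.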


@@ -77,10 +77,13 @@ namespace randomx {
executeBytecode(bytecode, scratchpad, config);
-mem.mx ^= nreg.r[config.readReg2] ^ nreg.r[config.readReg3];
-mem.mx &= CacheLineAlignMask;
-datasetPrefetch(datasetOffset + mem.mx);
-datasetRead(datasetOffset + mem.ma, nreg.r);
+const uint64_t readPtr = datasetOffset + (mem.ma & CacheLineAlignMask);
+auto& mp = RandomX_CurrentConfig.Tweak_V2_PREFETCH ? mem.ma : mem.mx;
+mp ^= nreg.r[config.readReg2] ^ nreg.r[config.readReg3];
+datasetPrefetch(datasetOffset + (mp & CacheLineAlignMask));
+datasetRead(readPtr, nreg.r);
std::swap(mem.mx, mem.ma);
for (unsigned i = 0; i < RegistersCount; ++i)

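The register-mixing change above can be reduced to a small sketch. `Mem`, `mixPrefetchAddress`, and the mask value are stand-ins for the interpreter's `mem`, the inline logic, and `CacheLineAlignMask`:

```cpp
#include <cstdint>

// Stand-in constants; the real mask is derived from DatasetBaseSize.
constexpr uint64_t kDatasetSize       = 1ULL << 20;
constexpr uint64_t kCacheLineAlignMask = (kDatasetSize - 1) & ~63ULL;

struct Mem { uint64_t ma, mx; };

// v1 folds the XOR of the two read registers into mx, the address that will
// be prefetched for the next iteration. The v2 tweak folds it into ma
// instead, so (after the ma/mx swap at the end of the iteration) the
// prefetch lands one iteration further ahead of the dataset read.
inline uint64_t mixPrefetchAddress(Mem& mem, uint64_t r2, uint64_t r3, bool v2)
{
    uint64_t& mp = v2 ? mem.ma : mem.mx;
    mp ^= r2 ^ r3;
    return mp & kCacheLineAlignMask; // line-aligned prefetch address
}
```

Note that the read address is captured from `mem.ma` before the mix, so updating `ma` under v2 never changes the current iteration's read.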

@@ -32,6 +32,9 @@ xmrig::Algorithm::Id xmrig::RxAlgo::apply(Algorithm::Id algorithm)
const RandomX_ConfigurationBase *xmrig::RxAlgo::base(Algorithm::Id algorithm)
{
switch (algorithm) {
+case Algorithm::RX_V2:
+return &RandomX_MoneroConfigV2;
case Algorithm::RX_WOW:
return &RandomX_WowneroConfig;


@@ -17,6 +17,7 @@
*/
#include "crypto/rx/RxConfig.h"
+#include "crypto/randomx/randomx.h"
#include "3rdparty/rapidjson/document.h"
#include "backend/cpu/Cpu.h"
#include "base/io/json/Json.h"
@@ -25,6 +26,7 @@
#include <array>
#include <algorithm>
#include <cmath>
+#include <uv.h>
#ifdef _MSC_VER
@@ -183,11 +185,20 @@ rapidjson::Value xmrig::RxConfig::toJSON(rapidjson::Document &doc) const
#ifdef XMRIG_FEATURE_HWLOC
std::vector<uint32_t> xmrig::RxConfig::nodeset() const
{
+auto info = Cpu::info();
+constexpr uint64_t dataset_mem = RandomX_ConfigurationBase::DatasetBaseSize + RandomX_ConfigurationBase::DatasetExtraSize;
+constexpr uint64_t cache_mem = RandomX_ConfigurationBase::ArgonMemory * 1024;
+const uint64_t threads_mem = info->threads() << 21;
+const uint64_t freem_mem = uv_get_free_memory();
if (!m_nodeset.empty()) {
-return m_nodeset;
+return (freem_mem > m_nodeset.size() * dataset_mem + cache_mem + threads_mem) ? m_nodeset : std::vector<uint32_t>();
}
-return (m_numa && Cpu::info()->nodes() > 1) ? Cpu::info()->nodeset() : std::vector<uint32_t>();
+const uint64_t n = info->nodes();
+return (m_numa && (n > 1) && (freem_mem > n * dataset_mem + cache_mem + threads_mem)) ? Cpu::info()->nodeset() : std::vector<uint32_t>();
}
#endif
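The free-memory budget this check enforces can be reproduced as a standalone sketch. The constants mirror the default RandomX configuration (2 GiB dataset base + 32 MiB extra, 256 MiB Argon2 cache, 2 MiB scratchpad per thread); `required_mem` and `numa_ok` are illustrative names, not the repo's API:

```cpp
#include <cstdint>

// Illustrative copies of the RandomX defaults used by the nodeset() check.
constexpr uint64_t kDatasetMem = 2147483648ULL + 33554432ULL; // base + extra
constexpr uint64_t kCacheMem   = 262144ULL * 1024;            // ArgonMemory is in KiB

// Total memory a NUMA-enabled run needs: one dataset per node, one cache,
// and a 2 MiB scratchpad per mining thread (threads << 21).
inline uint64_t required_mem(uint64_t nodes, uint64_t threads)
{
    return nodes * kDatasetMem + kCacheMem + (threads << 21);
}

// NUMA only pays off with more than one node, and only if the datasets fit.
inline bool numa_ok(uint64_t free_mem, uint64_t nodes, uint64_t threads)
{
    return nodes > 1 && free_mem > required_mem(nodes, threads);
}
```

For example, two nodes with 16 threads need a little over 4.3 GiB free, so an 8 GiB machine passes the check while a 4 GiB machine falls back to non-NUMA mode.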


@@ -47,8 +47,8 @@ public:
inline RxSeed(const Algorithm &algorithm, const Buffer &seed) : m_algorithm(algorithm), m_data(seed) {}
inline RxSeed(const Job &job) : m_algorithm(job.algorithm()), m_data(job.seed()) {}
-inline bool isEqual(const Job &job) const { return m_algorithm == job.algorithm() && m_data == job.seed(); }
-inline bool isEqual(const RxSeed &other) const { return m_algorithm == other.m_algorithm && m_data == other.m_data; }
+inline bool isEqual(const Job &job) const { return isEqualSeedAlgo(job.algorithm()) && m_data == job.seed(); }
+inline bool isEqual(const RxSeed &other) const { return isEqualSeedAlgo(other.m_algorithm) && m_data == other.m_data; }
inline const Algorithm &algorithm() const { return m_algorithm; }
inline const Buffer &data() const { return m_data; }
@@ -60,6 +60,12 @@ public:
private:
Algorithm m_algorithm;
Buffer m_data;
+inline bool isEqualSeedAlgo(Algorithm other) const {
+return (m_algorithm == other) ||
+((m_algorithm == Algorithm::RX_0) && (other == Algorithm::RX_V2)) ||
+((m_algorithm == Algorithm::RX_V2) && (other == Algorithm::RX_0));
+}
};
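The relaxed seed comparison amounts to a small symmetric equivalence between the two Monero variants, which keeps the dataset seed intact when the algorithm flips between RX_0 and RX_V2 (e.g. during a donation round). A minimal sketch, with `Algo` standing in for `xmrig::Algorithm::Id`:

```cpp
// Stand-in for xmrig::Algorithm::Id; only the cases the tweak touches.
enum class Algo { RX_0, RX_V2, RX_WOW };

// RX_0 and RX_V2 derive the dataset from the same seed hash, so a seed
// built for one is considered equal to a job using the other. All other
// algorithm pairs must match exactly.
inline bool isEqualSeedAlgo(Algo a, Algo b)
{
    return (a == b)
        || (a == Algo::RX_0  && b == Algo::RX_V2)
        || (a == Algo::RX_V2 && b == Algo::RX_0);
}
```

The relation is deliberately symmetric, so the comparison gives the same answer whether the stored seed or the incoming job carries the v2 variant.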


@@ -34,6 +34,8 @@
#include "base/tools/String.h"
#include "base/net/stratum/Job.h"
+#include "crypto/randomx/randomx.h"
namespace xmrig {
@@ -43,7 +45,7 @@ class JobResult
public:
JobResult() = delete;
-inline JobResult(const Job &job, uint64_t nonce, const uint8_t *result, const uint8_t* header_hash = nullptr, const uint8_t *mix_hash = nullptr, const uint8_t* miner_signature = nullptr) :
+inline JobResult(const Job &job, uint64_t nonce, const uint8_t *result, const uint8_t* header_hash = nullptr, const uint8_t *mix_hash = nullptr, const uint8_t* extra_data = nullptr) :
algorithm(job.algorithm()),
index(job.index()),
clientId(job.clientId()),
@@ -62,9 +64,15 @@ public:
memcpy(m_mixHash, mix_hash, sizeof(m_mixHash));
}
-if (miner_signature) {
-m_hasMinerSignature = true;
-memcpy(m_minerSignature, miner_signature, sizeof(m_minerSignature));
+if (extra_data) {
+if (algorithm == Algorithm::RX_V2) {
+m_hasCommitment = true;
+memcpy(m_extraData, extra_data, RANDOMX_HASH_SIZE);
+}
+else if (algorithm == Algorithm::RX_WOW) {
+m_hasMinerSignature = true;
+memcpy(m_extraData, extra_data, RANDOMX_HASH_SIZE * 2);
+}
}
}
@@ -85,7 +93,8 @@ public:
inline const uint8_t *headerHash() const { return m_headerHash; }
inline const uint8_t *mixHash() const { return m_mixHash; }
-inline const uint8_t *minerSignature() const { return m_hasMinerSignature ? m_minerSignature : nullptr; }
+inline const uint8_t *minerSignature() const { return m_hasMinerSignature ? m_extraData : nullptr; }
+inline const uint8_t *commitment() const { return m_hasCommitment ? m_extraData : nullptr; }
const Algorithm algorithm;
const uint8_t index;
@@ -100,8 +109,10 @@ private:
uint8_t m_headerHash[32] = { 0 };
uint8_t m_mixHash[32] = { 0 };
-uint8_t m_minerSignature[64] = { 0 };
+uint8_t m_extraData[RANDOMX_HASH_SIZE * 2] = { 0 };
bool m_hasMinerSignature = false;
+bool m_hasCommitment = false;
};
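The reuse of one 64-byte buffer for both payloads can be sketched as follows. `Algo`, `ExtraData`, and `set` are illustrative stand-ins for `Algorithm::Id` and the `JobResult` members above, with `kHashSize` playing the role of `RANDOMX_HASH_SIZE`:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr size_t kHashSize = 32; // stand-in for RANDOMX_HASH_SIZE

// Stand-in for xmrig::Algorithm::Id.
enum class Algo { RX_0, RX_V2, RX_WOW };

// One 64-byte field replaces the old m_minerSignature[64]: it holds either
// a 32-byte RX_V2 commitment hash or a 64-byte RX_WOW miner signature,
// with two flags recording which (if either) is present.
struct ExtraData {
    uint8_t data[kHashSize * 2] = { 0 };
    bool hasCommitment = false;
    bool hasSignature  = false;

    void set(Algo algo, const uint8_t* src) {
        if (!src) {
            return;
        }
        if (algo == Algo::RX_V2) {
            hasCommitment = true;
            memcpy(data, src, kHashSize);          // 32-byte commitment
        }
        else if (algo == Algo::RX_WOW) {
            hasSignature = true;
            memcpy(data, src, kHashSize * 2);      // 64-byte signature
        }
    }
};
```

Sizing the shared buffer for the larger of the two payloads lets the struct shrink without losing either accessor.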


@@ -339,9 +339,9 @@ void xmrig::JobResults::submit(const Job &job, uint32_t nonce, const uint8_t *re
}
-void xmrig::JobResults::submit(const Job& job, uint32_t nonce, const uint8_t* result, const uint8_t* miner_signature)
+void xmrig::JobResults::submit(const Job& job, uint32_t nonce, const uint8_t* result, const uint8_t* extra_data)
{
-submit(JobResult(job, nonce, result, nullptr, nullptr, miner_signature));
+submit(JobResult(job, nonce, result, nullptr, nullptr, extra_data));
}


@@ -46,7 +46,7 @@ public:
static void setListener(IJobResultListener *listener, bool hwAES);
static void stop();
static void submit(const Job &job, uint32_t nonce, const uint8_t *result);
-static void submit(const Job& job, uint32_t nonce, const uint8_t* result, const uint8_t* miner_signature);
+static void submit(const Job& job, uint32_t nonce, const uint8_t* result, const uint8_t* extra_data);
static void submit(const JobResult &result);
# if defined(XMRIG_FEATURE_OPENCL) || defined(XMRIG_FEATURE_CUDA)


@@ -1,6 +1,6 @@
/* XMRig
-* Copyright (c) 2018-2025 SChernykh <https://github.com/SChernykh>
-* Copyright (c) 2016-2025 XMRig <https://github.com/xmrig>, <support@xmrig.com>
+* Copyright (c) 2018-2026 SChernykh <https://github.com/SChernykh>
+* Copyright (c) 2016-2026 XMRig <https://github.com/xmrig>, <support@xmrig.com>
*
* SPDX-License-Identifier: GPL-3.0-or-later
*/
@@ -11,14 +11,14 @@
#define APP_ID "xmrig"
#define APP_NAME "XMRig"
#define APP_DESC "XMRig miner"
-#define APP_VERSION "6.25.0"
+#define APP_VERSION "6.26.0-dev"
#define APP_DOMAIN "xmrig.com"
#define APP_SITE "www.xmrig.com"
-#define APP_COPYRIGHT "Copyright (C) 2016-2025 xmrig.com"
+#define APP_COPYRIGHT "Copyright (C) 2016-2026 xmrig.com"
#define APP_KIND "miner"
#define APP_VER_MAJOR 6
-#define APP_VER_MINOR 25
+#define APP_VER_MINOR 26
#define APP_VER_PATCH 0
#ifdef _MSC_VER