Update CHANGELOG.md

Merge pull request #3129 from SChernykh/dev
Fix: protectRX flushed CPU cache only on MacOS/iOS
2026-06-21 11:52:38 -04:00 · 2022-09-25 17:01:33 +07:00 · 2022-09-22 07:02:28 +07:00 · 2022-09-21 15:18:06 +02:00 · 2022-09-19 19:03:17 +07:00 · 2022-09-19 10:42:08 +02:00
590 changed files with 122962 additions and 21109 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -1,4 +1,6 @@
 /build
+scripts/build
+scripts/deps
 /CMakeLists.txt.user
 /.idea
 /src/backend/opencl/cl/cn/cryptonight_gen.cl
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,3 +1,195 @@
+# v6.18.1
+- [#3129](https://github.com/xmrig/xmrig/pull/3129) Fix: protectRX flushed CPU cache only on MacOS/iOS.
+- [#3126](https://github.com/xmrig/xmrig/pull/3126) Don't reset when pool sends the same job blob.
+- [#3120](https://github.com/xmrig/xmrig/pull/3120) RandomX: optimized `CFROUND` elimination.
+- [#3109](https://github.com/xmrig/xmrig/pull/3109) RandomX: added Blake2 AVX2 version.
+- [#3082](https://github.com/xmrig/xmrig/pull/3082) Fixed GCC 12 warnings.
+- [#3075](https://github.com/xmrig/xmrig/pull/3075) Recognize `armv7ve` as valid ARMv7 target.
+
+# v6.18.0
+- [#3067](https://github.com/xmrig/xmrig/pull/3067) Monero v15 network upgrade support and more housekeeping.
+  - Removed deprecated AstroBWTv1 and v2.
+  - Fixed debug GhostRider build.
+  - Monero v15 network upgrade support.
+  - Fixed ZMQ debug log.
+  - Improved daemon ZMQ mining stability.
+- [#3054](https://github.com/xmrig/xmrig/pull/3054) Fixes for 32-bit ARM.
+- [#3042](https://github.com/xmrig/xmrig/pull/3042) Fixed being unable to resume from `pause-on-battery`.
+- [#3031](https://github.com/xmrig/xmrig/pull/3031) Fixed `--cpu-priority` not working sometimes.
+- [#3020](https://github.com/xmrig/xmrig/pull/3020) Removed old AstroBWT algorithm.
+
+# v6.17.0
+- [#2954](https://github.com/xmrig/xmrig/pull/2954) **Dero HE fork support (`astrobwt/v2` algorithm).**
+  - [#2961](https://github.com/xmrig/xmrig/pull/2961) Dero HE (`astrobwt/v2`) CUDA config generator.
+  - [#2969](https://github.com/xmrig/xmrig/pull/2969) Dero HE (`astrobwt/v2`) OpenCL support.
+- Fixed displayed DMI memory information for empty slots.
+- [#2932](https://github.com/xmrig/xmrig/pull/2932) Fixed GhostRider with hwloc disabled.
+
+# v6.16.4
+- [#2904](https://github.com/xmrig/xmrig/pull/2904) Fixed unaligned memory accesses.
+- [#2908](https://github.com/xmrig/xmrig/pull/2908) Added MSVC/2022 to `version.h`.
+- [#2910](https://github.com/xmrig/xmrig/issues/2910) Fixed donation for GhostRider/RTM.
+
+# v6.16.3
+- [#2778](https://github.com/xmrig/xmrig/pull/2778) Fixed `READY threads X/X` display after algorithm switching.
+- [#2782](https://github.com/xmrig/xmrig/pull/2782) Updated GhostRider documentation.
+- [#2815](https://github.com/xmrig/xmrig/pull/2815) Fixed `cn-heavy` in 32-bit builds.
+- [#2827](https://github.com/xmrig/xmrig/pull/2827) GhostRider: set correct priority for helper threads.
+- [#2837](https://github.com/xmrig/xmrig/pull/2837) RandomX: don't restart mining threads when the seed changes.
+- [#2848](https://github.com/xmrig/xmrig/pull/2848) GhostRider: added support for `client.reconnect` method.
+- [#2856](https://github.com/xmrig/xmrig/pull/2856) Fix for short responses from some Raptoreum pools.
+- [#2873](https://github.com/xmrig/xmrig/pull/2873) Fixed GhostRider benchmark on single-core systems.
+- [#2882](https://github.com/xmrig/xmrig/pull/2882) Fixed ARMv7 compilation.
+- [#2893](https://github.com/xmrig/xmrig/pull/2893) KawPow OpenCL: use separate UV loop for building programs.
+
+# v6.16.2
+- [#2751](https://github.com/xmrig/xmrig/pull/2751) Fixed crash on CPUs supporting VAES and running GCC-compiled xmrig.
+- [#2761](https://github.com/xmrig/xmrig/pull/2761) Fixed broken auto-tuning in GCC Windows build.
+- [#2771](https://github.com/xmrig/xmrig/issues/2771) Fixed environment variable support for GhostRider and KawPow.
+- [#2769](https://github.com/xmrig/xmrig/pull/2769) Performance fixes:
+  - Fixed several performance bottlenecks introduced in v6.16.1.
+  - Fixed overall GCC-compiled build performance; it is now as fast as the MSVC build.
+  - **Linux builds are up to 10% faster now compared to v6.16.0 GCC build.**
+  - **Windows builds are up to 5% faster now compared to v6.16.0 MSVC build.**
+
+# v6.16.1
+- [#2729](https://github.com/xmrig/xmrig/pull/2729) GhostRider fixes:
+  - Added average hashrate display.
+  - Fixed the number of threads shown at startup.
+  - Fixed the `--threads` or `-t` command line option (though `--cpu-max-threads-hint` is recommended).
+- [#2738](https://github.com/xmrig/xmrig/pull/2738) GhostRider fixes:
+  - Fixed "difficulty is not a number" error when diff is high on some pools.
+  - Fixed GhostRider compilation when `WITH_KAWPOW=OFF`.
+- [#2740](https://github.com/xmrig/xmrig/pull/2740) Added VAES support for Cryptonight variants **+4% speedup on Zen3**.
+  - VAES instructions are available on Intel Ice Lake/AMD Zen3 and newer CPUs.
+  - +4% speedup on Ryzen 5 5600X.
+
+# v6.16.0
+- [#2712](https://github.com/xmrig/xmrig/pull/2712) **GhostRider algorithm (Raptoreum) support**: read the [RELEASE NOTES](src/crypto/ghostrider/README.md) for a quick start guide and performance comparisons.
+- [#2682](https://github.com/xmrig/xmrig/pull/2682) Fixed: use cn-heavy optimization only for Vermeer CPUs.
+- [#2684](https://github.com/xmrig/xmrig/pull/2684) MSR mod: fix for error 183.
+
+# v6.15.3
+- [#2614](https://github.com/xmrig/xmrig/pull/2614) OpenCL fixes for non-AMD platforms.
+- [#2623](https://github.com/xmrig/xmrig/pull/2623) Fixed compiling without kawpow.
+- [#2636](https://github.com/xmrig/xmrig/pull/2636) [#2639](https://github.com/xmrig/xmrig/pull/2639) AstroBWT speedup (up to +35%).
+- [#2646](https://github.com/xmrig/xmrig/pull/2646) Fixed MSVC compilation error.
+
+# v6.15.2
+- [#2606](https://github.com/xmrig/xmrig/pull/2606) Fixed: AstroBWT auto-config ignored `max-threads-hint`.
+- Fixed possible crash on Windows (regression in v6.15.1).
+
+# v6.15.1
+- [#2586](https://github.com/xmrig/xmrig/pull/2586) Fixed Windows 7 compatibility.
+- [#2594](https://github.com/xmrig/xmrig/pull/2594) Added Windows taskbar icon colors.
+
+# v6.15.0
+- [#2548](https://github.com/xmrig/xmrig/pull/2548) Added automatic coin detection for daemon mining.
+- [#2563](https://github.com/xmrig/xmrig/pull/2563) Added new algorithm RandomX Graft (`rx/graft`).
+- [#2565](https://github.com/xmrig/xmrig/pull/2565) AstroBWT: added AVX2 Salsa20 implementation.
+- Added support for new CUDA plugin API (previous API still supported).
+
+# v6.14.1
+- [#2532](https://github.com/xmrig/xmrig/pull/2532) Refactoring: stable (persistent) algorithm IDs.
+- [#2537](https://github.com/xmrig/xmrig/pull/2537) Fixed Termux build.
+
+# v6.14.0
+- [#2484](https://github.com/xmrig/xmrig/pull/2484) Added ZeroMQ support for solo mining.
+- [#2476](https://github.com/xmrig/xmrig/issues/2476) Fixed crash in DMI memory reader.
+- [#2492](https://github.com/xmrig/xmrig/issues/2492) Added missing `--huge-pages-jit` command line option.
+- [#2512](https://github.com/xmrig/xmrig/pull/2512) Added display of the number of transactions in the pool job.
+
+# v6.13.1
+- [#2468](https://github.com/xmrig/xmrig/pull/2468) Fixed regression in previous version: don't send miner signature during regular mining.
+
+# v6.13.0
+- [#2445](https://github.com/xmrig/xmrig/pull/2445) Added support for solo mining with miner signatures for the upcoming Wownero fork.
+
+# v6.12.2
+- [#2280](https://github.com/xmrig/xmrig/issues/2280) GPU backends are now disabled in benchmark mode.
+- [#2322](https://github.com/xmrig/xmrig/pull/2322) Improved MSR compatibility with recent Linux kernels and updated `randomx_boost.sh`.
+- [#2340](https://github.com/xmrig/xmrig/pull/2340) Fixed AES detection on FreeBSD on ARM.
+- [#2341](https://github.com/xmrig/xmrig/pull/2341) `sse2neon` updated to the latest version.
+- [#2351](https://github.com/xmrig/xmrig/issues/2351) Fixed help output for the `--cpu-priority` and `--cpu-affinity` options.
+- [#2375](https://github.com/xmrig/xmrig/pull/2375) Fixed macOS CUDA backend default loader name.
+- [#2378](https://github.com/xmrig/xmrig/pull/2378) Fixed broken light mode mining on x86.
+- [#2379](https://github.com/xmrig/xmrig/pull/2379) Fixed KawPow OpenCL code that assumed every GPU is AMD.
+- [#2386](https://github.com/xmrig/xmrig/pull/2386) RandomX: enabled `IMUL_RCP` optimization for light mode mining.
+- [#2393](https://github.com/xmrig/xmrig/pull/2393) RandomX: added BMI2 version for scratchpad prefetch.
+- [#2395](https://github.com/xmrig/xmrig/pull/2395) RandomX: rewrote dataset read code.
+- [#2398](https://github.com/xmrig/xmrig/pull/2398) RandomX: optimized ARMv8 dataset read.
+- Added `argon2/ninja` alias for `argon2/wrkz` algorithm.
+
+# v6.12.1
+- [#2296](https://github.com/xmrig/xmrig/pull/2296) Fixed Zen3 assembly code for `cn/upx2` algorithm.
+
+# v6.12.0
+- [#2276](https://github.com/xmrig/xmrig/pull/2276) Added support for Uplexa (`cn/upx2` algorithm).
+- [#2261](https://github.com/xmrig/xmrig/pull/2261) Show total hashrate if compiled without OpenCL.
+- [#2289](https://github.com/xmrig/xmrig/pull/2289) RandomX: optimized `IMUL_RCP` instruction.
+- Added support for `--user` command line option for online benchmark.
+
+# v6.11.2
+- [#2207](https://github.com/xmrig/xmrig/issues/2207) Fixed regression in HTTP parser and llhttp updated to v5.1.0.
+
+# v6.11.1
+- [#2239](https://github.com/xmrig/xmrig/pull/2239) Fixed broken `coin` setting functionality.
+
+# v6.11.0
+- [#2196](https://github.com/xmrig/xmrig/pull/2196) Improved DNS subsystem and added new DNS specific options.
+- [#2172](https://github.com/xmrig/xmrig/pull/2172) Fixed build on Alpine 3.13.
+- [#2177](https://github.com/xmrig/xmrig/pull/2177) Fixed ARM specific compilation error with GCC 10.2.
+- [#2214](https://github.com/xmrig/xmrig/pull/2214) [#2216](https://github.com/xmrig/xmrig/pull/2216) [#2235](https://github.com/xmrig/xmrig/pull/2235) Optimized `cn-heavy` algorithm.
+- [#2217](https://github.com/xmrig/xmrig/pull/2217) Fixed mining job creation sequence.
+- [#2225](https://github.com/xmrig/xmrig/pull/2225) Fixed build without OpenCL support on some systems.
+- [#2229](https://github.com/xmrig/xmrig/pull/2229) Don't use RandomX JIT if `WITH_ASM=OFF`.
+- [#2228](https://github.com/xmrig/xmrig/pull/2228) Removed useless code for cryptonight algorithms.
+- [#2234](https://github.com/xmrig/xmrig/pull/2234) Fixed build error on gcc 4.8.
+
+# v6.10.0
+- [#2122](https://github.com/xmrig/xmrig/pull/2122) Fixed pause logic when both pause on battery and user activity are enabled.
+- [#2123](https://github.com/xmrig/xmrig/issues/2123) Fixed compatibility with gcc 4.8.
+- [#2147](https://github.com/xmrig/xmrig/pull/2147) Fixed many `new job` messages when solo mining.
+- [#2150](https://github.com/xmrig/xmrig/pull/2150) Updated `sse2neon.h` to the latest master, fixes build on ARMv7.
+- [#2157](https://github.com/xmrig/xmrig/pull/2157) Fixed crash in `cn-heavy` on Zen3 with manual thread count.
+- Fixed possible out-of-order writes to the log file.
+- [http-parser](https://github.com/nodejs/http-parser) replaced with [llhttp](https://github.com/nodejs/llhttp).
+- For official builds: libuv, hwloc and OpenSSL updated to latest versions.
+
+# v6.9.0
+- [#2104](https://github.com/xmrig/xmrig/pull/2104) Added [pause-on-active](https://xmrig.com/docs/miner/config/misc#pause-on-active) config option and `--pause-on-active=N` command line option.
+- [#2112](https://github.com/xmrig/xmrig/pull/2112) Added support for [Tari merge mining](https://github.com/tari-project/tari/blob/development/README.md#tari-merge-mining).
+- [#2117](https://github.com/xmrig/xmrig/pull/2117) Fixed crash when GPU mining `cn-heavy` on Zen3 system.
+
+# v6.8.2
+- [#2080](https://github.com/xmrig/xmrig/pull/2080) Fixed compile error in Termux.
+- [#2089](https://github.com/xmrig/xmrig/pull/2089) Optimized CryptoNight-Heavy for Zen3, 7-8% speedup.
+
+# v6.8.1
+- [#2064](https://github.com/xmrig/xmrig/pull/2064) Added documentation for config.json CPU options.
+- [#2066](https://github.com/xmrig/xmrig/issues/2066) Fixed AMD GPU health data readings on Linux.
+- [#2067](https://github.com/xmrig/xmrig/pull/2067) Fixed compilation error when RandomX and Argon2 are disabled.
+- [#2076](https://github.com/xmrig/xmrig/pull/2076) Added support for flexible huge page sizes on Linux.
+- [#2077](https://github.com/xmrig/xmrig/pull/2077) Fixed `illegal instruction` crash on ARM.
+
+# v6.8.0
+- [#2052](https://github.com/xmrig/xmrig/pull/2052) Added DMI/SMBIOS reader.
+  - Added information about memory modules at miner startup and in the online benchmark.
+  - Added new HTTP API endpoint: `GET /2/dmi`.
+  - Added new command line option `--no-dmi` or config option `"dmi"`.
+  - Added new CMake option `-DWITH_DMI=OFF`.
+- [#2057](https://github.com/xmrig/xmrig/pull/2057) Improved MSR subsystem code quality.
+- [#2058](https://github.com/xmrig/xmrig/pull/2058) RandomX JIT x86: removed unnecessary instructions.
+
+# v6.7.2
+- [#2039](https://github.com/xmrig/xmrig/pull/2039) Fixed solo mining.
+
+# v6.7.1
+- [#1995](https://github.com/xmrig/xmrig/issues/1995) Fixed log initialization.
+- [#1998](https://github.com/xmrig/xmrig/pull/1998) Added hashrate to the benchmark finished message.
+- [#2009](https://github.com/xmrig/xmrig/pull/2009) AstroBWT OpenCL fixes.
+- [#2028](https://github.com/xmrig/xmrig/pull/2028) RandomX x86 JIT: removed redundant `CFROUND`.
+
 # v6.7.0
 - **[#1991](https://github.com/xmrig/xmrig/issues/1991) Added Apple M1 processor support.**
 - **[#1986](https://github.com/xmrig/xmrig/pull/1986) Up to 20-30% faster RandomX dataset initialization with AVX2 on some CPUs.**
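The headline fix in v6.18.1 (#3129) is that `protectRX` flushed the CPU cache only on macOS/iOS. On CPUs whose instruction cache is not kept coherent with data writes, ARM in particular, JIT-generated code has to be followed by an instruction-cache flush on every OS, not only Apple ones. The sketch below shows one portable way to do that; `flushInstructionCache` is a hypothetical helper, not xmrig's actual code, while `sys_icache_invalidate`, `FlushInstructionCache` and `__builtin___clear_cache` are the standard Apple, Windows and GCC/Clang facilities.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__APPLE__)
#   include <libkern/OSCacheControl.h>   // sys_icache_invalidate
#elif defined(_WIN32)
#   include <windows.h>                  // FlushInstructionCache
#endif

// Hypothetical helper (not xmrig's API): make the CPU fetch the bytes just
// written to [begin, begin + size) instead of stale cached instructions.
// Intended to be called after writing JIT code and switching it to read+execute.
inline void flushInstructionCache(void *begin, std::size_t size)
{
    assert(begin != nullptr || size == 0);
    if (size == 0) {
        return;
    }

#if defined(__APPLE__)
    sys_icache_invalidate(begin, size);
#elif defined(_WIN32)
    FlushInstructionCache(GetCurrentProcess(), begin, size);
#elif defined(__GNUC__) || defined(__clang__)
    // No-op on x86, where instruction caches are coherent; on ARM this emits
    // the required cache maintenance instructions.
    char *p = static_cast<char *>(begin);
    __builtin___clear_cache(p, p + size);
#endif
}
```

A caller would typically write the generated code, change the page protection to read+execute, and then call `flushInstructionCache(code, codeSize)` before jumping into the buffer, on every platform.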
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -5,10 +5,11 @@ option(WITH_HWLOC           "Enable hwloc support" ON)
 option(WITH_CN_LITE         "Enable CryptoNight-Lite algorithms family" ON)
 option(WITH_CN_HEAVY        "Enable CryptoNight-Heavy algorithms family" ON)
 option(WITH_CN_PICO         "Enable CryptoNight-Pico algorithm" ON)
+option(WITH_CN_FEMTO        "Enable CryptoNight-UPX2 algorithm" ON)
 option(WITH_RANDOMX         "Enable RandomX algorithms family" ON)
 option(WITH_ARGON2          "Enable Argon2 algorithms family" ON)
-option(WITH_ASTROBWT        "Enable AstroBWT algorithms family" ON)
 option(WITH_KAWPOW          "Enable KawPow algorithms family" ON)
+option(WITH_GHOSTRIDER      "Enable GhostRider algorithm" ON)
 option(WITH_HTTP            "Enable HTTP protocol support (client/server)" ON)
 option(WITH_DEBUG_LOG       "Enable debug log output" OFF)
 option(WITH_TLS             "Enable OpenSSL support" ON)
@@ -17,6 +18,8 @@ option(WITH_MSR             "Enable MSR mod & 1st-gen Ryzen fix" ON)
 option(WITH_ENV_VARS        "Enable environment variables support in config file" ON)
 option(WITH_EMBEDDED_CONFIG "Enable internal embedded JSON config" OFF)
 option(WITH_OPENCL          "Enable OpenCL backend" ON)
+set(WITH_OPENCL_VERSION 200 CACHE STRING "Target OpenCL version")
+set_property(CACHE WITH_OPENCL_VERSION PROPERTY STRINGS 120 200 210 220)
 option(WITH_CUDA            "Enable CUDA backend" ON)
 option(WITH_NVML            "Enable NVML (NVIDIA Management Library) support (only if CUDA backend enabled)" ON)
 option(WITH_ADL             "Enable ADL (AMD Display Library) or sysfs support (only if OpenCL backend enabled)" ON)
@@ -24,8 +27,11 @@ option(WITH_STRICT_CACHE    "Enable strict checks for OpenCL cache" ON)
 option(WITH_INTERLEAVE_DEBUG_LOG "Enable debug log for threads interleave" OFF)
 option(WITH_PROFILING       "Enable profiling for developers" OFF)
 option(WITH_SSE4_1          "Enable SSE 4.1 for Blake2" ON)
+option(WITH_AVX2            "Enable AVX2 for Blake2" ON)
+option(WITH_VAES            "Enable VAES instructions for Cryptonight" ON)
 option(WITH_BENCHMARK       "Enable builtin RandomX benchmark and stress test" ON)
 option(WITH_SECURE_JIT      "Enable secure access to JIT memory" OFF)
+option(WITH_DMI             "Enable DMI/SMBIOS reader" ON)

 option(BUILD_STATIC         "Build static binary" OFF)
 option(ARM_TARGET           "Force use specific ARM target 8 or 7" 0)
@@ -54,6 +60,7 @@ set(HEADERS
    src/core/config/usage.h
    src/core/Controller.h
    src/core/Miner.h
+    src/core/Taskbar.h
    src/net/interfaces/IJobResultListener.h
    src/net/JobResult.h
    src/net/JobResults.h
@@ -102,6 +109,7 @@ set(SOURCES
    src/core/config/ConfigTransform.cpp
    src/core/Controller.cpp
    src/core/Miner.cpp
+    src/core/Taskbar.cpp
    src/net/JobResults.cpp
    src/net/Network.cpp
    src/net/strategies/DonateStrategy.cpp
@@ -122,6 +130,19 @@ set(SOURCES_CRYPTO
    src/crypto/common/VirtualMemory.cpp
   )

+if (CMAKE_C_COMPILER_ID MATCHES GNU)
+    set_source_files_properties(src/crypto/cn/CnHash.cpp PROPERTIES COMPILE_FLAGS "-Ofast -fno-tree-vectorize")
+endif()
+
+if (WITH_VAES)
+    add_definitions(-DXMRIG_VAES)
+    set(HEADERS_CRYPTO "${HEADERS_CRYPTO}" src/crypto/cn/CryptoNight_x86_vaes.h)
+    set(SOURCES_CRYPTO "${SOURCES_CRYPTO}" src/crypto/cn/CryptoNight_x86_vaes.cpp)
+    if (CMAKE_C_COMPILER_ID MATCHES GNU OR CMAKE_C_COMPILER_ID MATCHES Clang)
+        set_source_files_properties(src/crypto/cn/CryptoNight_x86_vaes.cpp PROPERTIES COMPILE_FLAGS "-Ofast -fno-tree-vectorize -mavx2 -mvaes")
+    endif()
+endif()
+
 if (WITH_HWLOC)
    list(APPEND HEADERS_CRYPTO
        src/crypto/common/NUMAMemoryPool.h
@@ -146,8 +167,10 @@ elseif (XMRIG_OS_APPLE)
        src/App_unix.cpp
        src/crypto/common/VirtualMemory_unix.cpp
        )
+
    find_library(IOKIT_LIBRARY IOKit)
-    set(EXTRA_LIBS ${IOKIT_LIBRARY})
+    find_library(CORESERVICES_LIBRARY CoreServices)
+    set(EXTRA_LIBS ${IOKIT_LIBRARY} ${CORESERVICES_LIBRARY})
 else()
    list(APPEND SOURCES_OS
        src/App_unix.cpp
@@ -169,15 +192,15 @@ else()
 endif()

 add_definitions(-DXMRIG_MINER_PROJECT -DXMRIG_JSON_SINGLE_LINE_ARRAY)
-add_definitions(-D__STDC_FORMAT_MACROS -DUNICODE)
+add_definitions(-D__STDC_FORMAT_MACROS -DUNICODE -D_FILE_OFFSET_BITS=64)

 find_package(UV REQUIRED)

 include(cmake/flags.cmake)
 include(cmake/randomx.cmake)
 include(cmake/argon2.cmake)
-include(cmake/astrobwt.cmake)
 include(cmake/kawpow.cmake)
+include(cmake/ghostrider.cmake)
 include(cmake/OpenSSL.cmake)
 include(cmake/asm.cmake)

@@ -193,10 +216,17 @@ if (WITH_CN_PICO)
    add_definitions(/DXMRIG_ALGO_CN_PICO)
 endif()

+if (WITH_CN_FEMTO)
+    add_definitions(/DXMRIG_ALGO_CN_FEMTO)
+endif()
+
 if (WITH_EMBEDDED_CONFIG)
    add_definitions(/DXMRIG_FEATURE_EMBEDDED_CONFIG)
 endif()

+include(src/hw/api/api.cmake)
+include(src/hw/dmi/dmi.cmake)
+
 include_directories(src)
 include_directories(src/3rdparty)
 include_directories(${UV_INCLUDE_DIR})
@@ -206,7 +236,7 @@ if (WITH_DEBUG_LOG)
 endif()

 add_executable(${CMAKE_PROJECT_NAME} ${HEADERS} ${SOURCES} ${SOURCES_OS} ${HEADERS_CRYPTO} ${SOURCES_CRYPTO} ${SOURCES_SYSLOG} ${TLS_SOURCES} ${XMRIG_ASM_SOURCES})
-target_link_libraries(${CMAKE_PROJECT_NAME} ${XMRIG_ASM_LIBRARY} ${OPENSSL_LIBRARIES} ${UV_LIBRARIES} ${EXTRA_LIBS} ${CPUID_LIB} ${ARGON2_LIBRARY} ${ETHASH_LIBRARY})
+target_link_libraries(${CMAKE_PROJECT_NAME} ${XMRIG_ASM_LIBRARY} ${OPENSSL_LIBRARIES} ${UV_LIBRARIES} ${EXTRA_LIBS} ${CPUID_LIB} ${ARGON2_LIBRARY} ${ETHASH_LIBRARY} ${GHOSTRIDER_LIBRARY})

 if (WIN32)
    add_custom_command(TARGET ${CMAKE_PROJECT_NAME} POST_BUILD COMMAND ${CMAKE_COMMAND} -E copy_if_different "${CMAKE_SOURCE_DIR}/bin/WinRing0/WinRing0x64.sys" $<TARGET_FILE_DIR:${CMAKE_PROJECT_NAME}>)
@@ -214,6 +244,7 @@ if (WIN32)
    add_custom_command(TARGET ${CMAKE_PROJECT_NAME} POST_BUILD COMMAND ${CMAKE_COMMAND} -E copy_if_different "${CMAKE_SOURCE_DIR}/scripts/benchmark_10M.cmd" $<TARGET_FILE_DIR:${CMAKE_PROJECT_NAME}>)
    add_custom_command(TARGET ${CMAKE_PROJECT_NAME} POST_BUILD COMMAND ${CMAKE_COMMAND} -E copy_if_different "${CMAKE_SOURCE_DIR}/scripts/pool_mine_example.cmd" $<TARGET_FILE_DIR:${CMAKE_PROJECT_NAME}>)
    add_custom_command(TARGET ${CMAKE_PROJECT_NAME} POST_BUILD COMMAND ${CMAKE_COMMAND} -E copy_if_different "${CMAKE_SOURCE_DIR}/scripts/solo_mine_example.cmd" $<TARGET_FILE_DIR:${CMAKE_PROJECT_NAME}>)
+    add_custom_command(TARGET ${CMAKE_PROJECT_NAME} POST_BUILD COMMAND ${CMAKE_COMMAND} -E copy_if_different "${CMAKE_SOURCE_DIR}/scripts/rtm_ghostrider_example.cmd" $<TARGET_FILE_DIR:${CMAKE_PROJECT_NAME}>)
 endif()

 if (CMAKE_CXX_COMPILER_ID MATCHES Clang AND CMAKE_BUILD_TYPE STREQUAL Release AND NOT CMAKE_GENERATOR STREQUAL Xcode)
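`WITH_OPENCL_VERSION` is a cache string whose offered values are `120`, `200`, `210` and `220`, which matches the encoding of the Khronos headers' `CL_TARGET_OPENCL_VERSION` macro (major * 100 + minor * 10). This diff does not show how the value is consumed, so treating it as `CL_TARGET_OPENCL_VERSION` is an assumption. Note that the `STRINGS` cache property only populates the drop-down in `cmake-gui`/`ccmake`; CMake does not reject other values passed with `-DWITH_OPENCL_VERSION=...`. A hypothetical compile-time check and decoder:

```cpp
// Values offered for WITH_OPENCL_VERSION in CMakeLists.txt.
inline constexpr int kOpenClTargets[] = {120, 200, 210, 220};

// Hypothetical check: is the value one of the offered choices?
constexpr bool isSupportedOpenClTarget(int value)
{
    for (int target : kOpenClTargets) {
        if (target == value) {
            return true;
        }
    }
    return false;
}

struct OpenClVersion
{
    int versionMajor;
    int versionMinor;
};

// Decode major * 100 + minor * 10, e.g. 210 -> {2, 1}.
constexpr OpenClVersion decodeOpenClTarget(int value)
{
    return {value / 100, (value / 10) % 10};
}

// Assumption: the CMake value reaches the compiler as CL_TARGET_OPENCL_VERSION.
#ifdef CL_TARGET_OPENCL_VERSION
static_assert(isSupportedOpenClTarget(CL_TARGET_OPENCL_VERSION),
              "unexpected OpenCL target version");
#endif
```

With this in place, a configure line such as `cmake .. -DWITH_OPENCL_VERSION=120` would decode to OpenCL 1.2, while an unlisted value would fail to compile instead of silently building against an unexpected API level.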
--- a/README.md
+++ b/README.md
@@ -7,10 +7,10 @@
 [![GitHub stars](https://img.shields.io/github/stars/xmrig/xmrig.svg)](https://github.com/xmrig/xmrig/stargazers)
 [![GitHub forks](https://img.shields.io/github/forks/xmrig/xmrig.svg)](https://github.com/xmrig/xmrig/network)

-XMRig is a high performance, open source, cross platform RandomX, KawPow, CryptoNight and AstroBWT unified CPU/GPU miner and [RandomX benchmark](https://xmrig.com/benchmark). Official binaries are available for Windows, Linux, macOS and FreeBSD.
+XMRig is a high-performance, open-source, cross-platform RandomX, KawPow, CryptoNight and [GhostRider](https://github.com/xmrig/xmrig/tree/master/src/crypto/ghostrider#readme) unified CPU/GPU miner and [RandomX benchmark](https://xmrig.com/benchmark). Official binaries are available for Windows, Linux, macOS and FreeBSD.

 ## Mining backends
- **CPU** (x64/ARMv8)
+- **CPU** (x64/ARMv7/ARMv8)
 - **OpenCL** for AMD GPUs.
 - **CUDA** for NVIDIA GPUs via external [CUDA plugin](https://github.com/xmrig/xmrig-cuda).

@@ -19,7 +19,7 @@ XMRig is a high performance, open source, cross platform RandomX, KawPow, Crypto
 * **[Build from source](https://xmrig.com/docs/miner/build)**

 ## Usage
-The preferred way to configure the miner is the [JSON config file](src/config.json) as it is more flexible and human friendly. The [command line interface](https://xmrig.com/docs/miner/command-line-options) does not cover all features, such as mining profiles for different algorithms. Important options can be changed during runtime without miner restart by editing the config file or executing API calls.
+The preferred way to configure the miner is the [JSON config file](https://xmrig.com/docs/miner/config), as it is more flexible and human-friendly. The [command line interface](https://xmrig.com/docs/miner/command-line-options) does not cover all features, such as mining profiles for different algorithms. Important options can be changed at runtime without restarting the miner, by editing the config file or executing [API](https://xmrig.com/docs/miner/api) calls.

 * **[Wizard](https://xmrig.com/wizard)** helps you create initial configuration for the miner.
 * **[Workers](http://workers.xmrig.info)** helps manage your miners via HTTP API.
--- a/cmake/astrobwt.cmake
+++ b/cmake/astrobwt.cmake
@@ -1,45 +0,0 @@
-if (WITH_ASTROBWT)
-    add_definitions(/DXMRIG_ALGO_ASTROBWT)
-
-    list(APPEND HEADERS_CRYPTO
-        src/crypto/astrobwt/AstroBWT.h
-    )
-
-    list(APPEND SOURCES_CRYPTO
-        src/crypto/astrobwt/AstroBWT.cpp
-    )
-
-    if (XMRIG_ARM)
-        list(APPEND HEADERS_CRYPTO
-            src/crypto/astrobwt/salsa20_ref/ecrypt-config.h
-            src/crypto/astrobwt/salsa20_ref/ecrypt-machine.h
-            src/crypto/astrobwt/salsa20_ref/ecrypt-portable.h
-            src/crypto/astrobwt/salsa20_ref/ecrypt-sync.h
-        )
-
-        list(APPEND SOURCES_CRYPTO
-            src/crypto/astrobwt/salsa20_ref/salsa20.c
-        )
-    else()
-        if (CMAKE_SIZEOF_VOID_P EQUAL 8)
-            add_definitions(/DASTROBWT_AVX2)
-            if (CMAKE_C_COMPILER_ID MATCHES MSVC)
-                enable_language(ASM_MASM)
-                list(APPEND SOURCES_CRYPTO src/crypto/astrobwt/sha3_256_avx2.asm)
-            else()
-                enable_language(ASM)
-                list(APPEND SOURCES_CRYPTO src/crypto/astrobwt/sha3_256_avx2.S)
-            endif()
-        endif()
-
-        list(APPEND HEADERS_CRYPTO
-            src/crypto/astrobwt/Salsa20.hpp
-        )
-
-        list(APPEND SOURCES_CRYPTO
-            src/crypto/astrobwt/Salsa20.cpp
-        )
-    endif()
-else()
-    remove_definitions(/DXMRIG_ALGO_ASTROBWT)
-endif()
--- a/cmake/cpu.cmake
+++ b/cmake/cpu.cmake
@@ -1,47 +1,64 @@
+if (CMAKE_SIZEOF_VOID_P EQUAL 8)
+    set(XMRIG_64_BIT ON)
+    add_definitions(-DXMRIG_64_BIT)
+else()
+    set(XMRIG_64_BIT OFF)
+endif()
+
 if (NOT CMAKE_SYSTEM_PROCESSOR)
    message(WARNING "CMAKE_SYSTEM_PROCESSOR not defined")
 endif()

-if (CMAKE_SYSTEM_PROCESSOR MATCHES "^(x86_64|AMD64)$" AND CMAKE_SIZEOF_VOID_P EQUAL 8)
-    add_definitions(/DRAPIDJSON_SSE2)
+include(CheckCXXCompilerFlag)
+
+if (CMAKE_CXX_COMPILER_ID MATCHES MSVC)
+    set(VAES_SUPPORTED ON)
+else()
+    CHECK_CXX_COMPILER_FLAG("-mavx2 -mvaes" VAES_SUPPORTED)
+endif()
+
+if (NOT VAES_SUPPORTED)
+    set(WITH_VAES OFF)
+endif()
+
+if (XMRIG_64_BIT AND CMAKE_SYSTEM_PROCESSOR MATCHES "^(x86_64|AMD64)$")
+    add_definitions(-DRAPIDJSON_SSE2)
 else()
    set(WITH_SSE4_1 OFF)
+    set(WITH_AVX2 OFF)
+    set(WITH_VAES OFF)
 endif()

 if (NOT ARM_TARGET)
    if (CMAKE_SYSTEM_PROCESSOR MATCHES "^(aarch64|arm64|armv8-a)$")
        set(ARM_TARGET 8)
-    elseif (CMAKE_SYSTEM_PROCESSOR MATCHES "^(armv7|armv7f|armv7s|armv7k|armv7-a|armv7l)$")
+    elseif (CMAKE_SYSTEM_PROCESSOR MATCHES "^(armv7|armv7f|armv7s|armv7k|armv7-a|armv7l|armv7ve)$")
        set(ARM_TARGET 7)
    endif()
 endif()

 if (ARM_TARGET AND ARM_TARGET GREATER 6)
-    set(XMRIG_ARM     ON)
-    add_definitions(/DXMRIG_ARM)
+    set(XMRIG_ARM ON)
+    add_definitions(-DXMRIG_ARM=${ARM_TARGET})

    message(STATUS "Use ARM_TARGET=${ARM_TARGET} (${CMAKE_SYSTEM_PROCESSOR})")

-    include(CheckCXXCompilerFlag)
-
    if (ARM_TARGET EQUAL 8)
-        set(XMRIG_ARMv8 ON)
-        add_definitions(/DXMRIG_ARMv8)
-
        CHECK_CXX_COMPILER_FLAG(-march=armv8-a+crypto XMRIG_ARM_CRYPTO)

        if (XMRIG_ARM_CRYPTO)
-            add_definitions(/DXMRIG_ARM_CRYPTO)
+            add_definitions(-DXMRIG_ARM_CRYPTO)
            set(ARM8_CXX_FLAGS "-march=armv8-a+crypto")
        else()
            set(ARM8_CXX_FLAGS "-march=armv8-a")
        endif()
-    elseif (ARM_TARGET EQUAL 7)
-        set(XMRIG_ARMv7 ON)
-        add_definitions(/DXMRIG_ARMv7)
    endif()
 endif()

 if (WITH_SSE4_1)
-    add_definitions(/DXMRIG_FEATURE_SSE4_1)
+    add_definitions(-DXMRIG_FEATURE_SSE4_1)
+endif()
+
+if (WITH_AVX2)
+    add_definitions(-DXMRIG_FEATURE_AVX2)
 endif()
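The new `armv7ve` spelling is one more alternative in the ARMv7 regex. Both regexes are anchored with `^...$`, so they are exact matches against `CMAKE_SYSTEM_PROCESSOR`, and an explicit `-DARM_TARGET=7` or `-DARM_TARGET=8` skips detection altogether. `XMRIG_ARM` is now defined to the target number instead of being accompanied by the removed `XMRIG_ARMv7`/`XMRIG_ARMv8` definitions. The sketch below mirrors that logic in C++ for illustration only; `detectArmTarget` and `kBuiltArmTarget` are hypothetical names, not part of xmrig.

```cpp
#include <string_view>

// Same alternatives as the two regexes in cmake/cpu.cmake.
inline constexpr std::string_view kArmv8Names[] = {"aarch64", "arm64", "armv8-a"};
inline constexpr std::string_view kArmv7Names[] = {
    "armv7", "armv7f", "armv7s", "armv7k", "armv7-a", "armv7l", "armv7ve"};

// Hypothetical: returns 8, 7, or 0 (not ARM / unknown), like ARM_TARGET.
// Exact comparison matches the ^(...)$ anchoring of the CMake regexes.
constexpr int detectArmTarget(std::string_view processor)
{
    for (std::string_view name : kArmv8Names) {
        if (processor == name) {
            return 8;
        }
    }
    for (std::string_view name : kArmv7Names) {
        if (processor == name) {
            return 7;
        }
    }
    return 0;
}

// With -DXMRIG_ARM=${ARM_TARGET}, source code can branch on the value itself.
#if defined(XMRIG_ARM) && XMRIG_ARM == 8
inline constexpr int kBuiltArmTarget = 8;
#elif defined(XMRIG_ARM) && XMRIG_ARM == 7
inline constexpr int kBuiltArmTarget = 7;
#else
inline constexpr int kBuiltArmTarget = 0;
#endif
```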
--- a/cmake/flags.cmake
+++ b/cmake/flags.cmake
@@ -22,12 +22,12 @@ if (CMAKE_CXX_COMPILER_ID MATCHES GNU)
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall -fexceptions -fno-rtti -Wno-strict-aliasing -Wno-class-memaccess")
    set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} -Ofast -s")

-    if (XMRIG_ARMv8)
+    if (ARM_TARGET EQUAL 8)
        set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${ARM8_CXX_FLAGS}")
        set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${ARM8_CXX_FLAGS} -flax-vector-conversions")
-    elseif (XMRIG_ARMv7)
-        set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -mfpu=neon")
-        set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -mfpu=neon -flax-vector-conversions")
+    elseif (ARM_TARGET EQUAL 7)
+        set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -march=armv7-a -mfpu=neon")
+        set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -march=armv7-a -mfpu=neon -flax-vector-conversions")
    else()
        set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -maes")
        set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -maes")
@@ -80,10 +80,10 @@ elseif (CMAKE_CXX_COMPILER_ID MATCHES Clang)
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall -fexceptions -fno-rtti -Wno-missing-braces")
    set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} -Ofast -funroll-loops -fmerge-all-constants")

-    if (XMRIG_ARMv8)
+    if (ARM_TARGET EQUAL 8)
        set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${ARM8_CXX_FLAGS}")
        set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${ARM8_CXX_FLAGS}")
-    elseif (XMRIG_ARMv7)
+    elseif (ARM_TARGET EQUAL 7)
        set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -mfpu=neon -march=${CMAKE_SYSTEM_PROCESSOR}")
        set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -mfpu=neon -march=${CMAKE_SYSTEM_PROCESSOR}")
    else()
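On ARMv7 the GCC flags now pass `-march=armv7-a` together with `-mfpu=neon`, presumably so NEON code compiles even when the toolchain's default architecture is older. Compilers following the Arm C Language Extensions define `__ARM_NEON` when NEON is available, which gives a simple compile-time guard. `add4` below is a made-up function that only illustrates the pattern, with a scalar fallback for x86 and other targets; it is not part of xmrig.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__ARM_NEON)
#   include <arm_neon.h>
#endif

// Hypothetical example: out[i] = a[i] + b[i] for four 32-bit lanes,
// wrapping on overflow.
inline void add4(const std::uint32_t *a, const std::uint32_t *b, std::uint32_t *out)
{
    assert(a != nullptr && b != nullptr && out != nullptr);

#if defined(__ARM_NEON)
    // One 128-bit NEON register holds four uint32 lanes.
    vst1q_u32(out, vaddq_u32(vld1q_u32(a), vld1q_u32(b)));
#else
    // Portable scalar fallback when NEON is not enabled.
    for (int i = 0; i < 4; ++i) {
        out[i] = a[i] + b[i];
    }
#endif
}
```

Both paths produce identical results, so the choice between them is purely a build-flag decision.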
--- a/cmake/ghostrider.cmake
+++ b/cmake/ghostrider.cmake
@@ -0,0 +1,8 @@
+if (WITH_GHOSTRIDER)
+    add_definitions(/DXMRIG_ALGO_GHOSTRIDER)
+    add_subdirectory(src/crypto/ghostrider)
+    set(GHOSTRIDER_LIBRARY ghostrider)
+else()
+    remove_definitions(/DXMRIG_ALGO_GHOSTRIDER)
+    set(GHOSTRIDER_LIBRARY "")
+endif()
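`cmake/ghostrider.cmake` follows the same pattern as the `WITH_CN_FEMTO` block in `CMakeLists.txt`: a CMake option becomes a preprocessor definition (here `XMRIG_ALGO_GHOSTRIDER`, plus an extra library in `target_link_libraries`). On the C++ side such a definition is typically consumed as a compile-time switch. The sketch below is hypothetical (`isCompiledIn` and its name table are not xmrig's API) and shows how an algorithm family dropped from the build, like AstroBWT in this change, simply stops being recognized.

```cpp
#include <string_view>

// Mirror the CMake-generated definitions as constants.
#ifdef XMRIG_ALGO_GHOSTRIDER
inline constexpr bool kGhostRider = true;
#else
inline constexpr bool kGhostRider = false;
#endif

#ifdef XMRIG_ALGO_CN_FEMTO
inline constexpr bool kCnFemto = true;
#else
inline constexpr bool kCnFemto = false;
#endif

// Hypothetical lookup: is this algorithm built into the binary?
constexpr bool isCompiledIn(std::string_view algo)
{
    if (algo == "ghostrider") {
        return kGhostRider;
    }
    if (algo == "cn/upx2") {
        return kCnFemto;
    }
    // Anything else, including the removed astrobwt family, is unknown here.
    return false;
}
```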
--- a/cmake/os.cmake
+++ b/cmake/os.cmake
@@ -22,32 +22,31 @@ endif()


 if (XMRIG_OS_WIN)
-    add_definitions(/DWIN32)
-    add_definitions(/DXMRIG_OS_WIN)
+    add_definitions(-DWIN32 -DXMRIG_OS_WIN)
 elseif(XMRIG_OS_APPLE)
-    add_definitions(/DXMRIG_OS_APPLE)
+    add_definitions(-DXMRIG_OS_APPLE)

    if (XMRIG_OS_IOS)
-        add_definitions(/DXMRIG_OS_IOS)
+        add_definitions(-DXMRIG_OS_IOS)
    else()
-        add_definitions(/DXMRIG_OS_MACOS)
+        add_definitions(-DXMRIG_OS_MACOS)
    endif()

    if (XMRIG_ARM)
        set(WITH_SECURE_JIT ON)
    endif()
 elseif(XMRIG_OS_UNIX)
-    add_definitions(/DXMRIG_OS_UNIX)
+    add_definitions(-DXMRIG_OS_UNIX)

    if (XMRIG_OS_ANDROID)
-        add_definitions(/DXMRIG_OS_ANDROID)
+        add_definitions(-DXMRIG_OS_ANDROID)
    elseif (XMRIG_OS_LINUX)
-        add_definitions(/DXMRIG_OS_LINUX)
+        add_definitions(-DXMRIG_OS_LINUX)
    elseif (XMRIG_OS_FREEBSD)
-        add_definitions(/DXMRIG_OS_FREEBSD)
+        add_definitions(-DXMRIG_OS_FREEBSD)
    endif()
 endif()

 if (WITH_SECURE_JIT)
-    add_definitions(/DXMRIG_SECURE_JIT)
+    add_definitions(-DXMRIG_SECURE_JIT)
 endif()
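The `os.cmake` change only swaps the MSVC-style `/D` prefix for `-D`; `add_definitions()` recognizes both spellings as preprocessor definitions, so the resulting macros are the same. A minimal, hypothetical consumer of those macros follows; `kPlatformName` and `kSecureJit` are illustrative constants, not xmrig identifiers.

```cpp
#include <string_view>

// iOS is checked before macOS; os.cmake defines only one of the two.
#if defined(XMRIG_OS_WIN)
inline constexpr std::string_view kPlatformName = "Windows";
#elif defined(XMRIG_OS_IOS)
inline constexpr std::string_view kPlatformName = "iOS";
#elif defined(XMRIG_OS_MACOS)
inline constexpr std::string_view kPlatformName = "macOS";
#elif defined(XMRIG_OS_ANDROID)
inline constexpr std::string_view kPlatformName = "Android";
#elif defined(XMRIG_OS_LINUX)
inline constexpr std::string_view kPlatformName = "Linux";
#elif defined(XMRIG_OS_FREEBSD)
inline constexpr std::string_view kPlatformName = "FreeBSD";
#else
inline constexpr std::string_view kPlatformName = "unknown";
#endif

// os.cmake forces WITH_SECURE_JIT on Apple ARM builds, which then adds
// -DXMRIG_SECURE_JIT; code can check it at compile time.
#ifdef XMRIG_SECURE_JIT
inline constexpr bool kSecureJit = true;
#else
inline constexpr bool kSecureJit = false;
#endif
```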
--- a/cmake/randomx.cmake
+++ b/cmake/randomx.cmake
@@ -42,13 +42,13 @@ if (WITH_RANDOMX)
        src/crypto/rx/RxVm.cpp
    )

-    if (CMAKE_C_COMPILER_ID MATCHES MSVC)
+    if (WITH_ASM AND CMAKE_C_COMPILER_ID MATCHES MSVC)
        enable_language(ASM_MASM)
        list(APPEND SOURCES_CRYPTO
             src/crypto/randomx/jit_compiler_x86_static.asm
             src/crypto/randomx/jit_compiler_x86.cpp
            )
-    elseif (NOT XMRIG_ARM AND CMAKE_SIZEOF_VOID_P EQUAL 8)
+    elseif (WITH_ASM AND NOT XMRIG_ARM AND CMAKE_SIZEOF_VOID_P EQUAL 8)
        list(APPEND SOURCES_CRYPTO
             src/crypto/randomx/jit_compiler_x86_static.S
             src/crypto/randomx/jit_compiler_x86.cpp
@@ -76,7 +76,15 @@ if (WITH_RANDOMX)
        list(APPEND SOURCES_CRYPTO src/crypto/randomx/blake2/blake2b_sse41.c)

        if (CMAKE_C_COMPILER_ID MATCHES GNU OR CMAKE_C_COMPILER_ID MATCHES Clang)
-            set_source_files_properties(src/crypto/randomx/blake2/blake2b_sse41.c PROPERTIES COMPILE_FLAGS -msse4.1)
+            set_source_files_properties(src/crypto/randomx/blake2/blake2b_sse41.c PROPERTIES COMPILE_FLAGS "-Ofast -msse4.1")
+        endif()
+    endif()
+
+    if (WITH_AVX2)
+        list(APPEND SOURCES_CRYPTO src/crypto/randomx/blake2/avx2/blake2b_avx2.c)
+
+        if (CMAKE_C_COMPILER_ID MATCHES GNU OR CMAKE_C_COMPILER_ID MATCHES Clang)
+            set_source_files_properties(src/crypto/randomx/blake2/avx2/blake2b_avx2.c PROPERTIES COMPILE_FLAGS "-Ofast -mavx2")
        endif()
    endif()

@@ -100,13 +108,29 @@ if (WITH_RANDOMX)
        message("-- WITH_MSR=ON")

        if (XMRIG_OS_WIN)
-            list(APPEND SOURCES_CRYPTO src/crypto/rx/Rx_win.cpp)
+            list(APPEND SOURCES_CRYPTO
+                src/crypto/rx/RxFix_win.cpp
+                src/hw/msr/Msr_win.cpp
+                )
        elseif (XMRIG_OS_LINUX)
-            list(APPEND SOURCES_CRYPTO src/crypto/rx/Rx_linux.cpp)
+            list(APPEND SOURCES_CRYPTO
+                src/crypto/rx/RxFix_linux.cpp
+                src/hw/msr/Msr_linux.cpp
+                )
        endif()

-        list(APPEND HEADERS_CRYPTO src/crypto/rx/msr/MsrItem.h)
-        list(APPEND SOURCES_CRYPTO src/crypto/rx/msr/MsrItem.cpp)
+        list(APPEND HEADERS_CRYPTO
+            src/crypto/rx/RxFix.h
+            src/crypto/rx/RxMsr.h
+            src/hw/msr/Msr.h
+            src/hw/msr/MsrItem.h
+            )
+
+        list(APPEND SOURCES_CRYPTO
+            src/crypto/rx/RxMsr.cpp
+            src/hw/msr/Msr.cpp
+            src/hw/msr/MsrItem.cpp
+            )
    else()
        remove_definitions(/DXMRIG_FEATURE_MSR)
        remove_definitions(/DXMRIG_FIX_RYZEN)
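With `WITH_AVX2`, only `blake2b_avx2.c` is compiled with `-mavx2`. The rest of the binary can therefore still run on CPUs without AVX2, as long as the AVX2 routine is chosen at run time. The sketch below shows one way to make that choice. It assumes GCC or Clang on x86 (`__builtin_cpu_init`, `__builtin_cpu_supports`). The `Blake2Impl` enum and the function names are hypothetical and are not XMRig's actual dispatch code.

```cpp
// Implementations the build may compile in (hypothetical names).
enum class Blake2Impl { Generic, SSE41, AVX2 };

// Pure selection policy: prefer AVX2, then SSE4.1, else the portable code.
constexpr Blake2Impl choose_blake2_impl(bool has_avx2, bool has_sse41)
{
    return has_avx2 ? Blake2Impl::AVX2
                    : (has_sse41 ? Blake2Impl::SSE41 : Blake2Impl::Generic);
}

// Query the running CPU. __builtin_cpu_supports is a GCC/Clang builtin for
// x86 targets; on other compilers/targets this falls back to portable code.
inline Blake2Impl detect_blake2_impl()
{
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return choose_blake2_impl(__builtin_cpu_supports("avx2") != 0,
                              __builtin_cpu_supports("sse4.1") != 0);
#else
    return choose_blake2_impl(false, false);
#endif
}
```

Keeping the policy in a `constexpr` function separates it from hardware detection, so the policy can be checked without the matching CPU. A real build would also need to skip variants that were not compiled in, for example when `WITH_AVX2` is off.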
--- a/doc/CPU.md
+++ b/doc/CPU.md
@@ -1,3 +1,5 @@
+**:warning: The most recent version of this page is at https://xmrig.com/docs/miner/config/cpu.**
+
 # CPU backend

 All CPU related settings contains in one `cpu` object in config file, CPU backend allow specify multiple profiles and allow switch between them without restrictions by pool request or config change. Default auto-configuration create reasonable minimum of profiles which cover all supported algorithms.
@@ -75,6 +77,35 @@ Each number represent one thread and means CPU affinity, this is default format
 ```
 Internal format, but can be user defined.

+## RandomX options
+
+#### `init`
+Number of threads used to initialize the RandomX dataset: auto-detect (`-1`) or any number greater than 0.
+
+#### `init-avx2`
+Use AVX2 for dataset initialization (faster on some CPUs). Auto-detect (`-1`), disabled (`0`), or always enabled on CPUs that support AVX2 (`1`).
+
+#### `mode`
+RandomX mining mode: `auto`, `fast` (2 GB memory), `light` (256 MB memory).
+
+#### `1gb-pages`
+Use 1 GB huge pages for the RandomX dataset (Linux only). Enabled (`true`) or disabled (`false`). It gives a 1-3% speedup.
+
+#### `wrmsr`
+[MSR mod](https://xmrig.com/docs/miner/randomx-optimization-guide/msr). Enabled (`true`) or disabled (`false`). It gives a speedup of up to 15% depending on your system. _(**Note**: Userspace MSR writes are no longer enabled by default; the flag `msr.allow_writes=on` must be set for Linux kernels 5.9 and later.)_
+
+#### `rdmsr`
+Restore MSR register values to their original values on exit. Used together with `wrmsr`. Enabled (`true`) or disabled (`false`).
+
+#### `cache_qos`
+[Cache QoS](https://xmrig.com/docs/miner/randomx-optimization-guide/qos). Enabled (`true`) or disabled (`false`). It makes the hashrate more stable when you can't or don't want to mine on all CPU cores.
+
+#### `numa`
+NUMA support (better hashrate on multi-CPU servers and Ryzen Threadripper 1xxx/2xxx). Enabled (`true`) or disabled (`false`).
+
+#### `scratchpad_prefetch_mode`
+Which instruction the RandomX loop uses to prefetch data from the scratchpad: off (`0`), `prefetcht0` (`1`), `prefetchnta` (`2`, slightly faster on Coffee Lake and a few other CPUs) or `mov` (`3`). `1` is the default and the fastest in most cases.
+
 ## Shared options

 #### `enabled`
@@ -83,23 +114,32 @@ Enable (`true`) or disable (`false`) CPU backend, by default `true`.
 #### `huge-pages`
 Enable (`true`) or disable (`false`) huge pages support, by default `true`.

+#### `huge-pages-jit`
+Enable (`true`) or disable (`false`) huge pages support for RandomX JIT code, by default `false`. It gives a very small boost on Ryzen CPUs, but the hashrate is unstable between launches. Use with caution.
+
 #### `hw-aes`
 Force enable (`true`) or disable (`false`) hardware AES support. Default value `null` means miner autodetect this feature. Usually don't need change this option, this option useful for some rare cases when miner can't detect hardware AES, but it available. If you force enable this option, but your hardware not support it, miner will crash.

 #### `priority`
-Mining threads priority, value from `1` (lowest priority) to `5` (highest possible priority). Default value `null` means miner don't change threads priority at all.
+Mining threads priority, value from `1` (lowest priority) to `5` (highest possible priority). The default value `null` means the miner doesn't change thread priority at all. Setting a priority higher than `2` can make your PC unresponsive.
+
+#### `memory-pool` (since v4.3.0)
+Use a continuous, persistent memory block for mining threads, which preserves the huge pages allocation while switching algorithms. Possible values: `false` (feature disabled, default), `true`, or a specific count of 2 MB huge pages. It helps avoid losing huge pages for scratchpads when the RandomX dataset is updated and mining threads restart after 2-3 days of mining.
+
+#### `yield` (since v5.1.1)
+Prefer better system responsiveness/stability (`true`, default value) or maximum hashrate (`false`).

 #### `asm`
 Enable/configure or disable ASM optimizations. Possible values: `true`, `false`, `"intel"`, `"ryzen"`, `"bulldozer"`.

 #### `argon2-impl` (since v3.1.0)
-Allow override automatically detected Argon2 implementation, this option added mostly for debug purposes, default value `null` means autodetect. Other possible values: `"x86_64"`, `"SSE2"`, `"SSSE3"`, `"XOP"`, `"AVX2"`, `"AVX-512F"`. Manual selection has no safe guards, if you CPU not support required instuctions, miner will crash.
+Override the automatically detected Argon2 implementation. This option was added mostly for debugging; the default value `null` means autodetect. Argon2 is used in RandomX dataset initialization and in some other mining algorithms. Other possible values: `"x86_64"`, `"SSE2"`, `"SSSE3"`, `"XOP"`, `"AVX2"`, `"AVX-512F"`. Manual selection has no safeguards: if your CPU doesn't support the required instructions, the miner will crash.
+
+#### `astrobwt-max-size`
+AstroBWT algorithm: skip hashes with a large stage 2 size, default: `550`, min: `400`, max: `1200`. The optimal value depends on your CPU/GPU.
+
+#### `astrobwt-avx2`
+AstroBWT algorithm: use AVX2 code. It's faster on some CPUs and slower on others.

 #### `max-threads-hint` (since v4.2.0)
 Maximum CPU threads count (in percentage) hint for autoconfig. [CPU_MAX_USAGE.md](CPU_MAX_USAGE.md)
-
-#### `memory-pool` (since v4.3.0)
-Use continuous, persistent memory block for mining threads, useful for preserve huge pages allocation while algorithm swithing. Possible values `false` (feature disabled, by default) or `true` or specific count of 2 MB huge pages.
-
-#### `yield` (since v5.1.1)
-Prefer system better system response/stability `true` (default value) or maximum hashrate `false`.
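Several of the CPU options above are plain numbers with fixed meanings, which a config loader has to translate. Below is a minimal sketch for `scratchpad_prefetch_mode` and `priority`. The type and function names are hypothetical. The page does not say how XMRig treats out-of-range values, so the fallback and clamping rules are assumptions.

```cpp
// Values documented for `scratchpad_prefetch_mode` (hypothetical type name).
enum class PrefetchMode : int { Off = 0, Prefetcht0 = 1, Prefetchnta = 2, Mov = 3 };

// Map a raw config number to a mode. Falling back to the documented
// default (1, prefetcht0) for out-of-range values is an assumption.
constexpr PrefetchMode parse_prefetch_mode(int value)
{
    return (value >= 0 && value <= 3) ? static_cast<PrefetchMode>(value)
                                      : PrefetchMode::Prefetcht0;
}

// `priority` runs from 1 (lowest) to 5 (highest). `null` ("leave thread
// priority unchanged") is modelled here as any negative value, returned
// as -1. Clamping other values into 1..5 is an assumption.
constexpr int normalize_priority(int value)
{
    return value < 0 ? -1 : (value < 1 ? 1 : (value > 5 ? 5 : value));
}
```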
--- a/doc/releases/5_0_1/SHA256SUMS
+++ b/doc/releases/5_0_1/SHA256SUMS
@@ -1,5 +0,0 @@
-6bb1a2e3a0fbca5195be6022f2a9fbff8a353c37c7542e7ab89420cb45b64505  xmrig-5.0.1-gcc-win32.zip
-24dba9ec281acfb2ea2c401ebd0e4e2d1f1ee5fd557da5ff3c7049020c1f78b6  xmrig-5.0.1-gcc-win64.zip
-86d65c6693ec9e35cd7547329580638b85c9eb0cf8383892a1c15199de5b556f  xmrig-5.0.1-msvc-cuda10_1-win64.zip
-0fbfe518b1c4b6993b0f66ff01302626375b15620ccf8f64d6fb97845068ffca  xmrig-5.0.1-msvc-win64.zip
-aa34890738a3494de2fa0e44db346937fea7339852f5f10b5d4655f95e2d8f1f  xmrig-5.0.1-xenial-x64.tar.gz
--- a/doc/releases/5_0_1/SHA256SUMS.sig
+++ b/doc/releases/5_0_1/SHA256SUMS.sig
@@ -1,11 +0,0 @@
-----BEGIN PGP SIGNATURE-----
-
-iQEzBAABCgAdFiEEmsTOqOZuNaXHzdwbRGpTY4vpRAkFAl3VcsoACgkQRGpTY4vp
-RAm9vQgA1MyTUU2jley2TCYLUzQy2Fffc8fbXYv64r44jbWOjC/6qo2iIlRgPhIc
-oVyPKr5TYS3QjDzCEm8IvozS0YudS6soESbPzqDonboK8pd0K4bsML9TQY2feV7A
-NL5vln0rfVHp1wxLLrQpfBqAgvJUXEyaHece6gFQN79JOGhEo2bHL2NyrOl+FViS
-b2BaMtXq410Fh+XT6ShnOaG/2EuO8ZqSGdCO6A/2LHQw1UY+mZiCvue6P6B06HmB
-WD/urOv38V389v+V+Sp4UlEW6VpBOOjvtChoVWtLt+tKzydrnt2EmoWWWg475pka
-4G6whHuMWS8CTt5/PDhJpvVXNQTIOw==
-=C764
-----END PGP SIGNATURE-----
--- a/scripts/build.hwloc.sh
+++ b/scripts/build.hwloc.sh
@@ -1,6 +1,10 @@
 #!/bin/bash -e

-HWLOC_VERSION="2.4.0"
+HWLOC_VERSION_MAJOR="2"
+HWLOC_VERSION_MINOR="7"
+HWLOC_VERSION_PATCH="1"
+
+HWLOC_VERSION="${HWLOC_VERSION_MAJOR}.${HWLOC_VERSION_MINOR}.${HWLOC_VERSION_PATCH}"

 mkdir -p deps
 mkdir -p deps/include
@@ -8,7 +12,7 @@ mkdir -p deps/lib

 mkdir -p build && cd build

-wget https://download.open-mpi.org/release/hwloc/v2.4/hwloc-${HWLOC_VERSION}.tar.gz -O hwloc-${HWLOC_VERSION}.tar.gz
+wget https://download.open-mpi.org/release/hwloc/v${HWLOC_VERSION_MAJOR}.${HWLOC_VERSION_MINOR}/hwloc-${HWLOC_VERSION}.tar.gz -O hwloc-${HWLOC_VERSION}.tar.gz
 tar -xzf hwloc-${HWLOC_VERSION}.tar.gz

 cd hwloc-${HWLOC_VERSION}
@@ -16,4 +20,4 @@ cd hwloc-${HWLOC_VERSION}
 make -j$(nproc || sysctl -n hw.ncpu || sysctl -n hw.logicalcpu)
 cp -fr include ../../deps
 cp hwloc/.libs/libhwloc.a ../../deps/lib
-cd ..
+cd ..
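The old script hard-coded the `v2.4` release directory, so bumping `HWLOC_VERSION` alone produced a broken URL. Splitting the version into parts lets the script derive the `vMAJOR.MINOR` directory as well. Here is the same composition as a small C++ function (the function name is illustrative only):

```cpp
#include <cassert>
#include <string>

// Mirrors build.hwloc.sh: the release directory uses only MAJOR.MINOR,
// while the tarball name uses the full MAJOR.MINOR.PATCH version.
std::string hwloc_url(int major, int minor, int patch)
{
    assert(major >= 0 && minor >= 0 && patch >= 0);

    const std::string series  = std::to_string(major) + "." + std::to_string(minor);
    const std::string version = series + "." + std::to_string(patch);

    return "https://download.open-mpi.org/release/hwloc/v" + series +
           "/hwloc-" + version + ".tar.gz";
}
```

For `2`, `7`, `1`, it yields `https://download.open-mpi.org/release/hwloc/v2.7/hwloc-2.7.1.tar.gz`, which is the URL the updated script downloads.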
--- a/scripts/build.libressl.sh
+++ b/scripts/build.libressl.sh
@@ -1,6 +1,6 @@
 #!/bin/bash -e

-LIBRESSL_VERSION="3.0.2"
+LIBRESSL_VERSION="3.5.2"

 mkdir -p deps
 mkdir -p deps/include
@@ -17,4 +17,4 @@ make -j$(nproc || sysctl -n hw.ncpu || sysctl -n hw.logicalcpu)
 cp -fr include ../../deps
 cp crypto/.libs/libcrypto.a ../../deps/lib
 cp ssl/.libs/libssl.a ../../deps/lib
-cd ..
+cd ..
--- a/scripts/build.openssl.sh
+++ b/scripts/build.openssl.sh
@@ -1,6 +1,6 @@
 #!/bin/bash -e

-OPENSSL_VERSION="1.1.1i"
+OPENSSL_VERSION="1.1.1o"

 mkdir -p deps
 mkdir -p deps/include
@@ -17,4 +17,4 @@ make -j$(nproc || sysctl -n hw.ncpu || sysctl -n hw.logicalcpu)
 cp -fr include ../../deps
 cp libcrypto.a ../../deps/lib
 cp libssl.a ../../deps/lib
-cd ..
+cd ..
--- a/scripts/build.openssl3.sh
+++ b/scripts/build.openssl3.sh
@@ -0,0 +1,20 @@
+#!/bin/bash -e
+
+OPENSSL_VERSION="3.0.3"
+
+mkdir -p deps
+mkdir -p deps/include
+mkdir -p deps/lib
+
+mkdir -p build && cd build
+
+wget https://www.openssl.org/source/openssl-${OPENSSL_VERSION}.tar.gz -O openssl-${OPENSSL_VERSION}.tar.gz
+tar -xzf openssl-${OPENSSL_VERSION}.tar.gz
+
+cd openssl-${OPENSSL_VERSION}
+./config -no-shared -no-asm -no-zlib -no-comp -no-dgram -no-filenames -no-cms
+make -j$(nproc || sysctl -n hw.ncpu || sysctl -n hw.logicalcpu)
+cp -fr include ../../deps
+cp libcrypto.a ../../deps/lib
+cp libssl.a ../../deps/lib
+cd ..
--- a/scripts/build.uv.sh
+++ b/scripts/build.uv.sh
@@ -1,6 +1,6 @@
 #!/bin/bash -e

-UV_VERSION="1.40.0"
+UV_VERSION="1.44.1"

 mkdir -p deps
 mkdir -p deps/include
@@ -17,4 +17,4 @@ sh autogen.sh
 make -j$(nproc || sysctl -n hw.ncpu || sysctl -n hw.logicalcpu)
 cp -fr include ../../deps
 cp .libs/libuv.a ../../deps/lib
-cd ..
+cd ..
--- a/scripts/generate_cl.js
+++ b/scripts/generate_cl.js
@@ -51,6 +51,7 @@ function rx()
        'randomx_constants_wow.h',
        'randomx_constants_arqma.h',
        'randomx_constants_keva.h',
+        'randomx_constants_graft.h',
        'aes.cl',
        'blake2b.cl',
        'randomx_vm.cl',
@@ -66,15 +67,6 @@ function rx()
 }


-function astrobwt()
-{
-    const astrobwt = opencl_minify(addIncludes('astrobwt.cl', [ 'BWT.cl', 'salsa20.cl', 'sha3.cl' ]));
-
-    // fs.writeFileSync('astrobwt_gen.cl', astrobwt);
-    fs.writeFileSync('astrobwt_cl.h', text2h(astrobwt, 'xmrig', 'astrobwt_cl'));
-}
-
-
 function kawpow()
 {
    const kawpow = opencl_minify(addIncludes('kawpow.cl', [ 'defs.h' ]));
@@ -96,11 +88,6 @@ process.chdir(path.resolve('src/backend/opencl/cl/rx'));

 rx();

-process.chdir(cwd);
-process.chdir(path.resolve('src/backend/opencl/cl/astrobwt'));
-
-astrobwt();
-
 process.chdir(cwd);
 process.chdir(path.resolve('src/backend/opencl/cl/kawpow'));

--- a/scripts/randomx_boost.sh
+++ b/scripts/randomx_boost.sh
@@ -1,28 +1,34 @@
-#!/bin/bash
+#!/bin/sh -e

-modprobe msr
+MSR_FILE=/sys/module/msr/parameters/allow_writes

-if cat /proc/cpuinfo | grep "AMD Ryzen" > /dev/null;
+if test -e "$MSR_FILE"; then
+	echo on > "$MSR_FILE"
+else
+	modprobe msr allow_writes=on
+fi
+
+if grep -E 'AMD Ryzen|AMD EPYC' /proc/cpuinfo > /dev/null;
 	then
-	if cat /proc/cpuinfo | grep "cpu family[[:space:]]:[[:space:]]25" > /dev/null;
+	if grep "cpu family[[:space:]]:[[:space:]]25" /proc/cpuinfo > /dev/null;
 		then
-			echo "Detected Ryzen (Zen3)"
+			echo "Detected Zen3 CPU"
 			wrmsr -a 0xc0011020 0x4480000000000
 			wrmsr -a 0xc0011021 0x1c000200000040
 			wrmsr -a 0xc0011022 0xc000000401500000
 			wrmsr -a 0xc001102b 0x2000cc14
-			echo "MSR register values for Ryzen (Zen3) applied"
+			echo "MSR register values for Zen3 applied"
 		else
-			echo "Detected Ryzen (Zen1/Zen2)"
+			echo "Detected Zen1/Zen2 CPU"
 			wrmsr -a 0xc0011020 0
 			wrmsr -a 0xc0011021 0x40
 			wrmsr -a 0xc0011022 0x1510000
 			wrmsr -a 0xc001102b 0x2000cc16
-			echo "MSR register values for Ryzen (Zen1/Zen2) applied"
+			echo "MSR register values for Zen1/Zen2 applied"
 		fi
-elif cat /proc/cpuinfo | grep "Intel" > /dev/null;
+elif grep "Intel" /proc/cpuinfo > /dev/null;
 	then
-		echo "Detected Intel"
+		echo "Detected Intel CPU"
 		wrmsr -a 0x1a4 0xf
 		echo "MSR register values for Intel applied"
 else
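The updated `randomx_boost.sh` chooses a register preset from `/proc/cpuinfo`. AMD Ryzen or EPYC CPUs with `cpu family` 25 get the Zen3 values, other Ryzen/EPYC CPUs get the Zen1/Zen2 values, and Intel CPUs get a single write to `0x1a4`. The sketch below expresses the same decision in C++ over the cpuinfo text, using hypothetical names. The script's final `else` branch is cut off in this excerpt, so returning no writes for other CPUs is an assumption. The sketch only computes the values and never touches an MSR.

```cpp
#include <cassert>
#include <cstdint>
#include <regex>
#include <string>
#include <utility>
#include <vector>

enum class MsrPreset { Zen3, Zen12, Intel, None };

// Same checks as the script's grep calls, applied to /proc/cpuinfo text.
MsrPreset classify_cpu(const std::string& cpuinfo)
{
    static const std::regex amd(R"(AMD Ryzen|AMD EPYC)");
    static const std::regex family25(R"(cpu family\s:\s25)");

    if (std::regex_search(cpuinfo, amd)) {
        return std::regex_search(cpuinfo, family25) ? MsrPreset::Zen3 : MsrPreset::Zen12;
    }
    if (cpuinfo.find("Intel") != std::string::npos) {
        return MsrPreset::Intel;
    }
    return MsrPreset::None;
}

// (register, value) pairs the script writes with `wrmsr -a` for each preset.
std::vector<std::pair<uint32_t, uint64_t>> msr_writes(MsrPreset preset)
{
    switch (preset) {
    case MsrPreset::Zen3:
        return {{0xc0011020, 0x4480000000000ULL}, {0xc0011021, 0x1c000200000040ULL},
                {0xc0011022, 0xc000000401500000ULL}, {0xc001102b, 0x2000cc14ULL}};
    case MsrPreset::Zen12:
        return {{0xc0011020, 0x0ULL}, {0xc0011021, 0x40ULL},
                {0xc0011022, 0x1510000ULL}, {0xc001102b, 0x2000cc16ULL}};
    case MsrPreset::Intel:
        return {{0x1a4, 0xfULL}};
    default:
        return {};
    }
}
```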
--- a/scripts/rtm_ghostrider_example.cmd
+++ b/scripts/rtm_ghostrider_example.cmd
@@ -0,0 +1,23 @@
+:: Example batch file for mining Raptoreum at a pool
+::
+:: Format:
+::      xmrig.exe -a gr -o <pool address>:<pool port> -u <pool username/wallet> -p <pool password>
+::
+:: Fields:
+::      pool address            The host name of the pool stratum or its IP address, for example raptoreumemporium.com
+::      pool port               The port of the pool's stratum to connect to, for example 3333. Check your pool's getting started page.
+::      pool username/wallet    For most pools, this is the wallet address you want to mine to. Some pools require a username.
+::      pool password           For most pools this can be just 'x'. For pools using usernames, you may need to provide a password as configured on the pool.
+::
+:: List of Raptoreum mining pools:
+::      https://miningpoolstats.stream/raptoreum
+::
+:: Choose pools outside the top 5 to help keep the Raptoreum network decentralized!
+:: Smaller pools also often have smaller fees/payout limits.
+
+cd %~dp0
+:: Use this command line to connect to a non-SSL port
+xmrig.exe -a gr -o raptoreumemporium.com:3008 -u WALLET_ADDRESS -p x
+:: Or use this command line to connect to an SSL port
+:: xmrig.exe -a gr -o rtm.suprnova.cc:4273 --tls -u WALLET_ADDRESS -p x
+pause
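The batch file's comments document the argument layout for GhostRider mining. If you launch the miner from another program, you can assemble the same argument list as shown below. `ghostrider_args` is a hypothetical helper. Placing `--tls` right after the pool address mirrors the commented-out SSL example; this placement is not a documented requirement of XMRig.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Builds: -a gr -o <pool address>:<pool port> [--tls] -u <wallet> -p <password>
std::vector<std::string> ghostrider_args(const std::string& host, unsigned port,
                                         const std::string& wallet,
                                         const std::string& password = "x",
                                         bool tls = false)
{
    assert(!host.empty() && port > 0 && port <= 65535);

    std::vector<std::string> args = {"-a", "gr", "-o", host + ":" + std::to_string(port)};
    if (tls) {
        args.push_back("--tls");
    }
    args.insert(args.end(), {"-u", wallet, "-p", password});
    return args;
}
```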
--- a/src/3rdparty/CL/cl_dx9_media_sharing.h
+++ b/src/3rdparty/CL/cl_dx9_media_sharing.h
@@ -44,7 +44,7 @@ extern "C" {

 typedef cl_uint             cl_dx9_media_adapter_type_khr;
 typedef cl_uint             cl_dx9_media_adapter_set_khr;
-    
+
 #if defined(_WIN32)
 #include <d3d9.h>
 typedef struct _cl_dx9_surface_info_khr
@@ -105,7 +105,7 @@ typedef CL_API_ENTRY cl_mem (CL_API_CALL *clCreateFromDX9MediaSurfaceKHR_fn)(
    cl_mem_flags                  flags,
    cl_dx9_media_adapter_type_khr adapter_type,
    void *                        surface_info,
-    cl_uint                       plane,                                                                          
+    cl_uint                       plane,
    cl_int *                      errcode_ret) CL_API_SUFFIX__VERSION_1_2;

 typedef CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueAcquireDX9MediaSurfacesKHR_fn)(
--- a/src/3rdparty/CL/cl_gl_ext.h
+++ b/src/3rdparty/CL/cl_gl_ext.h
@@ -35,7 +35,7 @@ extern "C" {

 #include <CL/cl_gl.h>

-/* 
+/*
 *  cl_khr_gl_event extension
 */
 #define CL_COMMAND_GL_FENCE_SYNC_OBJECT_KHR     0x200D
--- a/src/3rdparty/adl/adl_defines.h
+++ b/src/3rdparty/adl/adl_defines.h
@@ -1471,7 +1471,7 @@ typedef enum _ADLProfilePropertyType
 #define ADL_HDR_FREESYNC_HDR    0x0004      ///< FreeSync HDR supported
 /// @}

-/// \defgroup define_FreesyncFlags ADLDDCInfo2 Freesync HDR flags 
+/// \defgroup define_FreesyncFlags ADLDDCInfo2 Freesync HDR flags
 /// @{
 /// defines for iFreesyncFlags in ADLDDCInfo2
 #define ADL_HDR_FREESYNC_BACKLIGHT_SUPPORT           0x0001      ///< Global backlight control supported
@@ -1738,7 +1738,7 @@ enum ADLODNDPMMaskType
     ADL_ODN_DPM_MASK                = 1 << 2,
 };

-//ODN features Bits for ADLODNCapabilitiesX2 
+//ODN features Bits for ADLODNCapabilitiesX2
 enum ADLODNFeatureControl
 {
     ADL_ODN_SCLK_DPM                = 1 << 0,
@@ -1764,7 +1764,7 @@ enum ADLODNFeatureControl

 //If any new feature is added, PPLIB only needs to add ext feature ID and Item ID(Seeting ID). These IDs should match the drive defined in CWDDEPM.h
 enum ADLODNExtFeatureControl
-{	
+{
 	ADL_ODN_EXT_FEATURE_MEMORY_TIMING_TUNE = 1 << 0,
 	ADL_ODN_EXT_FEATURE_FAN_ZERO_RPM_CONTROL = 1 << 1,
 	ADL_ODN_EXT_FEATURE_AUTO_UV_ENGINE = 1 << 2,   //Auto under voltage
@@ -1794,7 +1794,7 @@ enum ADLODNExtSettingId
 	ADL_ODN_PARAMETER_FAN_CURVE_SPEED_5,
    ADL_ODN_POWERGAUGE,
 	ODN_COUNT
-	
+
 } ;

 //OD8 Capability features bits
@@ -1811,7 +1811,7 @@ enum ADLOD8FeatureControl
    ADL_OD8_MEMORY_TIMING_TUNE = 1 << 8,
    ADL_OD8_FAN_ZERO_RPM_CONTROL = 1 << 9 ,
 	ADL_OD8_AUTO_UV_ENGINE = 1 << 10,  //Auto under voltage
-	ADL_OD8_AUTO_OC_ENGINE = 1 << 11,  //Auto overclock engine     
+	ADL_OD8_AUTO_OC_ENGINE = 1 << 11,  //Auto overclock engine
 	ADL_OD8_AUTO_OC_MEMORY = 1 << 12,  //Auto overclock memory
 	ADL_OD8_FAN_CURVE = 1 << 13,   //Fan curve
 	ADL_OD8_WS_AUTO_FAN_ACOUSTIC_LIMIT = 1 << 14, //Workstation Manual Fan controller
@@ -1888,7 +1888,7 @@ typedef enum _ADLSensorType
 	PMLOG_TEMPERATURE_VRSOC = 24,
 	PMLOG_TEMPERATURE_VRMVDD0 = 25,
 	PMLOG_TEMPERATURE_VRMVDD1 = 26,
-	PMLOG_TEMPERATURE_HOTSPOT = 27,    
+	PMLOG_TEMPERATURE_HOTSPOT = 27,
        PMLOG_TEMPERATURE_GFX = 28,
        PMLOG_TEMPERATURE_SOC = 29,
        PMLOG_GFX_POWER = 30,
--- a/src/3rdparty/adl/adl_sdk.h
+++ b/src/3rdparty/adl/adl_sdk.h
@@ -37,7 +37,7 @@
 #define __stdcall
 #endif /* (LINUX) */

-/// Memory Allocation Call back 
+/// Memory Allocation Call back
 typedef void* ( __stdcall *ADL_MAIN_MALLOC_CALLBACK )( int );


--- a/src/3rdparty/adl/adl_structures.h
+++ b/src/3rdparty/adl/adl_structures.h
@@ -1753,7 +1753,7 @@ typedef struct ADLPXConfigCaps
 ///\brief Enum containing PX or HG type
 ///
 /// This enum is used to get PX or hG type
-/// 
+///
 /// \nosubgrouping
 //////////////////////////////////////////////////////////////////////////////////////////
 enum ADLPxType
--- a/src/3rdparty/epee/LICENSE.txt
+++ b/src/3rdparty/epee/LICENSE.txt
@@ -0,0 +1,25 @@
+Copyright (c) 2006-2013, Andrey N. Sabelnikov, www.sabelnikov.net
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are met:
+    * Redistributions of source code must retain the above copyright
+      notice, this list of conditions and the following disclaimer.
+    * Redistributions in binary form must reproduce the above copyright
+      notice, this list of conditions and the following disclaimer in the
+      documentation and/or other materials provided with the distribution.
+    * Neither the name of the Andrey N. Sabelnikov nor the
+      names of its contributors may be used to endorse or promote products
+      derived from this software without specific prior written permission.
+
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
+ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+DISCLAIMED. IN NO EVENT SHALL Andrey N. Sabelnikov BE LIABLE FOR ANY
+DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
--- a/src/3rdparty/epee/README.md
+++ b/src/3rdparty/epee/README.md
@@ -0,0 +1 @@
+epee is a small library of helpers, wrappers, tools and so on, used to make my life easier.
--- a/src/3rdparty/epee/span.h
+++ b/src/3rdparty/epee/span.h
@@ -0,0 +1,176 @@
+// Copyright (c) 2017-2020, The Monero Project
+//
+// All rights reserved.
+//
+// Redistribution and use in source and binary forms, with or without modification, are
+// permitted provided that the following conditions are met:
+//
+// 1. Redistributions of source code must retain the above copyright notice, this list of
+//    conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright notice, this list
+//    of conditions and the following disclaimer in the documentation and/or other
+//    materials provided with the distribution.
+//
+// 3. Neither the name of the copyright holder nor the names of its contributors may be
+//    used to endorse or promote products derived from this software without specific
+//    prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+// MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
+// THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+// STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
+// THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+#pragma once
+
+#include <algorithm>
+#include <cstdint>
+#include <memory>
+#include <string>
+#include <type_traits>
+
+namespace epee
+{
+  /*!
+    \brief Non-owning sequence of data. Does not deep copy
+
+    Inspired by `gsl::span` and/or `boost::iterator_range`. This class is
+    intended to be used as a parameter type for functions that need to take a
+    writable or read-only sequence of data. Most common cases are `span<char>`
+    and `span<std::uint8_t>`. Using as a class member is only recommended if
+    clearly documented as not doing a deep-copy. C-arrays are easily convertible
+    to this type.
+
+    \note Conversion from C string literal to `span<const char>` will include
+      the NULL-terminator.
+    \note Never allows derived-to-base pointer conversion; an array of derived
+      types is not an array of base types.
+   */
+  template<typename T>
+  class span
+  {
+    template<typename U>
+    static constexpr bool safe_conversion() noexcept
+    {
+      // Allow exact matches or `T*` -> `const T*`.
+      using with_const = typename std::add_const<U>::type;
+      return std::is_same<T, U>() ||
+        (std::is_const<T>() && std::is_same<T, with_const>());
+    }
+
+  public:
+    using value_type = T;
+    using size_type = std::size_t;
+    using difference_type = std::ptrdiff_t;
+    using pointer = T*;
+    using const_pointer = const T*;
+    using reference = T&;
+    using const_reference = const T&;
+    using iterator = pointer;
+    using const_iterator = const_pointer;
+
+    constexpr span() noexcept : ptr(nullptr), len(0) {}
+    constexpr span(std::nullptr_t) noexcept : span() {}
+
+    //! Prevent derived-to-base conversions; invalid in this context.
+    template<typename U, typename = typename std::enable_if<safe_conversion<U>()>::type>
+    constexpr span(U* const src_ptr, const std::size_t count) noexcept
+      : ptr(src_ptr), len(count) {}
+
+    //! Conversion from C-array. Prevents common bugs with sizeof + arrays.
+    template<std::size_t N>
+    constexpr span(T (&src)[N]) noexcept : span(src, N) {}
+
+    constexpr span(const span&) noexcept = default;
+    span& operator=(const span&) noexcept = default;
+
+    /*! Try to remove `amount` elements from beginning of span.
+    \return Number of elements removed. */
+    std::size_t remove_prefix(std::size_t amount) noexcept
+    {
+        amount = std::min(len, amount);
+        ptr += amount;
+        len -= amount;
+        return amount;
+    }
+
+    constexpr iterator begin() const noexcept { return ptr; }
+    constexpr const_iterator cbegin() const noexcept { return ptr; }
+
+    constexpr iterator end() const noexcept { return begin() + size(); }
+    constexpr const_iterator cend() const noexcept { return cbegin() + size(); }
+
+    constexpr bool empty() const noexcept { return size() == 0; }
+    constexpr pointer data() const noexcept { return ptr; }
+    constexpr std::size_t size() const noexcept { return len; }
+    constexpr std::size_t size_bytes() const noexcept { return size() * sizeof(value_type); }
+
+    T &operator[](size_t idx) noexcept { return ptr[idx]; }
+    const T &operator[](size_t idx) const noexcept { return ptr[idx]; }
+
+  private:
+    T* ptr;
+    std::size_t len;
+  };
+
+  //! \return `span<const T::value_type>` from a STL compatible `src`.
+  template<typename T>
+  constexpr span<const typename T::value_type> to_span(const T& src)
+  {
+    // compiler provides diagnostic if size() is not size_t.
+    return {src.data(), src.size()};
+  }
+
+  //! \return `span<T::value_type>` from a STL compatible `src`.
+  template<typename T>
+  constexpr span<typename T::value_type> to_mut_span(T& src)
+  {
+    // compiler provides diagnostic if size() is not size_t.
+    return {src.data(), src.size()};
+  }
+
+  template<typename T>
+  constexpr bool has_padding() noexcept
+  {
+    return !std::is_standard_layout<T>() || alignof(T) != 1;
+  }
+
+  //! \return Cast data from `src` as `span<const std::uint8_t>`.
+  template<typename T>
+  span<const std::uint8_t> to_byte_span(const span<const T> src) noexcept
+  {
+    static_assert(!has_padding<T>(), "source type may have padding");
+    return {reinterpret_cast<const std::uint8_t*>(src.data()), src.size_bytes()};
+  }
+
+  //! \return `span<const std::uint8_t>` which represents the bytes at `&src`.
+  template<typename T>
+  span<const std::uint8_t> as_byte_span(const T& src) noexcept
+  {
+    static_assert(!std::is_empty<T>(), "empty types will not work -> sizeof == 1");
+    static_assert(!has_padding<T>(), "source type may have padding");
+    return {reinterpret_cast<const std::uint8_t*>(std::addressof(src)), sizeof(T)};
+  }
+
+  //! \return `span<std::uint8_t>` which represents the bytes at `&src`.
+  template<typename T>
+  span<std::uint8_t> as_mut_byte_span(T& src) noexcept
+  {
+    static_assert(!std::is_empty<T>(), "empty types will not work -> sizeof == 1");
+    static_assert(!has_padding<T>(), "source type may have padding");
+    return {reinterpret_cast<std::uint8_t*>(std::addressof(src)), sizeof(T)};
+  }
+
+  //! make a span from a std::string
+  template<typename T>
+  span<const T> strspan(const std::string &s) noexcept
+  {
+    static_assert(std::is_same<T, char>() || std::is_same<T, unsigned char>() || std::is_same<T, int8_t>() || std::is_same<T, uint8_t>(), "Unexpected type");
+    return {reinterpret_cast<const T*>(s.data()), s.size()};
+  }
+}
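`remove_prefix` never advances past the end of the span. It returns how many elements it actually dropped, so callers can detect short input. To keep the example self-contained, the sketch below copies just that member into a hypothetical `byte_view` struct, which is not part of epee. It then uses the struct to consume a 4-byte little-endian length field.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reduced copy of epee::span's pointer/length pair (hypothetical name),
// kept only to show how remove_prefix() clamps and reports progress.
struct byte_view {
    const std::uint8_t* ptr;
    std::size_t len;

    std::size_t remove_prefix(std::size_t amount) noexcept
    {
        amount = std::min(len, amount);   // never step past the end
        ptr += amount;
        len -= amount;
        return amount;                    // may be less than requested
    }
};

// Read a 4-byte little-endian value and consume it; fail on short input.
bool read_u32_le(byte_view& in, std::uint32_t& out)
{
    if (in.len < 4) {
        return false;
    }
    out = std::uint32_t(in.ptr[0]) | std::uint32_t(in.ptr[1]) << 8 |
          std::uint32_t(in.ptr[2]) << 16 | std::uint32_t(in.ptr[3]) << 24;
    const std::size_t consumed = in.remove_prefix(4);
    assert(consumed == 4);
    return true;
}
```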
--- a/src/3rdparty/fmt/README.rst
+++ b/src/3rdparty/fmt/README.rst
@@ -81,7 +81,7 @@ Examples
 .. code:: c++

    #include <fmt/core.h>
-    
+
    int main() {
      fmt::print("Hello, world!\n");
    }
@@ -293,11 +293,11 @@ Projects using this library
  An open-source library for mathematical programming

 * `Aseprite <https://github.com/aseprite/aseprite>`_:
-  Animated sprite editor & pixel art tool 
+  Animated sprite editor & pixel art tool

 * `AvioBook <https://www.aviobook.aero/en>`_: A comprehensive aircraft
  operations suite
-  
+
 * `Celestia <https://celestia.space/>`_: Real-time 3D visualization of space

 * `Ceph <https://ceph.com/>`_: A scalable distributed storage system
@@ -351,7 +351,7 @@ Projects using this library

 * `quasardb <https://www.quasardb.net/>`_: A distributed, high-performance,
  associative database
-  
+
 * `Quill <https://github.com/odygrd/quill>`_: Asynchronous low-latency logging library

 * `QKW <https://github.com/ravijanjam/qkw>`_: Generalizing aliasing to simplify
--- a/src/3rdparty/getopt/getopt.h
+++ b/src/3rdparty/getopt/getopt.h
@@ -3,9 +3,9 @@
 * DISCLAIMER
 * This file is part of the mingw-w64 runtime package.
 *
- * The mingw-w64 runtime package and its code is distributed in the hope that it 
- * will be useful but WITHOUT ANY WARRANTY.  ALL WARRANTIES, EXPRESSED OR 
- * IMPLIED ARE HEREBY DISCLAIMED.  This includes but is not limited to 
+ * The mingw-w64 runtime package and its code is distributed in the hope that it
+ * will be useful but WITHOUT ANY WARRANTY.  ALL WARRANTIES, EXPRESSED OR
+ * IMPLIED ARE HEREBY DISCLAIMED.  This includes but is not limited to
 * warranties of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 */
 /*
@@ -109,11 +109,7 @@ char    *optarg;		/* argument associated with option */
 extern char __declspec(dllimport) *__progname;
 #endif

-#ifdef __CYGWIN__
 static char EMSG[] = "";
-#else
-#define	EMSG		""
-#endif

 static int getopt_internal(int, char * const *, const char *,
 			   const struct option *, int *, int);
--- a/src/3rdparty/http-parser/AUTHORS
+++ b/src/3rdparty/http-parser/AUTHORS
@@ -1,68 +0,0 @@
-# Authors ordered by first contribution.
-Ryan Dahl <ry@tinyclouds.org>
-Jeremy Hinegardner <jeremy@hinegardner.org>
-Sergey Shepelev <temotor@gmail.com>
-Joe Damato <ice799@gmail.com>
-tomika <tomika_nospam@freemail.hu>
-Phoenix Sol <phoenix@burninglabs.com>
-Cliff Frey <cliff@meraki.com>
-Ewen Cheslack-Postava <ewencp@cs.stanford.edu>
-Santiago Gala <sgala@apache.org>
-Tim Becker <tim.becker@syngenio.de>
-Jeff Terrace <jterrace@gmail.com>
-Ben Noordhuis <info@bnoordhuis.nl>
-Nathan Rajlich <nathan@tootallnate.net>
-Mark Nottingham <mnot@mnot.net>
-Aman Gupta <aman@tmm1.net>
-Tim Becker <tim.becker@kuriositaet.de>
-Sean Cunningham <sean.cunningham@mandiant.com>
-Peter Griess <pg@std.in>
-Salman Haq <salman.haq@asti-usa.com>
-Cliff Frey <clifffrey@gmail.com>
-Jon Kolb <jon@b0g.us>
-Fouad Mardini <f.mardini@gmail.com>
-Paul Querna <pquerna@apache.org>
-Felix Geisendörfer <felix@debuggable.com>
-koichik <koichik@improvement.jp>
-Andre Caron <andre.l.caron@gmail.com>
-Ivo Raisr <ivosh@ivosh.net>
-James McLaughlin <jamie@lacewing-project.org>
-David Gwynne <loki@animata.net>
-Thomas LE ROUX <thomas@november-eleven.fr>
-Randy Rizun <rrizun@ortivawireless.com>
-Andre Louis Caron <andre.louis.caron@usherbrooke.ca>
-Simon Zimmermann <simonz05@gmail.com>
-Erik Dubbelboer <erik@dubbelboer.com>
-Martell Malone <martellmalone@gmail.com>
-Bertrand Paquet <bpaquet@octo.com>
-BogDan Vatra <bogdan@kde.org>
-Peter Faiman <peter@thepicard.org>
-Corey Richardson <corey@octayn.net>
-Tóth Tamás <tomika_nospam@freemail.hu>
-Cam Swords <cam.swords@gmail.com>
-Chris Dickinson <christopher.s.dickinson@gmail.com>
-Uli Köhler <ukoehler@btronik.de>
-Charlie Somerville <charlie@charliesomerville.com>
-Patrik Stutz <patrik.stutz@gmail.com>
-Fedor Indutny <fedor.indutny@gmail.com>
-runner <runner.mei@gmail.com>
-Alexis Campailla <alexis@janeasystems.com>
-David Wragg <david@wragg.org>
-Vinnie Falco <vinnie.falco@gmail.com>
-Alex Butum <alexbutum@linux.com>
-Rex Feng <rexfeng@gmail.com>
-Alex Kocharin <alex@kocharin.ru>
-Mark Koopman <markmontymark@yahoo.com>
-Helge Heß <me@helgehess.eu>
-Alexis La Goutte <alexis.lagoutte@gmail.com>
-George Miroshnykov <george.miroshnykov@gmail.com>
-Maciej Małecki <me@mmalecki.com>
-Marc O'Morain <github.com@marcomorain.com>
-Jeff Pinner <jpinner@twitter.com>
-Timothy J Fontaine <tjfontaine@gmail.com>
-Akagi201 <akagi201@gmail.com>
-Romain Giraud <giraud.romain@gmail.com>
-Jay Satiro <raysatiro@yahoo.com>
-Arne Steen <Arne.Steen@gmx.de>
-Kjell Schubert <kjell.schubert@gmail.com>
-Olivier Mengué <dolmen@cpan.org>
--- a/src/3rdparty/http-parser/LICENSE-MIT
+++ b/src/3rdparty/http-parser/LICENSE-MIT
@@ -1,19 +0,0 @@
-Copyright Joyent, Inc. and other Node contributors.
-
-Permission is hereby granted, free of charge, to any person obtaining a copy
-of this software and associated documentation files (the "Software"), to
-deal in the Software without restriction, including without limitation the
-rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
-sell copies of the Software, and to permit persons to whom the Software is
-furnished to do so, subject to the following conditions:
-
-The above copyright notice and this permission notice shall be included in
-all copies or substantial portions of the Software.
-
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
-FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
-IN THE SOFTWARE. 
--- a/src/3rdparty/http-parser/README.md
+++ b/src/3rdparty/http-parser/README.md
@@ -1,246 +0,0 @@
-HTTP Parser
-===========
-
-[![Build Status](https://api.travis-ci.org/nodejs/http-parser.svg?branch=master)](https://travis-ci.org/nodejs/http-parser)
-
-This is a parser for HTTP messages written in C. It parses both requests and
-responses. The parser is designed to be used in performance HTTP
-applications. It does not make any syscalls nor allocations, it does not
-buffer data, it can be interrupted at anytime. Depending on your
-architecture, it only requires about 40 bytes of data per message
-stream (in a web server that is per connection).
-
-Features:
-
-  * No dependencies
-  * Handles persistent streams (keep-alive).
-  * Decodes chunked encoding.
-  * Upgrade support
-  * Defends against buffer overflow attacks.
-
-The parser extracts the following information from HTTP messages:
-
-  * Header fields and values
-  * Content-Length
-  * Request method
-  * Response status code
-  * Transfer-Encoding
-  * HTTP version
-  * Request URL
-  * Message body
-
-
-Usage
-----
-
-One `http_parser` object is used per TCP connection. Initialize the struct
-using `http_parser_init()` and set the callbacks. That might look something
-like this for a request parser:
-```c
-http_parser_settings settings;
-settings.on_url = my_url_callback;
-settings.on_header_field = my_header_field_callback;
-/* ... */
-
-http_parser *parser = malloc(sizeof(http_parser));
-http_parser_init(parser, HTTP_REQUEST);
-parser->data = my_socket;
-```
-
-When data is received on the socket execute the parser and check for errors.
-
-```c
-size_t len = 80*1024, nparsed;
-char buf[len];
-ssize_t recved;
-
-recved = recv(fd, buf, len, 0);
-
-if (recved < 0) {
-  /* Handle error. */
-}
-
-/* Start up / continue the parser.
- * Note we pass recved==0 to signal that EOF has been received.
- */
-nparsed = http_parser_execute(parser, &settings, buf, recved);
-
-if (parser->upgrade) {
-  /* handle new protocol */
-} else if (nparsed != recved) {
-  /* Handle error. Usually just close the connection. */
-}
-```
-
-`http_parser` needs to know where the end of the stream is. For example, sometimes
-servers send responses without Content-Length and expect the client to
-consume input (for the body) until EOF. To tell `http_parser` about EOF, give
-`0` as the fourth parameter to `http_parser_execute()`. Callbacks and errors
-can still be encountered during an EOF, so one must still be prepared
-to receive them.
-
-Scalar valued message information such as `status_code`, `method`, and the
-HTTP version are stored in the parser structure. This data is only
-temporally stored in `http_parser` and gets reset on each new message. If
-this information is needed later, copy it out of the structure during the
-`headers_complete` callback.
-
-The parser decodes the transfer-encoding for both requests and responses
-transparently. That is, a chunked encoding is decoded before being sent to
-the on_body callback.
-
-
-The Special Problem of Upgrade
------------------------------
-
-`http_parser` supports upgrading the connection to a different protocol. An
-increasingly common example of this is the WebSocket protocol which sends
-a request like
-
-        GET /demo HTTP/1.1
-        Upgrade: WebSocket
-        Connection: Upgrade
-        Host: example.com
-        Origin: http://example.com
-        WebSocket-Protocol: sample
-
-followed by non-HTTP data.
-
-(See [RFC6455](https://tools.ietf.org/html/rfc6455) for more information the
-WebSocket protocol.)
-
-To support this, the parser will treat this as a normal HTTP message without a
-body, issuing both on_headers_complete and on_message_complete callbacks. However
-http_parser_execute() will stop parsing at the end of the headers and return.
-
-The user is expected to check if `parser->upgrade` has been set to 1 after
-`http_parser_execute()` returns. Non-HTTP data begins at the buffer supplied
-offset by the return value of `http_parser_execute()`.
-
-
-Callbacks
---------
-
-During the `http_parser_execute()` call, the callbacks set in
-`http_parser_settings` will be executed. The parser maintains state and
-never looks behind, so buffering the data is not necessary. If you need to
-save certain data for later usage, you can do that from the callbacks.
-
-There are two types of callbacks:
-
-* notification `typedef int (*http_cb) (http_parser*);`
-    Callbacks: on_message_begin, on_headers_complete, on_message_complete.
-* data `typedef int (*http_data_cb) (http_parser*, const char *at, size_t length);`
-    Callbacks: (requests only) on_url,
-               (common) on_header_field, on_header_value, on_body;
-
-Callbacks must return 0 on success. Returning a non-zero value indicates
-error to the parser, making it exit immediately.
-
-For cases where it is necessary to pass local information to/from a callback,
-the `http_parser` object's `data` field can be used.
-An example of such a case is when using threads to handle a socket connection,
-parse a request, and then give a response over that socket. By instantiation
-of a thread-local struct containing relevant data (e.g. accepted socket,
-allocated memory for callbacks to write into, etc), a parser's callbacks are
-able to communicate data between the scope of the thread and the scope of the
-callback in a threadsafe manner. This allows `http_parser` to be used in
-multi-threaded contexts.
-
-Example:
-```c
- typedef struct {
-  socket_t sock;
-  void* buffer;
-  int buf_len;
- } custom_data_t;
-
-
-int my_url_callback(http_parser* parser, const char *at, size_t length) {
-  /* access to thread local custom_data_t struct.
-  Use this access save parsed data for later use into thread local
-  buffer, or communicate over socket
-  */
-  parser->data;
-  ...
-  return 0;
-}
-
-...
-
-void http_parser_thread(socket_t sock) {
- int nparsed = 0;
- /* allocate memory for user data */
- custom_data_t *my_data = malloc(sizeof(custom_data_t));
-
- /* some information for use by callbacks.
- * achieves thread -> callback information flow */
- my_data->sock = sock;
-
- /* instantiate a thread-local parser */
- http_parser *parser = malloc(sizeof(http_parser));
- http_parser_init(parser, HTTP_REQUEST); /* initialise parser */
- /* this custom data reference is accessible through the reference to the
- parser supplied to callback functions */
- parser->data = my_data;
-
- http_parser_settings settings; /* set up callbacks */
- settings.on_url = my_url_callback;
-
- /* execute parser */
- nparsed = http_parser_execute(parser, &settings, buf, recved);
-
- ...
- /* parsed information copied from callback.
- can now perform action on data copied into thread-local memory from callbacks.
- achieves callback -> thread information flow */
- my_data->buffer;
- ...
-}
-
-```
-
-In case you parse HTTP message in chunks (i.e. `read()` request line
-from socket, parse, read half headers, parse, etc) your data callbacks
-may be called more than once. `http_parser` guarantees that data pointer is only
-valid for the lifetime of callback. You can also `read()` into a heap allocated
-buffer to avoid copying memory around if this fits your application.
-
-Reading headers may be a tricky task if you read/parse headers partially.
-Basically, you need to remember whether last header callback was field or value
-and apply the following logic:
-
-    (on_header_field and on_header_value shortened to on_h_*)
-     ------------------------ ------------ --------------------------------------------
-    | State (prev. callback) | Callback   | Description/action                         |
-     ------------------------ ------------ --------------------------------------------
-    | nothing (first call)   | on_h_field | Allocate new buffer and copy callback data |
-    |                        |            | into it                                    |
-     ------------------------ ------------ --------------------------------------------
-    | value                  | on_h_field | New header started.                        |
-    |                        |            | Copy current name,value buffers to headers |
-    |                        |            | list and allocate new buffer for new name  |
-     ------------------------ ------------ --------------------------------------------
-    | field                  | on_h_field | Previous name continues. Reallocate name   |
-    |                        |            | buffer and append callback data to it      |
-     ------------------------ ------------ --------------------------------------------
-    | field                  | on_h_value | Value for current header started. Allocate |
-    |                        |            | new buffer and copy callback data to it    |
-     ------------------------ ------------ --------------------------------------------
-    | value                  | on_h_value | Value continues. Reallocate value buffer   |
-    |                        |            | and append callback data to it             |
-     ------------------------ ------------ --------------------------------------------
-
-
-Parsing URLs
------------
-
-A simplistic zero-copy URL parser is provided as `http_parser_parse_url()`.
-Users of this library may wish to use it to parse URLs constructed from
-consecutive `on_url` callbacks.
-
-See examples of reading in headers:
-
-* [partial example](http://gist.github.com/155877) in C
-* [from http-parser tests](http://github.com/joyent/http-parser/blob/37a0ff8/test.c#L403) in C
-* [from Node library](http://github.com/joyent/node/blob/842eaf4/src/http.js#L284) in Javascript
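The header-accumulation table in the README removed above can be sketched as a small state machine. This is a hypothetical, self-contained illustration, not http_parser code. `on_h_field` and `on_h_value` only mimic the shape of the `http_data_cb` callbacks, without the `http_parser*` argument. The fixed-size buffers, `struct header_acc`, and `headers_finish()` are assumptions chosen to keep the sketch short.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical limits; a real program would allocate and grow buffers. */
#define MAX_HEADERS 16
#define MAX_LEN 256

/* Which data callback ran last: the "State (prev. callback)" column. */
enum last_cb { LAST_NONE, LAST_FIELD, LAST_VALUE };

struct header_acc {
  enum last_cb last;
  size_t count;                      /* number of completed name/value pairs */
  char names[MAX_HEADERS][MAX_LEN];  /* names[count] is the pair in progress */
  char values[MAX_HEADERS][MAX_LEN];
};

/* Append len bytes of at to the NUL-terminated dst, truncating at MAX_LEN-1. */
static void append(char *dst, const char *at, size_t len) {
  size_t cur = strlen(dst);
  assert(cur < MAX_LEN);
  size_t room = MAX_LEN - 1 - cur;
  if (len > room) len = room;
  memcpy(dst + cur, at, len);
  dst[cur + len] = '\0';
}

/* Header-field fragment; returns 0 on success, non-zero on error. */
static int on_h_field(struct header_acc *acc, const char *at, size_t len) {
  if (acc->last == LAST_VALUE) {     /* value -> field: previous pair is complete */
    if (acc->count + 1 >= MAX_HEADERS)
      return 1;                      /* no room for another header */
    acc->count++;
  }
  if (acc->last != LAST_FIELD) {     /* first call or value -> field: new name */
    acc->names[acc->count][0] = '\0';
    acc->values[acc->count][0] = '\0';
  }
  append(acc->names[acc->count], at, len);  /* field -> field: name continues */
  acc->last = LAST_FIELD;
  return 0;
}

/* Header-value fragment; returns 0 on success, non-zero on error. */
static int on_h_value(struct header_acc *acc, const char *at, size_t len) {
  if (acc->last == LAST_NONE)
    return 1;                        /* a value before any field is malformed */
  if (acc->last == LAST_FIELD)
    acc->values[acc->count][0] = '\0';      /* field -> value: value starts */
  append(acc->values[acc->count], at, len); /* value -> value: value continues */
  acc->last = LAST_VALUE;
  return 0;
}

/* Call once all headers have arrived to commit the final pair. */
static void headers_finish(struct header_acc *acc) {
  if (acc->last == LAST_VALUE)
    acc->count++;
  acc->last = LAST_NONE;
}
```

Feeding the fragments "Conn", "ection" (field), then "keep-", "alive" (value), then "Host" / "example.com" yields two pairs: `Connection: keep-alive` and `Host: example.com`. In a real parser these functions would be called from `on_header_field` and `on_header_value`, with the accumulator reached through `parser->data`, and `headers_finish()` from `on_headers_complete`.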
--- a/src/3rdparty/http-parser/http_parser.c
+++ b/src/3rdparty/http-parser/http_parser.c
--- a/src/3rdparty/http-parser/http_parser.h
+++ b/src/3rdparty/http-parser/http_parser.h
@@ -1,442 +0,0 @@
-/* Copyright Joyent, Inc. and other Node contributors. All rights reserved.
- *
- * Permission is hereby granted, free of charge, to any person obtaining a copy
- * of this software and associated documentation files (the "Software"), to
- * deal in the Software without restriction, including without limitation the
- * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
- * sell copies of the Software, and to permit persons to whom the Software is
- * furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice shall be included in
- * all copies or substantial portions of the Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
- * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
- * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
- * IN THE SOFTWARE.
- */
-#ifndef http_parser_h
-#define http_parser_h
-#ifdef __cplusplus
-extern "C" {
-#endif
-
-/* Also update SONAME in the Makefile whenever you change these. */
-#define HTTP_PARSER_VERSION_MAJOR 2
-#define HTTP_PARSER_VERSION_MINOR 9
-#define HTTP_PARSER_VERSION_PATCH 3
-
-#include <stddef.h>
-#if defined(_WIN32) && !defined(__MINGW32__) && \
-  (!defined(_MSC_VER) || _MSC_VER<1600) && !defined(__WINE__)
-#include <BaseTsd.h>
-typedef __int8 int8_t;
-typedef unsigned __int8 uint8_t;
-typedef __int16 int16_t;
-typedef unsigned __int16 uint16_t;
-typedef __int32 int32_t;
-typedef unsigned __int32 uint32_t;
-typedef __int64 int64_t;
-typedef unsigned __int64 uint64_t;
-#else
-#include <stdint.h>
-#endif
-
-/* Compile with -DHTTP_PARSER_STRICT=0 to make less checks, but run
- * faster
- */
-#ifndef HTTP_PARSER_STRICT
-# define HTTP_PARSER_STRICT 1
-#endif
-
-/* Maximium header size allowed. If the macro is not defined
- * before including this header then the default is used. To
- * change the maximum header size, define the macro in the build
- * environment (e.g. -DHTTP_MAX_HEADER_SIZE=<value>). To remove
- * the effective limit on the size of the header, define the macro
- * to a very large number (e.g. -DHTTP_MAX_HEADER_SIZE=0x7fffffff)
- */
-#ifndef HTTP_MAX_HEADER_SIZE
-# define HTTP_MAX_HEADER_SIZE (80*1024)
-#endif
-
-typedef struct http_parser http_parser;
-typedef struct http_parser_settings http_parser_settings;
-
-
-/* Callbacks should return non-zero to indicate an error. The parser will
- * then halt execution.
- *
- * The one exception is on_headers_complete. In a HTTP_RESPONSE parser
- * returning '1' from on_headers_complete will tell the parser that it
- * should not expect a body. This is used when receiving a response to a
- * HEAD request which may contain 'Content-Length' or 'Transfer-Encoding:
- * chunked' headers that indicate the presence of a body.
- *
- * Returning `2` from on_headers_complete will tell parser that it should not
- * expect neither a body nor any futher responses on this connection. This is
- * useful for handling responses to a CONNECT request which may not contain
- * `Upgrade` or `Connection: upgrade` headers.
- *
- * http_data_cb does not return data chunks. It will be called arbitrarily
- * many times for each string. E.G. you might get 10 callbacks for "on_url"
- * each providing just a few characters more data.
- */
-typedef int (*http_data_cb) (http_parser*, const char *at, size_t length);
-typedef int (*http_cb) (http_parser*);
-
-
-/* Status Codes */
-#define HTTP_STATUS_MAP(XX)                                                 \
-  XX(100, CONTINUE,                        Continue)                        \
-  XX(101, SWITCHING_PROTOCOLS,             Switching Protocols)             \
-  XX(102, PROCESSING,                      Processing)                      \
-  XX(200, OK,                              OK)                              \
-  XX(201, CREATED,                         Created)                         \
-  XX(202, ACCEPTED,                        Accepted)                        \
-  XX(203, NON_AUTHORITATIVE_INFORMATION,   Non-Authoritative Information)   \
-  XX(204, NO_CONTENT,                      No Content)                      \
-  XX(205, RESET_CONTENT,                   Reset Content)                   \
-  XX(206, PARTIAL_CONTENT,                 Partial Content)                 \
-  XX(207, MULTI_STATUS,                    Multi-Status)                    \
-  XX(208, ALREADY_REPORTED,                Already Reported)                \
-  XX(226, IM_USED,                         IM Used)                         \
-  XX(300, MULTIPLE_CHOICES,                Multiple Choices)                \
-  XX(301, MOVED_PERMANENTLY,               Moved Permanently)               \
-  XX(302, FOUND,                           Found)                           \
-  XX(303, SEE_OTHER,                       See Other)                       \
-  XX(304, NOT_MODIFIED,                    Not Modified)                    \
-  XX(305, USE_PROXY,                       Use Proxy)                       \
-  XX(307, TEMPORARY_REDIRECT,              Temporary Redirect)              \
-  XX(308, PERMANENT_REDIRECT,              Permanent Redirect)              \
-  XX(400, BAD_REQUEST,                     Bad Request)                     \
-  XX(401, UNAUTHORIZED,                    Unauthorized)                    \
-  XX(402, PAYMENT_REQUIRED,                Payment Required)                \
-  XX(403, FORBIDDEN,                       Forbidden)                       \
-  XX(404, NOT_FOUND,                       Not Found)                       \
-  XX(405, METHOD_NOT_ALLOWED,              Method Not Allowed)              \
-  XX(406, NOT_ACCEPTABLE,                  Not Acceptable)                  \
-  XX(407, PROXY_AUTHENTICATION_REQUIRED,   Proxy Authentication Required)   \
-  XX(408, REQUEST_TIMEOUT,                 Request Timeout)                 \
-  XX(409, CONFLICT,                        Conflict)                        \
-  XX(410, GONE,                            Gone)                            \
-  XX(411, LENGTH_REQUIRED,                 Length Required)                 \
-  XX(412, PRECONDITION_FAILED,             Precondition Failed)             \
-  XX(413, PAYLOAD_TOO_LARGE,               Payload Too Large)               \
-  XX(414, URI_TOO_LONG,                    URI Too Long)                    \
-  XX(415, UNSUPPORTED_MEDIA_TYPE,          Unsupported Media Type)          \
-  XX(416, RANGE_NOT_SATISFIABLE,           Range Not Satisfiable)           \
-  XX(417, EXPECTATION_FAILED,              Expectation Failed)              \
-  XX(421, MISDIRECTED_REQUEST,             Misdirected Request)             \
-  XX(422, UNPROCESSABLE_ENTITY,            Unprocessable Entity)            \
-  XX(423, LOCKED,                          Locked)                          \
-  XX(424, FAILED_DEPENDENCY,               Failed Dependency)               \
-  XX(426, UPGRADE_REQUIRED,                Upgrade Required)                \
-  XX(428, PRECONDITION_REQUIRED,           Precondition Required)           \
-  XX(429, TOO_MANY_REQUESTS,               Too Many Requests)               \
-  XX(431, REQUEST_HEADER_FIELDS_TOO_LARGE, Request Header Fields Too Large) \
-  XX(451, UNAVAILABLE_FOR_LEGAL_REASONS,   Unavailable For Legal Reasons)   \
-  XX(500, INTERNAL_SERVER_ERROR,           Internal Server Error)           \
-  XX(501, NOT_IMPLEMENTED,                 Not Implemented)                 \
-  XX(502, BAD_GATEWAY,                     Bad Gateway)                     \
-  XX(503, SERVICE_UNAVAILABLE,             Service Unavailable)             \
-  XX(504, GATEWAY_TIMEOUT,                 Gateway Timeout)                 \
-  XX(505, HTTP_VERSION_NOT_SUPPORTED,      HTTP Version Not Supported)      \
-  XX(506, VARIANT_ALSO_NEGOTIATES,         Variant Also Negotiates)         \
-  XX(507, INSUFFICIENT_STORAGE,            Insufficient Storage)            \
-  XX(508, LOOP_DETECTED,                   Loop Detected)                   \
-  XX(510, NOT_EXTENDED,                    Not Extended)                    \
-  XX(511, NETWORK_AUTHENTICATION_REQUIRED, Network Authentication Required) \
-
-enum http_status
-  {
-#define XX(num, name, string) HTTP_STATUS_##name = num,
-  HTTP_STATUS_MAP(XX)
-#undef XX
-  };
-
-
-/* Request Methods */
-#define HTTP_METHOD_MAP(XX)         \
-  XX(0,  DELETE,      DELETE)       \
-  XX(1,  GET,         GET)          \
-  XX(2,  HEAD,        HEAD)         \
-  XX(3,  POST,        POST)         \
-  XX(4,  PUT,         PUT)          \
-  /* pathological */                \
-  XX(5,  CONNECT,     CONNECT)      \
-  XX(6,  OPTIONS,     OPTIONS)      \
-  XX(7,  TRACE,       TRACE)        \
-  /* WebDAV */                      \
-  XX(8,  COPY,        COPY)         \
-  XX(9,  LOCK,        LOCK)         \
-  XX(10, MKCOL,       MKCOL)        \
-  XX(11, MOVE,        MOVE)         \
-  XX(12, PROPFIND,    PROPFIND)     \
-  XX(13, PROPPATCH,   PROPPATCH)    \
-  XX(14, SEARCH,      SEARCH)       \
-  XX(15, UNLOCK,      UNLOCK)       \
-  XX(16, BIND,        BIND)         \
-  XX(17, REBIND,      REBIND)       \
-  XX(18, UNBIND,      UNBIND)       \
-  XX(19, ACL,         ACL)          \
-  /* subversion */                  \
-  XX(20, REPORT,      REPORT)       \
-  XX(21, MKACTIVITY,  MKACTIVITY)   \
-  XX(22, CHECKOUT,    CHECKOUT)     \
-  XX(23, MERGE,       MERGE)        \
-  /* upnp */                        \
-  XX(24, MSEARCH,     M-SEARCH)     \
-  XX(25, NOTIFY,      NOTIFY)       \
-  XX(26, SUBSCRIBE,   SUBSCRIBE)    \
-  XX(27, UNSUBSCRIBE, UNSUBSCRIBE)  \
-  /* RFC-5789 */                    \
-  XX(28, PATCH,       PATCH)        \
-  XX(29, PURGE,       PURGE)        \
-  /* CalDAV */                      \
-  XX(30, MKCALENDAR,  MKCALENDAR)   \
-  /* RFC-2068, section 19.6.1.2 */  \
-  XX(31, LINK,        LINK)         \
-  XX(32, UNLINK,      UNLINK)       \
-  /* icecast */                     \
-  XX(33, SOURCE,      SOURCE)       \
-
-enum http_method
-  {
-#define XX(num, name, string) HTTP_##name = num,
-  HTTP_METHOD_MAP(XX)
-#undef XX
-  };
-
-
-enum http_parser_type { HTTP_REQUEST, HTTP_RESPONSE, HTTP_BOTH };
-
-
-/* Flag values for http_parser.flags field */
-enum flags
-  { F_CHUNKED               = 1 << 0
-  , F_CONNECTION_KEEP_ALIVE = 1 << 1
-  , F_CONNECTION_CLOSE      = 1 << 2
-  , F_CONNECTION_UPGRADE    = 1 << 3
-  , F_TRAILING              = 1 << 4
-  , F_UPGRADE               = 1 << 5
-  , F_SKIPBODY              = 1 << 6
-  , F_CONTENTLENGTH         = 1 << 7
-  , F_TRANSFER_ENCODING     = 1 << 8
-  };
-
-
-/* Map for errno-related constants
- *
- * The provided argument should be a macro that takes 2 arguments.
- */
-#define HTTP_ERRNO_MAP(XX)                                           \
-  /* No error */                                                     \
-  XX(OK, "success")                                                  \
-                                                                     \
-  /* Callback-related errors */                                      \
-  XX(CB_message_begin, "the on_message_begin callback failed")       \
-  XX(CB_url, "the on_url callback failed")                           \
-  XX(CB_header_field, "the on_header_field callback failed")         \
-  XX(CB_header_value, "the on_header_value callback failed")         \
-  XX(CB_headers_complete, "the on_headers_complete callback failed") \
-  XX(CB_body, "the on_body callback failed")                         \
-  XX(CB_message_complete, "the on_message_complete callback failed") \
-  XX(CB_status, "the on_status callback failed")                     \
-  XX(CB_chunk_header, "the on_chunk_header callback failed")         \
-  XX(CB_chunk_complete, "the on_chunk_complete callback failed")     \
-                                                                     \
-  /* Parsing-related errors */                                       \
-  XX(INVALID_EOF_STATE, "stream ended at an unexpected time")        \
-  XX(HEADER_OVERFLOW,                                                \
-     "too many header bytes seen; overflow detected")                \
-  XX(CLOSED_CONNECTION,                                              \
-     "data received after completed connection: close message")      \
-  XX(INVALID_VERSION, "invalid HTTP version")                        \
-  XX(INVALID_STATUS, "invalid HTTP status code")                     \
-  XX(INVALID_METHOD, "invalid HTTP method")                          \
-  XX(INVALID_URL, "invalid URL")                                     \
-  XX(INVALID_HOST, "invalid host")                                   \
-  XX(INVALID_PORT, "invalid port")                                   \
-  XX(INVALID_PATH, "invalid path")                                   \
-  XX(INVALID_QUERY_STRING, "invalid query string")                   \
-  XX(INVALID_FRAGMENT, "invalid fragment")                           \
-  XX(LF_EXPECTED, "LF character expected")                           \
-  XX(INVALID_HEADER_TOKEN, "invalid character in header")            \
-  XX(INVALID_CONTENT_LENGTH,                                         \
-     "invalid character in content-length header")                   \
-  XX(UNEXPECTED_CONTENT_LENGTH,                                      \
-     "unexpected content-length header")                             \
-  XX(INVALID_CHUNK_SIZE,                                             \
-     "invalid character in chunk size header")                       \
-  XX(INVALID_TRANSFER_ENCODING,                                      \
-     "request has invalid transfer-encoding")                        \
-  XX(INVALID_CONSTANT, "invalid constant string")                    \
-  XX(INVALID_INTERNAL_STATE, "encountered unexpected internal state")\
-  XX(STRICT, "strict mode assertion failed")                         \
-  XX(PAUSED, "parser is paused")                                     \
-  XX(UNKNOWN, "an unknown error occurred")
-
-
-/* Define HPE_* values for each errno value above */
-#define HTTP_ERRNO_GEN(n, s) HPE_##n,
-enum http_errno {
-  HTTP_ERRNO_MAP(HTTP_ERRNO_GEN)
-};
-#undef HTTP_ERRNO_GEN
-
-
-/* Get an http_errno value from an http_parser */
-#define HTTP_PARSER_ERRNO(p)            ((enum http_errno) (p)->http_errno)
-
-
-struct http_parser {
-  /** PRIVATE **/
-  unsigned int type : 2;         /* enum http_parser_type */
-  unsigned int state : 7;        /* enum state from http_parser.c */
-  unsigned int header_state : 7; /* enum header_state from http_parser.c */
-  unsigned int index : 7;        /* index into current matcher */
-  unsigned int lenient_http_headers : 1;
-  unsigned int flags : 16;       /* F_* values from 'flags' enum; semi-public */
-
-  uint32_t nread;          /* # bytes read in various scenarios */
-  uint64_t content_length; /* # bytes in body (0 if no Content-Length header) */
-
-  /** READ-ONLY **/
-  unsigned short http_major;
-  unsigned short http_minor;
-  unsigned int status_code : 16; /* responses only */
-  unsigned int method : 8;       /* requests only */
-  unsigned int http_errno : 7;
-
-  /* 1 = Upgrade header was present and the parser has exited because of that.
-   * 0 = No upgrade header present.
-   * Should be checked when http_parser_execute() returns in addition to
-   * error checking.
-   */
-  unsigned int upgrade : 1;
-
-  /** PUBLIC **/
-  void *data; /* A pointer to get hook to the "connection" or "socket" object */
-};
-
-
-struct http_parser_settings {
-  http_cb      on_message_begin;
-  http_data_cb on_url;
-  http_data_cb on_status;
-  http_data_cb on_header_field;
-  http_data_cb on_header_value;
-  http_cb      on_headers_complete;
-  http_data_cb on_body;
-  http_cb      on_message_complete;
-  /* When on_chunk_header is called, the current chunk length is stored
-   * in parser->content_length.
-   */
-  http_cb      on_chunk_header;
-  http_cb      on_chunk_complete;
-};
-
-
-enum http_parser_url_fields
-  { UF_SCHEMA           = 0
-  , UF_HOST             = 1
-  , UF_PORT             = 2
-  , UF_PATH             = 3
-  , UF_QUERY            = 4
-  , UF_FRAGMENT         = 5
-  , UF_USERINFO         = 6
-  , UF_MAX              = 7
-  };
-
-
-/* Result structure for http_parser_parse_url().
- *
- * Callers should index into field_data[] with UF_* values iff field_set
- * has the relevant (1 << UF_*) bit set. As a courtesy to clients (and
- * because we probably have padding left over), we convert any port to
- * a uint16_t.
- */
-struct http_parser_url {
-  uint16_t field_set;           /* Bitmask of (1 << UF_*) values */
-  uint16_t port;                /* Converted UF_PORT string */
-
-  struct {
-    uint16_t off;               /* Offset into buffer in which field starts */
-    uint16_t len;               /* Length of run in buffer */
-  } field_data[UF_MAX];
-};
-
-
-/* Returns the library version. Bits 16-23 contain the major version number,
- * bits 8-15 the minor version number and bits 0-7 the patch level.
- * Usage example:
- *
- *   unsigned long version = http_parser_version();
- *   unsigned major = (version >> 16) & 255;
- *   unsigned minor = (version >> 8) & 255;
- *   unsigned patch = version & 255;
- *   printf("http_parser v%u.%u.%u\n", major, minor, patch);
- */
-unsigned long http_parser_version(void);
-
-void http_parser_init(http_parser *parser, enum http_parser_type type);
-
-
-/* Initialize http_parser_settings members to 0
- */
-void http_parser_settings_init(http_parser_settings *settings);
-
-
-/* Executes the parser. Returns number of parsed bytes. Sets
- * `parser->http_errno` on error. */
-size_t http_parser_execute(http_parser *parser,
-                           const http_parser_settings *settings,
-                           const char *data,
-                           size_t len);
-
-
-/* If http_should_keep_alive() in the on_headers_complete or
- * on_message_complete callback returns 0, then this should be
- * the last message on the connection.
- * If you are the server, respond with the "Connection: close" header.
- * If you are the client, close the connection.
- */
-int http_should_keep_alive(const http_parser *parser);
-
-/* Returns a string version of the HTTP method. */
-const char *http_method_str(enum http_method m);
-
-/* Returns a string version of the HTTP status code. */
-const char *http_status_str(enum http_status s);
-
-/* Return a string name of the given error */
-const char *http_errno_name(enum http_errno err);
-
-/* Return a string description of the given error */
-const char *http_errno_description(enum http_errno err);
-
-/* Initialize all http_parser_url members to 0 */
-void http_parser_url_init(struct http_parser_url *u);
-
-/* Parse a URL; return nonzero on failure */
-int http_parser_parse_url(const char *buf, size_t buflen,
-                          int is_connect,
-                          struct http_parser_url *u);
-
-/* Pause or un-pause the parser; a nonzero value pauses */
-void http_parser_pause(http_parser *parser, int paused);
-
-/* Checks if this is the final chunk of the body. */
-int http_body_is_final(const http_parser *parser);
-
-/* Change the maximum header size provided at compile time. */
-void http_parser_set_max_header_size(uint32_t size);
-
-#ifdef __cplusplus
-}
-#endif
-#endif
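The removed header generates `enum http_status` and `enum http_method` from X-macro tables (`HTTP_STATUS_MAP(XX)`, `HTTP_METHOD_MAP(XX)`) by redefining `XX` for each expansion. A minimal sketch of that technique follows. It uses a trimmed, hypothetical `DEMO_STATUS_MAP` instead of the real table. The string lookups are illustrative only and are not claimed to match how `http_parser.c` implements `http_status_str()`.

```c
#include <assert.h>
#include <string.h>

/* Trimmed, hypothetical table in the style of HTTP_STATUS_MAP:
 * XX(numeric code, enum suffix, reason phrase as bare tokens). */
#define DEMO_STATUS_MAP(XX)                            \
  XX(200, OK,                  OK)                     \
  XX(404, NOT_FOUND,           Not Found)              \
  XX(503, SERVICE_UNAVAILABLE, Service Unavailable)

/* Expansion 1: enum constants, the same way enum http_status is generated. */
enum demo_status {
#define XX(num, name, string) DEMO_STATUS_##name = num,
  DEMO_STATUS_MAP(XX)
#undef XX
};

/* Expansion 2: code -> reason phrase; # turns the bare tokens into a string. */
static const char *demo_status_str(enum demo_status s) {
  switch (s) {
#define XX(num, name, string) case DEMO_STATUS_##name: return #string;
    DEMO_STATUS_MAP(XX)
#undef XX
  }
  return "<unknown>";
}

/* Expansion 3: reason phrase -> code, or -1 if the phrase is not in the table. */
static int demo_status_from_str(const char *s) {
  assert(s != NULL);
#define XX(num, name, string) if (strcmp(s, #string) == 0) return num;
  DEMO_STATUS_MAP(XX)
#undef XX
  return -1;
}
```

Because the code, enum suffix, and reason phrase live in one table, adding a status takes one line, and every derived enum and lookup stays in sync.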
--- a/src/3rdparty/hwloc/NEWS
+++ b/src/3rdparty/hwloc/NEWS
@@ -1,5 +1,5 @@
 Copyright © 2009 CNRS
-Copyright © 2009-2020 Inria.  All rights reserved.
+Copyright © 2009-2022 Inria.  All rights reserved.
 Copyright © 2009-2013 Université Bordeaux
 Copyright © 2009-2011 Cisco Systems, Inc.  All rights reserved.
 Copyright © 2020 Hewlett Packard Enterprise.  All rights reserved.
@@ -17,6 +17,158 @@ bug fixes (and other actions) for each version of hwloc since version
 0.9.


+Version 2.7.1
+-------------
+* Workaround crashes when virtual machines report incoherent x86 CPUID
+  information about numbers of cores and threads.
+  Thanks to Peter Bense for the report.
+* Use setenv() instead of putenv() when trying to force enable oneAPI L0
+  support, to avoid issues with applications that touch the environment,
+  thanks to Josh Hursey for the patch.
+* Add some warnings at the end of configure when GPU libraries are
+  missing on the system or their path is missing in the environment.
+
+
+Version 2.7.0
+-------------
+* Backends
+  + Add support for NUMA nodes and caches with more than 64 PUs across
+    multiple processor groups on Windows 11 and Windows Server 2022.
+  + Group objects are not created for Windows processor groups anymore,
+    except if HWLOC_WINDOWS_PROCESSOR_GROUP_OBJS=1 in the environment.
+  + Expose "Cluster" group objects on Linux kernel 5.16+ for CPUs
+    that share some internal cache or bus. This can be equivalent
+    to the L2 Cache level on some platforms (e.g. x86) or a specific
+    level between L2 and L3 on others (e.g. ARM Kunpeng 920).
+    Thanks to Jonathan Cameron for the help.
+    - HWLOC_DONT_MERGE_CLUSTER_GROUPS=1 may be set in the environment
+      to prevent these groups from being merged with identical caches, etc.
+  + Improve the oneAPI LevelZero backend:
+    - Expose subdevices such as "ze0.1" inside root OS devices ("ze0")
+      when the hardware contains multiple subdevices.
+    - Add many new attributes to describe device type, and the
+      numbers of slices, subslices, execution units and threads.
+    - Expose the memory information as LevelZeroHBM/DDR/MemorySize infos.
+  + Ignore the max frequencies of cores in Linux cpukinds when the
+    base frequencies are available (to avoid exposing hybrid CPUs
+    when Intel Turbo Boost Max 3.0 gives slightly different max
+    frequencies to CPU cores).
+    - May be reverted by setting HWLOC_CPUKINDS_MAXFREQ=1 in the environment.
+* Tools
+  + Add --grey and --palette options to switch lstopo to greyscale or
+    white-background-only graphics, or to tune individual colors.
+* Build
+  + Windows CMake builds now support non-MSVC compilers, detect several
+    features at build time, can build/run tests, etc.
+    Thanks to Michael Hirsch and Alexander Neumann.
+
+
+Version 2.6.0
+-------------
+* Backends
+  + Expose two cpukinds for energy-efficient cores (icestorm) and
+    high-performance cores (firestorm) on Apple M1 on Mac OS X.
+  + Use sysfs CPU "capacity" to rank hybrid cores by efficiency
+    on Linux when available (mostly on recent ARM platforms for now).
+  + Improve HWLOC_MEMBIND_BIND (without the STRICT flag) on Linux kernel
+    >= 5.15: If more than one node is given, the kernel may now use all
+    of them instead of only the first one before falling back to others.
+  + Expose cache os_index when available on Linux; it may be needed
+    when using resctrl to configure cache partitioning, memory bandwidth
+    monitoring, etc.
+  + Add an "XGMIHops" distances matrix in the RSMI backend for AMD GPUs
+    interconnected through XGMI links.
+  + Expose AMD GPU memory information (VRAM and GTT) in the RSMI backend.
+  + Add OS devices such as "bxi0" for Atos/Bull BXI HCAs on Linux.
+* Tools
+  + lstopo has a better placement algorithm with respect to I/O
+    objects, see --children-order in the manpage for details.
+  + hwloc-annotate may now change object subtypes and cache or memory
+    sizes.
+* Build
+  + Allow specifying the ROCm installation used to build the RSMI backend:
+    - Use a custom installation path if specified with --with-rocm=<dir>.
+    - Use /opt/rocm-<version> if specified with --with-rocm-version=<version>
+      or the ROCM_VERSION environment variable.
+    - Try /opt/rocm if it exists.
+    - See "How do I enable ROCm SMI and select which version to use?"
+      in the FAQ for details.
+  + Add a CMakeLists for Windows under contrib/windows-cmake/ .
+* Documentation
+  + Add FAQ entry "How do I create a custom heterogeneous and
+     asymmetric topology?"
+
+
+Version 2.5.0
+-------------
+* API
+  + Add hwloc/windows.h to query Windows processor groups.
+  + Add hwloc_get_obj_with_same_locality() to convert between objects
+    with same locality, for instance NUMA nodes and Packages,
+    or OS devices within a PCI device.
+  + Add hwloc_distances_transform() to modify distances structures.
+    - hwloc-annotate and lstopo have new distances-transform options.
+  + hwloc_distances_add() is replaced with _add_create() followed by
+    _add_values() and _add_commit(). See hwloc/distances.h for details.
+  + Add topology flags to mitigate binding modifications during
+    hwloc discovery, especially on Windows:
+    - HWLOC_TOPOLOGY_FLAG_RESTRICT_TO_CPUBINDING and _MEMBINDING
+      restrict discovery to PUs and NUMA nodes inside the binding.
+    - HWLOC_TOPOLOGY_FLAG_DONT_CHANGE_BINDING prevents hwloc from ever
+      changing the binding during discovery.
+* Backends
+  + Add a levelzero backend for oneAPI L0 devices, exposed as OS devices
+    of subtype "LevelZero" and name such as "ze0".
+    - Add hwloc/levelzero.h for converting between L0 API devices
+      and hwloc cpusets or OS devices.
+  + Expose NEC Vector Engine cards on Linux as OS devices of subtype
+    "VectorEngine" and name "ve0", etc.
+    Thanks to Anara Kozhokanova, Tim Cramer and Erich Focht for the help.
+  + Add an NVLinkBandwidth distances structure between NVIDIA GPUs
+    (and POWER processors or NVSwitches) in the NVML backend,
+    and an XGMIBandwidth distances structure between AMD GPUs
+    in the RSMI backend.
+    - See "Topology Attributes: Distances, Memory Attributes and CPU Kinds"
+      in the documentation for details about these new distances.
+  + Add support for NUMA node 0 being offline in Linux, thanks to Jirka Hladky.
+* Build
+  + Add --with-cuda-version=<version> or look at the CUDA_VERSION
+    environment variable to find the appropriate CUDA pkg-config files.
+    Thanks to Stephen Herbein for the suggestion.
+    - Also add --with-cuda=<dir> to specify the CUDA installation path
+      manually (and its NVML and OpenCL components).
+      Thanks to Andrea Bocci for the suggestion.
+    - See "How do I enable CUDA and select which CUDA version to use?"
+      in the FAQ for details.
+* Tools
+  + lstopo now has a --windows-processor-groups option on Windows.
+  + hwloc-ps now has a --short-name option to avoid long/truncated
+    command path.
+  + hwloc-ps now has a --single-ancestor option to return a single
+    (possibly too large) object where a process is bound.
+  + hwloc-ps --pid-cmd may now query environment variables,
+    including MPI-specific variables to find out process ranks.
+
+
+Version 2.4.1
+-------------
+* Fix AMD OpenCL device locality when PCI bus or device number >= 128.
+  Thanks to Edgar Leon for reporting the issue.
+  + Applications using any of the following inline functions must
+    be recompiled to get the fix: hwloc_opencl_get_device_pci_busid(),
+    hwloc_opencl_get_device_cpuset(), hwloc_opencl_get_device_osdev().
+* Fix the ranking of cpukinds on non-Windows systems,
+  thanks to Ivan Kochin for the report.
+* Fix the insertion of custom Groups after loading the topology,
+  thanks to Scott Hicks.
+* Add support for CPU0 being offline in Linux, thanks to Garrett Clay.
+* Fix missing x86 Package and Core objects on FreeBSD/NetBSD.
+  Thanks to Thibault Payet and Yuri Victorovich for the report.
+* Fix the import of very large distances with heterogeneous object types.
+* Fix a memory leak in the Linux backend,
+  thanks to Perceval Anichini.
+
+
 Version 2.4.0
 -------------
 * API
--- a/src/3rdparty/hwloc/VERSION
+++ b/src/3rdparty/hwloc/VERSION
@@ -8,8 +8,8 @@
 # Please update HWLOC_VERSION* in contrib/windows/hwloc_config.h too.

 major=2
-minor=4
-release=0
+minor=7
+release=1

 # greek is used for alpha or beta release tags.  If it is non-empty,
 # it will be appended to the version number.  It does not have to be
@@ -22,7 +22,7 @@ greek=

 # The date when this release was created

-date="Nov 26, 2020"
+date="Mar 20, 2022"

 # If snapshot=1, then use the value from snapshot_version as the
 # entire hwloc version (i.e., ignore major, minor, release, and
@@ -41,7 +41,7 @@ snapshot_version=${major}.${minor}.${release}${greek}-git
 # 2. Version numbers are described in the Libtool current:revision:age
 # format.

-libhwloc_so_version=19:0:4
+libhwloc_so_version=20:3:5
 libnetloc_so_version=0:0:0

 # Please also update the <TargetName> lines in contrib/windows/libhwloc.vcxproj
--- a/src/3rdparty/hwloc/include/hwloc.h
+++ b/src/3rdparty/hwloc/include/hwloc.h
@@ -1,6 +1,6 @@
 /*
 * Copyright © 2009 CNRS
- * Copyright © 2009-2020 Inria.  All rights reserved.
+ * Copyright © 2009-2021 Inria.  All rights reserved.
 * Copyright © 2009-2012 Université Bordeaux
 * Copyright © 2009-2020 Cisco Systems, Inc.  All rights reserved.
 * See COPYING in top-level directory.
@@ -29,7 +29,7 @@
 * THAT IS IN THE PDF/HTML THAT IS ***NOT*** IN hwloc.h!
 *
 * There are entire paragraph-length descriptions, discussions, and
- * pretty prictures to explain subtle corner cases, provide concrete
+ * pretty pictures to explain subtle corner cases, provide concrete
 * examples, etc.
 *
 * Please, go read the documentation.  :-)
@@ -93,7 +93,7 @@ extern "C" {
 * Two stable releases of the same series usually have the same ::HWLOC_API_VERSION
 * even if their HWLOC_VERSION are different.
 */
-#define HWLOC_API_VERSION 0x00020400
+#define HWLOC_API_VERSION 0x00020500

 /** \brief Indicate at runtime which hwloc API version was used at build time.
 *
@@ -346,7 +346,8 @@ typedef enum hwloc_obj_osdev_type_e {
 				  * For instance the "eth0" interface on Linux. */
  HWLOC_OBJ_OSDEV_OPENFABRICS,	/**< \brief Operating system openfabrics device.
 				  * For instance the "mlx4_0" InfiniBand HCA,
-				  * or "hfi1_0" Omni-Path interface on Linux. */
+				  * "hfi1_0" Omni-Path interface,
+				  * or "bxi0" Atos/Bull BXI HCA on Linux. */
  HWLOC_OBJ_OSDEV_DMA,		/**< \brief Operating system dma engine device.
 				  * For instance the "dma0chan0" DMA channel on Linux. */
  HWLOC_OBJ_OSDEV_COPROC	/**< \brief Operating system co-processor device.
@@ -516,7 +517,7 @@ struct hwloc_obj {
                                          * objects).
                                          *
                                          * If the ::HWLOC_TOPOLOGY_FLAG_INCLUDE_DISALLOWED configuration flag is set,
-                                          * some of these CPUs may not be allowed for binding,
+                                          * some of these CPUs may be online but not allowed for binding,
                                          * see hwloc_topology_get_allowed_cpuset().
                                          *
 					  * \note All objects have non-NULL CPU and node sets except Misc and I/O objects.
@@ -548,7 +549,7 @@ struct hwloc_obj {
                                          * nodes more precisely.
                                          *
                                          * If the ::HWLOC_TOPOLOGY_FLAG_INCLUDE_DISALLOWED configuration flag is set,
-                                          * some of these nodes may not be allowed for allocation,
+                                          * some of these nodes may be online but not allowed for allocation,
                                          * see hwloc_topology_get_allowed_nodeset().
                                          *
                                          * If there are no NUMA nodes in the machine, all the memory is close to this
@@ -641,7 +642,7 @@ union hwloc_obj_attr_u {
    unsigned char revision;
    float linkspeed; /* in GB/s */
  } pcidev;
-  /** \brief Bridge specific Object Attribues */
+  /** \brief Bridge specific Object Attributes */
  struct hwloc_bridge_attr_s {
    union {
      struct hwloc_pcidev_attr_s pci;
@@ -1088,7 +1089,7 @@ HWLOC_DECLSPEC int hwloc_obj_add_info(hwloc_obj_t obj, const char *name, const c
 *
 * Some operating systems only support binding threads or processes to a single PU.
 * Others allow binding to larger sets such as entire Cores or Packages or
- * even random sets of invididual PUs. In such operating system, the scheduler
+ * even random sets of individual PUs. In such operating systems, the scheduler
 * is free to run the task on one of these PU, then migrate it to another PU, etc.
 * It is often useful to call hwloc_bitmap_singlify() on the target CPU set before
 * passing it to the binding function to avoid these expensive migrations.
@@ -1166,7 +1167,7 @@ typedef enum {
   * CPUs are idle, operating systems may execute the thread/process
   * on those other CPUs instead of the designated CPUs, to let them
   * progress anyway.  Strict binding means that the thread/process
-   * will _never_ execute on other cpus than the designated CPUs, even
+   * will _never_ execute on other CPUs than the designated CPUs, even
   * when those are busy with other tasks and other CPUs are idle.
   *
   * \note Depending on the operating system, strict binding may not
@@ -1203,7 +1204,7 @@ typedef enum {
  HWLOC_CPUBIND_NOMEMBIND = (1<<3)
 } hwloc_cpubind_flags_t;

-/** \brief Bind current process or thread on cpus given in physical bitmap \p set.
+/** \brief Bind current process or thread on CPUs given in physical bitmap \p set.
 *
 * \return -1 with errno set to ENOSYS if the action is not supported
 * \return -1 with errno set to EXDEV if the binding cannot be enforced
@@ -1212,12 +1213,13 @@ HWLOC_DECLSPEC int hwloc_set_cpubind(hwloc_topology_t topology, hwloc_const_cpus

 /** \brief Get current process or thread binding.
 *
- * Writes into \p set the physical cpuset which the process or thread (according to \e
- * flags) was last bound to.
+ * The CPU-set \p set (previously allocated by the caller)
+ * is filled with the list of PUs which the process or
+ * thread (according to \e flags) was last bound to.
 */
 HWLOC_DECLSPEC int hwloc_get_cpubind(hwloc_topology_t topology, hwloc_cpuset_t set, int flags);

-/** \brief Bind a process \p pid on cpus given in physical bitmap \p set.
+/** \brief Bind a process \p pid on CPUs given in physical bitmap \p set.
 *
 * \note \p hwloc_pid_t is \p pid_t on Unix platforms,
 * and \p HANDLE on native Windows platforms.
@@ -1231,6 +1233,10 @@ HWLOC_DECLSPEC int hwloc_get_cpubind(hwloc_topology_t topology, hwloc_cpuset_t s
 HWLOC_DECLSPEC int hwloc_set_proc_cpubind(hwloc_topology_t topology, hwloc_pid_t pid, hwloc_const_cpuset_t set, int flags);

 /** \brief Get the current physical binding of process \p pid.
+ *
+ * The CPU-set \p set (previously allocated by the caller)
+ * is filled with the list of PUs which the process
+ * was last bound to.
 *
 * \note \p hwloc_pid_t is \p pid_t on Unix platforms,
 * and \p HANDLE on native Windows platforms.
@@ -1244,7 +1250,7 @@ HWLOC_DECLSPEC int hwloc_set_proc_cpubind(hwloc_topology_t topology, hwloc_pid_t
 HWLOC_DECLSPEC int hwloc_get_proc_cpubind(hwloc_topology_t topology, hwloc_pid_t pid, hwloc_cpuset_t set, int flags);

 #ifdef hwloc_thread_t
-/** \brief Bind a thread \p thread on cpus given in physical bitmap \p set.
+/** \brief Bind a thread \p thread on CPUs given in physical bitmap \p set.
 *
 * \note \p hwloc_thread_t is \p pthread_t on Unix platforms,
 * and \p HANDLE on native Windows platforms.
@@ -1256,6 +1262,10 @@ HWLOC_DECLSPEC int hwloc_set_thread_cpubind(hwloc_topology_t topology, hwloc_thr

 #ifdef hwloc_thread_t
 /** \brief Get the current physical binding of thread \p tid.
+ *
+ * The CPU-set \p set (previously allocated by the caller)
+ * is filled with the list of PUs which the thread
+ * was last bound to.
 *
 * \note \p hwloc_thread_t is \p pthread_t on Unix platforms,
 * and \p HANDLE on native Windows platforms.
@@ -1266,6 +1276,10 @@ HWLOC_DECLSPEC int hwloc_get_thread_cpubind(hwloc_topology_t topology, hwloc_thr
 #endif

 /** \brief Get the last physical CPU where the current process or thread ran.
+ *
+ * The CPU-set \p set (previously allocated by the caller)
+ * is filled with the list of PUs which the process or
+ * thread (according to \e flags) last ran on.
 *
 * The operating system may move some tasks from one processor
 * to another at any time according to their binding,
@@ -1281,6 +1295,10 @@ HWLOC_DECLSPEC int hwloc_get_thread_cpubind(hwloc_topology_t topology, hwloc_thr
 HWLOC_DECLSPEC int hwloc_get_last_cpu_location(hwloc_topology_t topology, hwloc_cpuset_t set, int flags);

 /** \brief Get the last physical CPU where a process ran.
+ *
+ * The CPU-set \p set (previously allocated by the caller)
+ * is filled with the list of PUs which the process
+ * last ran on.
 *
 * The operating system may move some tasks from one processor
 * to another at any time according to their binding,
@@ -1511,6 +1529,9 @@ HWLOC_DECLSPEC int hwloc_set_membind(hwloc_topology_t topology, hwloc_const_bitm
 /** \brief Query the default memory binding policy and physical locality of the
 * current process or thread.
 *
+ * The bitmap \p set (previously allocated by the caller)
+ * is filled with the process or thread memory binding.
+ *
 * This function has two output parameters: \p set and \p policy.
 * The values returned in these parameters depend on both the \p flags
 * passed in and the current memory binding policies and nodesets in
@@ -1571,6 +1592,9 @@ HWLOC_DECLSPEC int hwloc_set_proc_membind(hwloc_topology_t topology, hwloc_pid_t
 /** \brief Query the default memory binding policy and physical locality of the
 * specified process.
 *
+ * The bitmap \p set (previously allocated by the caller)
+ * is filled with the process memory binding.
+ *
 * This function has two output parameters: \p set and \p policy.
 * The values returned in these parameters depend on both the \p flags
 * passed in and the current memory binding policies and nodesets in
@@ -1624,6 +1648,9 @@ HWLOC_DECLSPEC int hwloc_set_area_membind(hwloc_topology_t topology, const void
 /** \brief Query the CPUs near the physical NUMA node(s) and binding policy of
 * the memory identified by (\p addr, \p len ).
 *
+ * The bitmap \p set (previously allocated by the caller)
+ * is filled with the memory area binding.
+ *
 * This function has two output parameters: \p set and \p policy.
 * The values returned in these parameters depend on both the \p flags
 * passed in and the memory binding policies and nodesets of the pages
@@ -1652,7 +1679,8 @@ HWLOC_DECLSPEC int hwloc_get_area_membind(hwloc_topology_t topology, const void

 /** \brief Get the NUMA nodes where memory identified by (\p addr, \p len ) is physically allocated.
 *
- * Fills \p set according to the NUMA nodes where the memory area pages
+ * The bitmap \p set (previously allocated by the caller)
+ * is filled according to the NUMA nodes where the memory area pages
 * are physically allocated. If no page is actually allocated yet,
 * \p set may be empty.
 *
@@ -1698,9 +1726,12 @@ HWLOC_DECLSPEC void *hwloc_alloc_membind(hwloc_topology_t topology, size_t len,

 /** \brief Allocate some memory on NUMA memory nodes specified by \p set
 *
- * This is similar to hwloc_alloc_membind_nodeset() except that it is allowed to change
- * the current memory binding policy, thus providing more binding support, at
- * the expense of changing the current state.
+ * First, try to allocate properly with hwloc_alloc_membind().
+ * On failure, the current process or thread memory binding policy
+ * is changed with hwloc_set_membind() before allocating memory.
+ * Thus this function works in more cases, at the expense of changing
+ * the current state (possibly affecting future allocations that
+ * would not specify any policy).
 *
 * If ::HWLOC_MEMBIND_BYNODESET is specified, set is considered a nodeset.
 * Otherwise it's a cpuset.
@@ -1883,8 +1914,9 @@ HWLOC_DECLSPEC int hwloc_topology_set_components(hwloc_topology_t __hwloc_restri
 enum hwloc_topology_flags_e {
 /** \brief Detect the whole system, ignore reservations, include disallowed objects.
   *
-   * Gather all resources, even if some were disabled by the administrator.
+   * Gather all online resources, even if some were disabled by the administrator.
   * For instance, ignore Linux Cgroup/Cpusets and gather all processors and memory nodes.
+   * However, offline PUs and NUMA nodes are still ignored.
   *
   * When this flag is not set, PUs and NUMA nodes that are disallowed are not added to the topology.
   * Parent objects (package, core, cache, etc.) are added only if some of their children are allowed.
@@ -1966,17 +1998,81 @@ enum hwloc_topology_flags_e {
   * hwloc and machine support.
   *
   */
-  HWLOC_TOPOLOGY_FLAG_IMPORT_SUPPORT = (1UL<<3)
+  HWLOC_TOPOLOGY_FLAG_IMPORT_SUPPORT = (1UL<<3),
+
+  /** \brief Do not consider resources outside of the process CPU binding.
+   *
+   * If the binding of the process is limited to a subset of cores,
+   * ignore the other cores during discovery.
+   *
+   * The resulting topology is identical to what a call to hwloc_topology_restrict()
+   * would generate, but this flag also prevents hwloc from ever touching other
+   * resources during the discovery.
+   *
+   * This flag especially tells the x86 backend to never temporarily
+   * rebind a thread on any excluded core. This is useful on Windows
+   * because such temporary rebinding can change the process binding.
+   * Another use-case is to avoid cores that would not be able to
+   * perform the hwloc discovery anytime soon because they are busy
+   * executing some high-priority real-time tasks.
+   *
+   * If process CPU binding is not supported,
+   * the thread CPU binding is considered instead if supported,
+   * or the flag is ignored.
+   *
+   * This flag requires ::HWLOC_TOPOLOGY_FLAG_IS_THISSYSTEM as well
+   * since binding support is required.
+   */
+  HWLOC_TOPOLOGY_FLAG_RESTRICT_TO_CPUBINDING = (1UL<<4),
+
+  /** \brief Do not consider resources outside of the process memory binding.
+   *
+   * If the binding of the process is limited to a subset of NUMA nodes,
+   * ignore the other NUMA nodes during discovery.
+   *
+   * The resulting topology is identical to what a call to hwloc_topology_restrict()
+   * would generate, but this flag also prevents hwloc from ever touching other
+   * resources during the discovery.
+   *
+   * This flag is meant to be used together with
+   * ::HWLOC_TOPOLOGY_FLAG_RESTRICT_TO_CPUBINDING when both cores
+   * and NUMA nodes should be ignored outside of the process binding.
+   *
+   * If process memory binding is not supported,
+   * the thread memory binding is considered instead if supported,
+   * or the flag is ignored.
+   *
+   * This flag requires ::HWLOC_TOPOLOGY_FLAG_IS_THISSYSTEM as well
+   * since binding support is required.
+   */
+  HWLOC_TOPOLOGY_FLAG_RESTRICT_TO_MEMBINDING = (1UL<<5),
+
+  /** \brief Do not ever modify the process or thread binding during discovery.
+   *
+   * This flag disables all hwloc discovery steps that require a change of
+   * the process or thread binding. This currently only affects the x86
+   * backend which gets entirely disabled.
+   *
+   * This is useful when hwloc_topology_load() is called while the
+   * application also creates additional threads or modifies the binding.
+   *
+   * This flag is also a strict way to make sure the process binding will
+   * not change due to thread binding changes on Windows
+   * (see ::HWLOC_TOPOLOGY_FLAG_RESTRICT_TO_CPUBINDING).
+   */
+  HWLOC_TOPOLOGY_FLAG_DONT_CHANGE_BINDING = (1UL<<6)
 };

 /** \brief Set OR'ed flags to non-yet-loaded topology.
 *
 * Set a OR'ed set of ::hwloc_topology_flags_e onto a topology that was not yet loaded.
 *
- * If this function is called multiple times, the last invokation will erase
+ * If this function is called multiple times, the last invocation will erase
 * and replace the set of flags that was previously set.
 *
- * The flags set in a topology may be retrieved with hwloc_topology_get_flags()
+ * By default, no flags are set (\c 0).
+ *
+ * The flags set in a topology may be retrieved with hwloc_topology_get_flags().
 */
 HWLOC_DECLSPEC int hwloc_topology_set_flags (hwloc_topology_t topology, unsigned long flags);

@@ -1984,6 +2080,9 @@ HWLOC_DECLSPEC int hwloc_topology_set_flags (hwloc_topology_t topology, unsigned
 *
 * Get the OR'ed set of ::hwloc_topology_flags_e of a topology.
 *
+ * If hwloc_topology_set_flags() was not called earlier,
+ * no flags are set (\c 0 is returned).
+ *
 * \return the flags previously set with hwloc_topology_set_flags().
 */
 HWLOC_DECLSPEC unsigned long hwloc_topology_get_flags (hwloc_topology_t topology);
@@ -2362,22 +2461,9 @@ HWLOC_DECLSPEC hwloc_obj_t hwloc_topology_insert_misc_object(hwloc_topology_t to
 /** \brief Allocate a Group object to insert later with hwloc_topology_insert_group_object().
 *
 * This function returns a new Group object.
- * The caller should (at least) initialize its sets before inserting the object.
- * See hwloc_topology_insert_group_object().
 *
- * The \p subtype object attribute may be set to display something else
- * than "Group" as the type name for this object in lstopo.
- * Custom name/value info pairs may be added with hwloc_obj_add_info() after
- * insertion.
- *
- * The \p kind group attribute should be 0. The \p subkind group attribute may
- * be set to identify multiple Groups of the same level.
- *
- * It is recommended not to set any other object attribute before insertion,
- * since the Group may get discarded during insertion.
- *
- * The object will be destroyed if passed to hwloc_topology_insert_group_object()
- * without any set defined.
+ * The caller should (at least) initialize its sets before inserting
+ * the object in the topology. See hwloc_topology_insert_group_object().
 */
 HWLOC_DECLSPEC hwloc_obj_t hwloc_topology_alloc_group_object(hwloc_topology_t topology);

@@ -2388,34 +2474,44 @@ HWLOC_DECLSPEC hwloc_obj_t hwloc_topology_alloc_group_object(hwloc_topology_t to
 * the final location of the Group in the topology.
 * Then the object can be passed to this function for actual insertion in the topology.
 *
- * The group \p dont_merge attribute may be set to prevent the core from
- * ever merging this object with another object hierarchically-identical.
- *
 * Either the cpuset or nodeset field (or both, if compatible) must be set
 * to a non-empty bitmap. The complete_cpuset or complete_nodeset may be set
 * instead if inserting with respect to the complete topology
 * (including disallowed, offline or unknown objects).
- *
- * It grouping several objects, hwloc_obj_add_other_obj_sets() is an easy way
+ * If grouping several objects, hwloc_obj_add_other_obj_sets() is an easy way
 * to build the Group sets iteratively.
- *
 * These sets cannot be larger than the current topology, or they would get
 * restricted silently.
- *
 * The core will setup the other sets after actual insertion.
 *
+ * The \p subtype object attribute may be defined (to a dynamically
+ * allocated string) to display something else than "Group" as the
+ * type name for this object in lstopo.
+ * Custom name/value info pairs may be added with hwloc_obj_add_info() after
+ * insertion.
+ *
+ * The group \p dont_merge attribute may be set to \c 1 to prevent
+ * the hwloc core from ever merging this object with another
+ * hierarchically-identical object.
+ * This is useful when the Group itself describes an important feature
+ * that cannot be exposed anywhere else in the hierarchy.
+ *
+ * The group \p kind attribute may be set to a high value such
+ * as \c 0xffffffff to tell hwloc that this new Group should always
+ * be discarded in favor of any existing Group with the same locality.
+ *
 * \return The inserted object if it was properly inserted.
 *
- * \return An existing object if the Group was discarded because the topology already
- * contained an object at the same location (the Group did not add any locality information).
- * Any name/info key pair set before inserting is appended to the existing object.
+ * \return An existing object if the Group was merged or discarded
+ * because the topology already contained an object at the same
+ * location (the Group did not add any hierarchy information).
 *
 * \return \c NULL if the insertion failed because of conflicting sets in topology tree.
 *
 * \return \c NULL if Group objects are filtered-out of the topology (::HWLOC_TYPE_FILTER_KEEP_NONE).
 *
- * \return \c NULL if the object was discarded because no set was initialized in the Group
- * before insert, or all of them were empty.
+ * \return \c NULL if the object was discarded because no set was
+ * initialized in the Group before insert, or all of them were empty.
 */
 HWLOC_DECLSPEC hwloc_obj_t hwloc_topology_insert_group_object(hwloc_topology_t topology, hwloc_obj_t group);

--- a/src/3rdparty/hwloc/include/hwloc/autogen/config.h
+++ b/src/3rdparty/hwloc/include/hwloc/autogen/config.h
@@ -1,6 +1,6 @@
 /*
 * Copyright © 2009 CNRS
- * Copyright © 2009-2020 Inria.  All rights reserved.
+ * Copyright © 2009-2021 Inria.  All rights reserved.
 * Copyright © 2009-2012 Université Bordeaux
 * Copyright © 2009-2011 Cisco Systems, Inc.  All rights reserved.
 * See COPYING in top-level directory.
@@ -11,10 +11,10 @@
 #ifndef HWLOC_CONFIG_H
 #define HWLOC_CONFIG_H

-#define HWLOC_VERSION "2.4.0"
+#define HWLOC_VERSION "2.7.1"
 #define HWLOC_VERSION_MAJOR 2
-#define HWLOC_VERSION_MINOR 4
-#define HWLOC_VERSION_RELEASE 0
+#define HWLOC_VERSION_MINOR 7
+#define HWLOC_VERSION_RELEASE 1
 #define HWLOC_VERSION_GREEK ""

 #define __hwloc_restrict
--- a/src/3rdparty/hwloc/include/hwloc/cpukinds.h
+++ b/src/3rdparty/hwloc/include/hwloc/cpukinds.h
@@ -1,5 +1,5 @@
 /*
- * Copyright © 2020 Inria.  All rights reserved.
+ * Copyright © 2020-2021 Inria.  All rights reserved.
 * See COPYING in top-level directory.
 */

@@ -42,18 +42,23 @@ extern "C" {
 * (for instance the "CoreType" and "FrequencyMaxMHz",
 *  see \ref topoattrs_cpukinds).
 *
- * A higher efficiency value means intrinsic greater performance
+ * A higher efficiency value means greater intrinsic performance
 * (and possibly less performance/power efficiency).
- * Kinds with lower efficiency are ranked first:
+ * Kinds with lower efficiency values are ranked first:
 * Passing 0 as \p kind_index to hwloc_cpukinds_get_info() will
- * return information about the less efficient CPU kind.
+ * return information about the CPU kind with lower performance
+ * but higher energy-efficiency.
+ * Higher \p kind_index values instead return information
+ * about power-hungry high-performance cores.
 *
- * When available, efficiency values are gathered from the operating
- * system (when \p cpukind_efficiency is set in the
- * struct hwloc_topology_discovery_support array, only on Windows 10 for now).
- * Otherwise hwloc tries to compute efficiencies
- * by comparing CPU kinds using frequencies (on ARM),
- * or core types and frequencies (on other architectures).
+ * When available, efficiency values are gathered from the operating system.
+ * If so, \p cpukind_efficiency is set in the struct hwloc_topology_discovery_support array.
+ * This is currently available on Windows 10, Mac OS X (Darwin),
+ * and on some Linux platforms where core "capacity" is exposed in sysfs.
+ *
+ * If the operating system does not expose core efficiencies natively,
+ * hwloc tries to compute efficiencies by comparing CPU kinds using
+ * frequencies (on ARM), or core types and frequencies (on other architectures).
 * The environment variable HWLOC_CPUKINDS_RANKING may be used
 * to change this heuristics, see \ref envvar.
 *
--- a/src/3rdparty/hwloc/include/hwloc/cuda.h
+++ b/src/3rdparty/hwloc/include/hwloc/cuda.h
@@ -1,5 +1,5 @@
 /*
- * Copyright © 2010-2020 Inria.  All rights reserved.
+ * Copyright © 2010-2021 Inria.  All rights reserved.
 * Copyright © 2010-2011 Université Bordeaux
 * Copyright © 2011 Cisco Systems, Inc.  All rights reserved.
 * See COPYING in top-level directory.
@@ -75,7 +75,7 @@ hwloc_cuda_get_device_pci_ids(hwloc_topology_t topology __hwloc_attribute_unused
 /** \brief Get the CPU set of processors that are physically
 * close to device \p cudevice.
 *
- * Return the CPU set describing the locality of the CUDA device \p cudevice.
+ * Store in \p set the CPU-set describing the locality of the CUDA device \p cudevice.
 *
 * Topology \p topology and device \p cudevice must match the local machine.
 * I/O devices detection and the CUDA component are not needed in the topology.
@@ -120,8 +120,8 @@ hwloc_cuda_get_device_cpuset(hwloc_topology_t topology __hwloc_attribute_unused,
 /** \brief Get the hwloc PCI device object corresponding to the
 * CUDA device \p cudevice.
 *
- * Return the PCI device object describing the CUDA device \p cudevice.
- * Return NULL if there is none.
+ * \return The hwloc PCI device object describing the CUDA device \p cudevice.
+ * \return \c NULL if none could be found.
 *
 * Topology \p topology and device \p cudevice must match the local machine.
 * I/O devices detection must be enabled in topology \p topology.
@@ -140,8 +140,8 @@ hwloc_cuda_get_device_pcidev(hwloc_topology_t topology, CUdevice cudevice)

 /** \brief Get the hwloc OS device object corresponding to CUDA device \p cudevice.
 *
- * Return the hwloc OS device object that describes the given
- * CUDA device \p cudevice. Return NULL if there is none.
+ * \return The hwloc OS device object that describes the given CUDA device \p cudevice.
+ * \return \c NULL if none could be found.
 *
 * Topology \p topology and device \p cudevice must match the local machine.
 * I/O devices detection and the CUDA component must be enabled in the topology.
@@ -183,8 +183,8 @@ hwloc_cuda_get_device_osdev(hwloc_topology_t topology, CUdevice cudevice)
 /** \brief Get the hwloc OS device object corresponding to the
 * CUDA device whose index is \p idx.
 *
- * Return the OS device object describing the CUDA device whose
- * index is \p idx. Return NULL if there is none.
+ * \return The hwloc OS device object describing the CUDA device whose index is \p idx.
+ * \return \c NULL if none could be found.
 *
 * The topology \p topology does not necessarily have to match the current
 * machine. For instance the topology may be an XML import of a remote host.
--- a/src/3rdparty/hwloc/include/hwloc/cudart.h
+++ b/src/3rdparty/hwloc/include/hwloc/cudart.h
@@ -1,5 +1,5 @@
 /*
- * Copyright © 2010-2020 Inria.  All rights reserved.
+ * Copyright © 2010-2021 Inria.  All rights reserved.
 * Copyright © 2010-2011 Université Bordeaux
 * Copyright © 2011 Cisco Systems, Inc.  All rights reserved.
 * See COPYING in top-level directory.
@@ -72,7 +72,7 @@ hwloc_cudart_get_device_pci_ids(hwloc_topology_t topology __hwloc_attribute_unus
 /** \brief Get the CPU set of processors that are physically
 * close to device \p idx.
 *
- * Return the CPU set describing the locality of the CUDA device
+ * Store in \p set the CPU-set describing the locality of the CUDA device
 * whose index is \p idx.
 *
 * Topology \p topology and device \p idx must match the local machine.
@@ -117,8 +117,8 @@ hwloc_cudart_get_device_cpuset(hwloc_topology_t topology __hwloc_attribute_unuse
 /** \brief Get the hwloc PCI device object corresponding to the
 * CUDA device whose index is \p idx.
 *
- * Return the PCI device object describing the CUDA device whose
- * index is \p idx. Return NULL if there is none.
+ * \return The hwloc PCI device object describing the CUDA device whose index is \p idx.
+ * \return \c NULL if none could be found.
 *
 * Topology \p topology and device \p idx must match the local machine.
 * I/O devices detection must be enabled in topology \p topology.
@@ -138,8 +138,8 @@ hwloc_cudart_get_device_pcidev(hwloc_topology_t topology, int idx)
 /** \brief Get the hwloc OS device object corresponding to the
 * CUDA device whose index is \p idx.
 *
- * Return the OS device object describing the CUDA device whose
- * index is \p idx. Return NULL if there is none.
+ * \return The hwloc OS device object describing the CUDA device whose index is \p idx.
+ * \return \c NULL if none could be found.
 *
 * The topology \p topology does not necessarily have to match the current
 * machine. For instance the topology may be an XML import of a remote host.
--- a/src/3rdparty/hwloc/include/hwloc/deprecated.h
+++ b/src/3rdparty/hwloc/include/hwloc/deprecated.h
@@ -1,6 +1,6 @@
 /*
 * Copyright © 2009 CNRS
- * Copyright © 2009-2018 Inria.  All rights reserved.
+ * Copyright © 2009-2021 Inria.  All rights reserved.
 * Copyright © 2009-2012 Université Bordeaux
 * Copyright © 2009-2010 Cisco Systems, Inc.  All rights reserved.
 * See COPYING in top-level directory.
@@ -30,6 +30,15 @@ extern "C" {
 /* backward compat with v1.10 before Node->NUMANode clarification */
 #define HWLOC_OBJ_NODE HWLOC_OBJ_NUMANODE

+/** \brief Add a distances structure.
+ *
+ * Superseded by hwloc_distances_add_create()+hwloc_distances_add_values()+hwloc_distances_add_commit()
+ * in v2.5.
+ */
+HWLOC_DECLSPEC int hwloc_distances_add(hwloc_topology_t topology,
+				       unsigned nbobjs, hwloc_obj_t *objs, hwloc_uint64_t *values,
+				       unsigned long kind, unsigned long flags) __hwloc_attribute_deprecated;
+
 /** \brief Insert a misc object by parent.
 *
 * Identical to hwloc_topology_insert_misc_object().
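Both the deprecated `hwloc_distances_add()` declared above and its replacement `hwloc_distances_add_values()` take the distance matrix as a flat, one-dimension `values` array in which the distance from object i to object j sits in slot `i*nbobjs+j` (as documented in distances.h). The sketch below is a hypothetical helper, not part of hwloc, that fills such an array from a caller-supplied set of rows; the names `flatten_distances` and `distance_at`, and the use of `uint64_t` in place of `hwloc_uint64_t`, are assumptions made so the example compiles without hwloc.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an hwloc API): copy an nbobjs x nbobjs matrix,
 * given as an array of row pointers, into the flat layout used by hwloc,
 * where the distance from object i to object j lives in values[i*nbobjs+j].
 * uint64_t stands in for hwloc_uint64_t here. */
static void flatten_distances(unsigned nbobjs,
                              const uint64_t *matrix_rows[],
                              uint64_t *values)
{
  for (unsigned i = 0; i < nbobjs; i++)
    for (unsigned j = 0; j < nbobjs; j++)
      values[(size_t)i * nbobjs + j] = matrix_rows[i][j];
}

/* Read back the distance from object i to object j in the flat array. */
static uint64_t distance_at(const uint64_t *values, unsigned nbobjs,
                            unsigned i, unsigned j)
{
  return values[(size_t)i * nbobjs + j];
}
```

The resulting `values` array (with `nbobjs*nbobjs` entries) is what would then be handed to `hwloc_distances_add_values()` together with the matching `objs` array.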
--- a/src/3rdparty/hwloc/include/hwloc/distances.h
+++ b/src/3rdparty/hwloc/include/hwloc/distances.h
@@ -1,5 +1,5 @@
 /*
- * Copyright © 2010-2020 Inria.  All rights reserved.
+ * Copyright © 2010-2021 Inria.  All rights reserved.
 * See COPYING in top-level directory.
 */

@@ -35,9 +35,20 @@ extern "C" {
 * from a core in another node.
 * The corresponding kind is ::HWLOC_DISTANCES_KIND_FROM_OS | ::HWLOC_DISTANCES_KIND_FROM_USER.
 * The name of this distances structure is "NUMALatency".
+ * Other distance structures include "XGMIBandwidth", "XGMIHops"
+ * and "NVLinkBandwidth".
 *
 * The matrix may also contain bandwidths between random sets of objects,
 * possibly provided by the user, as specified in the \p kind attribute.
+ *
+ * Pointers \p objs and \p values should not be replaced, reallocated, freed, etc.
+ * However, callers are allowed to modify \p kind as well as the contents
+ * of \p objs and \p values arrays.
+ * For instance, if there is a single NUMA node per Package,
+ * hwloc_get_obj_with_same_locality() may be used to convert between them
+ * and replace NUMA nodes in the \p objs array with the corresponding Packages.
+ * See also hwloc_distances_transform() for applying some transformations
+ * to the structure.
 */
 struct hwloc_distances_s {
  unsigned nbobjs;		/**< \brief Number of objects described by the distance matrix. */
@@ -91,6 +102,8 @@ enum hwloc_distances_kind_e {
  HWLOC_DISTANCES_KIND_MEANS_BANDWIDTH = (1UL<<3),

  /** \brief This distances structure covers objects of different types.
+   * This may apply to the "NVLinkBandwidth" structure in the presence
+   * of an NVSwitch or a POWER processor NVLink port.
   * \hideinitializer
   */
  HWLOC_DISTANCES_KIND_HETEROGENEOUS_TYPES = (1UL<<4)
@@ -147,6 +160,7 @@ hwloc_distances_get_by_type(hwloc_topology_t topology, hwloc_obj_type_t type,
 * Usually only one distances structure may match a given name.
 *
 * The name of the most common structure is "NUMALatency".
+ * Others include "XGMIBandwidth", "XGMIHops" and "NVLinkBandwidth".
 */
 HWLOC_DECLSPEC int
 hwloc_distances_get_by_name(hwloc_topology_t topology, const char *name,
@@ -168,6 +182,85 @@ hwloc_distances_get_name(hwloc_topology_t topology, struct hwloc_distances_s *di
 HWLOC_DECLSPEC void
 hwloc_distances_release(hwloc_topology_t topology, struct hwloc_distances_s *distances);

+/** \brief Transformations of distances structures. */
+enum hwloc_distances_transform_e {
+  /** \brief Remove \c NULL objects from the distances structure.
+   *
+   * Every object that was replaced with \c NULL in the \p objs array
+   * is removed and the \p values array is updated accordingly.
+   *
+   * At least \c 2 objects must remain, otherwise hwloc_distances_transform()
+   * will return \c -1 with \p errno set to \c EINVAL.
+   *
+   * \p kind will be updated with or without ::HWLOC_DISTANCES_KIND_HETEROGENEOUS_TYPES
+   * according to the remaining objects.
+   *
+   * \hideinitializer
+   */
+  HWLOC_DISTANCES_TRANSFORM_REMOVE_NULL = 0,
+
+  /** \brief Replace bandwidth values with a number of links.
+   *
+   * Usually all values will be either \c 0 (no link) or \c 1 (one link).
+   * However some matrices could get larger values if some pairs of
+   * peers are connected by different numbers of links.
+   *
+   * Values on the diagonal are set to \c 0.
+   *
+   * This transformation only applies to bandwidth matrices.
+   *
+   * \hideinitializer
+   */
+  HWLOC_DISTANCES_TRANSFORM_LINKS = 1,
+
+  /** \brief Merge switches with multiple ports into a single object.
+   * This currently only applies to NVSwitches where GPUs appear connected to
+   * separate switch ports in the NVLinkBandwidth matrix. This transformation will
+   * replace all of them with the same port connected to all GPUs.
+   * Other ports are removed by applying ::HWLOC_DISTANCES_TRANSFORM_REMOVE_NULL internally.
+   * \hideinitializer
+   */
+  HWLOC_DISTANCES_TRANSFORM_MERGE_SWITCH_PORTS = 2,
+
+  /** \brief Apply a transitive closure to the matrix to connect objects across switches.
+   * This currently only applies to GPUs and NVSwitches in the NVLinkBandwidth matrix.
+   * All pairs of GPUs will be reported as directly connected.
+   * \hideinitializer
+   */
+  HWLOC_DISTANCES_TRANSFORM_TRANSITIVE_CLOSURE = 3
+};
+
+/** \brief Apply a transformation to a distances structure.
+ *
+ * Modify a distances structure that was previously obtained with
+ * hwloc_distances_get() or one of its variants.
+ *
+ * This modifies the local copy of the distances structure but does
+ * not modify the distances information stored inside the topology
+ * (retrieved by another call to hwloc_distances_get() or exported to XML).
+ * To do so, one should add a new distances structure with the same
+ * name, kind, objects and values (see \ref hwlocality_distances_add)
+ * and then remove this old one with hwloc_distances_release_remove().
+ *
+ * \p transform must be one of the transformations listed
+ * in ::hwloc_distances_transform_e.
+ *
+ * These transformations may modify the contents of the \p objs or \p values arrays.
+ *
+ * \p transform_attr must be \c NULL for now.
+ *
+ * \p flags must be \c 0 for now.
+ *
+ * \note Objects in distances array \p objs may be directly modified
+ * in place without using hwloc_distances_transform().
+ * One may use hwloc_get_obj_with_same_locality() to easily convert
+ * between similar objects of different types.
+ */
+HWLOC_DECLSPEC int hwloc_distances_transform(hwloc_topology_t topology, struct hwloc_distances_s *distances,
+                                             enum hwloc_distances_transform_e transform,
+                                             void *transform_attr,
+                                             unsigned long flags);
+
 /** @} */


@@ -215,13 +308,84 @@ hwloc_distances_obj_pair_values(struct hwloc_distances_s *distances,



-/** \defgroup hwlocality_distances_add Add or remove distances between objects
+/** \defgroup hwlocality_distances_add Add distances between objects
+ *
+ * The usual way to add distances is:
+ * \code
+ * hwloc_distances_add_handle_t handle;
+ * int err = -1;
+ * handle = hwloc_distances_add_create(topology, "name", kind, 0);
+ * if (handle) {
+ *   err = hwloc_distances_add_values(topology, handle, nbobjs, objs, values, 0);
+ *   if (!err)
+ *     err = hwloc_distances_add_commit(topology, handle, flags);
+ * }
+ * \endcode
+ * If \p err is \c 0 at the end, then addition was successful.
+ *
 * @{
 */

+/** \brief Handle to a new distances structure during its addition to the topology. */
+typedef void * hwloc_distances_add_handle_t;
+
+/** \brief Create a new empty distances structure.
+ *
+ * Create an empty distances structure
+ * to be filled with hwloc_distances_add_values()
+ * and then committed with hwloc_distances_add_commit().
+ *
+ * Parameter \p name is optional; it may be \c NULL.
+ * Otherwise, it will be copied internally and may later be freed by the caller.
+ *
+ * \p kind specifies the kind of distance as an OR'ed set of ::hwloc_distances_kind_e.
+ * Kind ::HWLOC_DISTANCES_KIND_HETEROGENEOUS_TYPES will be automatically set
+ * according to objects having different types in hwloc_distances_add_values().
+ *
+ * \p flags must be \c 0 for now.
+ *
+ * \return A hwloc_distances_add_handle_t that should then be passed
+ * to hwloc_distances_add_values() and hwloc_distances_add_commit().
+ *
+ * \return \c NULL on error.
+ */
+HWLOC_DECLSPEC hwloc_distances_add_handle_t
+hwloc_distances_add_create(hwloc_topology_t topology,
+                           const char *name, unsigned long kind,
+                           unsigned long flags);
+
+/** \brief Specify the objects and values in a new empty distances structure.
+ *
+ * Specify the objects and values for a new distances structure
+ * that was returned as a handle by hwloc_distances_add_create().
+ * The structure must then be committed with hwloc_distances_add_commit().
+ *
+ * The number of objects is \p nbobjs and the array of objects is \p objs.
+ * Distance values are stored as a one-dimension array in \p values.
+ * The distance from object i to object j is in slot i*nbobjs+j.
+ *
+ * \p nbobjs must be at least 2.
+ *
+ * Arrays \p objs and \p values will be copied internally;
+ * they may later be freed by the caller.
+ *
+ * On error, the temporary distances structure and its content are destroyed.
+ *
+ * \p flags must be \c 0 for now.
+ *
+ * \return \c 0 on success.
+ * \return \c -1 on error.
+ */
+HWLOC_DECLSPEC int hwloc_distances_add_values(hwloc_topology_t topology,
+                                              hwloc_distances_add_handle_t handle,
+                                              unsigned nbobjs, hwloc_obj_t *objs,
+                                              hwloc_uint64_t *values,
+                                              unsigned long flags);
+
 /** \brief Flags for adding a new distances to a topology. */
 enum hwloc_distances_add_flag_e {
  /** \brief Try to group objects based on the newly provided distance information.
+   * This is ignored for distances between objects of different types.
   * \hideinitializer
   */
  HWLOC_DISTANCES_ADD_FLAG_GROUP = (1UL<<0),
@@ -233,23 +397,33 @@ enum hwloc_distances_add_flag_e {
  HWLOC_DISTANCES_ADD_FLAG_GROUP_INACCURATE = (1UL<<1)
 };

-/** \brief Provide a new distance matrix.
+/** \brief Commit a new distances structure.
 *
- * Provide the matrix of distances between a set of objects given by \p nbobjs
- * and the \p objs array. \p nbobjs must be at least 2.
- * The distances are stored as a one-dimension array in \p values.
- * The distance from object i to object j is in slot i*nbobjs+j.
+ * This function finalizes the distances structure and inserts it in the topology.
 *
- * \p kind specifies the kind of distance as a OR'ed set of ::hwloc_distances_kind_e.
- * Kind ::HWLOC_DISTANCES_KIND_HETEROGENEOUS_TYPES will be automatically added
- * if objects of different types are given.
+ * Parameter \p handle was previously returned by hwloc_distances_add_create().
+ * Then objects and values were specified with hwloc_distances_add_values().
 *
 * \p flags configures the behavior of the function using an optional OR'ed set of
 * ::hwloc_distances_add_flag_e.
+ * It may be used to request the grouping of existing objects based on distances.
+ *
+ * On error, the temporary distances structure and its content are destroyed.
+ *
+ * \return \c 0 on success.
+ * \return \c -1 on error.
+ */
+HWLOC_DECLSPEC int hwloc_distances_add_commit(hwloc_topology_t topology,
+                                              hwloc_distances_add_handle_t handle,
+                                              unsigned long flags);
+
+/** @} */
+
+
+
+/** \defgroup hwlocality_distances_remove Remove distances between objects
+ * @{
 */
-HWLOC_DECLSPEC int hwloc_distances_add(hwloc_topology_t topology,
-				       unsigned nbobjs, hwloc_obj_t *objs, hwloc_uint64_t *values,
-				       unsigned long kind, unsigned long flags);

 /** \brief Remove all distance matrices from a topology.
 *
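To make the documented effect of `HWLOC_DISTANCES_TRANSFORM_REMOVE_NULL` concrete, here is a hypothetical stand-alone re-implementation on a plain flat matrix. It is not hwloc's code: `compact_null_entries` is an invented name, objects are modeled as `void *`, and it only mirrors the rules stated in the header (entries replaced with `NULL` are dropped, the values array is compacted accordingly, and fewer than 2 remaining objects yields `-1` with `errno` set to `EINVAL`). The update of `kind` for heterogeneous types is not modeled.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of the REMOVE_NULL rule on a flat nbobjs x nbobjs
 * matrix where values[i*nbobjs+j] is the distance from object i to j.
 * Works in place; on success *nbobjsp is set to the new object count.
 * Returns 0 on success, or -1 with errno=EINVAL if fewer than 2 objects
 * would remain (the arrays are left untouched in that case). */
static int compact_null_entries(unsigned *nbobjsp, void **objs, uint64_t *values)
{
  unsigned n = *nbobjsp, kept = 0;

  for (unsigned i = 0; i < n; i++)
    if (objs[i])
      kept++;
  if (kept < 2) {
    errno = EINVAL;
    return -1;
  }

  /* Row-major walk: each destination index is <= its source index,
   * so the values array can be compacted in place. */
  unsigned newi = 0;
  for (unsigned i = 0; i < n; i++) {
    if (!objs[i])
      continue;
    unsigned newj = 0;
    for (unsigned j = 0; j < n; j++) {
      if (!objs[j])
        continue;
      values[(size_t)newi * kept + newj] = values[(size_t)i * n + j];
      newj++;
    }
    newi++;
  }

  /* Compact objs only after values, since the loops above read objs[]. */
  newi = 0;
  for (unsigned i = 0; i < n; i++)
    if (objs[i])
      objs[newi++] = objs[i];

  *nbobjsp = kept;
  return 0;
}
```

With the real library, one would instead set unwanted entries of `distances->objs` to `NULL` and call `hwloc_distances_transform(topology, distances, HWLOC_DISTANCES_TRANSFORM_REMOVE_NULL, NULL, 0)`, following the declaration above (`transform_attr` must be `NULL` and `flags` must be `0` for now).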
--- a/src/3rdparty/hwloc/include/hwloc/gl.h
+++ b/src/3rdparty/hwloc/include/hwloc/gl.h
@@ -1,6 +1,6 @@
 /*
 * Copyright © 2012 Blue Brain Project, EPFL. All rights reserved.
- * Copyright © 2012-2013 Inria.  All rights reserved.
+ * Copyright © 2012-2021 Inria.  All rights reserved.
 * See COPYING in top-level directory.
 */

@@ -39,9 +39,9 @@ extern "C" {
 /** \brief Get the hwloc OS device object corresponding to the
 * OpenGL display given by port and device index.
 *
- * Return the OS device object describing the OpenGL display
+ * \return The hwloc OS device object describing the OpenGL display
 * whose port (server) is \p port and device (screen) is \p device.
- * Return NULL if there is none.
+ * \return \c NULL if none could be found.
 *
 * The topology \p topology does not necessarily have to match the current
 * machine. For instance the topology may be an XML import of a remote host.
@@ -70,9 +70,9 @@ hwloc_gl_get_display_osdev_by_port_device(hwloc_topology_t topology,
 /** \brief Get the hwloc OS device object corresponding to the
 * OpenGL display given by name.
 *
- * Return the OS device object describing the OpenGL display
+ * \return The hwloc OS device object describing the OpenGL display
 * whose name is \p name, built as ":port.device" such as ":0.0" .
- * Return NULL if there is none.
+ * \return \c NULL if none could be found.
 *
 * The topology \p topology does not necessarily have to match the current
 * machine. For instance the topology may be an XML import of a remote host.
@@ -99,9 +99,10 @@ hwloc_gl_get_display_osdev_by_name(hwloc_topology_t topology,
 /** \brief Get the OpenGL display port and device corresponding
 * to the given hwloc OS object.
 *
- * Return the OpenGL display port (server) in \p port and device (screen)
+ * Retrieves the OpenGL display port (server) in \p port and device (screen)
 * in \p screen that correspond to the given hwloc OS device object.
- * Return \c -1 if there is none.
+ *
+ * \return \c -1 if none could be found.
 *
 * The topology \p topology does not necessarily have to match the current
 * machine. For instance the topology may be an XML import of a remote host.
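`hwloc_gl_get_display_osdev_by_name()` documents display names built as ":port.device", such as ":0.0". The sketch below is a hypothetical helper (not part of hwloc, and not how hwloc itself looks the name up) that splits such a name into its port and device numbers, for instance to call `hwloc_gl_get_display_osdev_by_port_device()` instead. Overflow of very large numbers is not checked.

```c
#include <stdlib.h>

/* Hypothetical helper: parse an OpenGL display name of the form
 * ":port.device" (e.g. ":0.1") into its two numbers.
 * Returns 0 on success, -1 if the string does not match exactly. */
static int parse_gl_display_name(const char *name, unsigned *port, unsigned *device)
{
  char *end;
  unsigned long p, d;

  /* Require ':' followed by a digit, so strtoul() cannot accept
   * leading whitespace or a sign. */
  if (name[0] != ':' || name[1] < '0' || name[1] > '9')
    return -1;
  p = strtoul(name + 1, &end, 10);

  /* Require '.' followed by a digit. */
  if (*end != '.' || end[1] < '0' || end[1] > '9')
    return -1;
  d = strtoul(end + 1, &end, 10);

  /* Reject trailing characters such as ":0.0x". */
  if (*end != '\0')
    return -1;

  *port = (unsigned) p;
  *device = (unsigned) d;
  return 0;
}
```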
--- a/src/3rdparty/hwloc/include/hwloc/helper.h
+++ b/src/3rdparty/hwloc/include/hwloc/helper.h
@@ -1,6 +1,6 @@
 /*
 * Copyright © 2009 CNRS
- * Copyright © 2009-2020 Inria.  All rights reserved.
+ * Copyright © 2009-2021 Inria.  All rights reserved.
 * Copyright © 2009-2012 Université Bordeaux
 * Copyright © 2009-2010 Cisco Systems, Inc.  All rights reserved.
 * See COPYING in top-level directory.
@@ -807,6 +807,49 @@ hwloc_get_obj_below_array_by_type (hwloc_topology_t topology, int nr, hwloc_obj_
  return obj;
 }

+/** \brief Return an object of a different type with same locality.
+ *
+ * If the source object \p src is a normal or memory type,
+ * this function returns an object of type \p type with same
+ * CPU and node sets, either below or above in the hierarchy.
+ *
+ * If the source object \p src is a PCI or an OS device within a PCI
+ * device, the function may either return that PCI device, or another
+ * OS device in the same PCI parent.
+ * This may for instance be useful for converting between OS devices
+ * such as "nvml0" or "rsmi1" used in distance structures into
+ * the PCI device, or the CUDA or OpenCL OS device that correspond
+ * to the same physical card.
+ *
+ * If not \c NULL, parameter \p subtype only selects objects whose
+ * subtype attribute exists and is \p subtype (case-insensitively),
+ * for instance "OpenCL" or "CUDA".
+ *
+ * If not \c NULL, parameter \p nameprefix only selects objects whose
+ * name attribute exists and starts with \p nameprefix (case-insensitively),
+ * for instance "rsmi" for matching "rsmi0".
+ *
+ * If multiple objects match, the first one is returned.
+ *
+ * This function will not walk the hierarchy across bridges since
+ * the PCI locality may become different.
+ * This function also cannot convert between normal/memory objects
+ * and I/O or Misc objects.
+ *
+ * \p flags must be \c 0 for now.
+ *
+ * \return An object with identical locality,
+ * matching \p subtype and \p nameprefix if any.
+ *
+ * \return \c NULL if no matching object could be found,
+ * or if the source object and target type are incompatible,
+ * for instance if converting between CPU and I/O objects.
+ */
+HWLOC_DECLSPEC hwloc_obj_t
+hwloc_get_obj_with_same_locality(hwloc_topology_t topology, hwloc_obj_t src,
+                                 hwloc_obj_type_t type, const char *subtype, const char *nameprefix,
+                                 unsigned long flags);
+
 /** @} */


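The `nameprefix` filter of `hwloc_get_obj_with_same_locality()` keeps only objects whose name exists and starts with the given prefix, ignoring case (for instance "rsmi" matches "rsmi0"). A minimal, hypothetical illustration of that rule follows; `name_matches_prefix` is an invented helper, not hwloc's internal implementation.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper mirroring the documented nameprefix rule:
 * a NULL prefix accepts every object, a NULL name never matches a prefix,
 * otherwise the first strlen(prefix) characters are compared
 * case-insensitively. Returns 1 on match, 0 otherwise. */
static int name_matches_prefix(const char *name, const char *prefix)
{
  if (!prefix)
    return 1;
  if (!name)
    return 0;
  for (; *prefix; prefix++, name++) {
    if (*name == '\0')
      return 0; /* name is shorter than the prefix */
    if (tolower((unsigned char) *name) != tolower((unsigned char) *prefix))
      return 0;
  }
  return 1;
}
```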
--- a/src/3rdparty/hwloc/include/hwloc/levelzero.h
+++ b/src/3rdparty/hwloc/include/hwloc/levelzero.h
@@ -0,0 +1,157 @@
+/*
+ * Copyright © 2021 Inria.  All rights reserved.
+ * See COPYING in top-level directory.
+ */
+
+/** \file
+ * \brief Macros to help interaction between hwloc and the oneAPI Level Zero interface.
+ *
+ * Applications that use both hwloc and Level Zero may want to
+ * include this file so as to get topology information for L0 devices.
+ */
+
+#ifndef HWLOC_LEVELZERO_H
+#define HWLOC_LEVELZERO_H
+
+#include "hwloc.h"
+#include "hwloc/autogen/config.h"
+#include "hwloc/helper.h"
+#ifdef HWLOC_LINUX_SYS
+#include "hwloc/linux.h"
+#endif
+
+#include <level_zero/ze_api.h>
+#include <level_zero/zes_api.h>
+
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+
+/** \defgroup hwlocality_levelzero Interoperability with the oneAPI Level Zero interface.
+ *
+ * This interface offers ways to retrieve topology information about
+ * devices managed by the Level Zero API.
+ *
+ * @{
+ */
+
+/** \brief Get the CPU set of logical processors that are physically
+ * close to the Level Zero device \p device
+ *
+ * Store in \p set the CPU-set describing the locality of
+ * the Level Zero device \p device.
+ *
+ * Topology \p topology and device \p device must match the local machine.
+ * The Level Zero library must have been initialized with Sysman enabled
+ * (ZES_ENABLE_SYSMAN=1 in the environment).
+ * I/O devices detection and the Level Zero component are not needed in the
+ * topology.
+ *
+ * The function only returns the locality of the device.
+ * If more information about the device is needed, OS objects should
+ * be used instead, see hwloc_levelzero_get_device_osdev().
+ *
+ * This function is currently only implemented in a meaningful way for
+ * Linux; other systems will simply get a full cpuset.
+ */
+static __hwloc_inline int
+hwloc_levelzero_get_device_cpuset(hwloc_topology_t topology __hwloc_attribute_unused,
+                                  ze_device_handle_t device, hwloc_cpuset_t set)
+{
+#ifdef HWLOC_LINUX_SYS
+  /* If we're on Linux, use the sysfs mechanism to get the local cpus */
+#define HWLOC_LEVELZERO_DEVICE_SYSFS_PATH_MAX 128
+  char path[HWLOC_LEVELZERO_DEVICE_SYSFS_PATH_MAX];
+  zes_pci_properties_t pci;
+  zes_device_handle_t sdevice = device;
+  ze_result_t res;
+
+  if (!hwloc_topology_is_thissystem(topology)) {
+    errno = EINVAL;
+    return -1;
+  }
+
+  res = zesDevicePciGetProperties(sdevice, &pci);
+  if (res != ZE_RESULT_SUCCESS) {
+    errno = EINVAL;
+    return -1;
+  }
+
+  snprintf(path, sizeof(path), "/sys/bus/pci/devices/%04x:%02x:%02x.%01x/local_cpus",
+          pci.address.domain, pci.address.bus, pci.address.device, pci.address.function);
+  if (hwloc_linux_read_path_as_cpumask(path, set) < 0
+      || hwloc_bitmap_iszero(set))
+    hwloc_bitmap_copy(set, hwloc_topology_get_complete_cpuset(topology));
+#else
+  /* Non-Linux systems simply get a full cpuset */
+  hwloc_bitmap_copy(set, hwloc_topology_get_complete_cpuset(topology));
+#endif
+  return 0;
+}
+
+/** \brief Get the hwloc OS device object corresponding to Level Zero device
+ * \p device.
+ *
+ * \return The hwloc OS device object that describes the given Level Zero device \p device.
+ * \return \c NULL if none could be found.
+ *
+ * Topology \p topology and device \p device must match the local machine.
+ * I/O devices detection and the Level Zero component must be enabled in the
+ * topology. If not, the locality of the object may still be found using
+ * hwloc_levelzero_get_device_cpuset().
+ *
+ * \note The corresponding hwloc PCI device may be found by looking
+ * at the result parent pointer (unless PCI devices are filtered out).
+ */
+static __hwloc_inline hwloc_obj_t
+hwloc_levelzero_get_device_osdev(hwloc_topology_t topology, ze_device_handle_t device)
+{
+  zes_device_handle_t sdevice = device;
+  zes_pci_properties_t pci;
+  ze_result_t res;
+  hwloc_obj_t osdev;
+
+  if (!hwloc_topology_is_thissystem(topology)) {
+    errno = EINVAL;
+    return NULL;
+  }
+
+  res = zesDevicePciGetProperties(sdevice, &pci);
+  if (res != ZE_RESULT_SUCCESS) {
+    /* L0 was likely initialized without sysman, don't bother */
+    errno = EINVAL;
+    return NULL;
+  }
+
+  osdev = NULL;
+  while ((osdev = hwloc_get_next_osdev(topology, osdev)) != NULL) {
+    hwloc_obj_t pcidev = osdev->parent;
+
+    if (strncmp(osdev->name, "ze", 2))
+      continue;
+
+    if (pcidev
+      && pcidev->type == HWLOC_OBJ_PCI_DEVICE
+      && pcidev->attr->pcidev.domain == pci.address.domain
+      && pcidev->attr->pcidev.bus == pci.address.bus
+      && pcidev->attr->pcidev.dev == pci.address.device
+      && pcidev->attr->pcidev.func == pci.address.function)
+      return osdev;
+
+    /* FIXME: once serial numbers are available, try them in case PCI is filtered out */
+  }
+
+  return NULL;
+}
+
+/** @} */
+
+
+#ifdef __cplusplus
+} /* extern "C" */
+#endif
+
+
+#endif /* HWLOC_LEVELZERO_H */
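`hwloc_levelzero_get_device_cpuset()` derives the sysfs path of the device's `local_cpus` file from its PCI address, using the format `/sys/bus/pci/devices/%04x:%02x:%02x.%01x/local_cpus`. The sketch below isolates that step in a hypothetical helper, `format_local_cpus_path`, which takes the four address fields as plain integers (an assumption that avoids depending on the Level Zero headers) and uses `snprintf` so that a too-small buffer is reported instead of overflowed.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build the Linux sysfs path that lists the CPUs local
 * to a PCI device, in the same format as hwloc_levelzero_get_device_cpuset().
 * Returns 0 on success, -1 if the path would not fit in buf. */
static int format_local_cpus_path(char *buf, size_t size,
                                  unsigned domain, unsigned bus,
                                  unsigned dev, unsigned func)
{
  int n = snprintf(buf, size, "/sys/bus/pci/devices/%04x:%02x:%02x.%01x/local_cpus",
                   domain, bus, dev, func);
  return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```

As in the header, a caller would fall back to the complete cpuset when the file cannot be read or yields an empty mask.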
--- a/src/3rdparty/hwloc/include/hwloc/linux.h
+++ b/src/3rdparty/hwloc/include/hwloc/linux.h
@@ -1,6 +1,6 @@
 /*
 * Copyright © 2009 CNRS
- * Copyright © 2009-2016 Inria.  All rights reserved.
+ * Copyright © 2009-2021 Inria.  All rights reserved.
 * Copyright © 2009-2011 Université Bordeaux
 * See COPYING in top-level directory.
 */
@@ -44,6 +44,10 @@ extern "C" {
 HWLOC_DECLSPEC int hwloc_linux_set_tid_cpubind(hwloc_topology_t topology, pid_t tid, hwloc_const_cpuset_t set);

 /** \brief Get the current binding of thread \p tid
+ *
+ * The CPU-set \p set (previously allocated by the caller)
+ * is filled with the list of PUs which the thread
+ * was last bound to.
 *
 * The behavior is exactly the same as the Linux sched_getaffinity system call,
 * but uses a hwloc cpuset.
@@ -54,6 +58,9 @@ HWLOC_DECLSPEC int hwloc_linux_set_tid_cpubind(hwloc_topology_t topology, pid_t
 HWLOC_DECLSPEC int hwloc_linux_get_tid_cpubind(hwloc_topology_t topology, pid_t tid, hwloc_cpuset_t set);

 /** \brief Get the last physical CPU where thread \p tid ran.
+ *
+ * The CPU-set \p set (previously allocated by the caller)
+ * is filled with the PU which the thread last ran on.
 *
 * \note This is equivalent to calling hwloc_get_proc_last_cpu_location() with
 * ::HWLOC_CPUBIND_THREAD as flags.
--- a/src/3rdparty/hwloc/include/hwloc/memattrs.h
+++ b/src/3rdparty/hwloc/include/hwloc/memattrs.h
@@ -1,5 +1,5 @@
 /*
- * Copyright © 2019-2020 Inria.  All rights reserved.
+ * Copyright © 2019-2022 Inria.  All rights reserved.
 * See COPYING in top-level directory.
 */

@@ -354,7 +354,7 @@ hwloc_memattr_register(hwloc_topology_t topology,
 * \p flags must be \c 0 for now.
 *
 * \note The initiator \p initiator should be of type ::HWLOC_LOCATION_TYPE_CPUSET
- * when refering to accesses performed by CPU cores.
+ * when referring to accesses performed by CPU cores.
 * ::HWLOC_LOCATION_TYPE_OBJECT is currently unused internally by hwloc,
 * but users may for instance use it to provide custom information about
 * host memory accesses performed by GPUs.
@@ -398,7 +398,7 @@ hwloc_memattr_set_value(hwloc_topology_t topology,
 * values.
 *
 * \note The initiator \p initiator should be of type ::HWLOC_LOCATION_TYPE_CPUSET
- * when refering to accesses performed by CPU cores.
+ * when referring to accesses performed by CPU cores.
 * ::HWLOC_LOCATION_TYPE_OBJECT is currently unused internally by hwloc,
 * but users may for instance use it to provide custom information about
 * host memory accesses performed by GPUs.
@@ -408,7 +408,7 @@ hwloc_memattr_get_targets(hwloc_topology_t topology,
                          hwloc_memattr_id_t attribute,
                          struct hwloc_location *initiator,
                          unsigned long flags,
-                          unsigned *nrp, hwloc_obj_t *targets, hwloc_uint64_t *values);
+                          unsigned *nr, hwloc_obj_t *targets, hwloc_uint64_t *values);

 /** \brief Return the initiators that have values for a given attribute for a specific target NUMA node.
 *
--- a/src/3rdparty/hwloc/include/hwloc/nvml.h
+++ b/src/3rdparty/hwloc/include/hwloc/nvml.h
@@ -1,5 +1,5 @@
 /*
- * Copyright © 2012-2020 Inria.  All rights reserved.
+ * Copyright © 2012-2021 Inria.  All rights reserved.
 * See COPYING in top-level directory.
 */

@@ -39,7 +39,7 @@ extern "C" {
 /** \brief Get the CPU set of processors that are physically
 * close to NVML device \p device.
 *
- * Return the CPU set describing the locality of the NVML device \p device.
+ * Store in \p set the CPU-set describing the locality of the NVML device \p device.
 *
 * Topology \p topology and device \p device must match the local machine.
 * I/O devices detection and the NVML component are not needed in the topology.
@@ -88,8 +88,8 @@ hwloc_nvml_get_device_cpuset(hwloc_topology_t topology __hwloc_attribute_unused,
 /** \brief Get the hwloc OS device object corresponding to the
 * NVML device whose index is \p idx.
 *
- * Return the OS device object describing the NVML device whose
- * index is \p idx. Returns NULL if there is none.
+ * \return The hwloc OS device object describing the NVML device whose index is \p idx.
+ * \return \c NULL if none could be found.
 *
 * The topology \p topology does not necessarily have to match the current
 * machine. For instance the topology may be an XML import of a remote host.
@@ -114,8 +114,8 @@ hwloc_nvml_get_device_osdev_by_index(hwloc_topology_t topology, unsigned idx)

 /** \brief Get the hwloc OS device object corresponding to NVML device \p device.
 *
- * Return the hwloc OS device object that describes the given
- * NVML device \p device. Return NULL if there is none.
+ * \return The hwloc OS device object that describes the given NVML device \p device.
+ * \return \c NULL if none could be found.
 *
 * Topology \p topology and device \p device must match the local machine.
 * I/O devices detection and the NVML component must be enabled in the topology.
--- a/src/3rdparty/hwloc/include/hwloc/opencl.h
+++ b/src/3rdparty/hwloc/include/hwloc/opencl.h
@@ -1,5 +1,5 @@
 /*
- * Copyright © 2012-2020 Inria.  All rights reserved.
+ * Copyright © 2012-2021 Inria.  All rights reserved.
 * Copyright © 2013, 2018 Université Bordeaux.  All right reserved.
 * See COPYING in top-level directory.
 */
@@ -82,9 +82,10 @@ hwloc_opencl_get_device_pci_busid(cl_device_id device,
 	if (CL_SUCCESS == clret
 	    && HWLOC_CL_DEVICE_TOPOLOGY_TYPE_PCIE_AMD == amdtopo.raw.type) {
 		*domain = 0; /* can't do anything better */
-		*bus = (unsigned) amdtopo.pcie.bus;
-		*dev = (unsigned) amdtopo.pcie.device;
-		*func = (unsigned) amdtopo.pcie.function;
+		/* cl_device_topology_amd stores the bus ID as cl_char; don't convert those signed chars directly to unsigned int */
+		*bus = (unsigned) (unsigned char) amdtopo.pcie.bus;
+		*dev = (unsigned) (unsigned char) amdtopo.pcie.device;
+		*func = (unsigned) (unsigned char) amdtopo.pcie.function;
 		return 0;
 	}

@@ -112,7 +113,7 @@ hwloc_opencl_get_device_pci_busid(cl_device_id device,
 /** \brief Get the CPU set of processors that are physically
 * close to OpenCL device \p device.
 *
- * Return the CPU set describing the locality of the OpenCL device \p device.
+ * Store in \p set the CPU-set describing the locality of the OpenCL device \p device.
 *
 * Topology \p topology and device \p device must match the local machine.
 * I/O devices detection and the OpenCL component are not needed in the topology.
@@ -161,10 +162,10 @@ hwloc_opencl_get_device_cpuset(hwloc_topology_t topology __hwloc_attribute_unuse
 /** \brief Get the hwloc OS device object corresponding to the
 * OpenCL device for the given indexes.
 *
- * Return the OS device object describing the OpenCL device
+ * \return The hwloc OS device object describing the OpenCL device
 * whose platform index is \p platform_index,
 * and whose device index within this platform if \p device_index.
- * Return NULL if there is none.
+ * \return \c NULL if none could be found.
 *
 * The topology \p topology does not necessarily have to match the current
 * machine. For instance the topology may be an XML import of a remote host.
@@ -191,8 +192,9 @@ hwloc_opencl_get_device_osdev_by_index(hwloc_topology_t topology,

 /** \brief Get the hwloc OS device object corresponding to OpenCL device \p deviceX.
 *
- * Use OpenCL device attributes to find the corresponding hwloc OS device object.
- * Return NULL if there is none or if useful attributes are not available.
+ * \return The hwloc OS device object corresponding to the given OpenCL device \p device.
+ * \return \c NULL if none could be found, for instance
+ * if required OpenCL attributes are not available.
 *
 * This function currently only works on AMD and NVIDIA OpenCL devices that support
 * relevant OpenCL extensions. hwloc_opencl_get_device_osdev_by_index()
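The cast fix in `hwloc_opencl_get_device_pci_busid()` above matters whenever a PCI bus, device, or function number is 0x80 or higher. Stored in a signed `cl_char`, such a value is negative, and converting it straight to `unsigned` sign-extends it. The sketch below illustrates the difference without OpenCL headers. `sketch_cl_char`, `widen_naive()`, `widen_fixed()` and `format_busid()` are hypothetical names, and the sketch assumes the usual two's-complement representation of `signed char`.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for OpenCL's cl_char, a signed 8-bit type. */
typedef signed char sketch_cl_char;

/* Direct conversion: a negative cl_char sign-extends to a huge unsigned value. */
static unsigned widen_naive(sketch_cl_char v)
{
  return (unsigned) v;
}

/* Conversion used by the patch: go through unsigned char first,
 * so the raw bytes 0x80..0xff map back to 128..255. */
static unsigned widen_fixed(sketch_cl_char v)
{
  return (unsigned) (unsigned char) v;
}

/* Format a PCI bus ID as domain:bus:dev.func using the fixed conversion. */
static int format_busid(char *buf, size_t len, unsigned domain,
                        sketch_cl_char bus, sketch_cl_char dev, sketch_cl_char func)
{
  return snprintf(buf, len, "%04x:%02x:%02x.%x", domain,
                  widen_fixed(bus), widen_fixed(dev), widen_fixed(func));
}
```

With the naive conversion, a bus byte of 0x83 would print as `ffffff83` on a platform with 32-bit `unsigned`, instead of `83`, which is why the patch adds the intermediate `unsigned char` cast.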
--- a/src/3rdparty/hwloc/include/hwloc/openfabrics-verbs.h
+++ b/src/3rdparty/hwloc/include/hwloc/openfabrics-verbs.h
@@ -1,6 +1,6 @@
 /*
 * Copyright © 2009 CNRS
- * Copyright © 2009-2020 Inria.  All rights reserved.
+ * Copyright © 2009-2021 Inria.  All rights reserved.
 * Copyright © 2009-2010 Université Bordeaux
 * Copyright © 2009-2011 Cisco Systems, Inc.  All rights reserved.
 * See COPYING in top-level directory.
@@ -44,7 +44,7 @@ extern "C" {
 /** \brief Get the CPU set of processors that are physically
 * close to device \p ibdev.
 *
- * Return the CPU set describing the locality of the OpenFabrics
+ * Store in \p set the CPU-set describing the locality of the OpenFabrics
 * device \p ibdev (InfiniBand, etc).
 *
 * Topology \p topology and device \p ibdev must match the local machine.
@@ -88,10 +88,11 @@ hwloc_ibv_get_device_cpuset(hwloc_topology_t topology __hwloc_attribute_unused,
 /** \brief Get the hwloc OS device object corresponding to the OpenFabrics
 * device named \p ibname.
 *
- * Return the OS device object describing the OpenFabrics device
+ * \return The hwloc OS device object describing the OpenFabrics device
 * (InfiniBand, Omni-Path, usNIC, etc) whose name is \p ibname
 * (mlx5_0, hfi1_0, usnic_0, qib0, etc).
- * Returns NULL if there is none.
+ * \return \c NULL if none could be found.
+ *
 * The name \p ibname is usually obtained from ibv_get_device_name().
 *
 * The topology \p topology does not necessarily have to match the current
@@ -117,8 +118,9 @@ hwloc_ibv_get_device_osdev_by_name(hwloc_topology_t topology,
 /** \brief Get the hwloc OS device object corresponding to the OpenFabrics
 * device \p ibdev.
 *
- * Return the OS device object describing the OpenFabrics device \p ibdev
- * (InfiniBand, etc). Returns NULL if there is none.
+ * \return The hwloc OS device object describing the OpenFabrics
+ * device \p ibdev (InfiniBand, etc).
+ * \return \c NULL if none could be found.
 *
 * Topology \p topology and device \p ibdev must match the local machine.
 * I/O devices detection must be enabled in the topology.
--- a/src/3rdparty/hwloc/include/hwloc/plugins.h
+++ b/src/3rdparty/hwloc/include/hwloc/plugins.h
@@ -1,5 +1,5 @@
 /*
- * Copyright © 2013-2020 Inria.  All rights reserved.
+ * Copyright © 2013-2021 Inria.  All rights reserved.
 * Copyright © 2016 Cisco Systems, Inc.  All rights reserved.
 * See COPYING in top-level directory.
 */
@@ -27,6 +27,9 @@ struct hwloc_backend;


 /** \defgroup hwlocality_disc_components Components and Plugins: Discovery components
+ *
+ * \note These structures and functions may change when ::HWLOC_COMPONENT_ABI is modified.
+ *
 * @{
 */

@@ -93,6 +96,9 @@ struct hwloc_disc_component {


 /** \defgroup hwlocality_disc_backends Components and Plugins: Discovery backends
+ *
+ * \note These structures and functions may change when ::HWLOC_COMPONENT_ABI is modified.
+ *
 * @{
 */

@@ -241,6 +247,9 @@ HWLOC_DECLSPEC int hwloc_backend_enable(struct hwloc_backend *backend);


 /** \defgroup hwlocality_generic_components Components and Plugins: Generic components
+ *
+ * \note These structures and functions may change when ::HWLOC_COMPONENT_ABI is modified.
+ *
 * @{
 */

@@ -310,10 +319,26 @@ struct hwloc_component {


 /** \defgroup hwlocality_components_core_funcs Components and Plugins: Core functions to be used by components
+ *
+ * \note These structures and functions may change when ::HWLOC_COMPONENT_ABI is modified.
+ *
 * @{
 */

-/** \brief Check whether insertion errors are hidden */
+/** \brief Check whether error messages are hidden.
+ *
+ * Callers should print critical error messages
+ * (e.g. invalid hw topo info, invalid config)
+ * only if this function returns strictly less than 2.
+ *
+ * Callers should print non-critical error messages
+ * (e.g. failure to initialize CUDA)
+ * if this function returns 0.
+ *
+ * This function returns 1 by default (show critical only),
+ * 0 in lstopo (show all),
+ * or anything set in HWLOC_HIDE_ERRORS in the environment.
+ */
 HWLOC_DECLSPEC int hwloc_hide_errors(void);

 /** \brief Add an object to the topology.
@@ -455,6 +480,9 @@ hwloc_plugin_check_namespace(const char *pluginname __hwloc_attribute_unused, co


 /** \defgroup hwlocality_components_filtering Components and Plugins: Filtering objects
+ *
+ * \note These structures and functions may change when ::HWLOC_COMPONENT_ABI is modified.
+ *
 * @{
 */

@@ -469,9 +497,11 @@ hwloc_filter_check_pcidev_subtype_important(unsigned classid)
  return (baseclass == 0x03 /* PCI_BASE_CLASS_DISPLAY */
 	  || baseclass == 0x02 /* PCI_BASE_CLASS_NETWORK */
 	  || baseclass == 0x01 /* PCI_BASE_CLASS_STORAGE */
+	  || baseclass == 0x00 /* Unclassified, for Atos/Bull BXI */
 	  || baseclass == 0x0b /* PCI_BASE_CLASS_PROCESSOR */
 	  || classid == 0x0c04 /* PCI_CLASS_SERIAL_FIBER */
 	  || classid == 0x0c06 /* PCI_CLASS_SERIAL_INFINIBAND */
+          || baseclass == 0x06 /* PCI_BASE_CLASS_BRIDGE with non-PCI downstream. the core will drop the useless ones later */
 	  || baseclass == 0x12 /* Processing Accelerators */);
 }

@@ -527,6 +557,9 @@ hwloc_filter_check_keep_object(hwloc_topology_t topology, hwloc_obj_t obj)


 /** \defgroup hwlocality_components_pcidisc Components and Plugins: helpers for PCI discovery
+ *
+ * \note These structures and functions may change when ::HWLOC_COMPONENT_ABI is modified.
+ *
 * @{
 */

@@ -578,18 +611,76 @@ HWLOC_DECLSPEC int hwloc_pcidisc_tree_attach(struct hwloc_topology *topology, st


 /** \defgroup hwlocality_components_pcifind Components and Plugins: finding PCI objects during other discoveries
+ *
+ * \note These structures and functions may change when ::HWLOC_COMPONENT_ABI is modified.
+ *
 * @{
 */

-/** \brief Find the normal parent of a PCI bus ID.
+/** \brief Find the object or a parent of a PCI bus ID.
 *
- * Look at PCI affinity to find out where the given PCI bus ID should be attached.
+ * When attaching a new object (typically an OS device) whose locality
+ * is specified by PCI bus ID, this function returns the PCI object
+ * to use as a parent for attaching.
 *
- * This function should be used to attach an I/O device under the corresponding
- * PCI object (if any), or under a normal (non-I/O) object with same locality.
+ * If the exact PCI device with this bus ID exists, it is returned.
+ * Otherwise (for instance if it was filtered out), the function returns
+ * another object with similar locality (for instance a parent bridge,
+ * or the local CPU Package).
 */
 HWLOC_DECLSPEC struct hwloc_obj * hwloc_pci_find_parent_by_busid(struct hwloc_topology *topology, unsigned domain, unsigned bus, unsigned dev, unsigned func);

+/** \brief Find the PCI device or bridge matching a PCI bus ID exactly.
+ *
+ * This is useful for adding specific information about some objects
+ * based on their PCI id. When it comes to attaching objects based on
+ * PCI locality, hwloc_pci_find_parent_by_busid() should be preferred.
+ */
+HWLOC_DECLSPEC struct hwloc_obj * hwloc_pci_find_by_busid(struct hwloc_topology *topology, unsigned domain, unsigned bus, unsigned dev, unsigned func);
+
+/** \brief Handle to a new distances structure during its addition to the topology. */
+typedef void * hwloc_backend_distances_add_handle_t;
+
+/** \brief Create a new empty distances structure.
+ *
+ * This is identical to hwloc_distances_add_create()
+ * but this variant is designed for backends inserting
+ * distances during topology discovery.
+ */
+HWLOC_DECLSPEC hwloc_backend_distances_add_handle_t
+hwloc_backend_distances_add_create(hwloc_topology_t topology,
+                                   const char *name, unsigned long kind,
+                                   unsigned long flags);
+
+/** \brief Specify the objects and values in a new empty distances structure.
+ *
+ * This is similar to hwloc_distances_add_values()
+ * but this variant is designed for backends inserting
+ * distances during topology discovery.
+ *
+ * The only semantical difference is that \p objs and \p values
+ * are not duplicated, but directly attached to the topology.
+ * On success, these arrays are given to the core and should not
+ * ever be freed by the caller anymore.
+ */
+HWLOC_DECLSPEC int
+hwloc_backend_distances_add_values(hwloc_topology_t topology,
+                                   hwloc_backend_distances_add_handle_t handle,
+                                   unsigned nbobjs, hwloc_obj_t *objs,
+                                   hwloc_uint64_t *values,
+                                   unsigned long flags);
+
+/** \brief Commit a new distances structure.
+ *
+ * This is similar to hwloc_distances_add_commit()
+ * but this variant is designed for backends inserting
+ * distances during topology discovery.
+ */
+HWLOC_DECLSPEC int
+hwloc_backend_distances_add_commit(hwloc_topology_t topology,
+                                   hwloc_backend_distances_add_handle_t handle,
+                                   unsigned long flags);
+
 /** @} */


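The PCI filter in `hwloc_filter_check_pcidev_subtype_important()` keeps a device when its base class (or, for two serial-bus classes, its full class ID) is in a fixed list; this update adds unclassified devices (0x00, for Atos/Bull BXI) and bridges (0x06). The standalone sketch below reproduces that check so it can be exercised without hwloc. The name `pcidev_is_important()` is hypothetical, and the `classid >> 8` base-class extraction is an assumption about the part of the function not shown in this hunk.

```c
#include <stdbool.h>

/* Hypothetical re-statement of the class checks above.
 * Assumes the base class is the high byte of the 16-bit class ID. */
static bool pcidev_is_important(unsigned classid)
{
  unsigned baseclass = classid >> 8;
  return baseclass == 0x03    /* display */
      || baseclass == 0x02    /* network */
      || baseclass == 0x01    /* storage */
      || baseclass == 0x00    /* unclassified, for Atos/Bull BXI */
      || baseclass == 0x0b    /* processor */
      || classid == 0x0c04    /* serial bus: Fibre Channel */
      || classid == 0x0c06    /* serial bus: InfiniBand */
      || baseclass == 0x06    /* bridge; useless ones are dropped later by the core */
      || baseclass == 0x12;   /* processing accelerators */
}
```

For example, a VGA controller (0x0300) and a PCI-to-PCI bridge (0x0604) are kept, while a USB controller (0x0c03) or an audio device (0x0401) is not.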
--- a/src/3rdparty/hwloc/include/hwloc/rename.h
+++ b/src/3rdparty/hwloc/include/hwloc/rename.h
@@ -1,6 +1,6 @@
 /*
 * Copyright © 2009-2011 Cisco Systems, Inc.  All rights reserved.
- * Copyright © 2010-2020 Inria.  All rights reserved.
+ * Copyright © 2010-2021 Inria.  All rights reserved.
 * See COPYING in top-level directory.
 */

@@ -120,6 +120,9 @@ extern "C" {
 #define HWLOC_TOPOLOGY_FLAG_IS_THISSYSTEM HWLOC_NAME_CAPS(TOPOLOGY_FLAG_IS_THISSYSTEM)
 #define HWLOC_TOPOLOGY_FLAG_THISSYSTEM_ALLOWED_RESOURCES HWLOC_NAME_CAPS(TOPOLOGY_FLAG_THISSYSTEM_ALLOWED_RESOURCES)
 #define HWLOC_TOPOLOGY_FLAG_IMPORT_SUPPORT HWLOC_NAME_CAPS(TOPOLOGY_FLAG_IMPORT_SUPPORT)
+#define HWLOC_TOPOLOGY_FLAG_RESTRICT_TO_CPUBINDING HWLOC_NAME_CAPS(TOPOLOGY_FLAG_RESTRICT_TO_CPUBINDING)
+#define HWLOC_TOPOLOGY_FLAG_RESTRICT_TO_MEMBINDING HWLOC_NAME_CAPS(TOPOLOGY_FLAG_RESTRICT_TO_MEMBINDING)
+#define HWLOC_TOPOLOGY_FLAG_DONT_CHANGE_BINDING HWLOC_NAME_CAPS(TOPOLOGY_FLAG_DONT_CHANGE_BINDING)

 #define hwloc_topology_set_pid HWLOC_NAME(topology_set_pid)
 #define hwloc_topology_set_synthetic HWLOC_NAME(topology_set_synthetic)
@@ -356,6 +359,7 @@ extern "C" {
 #define hwloc_get_closest_objs HWLOC_NAME(get_closest_objs)
 #define hwloc_get_obj_below_by_type HWLOC_NAME(get_obj_below_by_type)
 #define hwloc_get_obj_below_array_by_type HWLOC_NAME(get_obj_below_array_by_type)
+#define hwloc_get_obj_with_same_locality HWLOC_NAME(get_obj_with_same_locality)
 #define hwloc_distrib_flags_e HWLOC_NAME(distrib_flags_e)
 #define HWLOC_DISTRIB_FLAG_REVERSE HWLOC_NAME_CAPS(DISTRIB_FLAG_REVERSE)
 #define hwloc_distrib HWLOC_NAME(distrib)
@@ -454,11 +458,22 @@ extern "C" {
 #define hwloc_distances_obj_index HWLOC_NAME(distances_obj_index)
 #define hwloc_distances_obj_pair_values HWLOC_NAME(distances_pair_values)

+#define hwloc_distances_transform_e HWLOC_NAME(distances_transform_e)
+#define HWLOC_DISTANCES_TRANSFORM_REMOVE_NULL HWLOC_NAME_CAPS(DISTANCES_TRANSFORM_REMOVE_NULL)
+#define HWLOC_DISTANCES_TRANSFORM_LINKS HWLOC_NAME_CAPS(DISTANCES_TRANSFORM_LINKS)
+#define HWLOC_DISTANCES_TRANSFORM_MERGE_SWITCH_PORTS HWLOC_NAME_CAPS(DISTANCES_TRANSFORM_MERGE_SWITCH_PORTS)
+#define HWLOC_DISTANCES_TRANSFORM_TRANSITIVE_CLOSURE HWLOC_NAME_CAPS(DISTANCES_TRANSFORM_TRANSITIVE_CLOSURE)
+#define hwloc_distances_transform HWLOC_NAME(distances_transform)
+
 #define hwloc_distances_add_flag_e HWLOC_NAME(distances_add_flag_e)
 #define HWLOC_DISTANCES_ADD_FLAG_GROUP HWLOC_NAME_CAPS(DISTANCES_ADD_FLAG_GROUP)
 #define HWLOC_DISTANCES_ADD_FLAG_GROUP_INACCURATE HWLOC_NAME_CAPS(DISTANCES_ADD_FLAG_GROUP_INACCURATE)

-#define hwloc_distances_add HWLOC_NAME(distances_add)
+#define hwloc_distances_add_handle_t HWLOC_NAME(distances_add_handle_t)
+#define hwloc_distances_add_create HWLOC_NAME(distances_add_create)
+#define hwloc_distances_add_values HWLOC_NAME(distances_add_values)
+#define hwloc_distances_add_commit HWLOC_NAME(distances_add_commit)
+
 #define hwloc_distances_remove HWLOC_NAME(distances_remove)
 #define hwloc_distances_remove_by_depth HWLOC_NAME(distances_remove_by_depth)
 #define hwloc_distances_remove_by_type HWLOC_NAME(distances_remove_by_type)
@@ -523,6 +538,11 @@ extern "C" {
 #define hwloc_linux_get_tid_last_cpu_location HWLOC_NAME(linux_get_tid_last_cpu_location)
 #define hwloc_linux_read_path_as_cpumask HWLOC_NAME(linux_read_file_cpumask)

+/* windows.h */
+
+#define hwloc_windows_get_nr_processor_groups HWLOC_NAME(windows_get_nr_processor_groups)
+#define hwloc_windows_get_processor_group_cpuset HWLOC_NAME(windows_get_processor_group_cpuset)
+
 /* openfabrics-verbs.h */

 #define hwloc_ibv_get_device_cpuset HWLOC_NAME(ibv_get_device_cpuset)
@@ -564,6 +584,11 @@ extern "C" {
 #define hwloc_rsmi_get_device_osdev HWLOC_NAME(rsmi_get_device_osdev)
 #define hwloc_rsmi_get_device_osdev_by_index HWLOC_NAME(rsmi_get_device_osdev_by_index)

+/* levelzero.h */
+
+#define hwloc_levelzero_get_device_cpuset HWLOC_NAME(levelzero_get_device_cpuset)
+#define hwloc_levelzero_get_device_osdev HWLOC_NAME(levelzero_get_device_osdev)
+
 /* gl.h */

 #define hwloc_gl_get_display_osdev_by_port_device HWLOC_NAME(gl_get_display_osdev_by_port_device)
@@ -620,10 +645,18 @@ extern "C" {
 #define hwloc_pcidisc_tree_insert_by_busid HWLOC_NAME(pcidisc_tree_insert_by_busid)
 #define hwloc_pcidisc_tree_attach HWLOC_NAME(pcidisc_tree_attach)

+#define hwloc_pci_find_by_busid HWLOC_NAME(pcidisc_find_by_busid)
 #define hwloc_pci_find_parent_by_busid HWLOC_NAME(pcidisc_find_busid_parent)

+#define hwloc_backend_distances_add_handle_t HWLOC_NAME(backend_distances_add_handle_t)
+#define hwloc_backend_distances_add_create HWLOC_NAME(backend_distances_add_create)
+#define hwloc_backend_distances_add_values HWLOC_NAME(backend_distances_add_values)
+#define hwloc_backend_distances_add_commit HWLOC_NAME(backend_distances_add_commit)
+
 /* hwloc/deprecated.h */

+#define hwloc_distances_add HWLOC_NAME(distances_add)
+
 #define hwloc_topology_insert_misc_object_by_parent HWLOC_NAME(topology_insert_misc_object_by_parent)
 #define hwloc_obj_cpuset_snprintf HWLOC_NAME(obj_cpuset_snprintf)
 #define hwloc_obj_type_sscanf HWLOC_NAME(obj_type_sscanf)
@@ -733,6 +766,7 @@ extern "C" {

 #define hwloc_cuda_component HWLOC_NAME(cuda_component)
 #define hwloc_gl_component HWLOC_NAME(gl_component)
+#define hwloc_levelzero_component HWLOC_NAME(levelzero_component)
 #define hwloc_nvml_component HWLOC_NAME(nvml_component)
 #define hwloc_rsmi_component HWLOC_NAME(rsmi_component)
 #define hwloc_opencl_component HWLOC_NAME(opencl_component)
@@ -772,7 +806,6 @@ extern "C" {
 #define hwloc_pci_discovery_init HWLOC_NAME(pci_discovery_init)
 #define hwloc_pci_discovery_prepare HWLOC_NAME(pci_discovery_prepare)
 #define hwloc_pci_discovery_exit HWLOC_NAME(pci_discovery_exit)
-#define hwloc_pci_find_by_busid HWLOC_NAME(pcidisc_find_by_busid)
 #define hwloc_find_insert_io_parent_by_complete_cpuset HWLOC_NAME(hwloc_find_insert_io_parent_by_complete_cpuset)

 #define hwloc__add_info HWLOC_NAME(_add_info)
@@ -816,7 +849,6 @@ extern "C" {
 #define hwloc_internal_distances_dup HWLOC_NAME(internal_distances_dup)
 #define hwloc_internal_distances_refresh HWLOC_NAME(internal_distances_refresh)
 #define hwloc_internal_distances_destroy HWLOC_NAME(internal_distances_destroy)
-
 #define hwloc_internal_distances_add HWLOC_NAME(internal_distances_add)
 #define hwloc_internal_distances_add_by_index HWLOC_NAME(internal_distances_add_by_index)
 #define hwloc_internal_distances_invalidate_cached_objs HWLOC_NAME(hwloc_internal_distances_invalidate_cached_objs)
--- a/src/3rdparty/hwloc/include/hwloc/rsmi.h
+++ b/src/3rdparty/hwloc/include/hwloc/rsmi.h
@@ -0,0 +1,203 @@
+/*
+ * Copyright © 2012-2021 Inria.  All rights reserved.
+ * Copyright (c) 2020, Advanced Micro Devices, Inc. All rights reserved.
+ * Written by Advanced Micro Devices,
+ * See COPYING in top-level directory.
+ */
+
+/** \file
+ * \brief Macros to help interaction between hwloc and the ROCm SMI Management Library.
+ *
+ * Applications that use both hwloc and the ROCm SMI Management Library may want to
+ * include this file so as to get topology information for AMD GPU devices.
+ */
+
+#ifndef HWLOC_RSMI_H
+#define HWLOC_RSMI_H
+
+#include "hwloc.h"
+#include "hwloc/autogen/config.h"
+#include "hwloc/helper.h"
+#ifdef HWLOC_LINUX_SYS
+#include "hwloc/linux.h"
+#endif
+
+#include <rocm_smi/rocm_smi.h>
+
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+
+/** \defgroup hwlocality_rsmi Interoperability with the ROCm SMI Management Library
+ *
+ * This interface offers ways to retrieve topology information about
+ * devices managed by the ROCm SMI Management Library.
+ *
+ * @{
+ */
+
+/** \brief Get the CPU set of logical processors that are physically
+ * close to AMD GPU device whose index is \p dv_ind.
+ *
+ * Store in \p set the CPU-set describing the locality of the AMD GPU device
+ * whose index is \p dv_ind.
+ *
+ * Topology \p topology and device \p dv_ind must match the local machine.
+ * I/O devices detection and the ROCm SMI component are not needed in the
+ * topology.
+ *
+ * The function only returns the locality of the device.
+ * If more information about the device is needed, OS objects should
+ * be used instead, see hwloc_rsmi_get_device_osdev()
+ * and hwloc_rsmi_get_device_osdev_by_index().
+ *
+ * This function is currently only implemented in a meaningful way for
+ * Linux; other systems will simply get a full cpuset.
+ */
+static __hwloc_inline int
+hwloc_rsmi_get_device_cpuset(hwloc_topology_t topology __hwloc_attribute_unused,
+                             uint32_t dv_ind, hwloc_cpuset_t set)
+{
+#ifdef HWLOC_LINUX_SYS
+  /* If we're on Linux, use the sysfs mechanism to get the local cpus */
+#define HWLOC_RSMI_DEVICE_SYSFS_PATH_MAX 128
+  char path[HWLOC_RSMI_DEVICE_SYSFS_PATH_MAX];
+  rsmi_status_t ret;
+  uint64_t bdfid = 0;
+  unsigned domain, device, bus;
+
+  if (!hwloc_topology_is_thissystem(topology)) {
+    errno = EINVAL;
+    return -1;
+  }
+
+  ret = rsmi_dev_pci_id_get(dv_ind, &bdfid);
+  if (RSMI_STATUS_SUCCESS != ret) {
+    errno = EINVAL;
+    return -1;
+  }
+  domain = (bdfid>>32) & 0xffffffff;
+  bus = ((bdfid & 0xffff)>>8) & 0xff;
+  device = ((bdfid & 0xff)>>3) & 0x1f;
+
+  sprintf(path, "/sys/bus/pci/devices/%04x:%02x:%02x.0/local_cpus", domain, bus, device);
+  if (hwloc_linux_read_path_as_cpumask(path, set) < 0
+      || hwloc_bitmap_iszero(set))
+    hwloc_bitmap_copy(set, hwloc_topology_get_complete_cpuset(topology));
+#else
+  /* Non-Linux systems simply get a full cpuset */
+  hwloc_bitmap_copy(set, hwloc_topology_get_complete_cpuset(topology));
+#endif
+  return 0;
+}
+
+/** \brief Get the hwloc OS device object corresponding to the
+ * AMD GPU device whose index is \p dv_ind.
+ *
+ * \return The hwloc OS device object describing the AMD GPU device whose
+ * index is \p dv_ind.
+ * \return \c NULL if none could be found.
+ *
+ * The topology \p topology does not necessarily have to match the current
+ * machine. For instance the topology may be an XML import of a remote host.
+ * I/O devices detection and the ROCm SMI component must be enabled in the
+ * topology.
+ *
+ * \note The corresponding PCI device object can be obtained by looking
+ * at the OS device parent object (unless PCI devices are filtered out).
+ */
+static __hwloc_inline hwloc_obj_t
+hwloc_rsmi_get_device_osdev_by_index(hwloc_topology_t topology, uint32_t dv_ind)
+{
+  hwloc_obj_t osdev = NULL;
+  while ((osdev = hwloc_get_next_osdev(topology, osdev)) != NULL) {
+    if (HWLOC_OBJ_OSDEV_GPU == osdev->attr->osdev.type
+      && osdev->name
+      && !strncmp("rsmi", osdev->name, 4)
+      && atoi(osdev->name + 4) == (int) dv_ind)
+      return osdev;
+  }
+  return NULL;
+}
+
+/** \brief Get the hwloc OS device object corresponding to AMD GPU device,
+ * whose index is \p dv_ind.
+ *
+ * \return The hwloc OS device object that describes the given
+ * AMD GPU, whose index is \p dv_ind.
+ * \return \c NULL if none could be found.
+ *
+ * Topology \p topology and device \p dv_ind must match the local machine.
+ * I/O devices detection and the ROCm SMI component must be enabled in the
+ * topology. If not, the locality of the object may still be found using
+ * hwloc_rsmi_get_device_cpuset().
+ *
+ * \note The corresponding hwloc PCI device may be found by looking
+ * at the result parent pointer (unless PCI devices are filtered out).
+ */
+static __hwloc_inline hwloc_obj_t
+hwloc_rsmi_get_device_osdev(hwloc_topology_t topology, uint32_t dv_ind)
+{
+  hwloc_obj_t osdev;
+  rsmi_status_t ret;
+  uint64_t bdfid = 0;
+  unsigned domain, device, bus, func;
+  uint64_t id;
+  char uuid[64];
+
+  if (!hwloc_topology_is_thissystem(topology)) {
+    errno = EINVAL;
+    return NULL;
+  }
+
+  ret = rsmi_dev_pci_id_get(dv_ind, &bdfid);
+  if (RSMI_STATUS_SUCCESS != ret) {
+    errno = EINVAL;
+    return NULL;
+  }
+  domain = (bdfid>>32) & 0xffffffff;
+  bus = ((bdfid & 0xffff)>>8) & 0xff;
+  device = ((bdfid & 0xff)>>3) & 0x1f;
+  func = bdfid & 0x7;
+
+  ret = rsmi_dev_unique_id_get(dv_ind, &id);
+  if (RSMI_STATUS_SUCCESS != ret)
+    uuid[0] = '\0';
+  else
+    sprintf(uuid, "%lx", id);
+
+  osdev = NULL;
+  while ((osdev = hwloc_get_next_osdev(topology, osdev)) != NULL) {
+    hwloc_obj_t pcidev = osdev->parent;
+    const char *info;
+
+    if (strncmp(osdev->name, "rsmi", 4))
+      continue;
+
+    if (pcidev
+      && pcidev->type == HWLOC_OBJ_PCI_DEVICE
+      && pcidev->attr->pcidev.domain == domain
+      && pcidev->attr->pcidev.bus == bus
+      && pcidev->attr->pcidev.dev == device
+      && pcidev->attr->pcidev.func == func)
+      return osdev;
+
+    info = hwloc_obj_get_info_by_name(osdev, "AMDUUID");
+    if (info && !strcmp(info, uuid))
+      return osdev;
+  }
+
+  return NULL;
+}
+
+/** @} */
+
+
+#ifdef __cplusplus
+} /* extern "C" */
+#endif
+
+
+#endif /* HWLOC_RSMI_H */
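Both ROCm SMI helpers in `rsmi.h` decode the 64-bit value returned by `rsmi_dev_pci_id_get()` with the same shifts and masks: domain in bits 32 to 63, bus in bits 8 to 15, device in bits 3 to 7, and function in bits 0 to 2. The sketch below isolates that decoding so it can be checked without ROCm installed; `struct pci_busid`, `decode_rsmi_bdfid()` and `format_busid()` are hypothetical names. Note that `hwloc_rsmi_get_device_cpuset()` builds its sysfs path with the function hardcoded to `.0`, whereas this sketch formats the decoded function number.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical holder for a decoded PCI bus ID. */
struct pci_busid {
  unsigned domain, bus, dev, func;
};

/* Same shifts and masks as the rsmi.h helpers above. */
static struct pci_busid decode_rsmi_bdfid(uint64_t bdfid)
{
  struct pci_busid id;
  id.domain = (unsigned) ((bdfid >> 32) & 0xffffffff);
  id.bus    = (unsigned) (((bdfid & 0xffff) >> 8) & 0xff);
  id.dev    = (unsigned) (((bdfid & 0xff) >> 3) & 0x1f);
  id.func   = (unsigned) (bdfid & 0x7);
  return id;
}

/* Format as the usual sysfs-style "dddd:bb:dd.f" string. */
static int format_busid(char *buf, size_t len, struct pci_busid id)
{
  return snprintf(buf, len, "%04x:%02x:%02x.%x",
                  id.domain, id.bus, id.dev, id.func);
}
```

For example, `0x100008329` decodes to domain 1, bus 0x83, device 5, function 1, formatted as `0001:83:05.1`.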
--- a/src/3rdparty/hwloc/include/hwloc/windows.h
+++ b/src/3rdparty/hwloc/include/hwloc/windows.h
@@ -0,0 +1,76 @@
+/*
+ * Copyright © 2021 Inria.  All rights reserved.
+ * See COPYING in top-level directory.
+ */
+
+/** \file
+ * \brief Macros to help interaction between hwloc and Windows.
+ *
+ * Applications that use hwloc on Windows may want to include this file
+ * for Windows specific hwloc features.
+ */
+
+#ifndef HWLOC_WINDOWS_H
+#define HWLOC_WINDOWS_H
+
+#include "hwloc.h"
+
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+
+/** \defgroup hwlocality_windows Windows-specific helpers
+ *
+ * These functions query Windows processor groups.
+ * These groups partition the operating system into virtual sets
+ * of up to 64 neighbor PUs.
+ * Threads and processes may only be bound inside a single group.
+ * Although Windows processor groups may be exposed in the hwloc
+ * hierarchy as hwloc Groups, they are also often merged into
+ * existing hwloc objects such as NUMA nodes or Packages.
+ * This API provides explicit information about Windows processor
+ * groups so that applications know whether binding to a large
+ * set of PUs may fail because it spans over multiple Windows
+ * processor groups.
+ *
+ * @{
+ */
+
+
+/** \brief Get the number of Windows processor groups
+ *
+ * \p flags must be 0 for now.
+ *
+ * \return at least \c 1 on success.
+ * \return \c -1 on error, for instance if the topology does not match
+ * the current system (e.g. loaded from another machine through XML).
+ */
+HWLOC_DECLSPEC int hwloc_windows_get_nr_processor_groups(hwloc_topology_t topology, unsigned long flags);
+
+/** \brief Get the CPU-set of a Windows processor group.
+ *
+ * Get the set of PUs included in the processor group specified
+ * by \p pg_index.
+ * \p pg_index must be between \c 0 and the value returned
+ * by hwloc_windows_get_nr_processor_groups() minus 1.
+ *
+ * \p flags must be 0 for now.
+ *
+ * \return \c 0 on success.
+ * \return \c -1 on error, for instance if \p pg_index is invalid,
+ * or if the topology does not match the current system (e.g. loaded
+ * from another machine through XML).
+ */
+HWLOC_DECLSPEC int hwloc_windows_get_processor_group_cpuset(hwloc_topology_t topology, unsigned pg_index, hwloc_cpuset_t cpuset, unsigned long flags);
+
+/** @} */
+
+
+#ifdef __cplusplus
+} /* extern "C" */
+#endif
+
+
+#endif /* HWLOC_WINDOWS_H */
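The Windows helpers let an application check, before binding, whether a set of PUs spans more than one processor group. The sketch below models that check under a simplifying assumption: each group is a contiguous range of PU indexes described by a hypothetical `struct pg_range`. Real groups should be obtained from `hwloc_windows_get_nr_processor_groups()` and `hwloc_windows_get_processor_group_cpuset()`; with hwloc bitmaps, the equivalent test would be `hwloc_bitmap_isincluded()` against each group's cpuset.

```c
#include <stddef.h>

/* Hypothetical model of one processor group: nr_pus consecutive PU indexes
 * starting at first_pu. Real group cpusets need not be contiguous. */
struct pg_range {
  unsigned first_pu;
  unsigned nr_pus;
};

/* Return the index of the group containing PU `pu`, or -1 if none does. */
static int group_of_pu(const struct pg_range *groups, size_t nr_groups, unsigned pu)
{
  for (size_t i = 0; i < nr_groups; i++)
    if (pu >= groups[i].first_pu && pu - groups[i].first_pu < groups[i].nr_pus)
      return (int) i;
  return -1;
}

/* Return 1 if every PU in [first, last] lies in a single group, else 0.
 * With contiguous, non-overlapping groups, checking both endpoints suffices. */
static int range_fits_one_group(const struct pg_range *groups, size_t nr_groups,
                                unsigned first, unsigned last)
{
  int g;
  if (first > last)
    return 0;
  g = group_of_pu(groups, nr_groups, first);
  return g >= 0 && g == group_of_pu(groups, nr_groups, last);
}
```

On a hypothetical 96-PU machine split into groups of 64 and 32 PUs, PUs 60 to 70 span both groups, so a thread cannot be bound to all of them at once.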
--- a/src/3rdparty/hwloc/include/private/autogen/config.h
+++ b/src/3rdparty/hwloc/include/private/autogen/config.h
@@ -1,6 +1,6 @@
 /*
 * Copyright © 2009, 2011, 2012 CNRS.  All rights reserved.
- * Copyright © 2009-2020 Inria.  All rights reserved.
+ * Copyright © 2009-2021 Inria.  All rights reserved.
 * Copyright © 2009, 2011, 2012, 2015 Université Bordeaux.  All rights reserved.
 * Copyright © 2009-2020 Cisco Systems, Inc.  All rights reserved.
 * $COPYRIGHT$
@@ -290,10 +290,6 @@
 /* Define to '1' if sysctlbyname is present and usable */
 /* #undef HAVE_SYSCTLBYNAME */

-/* Define to 1 if the system has the type
-   `SYSTEM_LOGICAL_PROCESSOR_INFORMATION'. */
-#define HAVE_SYSTEM_LOGICAL_PROCESSOR_INFORMATION 1
-
 /* Define to 1 if the system has the type
   `SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX'. */
 #define HAVE_SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX 1
--- a/src/3rdparty/hwloc/include/private/internal-components.h
+++ b/src/3rdparty/hwloc/include/private/internal-components.h
@@ -1,5 +1,5 @@
 /*
- * Copyright © 2018-2019 Inria.  All rights reserved.
+ * Copyright © 2018-2020 Inria.  All rights reserved.
 *
 * See COPYING in top-level directory.
 */
@@ -31,6 +31,7 @@ HWLOC_DECLSPEC extern const struct hwloc_component hwloc_cuda_component;
 HWLOC_DECLSPEC extern const struct hwloc_component hwloc_gl_component;
 HWLOC_DECLSPEC extern const struct hwloc_component hwloc_nvml_component;
 HWLOC_DECLSPEC extern const struct hwloc_component hwloc_rsmi_component;
+HWLOC_DECLSPEC extern const struct hwloc_component hwloc_levelzero_component;
 HWLOC_DECLSPEC extern const struct hwloc_component hwloc_opencl_component;
 HWLOC_DECLSPEC extern const struct hwloc_component hwloc_pci_component;

--- a/src/3rdparty/hwloc/include/private/misc.h
+++ b/src/3rdparty/hwloc/include/private/misc.h
@@ -504,7 +504,7 @@ hwloc__obj_type_is_icache(hwloc_obj_type_t type)
  }                                    \
 } while(0)
 #else /* HAVE_USELOCALE */
-#if __HWLOC_HAVE_ATTRIBUTE_UNUSED
+#if HWLOC_HAVE_ATTRIBUTE_UNUSED
 #define hwloc_localeswitch_declare int __dummy_nolocale __hwloc_attribute_unused
 #define hwloc_localeswitch_init()
 #else
--- a/src/3rdparty/hwloc/include/private/private.h
+++ b/src/3rdparty/hwloc/include/private/private.h
@@ -1,6 +1,6 @@
 /*
 * Copyright © 2009      CNRS
- * Copyright © 2009-2020 Inria.  All rights reserved.
+ * Copyright © 2009-2021 Inria.  All rights reserved.
 * Copyright © 2009-2012, 2020 Université Bordeaux
 * Copyright © 2009-2011 Cisco Systems, Inc.  All rights reserved.
 *
@@ -166,6 +166,7 @@ struct hwloc_topology {
    unsigned long kind;

 #define HWLOC_INTERNAL_DIST_FLAG_OBJS_VALID (1U<<0) /* if the objs array is valid below */
+#define HWLOC_INTERNAL_DIST_FLAG_NOT_COMMITTED (1U<<1) /* if the distances structure isn't in the list yet */
    unsigned iflags;

    /* objects are currently stored in physical_index order */
@@ -304,11 +305,6 @@ extern void hwloc_pci_discovery_init(struct hwloc_topology *topology);
 extern void hwloc_pci_discovery_prepare(struct hwloc_topology *topology);
 extern void hwloc_pci_discovery_exit(struct hwloc_topology *topology);

-/* Look for an object matching the given domain/bus/func,
- * either exactly or return the smallest container bridge
- */
-extern struct hwloc_obj * hwloc_pci_find_by_busid(struct hwloc_topology *topology, unsigned domain, unsigned bus, unsigned dev, unsigned func);
-
 /* Look for an object matching complete cpuset exactly, or insert one.
 * Return NULL on failure.
 * Return a good fallback (object above) on failure to insert.
@@ -408,10 +404,14 @@ extern void hwloc_internal_distances_prepare(hwloc_topology_t topology);
 extern void hwloc_internal_distances_destroy(hwloc_topology_t topology);
 extern int hwloc_internal_distances_dup(hwloc_topology_t new, hwloc_topology_t old);
 extern void hwloc_internal_distances_refresh(hwloc_topology_t topology);
-extern int hwloc_internal_distances_add(hwloc_topology_t topology, const char *name, unsigned nbobjs, hwloc_obj_t *objs, uint64_t *values, unsigned long kind, unsigned long flags);
-extern int hwloc_internal_distances_add_by_index(hwloc_topology_t topology, const char *name, hwloc_obj_type_t unique_type, hwloc_obj_type_t *different_types, unsigned nbobjs, uint64_t *indexes, uint64_t *values, unsigned long kind, unsigned long flags);
 extern void hwloc_internal_distances_invalidate_cached_objs(hwloc_topology_t topology);

+/* these distances_add() functions are higher-level than those in hwloc/plugins.h
+ * but they may change in the future, hence they are not exported to plugins.
+ */
+extern int hwloc_internal_distances_add_by_index(hwloc_topology_t topology, const char *name, hwloc_obj_type_t unique_type, hwloc_obj_type_t *different_types, unsigned nbobjs, uint64_t *indexes, uint64_t *values, unsigned long kind, unsigned long flags);
+extern int hwloc_internal_distances_add(hwloc_topology_t topology, const char *name, unsigned nbobjs, hwloc_obj_t *objs, uint64_t *values, unsigned long kind, unsigned long flags);
+
 extern void hwloc_internal_memattrs_init(hwloc_topology_t topology);
 extern void hwloc_internal_memattrs_prepare(hwloc_topology_t topology);
 extern void hwloc_internal_memattrs_destroy(hwloc_topology_t topology);
@@ -480,6 +480,7 @@ extern char * hwloc_progname(struct hwloc_topology *topology);
 #define HWLOC_GROUP_KIND_AIX_SDL_UNKNOWN		210	/* subkind is SDL level */
 #define HWLOC_GROUP_KIND_WINDOWS_PROCESSOR_GROUP	220	/* no subkind */
 #define HWLOC_GROUP_KIND_WINDOWS_RELATIONSHIP_UNKNOWN	221	/* no subkind */
+#define HWLOC_GROUP_KIND_LINUX_CLUSTER                  222     /* no subkind */
 /* distance groups */
 #define HWLOC_GROUP_KIND_DISTANCE			900	/* subkind is round of adding these groups during distance based grouping */
 /* finally, hwloc-specific groups required to insert something else, should disappear as soon as possible */
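The new backend distances API in `plugins.h` splits an addition into three steps: create an empty handle, attach objects and values (which, per its documentation, are handed over to the core rather than copied), then commit. The `HWLOC_INTERNAL_DIST_FLAG_NOT_COMMITTED` flag added here marks a structure that is not yet in the topology's list. The mock below only illustrates that handle-and-ownership pattern. All `mock_*` names are hypothetical, it uses plain `int` object IDs instead of `hwloc_obj_t`, and its failure behavior is a choice of this sketch, not a statement about hwloc.

```c
#include <stdint.h>
#include <stdlib.h>

#define MOCK_DIST_FLAG_NOT_COMMITTED (1U << 1)

/* Hypothetical distances structure; objs and values are owned once attached. */
struct mock_distances {
  unsigned nbobjs;
  int *objs;          /* nbobjs object IDs */
  uint64_t *values;   /* nbobjs * nbobjs matrix, row-major */
  unsigned iflags;
  struct mock_distances *next;
};

/* Hypothetical topology holding a list of committed distances. */
struct mock_topology {
  struct mock_distances *first_dist;
};

/* Step 1: create an empty, not-yet-committed handle. */
static struct mock_distances *mock_add_create(void)
{
  struct mock_distances *d = calloc(1, sizeof *d);
  if (d)
    d->iflags = MOCK_DIST_FLAG_NOT_COMMITTED;
  return d;
}

/* Step 2: attach the arrays without duplicating them.
 * On success the handle owns them; on failure the caller still does. */
static int mock_add_values(struct mock_distances *d, unsigned nbobjs,
                           int *objs, uint64_t *values)
{
  if (!d || !nbobjs || !objs || !values || d->objs)
    return -1;
  d->nbobjs = nbobjs;
  d->objs = objs;
  d->values = values;
  return 0;
}

/* Step 3: link into the topology list and clear the flag.
 * In this mock, committing a handle without values fails and frees it. */
static int mock_add_commit(struct mock_topology *topo, struct mock_distances *d)
{
  if (!d->objs) {
    free(d);
    return -1;
  }
  d->iflags &= ~MOCK_DIST_FLAG_NOT_COMMITTED;
  d->next = topo->first_dist;
  topo->first_dist = d;
  return 0;
}
```

After a successful `mock_add_values()`, the caller must not free `objs` or `values`, mirroring the ownership rule documented for `hwloc_backend_distances_add_values()`.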
--- a/src/3rdparty/hwloc/include/private/windows.h
+++ b/src/3rdparty/hwloc/include/private/windows.h
@@ -0,0 +1,21 @@
+/*
+ * Copyright © 2009 Université Bordeaux
+ * Copyright © 2020 Inria.  All rights reserved.
+ *
+ * See COPYING in top-level directory.
+ */
+
+#ifndef HWLOC_PRIVATE_WINDOWS_H
+#define HWLOC_PRIVATE_WINDOWS_H
+
+#ifdef __GNUC__
+#define _ANONYMOUS_UNION __extension__
+#define _ANONYMOUS_STRUCT __extension__
+#else
+#define _ANONYMOUS_UNION
+#define _ANONYMOUS_STRUCT
+#endif /* __GNUC__ */
+#define DUMMYUNIONNAME
+#define DUMMYSTRUCTNAME
+
+#endif /* HWLOC_PRIVATE_WINDOWS_H */
--- a/src/3rdparty/hwloc/src/components.c
+++ b/src/3rdparty/hwloc/src/components.c
@@ -1,5 +1,5 @@
 /*
- * Copyright © 2009-2020 Inria.  All rights reserved.
+ * Copyright © 2009-2021 Inria.  All rights reserved.
 * Copyright © 2012 Université Bordeaux
 * See COPYING in top-level directory.
 */
@@ -124,7 +124,7 @@ hwloc_dlforeachfile(const char *_paths,
      *colon = '\0';

    if (hwloc_plugins_verbose)
-      fprintf(stderr, " Looking under %s\n", path);
+      fprintf(stderr, "hwloc:  Looking under %s\n", path);

    dir = opendir(path);
    if (!dir)
@@ -198,7 +198,7 @@ hwloc__dlforeach_cb(const char *filename, void *_data __hwloc_attribute_unused)
  char *componentsymbolname;

  if (hwloc_plugins_verbose)
-    fprintf(stderr, "Plugin dlforeach found `%s'\n", filename);
+    fprintf(stderr, "hwloc: Plugin dlforeach found `%s'\n", filename);

  basename = strrchr(filename, '/');
  if (!basename)
@@ -208,7 +208,7 @@ hwloc__dlforeach_cb(const char *filename, void *_data __hwloc_attribute_unused)

  if (hwloc_plugins_blacklist && strstr(hwloc_plugins_blacklist, basename)) {
    if (hwloc_plugins_verbose)
-      fprintf(stderr, "Plugin `%s' is blacklisted in the environment\n", basename);
+      fprintf(stderr, "hwloc: Plugin `%s' is blacklisted in the environment\n", basename);
    goto out;
  }

@@ -216,14 +216,14 @@ hwloc__dlforeach_cb(const char *filename, void *_data __hwloc_attribute_unused)
  handle = hwloc_dlopenext(filename);
  if (!handle) {
    if (hwloc_plugins_verbose)
-      fprintf(stderr, "Failed to load plugin: %s\n", hwloc_dlerror());
+      fprintf(stderr, "hwloc: Failed to load plugin: %s\n", hwloc_dlerror());
    goto out;
  }

  componentsymbolname = malloc(strlen(basename)+10+1);
  if (!componentsymbolname) {
    if (hwloc_plugins_verbose)
-      fprintf(stderr, "Failed to allocation component `%s' symbol\n",
+      fprintf(stderr, "hwloc: Failed to allocation component `%s' symbol\n",
 	      basename);
    goto out_with_handle;
  }
@@ -231,38 +231,38 @@ hwloc__dlforeach_cb(const char *filename, void *_data __hwloc_attribute_unused)
  component = hwloc_dlsym(handle, componentsymbolname);
  if (!component) {
    if (hwloc_plugins_verbose)
-      fprintf(stderr, "Failed to find component symbol `%s'\n",
+      fprintf(stderr, "hwloc: Failed to find component symbol `%s'\n",
 	      componentsymbolname);
    free(componentsymbolname);
    goto out_with_handle;
  }
  if (component->abi != HWLOC_COMPONENT_ABI) {
    if (hwloc_plugins_verbose)
-      fprintf(stderr, "Plugin symbol ABI %u instead of %d\n",
+      fprintf(stderr, "hwloc: Plugin symbol ABI %u instead of %d\n",
 	      component->abi, HWLOC_COMPONENT_ABI);
    free(componentsymbolname);
    goto out_with_handle;
  }
  if (hwloc_plugins_verbose)
-    fprintf(stderr, "Plugin contains expected symbol `%s'\n",
+    fprintf(stderr, "hwloc: Plugin contains expected symbol `%s'\n",
 	    componentsymbolname);
  free(componentsymbolname);

  if (HWLOC_COMPONENT_TYPE_DISC == component->type) {
    if (strncmp(basename, "hwloc_", 6)) {
      if (hwloc_plugins_verbose)
-	fprintf(stderr, "Plugin name `%s' doesn't match its type DISCOVERY\n", basename);
+	fprintf(stderr, "hwloc: Plugin name `%s' doesn't match its type DISCOVERY\n", basename);
      goto out_with_handle;
    }
  } else if (HWLOC_COMPONENT_TYPE_XML == component->type) {
    if (strncmp(basename, "hwloc_xml_", 10)) {
      if (hwloc_plugins_verbose)
-	fprintf(stderr, "Plugin name `%s' doesn't match its type XML\n", basename);
+	fprintf(stderr, "hwloc: Plugin name `%s' doesn't match its type XML\n", basename);
      goto out_with_handle;
    }
  } else {
    if (hwloc_plugins_verbose)
-      fprintf(stderr, "Plugin name `%s' has invalid type %u\n",
+      fprintf(stderr, "hwloc: Plugin name `%s' has invalid type %u\n",
 	      basename, (unsigned) component->type);
    goto out_with_handle;
  }
@@ -277,7 +277,7 @@ hwloc__dlforeach_cb(const char *filename, void *_data __hwloc_attribute_unused)
  desc->handle = handle;
  desc->next = NULL;
  if (hwloc_plugins_verbose)
-    fprintf(stderr, "Plugin descriptor `%s' ready\n", basename);
+    fprintf(stderr, "hwloc: Plugin descriptor `%s' ready\n", basename);

  /* append to the list */
  prevdesc = &hwloc_plugins;
@@ -285,7 +285,7 @@ hwloc__dlforeach_cb(const char *filename, void *_data __hwloc_attribute_unused)
    prevdesc = &((*prevdesc)->next);
  *prevdesc = desc;
  if (hwloc_plugins_verbose)
-    fprintf(stderr, "Plugin descriptor `%s' queued\n", basename);
+    fprintf(stderr, "hwloc: Plugin descriptor `%s' queued\n", basename);
  return 0;

 out_with_handle:
@@ -300,7 +300,7 @@ hwloc_plugins_exit(void)
  struct hwloc__plugin_desc *desc, *next;

  if (hwloc_plugins_verbose)
-    fprintf(stderr, "Closing all plugins\n");
+    fprintf(stderr, "hwloc: Closing all plugins\n");

  desc = hwloc_plugins;
  while (desc) {
@@ -340,7 +340,7 @@ hwloc_plugins_init(void)
  hwloc_plugins = NULL;

  if (hwloc_plugins_verbose)
-    fprintf(stderr, "Starting plugin dlforeach in %s\n", path);
+    fprintf(stderr, "hwloc: Starting plugin dlforeach in %s\n", path);
  err = hwloc_dlforeachfile(path, hwloc__dlforeach_cb, NULL);
  if (err)
    goto out_with_init;
@@ -364,14 +364,14 @@ hwloc_disc_component_register(struct hwloc_disc_component *component,
  /* check that the component name is valid */
  if (!strcmp(component->name, HWLOC_COMPONENT_STOP_NAME)) {
    if (hwloc_components_verbose)
-      fprintf(stderr, "Cannot register discovery component with reserved name `" HWLOC_COMPONENT_STOP_NAME "'\n");
+      fprintf(stderr, "hwloc: Cannot register discovery component with reserved name `" HWLOC_COMPONENT_STOP_NAME "'\n");
    return -1;
  }
  if (strchr(component->name, HWLOC_COMPONENT_EXCLUDE_CHAR)
      || strchr(component->name, HWLOC_COMPONENT_PHASESEP_CHAR)
      || strcspn(component->name, HWLOC_COMPONENT_SEPS) != strlen(component->name)) {
    if (hwloc_components_verbose)
-      fprintf(stderr, "Cannot register discovery component with name `%s' containing reserved characters `%c" HWLOC_COMPONENT_SEPS "'\n",
+      fprintf(stderr, "hwloc: Cannot register discovery component with name `%s' containing reserved characters `%c" HWLOC_COMPONENT_SEPS "'\n",
 	      component->name, HWLOC_COMPONENT_EXCLUDE_CHAR);
    return -1;
  }
@@ -386,8 +386,9 @@ hwloc_disc_component_register(struct hwloc_disc_component *component,
 				   |HWLOC_DISC_PHASE_MISC
 				   |HWLOC_DISC_PHASE_ANNOTATE
 				   |HWLOC_DISC_PHASE_TWEAK))) {
-    fprintf(stderr, "Cannot register discovery component `%s' with invalid phases 0x%x\n",
-	    component->name, component->phases);
+    if (hwloc_hide_errors() < 2)
+      fprintf(stderr, "hwloc: Cannot register discovery component `%s' with invalid phases 0x%x\n",
+              component->name, component->phases);
    return -1;
  }

@@ -398,13 +399,13 @@ hwloc_disc_component_register(struct hwloc_disc_component *component,
      if ((*prev)->priority < component->priority) {
 	/* drop the existing component */
 	if (hwloc_components_verbose)
-	  fprintf(stderr, "Dropping previously registered discovery component `%s', priority %u lower than new one %u\n",
+	  fprintf(stderr, "hwloc: Dropping previously registered discovery component `%s', priority %u lower than new one %u\n",
 		  (*prev)->name, (*prev)->priority, component->priority);
 	*prev = (*prev)->next;
      } else {
 	/* drop the new one */
 	if (hwloc_components_verbose)
-	  fprintf(stderr, "Ignoring new discovery component `%s', priority %u lower than previously registered one %u\n",
+	  fprintf(stderr, "hwloc: Ignoring new discovery component `%s', priority %u lower than previously registered one %u\n",
 		  component->name, component->priority, (*prev)->priority);
 	return -1;
      }
@@ -412,7 +413,7 @@ hwloc_disc_component_register(struct hwloc_disc_component *component,
    prev = &((*prev)->next);
  }
  if (hwloc_components_verbose)
-    fprintf(stderr, "Registered discovery component `%s' phases 0x%x with priority %u (%s%s)\n",
+    fprintf(stderr, "hwloc: Registered discovery component `%s' phases 0x%x with priority %u (%s%s)\n",
 	    component->name, component->phases, component->priority,
 	    filename ? "from plugin " : "statically build", filename ? filename : "");

@@ -475,15 +476,16 @@ hwloc_components_init(void)
  /* hwloc_static_components is created by configure in static-components.h */
  for(i=0; NULL != hwloc_static_components[i]; i++) {
    if (hwloc_static_components[i]->flags) {
-      fprintf(stderr, "Ignoring static component with invalid flags %lx\n",
-	      hwloc_static_components[i]->flags);
+      if (hwloc_hide_errors() < 2)
+        fprintf(stderr, "hwloc: Ignoring static component with invalid flags %lx\n",
+                hwloc_static_components[i]->flags);
      continue;
    }

    /* initialize the component */
    if (hwloc_static_components[i]->init && hwloc_static_components[i]->init(0) < 0) {
      if (hwloc_components_verbose)
-	fprintf(stderr, "Ignoring static component, failed to initialize\n");
+	fprintf(stderr, "hwloc: Ignoring static component, failed to initialize\n");
      continue;
    }
    /* queue ->finalize() callback if any */
@@ -503,15 +505,16 @@ hwloc_components_init(void)
 #ifdef HWLOC_HAVE_PLUGINS
  for(desc = hwloc_plugins; NULL != desc; desc = desc->next) {
    if (desc->component->flags) {
-      fprintf(stderr, "Ignoring plugin `%s' component with invalid flags %lx\n",
-	      desc->name, desc->component->flags);
+      if (hwloc_hide_errors() < 2)
+        fprintf(stderr, "hwloc: Ignoring plugin `%s' component with invalid flags %lx\n",
+                desc->name, desc->component->flags);
      continue;
    }

    /* initialize the component */
    if (desc->component->init && desc->component->init(0) < 0) {
      if (hwloc_components_verbose)
-	fprintf(stderr, "Ignoring plugin `%s', failed to initialize\n", desc->name);
+	fprintf(stderr, "hwloc: Ignoring plugin `%s', failed to initialize\n", desc->name);
      continue;
    }
    /* queue ->finalize() callback if any */
@@ -608,7 +611,7 @@ hwloc_disc_component_blacklist_one(struct hwloc_topology *topology,
    /* replace linuxpci and linuxio with linux (with IO phases)
     * for backward compatibility with pre-v2.0 and v2.0 respectively */
    if (hwloc_components_verbose)
-      fprintf(stderr, "Replacing deprecated component `%s' with `linux' IO phases in blacklisting\n", name);
+      fprintf(stderr, "hwloc: Replacing deprecated component `%s' with `linux' IO phases in blacklisting\n", name);
    comp = hwloc_disc_component_find("linux", NULL);
    phases = HWLOC_DISC_PHASE_PCI | HWLOC_DISC_PHASE_IO | HWLOC_DISC_PHASE_MISC | HWLOC_DISC_PHASE_ANNOTATE;

@@ -624,7 +627,7 @@ hwloc_disc_component_blacklist_one(struct hwloc_topology *topology,
  }

  if (hwloc_components_verbose)
-    fprintf(stderr, "Blacklisting component `%s` phases 0x%x\n", comp->name, phases);
+    fprintf(stderr, "hwloc: Blacklisting component `%s` phases 0x%x\n", comp->name, phases);

  for(i=0; i<topology->nr_blacklisted_components; i++) {
    if (topology->blacklisted_components[i].component == comp) {
@@ -727,7 +730,7 @@ hwloc_disc_component_try_enable(struct hwloc_topology *topology,
    if (hwloc_components_verbose)
      /* do not warn if envvar_forced since system-wide HWLOC_COMPONENTS must be silently ignored after set_xml() etc.
       */
-      fprintf(stderr, "Excluding discovery component `%s' phases 0x%x, conflicts with excludes 0x%x\n",
+      fprintf(stderr, "hwloc: Excluding discovery component `%s' phases 0x%x, conflicts with excludes 0x%x\n",
 	      comp->name, comp->phases, topology->backend_excluded_phases);
    return -1;
  }
@@ -735,8 +738,8 @@ hwloc_disc_component_try_enable(struct hwloc_topology *topology,
  backend = comp->instantiate(topology, comp, topology->backend_excluded_phases | blacklisted_phases,
 			      NULL, NULL, NULL);
  if (!backend) {
-    if (hwloc_components_verbose || envvar_forced)
-      fprintf(stderr, "Failed to instantiate discovery component `%s'\n", comp->name);
+    if (hwloc_components_verbose || (envvar_forced && hwloc_hide_errors() < 2))
+      fprintf(stderr, "hwloc: Failed to instantiate discovery component `%s'\n", comp->name);
    return -1;
  }

@@ -817,7 +820,7 @@ hwloc_disc_components_enable_others(struct hwloc_topology *topology)
 	name = curenv;
 	if (!strcmp(name, "linuxpci") || !strcmp(name, "linuxio")) {
 	  if (hwloc_components_verbose)
-	    fprintf(stderr, "Replacing deprecated component `%s' with `linux' in envvar forcing\n", name);
+	    fprintf(stderr, "hwloc: Replacing deprecated component `%s' with `linux' in envvar forcing\n", name);
 	  name = "linux";
 	}

@@ -832,7 +835,8 @@ hwloc_disc_components_enable_others(struct hwloc_topology *topology)
 	  if (comp->phases & ~blacklisted_phases)
 	    hwloc_disc_component_try_enable(topology, comp, 1 /* envvar forced */, blacklisted_phases);
 	} else {
-	  fprintf(stderr, "Cannot find discovery component `%s'\n", name);
+          if (hwloc_hide_errors() < 2)
+            fprintf(stderr, "hwloc: Cannot find discovery component `%s'\n", name);
 	}

 	/* restore chars (the second loop below needs env to be unmodified) */
@@ -864,7 +868,7 @@ hwloc_disc_components_enable_others(struct hwloc_topology *topology)

      if (!(comp->phases & ~blacklisted_phases)) {
 	if (hwloc_components_verbose)
-	  fprintf(stderr, "Excluding blacklisted discovery component `%s' phases 0x%x\n",
+	  fprintf(stderr, "hwloc: Excluding blacklisted discovery component `%s' phases 0x%x\n",
 		  comp->name, comp->phases);
 	goto nextcomp;
      }
@@ -879,7 +883,7 @@ nextcomp:
    /* print a summary */
    int first = 1;
    backend = topology->backends;
-    fprintf(stderr, "Final list of enabled discovery components: ");
+    fprintf(stderr, "hwloc: Final list of enabled discovery components: ");
    while (backend != NULL) {
      fprintf(stderr, "%s%s(0x%x)", first ? "" : ",", backend->component->name, backend->phases);
      backend = backend->next;
@@ -935,7 +939,7 @@ hwloc_backend_alloc(struct hwloc_topology *topology,
  /* filter-out component phases that are excluded */
  backend->phases = component->phases & ~topology->backend_excluded_phases;
  if (backend->phases != component->phases && hwloc_components_verbose)
-    fprintf(stderr, "Trying discovery component `%s' with phases 0x%x instead of 0x%x\n",
+    fprintf(stderr, "hwloc: Trying discovery component `%s' with phases 0x%x instead of 0x%x\n",
 	    component->name, backend->phases, component->phases);
  backend->flags = 0;
  backend->discover = NULL;
@@ -963,8 +967,9 @@ hwloc_backend_enable(struct hwloc_backend *backend)

  /* check backend flags */
  if (backend->flags) {
-    fprintf(stderr, "Cannot enable discovery component `%s' phases 0x%x with unknown flags %lx\n",
-	    backend->component->name, backend->component->phases, backend->flags);
+    if (hwloc_hide_errors() < 2)
+      fprintf(stderr, "hwloc: Cannot enable discovery component `%s' phases 0x%x with unknown flags %lx\n",
+              backend->component->name, backend->component->phases, backend->flags);
    return -1;
  }

@@ -973,7 +978,7 @@ hwloc_backend_enable(struct hwloc_backend *backend)
  while (NULL != *pprev) {
    if ((*pprev)->component == backend->component) {
      if (hwloc_components_verbose)
-	fprintf(stderr, "Cannot enable  discovery component `%s' phases 0x%x twice\n",
+	fprintf(stderr, "hwloc: Cannot enable  discovery component `%s' phases 0x%x twice\n",
 		backend->component->name, backend->component->phases);
      hwloc_backend_disable(backend);
      errno = EBUSY;
@@ -983,7 +988,7 @@ hwloc_backend_enable(struct hwloc_backend *backend)
  }

  if (hwloc_components_verbose)
-    fprintf(stderr, "Enabling discovery component `%s' with phases 0x%x (among 0x%x)\n",
+    fprintf(stderr, "hwloc: Enabling discovery component `%s' with phases 0x%x (among 0x%x)\n",
 	    backend->component->name, backend->phases, backend->component->phases);

  /* enqueue at the end */
@@ -1067,7 +1072,7 @@ hwloc_backends_disable_all(struct hwloc_topology *topology)
  while (NULL != (backend = topology->backends)) {
    struct hwloc_backend *next = backend->next;
    if (hwloc_components_verbose)
-      fprintf(stderr, "Disabling discovery component `%s'\n",
+      fprintf(stderr, "hwloc: Disabling discovery component `%s'\n",
 	      backend->component->name);
    hwloc_backend_disable(backend);
    topology->backends = next;
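The components.c hunks above do two things. They prefix every diagnostic with `hwloc: `. They also wrap messages that used to print unconditionally in `if (hwloc_hide_errors() < 2)`. The minimal sketch below shows that gating pattern. `hide_errors_level`, `hide_errors()` and `warn_invalid_flags()` are hypothetical stand-ins, not hwloc API. The level meanings are assumed here: 0 shows everything, and 2 or more hides everything.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for hwloc_hide_errors(): 0 shows all messages,
 * 2 or more hides all of them (assumed semantics, not hwloc's code). */
static int hide_errors_level = 0;

static int hide_errors(void)
{
  return hide_errors_level;
}

/* Print an "hwloc: "-prefixed warning unless errors are fully hidden.
 * Returns 1 when a message was printed and 0 when it was suppressed. */
static int warn_invalid_flags(FILE *out, const char *name, unsigned long flags)
{
  assert(out != NULL && name != NULL);
  if (hide_errors() >= 2)
    return 0;
  fprintf(out, "hwloc: Ignoring plugin `%s' component with invalid flags %lx\n",
          name, flags);
  return 1;
}
```

With this gating, level 1 still prints these messages, matching the `< 2` checks in the diff. The old `!hwloc_hide_errors()` check in cpukinds.c would have silenced them at level 1.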
--- a/src/3rdparty/hwloc/src/cpukinds.c
+++ b/src/3rdparty/hwloc/src/cpukinds.c
@@ -1,5 +1,5 @@
 /*
- * Copyright © 2020 Inria.  All rights reserved.
+ * Copyright © 2020-2021 Inria.  All rights reserved.
 * See COPYING in top-level directory.
 */

@@ -42,6 +42,9 @@ hwloc_internal_cpukinds_dup(hwloc_topology_t new, hwloc_topology_t old)
  struct hwloc_internal_cpukind_s *kinds;
  unsigned i;

+  if (!old->nr_cpukinds)
+    return 0;
+
  kinds = hwloc_tma_malloc(tma, old->nr_cpukinds * sizeof(*kinds));
  if (!kinds)
    return -1;
@@ -270,7 +273,7 @@ hwloc__cpukinds_check_duplicate_rankings(struct hwloc_topology *topology)
  unsigned i,j;
  for(i=0; i<topology->nr_cpukinds; i++)
    for(j=i+1; j<topology->nr_cpukinds; j++)
-      if (topology->cpukinds[i].forced_efficiency == topology->cpukinds[j].forced_efficiency)
+      if (topology->cpukinds[i].ranking_value == topology->cpukinds[j].ranking_value)
        /* if any duplicate, fail */
        return -1;
  return 0;
@@ -343,7 +346,8 @@ enum hwloc_cpukinds_ranking {
  HWLOC_CPUKINDS_RANKING_DEFAULT, /* forced + frequency on ARM, forced + coretype_frequency otherwise */
  HWLOC_CPUKINDS_RANKING_NO_FORCED_EFFICIENCY, /* default without forced */
  HWLOC_CPUKINDS_RANKING_FORCED_EFFICIENCY,
-  HWLOC_CPUKINDS_RANKING_CORETYPE_FREQUENCY,
+  HWLOC_CPUKINDS_RANKING_CORETYPE_FREQUENCY, /* either coretype or frequency or both */
+  HWLOC_CPUKINDS_RANKING_CORETYPE_FREQUENCY_STRICT, /* both coretype and frequency are required */
  HWLOC_CPUKINDS_RANKING_CORETYPE,
  HWLOC_CPUKINDS_RANKING_FREQUENCY,
  HWLOC_CPUKINDS_RANKING_FREQUENCY_MAX,
@@ -358,9 +362,9 @@ hwloc__cpukinds_try_rank_by_info(struct hwloc_topology *topology,
 {
  unsigned i;

-  if (HWLOC_CPUKINDS_RANKING_CORETYPE_FREQUENCY == heuristics) {
-    hwloc_debug("Trying to rank cpukinds by coretype+frequency...\n");
-    /* we need intel_core_type + (base or max freq) for all kinds */
+  if (HWLOC_CPUKINDS_RANKING_CORETYPE_FREQUENCY_STRICT == heuristics) {
+    hwloc_debug("Trying to rank cpukinds by coretype+frequency_strict...\n");
+    /* we need intel_core_type AND (base or max freq) for all kinds */
    if (!summary->have_intel_core_type
        || (!summary->have_max_freq && !summary->have_base_freq))
      return -1;
@@ -373,6 +377,21 @@ hwloc__cpukinds_try_rank_by_info(struct hwloc_topology *topology,
        kind->ranking_value = (summary->summaries[i].intel_core_type << 20) + summary->summaries[i].max_freq;
    }

+  } else if (HWLOC_CPUKINDS_RANKING_CORETYPE_FREQUENCY == heuristics) {
+    hwloc_debug("Trying to rank cpukinds by coretype+frequency...\n");
+    /* we need intel_core_type OR (base or max freq) for all kinds */
+    if (!summary->have_intel_core_type
+        && (!summary->have_max_freq && !summary->have_base_freq))
+      return -1;
+    /* rank first by coretype (Core>>Atom) then by frequency, base if available, max otherwise */
+    for(i=0; i<topology->nr_cpukinds; i++) {
+      struct hwloc_internal_cpukind_s *kind = &topology->cpukinds[i];
+      if (summary->have_base_freq)
+        kind->ranking_value = (summary->summaries[i].intel_core_type << 20) + summary->summaries[i].base_freq;
+      else
+        kind->ranking_value = (summary->summaries[i].intel_core_type << 20) + summary->summaries[i].max_freq;
+    }
+
  } else if (HWLOC_CPUKINDS_RANKING_CORETYPE == heuristics) {
    hwloc_debug("Trying to rank cpukinds by coretype...\n");
    /* we need intel_core_type */
@@ -429,7 +448,9 @@ static int hwloc__cpukinds_compare_ranking_values(const void *_a, const void *_b
 {
  const struct hwloc_internal_cpukind_s *a = _a;
  const struct hwloc_internal_cpukind_s *b = _b;
-  return a->ranking_value - b->ranking_value;
+  uint64_t arv = a->ranking_value;
+  uint64_t brv = b->ranking_value;
+  return arv < brv ? -1 : arv > brv ? 1 : 0;
 }

 /* this function requires ranking values to be unique */
@@ -469,6 +490,8 @@ hwloc_internal_cpukinds_rank(struct hwloc_topology *topology)
      heuristics = HWLOC_CPUKINDS_RANKING_NONE;
    else if (!strcmp(env, "coretype+frequency"))
      heuristics = HWLOC_CPUKINDS_RANKING_CORETYPE_FREQUENCY;
+    else if (!strcmp(env, "coretype+frequency_strict"))
+      heuristics = HWLOC_CPUKINDS_RANKING_CORETYPE_FREQUENCY_STRICT;
    else if (!strcmp(env, "coretype"))
      heuristics = HWLOC_CPUKINDS_RANKING_CORETYPE;
    else if (!strcmp(env, "frequency"))
@@ -481,16 +504,14 @@ hwloc_internal_cpukinds_rank(struct hwloc_topology *topology)
      heuristics = HWLOC_CPUKINDS_RANKING_FORCED_EFFICIENCY;
    else if (!strcmp(env, "no_forced_efficiency"))
      heuristics = HWLOC_CPUKINDS_RANKING_NO_FORCED_EFFICIENCY;
-    else if (!hwloc_hide_errors())
-      fprintf(stderr, "Failed to recognize HWLOC_CPUKINDS_RANKING value %s\n", env);
+    else if (hwloc_hide_errors() < 2)
+      fprintf(stderr, "hwloc: Failed to recognize HWLOC_CPUKINDS_RANKING value %s\n", env);
  }

  if (heuristics == HWLOC_CPUKINDS_RANKING_DEFAULT
      || heuristics == HWLOC_CPUKINDS_RANKING_NO_FORCED_EFFICIENCY) {
    /* default is forced_efficiency first */
    struct hwloc_cpukinds_info_summary summary;
-    enum hwloc_cpukinds_ranking subheuristics;
-    const char *arch;

    if (heuristics == HWLOC_CPUKINDS_RANKING_DEFAULT)
      hwloc_debug("Using default ranking strategy...\n");
@@ -508,16 +529,7 @@ hwloc_internal_cpukinds_rank(struct hwloc_topology *topology)
      goto failed;
    hwloc__cpukinds_summarize_info(topology, &summary);

-    arch = hwloc_obj_get_info_by_name(topology->levels[0][0], "Architecture");
-    /* TODO: rather coretype_frequency only on x86/Intel? */
-    if (arch && (!strncmp(arch, "arm", 3) || !strncmp(arch, "aarch", 5)))
-      /* then frequency on ARM */
-      subheuristics = HWLOC_CPUKINDS_RANKING_FREQUENCY;
-    else
-      /* or coretype+frequency otherwise */
-      subheuristics = HWLOC_CPUKINDS_RANKING_CORETYPE_FREQUENCY;
-
-    err = hwloc__cpukinds_try_rank_by_info(topology, subheuristics, &summary);
+    err = hwloc__cpukinds_try_rank_by_info(topology, HWLOC_CPUKINDS_RANKING_CORETYPE_FREQUENCY, &summary);
    free(summary.summaries);
    if (!err)
      goto ready;
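The cpukinds.c changes pack the core type into the high bits of `ranking_value` and the frequency into the low 20 bits. They also replace the `a->ranking_value - b->ranking_value` comparator with an explicit three-way comparison. Subtracting two 64-bit unsigned values wraps around, and truncating the result to `int` can give the wrong sign or even 0. The sketch below reproduces both ideas under simplified assumptions. `struct cpukind`, `make_ranking_value()` and `sort_kinds()` are hypothetical and are not hwloc's internal structures.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified, hypothetical cpukind record (not hwloc's internal struct). */
struct cpukind {
  unsigned core_type;     /* higher value ranks higher (assumption) */
  uint64_t freq_mhz;      /* base frequency if known, max otherwise */
  uint64_t ranking_value;
};

/* Same packing as the coretype+frequency heuristic in the diff:
 * core type shifted left by 20 bits, frequency added in the low bits. */
static uint64_t make_ranking_value(unsigned core_type, uint64_t freq_mhz)
{
  return ((uint64_t)core_type << 20) + freq_mhz;
}

/* Three-way comparison for qsort(). Unlike "return a - b", this never
 * wraps around or truncates a 64-bit difference into an int. */
static int compare_ranking_values(const void *_a, const void *_b)
{
  const struct cpukind *a = _a;
  const struct cpukind *b = _b;
  uint64_t arv = a->ranking_value;
  uint64_t brv = b->ranking_value;
  return arv < brv ? -1 : arv > brv ? 1 : 0;
}

/* Compute ranking values, then sort kinds from lowest to highest rank. */
static void sort_kinds(struct cpukind *kinds, size_t n)
{
  for (size_t i = 0; i < n; i++) {
    /* the frequency must not spill into the core-type bits */
    assert(kinds[i].freq_mhz < ((uint64_t)1 << 20));
    kinds[i].ranking_value = make_ranking_value(kinds[i].core_type, kinds[i].freq_mhz);
  }
  qsort(kinds, n, sizeof(*kinds), compare_ranking_values);
}
```

In this sketch a kind with a higher core type always ranks above one with a lower core type, whatever the frequencies. Frequency only breaks ties within the same core type. With the old subtraction, ranking values 0 and 2^32 would compare as equal after truncation to a 32-bit `int`.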
--- a/src/3rdparty/hwloc/src/distances.c
+++ b/src/3rdparty/hwloc/src/distances.c
@@ -1,5 +1,5 @@
 /*
- * Copyright © 2010-2020 Inria.  All rights reserved.
+ * Copyright © 2010-2021 Inria.  All rights reserved.
 * Copyright © 2011-2012 Université Bordeaux
 * Copyright © 2011 Cisco Systems, Inc.  All rights reserved.
 * See COPYING in top-level directory.
@@ -17,6 +17,37 @@
 static struct hwloc_internal_distances_s *
 hwloc__internal_distances_from_public(hwloc_topology_t topology, struct hwloc_distances_s *distances);

+static void
+hwloc__groups_by_distances(struct hwloc_topology *topology, unsigned nbobjs, struct hwloc_obj **objs, uint64_t *values, unsigned long kind, unsigned nbaccuracies, float *accuracies, int needcheck);
+
+static void
+hwloc_internal_distances_restrict(hwloc_obj_t *objs,
+				  uint64_t *indexes,
+				  hwloc_obj_type_t *different_types,
+				  uint64_t *values,
+				  unsigned nbobjs, unsigned disappeared);
+
+static void
+hwloc_internal_distances_print_matrix(struct hwloc_internal_distances_s *dist)
+{
+  unsigned nbobjs = dist->nbobjs;
+  hwloc_obj_t *objs = dist->objs;
+  hwloc_uint64_t *values = dist->values;
+  int gp = !HWLOC_DIST_TYPE_USE_OS_INDEX(dist->unique_type);
+  unsigned i, j;
+
+  fprintf(stderr, "%s", gp ? "gp_index" : "os_index");
+  for(j=0; j<nbobjs; j++)
+    fprintf(stderr, " % 5d", (int)(gp ? objs[j]->gp_index : objs[j]->os_index));
+  fprintf(stderr, "\n");
+  for(i=0; i<nbobjs; i++) {
+    fprintf(stderr, "  % 5d", (int)(gp ? objs[i]->gp_index : objs[i]->os_index));
+    for(j=0; j<nbobjs; j++)
+      fprintf(stderr, " % 5lld", (long long) values[i*nbobjs + j]);
+    fprintf(stderr, "\n");
+  }
+}
+
 /******************************************************
 * Global init, prepare, destroy, dup
 */
@@ -244,27 +275,33 @@ int hwloc_distances_release_remove(hwloc_topology_t topology,
  return 0;
 }

-/******************************************************
- * Add distances to the topology
+/*********************************************************
+ * Backend functions for adding distances to the topology
 */

+/* cancel a distances handle. only needed internally for now */
 static void
-hwloc__groups_by_distances(struct hwloc_topology *topology, unsigned nbobjs, struct hwloc_obj **objs, uint64_t *values, unsigned long kind, unsigned nbaccuracies, float *accuracies, int needcheck);
+hwloc_backend_distances_add__cancel(struct hwloc_internal_distances_s *dist)
+{
+  /* everything is set to NULL in hwloc_backend_distances_add_create() */
+  free(dist->name);
+  free(dist->indexes);
+  free(dist->objs);
+  free(dist->different_types);
+  free(dist->values);
+  free(dist);
+}

-/* insert a distance matrix in the topology.
- * the caller gives us the distances and objs pointers, we'll free them later.
+/* prepare a distances handle for later commit in the topology.
+ * we duplicate the caller's name.
 */
-static int
-hwloc_internal_distances__add(hwloc_topology_t topology, const char *name,
-			      hwloc_obj_type_t unique_type, hwloc_obj_type_t *different_types,
-			      unsigned nbobjs, hwloc_obj_t *objs, uint64_t *indexes, uint64_t *values,
-			      unsigned long kind, unsigned iflags)
+hwloc_backend_distances_add_handle_t
+hwloc_backend_distances_add_create(hwloc_topology_t topology,
+                                   const char *name, unsigned long kind, unsigned long flags)
 {
  struct hwloc_internal_distances_s *dist;

-  if (different_types) {
-    kind |= HWLOC_DISTANCES_KIND_HETEROGENEOUS_TYPES; /* the user isn't forced to give it */
-  } else if (kind & HWLOC_DISTANCES_KIND_HETEROGENEOUS_TYPES) {
+  if (flags) {
    errno = EINVAL;
    goto err;
  }
@@ -273,110 +310,54 @@ hwloc_internal_distances__add(hwloc_topology_t topology, const char *name,
  if (!dist)
    goto err;

-  if (name)
+  if (name) {
    dist->name = strdup(name); /* ignore failure */
-
-  dist->unique_type = unique_type;
-  dist->different_types = different_types;
-  dist->nbobjs = nbobjs;
-  dist->kind = kind;
-  dist->iflags = iflags;
-
-  assert(!!(iflags & HWLOC_INTERNAL_DIST_FLAG_OBJS_VALID) == !!objs);
-
-  if (!objs) {
-    assert(indexes);
-    /* we only have indexes, we'll refresh objs from there */
-    dist->indexes = indexes;
-    dist->objs = calloc(nbobjs, sizeof(hwloc_obj_t));
-    if (!dist->objs)
+    if (!dist->name)
      goto err_with_dist;
-
-  } else {
-    unsigned i;
-    assert(!indexes);
-    /* we only have objs, generate the indexes arrays so that we can refresh objs later */
-    dist->objs = objs;
-    dist->indexes = malloc(nbobjs * sizeof(*dist->indexes));
-    if (!dist->indexes)
-      goto err_with_dist;
-    if (HWLOC_DIST_TYPE_USE_OS_INDEX(dist->unique_type)) {
-      for(i=0; i<nbobjs; i++)
-	dist->indexes[i] = objs[i]->os_index;
-    } else {
-      for(i=0; i<nbobjs; i++)
-	dist->indexes[i] = objs[i]->gp_index;
-    }
  }

-  dist->values = values;
+  dist->kind = kind;
+  dist->iflags = HWLOC_INTERNAL_DIST_FLAG_NOT_COMMITTED;
+
+  dist->unique_type = HWLOC_OBJ_TYPE_NONE;
+  dist->different_types = NULL;
+  dist->nbobjs = 0;
+  dist->indexes = NULL;
+  dist->objs = NULL;
+  dist->values = NULL;

  dist->id = topology->next_dist_id++;
-
-  if (topology->last_dist)
-    topology->last_dist->next = dist;
-  else
-    topology->first_dist = dist;
-  dist->prev = topology->last_dist;
-  dist->next = NULL;
-  topology->last_dist = dist;
-  return 0;
+  return dist;

 err_with_dist:
-  if (name)
-    free(dist->name);
-  free(dist);
+  hwloc_backend_distances_add__cancel(dist);
 err:
-  free(different_types);
-  free(objs);
-  free(indexes);
-  free(values);
-  return -1;
+  return NULL;
 }

-int hwloc_internal_distances_add_by_index(hwloc_topology_t topology, const char *name,
-					  hwloc_obj_type_t unique_type, hwloc_obj_type_t *different_types, unsigned nbobjs, uint64_t *indexes, uint64_t *values,
-					  unsigned long kind, unsigned long flags)
+/* attach objects and values to a distances handle.
+ * on success, objs and values arrays are attached and will be freed with the distances.
+ * on failure, the handle is freed.
+ */
+int
+hwloc_backend_distances_add_values(hwloc_topology_t topology __hwloc_attribute_unused,
+                                   hwloc_backend_distances_add_handle_t handle,
+                                   unsigned nbobjs, hwloc_obj_t *objs,
+                                   hwloc_uint64_t *values,
+                                   unsigned long flags)
 {
-  unsigned iflags = 0; /* objs not valid */
-
-  if (nbobjs < 2) {
-    errno = EINVAL;
-    goto err;
-  }
-
-  /* cannot group without objects,
-   * and we don't group from XML anyway since the hwloc that generated the XML should have grouped already.
-   */
-  if (flags & HWLOC_DISTANCES_ADD_FLAG_GROUP) {
-    errno = EINVAL;
-    goto err;
-  }
-
-  return hwloc_internal_distances__add(topology, name, unique_type, different_types, nbobjs, NULL, indexes, values, kind, iflags);
-
- err:
-  free(indexes);
-  free(values);
-  free(different_types);
-  return -1;
-}
-
-static void
-hwloc_internal_distances_restrict(hwloc_obj_t *objs,
-				  uint64_t *indexes,
-				  uint64_t *values,
-				  unsigned nbobjs, unsigned disappeared);
-
-int hwloc_internal_distances_add(hwloc_topology_t topology, const char *name,
-				 unsigned nbobjs, hwloc_obj_t *objs, uint64_t *values,
-				 unsigned long kind, unsigned long flags)
-{
-  hwloc_obj_type_t unique_type, *different_types;
+  struct hwloc_internal_distances_s *dist = handle;
+  hwloc_obj_type_t unique_type, *different_types = NULL;
+  hwloc_uint64_t *indexes = NULL;
  unsigned i, disappeared = 0;
-  unsigned iflags = HWLOC_INTERNAL_DIST_FLAG_OBJS_VALID;

-  if (nbobjs < 2) {
+  if (dist->nbobjs || !(dist->iflags & HWLOC_INTERNAL_DIST_FLAG_NOT_COMMITTED)) {
+    /* target distances is already set */
+    errno = EINVAL;
+    goto err;
+  }
+
+  if (flags || nbobjs < 2 || !objs || !values) {
    errno = EINVAL;
    goto err;
  }
@@ -389,15 +370,18 @@ int hwloc_internal_distances_add(hwloc_topology_t topology, const char *name,
    /* some objects are NULL */
    if (disappeared == nbobjs) {
      /* nothing left, drop the matrix */
-      free(objs);
-      free(values);
-      return 0;
+      errno = ENOENT;
+      goto err;
    }
    /* restrict the matrix */
-    hwloc_internal_distances_restrict(objs, NULL, values, nbobjs, disappeared);
+    hwloc_internal_distances_restrict(objs, NULL, NULL, values, nbobjs, disappeared);
    nbobjs -= disappeared;
  }

+  indexes = malloc(nbobjs * sizeof(*indexes));
+  if (!indexes)
+    goto err;
+
  unique_type = objs[0]->type;
  for(i=1; i<nbobjs; i++)
    if (objs[i]->type != unique_type) {
@@ -408,16 +392,108 @@ int hwloc_internal_distances_add(hwloc_topology_t topology, const char *name,
    /* heterogeneous types */
    different_types = malloc(nbobjs * sizeof(*different_types));
    if (!different_types)
-      goto err;
+      goto err_with_indexes;
    for(i=0; i<nbobjs; i++)
      different_types[i] = objs[i]->type;
-
-  } else {
-    /* homogeneous types */
-    different_types = NULL;
  }

-  if (topology->grouping && (flags & HWLOC_DISTANCES_ADD_FLAG_GROUP) && !different_types) {
+  dist->nbobjs = nbobjs;
+  dist->objs = objs;
+  dist->iflags |= HWLOC_INTERNAL_DIST_FLAG_OBJS_VALID;
+  dist->indexes = indexes;
+  dist->unique_type = unique_type;
+  dist->different_types = different_types;
+  dist->values = values;
+
+  if (different_types)
+    dist->kind |= HWLOC_DISTANCES_KIND_HETEROGENEOUS_TYPES;
+
+  if (HWLOC_DIST_TYPE_USE_OS_INDEX(dist->unique_type)) {
+      for(i=0; i<nbobjs; i++)
+	dist->indexes[i] = objs[i]->os_index;
+    } else {
+      for(i=0; i<nbobjs; i++)
+	dist->indexes[i] = objs[i]->gp_index;
+    }
+
+  return 0;
+
+ err_with_indexes:
+  free(indexes);
+ err:
+  hwloc_backend_distances_add__cancel(dist);
+  return -1;
+}
+
+/* attach objects and values to a distance handle.
+ * on success, objs and values arrays are attached and will be freed with the distances.
+ * on failure, the handle is freed.
+ */
+static int
+hwloc_backend_distances_add_values_by_index(hwloc_topology_t topology __hwloc_attribute_unused,
+                                            hwloc_backend_distances_add_handle_t handle,
+                                            unsigned nbobjs, hwloc_obj_type_t unique_type, hwloc_obj_type_t *different_types, hwloc_uint64_t *indexes,
+                                            hwloc_uint64_t *values)
+{
+  struct hwloc_internal_distances_s *dist = handle;
+  hwloc_obj_t *objs;
+
+  if (dist->nbobjs || !(dist->iflags & HWLOC_INTERNAL_DIST_FLAG_NOT_COMMITTED)) {
+    /* target distances is already set */
+    errno = EINVAL;
+    goto err;
+  }
+  if (nbobjs < 2 || !indexes || !values || (unique_type == HWLOC_OBJ_TYPE_NONE && !different_types)) {
+    errno = EINVAL;
+    goto err;
+  }
+
+  objs = malloc(nbobjs * sizeof(*objs));
+  if (!objs)
+    goto err;
+
+  dist->nbobjs = nbobjs;
+  dist->objs = objs;
+  dist->indexes = indexes;
+  dist->unique_type = unique_type;
+  dist->different_types = different_types;
+  dist->values = values;
+
+  if (different_types)
+    dist->kind |= HWLOC_DISTANCES_KIND_HETEROGENEOUS_TYPES;
+
+  return 0;
+
+ err:
+  hwloc_backend_distances_add__cancel(dist);
+  return -1;
+}
+
+/* commit a distances handle.
+ * on failure, the handle is freed with its objects and values arrays.
+ */
+int
+hwloc_backend_distances_add_commit(hwloc_topology_t topology,
+                                   hwloc_backend_distances_add_handle_t handle,
+                                   unsigned long flags)
+{
+  struct hwloc_internal_distances_s *dist = handle;
+
+  if (!dist->nbobjs || !(dist->iflags & HWLOC_INTERNAL_DIST_FLAG_NOT_COMMITTED)) {
+    /* target distances not ready for commit */
+    errno = EINVAL;
+    goto err;
+  }
+
+  if ((flags & HWLOC_DISTANCES_ADD_FLAG_GROUP) && !dist->objs) {
+    /* cannot group without objects,
+     * and we don't group from XML anyway since the hwloc that generated the XML should have grouped already.
+     */
+    errno = EINVAL;
+    goto err;
+  }
+
+  if (topology->grouping && (flags & HWLOC_DISTANCES_ADD_FLAG_GROUP) && !dist->different_types) {
    float full_accuracy = 0.f;
    float *accuracies;
    unsigned nbaccuracies;
@@ -431,26 +507,94 @@ int hwloc_internal_distances_add(hwloc_topology_t topology, const char *name,
    }

    if (topology->grouping_verbose) {
-      unsigned j;
-      int gp = !HWLOC_DIST_TYPE_USE_OS_INDEX(unique_type);
      fprintf(stderr, "Trying to group objects using distance matrix:\n");
-      fprintf(stderr, "%s", gp ? "gp_index" : "os_index");
-      for(j=0; j<nbobjs; j++)
-	fprintf(stderr, " % 5d", (int)(gp ? objs[j]->gp_index : objs[j]->os_index));
-      fprintf(stderr, "\n");
-      for(i=0; i<nbobjs; i++) {
-	fprintf(stderr, "  % 5d", (int)(gp ? objs[i]->gp_index : objs[i]->os_index));
-	for(j=0; j<nbobjs; j++)
-	  fprintf(stderr, " % 5lld", (long long) values[i*nbobjs + j]);
-	fprintf(stderr, "\n");
-      }
+      hwloc_internal_distances_print_matrix(dist);
    }

-    hwloc__groups_by_distances(topology, nbobjs, objs, values,
-			       kind, nbaccuracies, accuracies, 1 /* check the first matrice */);
+    hwloc__groups_by_distances(topology, dist->nbobjs, dist->objs, dist->values,
+			       dist->kind, nbaccuracies, accuracies, 1 /* check the first matrix */);
  }

-  return hwloc_internal_distances__add(topology, name, unique_type, different_types, nbobjs, objs, NULL, values, kind, iflags);
+  if (topology->last_dist)
+    topology->last_dist->next = dist;
+  else
+    topology->first_dist = dist;
+  dist->prev = topology->last_dist;
+  dist->next = NULL;
+  topology->last_dist = dist;
+
+  dist->iflags &= ~HWLOC_INTERNAL_DIST_FLAG_NOT_COMMITTED;
+  return 0;
+
+ err:
+  hwloc_backend_distances_add__cancel(dist);
+  return -1;
+}
+
+/* all-in-one backend function not exported to plugins, only used by XML for now */
+int hwloc_internal_distances_add_by_index(hwloc_topology_t topology, const char *name,
+                                          hwloc_obj_type_t unique_type, hwloc_obj_type_t *different_types, unsigned nbobjs, uint64_t *indexes, uint64_t *values,
+                                          unsigned long kind, unsigned long flags)
+{
+  hwloc_backend_distances_add_handle_t handle;
+  int err;
+
+  handle = hwloc_backend_distances_add_create(topology, name, kind, 0);
+  if (!handle)
+    goto err;
+
+  err = hwloc_backend_distances_add_values_by_index(topology, handle,
+                                                    nbobjs, unique_type, different_types, indexes,
+                                                    values);
+  if (err < 0)
+    goto err;
+
+  /* arrays are now attached to the handle */
+  indexes = NULL;
+  different_types = NULL;
+  values = NULL;
+
+  err = hwloc_backend_distances_add_commit(topology, handle, flags);
+  if (err < 0)
+    goto err;
+
+  return 0;
+
+ err:
+  free(indexes);
+  free(different_types);
+  free(values);
+  return -1;
+}
+
+/* all-in-one backend function not exported to plugins, used by OS backends */
+int hwloc_internal_distances_add(hwloc_topology_t topology, const char *name,
+                                 unsigned nbobjs, hwloc_obj_t *objs, uint64_t *values,
+                                 unsigned long kind, unsigned long flags)
+{
+  hwloc_backend_distances_add_handle_t handle;
+  int err;
+
+  handle = hwloc_backend_distances_add_create(topology, name, kind, 0);
+  if (!handle)
+    goto err;
+
+  err = hwloc_backend_distances_add_values(topology, handle,
+                                           nbobjs, objs,
+                                           values,
+                                           0);
+  if (err < 0)
+    goto err;
+
+  /* arrays are now attached to the handle */
+  objs = NULL;
+  values = NULL;
+
+  err = hwloc_backend_distances_add_commit(topology, handle, flags);
+  if (err < 0)
+    goto err;
+
+  return 0;

 err:
  free(objs);
@@ -458,44 +602,54 @@ int hwloc_internal_distances_add(hwloc_topology_t topology, const char *name,
  return -1;
 }

+/********************************
+ * User API for adding distances
+ */
+
 #define HWLOC_DISTANCES_KIND_FROM_ALL (HWLOC_DISTANCES_KIND_FROM_OS|HWLOC_DISTANCES_KIND_FROM_USER)
 #define HWLOC_DISTANCES_KIND_MEANS_ALL (HWLOC_DISTANCES_KIND_MEANS_LATENCY|HWLOC_DISTANCES_KIND_MEANS_BANDWIDTH)
-#define HWLOC_DISTANCES_KIND_ALL (HWLOC_DISTANCES_KIND_FROM_ALL|HWLOC_DISTANCES_KIND_MEANS_ALL)
+#define HWLOC_DISTANCES_KIND_ALL (HWLOC_DISTANCES_KIND_FROM_ALL|HWLOC_DISTANCES_KIND_MEANS_ALL|HWLOC_DISTANCES_KIND_HETEROGENEOUS_TYPES)
 #define HWLOC_DISTANCES_ADD_FLAG_ALL (HWLOC_DISTANCES_ADD_FLAG_GROUP|HWLOC_DISTANCES_ADD_FLAG_GROUP_INACCURATE)

-/* The actual function exported to the user
- */
-int hwloc_distances_add(hwloc_topology_t topology,
-			unsigned nbobjs, hwloc_obj_t *objs, hwloc_uint64_t *values,
-			unsigned long kind, unsigned long flags)
+void * hwloc_distances_add_create(hwloc_topology_t topology,
+                                  const char *name, unsigned long kind,
+                                  unsigned long flags)
+{
+  if (!topology->is_loaded) {
+    errno = EINVAL;
+    return NULL;
+  }
+  if (topology->adopted_shmem_addr) {
+    errno = EPERM;
+    return NULL;
+  }
+  if ((kind & ~HWLOC_DISTANCES_KIND_ALL)
+      || hwloc_weight_long(kind & HWLOC_DISTANCES_KIND_FROM_ALL) != 1
+      || hwloc_weight_long(kind & HWLOC_DISTANCES_KIND_MEANS_ALL) != 1) {
+    errno = EINVAL;
+    return NULL;
+  }
+
+  return hwloc_backend_distances_add_create(topology, name, kind, flags);
+}
+
+int hwloc_distances_add_values(hwloc_topology_t topology,
+                               void *handle,
+                               unsigned nbobjs, hwloc_obj_t *objs,
+                               hwloc_uint64_t *values,
+                               unsigned long flags)
 {
  unsigned i;
  uint64_t *_values;
  hwloc_obj_t *_objs;
  int err;

-  if (nbobjs < 2 || !objs || !values || !topology->is_loaded) {
-    errno = EINVAL;
-    return -1;
-  }
-  if (topology->adopted_shmem_addr) {
-    errno = EPERM;
-    return -1;
-  }
-  if ((kind & ~HWLOC_DISTANCES_KIND_ALL)
-      || hwloc_weight_long(kind & HWLOC_DISTANCES_KIND_FROM_ALL) != 1
-      || hwloc_weight_long(kind & HWLOC_DISTANCES_KIND_MEANS_ALL) != 1
-      || (flags & ~HWLOC_DISTANCES_ADD_FLAG_ALL)) {
-    errno = EINVAL;
-    return -1;
-  }
-
  /* no strict need to check for duplicates, things shouldn't break */

  for(i=1; i<nbobjs; i++)
    if (!objs[i]) {
      errno = EINVAL;
-      return -1;
+      goto out;
    }

  /* copy the input arrays and give them to the topology */
@@ -506,22 +660,78 @@ int hwloc_distances_add(hwloc_topology_t topology,

  memcpy(_objs, objs, nbobjs*sizeof(hwloc_obj_t));
  memcpy(_values, values, nbobjs*nbobjs*sizeof(*_values));
-  err = hwloc_internal_distances_add(topology, NULL, nbobjs, _objs, _values, kind, flags);
-  if (err < 0)
-    goto out; /* _objs and _values freed in hwloc_internal_distances_add() */
+
+  err = hwloc_backend_distances_add_values(topology, handle, nbobjs, _objs, _values, flags);
+  if (err < 0) {
+    /* handle was canceled inside hwloc_backend_distances_add_values */
+    handle = NULL;
+    goto out_with_arrays;
+  }
+
+  return 0;
+
+ out_with_arrays:
+  free(_objs);
+  free(_values);
+ out:
+  if (handle)
+    hwloc_backend_distances_add__cancel(handle);
+  return -1;
+}
+
+int
+hwloc_distances_add_commit(hwloc_topology_t topology,
+                           void *handle,
+                           unsigned long flags)
+{
+  int err;
+
+  if (flags & ~HWLOC_DISTANCES_ADD_FLAG_ALL) {
+    errno = EINVAL;
+    goto out;
+  }
+
+  err = hwloc_backend_distances_add_commit(topology, handle, flags);
+  if (err < 0) {
+    /* handle was canceled inside hwloc_backend_distances_add_commit */
+    handle = NULL;
+    goto out;
+  }

  /* in case we added some groups, see if we need to reconnect */
  hwloc_topology_reconnect(topology, 0);

  return 0;

- out_with_arrays:
-  free(_values);
-  free(_objs);
 out:
+  if (handle)
+    hwloc_backend_distances_add__cancel(handle);
  return -1;
 }

+/* deprecated all-in-one user function */
+int hwloc_distances_add(hwloc_topology_t topology,
+			unsigned nbobjs, hwloc_obj_t *objs, hwloc_uint64_t *values,
+			unsigned long kind, unsigned long flags)
+{
+  void *handle;
+  int err;
+
+  handle = hwloc_distances_add_create(topology, NULL, kind, 0);
+  if (!handle)
+    return -1;
+
+  err = hwloc_distances_add_values(topology, handle, nbobjs, objs, values, 0);
+  if (err < 0)
+    return -1;
+
+  err = hwloc_distances_add_commit(topology, handle, flags);
+  if (err < 0)
+    return -1;
+
+  return 0;
+}
+
 /******************************************************
 * Refresh objects in distances
 */
@@ -529,6 +739,7 @@ int hwloc_distances_add(hwloc_topology_t topology,
 static void
 hwloc_internal_distances_restrict(hwloc_obj_t *objs,
 				  uint64_t *indexes,
+                                  hwloc_obj_type_t *different_types,
 				  uint64_t *values,
 				  unsigned nbobjs, unsigned disappeared)
 {
@@ -550,6 +761,8 @@ hwloc_internal_distances_restrict(hwloc_obj_t *objs,
      objs[newi] = objs[i];
      if (indexes)
 	indexes[newi] = indexes[i];
+      if (different_types)
+        different_types[newi] = different_types[i];
      newi++;
    }
 }
@@ -594,7 +807,7 @@ hwloc_internal_distances_refresh_one(hwloc_topology_t topology,
    return -1;

  if (disappeared) {
-    hwloc_internal_distances_restrict(objs, dist->indexes, dist->values, nbobjs, disappeared);
+    hwloc_internal_distances_restrict(objs, dist->indexes, dist->different_types, dist->values, nbobjs, disappeared);
    dist->nbobjs -= disappeared;
  }

@@ -1087,3 +1300,210 @@ hwloc__groups_by_distances(struct hwloc_topology *topology,
 out_with_groupids:
  free(groupids);
 }
+
+static int
+hwloc__distances_transform_remove_null(struct hwloc_distances_s *distances)
+{
+  hwloc_uint64_t *values = distances->values;
+  hwloc_obj_t *objs = distances->objs;
+  unsigned i, nb, nbobjs = distances->nbobjs;
+  hwloc_obj_type_t unique_type;
+
+  for(i=0, nb=0; i<nbobjs; i++)
+    if (objs[i])
+      nb++;
+
+  if (nb < 2) {
+    errno = EINVAL;
+    return -1;
+  }
+
+  if (nb == nbobjs)
+    return 0;
+
+  hwloc_internal_distances_restrict(objs, NULL, NULL, values, nbobjs, nbobjs-nb);
+  distances->nbobjs = nb;
+
+  /* update HWLOC_DISTANCES_KIND_HETEROGENEOUS_TYPES for convenience */
+  unique_type = objs[0]->type;
+  for(i=1; i<nb; i++)
+    if (objs[i]->type != unique_type) {
+      unique_type = HWLOC_OBJ_TYPE_NONE;
+      break;
+    }
+  if (unique_type == HWLOC_OBJ_TYPE_NONE)
+    distances->kind |= HWLOC_DISTANCES_KIND_HETEROGENEOUS_TYPES;
+  else
+    distances->kind &= ~HWLOC_DISTANCES_KIND_HETEROGENEOUS_TYPES;
+
+  return 0;
+}
+
+static int
+hwloc__distances_transform_links(struct hwloc_distances_s *distances)
+{
+  /* FIXME: we should look for the greatest common divisor
+   * but we just use the smallest positive value, that's enough for current use-cases.
+   * We'll return -1 in other cases.
+   */
+  hwloc_uint64_t divider, *values = distances->values;
+  unsigned i, nbobjs = distances->nbobjs;
+
+  if (!(distances->kind & HWLOC_DISTANCES_KIND_MEANS_BANDWIDTH)) {
+    errno = EINVAL;
+    return -1;
+  }
+
+  for(i=0; i<nbobjs; i++)
+    values[i*nbobjs+i] = 0;
+
+  /* find the smallest positive value */
+  divider = 0;
+  for(i=0; i<nbobjs*nbobjs; i++)
+    if (values[i] && (!divider || values[i] < divider))
+      divider = values[i];
+
+  if (!divider)
+    /* only zeroes? do nothing */
+    return 0;
+
+  /* check it divides all values */
+  for(i=0; i<nbobjs*nbobjs; i++)
+    if (values[i]%divider) {
+      errno = ENOENT;
+      return -1;
+    }
+
+  /* ok, now divide for real */
+  for(i=0; i<nbobjs*nbobjs; i++)
+    values[i] /= divider;
+
+  return 0;
+}
+
+static __hwloc_inline int is_nvswitch(hwloc_obj_t obj)
+{
+  return obj && obj->subtype && !strcmp(obj->subtype, "NVSwitch");
+}
+
+static int
+hwloc__distances_transform_merge_switch_ports(hwloc_topology_t topology,
+                                              struct hwloc_distances_s *distances)
+{
+  struct hwloc_internal_distances_s *dist = hwloc__internal_distances_from_public(topology, distances);
+  hwloc_obj_t *objs = distances->objs;
+  hwloc_uint64_t *values = distances->values;
+  unsigned first, i, j, nbobjs = distances->nbobjs;
+
+  if (strcmp(dist->name, "NVLinkBandwidth")) {
+    errno = EINVAL;
+    return -1;
+  }
+
+  /* find the first port */
+  first = (unsigned) -1;
+  for(i=0; i<nbobjs; i++)
+    if (is_nvswitch(objs[i])) {
+      first = i;
+      break;
+    }
+  if (first == (unsigned)-1) {
+    errno = ENOENT;
+    return -1;
+  }
+
+  for(j=i+1; j<nbobjs; j++) {
+    if (is_nvswitch(objs[j])) {
+      /* another port, merge it */
+      unsigned k;
+      for(k=0; k<nbobjs; k++) {
+        if (k==i || k==j)
+          continue;
+        values[k*nbobjs+i] += values[k*nbobjs+j];
+        values[k*nbobjs+j] = 0;
+        values[i*nbobjs+k] += values[j*nbobjs+k];
+        values[j*nbobjs+k] = 0;
+      }
+      values[i*nbobjs+i] += values[j*nbobjs+j];
+      values[j*nbobjs+j] = 0;
+      /* the caller will also call REMOVE_NULL to remove other ports */
+      objs[j] = NULL;
+    }
+  }
+
+  return 0;
+}
+
+static int
+hwloc__distances_transform_transitive_closure(hwloc_topology_t topology,
+                                              struct hwloc_distances_s *distances)
+{
+  struct hwloc_internal_distances_s *dist = hwloc__internal_distances_from_public(topology, distances);
+  hwloc_obj_t *objs = distances->objs;
+  hwloc_uint64_t *values = distances->values;
+  unsigned nbobjs = distances->nbobjs;
+  unsigned i, j, k;
+
+  if (strcmp(dist->name, "NVLinkBandwidth")) {
+    errno = EINVAL;
+    return -1;
+  }
+
+  for(i=0; i<nbobjs; i++) {
+    hwloc_uint64_t bw_i2sw = 0;
+    if (is_nvswitch(objs[i]))
+      continue;
+    /* count our BW to the switch */
+    for(k=0; k<nbobjs; k++)
+      if (is_nvswitch(objs[k]))
+        bw_i2sw += values[i*nbobjs+k];
+
+    for(j=0; j<nbobjs; j++) {
+      hwloc_uint64_t bw_sw2j = 0;
+      if (i == j || is_nvswitch(objs[j]))
+        continue;
+      /* count our BW from the switch */
+      for(k=0; k<nbobjs; k++)
+        if (is_nvswitch(objs[k]))
+          bw_sw2j += values[k*nbobjs+j];
+
+      /* bandwidth from i to j is now min(i2sw,sw2j) */
+      values[i*nbobjs+j] = bw_i2sw > bw_sw2j ? bw_sw2j : bw_i2sw;
+    }
+  }
+
+  return 0;
+}
+
+int
+hwloc_distances_transform(hwloc_topology_t topology,
+                          struct hwloc_distances_s *distances,
+                          enum hwloc_distances_transform_e transform,
+                          void *transform_attr,
+                          unsigned long flags)
+{
+  if (flags || transform_attr) {
+    errno = EINVAL;
+    return -1;
+  }
+
+  switch (transform) {
+  case HWLOC_DISTANCES_TRANSFORM_REMOVE_NULL:
+    return hwloc__distances_transform_remove_null(distances);
+  case HWLOC_DISTANCES_TRANSFORM_LINKS:
+    return hwloc__distances_transform_links(distances);
+  case HWLOC_DISTANCES_TRANSFORM_MERGE_SWITCH_PORTS:
+  {
+    int err;
+    err = hwloc__distances_transform_merge_switch_ports(topology, distances);
+    if (!err)
+      err = hwloc__distances_transform_remove_null(distances);
+    return err;
+  }
+  case HWLOC_DISTANCES_TRANSFORM_TRANSITIVE_CLOSURE:
+    return hwloc__distances_transform_transitive_closure(topology, distances);
+  default:
+    errno = EINVAL;
+    return -1;
+  }
+}
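The `LINKS` and `TRANSITIVE_CLOSURE` transforms in this file are plain rewrites of a row-major `values[i*nbobjs + j]` matrix, so their arithmetic can be illustrated outside hwloc. The following is a minimal standalone sketch, not hwloc's API: `links_transform()`, `transitive_closure()` and the `is_switch` flag array (standing in for `is_nvswitch()`) are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Zero the diagonal, then divide every entry by the smallest positive value.
 * Returns -1 if that value does not divide all entries (hwloc's FIXME notes
 * that a true GCD would handle more cases). */
static int links_transform(uint64_t *values, unsigned n)
{
  uint64_t divider = 0;
  unsigned i;

  assert(values != NULL);
  for (i = 0; i < n; i++)
    values[i*n + i] = 0;                 /* ignore self bandwidth */

  for (i = 0; i < n*n; i++)              /* find the smallest positive entry */
    if (values[i] && (!divider || values[i] < divider))
      divider = values[i];
  if (!divider)
    return 0;                            /* only zeroes: nothing to do */

  for (i = 0; i < n*n; i++)
    if (values[i] % divider)
      return -1;                         /* not a common divisor */
  for (i = 0; i < n*n; i++)
    values[i] /= divider;                /* bandwidth units -> link counts */
  return 0;
}

/* For every pair of non-switch objects (i, j), replace the bandwidth with
 * min(total bandwidth i->switches, total bandwidth switches->j). */
static void transitive_closure(uint64_t *values, const int *is_switch, unsigned n)
{
  unsigned i, j, k;

  for (i = 0; i < n; i++) {
    uint64_t i2sw = 0;
    if (is_switch[i])
      continue;
    for (k = 0; k < n; k++)              /* sum of row i over switch columns */
      if (is_switch[k])
        i2sw += values[i*n + k];
    for (j = 0; j < n; j++) {
      uint64_t sw2j = 0;
      if (i == j || is_switch[j])
        continue;
      for (k = 0; k < n; k++)            /* sum of column j over switch rows */
        if (is_switch[k])
          sw2j += values[k*n + j];
      values[i*n + j] = i2sw < sw2j ? i2sw : sw2j;
    }
  }
}
```

For example, with two GPUs attached to one switch by 100 and 50 units of bandwidth, `links_transform()` divides by 50 and yields 2 and 1 links, while `transitive_closure()` sets both GPU-to-GPU entries to min(100, 50) = 50. Using the smallest value instead of the GCD means inputs such as 150 and 100 are rejected with -1.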
--- a/src/3rdparty/hwloc/src/memattrs.c
+++ b/src/3rdparty/hwloc/src/memattrs.c
@@ -1,5 +1,5 @@
 /*
- * Copyright © 2020 Inria.  All rights reserved.
+ * Copyright © 2020-2021 Inria.  All rights reserved.
 * See COPYING in top-level directory.
 */

@@ -127,6 +127,8 @@ hwloc_internal_memattrs_dup(struct hwloc_topology *new, struct hwloc_topology *o
  struct hwloc_internal_memattr_s *imattrs;
  hwloc_memattr_id_t id;

+  /* old->nr_memattrs is always > 0 thanks to default memattrs */
+
  imattrs = hwloc_tma_malloc(tma, old->nr_memattrs * sizeof(*imattrs));
  if (!imattrs)
    return -1;
--- a/src/3rdparty/hwloc/src/pci-common.c
+++ b/src/3rdparty/hwloc/src/pci-common.c
@@ -1,5 +1,5 @@
 /*
- * Copyright © 2009-2020 Inria.  All rights reserved.
+ * Copyright © 2009-2021 Inria.  All rights reserved.
 * See COPYING in top-level directory.
 */

@@ -146,8 +146,9 @@ hwloc_pci_discovery_prepare(struct hwloc_topology *topology)
 	  }
 	  free(buffer);
 	} else {
-	  fprintf(stderr, "Ignoring HWLOC_PCI_LOCALITY file `%s' too large (%lu bytes)\n",
-		  env, (unsigned long) st.st_size);
+          if (hwloc_hide_errors() < 2)
+            fprintf(stderr, "hwloc/pci: Ignoring HWLOC_PCI_LOCALITY file `%s' too large (%lu bytes)\n",
+                    env, (unsigned long) st.st_size);
 	}
      }
      close(fd);
@@ -206,8 +207,11 @@ hwloc_pci_traverse_print_cb(void * cbdata __hwloc_attribute_unused,
    else
      hwloc_debug("%s Bridge [%04x:%04x]", busid,
 		  pcidev->attr->pcidev.vendor_id, pcidev->attr->pcidev.device_id);
-    hwloc_debug(" to %04x:[%02x:%02x]\n",
-		pcidev->attr->bridge.downstream.pci.domain, pcidev->attr->bridge.downstream.pci.secondary_bus, pcidev->attr->bridge.downstream.pci.subordinate_bus);
+    if (pcidev->attr->bridge.downstream_type == HWLOC_OBJ_BRIDGE_PCI)
+      hwloc_debug(" to %04x:[%02x:%02x]\n",
+                  pcidev->attr->bridge.downstream.pci.domain, pcidev->attr->bridge.downstream.pci.secondary_bus, pcidev->attr->bridge.downstream.pci.subordinate_bus);
+    else
+      assert(0);
  } else
    hwloc_debug("%s Device [%04x:%04x (%04x:%04x) rev=%02x class=%04x]\n", busid,
 		pcidev->attr->pcidev.vendor_id, pcidev->attr->pcidev.device_id,
@@ -251,11 +255,11 @@ hwloc_pci_compare_busids(struct hwloc_obj *a, struct hwloc_obj *b)
  if (a->attr->pcidev.domain > b->attr->pcidev.domain)
    return HWLOC_PCI_BUSID_HIGHER;

-  if (a->type == HWLOC_OBJ_BRIDGE
+  if (a->type == HWLOC_OBJ_BRIDGE && a->attr->bridge.downstream_type == HWLOC_OBJ_BRIDGE_PCI
      && b->attr->pcidev.bus >= a->attr->bridge.downstream.pci.secondary_bus
      && b->attr->pcidev.bus <= a->attr->bridge.downstream.pci.subordinate_bus)
    return HWLOC_PCI_BUSID_SUPERSET;
-  if (b->type == HWLOC_OBJ_BRIDGE
+  if (b->type == HWLOC_OBJ_BRIDGE && b->attr->bridge.downstream_type == HWLOC_OBJ_BRIDGE_PCI
      && a->attr->pcidev.bus >= b->attr->bridge.downstream.pci.secondary_bus
      && a->attr->pcidev.bus <= b->attr->bridge.downstream.pci.subordinate_bus)
    return HWLOC_PCI_BUSID_INCLUDED;
@@ -302,7 +306,7 @@ hwloc_pci_add_object(struct hwloc_obj *parent, struct hwloc_obj **parent_io_firs
      new->next_sibling = *curp;
      *curp = new;
      new->parent = parent;
-      if (new->type == HWLOC_OBJ_BRIDGE) {
+      if (new->type == HWLOC_OBJ_BRIDGE && new->attr->bridge.downstream_type == HWLOC_OBJ_BRIDGE_PCI) {
 	/* look at remaining siblings and move some below new */
 	childp = &new->io_first_child;
 	curp = &new->next_sibling;
@@ -329,7 +333,7 @@ hwloc_pci_add_object(struct hwloc_obj *parent, struct hwloc_obj **parent_io_firs
    }
    case HWLOC_PCI_BUSID_EQUAL: {
      static int reported = 0;
-      if (!reported && !hwloc_hide_errors()) {
+      if (!reported && hwloc_hide_errors() < 2) {
        fprintf(stderr, "*********************************************************\n");
        fprintf(stderr, "* hwloc %s received invalid PCI information.\n", HWLOC_VERSION);
        fprintf(stderr, "*\n");
@@ -411,7 +415,7 @@ hwloc_pcidisc_add_hostbridges(struct hwloc_topology *topology,
    dstnextp = &child->next_sibling;

    /* compute hostbridge secondary/subordinate buses */
-    if (child->type == HWLOC_OBJ_BRIDGE
+    if (child->type == HWLOC_OBJ_BRIDGE && child->attr->bridge.downstream_type == HWLOC_OBJ_BRIDGE_PCI
 	&& child->attr->bridge.downstream.pci.subordinate_bus > current_subordinate)
      current_subordinate = child->attr->bridge.downstream.pci.subordinate_bus;

@@ -486,7 +490,8 @@ hwloc__pci_find_busid_parent(struct hwloc_topology *topology, struct hwloc_pcide
    if (env) {
      static int reported = 0;
      if (!topology->pci_has_forced_locality && !reported) {
-	fprintf(stderr, "Environment variable %s is deprecated, please use HWLOC_PCI_LOCALITY instead.\n", env);
+        if (!hwloc_hide_errors())
+          fprintf(stderr, "hwloc/pci: Environment variable %s is deprecated, please use HWLOC_PCI_LOCALITY instead.\n", env);
 	reported = 1;
      }
      if (*env) {
@@ -565,7 +570,7 @@ hwloc_pcidisc_tree_attach(struct hwloc_topology *topology, struct hwloc_obj *tre
    assert(pciobj->type == HWLOC_OBJ_PCI_DEVICE
 	   || (pciobj->type == HWLOC_OBJ_BRIDGE && pciobj->attr->bridge.upstream_type == HWLOC_OBJ_BRIDGE_PCI));

-    if (obj->type == HWLOC_OBJ_BRIDGE) {
+    if (obj->type == HWLOC_OBJ_BRIDGE && obj->attr->bridge.downstream_type == HWLOC_OBJ_BRIDGE_PCI) {
      domain = obj->attr->bridge.downstream.pci.domain;
      bus_min = obj->attr->bridge.downstream.pci.secondary_bus;
      bus_max = obj->attr->bridge.downstream.pci.subordinate_bus;
@@ -805,13 +810,14 @@ hwloc_pcidisc_find_linkspeed(const unsigned char *config,
   * PCIe Gen3 = 8  GT/s signal-rate per lane with 128/130 encoding = 1   GB/s data-rate per lane
   * PCIe Gen4 = 16 GT/s signal-rate per lane with 128/130 encoding = 2   GB/s data-rate per lane
   * PCIe Gen5 = 32 GT/s signal-rate per lane with 128/130 encoding = 4   GB/s data-rate per lane
+   * PCIe Gen6 = 64 GT/s signal-rate per lane with 128/130 encoding = 8   GB/s data-rate per lane
   */

  /* lanespeed in Gbit/s */
  if (speed <= 2)
    lanespeed = 2.5f * speed * 0.8f;
  else
-    lanespeed = 8.0f * (1<<(speed-3)) * 128/130; /* assume Gen6 will be 64 GT/s and so on */
+    lanespeed = 8.0f * (1<<(speed-3)) * 128/130; /* assume Gen7 will be 128 GT/s and so on */

  /* linkspeed in GB/s */
  *linkspeed = lanespeed * width / 8;
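The link-speed formula in `hwloc_pcidisc_find_linkspeed()` can be checked by hand: for Gen1 and Gen2 the per-lane rate is 2.5 GT/s times the generation with 8b/10b encoding (factor 0.8), and from Gen3 on it starts at 8 GT/s and doubles per generation with 128b/130b encoding. The sketch below reproduces only that arithmetic; `pcie_linkspeed_gbytes()` is a hypothetical helper, and `speed`/`width` are assumed to be the already-decoded generation number and lane count.

```c
#include <assert.h>

/* Hypothetical helper reproducing the arithmetic of the hunk above.
 * speed: PCIe generation (1 = Gen1, 2 = Gen2, ...); width: number of lanes.
 * Returns the data rate of the whole link in GB/s. */
static float pcie_linkspeed_gbytes(unsigned speed, unsigned width)
{
  float lanespeed; /* usable Gbit/s per lane */

  assert(speed >= 1);
  if (speed <= 2)
    lanespeed = 2.5f * speed * 0.8f;                   /* 8b/10b encoding */
  else
    lanespeed = 8.0f * (1 << (speed - 3)) * 128 / 130; /* 128b/130b encoding */

  return lanespeed * width / 8;                        /* Gbit/s -> GB/s */
}
```

With this formula a Gen3 x16 link comes out at about 15.75 GB/s and a Gen5 x16 link at about 63 GB/s.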
--- a/src/3rdparty/hwloc/src/static-components.h
+++ b/src/3rdparty/hwloc/src/static-components.h
@@ -1,4 +1,9 @@
-#include <private/internal-components.h>
+HWLOC_DECLSPEC extern const struct hwloc_component hwloc_noos_component;
+HWLOC_DECLSPEC extern const struct hwloc_component hwloc_xml_component;
+HWLOC_DECLSPEC extern const struct hwloc_component hwloc_synthetic_component;
+HWLOC_DECLSPEC extern const struct hwloc_component hwloc_xml_nolibxml_component;
+HWLOC_DECLSPEC extern const struct hwloc_component hwloc_windows_component;
+HWLOC_DECLSPEC extern const struct hwloc_component hwloc_x86_component;
 static const struct hwloc_component * hwloc_static_components[] = {
  &hwloc_noos_component,
  &hwloc_xml_component,
--- a/src/3rdparty/hwloc/src/topology-windows.c
+++ b/src/3rdparty/hwloc/src/topology-windows.c
@@ -1,6 +1,6 @@
 /*
 * Copyright © 2009 CNRS
- * Copyright © 2009-2020 Inria.  All rights reserved.
+ * Copyright © 2009-2021 Inria.  All rights reserved.
 * Copyright © 2009-2012, 2020 Université Bordeaux
 * Copyright © 2011 Cisco Systems, Inc.  All rights reserved.
 * See COPYING in top-level directory.
@@ -11,7 +11,9 @@

 #include "private/autogen/config.h"
 #include "hwloc.h"
+#include "hwloc/windows.h"
 #include "private/private.h"
+#include "private/windows.h" /* must be before windows.h */
 #include "private/debug.h"

 #include <windows.h>
@@ -64,26 +66,6 @@ typedef enum _LOGICAL_PROCESSOR_RELATIONSHIP {
 #  endif /* HAVE_RELATIONPROCESSORPACKAGE */
 #endif /* HAVE_LOGICAL_PROCESSOR_RELATIONSHIP */

-#ifndef HAVE_SYSTEM_LOGICAL_PROCESSOR_INFORMATION
-typedef struct _SYSTEM_LOGICAL_PROCESSOR_INFORMATION {
-  ULONG_PTR ProcessorMask;
-  LOGICAL_PROCESSOR_RELATIONSHIP Relationship;
-  _ANONYMOUS_UNION
-  union {
-    struct {
-      BYTE flags;
-    } ProcessorCore;
-    struct {
-      DWORD NodeNumber;
-    } NumaNode;
-    CACHE_DESCRIPTOR Cache;
-    ULONGLONG Reserved[2];
-  } DUMMYUNIONNAME;
-} SYSTEM_LOGICAL_PROCESSOR_INFORMATION, *PSYSTEM_LOGICAL_PROCESSOR_INFORMATION;
-#endif
-
-/* Extended interface, for group support */
-
 #ifndef HAVE_GROUP_AFFINITY
 typedef struct _GROUP_AFFINITY {
  KAFFINITY Mask;
@@ -92,35 +74,40 @@ typedef struct _GROUP_AFFINITY {
 } GROUP_AFFINITY, *PGROUP_AFFINITY;
 #endif

-#ifndef HAVE_PROCESSOR_RELATIONSHIP
+/* always use our own structure because the EfficiencyClass field didn't exist before Win10 */
 typedef struct HWLOC_PROCESSOR_RELATIONSHIP {
  BYTE Flags;
-  BYTE EfficiencyClass; /* for RelationProcessorCore, higher means greater performance but less efficiency, only available in Win10+ */
+  BYTE EfficiencyClass; /* for RelationProcessorCore, higher means greater performance but less efficiency */
  BYTE Reserved[20];
  WORD GroupCount;
  GROUP_AFFINITY GroupMask[ANYSIZE_ARRAY];
-} PROCESSOR_RELATIONSHIP, *PPROCESSOR_RELATIONSHIP;
-#endif
+} HWLOC_PROCESSOR_RELATIONSHIP;

-#ifndef HAVE_NUMA_NODE_RELATIONSHIP
-typedef struct _NUMA_NODE_RELATIONSHIP {
+/* always use our own structure because the GroupCount and GroupMasks fields didn't exist in some Win10 */
+typedef struct HWLOC_NUMA_NODE_RELATIONSHIP {
  DWORD NodeNumber;
-  BYTE Reserved[20];
-  GROUP_AFFINITY GroupMask;
-} NUMA_NODE_RELATIONSHIP, *PNUMA_NODE_RELATIONSHIP;
-#endif
+  BYTE Reserved[18];
+  WORD GroupCount;
+  _ANONYMOUS_UNION
+  union {
+    GROUP_AFFINITY GroupMask;
+    GROUP_AFFINITY GroupMasks[ANYSIZE_ARRAY];
+  } DUMMYUNIONNAME;
+} HWLOC_NUMA_NODE_RELATIONSHIP;

-#ifndef HAVE_CACHE_RELATIONSHIP
-typedef struct _CACHE_RELATIONSHIP {
+typedef struct HWLOC_CACHE_RELATIONSHIP {
  BYTE Level;
  BYTE Associativity;
  WORD LineSize;
  DWORD CacheSize;
  PROCESSOR_CACHE_TYPE Type;
-  BYTE Reserved[20];
-  GROUP_AFFINITY GroupMask;
-} CACHE_RELATIONSHIP, *PCACHE_RELATIONSHIP;
-#endif
+  BYTE Reserved[18];
+  WORD GroupCount;
+  union {
+    GROUP_AFFINITY GroupMask;
+    GROUP_AFFINITY GroupMasks[ANYSIZE_ARRAY];
+  } DUMMYUNIONNAME;
+} HWLOC_CACHE_RELATIONSHIP;

 #ifndef HAVE_PROCESSOR_GROUP_INFO
 typedef struct _PROCESSOR_GROUP_INFO {
@@ -140,20 +127,19 @@ typedef struct _GROUP_RELATIONSHIP {
 } GROUP_RELATIONSHIP, *PGROUP_RELATIONSHIP;
 #endif

-#ifndef HAVE_SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX
-typedef struct _SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX {
+/* always use our own structure because we need our own HWLOC_PROCESSOR/CACHE/NUMA_NODE_RELATIONSHIP */
+typedef struct HWLOC_SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX {
  LOGICAL_PROCESSOR_RELATIONSHIP Relationship;
  DWORD Size;
  _ANONYMOUS_UNION
  union {
-    PROCESSOR_RELATIONSHIP Processor;
-    NUMA_NODE_RELATIONSHIP NumaNode;
-    CACHE_RELATIONSHIP Cache;
+    HWLOC_PROCESSOR_RELATIONSHIP Processor;
+    HWLOC_NUMA_NODE_RELATIONSHIP NumaNode;
+    HWLOC_CACHE_RELATIONSHIP Cache;
    GROUP_RELATIONSHIP Group;
    /* Odd: no member to tell the cpu mask of the package... */
  } DUMMYUNIONNAME;
-} SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX, *PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX;
-#endif
+} HWLOC_SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX;

 #ifndef HAVE_PSAPI_WORKING_SET_EX_BLOCK
 typedef union _PSAPI_WORKING_SET_EX_BLOCK {
@@ -190,9 +176,6 @@ typedef struct _PROCESSOR_NUMBER {
 typedef WORD (WINAPI *PFN_GETACTIVEPROCESSORGROUPCOUNT)(void);
 static PFN_GETACTIVEPROCESSORGROUPCOUNT GetActiveProcessorGroupCountProc;

-static unsigned long nr_processor_groups = 1;
-static unsigned long max_numanode_index = 0;
-
 typedef WORD (WINAPI *PFN_GETACTIVEPROCESSORCOUNT)(WORD);
 static PFN_GETACTIVEPROCESSORCOUNT GetActiveProcessorCountProc;

@@ -202,10 +185,7 @@ static PFN_GETCURRENTPROCESSORNUMBER GetCurrentProcessorNumberProc;
 typedef VOID (WINAPI *PFN_GETCURRENTPROCESSORNUMBEREX)(PPROCESSOR_NUMBER);
 static PFN_GETCURRENTPROCESSORNUMBEREX GetCurrentProcessorNumberExProc;

-typedef BOOL (WINAPI *PFN_GETLOGICALPROCESSORINFORMATION)(PSYSTEM_LOGICAL_PROCESSOR_INFORMATION Buffer, PDWORD ReturnLength);
-static PFN_GETLOGICALPROCESSORINFORMATION GetLogicalProcessorInformationProc;
-
-typedef BOOL (WINAPI *PFN_GETLOGICALPROCESSORINFORMATIONEX)(LOGICAL_PROCESSOR_RELATIONSHIP relationship, PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX Buffer, PDWORD ReturnLength);
+typedef BOOL (WINAPI *PFN_GETLOGICALPROCESSORINFORMATIONEX)(LOGICAL_PROCESSOR_RELATIONSHIP relationship, HWLOC_SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX *Buffer, PDWORD ReturnLength);
 static PFN_GETLOGICALPROCESSORINFORMATIONEX GetLogicalProcessorInformationExProc;

 typedef BOOL (WINAPI *PFN_SETTHREADGROUPAFFINITY)(HANDLE hThread, const GROUP_AFFINITY *GroupAffinity, PGROUP_AFFINITY PreviousGroupAffinity);
@@ -246,8 +226,6 @@ static void hwloc_win_get_function_ptrs(void)
 	(PFN_GETACTIVEPROCESSORGROUPCOUNT) GetProcAddress(kernel32, "GetActiveProcessorGroupCount");
      GetActiveProcessorCountProc =
 	(PFN_GETACTIVEPROCESSORCOUNT) GetProcAddress(kernel32, "GetActiveProcessorCount");
-      GetLogicalProcessorInformationProc =
-	(PFN_GETLOGICALPROCESSORINFORMATION) GetProcAddress(kernel32, "GetLogicalProcessorInformation");
      GetCurrentProcessorNumberProc =
 	(PFN_GETCURRENTPROCESSORNUMBER) GetProcAddress(kernel32, "GetCurrentProcessorNumber");
      GetCurrentProcessorNumberExProc =
@@ -270,9 +248,6 @@ static void hwloc_win_get_function_ptrs(void)
 	(PFN_VIRTUALFREEEX) GetProcAddress(kernel32, "VirtualFreeEx");
    }

-    if (GetActiveProcessorGroupCountProc)
-      nr_processor_groups = GetActiveProcessorGroupCountProc();
-
    if (!QueryWorkingSetExProc) {
      HMODULE psapi = LoadLibrary("psapi.dll");
      if (psapi)
@@ -363,6 +338,173 @@ static int hwloc_bitmap_to_single_ULONG_PTR(hwloc_const_bitmap_t set, unsigned *
  return 0;
 }

+/**********************
+ * Processor Groups
+ */
+
+static unsigned long max_numanode_index = 0;
+
+static unsigned long nr_processor_groups = 1;
+static hwloc_cpuset_t * processor_group_cpusets = NULL;
+
+static void
+hwloc_win_get_processor_groups(void)
+{
+  HWLOC_SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX *procInfoTotal, *tmpprocInfoTotal, *procInfo;
+  DWORD length;
+  unsigned i;
+
+  hwloc_debug("querying windows processor groups\n");
+
+  if (!GetLogicalProcessorInformationExProc)
+    goto error;
+
+  nr_processor_groups = GetActiveProcessorGroupCountProc();
+  if (!nr_processor_groups)
+    goto error;
+
+  hwloc_debug("found %lu windows processor groups\n", nr_processor_groups);
+
+  if (nr_processor_groups > 1 && SIZEOF_VOID_P == 4) {
+    if (!hwloc_hide_errors())
+      fprintf(stderr, "hwloc: multiple processor groups found on 32bits Windows, topology may be invalid/incomplete.\n");
+  }
+
+  length = 0;
+  procInfoTotal = NULL;
+
+  while (1) {
+    if (GetLogicalProcessorInformationExProc(RelationGroup, procInfoTotal, &length))
+      break;
+    if (GetLastError() != ERROR_INSUFFICIENT_BUFFER)
+      goto error_with_procinfo;
+    tmpprocInfoTotal = realloc(procInfoTotal, length);
+    if (!tmpprocInfoTotal)
+      goto error_with_procinfo;
+    procInfoTotal = tmpprocInfoTotal;
+  }
+
+  processor_group_cpusets = calloc(nr_processor_groups, sizeof(*processor_group_cpusets));
+  if (!processor_group_cpusets)
+    goto error_with_procinfo;
+
+  for (procInfo = procInfoTotal;
+       (void*) procInfo < (void*) ((uintptr_t) procInfoTotal + length);
+       procInfo = (void*) ((uintptr_t) procInfo + procInfo->Size)) {
+    unsigned id;
+
+    assert(procInfo->Relationship == RelationGroup);
+
+    hwloc_debug("Found %u active windows processor groups\n",
+                (unsigned) procInfo->Group.ActiveGroupCount);
+    for (id = 0; id < procInfo->Group.ActiveGroupCount; id++) {
+      KAFFINITY mask;
+      hwloc_bitmap_t set;
+
+      set = hwloc_bitmap_alloc();
+      if (!set)
+        goto error_with_cpusets;
+
+      mask = procInfo->Group.GroupInfo[id].ActiveProcessorMask;
+      hwloc_debug("group %u with %u cpus mask 0x%llx\n", id,
+                  (unsigned) procInfo->Group.GroupInfo[id].ActiveProcessorCount, (unsigned long long) mask);
+      /* KAFFINITY is ULONG_PTR */
+      hwloc_bitmap_set_ith_ULONG_PTR(set, id, mask);
+      /* FIXME: what if running 32bits on a 64bits windows with 64-processor groups?
+       * ULONG_PTR is 32bits, so half the group is invisible?
+       * maybe scale id to id*8/sizeof(ULONG_PTR) so that groups are 64-PU aligned?
+       */
+      hwloc_debug_2args_bitmap("group %u %d bitmap %s\n", id, procInfo->Group.GroupInfo[id].ActiveProcessorCount, set);
+      processor_group_cpusets[id] = set;
+    }
+  }
+
+  free(procInfoTotal);
+  return;
+
+ error_with_cpusets:
+  for(i=0; i<nr_processor_groups; i++) {
+    if (processor_group_cpusets[i])
+      hwloc_bitmap_free(processor_group_cpusets[i]);
+  }
+  free(processor_group_cpusets);
+  processor_group_cpusets = NULL;
+ error_with_procinfo:
+  free(procInfoTotal);
+ error:
+  /* on error set nr to 1 and keep cpusets NULL. We'll use the topology cpuset whenever needed */
+  nr_processor_groups = 1;
+}
+
+static void
+hwloc_win_free_processor_groups(void)
+{
+  unsigned i;
+  for(i=0; processor_group_cpusets && i<nr_processor_groups; i++) {
+    if (processor_group_cpusets[i])
+      hwloc_bitmap_free(processor_group_cpusets[i]);
+  }
+  free(processor_group_cpusets);
+  processor_group_cpusets = NULL;
+  nr_processor_groups = 1;
+}
+
+
+int
+hwloc_windows_get_nr_processor_groups(hwloc_topology_t topology, unsigned long flags)
+{
+  if (!topology->is_loaded || !topology->is_thissystem) {
+    errno = EINVAL;
+    return -1;
+  }
+
+  if (flags) {
+    errno = EINVAL;
+    return -1;
+  }
+
+  return nr_processor_groups;
+}
+
+int
+hwloc_windows_get_processor_group_cpuset(hwloc_topology_t topology, unsigned pg_index, hwloc_cpuset_t cpuset, unsigned long flags)
+{
+  if (!topology->is_loaded || !topology->is_thissystem) {
+    errno = EINVAL;
+    return -1;
+  }
+
+  if (!cpuset) {
+    errno = EINVAL;
+    return -1;
+  }
+
+  if (flags) {
+    errno = EINVAL;
+    return -1;
+  }
+
+  if (pg_index >= nr_processor_groups) {
+    errno = ENOENT;
+    return -1;
+  }
+
+  if (!processor_group_cpusets) {
+    assert(nr_processor_groups == 1);
+    /* we found no processor groups, return the entire topology as a single one */
+    hwloc_bitmap_copy(cpuset, topology->levels[0][0]->cpuset);
+    return 0;
+  }
+
+  if (!processor_group_cpusets[pg_index]) {
+    errno = ENOENT;
+    return -1;
+  }
+
+  hwloc_bitmap_copy(cpuset, processor_group_cpusets[pg_index]);
+  return 0;
+}
+
 /**************************************************************
 * hwloc PU numbering with respect to Windows processor groups
 *
@@ -848,6 +990,8 @@ hwloc_look_windows(struct hwloc_backend *backend, struct hwloc_disc_status *dsta
  unsigned hostname_size = sizeof(hostname);
  int has_efficiencyclass = 0;
  struct hwloc_win_efficiency_classes eclasses;
+  char *env = getenv("HWLOC_WINDOWS_PROCESSOR_GROUP_OBJS");
+  int keep_pgroup_objs = (env && atoi(env));

  assert(dstatus->phase == HWLOC_DISC_PHASE_CPU);

@@ -878,137 +1022,8 @@ hwloc_look_windows(struct hwloc_backend *backend, struct hwloc_disc_status *dsta

  GetSystemInfo(&SystemInfo);

-  if (!GetLogicalProcessorInformationExProc && GetLogicalProcessorInformationProc) {
-      PSYSTEM_LOGICAL_PROCESSOR_INFORMATION procInfo, tmpprocInfo;
-      unsigned id;
-      unsigned i;
-      struct hwloc_obj *obj;
-      hwloc_obj_type_t type;
-
-      length = 0;
-      procInfo = NULL;
-
-      while (1) {
-	if (GetLogicalProcessorInformationProc(procInfo, &length))
-	  break;
-	if (GetLastError() != ERROR_INSUFFICIENT_BUFFER)
-	  return -1;
-	tmpprocInfo = realloc(procInfo, length);
-	if (!tmpprocInfo) {
-	  free(procInfo);
-	  goto out;
-	}
-	procInfo = tmpprocInfo;
-      }
-
-      assert(!length || procInfo);
-
-      for (i = 0; i < length / sizeof(*procInfo); i++) {
-
-        /* Ignore unknown caches */
-	if (procInfo->Relationship == RelationCache
-		&& procInfo->Cache.Type != CacheUnified
-		&& procInfo->Cache.Type != CacheData
-		&& procInfo->Cache.Type != CacheInstruction)
-	  continue;
-
-	id = HWLOC_UNKNOWN_INDEX;
-	switch (procInfo[i].Relationship) {
-	  case RelationNumaNode:
-	    type = HWLOC_OBJ_NUMANODE;
-	    id = procInfo[i].NumaNode.NodeNumber;
-	    gotnuma++;
-	    if (id > max_numanode_index)
-	      max_numanode_index = id;
-	    break;
-	  case RelationProcessorPackage:
-	    type = HWLOC_OBJ_PACKAGE;
-	    break;
-	  case RelationCache:
-	    type = (procInfo[i].Cache.Type == CacheInstruction ? HWLOC_OBJ_L1ICACHE : HWLOC_OBJ_L1CACHE) + procInfo[i].Cache.Level - 1;
-	    break;
-	  case RelationProcessorCore:
-	    type = HWLOC_OBJ_CORE;
-	    break;
-	  case RelationGroup:
-	  default:
-	    type = HWLOC_OBJ_GROUP;
-	    break;
-	}
-
-	if (!hwloc_filter_check_keep_object_type(topology, type))
-	  continue;
-
-	obj = hwloc_alloc_setup_object(topology, type, id);
-        obj->cpuset = hwloc_bitmap_alloc();
-	hwloc_debug("%s#%u mask %llx\n", hwloc_obj_type_string(type), id, (unsigned long long) procInfo[i].ProcessorMask);
-	/* ProcessorMask is a ULONG_PTR */
-	hwloc_bitmap_set_ith_ULONG_PTR(obj->cpuset, 0, procInfo[i].ProcessorMask);
-	hwloc_debug_2args_bitmap("%s#%u bitmap %s\n", hwloc_obj_type_string(type), id, obj->cpuset);
-
-	switch (type) {
-	  case HWLOC_OBJ_NUMANODE:
-	    {
-	      ULONGLONG avail;
-	      obj->nodeset = hwloc_bitmap_alloc();
-	      hwloc_bitmap_set(obj->nodeset, id);
-	      if ((GetNumaAvailableMemoryNodeExProc && GetNumaAvailableMemoryNodeExProc(id, &avail))
-		  || (GetNumaAvailableMemoryNodeProc && GetNumaAvailableMemoryNodeProc(id, &avail))) {
-		obj->attr->numanode.local_memory = avail;
-		gotnumamemory++;
-	      }
-	      obj->attr->numanode.page_types_len = 2;
-	      obj->attr->numanode.page_types = malloc(2 * sizeof(*obj->attr->numanode.page_types));
-	      memset(obj->attr->numanode.page_types, 0, 2 * sizeof(*obj->attr->numanode.page_types));
-	      obj->attr->numanode.page_types_len = 1;
-	      obj->attr->numanode.page_types[0].size = SystemInfo.dwPageSize;
-#if HAVE_DECL__SC_LARGE_PAGESIZE
-	      obj->attr->numanode.page_types_len++;
-	      obj->attr->numanode.page_types[1].size = sysconf(_SC_LARGE_PAGESIZE);
-#endif
-	      break;
-	    }
-	  case HWLOC_OBJ_L1CACHE:
-	  case HWLOC_OBJ_L2CACHE:
-	  case HWLOC_OBJ_L3CACHE:
-	  case HWLOC_OBJ_L4CACHE:
-	  case HWLOC_OBJ_L5CACHE:
-	  case HWLOC_OBJ_L1ICACHE:
-	  case HWLOC_OBJ_L2ICACHE:
-	  case HWLOC_OBJ_L3ICACHE:
-	    obj->attr->cache.size = procInfo[i].Cache.Size;
-	    obj->attr->cache.associativity = procInfo[i].Cache.Associativity == CACHE_FULLY_ASSOCIATIVE ? -1 : procInfo[i].Cache.Associativity ;
-	    obj->attr->cache.linesize = procInfo[i].Cache.LineSize;
-	    obj->attr->cache.depth = procInfo[i].Cache.Level;
-	    switch (procInfo->Cache.Type) {
-	      case CacheUnified:
-		obj->attr->cache.type = HWLOC_OBJ_CACHE_UNIFIED;
-		break;
-	      case CacheData:
-		obj->attr->cache.type = HWLOC_OBJ_CACHE_DATA;
-		break;
-	      case CacheInstruction:
-		obj->attr->cache.type = HWLOC_OBJ_CACHE_INSTRUCTION;
-		break;
-	      default:
-		hwloc_free_unlinked_object(obj);
-		continue;
-	    }
-	    break;
-	  case HWLOC_OBJ_GROUP:
-	    obj->attr->group.kind = procInfo[i].Relationship == RelationGroup ? HWLOC_GROUP_KIND_WINDOWS_PROCESSOR_GROUP : HWLOC_GROUP_KIND_WINDOWS_RELATIONSHIP_UNKNOWN;
-	    break;
-	  default:
-	    break;
-	}
-	hwloc__insert_object_by_cpuset(topology, NULL, obj, "windows:GetLogicalProcessorInformation");
-      }
-
-      free(procInfo);
-  }
-
  if (GetLogicalProcessorInformationExProc) {
-      PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX procInfoTotal, tmpprocInfoTotal, procInfo;
+      HWLOC_SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX *procInfoTotal, *tmpprocInfoTotal, *procInfo;
      unsigned id;
      struct hwloc_obj *obj;
      hwloc_obj_type_t type;
@@ -1047,8 +1062,16 @@ hwloc_look_windows(struct hwloc_backend *backend, struct hwloc_disc_status *dsta
 	switch (procInfo->Relationship) {
 	  case RelationNumaNode:
 	    type = HWLOC_OBJ_NUMANODE;
-            num = 1;
-            GroupMask = &procInfo->NumaNode.GroupMask;
+            /* Starting with Windows 11 and Server 2022, the GroupCount field is valid and >=1
+             * and we may read GroupMasks[]. Older releases have GroupCount==0 and we must read GroupMask.
+             */
+            if (procInfo->NumaNode.GroupCount) {
+              num = procInfo->NumaNode.GroupCount;
+              GroupMask = procInfo->NumaNode.GroupMasks;
+            } else {
+              num = 1;
+              GroupMask = &procInfo->NumaNode.GroupMask;
+            }
 	    id = procInfo->NumaNode.NodeNumber;
 	    gotnuma++;
 	    if (id > max_numanode_index)
@@ -1061,18 +1084,20 @@ hwloc_look_windows(struct hwloc_backend *backend, struct hwloc_disc_status *dsta
 	    break;
 	  case RelationCache:
 	    type = (procInfo->Cache.Type == CacheInstruction ? HWLOC_OBJ_L1ICACHE : HWLOC_OBJ_L1CACHE) + procInfo->Cache.Level - 1;
-            num = 1;
-            GroupMask = &procInfo->Cache.GroupMask;
+            /* Cache.GroupCount was added at about the same time as NumaNode.GroupCount above */
+            if (procInfo->Cache.GroupCount) {
+              num = procInfo->Cache.GroupCount;
+              GroupMask = procInfo->Cache.GroupMasks;
+            } else {
+              num = 1;
+              GroupMask = &procInfo->Cache.GroupMask;
+            }
 	    break;
 	  case RelationProcessorCore:
 	    type = HWLOC_OBJ_CORE;
            num = procInfo->Processor.GroupCount;
            GroupMask = procInfo->Processor.GroupMask;
-            if (has_efficiencyclass)
-              /* the EfficiencyClass field didn't exist before Windows10 and recent MSVC headers,
-               * so just access it manually instead of trying to detect it.
-               */
-              efficiency_class = * ((&procInfo->Processor.Flags) + 1);
+            efficiency_class = procInfo->Processor.EfficiencyClass;
 	    break;
 	  case RelationGroup:
 	    /* So strange an interface... */
@@ -1097,11 +1122,12 @@ hwloc_look_windows(struct hwloc_backend *backend, struct hwloc_disc_status *dsta
 		groups_pu_set = hwloc_bitmap_alloc();
 	      hwloc_bitmap_or(groups_pu_set, groups_pu_set, set);

-	      if (hwloc_filter_check_keep_object_type(topology, HWLOC_OBJ_GROUP)) {
+              /* Ignore processor groups unless requested and filtered-in */
+              if (keep_pgroup_objs && hwloc_filter_check_keep_object_type(topology, HWLOC_OBJ_GROUP)) {
 		obj = hwloc_alloc_setup_object(topology, HWLOC_OBJ_GROUP, id);
 		obj->cpuset = set;
 		obj->attr->group.kind = HWLOC_GROUP_KIND_WINDOWS_PROCESSOR_GROUP;
-		hwloc__insert_object_by_cpuset(topology, NULL, obj, "windows:GetLogicalProcessorInformation:ProcessorGroup");
+		hwloc__insert_object_by_cpuset(topology, NULL, obj, "windows:GetLogicalProcessorInformationEx:ProcessorGroup");
 	      } else
 		hwloc_bitmap_free(set);
 	    }
@@ -1328,11 +1354,13 @@ hwloc_set_windows_hooks(struct hwloc_binding_hooks *hooks,
 static int hwloc_windows_component_init(unsigned long flags __hwloc_attribute_unused)
 {
  hwloc_win_get_function_ptrs();
+  hwloc_win_get_processor_groups();
  return 0;
 }

 static void hwloc_windows_component_finalize(unsigned long flags __hwloc_attribute_unused)
 {
+  hwloc_win_free_processor_groups();
 }

 static struct hwloc_backend *
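For reference, `hwloc_win_get_processor_groups()` above stores each group's `ActiveProcessorMask` with `hwloc_bitmap_set_ith_ULONG_PTR(set, id, mask)`, so group `id` occupies the `id`-th ULONG_PTR-sized word of the cpuset. The following is a minimal sketch of that index arithmetic, not hwloc code: it assumes a 64-bit `ULONG_PTR` (modelled as `uint64_t`), and `pg_global_pu_index()` and `pg_mask_to_pus()` are hypothetical helper names.

```c
#include <stdint.h>

/* Assumption: 64-bit Windows, where KAFFINITY/ULONG_PTR is 64 bits wide,
 * so processor group `group` owns global PU indices [group*64, group*64+63]. */
#define PG_WIDTH 64u

/* Hypothetical helper: global PU index of bit `bit` in group `group`'s mask. */
static unsigned pg_global_pu_index(unsigned group, unsigned bit)
{
    return group * PG_WIDTH + bit;
}

/* Hypothetical helper: store the global PU index of every active processor
 * in `mask` into `out` (capacity `max`) and return how many were stored. */
static unsigned pg_mask_to_pus(unsigned group, uint64_t mask,
                               unsigned *out, unsigned max)
{
    unsigned n = 0;
    for (unsigned bit = 0; bit < PG_WIDTH && n < max; bit++)
        if (mask & ((uint64_t)1 << bit))
            out[n++] = pg_global_pu_index(group, bit);
    return n;
}
```

For example, group 1 with mask `0xf` maps to global PUs 64 to 67. On 32-bit builds `ULONG_PTR` is only 32 bits wide, which is the case the FIXME in that function is concerned about.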
--- a/src/3rdparty/hwloc/src/topology-x86.c
+++ b/src/3rdparty/hwloc/src/topology-x86.c
@@ -1,5 +1,5 @@
 /*
- * Copyright © 2010-2020 Inria.  All rights reserved.
+ * Copyright © 2010-2021 Inria.  All rights reserved.
 * Copyright © 2010-2013 Université Bordeaux
 * Copyright © 2010-2011 Cisco Systems, Inc.  All rights reserved.
 * See COPYING in top-level directory.
@@ -7,11 +7,14 @@
 *
 * This backend is only used when the operating system does not export
 * the necessary hardware topology information to user-space applications.
- * Currently, only the FreeBSD backend relies on this x86 backend.
+ * Currently, FreeBSD and NetBSD only add PUs and then fall back to this
+ * backend for CPU/Cache discovery.
 *
 * Other backends such as Linux have their own way to retrieve various
 * pieces of hardware topology information from the operating system
 * on various architectures, without having to use this x86-specific code.
+ * But this backend is still used after them to annotate some objects with
+ * additional details (CPU info in Package, Inclusiveness in Caches).
 */

 #include "private/autogen/config.h"
@@ -497,7 +500,8 @@ static void read_amd_cores_topoext(struct procinfo *infos, unsigned long flags,
      nodes_per_proc = ((ecx >> 8) & 7) + 1;
    }
    if ((infos->cpufamilynumber == 0x15 && nodes_per_proc > 2)
-	|| ((infos->cpufamilynumber == 0x17 || infos->cpufamilynumber == 0x18) && nodes_per_proc > 4)) {
+	|| ((infos->cpufamilynumber == 0x17 || infos->cpufamilynumber == 0x18) && nodes_per_proc > 4)
+        || (infos->cpufamilynumber == 0x19 && nodes_per_proc > 1)) {
      hwloc_debug("warning: undefined nodes_per_proc value %u, assuming it means %u\n", nodes_per_proc, nodes_per_proc);
    }
  }
@@ -610,10 +614,13 @@ static void look_proc(struct hwloc_backend *backend, struct procinfo *infos, uns
  eax = 0x01;
  cpuid_or_from_dump(&eax, &ebx, &ecx, &edx, src_cpuiddump);
  infos->apicid = ebx >> 24;
-  if (edx & (1 << 28))
+  if (edx & (1 << 28)) {
    legacy_max_log_proc = 1 << hwloc_flsl(((ebx >> 16) & 0xff) - 1);
-  else
+  } else {
+    hwloc_debug("HTT bit not set in CPUID 0x01.edx, assuming legacy_max_log_proc = 1\n");
    legacy_max_log_proc = 1;
+  }
+
  hwloc_debug("APIC ID 0x%02x legacy_max_log_proc %u\n", infos->apicid, legacy_max_log_proc);
  infos->ids[PKG] = infos->apicid / legacy_max_log_proc;
  legacy_log_proc_id = infos->apicid % legacy_max_log_proc;
@@ -676,12 +683,23 @@ static void look_proc(struct hwloc_backend *backend, struct procinfo *infos, uns
      unsigned max_nbcores;
      unsigned max_nbthreads;
      unsigned threadid __hwloc_attribute_unused;
+      hwloc_debug("Trying to get core/thread IDs from 0x04...\n");
      max_nbcores = ((eax >> 26) & 0x3f) + 1;
-      max_nbthreads = legacy_max_log_proc / max_nbcores;
-      hwloc_debug("thus %u threads\n", max_nbthreads);
-      threadid = legacy_log_proc_id % max_nbthreads;
-      infos->ids[CORE] = legacy_log_proc_id / max_nbthreads;
-      hwloc_debug("this is thread %u of core %u\n", threadid, infos->ids[CORE]);
+      hwloc_debug("found %u cores max\n", max_nbcores);
+      /* some VMs (e.g. issue#525) don't report valid information, check things before dividing by 0. */
+      if (!max_nbcores) {
+        hwloc_debug("cannot detect core/thread IDs from 0x04 without a valid max of cores\n");
+      } else {
+        max_nbthreads = legacy_max_log_proc / max_nbcores;
+        hwloc_debug("found %u threads max\n", max_nbthreads);
+        if (!max_nbthreads) {
+          hwloc_debug("cannot detect core/thread IDs from 0x04 without a valid max of threads\n");
+        } else {
+          threadid = legacy_log_proc_id % max_nbthreads;
+          infos->ids[CORE] = legacy_log_proc_id / max_nbthreads;
+          hwloc_debug("this is thread %u of core %u\n", threadid, infos->ids[CORE]);
+        }
+      }
    }
  }

@@ -772,13 +790,19 @@ static void look_proc(struct hwloc_backend *backend, struct procinfo *infos, uns

    } else if (cpuid_type == amd) {
      /* AMD quirks */
-      if (infos->cpufamilynumber == 0x17
-	  && cache->level == 3 && cache->nbthreads_sharing == 6) {
-	/* AMD family 0x17 always shares L3 between 8 APIC ids,
-	 * even when only 6 APIC ids are enabled and reported in nbthreads_sharing
-	 * (on 24-core CPUs).
+      if (infos->cpufamilynumber >= 0x17 && cache->level == 3) {
+	/* AMD family 0x19 always shares L3 between 16 APIC ids (8 HT cores),
+         * while family 0x17 shares it between 8 APIC ids (4 HT cores).
+         * But many models have fewer APIC ids enabled and reported in nbthreads_sharing.
+         * It means we must round up nbthreads_sharing to the nearest power of 2
+         * before computing cacheid.
 	 */
-	cache->cacheid = infos->apicid / 8;
+        unsigned nbapics_sharing = cache->nbthreads_sharing;
+        if (nbapics_sharing & (nbapics_sharing-1))
+          /* not a power of two, round up (hwloc_flsl() gives the 1-based MSB index) */
+          nbapics_sharing = 1U<<hwloc_flsl(nbapics_sharing);
+
+	cache->cacheid = infos->apicid / nbapics_sharing;

      } else if (infos->cpufamilynumber== 0x10 && infos->cpumodelnumber == 0x9
 	  && cache->level == 3
@@ -804,7 +828,7 @@ static void look_proc(struct hwloc_backend *backend, struct procinfo *infos, uns
      } else if (infos->cpufamilynumber == 0x15
 		 && (infos->cpumodelnumber == 0x1 /* Bulldozer */ || infos->cpumodelnumber == 0x2 /* Piledriver */)
 		 && cache->level == 3 && cache->nbthreads_sharing == 6) {
-	/* AMD Bulldozer and Piledriver 12-core processors have same APIC ids as Magny-Cours below,
+	/* AMD Bulldozer and Piledriver 12-core processors have same APIC ids as Magny-Cours above,
 	 * but we can't merge the checks because the original nbthreads_sharing must be exactly 6 here.
 	 */
 	cache->cacheid = (infos->apicid % legacy_max_log_proc) / cache->nbthreads_sharing /* cacheid within the package */
@@ -908,6 +932,16 @@ static void summarize(struct hwloc_backend *backend, struct procinfo *infos, uns
  int gotnuma = 0;
  int fulldiscovery = (flags & HWLOC_X86_DISC_FLAG_FULL);

+#ifdef HWLOC_DEBUG
+  hwloc_debug("\nSummary of x86 CPUID topology:\n");
+  for(i=0; i<nbprocs; i++) {
+    hwloc_debug("PU %u present=%u apicid=%u on PKG %d CORE %d DIE %d NODE %d\n",
+                i, infos[i].present, infos[i].apicid,
+                infos[i].ids[PKG], infos[i].ids[CORE], infos[i].ids[DIE], infos[i].ids[NODE]);
+  }
+  hwloc_debug("\n");
+#endif
+
  for (i = 0; i < nbprocs; i++)
    if (infos[i].present) {
      hwloc_bitmap_set(complete_cpuset, i);
@@ -1218,6 +1252,18 @@ static void summarize(struct hwloc_backend *backend, struct procinfo *infos, uns
 	    }
 	  }
 	  cache = hwloc_alloc_setup_object(topology, otype, HWLOC_UNKNOWN_INDEX);
+          /* We don't specify the os_index of caches because we want to be
+           * 100% sure they are identical to what the Linux kernel reports
+           * (so that things like resctrl work).
+           * However, vendor/model-specific quirks in the x86 code above
+           * make this difficult.
+           *
+           * Caveat: if the x86 backend is used on Linux to avoid kernel bugs,
+           * IDs won't be available to resctrl users. But resctrl heavily
+           * relies on the kernel x86 discovery being non-buggy anyway.
+           *
+           * TODO: make this optional? or only disable it on Linux?
+           */
 	  cache->attr->cache.depth = level;
 	  cache->attr->cache.size = infos[i].cache[l].size;
 	  cache->attr->cache.linesize = infos[i].cache[l].linesize;
@@ -1247,7 +1293,8 @@ static int
 look_procs(struct hwloc_backend *backend, struct procinfo *infos, unsigned long flags,
 	   unsigned highest_cpuid, unsigned highest_ext_cpuid, unsigned *features, enum cpuid_type cpuid_type,
 	   int (*get_cpubind)(hwloc_topology_t topology, hwloc_cpuset_t set, int flags),
-	   int (*set_cpubind)(hwloc_topology_t topology, hwloc_const_cpuset_t set, int flags))
+	   int (*set_cpubind)(hwloc_topology_t topology, hwloc_const_cpuset_t set, int flags),
+           hwloc_bitmap_t restrict_set)
 {
  struct hwloc_x86_backend_data_s *data = backend->private_data;
  struct hwloc_topology *topology = backend->topology;
@@ -1267,6 +1314,12 @@ look_procs(struct hwloc_backend *backend, struct procinfo *infos, unsigned long

  for (i = 0; i < nbprocs; i++) {
    struct cpuiddump *src_cpuiddump = NULL;
+
+    if (restrict_set && !hwloc_bitmap_isset(restrict_set, i)) {
+      /* skip this CPU outside of the binding mask */
+      continue;
+    }
+
    if (data->src_cpuiddump_path) {
      src_cpuiddump = cpuiddump_read(data->src_cpuiddump_path, i);
      if (!src_cpuiddump)
@@ -1400,6 +1453,7 @@ static
 int hwloc_look_x86(struct hwloc_backend *backend, unsigned long flags)
 {
  struct hwloc_x86_backend_data_s *data = backend->private_data;
+  struct hwloc_topology *topology = backend->topology;
  unsigned nbprocs = data->nbprocs;
  unsigned eax, ebx, ecx = 0, edx;
  unsigned i;
@@ -1415,9 +1469,21 @@ int hwloc_look_x86(struct hwloc_backend *backend, unsigned long flags)
  struct hwloc_topology_membind_support memsupport __hwloc_attribute_unused;
  int (*get_cpubind)(hwloc_topology_t topology, hwloc_cpuset_t set, int flags) = NULL;
  int (*set_cpubind)(hwloc_topology_t topology, hwloc_const_cpuset_t set, int flags) = NULL;
+  hwloc_bitmap_t restrict_set = NULL;
  struct cpuiddump *src_cpuiddump = NULL;
  int ret = -1;

+  /* check if binding works */
+  memset(&hooks, 0, sizeof(hooks));
+  support.membind = &memsupport;
+  /* We could just copy the main hooks (except in some corner cases),
+   * but the current overhead is negligible, so just always query them again.
+   */
+  hwloc_set_native_binding_hooks(&hooks, &support);
+  /* in theory, those are only needed if !data->src_cpuiddump_path || HWLOC_TOPOLOGY_FLAG_RESTRICT_TO_CPUBINDING
+   * but that's the vast majority of cases anyway, and the overhead is very small.
+   */
+
  if (data->src_cpuiddump_path) {
    /* Just read cpuid from the dump (implies !topology->is_thissystem by default) */
    src_cpuiddump = cpuiddump_read(data->src_cpuiddump_path, 0);
@@ -1430,13 +1496,6 @@ int hwloc_look_x86(struct hwloc_backend *backend, unsigned long flags)
     * we may still force use this backend when debugging with !thissystem.
     */

-    /* check if binding works */
-    memset(&hooks, 0, sizeof(hooks));
-    support.membind = &memsupport;
-    /* We could just copy the main hooks (except in some corner cases),
-     * but the current overhead is negligible, so just always reget them.
-     */
-    hwloc_set_native_binding_hooks(&hooks, &support);
    if (hooks.get_thisthread_cpubind && hooks.set_thisthread_cpubind) {
      get_cpubind = hooks.get_thisthread_cpubind;
      set_cpubind = hooks.set_thisthread_cpubind;
@@ -1456,6 +1515,20 @@ int hwloc_look_x86(struct hwloc_backend *backend, unsigned long flags)
    }
  }

+  if (topology->flags & HWLOC_TOPOLOGY_FLAG_RESTRICT_TO_CPUBINDING) {
+    restrict_set = hwloc_bitmap_alloc();
+    if (!restrict_set)
+      goto out;
+    if (hooks.get_thisproc_cpubind)
+      hooks.get_thisproc_cpubind(topology, restrict_set, 0);
+    else if (hooks.get_thisthread_cpubind)
+      hooks.get_thisthread_cpubind(topology, restrict_set, 0);
+    if (hwloc_bitmap_iszero(restrict_set)) {
+      hwloc_bitmap_free(restrict_set);
+      restrict_set = NULL;
+    }
+  }
+
  if (!src_cpuiddump && !hwloc_have_x86_cpuid())
    goto out;

@@ -1520,7 +1593,7 @@ int hwloc_look_x86(struct hwloc_backend *backend, unsigned long flags)

  ret = look_procs(backend, infos, flags,
 		   highest_cpuid, highest_ext_cpuid, features, cpuid_type,
-		   get_cpubind, set_cpubind);
+		   get_cpubind, set_cpubind, restrict_set);
  if (!ret)
    /* success, we're done */
    goto out_with_os_state;
@@ -1545,6 +1618,7 @@ out_with_infos:
  }

 out:
+  hwloc_bitmap_free(restrict_set);
  if (src_cpuiddump)
    cpuiddump_free(src_cpuiddump);
  return ret;
@@ -1561,6 +1635,11 @@ hwloc_x86_discover(struct hwloc_backend *backend, struct hwloc_disc_status *dsta

  assert(dstatus->phase == HWLOC_DISC_PHASE_CPU);

+  if (topology->flags & HWLOC_TOPOLOGY_FLAG_DONT_CHANGE_BINDING) {
+    /* TODO: Things would work if there's a single PU, no need to rebind */
+    return 0;
+  }
+
  if (getenv("HWLOC_X86_TOPOEXT_NUMANODES")) {
    flags |= HWLOC_X86_DISC_FLAG_TOPOEXT_NUMANODES;
  }
@@ -1587,7 +1666,8 @@ hwloc_x86_discover(struct hwloc_backend *backend, struct hwloc_disc_status *dsta
  }

  if (topology->levels[0][0]->cpuset) {
-    /* somebody else discovered things */
+    /* somebody else discovered things, reconnect levels so that we can look at them */
+    hwloc_topology_reconnect(topology, 0);
    if (topology->nb_levels == 2 && topology->level_nbobjects[1] == data->nbprocs) {
      /* only PUs were discovered, as much as we would, complete the topology with everything else */
      alreadypus = 1;
@@ -1595,7 +1675,6 @@ hwloc_x86_discover(struct hwloc_backend *backend, struct hwloc_disc_status *dsta
    }

    /* several object types were added, we can't easily complete, just do partial discovery */
-    hwloc_topology_reconnect(topology, 0);
    ret = hwloc_look_x86(backend, flags);
    if (ret)
      hwloc_obj_add_info(topology->levels[0][0], "Backend", "x86");
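The AMD L3 quirk above divides the APIC ID by the number of APIC IDs sharing the cache after rounding that number up to a power of two, and the CPUID 0x04 path now refuses to divide by zero. A minimal standalone sketch of both computations follows; `msb_index()` is a hypothetical stand-in assumed to follow the same 1-based convention as `hwloc_flsl()`, and the other helper names are illustrative only.

```c
#include <limits.h>

/* 1-based index of the most significant set bit, 0 for x == 0
 * (assumed to match the hwloc_flsl() convention). */
static unsigned msb_index(unsigned long x)
{
    unsigned i = 0;
    while (x) {
        i++;
        x >>= 1;
    }
    return i;
}

/* Round n up to the next power of two (n itself if already one).
 * Returns 0 for n == 0 or if the result would not fit in an unsigned. */
static unsigned round_up_pow2(unsigned n)
{
    unsigned shift;
    if (n == 0 || !(n & (n - 1)))
        return n;
    shift = msb_index(n);
    if (shift >= sizeof(unsigned) * CHAR_BIT)
        return 0;
    return 1U << shift;
}

/* L3 cache ID: APIC IDs sharing an L3 are assumed to form an aligned
 * power-of-two block, so divide by the rounded-up sharing count. */
static int l3_cache_id(unsigned apicid, unsigned nbthreads_sharing,
                       unsigned *cacheid)
{
    unsigned nbapics = round_up_pow2(nbthreads_sharing);
    if (!nbapics)
        return -1; /* no usable sharing info: do not divide by zero */
    *cacheid = apicid / nbapics;
    return 0;
}

/* Split a legacy logical processor ID into thread and core IDs,
 * checking both divisors first like the CPUID 0x04 guard above. */
static int split_thread_core(unsigned log_proc_id, unsigned max_log_proc,
                             unsigned max_nbcores,
                             unsigned *threadid, unsigned *coreid)
{
    unsigned max_nbthreads;
    if (!max_nbcores)
        return -1;
    max_nbthreads = max_log_proc / max_nbcores;
    if (!max_nbthreads)
        return -1;
    *threadid = log_proc_id % max_nbthreads;
    *coreid = log_proc_id / max_nbthreads;
    return 0;
}
```

Rounding via the most significant bit gives 8 for 6 and 16 for both 10 and 12. Rounding via the least significant set bit instead only agrees for some inputs (it yields 8 for 10).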
--- a/src/3rdparty/hwloc/src/topology-xml.c
+++ b/src/3rdparty/hwloc/src/topology-xml.c
@@ -1,6 +1,6 @@
 /*
 * Copyright © 2009 CNRS
- * Copyright © 2009-2020 Inria.  All rights reserved.
+ * Copyright © 2009-2021 Inria.  All rights reserved.
 * Copyright © 2009-2011, 2020 Université Bordeaux
 * Copyright © 2009-2018 Cisco Systems, Inc.  All rights reserved.
 * See COPYING in top-level directory.
@@ -192,8 +192,9 @@ hwloc__xml_import_object_attr(struct hwloc_topology *topology,
 	  || lvalue == HWLOC_OBJ_CACHE_INSTRUCTION)
 	obj->attr->cache.type = (hwloc_obj_cache_type_t) lvalue;
      else
-	fprintf(stderr, "%s: ignoring invalid cache_type attribute %lu\n",
-		state->global->msgprefix, lvalue);
+        if (hwloc__xml_verbose())
+          fprintf(stderr, "%s: ignoring invalid cache_type attribute %lu\n",
+                  state->global->msgprefix, lvalue);
    } else if (hwloc__xml_verbose())
      fprintf(stderr, "%s: ignoring cache_type attribute for non-cache object type\n",
 	      state->global->msgprefix);
@@ -242,7 +243,7 @@ hwloc__xml_import_object_attr(struct hwloc_topology *topology,
  else if (!strcmp(name, "dont_merge")) {
    unsigned long lvalue = strtoul(value, NULL, 10);
    if (obj->type == HWLOC_OBJ_GROUP)
-      obj->attr->group.dont_merge = lvalue;
+      obj->attr->group.dont_merge = (unsigned char) lvalue;
    else if (hwloc__xml_verbose())
      fprintf(stderr, "%s: ignoring dont_merge attribute for non-group object type\n",
 	      state->global->msgprefix);
@@ -262,8 +263,8 @@ hwloc__xml_import_object_attr(struct hwloc_topology *topology,
 #ifndef HWLOC_HAVE_32BITS_PCI_DOMAIN
      } else if (domain > 0xffff) {
 	static int warned = 0;
-	if (!warned && !hwloc_hide_errors())
-	  fprintf(stderr, "Ignoring PCI device with non-16bit domain.\nPass --enable-32bits-pci-domain to configure to support such devices\n(warning: it would break the library ABI, don't enable unless really needed).\n");
+	if (!warned && hwloc_hide_errors() < 2)
+	  fprintf(stderr, "hwloc/xml: Ignoring PCI device with non-16bit domain.\nPass --enable-32bits-pci-domain to configure to support such devices\n(warning: it would break the library ABI, don't enable unless really needed).\n");
 	warned = 1;
 	*ignore = 1;
 #endif
@@ -337,6 +338,7 @@ hwloc__xml_import_object_attr(struct hwloc_topology *topology,
      } else {
 	obj->attr->bridge.upstream_type = (hwloc_obj_bridge_type_t) upstream_type;
 	obj->attr->bridge.downstream_type = (hwloc_obj_bridge_type_t) downstream_type;
+        /* FIXME verify that upstream/downstream type is valid */
      };
      break;
    }
@@ -361,12 +363,13 @@ hwloc__xml_import_object_attr(struct hwloc_topology *topology,
 #ifndef HWLOC_HAVE_32BITS_PCI_DOMAIN
      } else if (domain > 0xffff) {
 	static int warned = 0;
-	if (!warned && !hwloc_hide_errors())
-	  fprintf(stderr, "Ignoring bridge to PCI with non-16bit domain.\nPass --enable-32bits-pci-domain to configure to support such devices\n(warning: it would break the library ABI, don't enable unless really needed).\n");
+	if (!warned && hwloc_hide_errors() < 2)
+	  fprintf(stderr, "hwloc/xml: Ignoring bridge to PCI with non-16bit domain.\nPass --enable-32bits-pci-domain to configure to support such devices\n(warning: it would break the library ABI, don't enable unless really needed).\n");
 	warned = 1;
 	*ignore = 1;
 #endif
      } else {
+        /* FIXME verify that downstream type vs pci info are valid */
 	obj->attr->bridge.downstream.pci.domain = domain;
 	obj->attr->bridge.downstream.pci.secondary_bus = secbus;
 	obj->attr->bridge.downstream.pci.subordinate_bus = subbus;
@@ -1232,7 +1235,7 @@ hwloc__xml_import_object(hwloc_topology_t topology,
 	/* next should be before cur */
 	if (!childrengotignored) {
 	  static int reported = 0;
-	  if (!reported && !hwloc_hide_errors()) {
+	  if (!reported && hwloc_hide_errors() < 2) {
 	    hwloc__xml_import_report_outoforder(topology, next, cur);
 	    reported = 1;
 	  }
@@ -1462,6 +1465,9 @@ hwloc__xml_v2import_distances(hwloc_topology_t topology,
 	unsigned long long u;
 	if (heterotypes) {
 	  hwloc_obj_type_t t = HWLOC_OBJ_TYPE_NONE;
+          if (!*tmp)
+            /* reached the end of this indexes attribute */
+            break;
 	  if (hwloc_type_sscanf(tmp, &t, NULL, 0) < 0) {
 	    if (hwloc__xml_verbose())
 	      fprintf(stderr, "%s: %s with unrecognized heterogeneous type %s\n",
@@ -1562,7 +1568,7 @@ hwloc__xml_v2import_distances(hwloc_topology_t topology,
    }
  }

-  hwloc_internal_distances_add_by_index(topology, name, unique_type, different_types, nbobjs, indexes, u64values, kind, 0);
+  hwloc_internal_distances_add_by_index(topology, name, unique_type, different_types, nbobjs, indexes, u64values, kind, 0 /* assume grouping was applied when this matrix was discovered before exporting to XML */);

  /* prevent freeing below */
  indexes = NULL;
@@ -2644,7 +2650,8 @@ hwloc__xml_export_object_contents (hwloc__xml_export_state_t state, hwloc_topolo

      logical_to_v2array = malloc(nbobjs * sizeof(*logical_to_v2array));
      if (!logical_to_v2array) {
-	fprintf(stderr, "xml/export/v1: failed to allocated logical_to_v2array\n");
+        if (!hwloc_hide_errors())
+          fprintf(stderr, "hwloc/xml/export/v1: failed to allocate logical_to_v2array\n");
 	continue;
      }

@@ -2818,6 +2825,7 @@ hwloc__xml_v1export_object_with_memory(hwloc__xml_export_state_t parentstate, hw
    /* child has sibling, we must add a Group around those memory children */
    hwloc_obj_t group = parentstate->global->v1_memory_group;
    parentstate->new_child(parentstate, &gstate, "object");
+    group->parent = obj->parent;
    group->cpuset = obj->cpuset;
    group->complete_cpuset = obj->complete_cpuset;
    group->nodeset = obj->nodeset;
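The new `if (!*tmp) break;` in `hwloc__xml_v2import_distances()` stops the heterogeneous-type loop at the end of the `indexes` attribute instead of trying to parse a type name from an empty string. Below is a simplified, hypothetical parser illustrating that pattern. It assumes space-separated `Type:index` tokens, and it copies type names without validating them as real hwloc types.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser for a heterogeneous "indexes" attribute such as
 * "Package:0 Core:3 PU:7". Type names go into types[][16], indexes into
 * indexes[]. Stops after `max` entries. Returns the number of entries
 * parsed, or -1 on a malformed token. */
static int parse_hetero_indexes(const char *attr, char types[][16],
                                unsigned long long *indexes, int max)
{
    const char *tmp = attr;
    int n = 0;

    while (n < max) {
        char *next;
        size_t len;

        while (*tmp == ' ')
            tmp++;
        if (!*tmp)
            break; /* reached the end of the attribute: not an error */

        /* the type name runs up to the ':' separator */
        len = strcspn(tmp, ": ");
        if (tmp[len] != ':' || len == 0 || len >= 16)
            return -1;
        memcpy(types[n], tmp, len);
        types[n][len] = '\0';

        indexes[n] = strtoull(tmp + len + 1, &next, 0);
        if (next == tmp + len + 1)
            return -1; /* no digits after the colon */
        n++;
        tmp = next;
    }
    return n;
}
```

In this sketch, removing the end-of-string check would make an input with a trailing space, such as `"Core:1 "`, reach the type parser with an empty token and fail.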
--- a/src/3rdparty/hwloc/src/topology.c
+++ b/src/3rdparty/hwloc/src/topology.c
@@ -1,8 +1,9 @@
 /*
 * Copyright © 2009 CNRS
- * Copyright © 2009-2020 Inria.  All rights reserved.
+ * Copyright © 2009-2021 Inria.  All rights reserved.
 * Copyright © 2009-2012, 2020 Université Bordeaux
 * Copyright © 2009-2011 Cisco Systems, Inc.  All rights reserved.
+ * Copyright © 2022 IBM Corporation.  All rights reserved.
 * See COPYING in top-level directory.
 */

@@ -52,6 +53,57 @@
 #include <windows.h>
 #endif

+
+#ifdef HWLOC_HAVE_LEVELZERO
+/*
+ * Define ZES_ENABLE_SYSMAN=1 early so that the LevelZero backend gets Sysman enabled.
+ *
+ * Only if levelzero was enabled in this build, so that we don't enable Sysman
+ * for external levelzero users when hwloc doesn't need it. If somebody ever loads
+ * an external levelzero plugin in a hwloc library built without levelzero (unlikely),
+ * they may have to set ZES_ENABLE_SYSMAN=1 manually.
+ *
+ * Use the constructor if supported and/or the Windows DllMain callback.
+ * Do it in the main hwloc library instead of the levelzero component because
+ * the latter could be loaded later as a plugin.
+ *
+ * L0 seems to be using getenv() to check this variable on Windows
+ * (at least in the Intel Compute-Runtime of March 2021),
+ * but setenv() doesn't seem to exist on Windows, hence use putenv() to set the variable.
+ *
+ * For the record, Get/SetEnvironmentVariable() is not exactly the same as getenv/putenv():
+ * - getenv() doesn't see what was set with SetEnvironmentVariable()
+ * - GetEnvironmentVariable() doesn't see putenv() in cygwin (while it does in MSVC and MinGW).
+ * Hence, if L0 ever switches from getenv() to GetEnvironmentVariable(),
+ * it will break in cygwin and we'll have to use both putenv() and SetEnvironmentVariable().
+ * Hopefully L0 will provide a way to enable Sysman without env vars before it happens.
+ */
+#if HWLOC_HAVE_ATTRIBUTE_CONSTRUCTOR
+static void hwloc_constructor(void) __attribute__((constructor));
+static void hwloc_constructor(void)
+{
+  if (!getenv("ZES_ENABLE_SYSMAN"))
+#ifdef HWLOC_WIN_SYS
+    putenv("ZES_ENABLE_SYSMAN=1");
+#else
+    setenv("ZES_ENABLE_SYSMAN", "1", 1);
+#endif
+}
+#endif
+#ifdef HWLOC_WIN_SYS
+BOOL WINAPI DllMain(HINSTANCE hinstDLL, DWORD fdwReason, LPVOID lpReserved)
+{
+  if (fdwReason == DLL_PROCESS_ATTACH) {
+    if (!getenv("ZES_ENABLE_SYSMAN"))
+      /* Windows does not have a setenv, so use putenv. */
+      putenv((char *) "ZES_ENABLE_SYSMAN=1");
+  }
+  return TRUE;
+}
+#endif
+#endif /* HWLOC_HAVE_LEVELZERO */
+
+
 unsigned hwloc_get_api_version(void)
 {
  return HWLOC_API_VERSION;
@@ -64,7 +116,7 @@ int hwloc_topology_abi_check(hwloc_topology_t topology)

 int hwloc_hide_errors(void)
 {
-  static int hide = 0;
+  static int hide = 1; /* only show critical errors by default. lstopo will show others */
  static int checked = 0;
  if (!checked) {
    const char *envvar = getenv("HWLOC_HIDE_ERRORS");
@@ -106,7 +158,7 @@ static void report_insert_error(hwloc_obj_t new, hwloc_obj_t old, const char *ms
 {
  static int reported = 0;

-  if (reason && !reported && !hwloc_hide_errors()) {
+  if (reason && !reported && hwloc_hide_errors() < 2) {
    char newstr[512];
    char oldstr[512];
    report_insert_error_format_obj(newstr, sizeof(newstr), new);
@@ -567,8 +619,9 @@ hwloc_free_unlinked_object(hwloc_obj_t obj)
 }

 /* Replace old with contents of new object, and make new freeable by the caller.
- * Only updates next_sibling/first_child pointers,
- * so may only be used during early discovery.
+ * Requires reconnect (for sibling pointers and group depth),
+ * fixup of sets (only the main cpuset was likely compared before merging),
+ * and update of total_memory and group depth.
 */
 static void
 hwloc_replace_linked_object(hwloc_obj_t old, hwloc_obj_t new)
@@ -1348,7 +1401,7 @@ merge_insert_equal(hwloc_obj_t new, hwloc_obj_t old)

 /* returns the result of merge, or NULL if not merged */
 static __hwloc_inline hwloc_obj_t
-hwloc__insert_try_merge_group(hwloc_obj_t old, hwloc_obj_t new)
+hwloc__insert_try_merge_group(hwloc_topology_t topology, hwloc_obj_t old, hwloc_obj_t new)
 {
  if (new->type == HWLOC_OBJ_GROUP && old->type == HWLOC_OBJ_GROUP) {
    /* which group do we keep? */
@@ -1359,6 +1412,7 @@ hwloc__insert_try_merge_group(hwloc_obj_t old, hwloc_obj_t new)

      /* keep the new one, it doesn't want to be merged */
      hwloc_replace_linked_object(old, new);
+      topology->modified = 1;
      return new;

    } else {
@@ -1366,9 +1420,12 @@ hwloc__insert_try_merge_group(hwloc_obj_t old, hwloc_obj_t new)
 	/* keep the old one, it doesn't want to be merged */
 	return old;

-      /* compare subkinds to decice who to keep */
-      if (new->attr->group.kind < old->attr->group.kind)
+      /* compare subkinds to decide which group to keep */
+      if (new->attr->group.kind < old->attr->group.kind) {
+        /* keep smaller kind */
 	hwloc_replace_linked_object(old, new);
+        topology->modified = 1;
+      }
      return old;
    }
  }
@@ -1394,6 +1451,7 @@ hwloc__insert_try_merge_group(hwloc_obj_t old, hwloc_obj_t new)
     * and let the caller free the new object
     */
    hwloc_replace_linked_object(old, new);
+    topology->modified = 1;
    return old;

  } else {
@@ -1435,7 +1493,7 @@ hwloc___insert_object_by_cpuset(struct hwloc_topology *topology, hwloc_obj_t cur
    int setres = res;

    if (res == HWLOC_OBJ_EQUAL) {
-      hwloc_obj_t merged = hwloc__insert_try_merge_group(child, obj);
+      hwloc_obj_t merged = hwloc__insert_try_merge_group(topology, child, obj);
      if (merged)
 	return merged;
      /* otherwise compare actual types to decide of the inclusion */
@@ -1859,6 +1917,9 @@ hwloc_topology_alloc_group_object(struct hwloc_topology *topology)
 static void hwloc_propagate_symmetric_subtree(hwloc_topology_t topology, hwloc_obj_t root);
 static void propagate_total_memory(hwloc_obj_t obj);
 static void hwloc_set_group_depth(hwloc_topology_t topology);
+static void hwloc_connect_children(hwloc_obj_t parent);
+static int hwloc_connect_levels(hwloc_topology_t topology);
+static int hwloc_connect_special_levels(hwloc_topology_t topology);

 hwloc_obj_t
 hwloc_topology_insert_group_object(struct hwloc_topology *topology, hwloc_obj_t obj)
@@ -1931,12 +1992,24 @@ hwloc_topology_insert_group_object(struct hwloc_topology *topology, hwloc_obj_t

  if (!res)
    return NULL;
-  if (res != obj)
-    /* merged */
+
+  if (res != obj && res->type != HWLOC_OBJ_GROUP)
+    /* merged, not into a Group, nothing to update */
    return res;

+  /* res == obj means that the object was inserted.
+   * We need to reconnect levels, fill all its cpu/node sets,
+   * compute its total memory, group depth, etc.
+   *
+   * res != obj usually means that our new group was merged into an
+   * existing object, no need to recompute anything.
+   * However, if merging with an existing group, depending on their kinds,
+   * the contents of obj may overwrite the contents of the old group.
+   * This requires reconnecting levels, filling sets, recomputing total memory, etc.
+   */
+
  /* properly inserted */
-  hwloc_obj_add_children_sets(obj);
+  hwloc_obj_add_children_sets(res);
  if (hwloc_topology_reconnect(topology, 0) < 0)
    return NULL;

@@ -1948,7 +2021,7 @@ hwloc_topology_insert_group_object(struct hwloc_topology *topology, hwloc_obj_t
 #endif
    hwloc_topology_check(topology);

-  return obj;
+  return res;
 }

 hwloc_obj_t
@@ -2289,9 +2362,15 @@ hwloc__filter_bridges(hwloc_topology_t topology, hwloc_obj_t root, unsigned dept

    child->attr->bridge.depth = depth;

-    if (child->type == HWLOC_OBJ_BRIDGE
-	&& filter == HWLOC_TYPE_FILTER_KEEP_IMPORTANT
-	&& !child->io_first_child) {
+    /* remove bridges that have no child,
+     * and pci-to-non-pci bridges (pcidev) that have no child either.
+     * keep NVSwitch since they may be used in NVLink matrices.
+     */
+    if (filter == HWLOC_TYPE_FILTER_KEEP_IMPORTANT
+	&& !child->io_first_child
+        && (child->type == HWLOC_OBJ_BRIDGE
+            || (child->type == HWLOC_OBJ_PCI_DEVICE && (child->attr->pcidev.class_id >> 8) == 0x06
+                && (!child->subtype || strcmp(child->subtype, "NVSwitch"))))) {
      unlink_and_free_single_object(pchild);
      topology->modified = 1;
    }
@@ -2414,13 +2493,26 @@ hwloc_compare_levels_structure(hwloc_topology_t topology, unsigned i)
  return 0;
 }

-/* return > 0 if any level was removed, which means reconnect is needed */
-static void
+/* return 0 on success, -1 if reconnecting levels failed.
+ * performs its own reconnect internally if needed
+ */
+static int
 hwloc_filter_levels_keep_structure(hwloc_topology_t topology)
 {
  unsigned i, j;
  int res = 0;

+  if (topology->modified) {
+    /* WARNING: hwloc_topology_reconnect() is duplicated partially here
+     * and at the end of this function:
+     * - we need normal levels before merging.
+     * - and we'll need to update special levels after merging.
+     */
+    hwloc_connect_children(topology->levels[0][0]);
+    if (hwloc_connect_levels(topology) < 0)
+      return -1;
+  }
+
  /* start from the bottom since we'll remove intermediate levels */
  for(i=topology->nb_levels-1; i>0; i--) {
    int replacechild = 0, replaceparent = 0;
@@ -2586,6 +2678,22 @@ hwloc_filter_levels_keep_structure(hwloc_topology_t topology)
 	topology->type_depth[type] = HWLOC_TYPE_DEPTH_MULTIPLE;
    }
  }
+
+
+  if (res > 0 || topology->modified) {
+    /* WARNING: hwloc_topology_reconnect() is duplicated partially here
+     * and at the beginning of this function.
+     * If we merged some levels, some child+parent special children lists
+     * may have been merged, hence special levels might need reordering,
+     * so reconnect special levels only here at the end
+     * (it's not needed at the beginning of this function).
+     */
+    if (hwloc_connect_special_levels(topology) < 0)
+      return -1;
+    topology->modified = 0;
+  }
+
+  return 0;
 }

 static void
@@ -2903,9 +3011,9 @@ hwloc_list_special_objects(hwloc_topology_t topology, hwloc_obj_t obj)
  }
 }

-/* Build I/O levels */
+/* Build Memory, I/O and Misc levels */
 static int
-hwloc_connect_io_misc_levels(hwloc_topology_t topology)
+hwloc_connect_special_levels(hwloc_topology_t topology)
 {
  unsigned i;

@@ -3070,7 +3178,8 @@ hwloc_connect_levels(hwloc_topology_t topology)
      tmpnbobjs = realloc(topology->level_nbobjects,
 			  2 * topology->nb_levels_allocated * sizeof(*topology->level_nbobjects));
      if (!tmplevels || !tmpnbobjs) {
-	fprintf(stderr, "hwloc failed to realloc level arrays to %u\n", topology->nb_levels_allocated * 2);
+        if (hwloc_hide_errors() < 2)
+          fprintf(stderr, "hwloc: failed to realloc level arrays to %u\n", topology->nb_levels_allocated * 2);

 	/* if one realloc succeeded, make sure the caller will free the new buffer */
 	if (tmplevels)
@@ -3115,6 +3224,10 @@ hwloc_connect_levels(hwloc_topology_t topology)
 int
 hwloc_topology_reconnect(struct hwloc_topology *topology, unsigned long flags)
 {
+  /* WARNING: when updating this function, the replicated code must
+   * also be updated inside hwloc_filter_levels_keep_structure()
+   */
+
  if (flags) {
    errno = EINVAL;
    return -1;
@@ -3127,7 +3240,7 @@ hwloc_topology_reconnect(struct hwloc_topology *topology, unsigned long flags)
  if (hwloc_connect_levels(topology) < 0)
    return -1;

-  if (hwloc_connect_io_misc_levels(topology) < 0)
+  if (hwloc_connect_special_levels(topology) < 0)
    return -1;

  topology->modified = 0;
@@ -3452,28 +3565,28 @@ hwloc_discover(struct hwloc_topology *topology,
  hwloc_debug("%s", "\nRemoving empty objects\n");
  remove_empty(topology, &topology->levels[0][0]);
  if (!topology->levels[0][0]) {
-    fprintf(stderr, "Topology became empty, aborting!\n");
+    if (hwloc_hide_errors() < 2)
+      fprintf(stderr, "hwloc: Topology became empty, aborting!\n");
    return -1;
  }
  if (hwloc_bitmap_iszero(topology->levels[0][0]->cpuset)) {
-    fprintf(stderr, "Topology does not contain any PU, aborting!\n");
+    if (hwloc_hide_errors() < 2)
+      fprintf(stderr, "hwloc: Topology does not contain any PU, aborting!\n");
    return -1;
  }
  if (hwloc_bitmap_iszero(topology->levels[0][0]->nodeset)) {
-    fprintf(stderr, "Topology does not contain any NUMA node, aborting!\n");
+    if (hwloc_hide_errors() < 2)
+      fprintf(stderr, "hwloc: Topology does not contain any NUMA node, aborting!\n");
    return -1;
  }
  hwloc_debug_print_objects(0, topology->levels[0][0]);

-  /* Reconnect things after all these changes.
-   * Often needed because of Groups inserted for I/Os.
-   * And required for KEEP_STRUCTURE below.
-   */
-  if (hwloc_topology_reconnect(topology, 0) < 0)
-    return -1;
-
  hwloc_debug("%s", "\nRemoving levels with HWLOC_TYPE_FILTER_KEEP_STRUCTURE\n");
-  hwloc_filter_levels_keep_structure(topology);
+  if (hwloc_filter_levels_keep_structure(topology) < 0)
+    return -1;
+  /* takes care of reconnecting children/levels internally,
+   * because it needs normal levels.
+   * and it's often needed below because of Groups inserted for I/Os anyway */
  hwloc_debug_print_objects(0, topology->levels[0][0]);

  /* accumulate children memory in total_memory fields (only once parent is set) */
@@ -3698,7 +3811,18 @@ hwloc_topology_set_flags (struct hwloc_topology *topology, unsigned long flags)
    return -1;
  }

-  if (flags & ~(HWLOC_TOPOLOGY_FLAG_INCLUDE_DISALLOWED|HWLOC_TOPOLOGY_FLAG_IS_THISSYSTEM|HWLOC_TOPOLOGY_FLAG_THISSYSTEM_ALLOWED_RESOURCES|HWLOC_TOPOLOGY_FLAG_IMPORT_SUPPORT)) {
+  if (flags & ~(HWLOC_TOPOLOGY_FLAG_INCLUDE_DISALLOWED|HWLOC_TOPOLOGY_FLAG_IS_THISSYSTEM|HWLOC_TOPOLOGY_FLAG_THISSYSTEM_ALLOWED_RESOURCES|HWLOC_TOPOLOGY_FLAG_IMPORT_SUPPORT|HWLOC_TOPOLOGY_FLAG_RESTRICT_TO_CPUBINDING|HWLOC_TOPOLOGY_FLAG_RESTRICT_TO_MEMBINDING|HWLOC_TOPOLOGY_FLAG_DONT_CHANGE_BINDING)) {
+    errno = EINVAL;
+    return -1;
+  }
+
+  if ((flags & (HWLOC_TOPOLOGY_FLAG_RESTRICT_TO_CPUBINDING|HWLOC_TOPOLOGY_FLAG_IS_THISSYSTEM)) == HWLOC_TOPOLOGY_FLAG_RESTRICT_TO_CPUBINDING) {
+    /* RESTRICT_TO_CPUBINDING requires THISSYSTEM for binding */
+    errno = EINVAL;
+    return -1;
+  }
+  if ((flags & (HWLOC_TOPOLOGY_FLAG_RESTRICT_TO_MEMBINDING|HWLOC_TOPOLOGY_FLAG_IS_THISSYSTEM)) == HWLOC_TOPOLOGY_FLAG_RESTRICT_TO_MEMBINDING) {
+    /* RESTRICT_TO_MEMBINDING requires THISSYSTEM for binding */
    errno = EINVAL;
    return -1;
  }
@@ -3985,6 +4109,31 @@ hwloc_topology_load (struct hwloc_topology *topology)

  topology->is_loaded = 1;

+  if (topology->flags & HWLOC_TOPOLOGY_FLAG_RESTRICT_TO_CPUBINDING) {
+    /* FIXME: filter directly in backends during the discovery.
+     * Only x86 does it because binding may cause issues on Windows.
+     */
+    hwloc_bitmap_t set = hwloc_bitmap_alloc();
+    if (set) {
+      err = hwloc_get_cpubind(topology, set, HWLOC_CPUBIND_STRICT);
+      if (!err)
+        hwloc_topology_restrict(topology, set, 0);
+      hwloc_bitmap_free(set);
+    }
+  }
+  if (topology->flags & HWLOC_TOPOLOGY_FLAG_RESTRICT_TO_MEMBINDING) {
+    /* FIXME: filter directly in backends during the discovery.
+     */
+    hwloc_bitmap_t set = hwloc_bitmap_alloc();
+    hwloc_membind_policy_t policy;
+    if (set) {
+      err = hwloc_get_membind(topology, set, &policy, HWLOC_MEMBIND_STRICT | HWLOC_MEMBIND_BYNODESET);
+      if (!err)
+        hwloc_topology_restrict(topology, set, HWLOC_RESTRICT_FLAG_BYNODESET);
+      hwloc_bitmap_free(set);
+    }
+  }
+
  if (topology->backend_phases & HWLOC_DISC_PHASE_TWEAK) {
    dstatus.phase = HWLOC_DISC_PHASE_TWEAK;
    hwloc_discover_by_phase(topology, &dstatus, "TWEAK");
@@ -4260,14 +4409,13 @@ hwloc_topology_restrict(struct hwloc_topology *topology, hwloc_const_bitmap_t se
  hwloc_bitmap_free(droppedcpuset);
  hwloc_bitmap_free(droppednodeset);

-  if (hwloc_topology_reconnect(topology, 0) < 0)
+  if (hwloc_filter_levels_keep_structure(topology) < 0) /* takes care of reconnecting internally */
    goto out;

  /* some objects may have disappeared, we need to update distances objs arrays */
  hwloc_internal_distances_invalidate_cached_objs(topology);
  hwloc_internal_memattrs_need_refresh(topology);

-  hwloc_filter_levels_keep_structure(topology);
  hwloc_propagate_symmetric_subtree(topology, topology->levels[0][0]);
  propagate_total_memory(topology->levels[0][0]);
  hwloc_internal_cpukinds_restrict(topology);
@@ -4658,6 +4806,9 @@ hwloc__check_misc_children(hwloc_topology_t topology, hwloc_bitmap_t gp_indexes,
 static void
 hwloc__check_object(hwloc_topology_t topology, hwloc_bitmap_t gp_indexes, hwloc_obj_t obj)
 {
+  hwloc_uint64_t total_memory;
+  hwloc_obj_t child;
+
  assert(!hwloc_bitmap_isset(gp_indexes, obj->gp_index));
  hwloc_bitmap_set(gp_indexes, obj->gp_index);

@@ -4715,6 +4866,18 @@ hwloc__check_object(hwloc_topology_t topology, hwloc_bitmap_t gp_indexes, hwloc_
    assert(hwloc_cache_type_by_depth_type(obj->attr->cache.depth, obj->attr->cache.type) == obj->type);
  }

+  /* check total memory */
+  total_memory = 0;
+  if (obj->type == HWLOC_OBJ_NUMANODE)
+    total_memory += obj->attr->numanode.local_memory;
+  for_each_child(child, obj) {
+    total_memory += child->total_memory;
+  }
+  for_each_memory_child(child, obj) {
+    total_memory += child->total_memory;
+  }
+  assert(total_memory == obj->total_memory);
+
  /* check children */
  hwloc__check_normal_children(topology, gp_indexes, obj);
  hwloc__check_memory_children(topology, gp_indexes, obj);
--- a/src/3rdparty/hwloc/src/traversal.c
+++ b/src/3rdparty/hwloc/src/traversal.c
@@ -1,6 +1,6 @@
 /*
 * Copyright © 2009 CNRS
- * Copyright © 2009-2020 Inria.  All rights reserved.
+ * Copyright © 2009-2021 Inria.  All rights reserved.
 * Copyright © 2009-2010, 2020 Université Bordeaux
 * Copyright © 2009-2011 Cisco Systems, Inc.  All rights reserved.
 * See COPYING in top-level directory.
@@ -395,6 +395,8 @@ hwloc_type_sscanf(const char *string, hwloc_obj_type_t *typep,
  } else if (hwloc__type_match(string, "pcibridge", 5)) {
    type = HWLOC_OBJ_BRIDGE;
    ubtype = HWLOC_OBJ_BRIDGE_PCI;
+    /* if downstream_type can ever be non-PCI, we'll have to make strings more precise,
+     * or relax the hwloc_type_sscanf test */

  } else if (hwloc__type_match(string, "pcidev", 3)) {
    type = HWLOC_OBJ_PCI_DEVICE;
@@ -448,7 +450,9 @@ hwloc_type_sscanf(const char *string, hwloc_obj_type_t *typep,
      attrp->group.depth = depthattr;
    } else if (type == HWLOC_OBJ_BRIDGE && attrsize >= sizeof(attrp->bridge)) {
      attrp->bridge.upstream_type = ubtype;
-      attrp->bridge.downstream_type = HWLOC_OBJ_BRIDGE_PCI; /* nothing else so far */
+      attrp->bridge.downstream_type = HWLOC_OBJ_BRIDGE_PCI;
+      /* if downstream_type can ever be non-PCI, we'll have to make strings more precise,
+       * or relax the hwloc_type_sscanf test */
    } else if (type == HWLOC_OBJ_OS_DEVICE && attrsize >= sizeof(attrp->osdev)) {
      attrp->osdev.type = ostype;
    }
@@ -531,6 +535,9 @@ hwloc_obj_type_snprintf(char * __hwloc_restrict string, size_t size, hwloc_obj_t
    else
      return hwloc_snprintf(string, size, "%s", hwloc_obj_type_string(type));
  case HWLOC_OBJ_BRIDGE:
+    /* if downstream_type can ever be non-PCI, we'll have to make strings more precise,
+     * or relax the hwloc_type_sscanf test */
+    assert(obj->attr->bridge.downstream_type == HWLOC_OBJ_BRIDGE_PCI);
    return hwloc_snprintf(string, size, obj->attr->bridge.upstream_type == HWLOC_OBJ_BRIDGE_PCI ? "PCIBridge" : "HostBridge");
  case HWLOC_OBJ_PCI_DEVICE:
    return hwloc_snprintf(string, size, "PCI");
@@ -648,8 +655,11 @@ hwloc_obj_attr_snprintf(char * __hwloc_restrict string, size_t size, hwloc_obj_t
      } else
        *up = '\0';
      /* downstream is_PCI */
-      snprintf(down, sizeof(down), "buses=%04x:[%02x-%02x]",
-	       obj->attr->bridge.downstream.pci.domain, obj->attr->bridge.downstream.pci.secondary_bus, obj->attr->bridge.downstream.pci.subordinate_bus);
+      if (obj->attr->bridge.downstream_type == HWLOC_OBJ_BRIDGE_PCI) {
+        snprintf(down, sizeof(down), "buses=%04x:[%02x-%02x]",
+                 obj->attr->bridge.downstream.pci.domain, obj->attr->bridge.downstream.pci.secondary_bus, obj->attr->bridge.downstream.pci.subordinate_bus);
+      } else
+        assert(0);
      if (*up)
 	res = hwloc_snprintf(string, size, "%s%s%s", up, separator, down);
      else
@@ -736,3 +746,92 @@ int hwloc_bitmap_singlify_per_core(hwloc_topology_t topology, hwloc_bitmap_t cpu
  }
  return 0;
 }
+
+hwloc_obj_t
+hwloc_get_obj_with_same_locality(hwloc_topology_t topology, hwloc_obj_t src,
+                                 hwloc_obj_type_t type, const char *subtype, const char *nameprefix,
+                                 unsigned long flags)
+{
+  if (flags) {
+    errno = EINVAL;
+    return NULL;
+  }
+
+  if (hwloc_obj_type_is_normal(src->type) || hwloc_obj_type_is_memory(src->type)) {
+    /* normal/memory type, look for normal/memory type with same sets */
+    hwloc_obj_t obj;
+
+    if (!hwloc_obj_type_is_normal(type) && !hwloc_obj_type_is_memory(type)) {
+      errno = EINVAL;
+      return NULL;
+    }
+
+    obj = NULL;
+    while ((obj = hwloc_get_next_obj_by_type(topology, type, obj)) != NULL) {
+      if (!hwloc_bitmap_isequal(src->cpuset, obj->cpuset)
+          || !hwloc_bitmap_isequal(src->nodeset, obj->nodeset))
+        continue;
+      if (subtype && (!obj->subtype || strcasecmp(subtype, obj->subtype)))
+        continue;
+      if (nameprefix && (!obj->name || hwloc_strncasecmp(nameprefix, obj->name, strlen(nameprefix))))
+        continue;
+      return obj;
+    }
+    errno = ENOENT;
+    return NULL;
+
+  } else if (hwloc_obj_type_is_io(src->type)) {
+    /* I/O device, look for PCI/OS in same PCI */
+    hwloc_obj_t pci;
+
+    if ((src->type != HWLOC_OBJ_OS_DEVICE && src->type != HWLOC_OBJ_PCI_DEVICE)
+        || (type != HWLOC_OBJ_OS_DEVICE && type != HWLOC_OBJ_PCI_DEVICE)) {
+      errno = EINVAL;
+      return NULL;
+    }
+
+    /* walk up to find the container */
+    pci = src;
+    while (pci->type == HWLOC_OBJ_OS_DEVICE)
+      pci = pci->parent;
+
+    if (type == HWLOC_OBJ_PCI_DEVICE) {
+      if (pci->type != HWLOC_OBJ_PCI_DEVICE) {
+        errno = ENOENT;
+        return NULL;
+      }
+      if (subtype && (!pci->subtype || strcasecmp(subtype, pci->subtype))) {
+        errno = ENOENT;
+        return NULL;
+      }
+      if (nameprefix && (!pci->name || hwloc_strncasecmp(nameprefix, pci->name, strlen(nameprefix)))) {
+        errno = ENOENT;
+        return NULL;
+      }
+      return pci;
+
+    } else {
+      /* find a matching osdev child */
+      assert(type == HWLOC_OBJ_OS_DEVICE);
+      /* FIXME: won't work if we ever store osdevs in osdevs */
+      hwloc_obj_t child;
+      for(child = pci->io_first_child; child; child = child->next_sibling) {
+        if (child->type != HWLOC_OBJ_OS_DEVICE)
+          /* FIXME: should never occur currently */
+          continue;
+        if (subtype && (!child->subtype || strcasecmp(subtype, child->subtype)))
+          continue;
+        if (nameprefix && (!child->name || hwloc_strncasecmp(nameprefix, child->name, strlen(nameprefix))))
+          continue;
+        return child;
+      }
+    }
+    errno = ENOENT;
+    return NULL;
+
+  } else {
+    /* nothing for Misc */
+    errno = EINVAL;
+    return NULL;
+  }
+}
--- a/src/3rdparty/llhttp/LICENSE-MIT
+++ b/src/3rdparty/llhttp/LICENSE-MIT
@@ -0,0 +1,22 @@
+This software is licensed under the MIT License.
+
+Copyright Fedor Indutny, 2018.
+
+Permission is hereby granted, free of charge, to any person obtaining a
+copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to permit
+persons to whom the Software is furnished to do so, subject to the
+following conditions:
+
+The above copyright notice and this permission notice shall be included
+in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN
+NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
+DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
+OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
+USE OR OTHER DEALINGS IN THE SOFTWARE.
--- a/src/3rdparty/llhttp/README.md
+++ b/src/3rdparty/llhttp/README.md
@@ -0,0 +1,135 @@
+# llhttp
+[![CI](https://github.com/nodejs/llhttp/workflows/CI/badge.svg)](https://github.com/nodejs/llhttp/actions?query=workflow%3ACI)
+
+Port of [http_parser][0] to [llparse][1].
+
+## Why?
+
+Let's face it, [http_parser][0] is practically unmaintainable. Even
+the introduction of a single new method results in significant code churn.
+
+This project aims to:
+
+* Make it maintainable
+* Make it verifiable
+* Improve benchmarks where possible
+
+More details in [Fedor Indutny's talk at JSConf EU 2019](https://youtu.be/x3k_5Mi66sY)
+
+## How?
+
+Over time, different approaches for improving [http_parser][0]'s code base
+were tried. However, all of them failed because they caused significant performance
+degradation.
+
+This project is a port of [http_parser][0] to TypeScript. [llparse][1] is used
+to generate the output C source file, which could be compiled and
+linked with the embedder's program (like [Node.js][7]).
+
+## Performance
+
+So far llhttp outperforms http_parser:
+
+|                 | input size |  bandwidth   |  reqs/sec  |   time  |
+|:----------------|-----------:|-------------:|-----------:|--------:|
+| **llhttp**      | 8192.00 mb | 1777.24 mb/s | 3583799.39 req/sec | 4.61 s |
+| **http_parser** | 8192.00 mb | 694.66 mb/s | 1406180.33 req/sec | 11.79 s |
+
+llhttp is faster by approximately **156%**.
+
+## Maintenance
+
+The llhttp project has about 1400 lines of TypeScript code describing the parser
+itself and around 450 lines of C code and headers providing the helper methods.
+The whole [http_parser][0] is implemented in approximately 2500 lines of C, and
+436 lines of headers.
+
+All optimizations and multi-character matching in llhttp are generated
+automatically, and thus don't add any extra maintenance cost. On the contrary,
+most of http_parser's code is hand-optimized and unrolled. Instead of describing
+"how" it should parse HTTP requests/responses, a maintainer has to
+implement the new features in [http_parser][0] cautiously, considering
+possible performance degradation and manually optimizing the new code.
+
+## Verification
+
+The state machine graph is encoded explicitly in llhttp. [llparse][1]
+automatically checks the graph for the absence of loops and for correct reporting of the
+input ranges (spans) like header names and values. In the future, additional
+checks could be performed to get even stricter verification of the llhttp.
+
+## Usage
+
+```C
+#include "llhttp.h"
+
+llhttp_t parser;
+llhttp_settings_t settings;
+
+/* Initialize user callbacks and settings */
+llhttp_settings_init(&settings);
+
+/* Set user callback (handle_on_message_complete is defined by the embedder) */
+settings.on_message_complete = handle_on_message_complete;
+
+/* Initialize the parser in HTTP_BOTH mode, meaning that it will select between
+ * HTTP_REQUEST and HTTP_RESPONSE parsing automatically while reading the first
+ * input.
+ */
+llhttp_init(&parser, HTTP_BOTH, &settings);
+
+/* Parse request! */
+const char* request = "GET / HTTP/1.1\r\n\r\n";
+size_t request_len = strlen(request);
+
+enum llhttp_errno err = llhttp_execute(&parser, request, request_len);
+if (err == HPE_OK) {
+  /* Successfully parsed! */
+} else {
+  fprintf(stderr, "Parse error: %s %s\n", llhttp_errno_name(err),
+          parser.reason);
+}
+```
+
+---
+
+### Bindings to other languages
+
+* Python: [pallas/pyllhttp][8]
+* Ruby: [metabahn/llhttp][9]
+
+#### LICENSE
+
+This software is licensed under the MIT License.
+
+Copyright Fedor Indutny, 2018.
+
+Permission is hereby granted, free of charge, to any person obtaining a
+copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to permit
+persons to whom the Software is furnished to do so, subject to the
+following conditions:
+
+The above copyright notice and this permission notice shall be included
+in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN
+NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
+DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
+OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
+USE OR OTHER DEALINGS IN THE SOFTWARE.
+
+[0]: https://github.com/nodejs/http-parser
+[1]: https://github.com/nodejs/llparse
+[2]: https://en.wikipedia.org/wiki/Register_allocation#Spilling
+[3]: https://en.wikipedia.org/wiki/Tail_call
+[4]: https://llvm.org/docs/LangRef.html
+[5]: https://llvm.org/docs/LangRef.html#call-instruction
+[6]: https://clang.llvm.org/
+[7]: https://github.com/nodejs/node
+[8]: https://github.com/pallas/pyllhttp
+[9]: https://github.com/metabahn/llhttp
--- a/src/3rdparty/llhttp/api.c
+++ b/src/3rdparty/llhttp/api.c
@@ -0,0 +1,348 @@
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+
+#include "llhttp.h"
+
+#define CALLBACK_MAYBE(PARSER, NAME, ...)                                     \
+  do {                                                                        \
+    const llhttp_settings_t* settings;                                        \
+    settings = (const llhttp_settings_t*) (PARSER)->settings;                 \
+    if (settings == NULL || settings->NAME == NULL) {                         \
+      err = 0;                                                                \
+      break;                                                                  \
+    }                                                                         \
+    err = settings->NAME(__VA_ARGS__);                                        \
+  } while (0)
+
+void llhttp_init(llhttp_t* parser, llhttp_type_t type,
+                 const llhttp_settings_t* settings) {
+  llhttp__internal_init(parser);
+
+  parser->type = type;
+  parser->settings = (void*) settings;
+}
+
+
+#if defined(__wasm__)
+
+extern int wasm_on_message_begin(llhttp_t * p);
+extern int wasm_on_url(llhttp_t* p, const char* at, size_t length);
+extern int wasm_on_status(llhttp_t* p, const char* at, size_t length);
+extern int wasm_on_header_field(llhttp_t* p, const char* at, size_t length);
+extern int wasm_on_header_value(llhttp_t* p, const char* at, size_t length);
+extern int wasm_on_headers_complete(llhttp_t * p);
+extern int wasm_on_body(llhttp_t* p, const char* at, size_t length);
+extern int wasm_on_message_complete(llhttp_t * p);
+
+const llhttp_settings_t wasm_settings = {
+  wasm_on_message_begin,
+  wasm_on_url,
+  wasm_on_status,
+  wasm_on_header_field,
+  wasm_on_header_value,
+  wasm_on_headers_complete,
+  wasm_on_body,
+  wasm_on_message_complete,
+  NULL,
+  NULL,
+};
+
+
+llhttp_t* llhttp_alloc(llhttp_type_t type) {
+  llhttp_t* parser = malloc(sizeof(llhttp_t));
+  llhttp_init(parser, type, &wasm_settings);
+  return parser;
+}
+
+void llhttp_free(llhttp_t* parser) {
+  free(parser);
+}
+
+/* Some getters required to get stuff from the parser */
+
+uint8_t llhttp_get_type(llhttp_t* parser) {
+  return parser->type;
+}
+
+uint8_t llhttp_get_http_major(llhttp_t* parser) {
+  return parser->http_major;
+}
+
+uint8_t llhttp_get_http_minor(llhttp_t* parser) {
+  return parser->http_minor;
+}
+
+uint8_t llhttp_get_method(llhttp_t* parser) {
+  return parser->method;
+}
+
+int llhttp_get_status_code(llhttp_t* parser) {
+  return parser->status_code;
+}
+
+uint8_t llhttp_get_upgrade(llhttp_t* parser) {
+  return parser->upgrade;
+}
+
+#endif  // defined(__wasm__)
+
+
+void llhttp_reset(llhttp_t* parser) {
+  llhttp_type_t type = parser->type;
+  const llhttp_settings_t* settings = parser->settings;
+  void* data = parser->data;
+  uint8_t lenient_flags = parser->lenient_flags;
+
+  llhttp__internal_init(parser);
+
+  parser->type = type;
+  parser->settings = (void*) settings;
+  parser->data = data;
+  parser->lenient_flags = lenient_flags;
+}
+
+
+llhttp_errno_t llhttp_execute(llhttp_t* parser, const char* data, size_t len) {
+  return llhttp__internal_execute(parser, data, data + len);
+}
+
+
+void llhttp_settings_init(llhttp_settings_t* settings) {
+  memset(settings, 0, sizeof(*settings));
+}
+
+
+llhttp_errno_t llhttp_finish(llhttp_t* parser) {
+  int err;
+
+  /* We're in an error state. Don't bother doing anything. */
+  if (parser->error != 0) {
+    return 0;
+  }
+
+  switch (parser->finish) {
+    case HTTP_FINISH_SAFE_WITH_CB:
+      CALLBACK_MAYBE(parser, on_message_complete, parser);
+      if (err != HPE_OK) return err;
+
+    /* FALLTHROUGH */
+    case HTTP_FINISH_SAFE:
+      return HPE_OK;
+    case HTTP_FINISH_UNSAFE:
+      parser->reason = "Invalid EOF state";
+      return HPE_INVALID_EOF_STATE;
+    default:
+      abort();
+  }
+}
+
+
+void llhttp_pause(llhttp_t* parser) {
+  if (parser->error != HPE_OK) {
+    return;
+  }
+
+  parser->error = HPE_PAUSED;
+  parser->reason = "Paused";
+}
+
+
+void llhttp_resume(llhttp_t* parser) {
+  if (parser->error != HPE_PAUSED) {
+    return;
+  }
+
+  parser->error = 0;
+}
+
+
+void llhttp_resume_after_upgrade(llhttp_t* parser) {
+  if (parser->error != HPE_PAUSED_UPGRADE) {
+    return;
+  }
+
+  parser->error = 0;
+}
+
+
+llhttp_errno_t llhttp_get_errno(const llhttp_t* parser) {
+  return parser->error;
+}
+
+
+const char* llhttp_get_error_reason(const llhttp_t* parser) {
+  return parser->reason;
+}
+
+
+void llhttp_set_error_reason(llhttp_t* parser, const char* reason) {
+  parser->reason = reason;
+}
+
+
+const char* llhttp_get_error_pos(const llhttp_t* parser) {
+  return parser->error_pos;
+}
+
+
+const char* llhttp_errno_name(llhttp_errno_t err) {
+#define HTTP_ERRNO_GEN(CODE, NAME, _) case HPE_##NAME: return "HPE_" #NAME;
+  switch (err) {
+    HTTP_ERRNO_MAP(HTTP_ERRNO_GEN)
+    default: abort();
+  }
+#undef HTTP_ERRNO_GEN
+}
+
+
+const char* llhttp_method_name(llhttp_method_t method) {
+#define HTTP_METHOD_GEN(NUM, NAME, STRING) case HTTP_##NAME: return #STRING;
+  switch (method) {
+    HTTP_METHOD_MAP(HTTP_METHOD_GEN)
+    default: abort();
+  }
+#undef HTTP_METHOD_GEN
+}
+
+
+void llhttp_set_lenient_headers(llhttp_t* parser, int enabled) {
+  if (enabled) {
+    parser->lenient_flags |= LENIENT_HEADERS;
+  } else {
+    parser->lenient_flags &= ~LENIENT_HEADERS;
+  }
+}
+
+
+void llhttp_set_lenient_chunked_length(llhttp_t* parser, int enabled) {
+  if (enabled) {
+    parser->lenient_flags |= LENIENT_CHUNKED_LENGTH;
+  } else {
+    parser->lenient_flags &= ~LENIENT_CHUNKED_LENGTH;
+  }
+}
+
+
+void llhttp_set_lenient_keep_alive(llhttp_t* parser, int enabled) {
+  if (enabled) {
+    parser->lenient_flags |= LENIENT_KEEP_ALIVE;
+  } else {
+    parser->lenient_flags &= ~LENIENT_KEEP_ALIVE;
+  }
+}
+
+/* Callbacks */
+
+
+int llhttp__on_message_begin(llhttp_t* s, const char* p, const char* endp) {
+  int err;
+  CALLBACK_MAYBE(s, on_message_begin, s);
+  return err;
+}
+
+
+int llhttp__on_url(llhttp_t* s, const char* p, const char* endp) {
+  int err;
+  CALLBACK_MAYBE(s, on_url, s, p, endp - p);
+  return err;
+}
+
+
+int llhttp__on_url_complete(llhttp_t* s, const char* p, const char* endp) {
+  int err;
+  CALLBACK_MAYBE(s, on_url_complete, s);
+  return err;
+}
+
+
+int llhttp__on_status(llhttp_t* s, const char* p, const char* endp) {
+  int err;
+  CALLBACK_MAYBE(s, on_status, s, p, endp - p);
+  return err;
+}
+
+
+int llhttp__on_status_complete(llhttp_t* s, const char* p, const char* endp) {
+  int err;
+  CALLBACK_MAYBE(s, on_status_complete, s);
+  return err;
+}
+
+
+int llhttp__on_header_field(llhttp_t* s, const char* p, const char* endp) {
+  int err;
+  CALLBACK_MAYBE(s, on_header_field, s, p, endp - p);
+  return err;
+}
+
+
+int llhttp__on_header_field_complete(llhttp_t* s, const char* p, const char* endp) {
+  int err;
+  CALLBACK_MAYBE(s, on_header_field_complete, s);
+  return err;
+}
+
+
+int llhttp__on_header_value(llhttp_t* s, const char* p, const char* endp) {
+  int err;
+  CALLBACK_MAYBE(s, on_header_value, s, p, endp - p);
+  return err;
+}
+
+
+int llhttp__on_header_value_complete(llhttp_t* s, const char* p, const char* endp) {
+  int err;
+  CALLBACK_MAYBE(s, on_header_value_complete, s);
+  return err;
+}
+
+
+int llhttp__on_headers_complete(llhttp_t* s, const char* p, const char* endp) {
+  int err;
+  CALLBACK_MAYBE(s, on_headers_complete, s);
+  return err;
+}
+
+
+int llhttp__on_message_complete(llhttp_t* s, const char* p, const char* endp) {
+  int err;
+  CALLBACK_MAYBE(s, on_message_complete, s);
+  return err;
+}
+
+
+int llhttp__on_body(llhttp_t* s, const char* p, const char* endp) {
+  int err;
+  CALLBACK_MAYBE(s, on_body, s, p, endp - p);
+  return err;
+}
+
+
+int llhttp__on_chunk_header(llhttp_t* s, const char* p, const char* endp) {
+  int err;
+  CALLBACK_MAYBE(s, on_chunk_header, s);
+  return err;
+}
+
+
+int llhttp__on_chunk_complete(llhttp_t* s, const char* p, const char* endp) {
+  int err;
+  CALLBACK_MAYBE(s, on_chunk_complete, s);
+  return err;
+}
+
+
+/* Private */
+
+
+void llhttp__debug(llhttp_t* s, const char* p, const char* endp,
+                   const char* msg) {
+  if (p == endp) {
+    fprintf(stderr, "p=%p type=%d flags=%02x next=null debug=%s\n", s, s->type,
+            s->flags, msg);
+  } else {
+    fprintf(stderr, "p=%p type=%d flags=%02x next=%02x   debug=%s\n", s,
+            s->type, s->flags, *p, msg);
+  }
+}
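The `llhttp_errno_name()` and `llhttp_method_name()` functions above build their `switch` bodies by expanding the `HTTP_ERRNO_MAP` and `HTTP_METHOD_MAP` X-macros from `llhttp.h`. What follows is a minimal, self-contained sketch of that technique. `COLOR_MAP`, `color_t`, and `color_name()` are hypothetical names and are not part of llhttp. The vendored code calls `abort()` on an unknown value, but this sketch returns `NULL` instead.

```c
#include <stddef.h>

/* One table drives both the enum and the name lookup (hypothetical names).
 * Columns: numeric value, identifier suffix, printable name. */
#define COLOR_MAP(XX) \
  XX(0, RED, red)     \
  XX(1, GREEN, green) \
  XX(2, BLUE, blue)

/* Expand the table into enum constants: COLOR_RED = 0, COLOR_GREEN = 1, ... */
typedef enum {
#define COLOR_GEN(NUM, NAME, STRING) COLOR_##NAME = NUM,
  COLOR_MAP(COLOR_GEN)
#undef COLOR_GEN
} color_t;

/* Expand the same table into switch cases that stringify the third column. */
const char* color_name(color_t c) {
#define COLOR_GEN(NUM, NAME, STRING) case COLOR_##NAME: return #STRING;
  switch (c) {
    COLOR_MAP(COLOR_GEN)
  }
#undef COLOR_GEN
  return NULL; /* unknown value; llhttp calls abort() here instead */
}
```

Both the enum and the lookup come from a single table, so adding a row keeps them in sync. The separate string column exists for names that are not valid C identifiers, such as `XX(24, MSEARCH, M-SEARCH)` in `HTTP_METHOD_MAP`.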
--- a/src/3rdparty/llhttp/api.h
+++ b/src/3rdparty/llhttp/api.h
@@ -0,0 +1,253 @@
+#ifndef INCLUDE_LLHTTP_API_H_
+#define INCLUDE_LLHTTP_API_H_
+#ifdef __cplusplus
+extern "C" {
+#endif
+#include <stddef.h>
+
+#if defined(__wasm__)
+#define LLHTTP_EXPORT __attribute__((visibility("default")))
+#else
+#define LLHTTP_EXPORT
+#endif
+
+typedef llhttp__internal_t llhttp_t;
+typedef struct llhttp_settings_s llhttp_settings_t;
+
+typedef int (*llhttp_data_cb)(llhttp_t*, const char *at, size_t length);
+typedef int (*llhttp_cb)(llhttp_t*);
+
+struct llhttp_settings_s {
+  /* Possible return values 0, -1, `HPE_PAUSED` */
+  llhttp_cb      on_message_begin;
+
+  llhttp_data_cb on_url;
+  llhttp_data_cb on_status;
+  llhttp_data_cb on_header_field;
+  llhttp_data_cb on_header_value;
+
+  /* Possible return values:
+   * 0  - Proceed normally
+   * 1  - Assume that request/response has no body, and proceed to parsing the
+   *      next message
+   * 2  - Assume absence of body (as above) and make `llhttp_execute()` return
+   *      `HPE_PAUSED_UPGRADE`
+   * -1 - Error
+   * `HPE_PAUSED`
+   */
+  llhttp_cb      on_headers_complete;
+
+  llhttp_data_cb on_body;
+
+  /* Possible return values 0, -1, `HPE_PAUSED` */
+  llhttp_cb      on_message_complete;
+
+  /* When on_chunk_header is called, the current chunk length is stored
+   * in parser->content_length.
+   * Possible return values 0, -1, `HPE_PAUSED`
+   */
+  llhttp_cb      on_chunk_header;
+  llhttp_cb      on_chunk_complete;
+
+  llhttp_cb      on_url_complete;
+  llhttp_cb      on_status_complete;
+  llhttp_cb      on_header_field_complete;
+  llhttp_cb      on_header_value_complete;
+};
+
+/* Initialize the parser with specific type and user settings.
+ *
+ * NOTE: lifetime of `settings` has to be at least the same as the lifetime of
+ * the `parser` here. In practice, `settings` has to be either a static
+ * variable or be allocated with `malloc`, `new`, etc.
+ */
+LLHTTP_EXPORT
+void llhttp_init(llhttp_t* parser, llhttp_type_t type,
+                 const llhttp_settings_t* settings);
+
+#if defined(__wasm__)
+
+LLHTTP_EXPORT
+llhttp_t* llhttp_alloc(llhttp_type_t type);
+
+LLHTTP_EXPORT
+void llhttp_free(llhttp_t* parser);
+
+LLHTTP_EXPORT
+uint8_t llhttp_get_type(llhttp_t* parser);
+
+LLHTTP_EXPORT
+uint8_t llhttp_get_http_major(llhttp_t* parser);
+
+LLHTTP_EXPORT
+uint8_t llhttp_get_http_minor(llhttp_t* parser);
+
+LLHTTP_EXPORT
+uint8_t llhttp_get_method(llhttp_t* parser);
+
+LLHTTP_EXPORT
+int llhttp_get_status_code(llhttp_t* parser);
+
+LLHTTP_EXPORT
+uint8_t llhttp_get_upgrade(llhttp_t* parser);
+
+#endif  // defined(__wasm__)
+
+/* Reset an already initialized parser back to the start state, preserving the
+ * existing parser type, callback settings, user data, and lenient flags.
+ */
+LLHTTP_EXPORT
+void llhttp_reset(llhttp_t* parser);
+
+/* Initialize the settings object */
+LLHTTP_EXPORT
+void llhttp_settings_init(llhttp_settings_t* settings);
+
+/* Parse full or partial request/response, invoking user callbacks along the
+ * way.
+ *
+ * If any `llhttp_data_cb` returns an errno not equal to `HPE_OK`, parsing is
+ * interrupted and that errno is returned from `llhttp_execute()`. If
+ * `HPE_PAUSED` was used as the errno, execution can be resumed with a
+ * `llhttp_resume()` call.
+ *
+ * In a special case of CONNECT/Upgrade request/response `HPE_PAUSED_UPGRADE`
+ * is returned after fully parsing the request/response. If the user wishes to
+ * continue parsing, they need to invoke `llhttp_resume_after_upgrade()`.
+ *
+ * NOTE: if this function ever returns a non-pause type error, it will continue
+ * to return the same error upon each successive call up until `llhttp_init()`
+ * is called.
+ */
+LLHTTP_EXPORT
+llhttp_errno_t llhttp_execute(llhttp_t* parser, const char* data, size_t len);
+
+/* This method should be called when the other side has no further bytes to
+ * send (e.g. shutdown of readable side of the TCP connection.)
+ *
+ * Requests without `Content-Length` and other messages might require treating
+ * all incoming bytes as part of the body, up to the last byte of the
+ * connection. This method will invoke `on_message_complete()` callback if the
+ * request was terminated safely. Otherwise an error code is returned.
+ */
+LLHTTP_EXPORT
+llhttp_errno_t llhttp_finish(llhttp_t* parser);
+
+/* Returns `1` if the incoming message is parsed until the last byte, and has
+ * to be completed by calling `llhttp_finish()` on EOF
+ */
+LLHTTP_EXPORT
+int llhttp_message_needs_eof(const llhttp_t* parser);
+
+/* Returns `1` if there might be any other messages following the last that was
+ * successfully parsed.
+ */
+LLHTTP_EXPORT
+int llhttp_should_keep_alive(const llhttp_t* parser);
+
+/* Make further calls of `llhttp_execute()` return `HPE_PAUSED` and set
+ * appropriate error reason.
+ *
+ * Important: do not call this from user callbacks! User callbacks must return
+ * `HPE_PAUSED` if pausing is required.
+ */
+LLHTTP_EXPORT
+void llhttp_pause(llhttp_t* parser);
+
+/* Might be called to resume the execution after the pause in user's callback.
+ * See `llhttp_execute()` above for details.
+ *
+ * Call this only if `llhttp_execute()` returns `HPE_PAUSED`.
+ */
+LLHTTP_EXPORT
+void llhttp_resume(llhttp_t* parser);
+
+/* Might be called to resume the execution after the pause in user's callback.
+ * See `llhttp_execute()` above for details.
+ *
+ * Call this only if `llhttp_execute()` returns `HPE_PAUSED_UPGRADE`
+ */
+LLHTTP_EXPORT
+void llhttp_resume_after_upgrade(llhttp_t* parser);
+
+/* Returns the latest returned error */
+LLHTTP_EXPORT
+llhttp_errno_t llhttp_get_errno(const llhttp_t* parser);
+
+/* Returns the verbal explanation of the latest returned error.
+ *
+ * Note: User callback should set error reason when returning the error. See
+ * `llhttp_set_error_reason()` for details.
+ */
+LLHTTP_EXPORT
+const char* llhttp_get_error_reason(const llhttp_t* parser);
+
+/* Assign verbal description to the returned error. Must be called in user
+ * callbacks right before returning the errno.
+ *
+ * Note: `HPE_USER` error code might be useful in user callbacks.
+ */
+LLHTTP_EXPORT
+void llhttp_set_error_reason(llhttp_t* parser, const char* reason);
+
+/* Returns the pointer to the last parsed byte before the returned error. The
+ * pointer is relative to the `data` argument of `llhttp_execute()`.
+ *
+ * Note: this method might be useful for counting the number of parsed bytes.
+ */
+LLHTTP_EXPORT
+const char* llhttp_get_error_pos(const llhttp_t* parser);
+
+/* Returns textual name of error code */
+LLHTTP_EXPORT
+const char* llhttp_errno_name(llhttp_errno_t err);
+
+/* Returns textual name of HTTP method */
+LLHTTP_EXPORT
+const char* llhttp_method_name(llhttp_method_t method);
+
+
+/* Enables/disables lenient header value parsing (disabled by default).
+ *
+ * Lenient parsing disables header value token checks, extending llhttp's
+ * protocol support to highly non-compliant clients/servers. No
+ * `HPE_INVALID_HEADER_TOKEN` will be raised for incorrect header values when
+ * lenient parsing is "on".
+ *
+ * **(USE AT YOUR OWN RISK)**
+ */
+LLHTTP_EXPORT
+void llhttp_set_lenient_headers(llhttp_t* parser, int enabled);
+
+
+/* Enables/disables lenient handling of conflicting `Transfer-Encoding` and
+ * `Content-Length` headers (disabled by default).
+ *
+ * Normally `llhttp` would error when `Transfer-Encoding` is present in
+ * conjunction with `Content-Length`. This error is important to prevent HTTP
+ * request smuggling, but may be less desirable for a small number of cases
+ * involving legacy servers.
+ *
+ * **(USE AT YOUR OWN RISK)**
+ */
+LLHTTP_EXPORT
+void llhttp_set_lenient_chunked_length(llhttp_t* parser, int enabled);
+
+
+/* Enables/disables lenient handling of `Connection: close` and HTTP/1.0
+ * requests/responses.
+ *
+ * Normally `llhttp` would error on (in strict mode) or discard (in loose mode)
+ * the HTTP request/response after the request/response with `Connection: close`
+ * and `Content-Length`. This is important to prevent cache poisoning attacks,
+ * but might interact badly with outdated and insecure clients. With this flag
+ * the extra request/response will be parsed normally.
+ *
+ * **(USE AT YOUR OWN RISK)**
+ */
+void llhttp_set_lenient_keep_alive(llhttp_t* parser, int enabled);
+
+#ifdef __cplusplus
+}  /* extern "C" */
+#endif
+#endif  /* INCLUDE_LLHTTP_API_H_ */
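`llhttp_settings_init()` zeroes every callback. `api.c` then invokes each callback through `CALLBACK_MAYBE`, which treats a `NULL` callback as success and otherwise hands the callback's return value back to the parser. Below is a minimal sketch of that optional-callback pattern. It assumes hypothetical `demo_*` types in place of `llhttp_t` and `llhttp_settings_t`, and it uses an expression-form macro rather than llhttp's `do { ... } while (0)` statement.

```c
#include <stddef.h>
#include <string.h>

typedef struct demo_parser_s demo_parser_t;
typedef int (*demo_data_cb)(demo_parser_t*, const char* at, size_t length);

/* Hypothetical stand-in for llhttp_settings_t: unset callbacks stay NULL. */
typedef struct {
  demo_data_cb on_url;
  demo_data_cb on_body;
} demo_settings_t;

/* Hypothetical stand-in for llhttp_t: settings plus a user data pointer. */
struct demo_parser_s {
  const demo_settings_t* settings;
  void* data;
};

/* Like CALLBACK_MAYBE: a missing callback counts as success (0). */
#define DEMO_CALLBACK_MAYBE(PARSER, NAME, ...)                    \
  ((PARSER)->settings == NULL || (PARSER)->settings->NAME == NULL \
       ? 0                                                        \
       : (PARSER)->settings->NAME((PARSER), __VA_ARGS__))

/* Example user callback: add the URL span length to *(size_t*) data. */
static int count_url(demo_parser_t* p, const char* at, size_t length) {
  (void) at;
  *(size_t*) p->data += length;
  return 0;
}

/* Example user callback that rejects any body by returning -1 (error). */
static int reject_body(demo_parser_t* p, const char* at, size_t length) {
  (void) p;
  (void) at;
  (void) length;
  return -1;
}

/* Feed a URL span and a body span through the optional callbacks,
 * stopping at the first non-zero return. */
int demo_emit(demo_parser_t* p, const char* url, const char* body) {
  int err = DEMO_CALLBACK_MAYBE(p, on_url, url, strlen(url));
  if (err != 0) return err;
  return DEMO_CALLBACK_MAYBE(p, on_body, body, strlen(body));
}
```

In the real parser, a non-zero return from a callback ends `llhttp_execute()` with an error, or with a pause in the case of `HPE_PAUSED`. In this sketch, `demo_emit()` simply returns the first non-zero value.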
--- a/src/3rdparty/llhttp/http.c
+++ b/src/3rdparty/llhttp/http.c
@@ -0,0 +1,149 @@
+#include <stdio.h>
+#ifndef LLHTTP__TEST
+# include "llhttp.h"
+#else
+# define llhttp_t llparse_t
+#endif  /* */
+
+int llhttp_message_needs_eof(const llhttp_t* parser);
+int llhttp_should_keep_alive(const llhttp_t* parser);
+
+int llhttp__before_headers_complete(llhttp_t* parser, const char* p,
+                                    const char* endp) {
+  /* Set this here so that on_headers_complete() callbacks can see it */
+  if ((parser->flags & F_UPGRADE) &&
+      (parser->flags & F_CONNECTION_UPGRADE)) {
+    /* For responses, "Upgrade: foo" and "Connection: upgrade" are
+     * mandatory only when it is a 101 Switching Protocols response,
+     * otherwise it is purely informational, to announce support.
+     */
+    parser->upgrade =
+        (parser->type == HTTP_REQUEST || parser->status_code == 101);
+  } else {
+    parser->upgrade = (parser->method == HTTP_CONNECT);
+  }
+  return 0;
+}
+
+
+/* Return values:
+ * 0 - No body, `restart`, message_complete
+ * 1 - CONNECT request, `restart`, message_complete, and pause
+ * 2 - chunk_size_start
+ * 3 - body_identity
+ * 4 - body_identity_eof
+ * 5 - invalid transfer-encoding for request
+ */
+int llhttp__after_headers_complete(llhttp_t* parser, const char* p,
+                                   const char* endp) {
+  int hasBody;
+
+  hasBody = parser->flags & F_CHUNKED || parser->content_length > 0;
+  if (parser->upgrade && (parser->method == HTTP_CONNECT ||
+                          (parser->flags & F_SKIPBODY) || !hasBody)) {
+    /* Exit, the rest of the message is in a different protocol. */
+    return 1;
+  }
+
+  if (parser->flags & F_SKIPBODY) {
+    return 0;
+  } else if (parser->flags & F_CHUNKED) {
+    /* chunked encoding - ignore Content-Length header, prepare for a chunk */
+    return 2;
+  } else if (parser->flags & F_TRANSFER_ENCODING) {
+    if (parser->type == HTTP_REQUEST &&
+        (parser->lenient_flags & LENIENT_CHUNKED_LENGTH) == 0) {
+      /* RFC 7230 3.3.3 */
+
+      /* If a Transfer-Encoding header field
+       * is present in a request and the chunked transfer coding is not
+       * the final encoding, the message body length cannot be determined
+       * reliably; the server MUST respond with the 400 (Bad Request)
+       * status code and then close the connection.
+       */
+      return 5;
+    } else {
+      /* RFC 7230 3.3.3 */
+
+      /* If a Transfer-Encoding header field is present in a response and
+       * the chunked transfer coding is not the final encoding, the
+       * message body length is determined by reading the connection until
+       * it is closed by the server.
+       */
+      return 4;
+    }
+  } else {
+    if (!(parser->flags & F_CONTENT_LENGTH)) {
+      if (!llhttp_message_needs_eof(parser)) {
+        /* Assume content-length 0 - read the next */
+        return 0;
+      } else {
+        /* Read body until EOF */
+        return 4;
+      }
+    } else if (parser->content_length == 0) {
+      /* Content-Length header given but zero: Content-Length: 0\r\n */
+      return 0;
+    } else {
+      /* Content-Length header given and non-zero */
+      return 3;
+    }
+  }
+}
+
+
+int llhttp__after_message_complete(llhttp_t* parser, const char* p,
+                                   const char* endp) {
+  int should_keep_alive;
+
+  should_keep_alive = llhttp_should_keep_alive(parser);
+  parser->finish = HTTP_FINISH_SAFE;
+  parser->flags = 0;
+
+  /* NOTE: this is ignored in loose parsing mode */
+  return should_keep_alive;
+}
+
+
+int llhttp_message_needs_eof(const llhttp_t* parser) {
+  if (parser->type == HTTP_REQUEST) {
+    return 0;
+  }
+
+  /* See RFC 2616 section 4.4 */
+  if (parser->status_code / 100 == 1 || /* 1xx e.g. Continue */
+      parser->status_code == 204 ||     /* No Content */
+      parser->status_code == 304 ||     /* Not Modified */
+      (parser->flags & F_SKIPBODY)) {     /* response to a HEAD request */
+    return 0;
+  }
+
+  /* RFC 7230 3.3.3, see `llhttp__after_headers_complete` */
+  if ((parser->flags & F_TRANSFER_ENCODING) &&
+      (parser->flags & F_CHUNKED) == 0) {
+    return 1;
+  }
+
+  if (parser->flags & (F_CHUNKED | F_CONTENT_LENGTH)) {
+    return 0;
+  }
+
+  return 1;
+}
+
+
+int llhttp_should_keep_alive(const llhttp_t* parser) {
+  if (parser->http_major > 0 && parser->http_minor > 0) {
+    /* HTTP/1.1 */
+    if (parser->flags & F_CONNECTION_CLOSE) {
+      return 0;
+    }
+  } else {
+    /* HTTP/1.0 or earlier */
+    if (!(parser->flags & F_CONNECTION_KEEP_ALIVE)) {
+      return 0;
+    }
+  }
+
+  return !llhttp_message_needs_eof(parser);
+}
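The keep-alive decision in `http.c` depends on three things: the HTTP version, the `Connection` flags, and whether the body can only be delimited by EOF. The sketch below restates `llhttp_message_needs_eof()` and `llhttp_should_keep_alive()` over a hypothetical `demo_msg_t`, with flag values copied from `enum llhttp_flags`, so that individual header combinations can be checked in isolation. It is an illustration, not a replacement for the vendored functions.

```c
#include <stdint.h>

/* Flag values copied from enum llhttp_flags in llhttp.h. */
enum {
  DEMO_F_CONNECTION_KEEP_ALIVE = 0x1,
  DEMO_F_CONNECTION_CLOSE = 0x2,
  DEMO_F_CHUNKED = 0x8,
  DEMO_F_CONTENT_LENGTH = 0x20,
  DEMO_F_SKIPBODY = 0x40,
  DEMO_F_TRANSFER_ENCODING = 0x200
};

/* Hypothetical minimal view of a parsed message head. */
typedef struct {
  int is_request;
  uint8_t http_major, http_minor;
  uint16_t status_code;
  uint16_t flags;
} demo_msg_t;

/* Restates llhttp_message_needs_eof(): must the body be read until close? */
int demo_needs_eof(const demo_msg_t* m) {
  if (m->is_request) return 0;
  if (m->status_code / 100 == 1 || m->status_code == 204 ||
      m->status_code == 304 || (m->flags & DEMO_F_SKIPBODY))
    return 0; /* 1xx, 204, 304 and HEAD responses carry no body */
  if ((m->flags & DEMO_F_TRANSFER_ENCODING) && !(m->flags & DEMO_F_CHUNKED))
    return 1; /* non-chunked Transfer-Encoding: body ends at EOF */
  if (m->flags & (DEMO_F_CHUNKED | DEMO_F_CONTENT_LENGTH))
    return 0; /* body length is known from the framing headers */
  return 1;
}

/* Restates llhttp_should_keep_alive(). */
int demo_keep_alive(const demo_msg_t* m) {
  if (m->http_major > 0 && m->http_minor > 0) {
    /* HTTP/1.1: persistent unless "Connection: close" was seen */
    if (m->flags & DEMO_F_CONNECTION_CLOSE) return 0;
  } else {
    /* HTTP/1.0 or earlier: closes unless "Connection: keep-alive" was seen */
    if (!(m->flags & DEMO_F_CONNECTION_KEEP_ALIVE)) return 0;
  }
  return !demo_needs_eof(m);
}
```

Consider an HTTP/1.1 `200` response with neither `Content-Length` nor chunked encoding. Its body must be read until the connection closes, so the connection cannot be kept alive. A `204` response, by contrast, carries no body, so its connection can be kept alive.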
--- a/src/3rdparty/llhttp/llhttp.c
+++ b/src/3rdparty/llhttp/llhttp.c
--- a/src/3rdparty/llhttp/llhttp.h
+++ b/src/3rdparty/llhttp/llhttp.h
@@ -0,0 +1,508 @@
+#ifndef INCLUDE_LLHTTP_H_
+#define INCLUDE_LLHTTP_H_
+
+#define LLHTTP_VERSION_MAJOR 5
+#define LLHTTP_VERSION_MINOR 1
+#define LLHTTP_VERSION_PATCH 0
+
+#ifndef LLHTTP_STRICT_MODE
+# define LLHTTP_STRICT_MODE 0
+#endif
+
+#ifndef INCLUDE_LLHTTP_ITSELF_H_
+#define INCLUDE_LLHTTP_ITSELF_H_
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stdint.h>
+
+typedef struct llhttp__internal_s llhttp__internal_t;
+struct llhttp__internal_s {
+  int32_t _index;
+  void* _span_pos0;
+  void* _span_cb0;
+  int32_t error;
+  const char* reason;
+  const char* error_pos;
+  void* data;
+  void* _current;
+  uint64_t content_length;
+  uint8_t type;
+  uint8_t method;
+  uint8_t http_major;
+  uint8_t http_minor;
+  uint8_t header_state;
+  uint8_t lenient_flags;
+  uint8_t upgrade;
+  uint8_t finish;
+  uint16_t flags;
+  uint16_t status_code;
+  void* settings;
+};
+
+int llhttp__internal_init(llhttp__internal_t* s);
+int llhttp__internal_execute(llhttp__internal_t* s, const char* p, const char* endp);
+
+#ifdef __cplusplus
+}  /* extern "C" */
+#endif
+#endif  /* INCLUDE_LLHTTP_ITSELF_H_ */
+
+#ifndef LLLLHTTP_C_HEADERS_
+#define LLLLHTTP_C_HEADERS_
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+enum llhttp_errno {
+  HPE_OK = 0,
+  HPE_INTERNAL = 1,
+  HPE_STRICT = 2,
+  HPE_LF_EXPECTED = 3,
+  HPE_UNEXPECTED_CONTENT_LENGTH = 4,
+  HPE_CLOSED_CONNECTION = 5,
+  HPE_INVALID_METHOD = 6,
+  HPE_INVALID_URL = 7,
+  HPE_INVALID_CONSTANT = 8,
+  HPE_INVALID_VERSION = 9,
+  HPE_INVALID_HEADER_TOKEN = 10,
+  HPE_INVALID_CONTENT_LENGTH = 11,
+  HPE_INVALID_CHUNK_SIZE = 12,
+  HPE_INVALID_STATUS = 13,
+  HPE_INVALID_EOF_STATE = 14,
+  HPE_INVALID_TRANSFER_ENCODING = 15,
+  HPE_CB_MESSAGE_BEGIN = 16,
+  HPE_CB_HEADERS_COMPLETE = 17,
+  HPE_CB_MESSAGE_COMPLETE = 18,
+  HPE_CB_CHUNK_HEADER = 19,
+  HPE_CB_CHUNK_COMPLETE = 20,
+  HPE_PAUSED = 21,
+  HPE_PAUSED_UPGRADE = 22,
+  HPE_PAUSED_H2_UPGRADE = 23,
+  HPE_USER = 24
+};
+typedef enum llhttp_errno llhttp_errno_t;
+
+enum llhttp_flags {
+  F_CONNECTION_KEEP_ALIVE = 0x1,
+  F_CONNECTION_CLOSE = 0x2,
+  F_CONNECTION_UPGRADE = 0x4,
+  F_CHUNKED = 0x8,
+  F_UPGRADE = 0x10,
+  F_CONTENT_LENGTH = 0x20,
+  F_SKIPBODY = 0x40,
+  F_TRAILING = 0x80,
+  F_TRANSFER_ENCODING = 0x200
+};
+typedef enum llhttp_flags llhttp_flags_t;
+
+enum llhttp_lenient_flags {
+  LENIENT_HEADERS = 0x1,
+  LENIENT_CHUNKED_LENGTH = 0x2,
+  LENIENT_KEEP_ALIVE = 0x4
+};
+typedef enum llhttp_lenient_flags llhttp_lenient_flags_t;
+
+enum llhttp_type {
+  HTTP_BOTH = 0,
+  HTTP_REQUEST = 1,
+  HTTP_RESPONSE = 2
+};
+typedef enum llhttp_type llhttp_type_t;
+
+enum llhttp_finish {
+  HTTP_FINISH_SAFE = 0,
+  HTTP_FINISH_SAFE_WITH_CB = 1,
+  HTTP_FINISH_UNSAFE = 2
+};
+typedef enum llhttp_finish llhttp_finish_t;
+
+enum llhttp_method {
+  HTTP_DELETE = 0,
+  HTTP_GET = 1,
+  HTTP_HEAD = 2,
+  HTTP_POST = 3,
+  HTTP_PUT = 4,
+  HTTP_CONNECT = 5,
+  HTTP_OPTIONS = 6,
+  HTTP_TRACE = 7,
+  HTTP_COPY = 8,
+  HTTP_LOCK = 9,
+  HTTP_MKCOL = 10,
+  HTTP_MOVE = 11,
+  HTTP_PROPFIND = 12,
+  HTTP_PROPPATCH = 13,
+  HTTP_SEARCH = 14,
+  HTTP_UNLOCK = 15,
+  HTTP_BIND = 16,
+  HTTP_REBIND = 17,
+  HTTP_UNBIND = 18,
+  HTTP_ACL = 19,
+  HTTP_REPORT = 20,
+  HTTP_MKACTIVITY = 21,
+  HTTP_CHECKOUT = 22,
+  HTTP_MERGE = 23,
+  HTTP_MSEARCH = 24,
+  HTTP_NOTIFY = 25,
+  HTTP_SUBSCRIBE = 26,
+  HTTP_UNSUBSCRIBE = 27,
+  HTTP_PATCH = 28,
+  HTTP_PURGE = 29,
+  HTTP_MKCALENDAR = 30,
+  HTTP_LINK = 31,
+  HTTP_UNLINK = 32,
+  HTTP_SOURCE = 33,
+  HTTP_PRI = 34,
+  HTTP_DESCRIBE = 35,
+  HTTP_ANNOUNCE = 36,
+  HTTP_SETUP = 37,
+  HTTP_PLAY = 38,
+  HTTP_PAUSE = 39,
+  HTTP_TEARDOWN = 40,
+  HTTP_GET_PARAMETER = 41,
+  HTTP_SET_PARAMETER = 42,
+  HTTP_REDIRECT = 43,
+  HTTP_RECORD = 44,
+  HTTP_FLUSH = 45
+};
+typedef enum llhttp_method llhttp_method_t;
+
+#define HTTP_ERRNO_MAP(XX) \
+  XX(0, OK, OK) \
+  XX(1, INTERNAL, INTERNAL) \
+  XX(2, STRICT, STRICT) \
+  XX(3, LF_EXPECTED, LF_EXPECTED) \
+  XX(4, UNEXPECTED_CONTENT_LENGTH, UNEXPECTED_CONTENT_LENGTH) \
+  XX(5, CLOSED_CONNECTION, CLOSED_CONNECTION) \
+  XX(6, INVALID_METHOD, INVALID_METHOD) \
+  XX(7, INVALID_URL, INVALID_URL) \
+  XX(8, INVALID_CONSTANT, INVALID_CONSTANT) \
+  XX(9, INVALID_VERSION, INVALID_VERSION) \
+  XX(10, INVALID_HEADER_TOKEN, INVALID_HEADER_TOKEN) \
+  XX(11, INVALID_CONTENT_LENGTH, INVALID_CONTENT_LENGTH) \
+  XX(12, INVALID_CHUNK_SIZE, INVALID_CHUNK_SIZE) \
+  XX(13, INVALID_STATUS, INVALID_STATUS) \
+  XX(14, INVALID_EOF_STATE, INVALID_EOF_STATE) \
+  XX(15, INVALID_TRANSFER_ENCODING, INVALID_TRANSFER_ENCODING) \
+  XX(16, CB_MESSAGE_BEGIN, CB_MESSAGE_BEGIN) \
+  XX(17, CB_HEADERS_COMPLETE, CB_HEADERS_COMPLETE) \
+  XX(18, CB_MESSAGE_COMPLETE, CB_MESSAGE_COMPLETE) \
+  XX(19, CB_CHUNK_HEADER, CB_CHUNK_HEADER) \
+  XX(20, CB_CHUNK_COMPLETE, CB_CHUNK_COMPLETE) \
+  XX(21, PAUSED, PAUSED) \
+  XX(22, PAUSED_UPGRADE, PAUSED_UPGRADE) \
+  XX(23, PAUSED_H2_UPGRADE, PAUSED_H2_UPGRADE) \
+  XX(24, USER, USER) \
+
+
+#define HTTP_METHOD_MAP(XX) \
+  XX(0, DELETE, DELETE) \
+  XX(1, GET, GET) \
+  XX(2, HEAD, HEAD) \
+  XX(3, POST, POST) \
+  XX(4, PUT, PUT) \
+  XX(5, CONNECT, CONNECT) \
+  XX(6, OPTIONS, OPTIONS) \
+  XX(7, TRACE, TRACE) \
+  XX(8, COPY, COPY) \
+  XX(9, LOCK, LOCK) \
+  XX(10, MKCOL, MKCOL) \
+  XX(11, MOVE, MOVE) \
+  XX(12, PROPFIND, PROPFIND) \
+  XX(13, PROPPATCH, PROPPATCH) \
+  XX(14, SEARCH, SEARCH) \
+  XX(15, UNLOCK, UNLOCK) \
+  XX(16, BIND, BIND) \
+  XX(17, REBIND, REBIND) \
+  XX(18, UNBIND, UNBIND) \
+  XX(19, ACL, ACL) \
+  XX(20, REPORT, REPORT) \
+  XX(21, MKACTIVITY, MKACTIVITY) \
+  XX(22, CHECKOUT, CHECKOUT) \
+  XX(23, MERGE, MERGE) \
+  XX(24, MSEARCH, M-SEARCH) \
+  XX(25, NOTIFY, NOTIFY) \
+  XX(26, SUBSCRIBE, SUBSCRIBE) \
+  XX(27, UNSUBSCRIBE, UNSUBSCRIBE) \
+  XX(28, PATCH, PATCH) \
+  XX(29, PURGE, PURGE) \
+  XX(30, MKCALENDAR, MKCALENDAR) \
+  XX(31, LINK, LINK) \
+  XX(32, UNLINK, UNLINK) \
+  XX(33, SOURCE, SOURCE) \
+  XX(34, PRI, PRI) \
+  XX(35, DESCRIBE, DESCRIBE) \
+  XX(36, ANNOUNCE, ANNOUNCE) \
+  XX(37, SETUP, SETUP) \
+  XX(38, PLAY, PLAY) \
+  XX(39, PAUSE, PAUSE) \
+  XX(40, TEARDOWN, TEARDOWN) \
+  XX(41, GET_PARAMETER, GET_PARAMETER) \
+  XX(42, SET_PARAMETER, SET_PARAMETER) \
+  XX(43, REDIRECT, REDIRECT) \
+  XX(44, RECORD, RECORD) \
+  XX(45, FLUSH, FLUSH) \
+
+
+
+#ifdef __cplusplus
+}  /* extern "C" */
+#endif
+#endif  /* LLLLHTTP_C_HEADERS_ */
+
+#ifndef INCLUDE_LLHTTP_API_H_
+#define INCLUDE_LLHTTP_API_H_
+#ifdef __cplusplus
+extern "C" {
+#endif
+#include <stddef.h>
+
+#if defined(__wasm__)
+#define LLHTTP_EXPORT __attribute__((visibility("default")))
+#else
+#define LLHTTP_EXPORT
+#endif
+
+typedef llhttp__internal_t llhttp_t;
+typedef struct llhttp_settings_s llhttp_settings_t;
+
+typedef int (*llhttp_data_cb)(llhttp_t*, const char *at, size_t length);
+typedef int (*llhttp_cb)(llhttp_t*);
+
+struct llhttp_settings_s {
+  /* Possible return values 0, -1, `HPE_PAUSED` */
+  llhttp_cb      on_message_begin;
+
+  llhttp_data_cb on_url;
+  llhttp_data_cb on_status;
+  llhttp_data_cb on_header_field;
+  llhttp_data_cb on_header_value;
+
+  /* Possible return values:
+   * 0  - Proceed normally
+   * 1  - Assume that request/response has no body, and proceed to parsing the
+   *      next message
+   * 2  - Assume absence of body (as above) and make `llhttp_execute()` return
+   *      `HPE_PAUSED_UPGRADE`
+   * -1 - Error
+   * `HPE_PAUSED`
+   */
+  llhttp_cb      on_headers_complete;
+
+  llhttp_data_cb on_body;
+
+  /* Possible return values 0, -1, `HPE_PAUSED` */
+  llhttp_cb      on_message_complete;
+
+  /* When on_chunk_header is called, the current chunk length is stored
+   * in parser->content_length.
+   * Possible return values 0, -1, `HPE_PAUSED`
+   */
+  llhttp_cb      on_chunk_header;
+  llhttp_cb      on_chunk_complete;
+
+  llhttp_cb      on_url_complete;
+  llhttp_cb      on_status_complete;
+  llhttp_cb      on_header_field_complete;
+  llhttp_cb      on_header_value_complete;
+};
+
+/* Initialize the parser with specific type and user settings.
+ *
+ * NOTE: lifetime of `settings` has to be at least the same as the lifetime of
+ * the `parser` here. In practice, `settings` has to be either a static
+ * variable or be allocated with `malloc`, `new`, etc.
+ */
+LLHTTP_EXPORT
+void llhttp_init(llhttp_t* parser, llhttp_type_t type,
+                 const llhttp_settings_t* settings);
+
+#if defined(__wasm__)
+
+LLHTTP_EXPORT
+llhttp_t* llhttp_alloc(llhttp_type_t type);
+
+LLHTTP_EXPORT
+void llhttp_free(llhttp_t* parser);
+
+LLHTTP_EXPORT
+uint8_t llhttp_get_type(llhttp_t* parser);
+
+LLHTTP_EXPORT
+uint8_t llhttp_get_http_major(llhttp_t* parser);
+
+LLHTTP_EXPORT
+uint8_t llhttp_get_http_minor(llhttp_t* parser);
+
+LLHTTP_EXPORT
+uint8_t llhttp_get_method(llhttp_t* parser);
+
+LLHTTP_EXPORT
+int llhttp_get_status_code(llhttp_t* parser);
+
+LLHTTP_EXPORT
+uint8_t llhttp_get_upgrade(llhttp_t* parser);
+
+#endif  // defined(__wasm__)
+
+/* Reset an already initialized parser back to the start state, preserving the
+ * existing parser type, callback settings, user data, and lenient flags.
+ */
+LLHTTP_EXPORT
+void llhttp_reset(llhttp_t* parser);
+
+/* Initialize the settings object */
+LLHTTP_EXPORT
+void llhttp_settings_init(llhttp_settings_t* settings);
+
+/* Parse full or partial request/response, invoking user callbacks along the
+ * way.
+ *
+ * If any `llhttp_data_cb` returns an errno not equal to `HPE_OK`, parsing is
+ * interrupted and that errno is returned from `llhttp_execute()`. If
+ * `HPE_PAUSED` was used as the errno, execution can be resumed with a
+ * `llhttp_resume()` call.
+ *
+ * In a special case of CONNECT/Upgrade request/response `HPE_PAUSED_UPGRADE`
+ * is returned after fully parsing the request/response. If the user wishes to
+ * continue parsing, they need to invoke `llhttp_resume_after_upgrade()`.
+ *
+ * NOTE: if this function ever returns a non-pause type error, it will continue
+ * to return the same error upon each successive call up until `llhttp_init()`
+ * is called.
+ */
+LLHTTP_EXPORT
+llhttp_errno_t llhttp_execute(llhttp_t* parser, const char* data, size_t len);
+
+/* This method should be called when the other side has no further bytes to
+ * send (e.g. shutdown of readable side of the TCP connection.)
+ *
+ * Requests without `Content-Length` and other messages might require treating
+ * all incoming bytes as part of the body, up to the last byte of the
+ * connection. This method will invoke `on_message_complete()` callback if the
+ * request was terminated safely. Otherwise an error code is returned.
+ */
+LLHTTP_EXPORT
+llhttp_errno_t llhttp_finish(llhttp_t* parser);
+
+/* Returns `1` if the incoming message is parsed until the last byte, and has
+ * to be completed by calling `llhttp_finish()` on EOF
+ */
+LLHTTP_EXPORT
+int llhttp_message_needs_eof(const llhttp_t* parser);
+
+/* Returns `1` if there might be any other messages following the last that was
+ * successfully parsed.
+ */
+LLHTTP_EXPORT
+int llhttp_should_keep_alive(const llhttp_t* parser);
+
+/* Make further calls of `llhttp_execute()` return `HPE_PAUSED` and set
+ * appropriate error reason.
+ *
+ * Important: do not call this from user callbacks! User callbacks must return
+ * `HPE_PAUSED` if pausing is required.
+ */
+LLHTTP_EXPORT
+void llhttp_pause(llhttp_t* parser);
+
+/* Might be called to resume the execution after the pause in user's callback.
+ * See `llhttp_execute()` above for details.
+ *
+ * Call this only if `llhttp_execute()` returns `HPE_PAUSED`.
+ */
+LLHTTP_EXPORT
+void llhttp_resume(llhttp_t* parser);
+
+/* Might be called to resume the execution after the pause in user's callback.
+ * See `llhttp_execute()` above for details.
+ *
+ * Call this only if `llhttp_execute()` returns `HPE_PAUSED_UPGRADE`
+ */
+LLHTTP_EXPORT
+void llhttp_resume_after_upgrade(llhttp_t* parser);
+
+/* Returns the latest returned error */
+LLHTTP_EXPORT
+llhttp_errno_t llhttp_get_errno(const llhttp_t* parser);
+
+/* Returns the verbal explanation of the latest returned error.
+ *
+ * Note: User callback should set error reason when returning the error. See
+ * `llhttp_set_error_reason()` for details.
+ */
+LLHTTP_EXPORT
+const char* llhttp_get_error_reason(const llhttp_t* parser);
+
+/* Assign verbal description to the returned error. Must be called in user
+ * callbacks right before returning the errno.
+ *
+ * Note: `HPE_USER` error code might be useful in user callbacks.
+ */
+LLHTTP_EXPORT
+void llhttp_set_error_reason(llhttp_t* parser, const char* reason);
+
+/* Returns the pointer to the last parsed byte before the returned error. The
+ * pointer is relative to the `data` argument of `llhttp_execute()`.
+ *
+ * Note: this method might be useful for counting the number of parsed bytes.
+ */
+LLHTTP_EXPORT
+const char* llhttp_get_error_pos(const llhttp_t* parser);
+
+/* Returns the textual name of an error code */
+LLHTTP_EXPORT
+const char* llhttp_errno_name(llhttp_errno_t err);
+
+/* Returns the textual name of an HTTP method */
+LLHTTP_EXPORT
+const char* llhttp_method_name(llhttp_method_t method);
+
+
+/* Enables/disables lenient header value parsing (disabled by default).
+ *
+ * Lenient parsing disables header value token checks, extending llhttp's
+ * protocol support to highly non-compliant clients/servers. No
+ * `HPE_INVALID_HEADER_TOKEN` will be raised for incorrect header values when
+ * lenient parsing is "on".
+ *
+ * **(USE AT YOUR OWN RISK)**
+ */
+LLHTTP_EXPORT
+void llhttp_set_lenient_headers(llhttp_t* parser, int enabled);
+
+
+/* Enables/disables lenient handling of conflicting `Transfer-Encoding` and
+ * `Content-Length` headers (disabled by default).
+ *
+ * Normally `llhttp` would error when `Transfer-Encoding` is present in
+ * conjunction with `Content-Length`. This error is important to prevent HTTP
+ * request smuggling, but may be less desirable for a small number of cases
+ * involving legacy servers.
+ *
+ * **(USE AT YOUR OWN RISK)**
+ */
+LLHTTP_EXPORT
+void llhttp_set_lenient_chunked_length(llhttp_t* parser, int enabled);
+
+
+/* Enables/disables lenient handling of `Connection: close` and HTTP/1.0
+ * requests/responses.
+ *
+ * Normally `llhttp` would error on (in strict mode) or discard (in loose mode)
+ * the HTTP request/response after the request/response with `Connection: close`
+ * and `Content-Length`. This is important to prevent cache poisoning attacks,
+ * but might interact badly with outdated and insecure clients. With this flag
+ * the extra request/response will be parsed normally.
+ *
+ * **(USE AT YOUR OWN RISK)**
+ */
+LLHTTP_EXPORT
+void llhttp_set_lenient_keep_alive(llhttp_t* parser, int enabled);
+
+#ifdef __cplusplus
+}  /* extern "C" */
+#endif
+#endif  /* INCLUDE_LLHTTP_API_H_ */
+
+#endif  /* INCLUDE_LLHTTP_H_ */
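The `llhttp_set_lenient_chunked_length()` comment describes a framing rule: a message that carries both `Transfer-Encoding` and `Content-Length` is rejected unless lenient handling is enabled. The sketch below illustrates only that rule in self-contained C++; it is not llhttp's implementation, and `Header`, `reject_conflicting_framing()` and its `lenient` parameter are hypothetical names. It assumes headers have already been split into name/value pairs and compares names case-insensitively, as HTTP requires.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Hypothetical header representation: (name, value) pairs as received.
using Header = std::pair<std::string, std::string>;

// ASCII case-insensitive comparison; HTTP header names are case-insensitive.
static bool iequals(const std::string& a, const std::string& b) {
    return a.size() == b.size() &&
           std::equal(a.begin(), a.end(), b.begin(), [](char x, char y) {
               return std::tolower(static_cast<unsigned char>(x)) ==
                      std::tolower(static_cast<unsigned char>(y));
           });
}

static bool has_header(const std::vector<Header>& headers, const std::string& name) {
    return std::any_of(headers.begin(), headers.end(),
                       [&](const Header& h) { return iequals(h.first, name); });
}

// Returns true if the message should be rejected: both framing headers are
// present and lenient handling (the role played by the lenient flag above) is off.
bool reject_conflicting_framing(const std::vector<Header>& headers, bool lenient) {
    const bool has_te = has_header(headers, "Transfer-Encoding");
    const bool has_cl = has_header(headers, "Content-Length");
    return has_te && has_cl && !lenient;
}
```

Rejecting the combination outright is the conservative choice: a front-end proxy and a back-end server that disagree on which header wins can be made to split one request into two, which is the smuggling risk the comment mentions.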
--- a/src/3rdparty/rapidjson/allocators.h
+++ b/src/3rdparty/rapidjson/allocators.h
@@ -1,21 +1,28 @@
 // Tencent is pleased to support the open source community by making RapidJSON available.
-// 
-// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip. All rights reserved.
+//
+// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip.
 //
 // Licensed under the MIT License (the "License"); you may not use this file except
 // in compliance with the License. You may obtain a copy of the License at
 //
 // http://opensource.org/licenses/MIT
 //
-// Unless required by applicable law or agreed to in writing, software distributed 
-// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR 
-// CONDITIONS OF ANY KIND, either express or implied. See the License for the 
+// Unless required by applicable law or agreed to in writing, software distributed
+// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
+// CONDITIONS OF ANY KIND, either express or implied. See the License for the
 // specific language governing permissions and limitations under the License.

 #ifndef RAPIDJSON_ALLOCATORS_H_
 #define RAPIDJSON_ALLOCATORS_H_

 #include "rapidjson.h"
+#include "internal/meta.h"
+
+#include <memory>
+
+#if RAPIDJSON_HAS_CXX11
+#include <type_traits>
+#endif

 RAPIDJSON_NAMESPACE_BEGIN

@@ -24,10 +31,10 @@ RAPIDJSON_NAMESPACE_BEGIN

 /*! \class rapidjson::Allocator
    \brief Concept for allocating, resizing and freeing memory block.
-    
+
    Note that Malloc() and Realloc() are non-static but Free() is static.
-    
-    So if an allocator need to support Free(), it needs to put its pointer in 
+
+    So if an allocator needs to support Free(), it must put its pointer in
    the header of memory block.

 \code
@@ -75,28 +82,35 @@ concept Allocator {
 class CrtAllocator {
 public:
    static const bool kNeedFree = true;
-    void* Malloc(size_t size) { 
+    void* Malloc(size_t size) {
        if (size) //  behavior of malloc(0) is implementation defined.
-            return std::malloc(size);
+            return RAPIDJSON_MALLOC(size);
        else
            return NULL; // standardize to returning NULL.
    }
    void* Realloc(void* originalPtr, size_t originalSize, size_t newSize) {
        (void)originalSize;
        if (newSize == 0) {
-            std::free(originalPtr);
+            RAPIDJSON_FREE(originalPtr);
            return NULL;
        }
-        return std::realloc(originalPtr, newSize);
+        return RAPIDJSON_REALLOC(originalPtr, newSize);
+    }
+    static void Free(void *ptr) RAPIDJSON_NOEXCEPT { RAPIDJSON_FREE(ptr); }
+
+    bool operator==(const CrtAllocator&) const RAPIDJSON_NOEXCEPT {
+        return true;
+    }
+    bool operator!=(const CrtAllocator&) const RAPIDJSON_NOEXCEPT {
+        return false;
    }
-    static void Free(void *ptr) { std::free(ptr); }
 };

 ///////////////////////////////////////////////////////////////////////////////
 // MemoryPoolAllocator

 //! Default memory allocator used by the parser and DOM.
-/*! This allocator allocate memory blocks from pre-allocated memory chunks. 
+/*! This allocator allocates memory blocks from pre-allocated memory chunks.

    It does not free memory blocks. And Realloc() only allocate new memory.

@@ -113,16 +127,64 @@ public:
 */
 template <typename BaseAllocator = CrtAllocator>
 class MemoryPoolAllocator {
+    //! Chunk header prepended to each chunk.
+    /*! Chunks are stored as a singly linked list.
+    */
+    struct ChunkHeader {
+        size_t capacity;    //!< Capacity of the chunk in bytes (excluding the header itself).
+        size_t size;        //!< Current size of allocated memory in bytes.
+        ChunkHeader *next;  //!< Next chunk in the linked list.
+    };
+
+    struct SharedData {
+        ChunkHeader *chunkHead;  //!< Head of the chunk linked-list. Only the head chunk serves allocation.
+        BaseAllocator* ownBaseAllocator; //!< base allocator created by this object.
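+        size_t refcount;         //!< Number of allocator copies sharing this data.
+        bool ownBuffer;          //!< Whether this block was allocated by the allocator (false for a user-supplied buffer).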
+    };
+
+    static const size_t SIZEOF_SHARED_DATA = RAPIDJSON_ALIGN(sizeof(SharedData));
+    static const size_t SIZEOF_CHUNK_HEADER = RAPIDJSON_ALIGN(sizeof(ChunkHeader));
+
+    static inline ChunkHeader *GetChunkHead(SharedData *shared)
+    {
+        return reinterpret_cast<ChunkHeader*>(reinterpret_cast<uint8_t*>(shared) + SIZEOF_SHARED_DATA);
+    }
+    static inline uint8_t *GetChunkBuffer(SharedData *shared)
+    {
+        return reinterpret_cast<uint8_t*>(shared->chunkHead) + SIZEOF_CHUNK_HEADER;
+    }
+
+    static const size_t kDefaultChunkCapacity = RAPIDJSON_ALLOCATOR_DEFAULT_CHUNK_CAPACITY; //!< Default chunk capacity.
+
 public:
    static const bool kNeedFree = false;    //!< Tell users that no need to call Free() with this allocator. (concept Allocator)
+    static const bool kRefCounted = true;   //!< Tell users that this allocator is reference counted on copy

    //! Constructor with chunkSize.
    /*! \param chunkSize The size of memory chunk. The default is kDefaultChunkSize.
        \param baseAllocator The allocator for allocating memory chunks.
    */
-    MemoryPoolAllocator(size_t chunkSize = kDefaultChunkCapacity, BaseAllocator* baseAllocator = 0) : 
-        chunkHead_(0), chunk_capacity_(chunkSize), userBuffer_(0), baseAllocator_(baseAllocator), ownBaseAllocator_(0)
+    explicit
+    MemoryPoolAllocator(size_t chunkSize = kDefaultChunkCapacity, BaseAllocator* baseAllocator = 0) :
+        chunk_capacity_(chunkSize),
+        baseAllocator_(baseAllocator ? baseAllocator : RAPIDJSON_NEW(BaseAllocator)()),
+        shared_(static_cast<SharedData*>(baseAllocator_ ? baseAllocator_->Malloc(SIZEOF_SHARED_DATA + SIZEOF_CHUNK_HEADER) : 0))
    {
+        RAPIDJSON_ASSERT(baseAllocator_ != 0);
+        RAPIDJSON_ASSERT(shared_ != 0);
+        if (baseAllocator) {
+            shared_->ownBaseAllocator = 0;
+        }
+        else {
+            shared_->ownBaseAllocator = baseAllocator_;
+        }
+        shared_->chunkHead = GetChunkHead(shared_);
+        shared_->chunkHead->capacity = 0;
+        shared_->chunkHead->size = 0;
+        shared_->chunkHead->next = 0;
+        shared_->ownBuffer = true;
+        shared_->refcount = 1;
    }

    //! Constructor with user-supplied buffer.
@@ -136,41 +198,101 @@ public:
        \param baseAllocator The allocator for allocating memory chunks.
    */
    MemoryPoolAllocator(void *buffer, size_t size, size_t chunkSize = kDefaultChunkCapacity, BaseAllocator* baseAllocator = 0) :
-        chunkHead_(0), chunk_capacity_(chunkSize), userBuffer_(buffer), baseAllocator_(baseAllocator), ownBaseAllocator_(0)
+        chunk_capacity_(chunkSize),
+        baseAllocator_(baseAllocator),
+        shared_(static_cast<SharedData*>(AlignBuffer(buffer, size)))
    {
-        RAPIDJSON_ASSERT(buffer != 0);
-        RAPIDJSON_ASSERT(size > sizeof(ChunkHeader));
-        chunkHead_ = reinterpret_cast<ChunkHeader*>(buffer);
-        chunkHead_->capacity = size - sizeof(ChunkHeader);
-        chunkHead_->size = 0;
-        chunkHead_->next = 0;
+        RAPIDJSON_ASSERT(size >= SIZEOF_SHARED_DATA + SIZEOF_CHUNK_HEADER);
+        shared_->chunkHead = GetChunkHead(shared_);
+        shared_->chunkHead->capacity = size - SIZEOF_SHARED_DATA - SIZEOF_CHUNK_HEADER;
+        shared_->chunkHead->size = 0;
+        shared_->chunkHead->next = 0;
+        shared_->ownBaseAllocator = 0;
+        shared_->ownBuffer = false;
+        shared_->refcount = 1;
    }

+    MemoryPoolAllocator(const MemoryPoolAllocator& rhs) RAPIDJSON_NOEXCEPT :
+        chunk_capacity_(rhs.chunk_capacity_),
+        baseAllocator_(rhs.baseAllocator_),
+        shared_(rhs.shared_)
+    {
+        RAPIDJSON_NOEXCEPT_ASSERT(shared_->refcount > 0);
+        ++shared_->refcount;
+    }
+    MemoryPoolAllocator& operator=(const MemoryPoolAllocator& rhs) RAPIDJSON_NOEXCEPT
+    {
+        RAPIDJSON_NOEXCEPT_ASSERT(rhs.shared_->refcount > 0);
+        ++rhs.shared_->refcount;
+        this->~MemoryPoolAllocator();
+        baseAllocator_ = rhs.baseAllocator_;
+        chunk_capacity_ = rhs.chunk_capacity_;
+        shared_ = rhs.shared_;
+        return *this;
+    }
+
+#if RAPIDJSON_HAS_CXX11_RVALUE_REFS
+    MemoryPoolAllocator(MemoryPoolAllocator&& rhs) RAPIDJSON_NOEXCEPT :
+        chunk_capacity_(rhs.chunk_capacity_),
+        baseAllocator_(rhs.baseAllocator_),
+        shared_(rhs.shared_)
+    {
+        RAPIDJSON_NOEXCEPT_ASSERT(rhs.shared_->refcount > 0);
+        rhs.shared_ = 0;
+    }
+    MemoryPoolAllocator& operator=(MemoryPoolAllocator&& rhs) RAPIDJSON_NOEXCEPT
+    {
+        RAPIDJSON_NOEXCEPT_ASSERT(rhs.shared_->refcount > 0);
+        this->~MemoryPoolAllocator();
+        baseAllocator_ = rhs.baseAllocator_;
+        chunk_capacity_ = rhs.chunk_capacity_;
+        shared_ = rhs.shared_;
+        rhs.shared_ = 0;
+        return *this;
+    }
+#endif
+
    //! Destructor.
    /*! This deallocates all memory chunks, excluding the user-supplied buffer.
    */
-    ~MemoryPoolAllocator() {
+    ~MemoryPoolAllocator() RAPIDJSON_NOEXCEPT {
+        if (!shared_) {
+            // do nothing if moved
+            return;
+        }
+        if (shared_->refcount > 1) {
+            --shared_->refcount;
+            return;
+        }
        Clear();
-        RAPIDJSON_DELETE(ownBaseAllocator_);
+        BaseAllocator *a = shared_->ownBaseAllocator;
+        if (shared_->ownBuffer) {
+            baseAllocator_->Free(shared_);
+        }
+        RAPIDJSON_DELETE(a);
    }

-    //! Deallocates all memory chunks, excluding the user-supplied buffer.
-    void Clear() {
-        while (chunkHead_ && chunkHead_ != userBuffer_) {
-            ChunkHeader* next = chunkHead_->next;
-            baseAllocator_->Free(chunkHead_);
-            chunkHead_ = next;
+    //! Deallocates all memory chunks, excluding the first/user one.
+    void Clear() RAPIDJSON_NOEXCEPT {
+        RAPIDJSON_NOEXCEPT_ASSERT(shared_->refcount > 0);
+        for (;;) {
+            ChunkHeader* c = shared_->chunkHead;
+            if (!c->next) {
+                break;
+            }
+            shared_->chunkHead = c->next;
+            baseAllocator_->Free(c);
        }
-        if (chunkHead_ && chunkHead_ == userBuffer_)
-            chunkHead_->size = 0; // Clear user buffer
+        shared_->chunkHead->size = 0;
    }

    //! Computes the total capacity of allocated memory chunks.
    /*! \return total capacity in bytes.
    */
-    size_t Capacity() const {
+    size_t Capacity() const RAPIDJSON_NOEXCEPT {
+        RAPIDJSON_NOEXCEPT_ASSERT(shared_->refcount > 0);
        size_t capacity = 0;
-        for (ChunkHeader* c = chunkHead_; c != 0; c = c->next)
+        for (ChunkHeader* c = shared_->chunkHead; c != 0; c = c->next)
            capacity += c->capacity;
        return capacity;
    }
@@ -178,25 +300,35 @@ public:
    //! Computes the memory blocks allocated.
    /*! \return total used bytes.
    */
-    size_t Size() const {
+    size_t Size() const RAPIDJSON_NOEXCEPT {
+        RAPIDJSON_NOEXCEPT_ASSERT(shared_->refcount > 0);
        size_t size = 0;
-        for (ChunkHeader* c = chunkHead_; c != 0; c = c->next)
+        for (ChunkHeader* c = shared_->chunkHead; c != 0; c = c->next)
            size += c->size;
        return size;
    }

+    //! Whether the allocator is shared.
+    /*! \return true or false.
+    */
+    bool Shared() const RAPIDJSON_NOEXCEPT {
+        RAPIDJSON_NOEXCEPT_ASSERT(shared_->refcount > 0);
+        return shared_->refcount > 1;
+    }
+
    //! Allocates a memory block. (concept Allocator)
    void* Malloc(size_t size) {
+        RAPIDJSON_NOEXCEPT_ASSERT(shared_->refcount > 0);
        if (!size)
            return NULL;

        size = RAPIDJSON_ALIGN(size);
-        if (chunkHead_ == 0 || chunkHead_->size + size > chunkHead_->capacity)
+        if (RAPIDJSON_UNLIKELY(shared_->chunkHead->size + size > shared_->chunkHead->capacity))
            if (!AddChunk(chunk_capacity_ > size ? chunk_capacity_ : size))
                return NULL;

-        void *buffer = reinterpret_cast<char *>(chunkHead_) + RAPIDJSON_ALIGN(sizeof(ChunkHeader)) + chunkHead_->size;
-        chunkHead_->size += size;
+        void *buffer = GetChunkBuffer(shared_) + shared_->chunkHead->size;
+        shared_->chunkHead->size += size;
        return buffer;
    }

@@ -205,6 +337,7 @@ public:
        if (originalPtr == 0)
            return Malloc(newSize);

+        RAPIDJSON_NOEXCEPT_ASSERT(shared_->refcount > 0);
        if (newSize == 0)
            return NULL;

@@ -216,10 +349,10 @@ public:
            return originalPtr;

        // Simply expand it if it is the last allocation and there is sufficient space
-        if (originalPtr == reinterpret_cast<char *>(chunkHead_) + RAPIDJSON_ALIGN(sizeof(ChunkHeader)) + chunkHead_->size - originalSize) {
+        if (originalPtr == GetChunkBuffer(shared_) + shared_->chunkHead->size - originalSize) {
            size_t increment = static_cast<size_t>(newSize - originalSize);
-            if (chunkHead_->size + increment <= chunkHead_->capacity) {
-                chunkHead_->size += increment;
+            if (shared_->chunkHead->size + increment <= shared_->chunkHead->capacity) {
+                shared_->chunkHead->size += increment;
                return originalPtr;
            }
        }
@@ -235,50 +368,325 @@ public:
    }

    //! Frees a memory block (concept Allocator)
-    static void Free(void *ptr) { (void)ptr; } // Do nothing
+    static void Free(void *ptr) RAPIDJSON_NOEXCEPT { (void)ptr; } // Do nothing
+
+    //! Compare (equality) with another MemoryPoolAllocator
+    bool operator==(const MemoryPoolAllocator& rhs) const RAPIDJSON_NOEXCEPT {
+        RAPIDJSON_NOEXCEPT_ASSERT(shared_->refcount > 0);
+        RAPIDJSON_NOEXCEPT_ASSERT(rhs.shared_->refcount > 0);
+        return shared_ == rhs.shared_;
+    }
+    //! Compare (inequality) with another MemoryPoolAllocator
+    bool operator!=(const MemoryPoolAllocator& rhs) const RAPIDJSON_NOEXCEPT {
+        return !operator==(rhs);
+    }

 private:
-    //! Copy constructor is not permitted.
-    MemoryPoolAllocator(const MemoryPoolAllocator& rhs) /* = delete */;
-    //! Copy assignment operator is not permitted.
-    MemoryPoolAllocator& operator=(const MemoryPoolAllocator& rhs) /* = delete */;
-
    //! Creates a new chunk.
    /*! \param capacity Capacity of the chunk in bytes.
        \return true if success.
    */
    bool AddChunk(size_t capacity) {
        if (!baseAllocator_)
-            ownBaseAllocator_ = baseAllocator_ = RAPIDJSON_NEW(BaseAllocator)();
-        if (ChunkHeader* chunk = reinterpret_cast<ChunkHeader*>(baseAllocator_->Malloc(RAPIDJSON_ALIGN(sizeof(ChunkHeader)) + capacity))) {
+            shared_->ownBaseAllocator = baseAllocator_ = RAPIDJSON_NEW(BaseAllocator)();
+        if (ChunkHeader* chunk = static_cast<ChunkHeader*>(baseAllocator_->Malloc(SIZEOF_CHUNK_HEADER + capacity))) {
            chunk->capacity = capacity;
            chunk->size = 0;
-            chunk->next = chunkHead_;
-            chunkHead_ =  chunk;
+            chunk->next = shared_->chunkHead;
+            shared_->chunkHead = chunk;
            return true;
        }
        else
            return false;
    }

-    static const int kDefaultChunkCapacity = RAPIDJSON_ALLOCATOR_DEFAULT_CHUNK_CAPACITY; //!< Default chunk capacity.
+    static inline void* AlignBuffer(void* buf, size_t &size)
+    {
+        RAPIDJSON_NOEXCEPT_ASSERT(buf != 0);
+        const uintptr_t mask = sizeof(void*) - 1;
+        const uintptr_t ubuf = reinterpret_cast<uintptr_t>(buf);
+        if (RAPIDJSON_UNLIKELY(ubuf & mask)) {
+            const uintptr_t abuf = (ubuf + mask) & ~mask;
+            RAPIDJSON_ASSERT(size >= abuf - ubuf);
+            buf = reinterpret_cast<void*>(abuf);
+            size -= abuf - ubuf;
+        }
+        return buf;
+    }

-    //! Chunk header for perpending to each chunk.
-    /*! Chunks are stored as a singly linked list.
-    */
-    struct ChunkHeader {
-        size_t capacity;    //!< Capacity of the chunk in bytes (excluding the header itself).
-        size_t size;        //!< Current size of allocated memory in bytes.
-        ChunkHeader *next;  //!< Next chunk in the linked list.
+    size_t chunk_capacity_;     //!< The minimum capacity of chunks when they are allocated.
+    BaseAllocator* baseAllocator_;  //!< base allocator for allocating memory chunks.
+    SharedData *shared_;        //!< The shared data of the allocator
+};
+
+namespace internal {
+    template<typename, typename = void>
+    struct IsRefCounted :
+        public FalseType
+    { };
+    template<typename T>
+    struct IsRefCounted<T, typename internal::EnableIfCond<T::kRefCounted>::Type> :
+        public TrueType
+    { };
+}
+
+template<typename T, typename A>
+inline T* Realloc(A& a, T* old_p, size_t old_n, size_t new_n)
+{
+    RAPIDJSON_NOEXCEPT_ASSERT(old_n <= SIZE_MAX / sizeof(T) && new_n <= SIZE_MAX / sizeof(T));
+    return static_cast<T*>(a.Realloc(old_p, old_n * sizeof(T), new_n * sizeof(T)));
+}
+
+template<typename T, typename A>
+inline T *Malloc(A& a, size_t n = 1)
+{
+    return Realloc<T, A>(a, NULL, 0, n);
+}
+
+template<typename T, typename A>
+inline void Free(A& a, T *p, size_t n = 1)
+{
+    static_cast<void>(Realloc<T, A>(a, p, n, 0));
+}
+
+#ifdef __GNUC__
+RAPIDJSON_DIAG_PUSH
+RAPIDJSON_DIAG_OFF(effc++) // std::allocator can safely be inherited
+#endif
+
+template <typename T, typename BaseAllocator = CrtAllocator>
+class StdAllocator :
+    public std::allocator<T>
+{
+    typedef std::allocator<T> allocator_type;
+#if RAPIDJSON_HAS_CXX11
+    typedef std::allocator_traits<allocator_type> traits_type;
+#else
+    typedef allocator_type traits_type;
+#endif
+
+public:
+    typedef BaseAllocator BaseAllocatorType;
+
+    StdAllocator() RAPIDJSON_NOEXCEPT :
+        allocator_type(),
+        baseAllocator_()
+    { }
+
+    StdAllocator(const StdAllocator& rhs) RAPIDJSON_NOEXCEPT :
+        allocator_type(rhs),
+        baseAllocator_(rhs.baseAllocator_)
+    { }
+
+    template<typename U>
+    StdAllocator(const StdAllocator<U, BaseAllocator>& rhs) RAPIDJSON_NOEXCEPT :
+        allocator_type(rhs),
+        baseAllocator_(rhs.baseAllocator_)
+    { }
+
+#if RAPIDJSON_HAS_CXX11_RVALUE_REFS
+    StdAllocator(StdAllocator&& rhs) RAPIDJSON_NOEXCEPT :
+        allocator_type(std::move(rhs)),
+        baseAllocator_(std::move(rhs.baseAllocator_))
+    { }
+#endif
+#if RAPIDJSON_HAS_CXX11
+    using propagate_on_container_move_assignment = std::true_type;
+    using propagate_on_container_swap = std::true_type;
+#endif
+
+    /* implicit */
+    StdAllocator(const BaseAllocator& allocator) RAPIDJSON_NOEXCEPT :
+        allocator_type(),
+        baseAllocator_(allocator)
+    { }
+
+    ~StdAllocator() RAPIDJSON_NOEXCEPT
+    { }
+
+    template<typename U>
+    struct rebind {
+        typedef StdAllocator<U, BaseAllocator> other;
    };

-    ChunkHeader *chunkHead_;    //!< Head of the chunk linked-list. Only the head chunk serves allocation.
-    size_t chunk_capacity_;     //!< The minimum capacity of chunk when they are allocated.
-    void *userBuffer_;          //!< User supplied buffer.
-    BaseAllocator* baseAllocator_;  //!< base allocator for allocating memory chunks.
-    BaseAllocator* ownBaseAllocator_;   //!< base allocator created by this object.
+    typedef typename traits_type::size_type         size_type;
+    typedef typename traits_type::difference_type   difference_type;
+
+    typedef typename traits_type::value_type        value_type;
+    typedef typename traits_type::pointer           pointer;
+    typedef typename traits_type::const_pointer     const_pointer;
+
+#if RAPIDJSON_HAS_CXX11
+
+    typedef typename std::add_lvalue_reference<value_type>::type &reference;
+    typedef typename std::add_lvalue_reference<typename std::add_const<value_type>::type>::type &const_reference;
+
+    pointer address(reference r) const RAPIDJSON_NOEXCEPT
+    {
+        return std::addressof(r);
+    }
+    const_pointer address(const_reference r) const RAPIDJSON_NOEXCEPT
+    {
+        return std::addressof(r);
+    }
+
+    size_type max_size() const RAPIDJSON_NOEXCEPT
+    {
+        return traits_type::max_size(*this);
+    }
+
+    template <typename ...Args>
+    void construct(pointer p, Args&&... args)
+    {
+        traits_type::construct(*this, p, std::forward<Args>(args)...);
+    }
+    void destroy(pointer p)
+    {
+        traits_type::destroy(*this, p);
+    }
+
+#else // !RAPIDJSON_HAS_CXX11
+
+    typedef typename allocator_type::reference       reference;
+    typedef typename allocator_type::const_reference const_reference;
+
+    pointer address(reference r) const RAPIDJSON_NOEXCEPT
+    {
+        return allocator_type::address(r);
+    }
+    const_pointer address(const_reference r) const RAPIDJSON_NOEXCEPT
+    {
+        return allocator_type::address(r);
+    }
+
+    size_type max_size() const RAPIDJSON_NOEXCEPT
+    {
+        return allocator_type::max_size();
+    }
+
+    void construct(pointer p, const_reference r)
+    {
+        allocator_type::construct(p, r);
+    }
+    void destroy(pointer p)
+    {
+        allocator_type::destroy(p);
+    }
+
+#endif // !RAPIDJSON_HAS_CXX11
+
+    template <typename U>
+    U* allocate(size_type n = 1, const void* = 0)
+    {
+        return RAPIDJSON_NAMESPACE::Malloc<U>(baseAllocator_, n);
+    }
+    template <typename U>
+    void deallocate(U* p, size_type n = 1)
+    {
+        RAPIDJSON_NAMESPACE::Free<U>(baseAllocator_, p, n);
+    }
+
+    pointer allocate(size_type n = 1, const void* = 0)
+    {
+        return allocate<value_type>(n);
+    }
+    void deallocate(pointer p, size_type n = 1)
+    {
+        deallocate<value_type>(p, n);
+    }
+
+#if RAPIDJSON_HAS_CXX11
+    using is_always_equal = std::is_empty<BaseAllocator>;
+#endif
+
+    template<typename U>
+    bool operator==(const StdAllocator<U, BaseAllocator>& rhs) const RAPIDJSON_NOEXCEPT
+    {
+        return baseAllocator_ == rhs.baseAllocator_;
+    }
+    template<typename U>
+    bool operator!=(const StdAllocator<U, BaseAllocator>& rhs) const RAPIDJSON_NOEXCEPT
+    {
+        return !operator==(rhs);
+    }
+
+    //! rapidjson Allocator concept
+    static const bool kNeedFree = BaseAllocator::kNeedFree;
+    static const bool kRefCounted = internal::IsRefCounted<BaseAllocator>::Value;
+    void* Malloc(size_t size)
+    {
+        return baseAllocator_.Malloc(size);
+    }
+    void* Realloc(void* originalPtr, size_t originalSize, size_t newSize)
+    {
+        return baseAllocator_.Realloc(originalPtr, originalSize, newSize);
+    }
+    static void Free(void *ptr) RAPIDJSON_NOEXCEPT
+    {
+        BaseAllocator::Free(ptr);
+    }
+
+private:
+    template <typename, typename>
+    friend class StdAllocator; // access to StdAllocator<!T>.*
+
+    BaseAllocator baseAllocator_;
 };

+#if !RAPIDJSON_HAS_CXX17 // std::allocator<void> deprecated in C++17
+template <typename BaseAllocator>
+class StdAllocator<void, BaseAllocator> :
+    public std::allocator<void>
+{
+    typedef std::allocator<void> allocator_type;
+
+public:
+    typedef BaseAllocator BaseAllocatorType;
+
+    StdAllocator() RAPIDJSON_NOEXCEPT :
+        allocator_type(),
+        baseAllocator_()
+    { }
+
+    StdAllocator(const StdAllocator& rhs) RAPIDJSON_NOEXCEPT :
+        allocator_type(rhs),
+        baseAllocator_(rhs.baseAllocator_)
+    { }
+
+    template<typename U>
+    StdAllocator(const StdAllocator<U, BaseAllocator>& rhs) RAPIDJSON_NOEXCEPT :
+        allocator_type(rhs),
+        baseAllocator_(rhs.baseAllocator_)
+    { }
+
+    /* implicit */
+    StdAllocator(const BaseAllocator& baseAllocator) RAPIDJSON_NOEXCEPT :
+        allocator_type(),
+        baseAllocator_(baseAllocator)
+    { }
+
+    ~StdAllocator() RAPIDJSON_NOEXCEPT
+    { }
+
+    template<typename U>
+    struct rebind {
+        typedef StdAllocator<U, BaseAllocator> other;
+    };
+
+    typedef typename allocator_type::value_type value_type;
+
+private:
+    template <typename, typename>
+    friend class StdAllocator; // access to StdAllocator<!T>.*
+
+    BaseAllocator baseAllocator_;
+};
+#endif
+
+#ifdef __GNUC__
+RAPIDJSON_DIAG_POP
+#endif
+
 RAPIDJSON_NAMESPACE_END

 #endif // RAPIDJSON_ENCODINGS_H_
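The allocator change above moves the chunk list and a reference count into a `SharedData` block, so copies of a `MemoryPoolAllocator` share one pool (`kRefCounted = true`), allocation bumps a pointer in the head chunk, and only the last owner frees the chunks. The following is a minimal, self-contained C++ sketch of that design under stated assumptions: `SharedPool`, `IsShared()` and the 1024-byte default are hypothetical names, it calls `std::malloc` directly instead of going through a `BaseAllocator`, aligns to `alignof(std::max_align_t)` rather than `RAPIDJSON_ALIGN`, and omits the user-buffer constructor and `Realloc()`.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical illustration of a reference-counted, chunked bump allocator.
class SharedPool {
    struct Chunk {
        std::size_t capacity;  // usable bytes after this header
        std::size_t size;      // bytes handed out so far
        Chunk* next;           // previously allocated (older) chunk
    };
    struct SharedData {
        Chunk* head;           // only the head chunk serves allocations
        std::size_t refcount;  // number of SharedPool copies using this data
    };

    static std::size_t align(std::size_t n) {
        const std::size_t a = alignof(std::max_align_t);
        return (n + a - 1) & ~(a - 1);
    }
    static unsigned char* buffer(Chunk* c) {
        return reinterpret_cast<unsigned char*>(c) + align(sizeof(Chunk));
    }
    bool add_chunk(std::size_t capacity) {
        Chunk* c = static_cast<Chunk*>(std::malloc(align(sizeof(Chunk)) + capacity));
        if (!c) return false;
        c->capacity = capacity;
        c->size = 0;
        c->next = shared_->head;
        shared_->head = c;
        return true;
    }
    // Drops one reference; the last owner frees every chunk and the shared block.
    void release() {
        if (--shared_->refcount != 0) return;
        for (Chunk* c = shared_->head; c != nullptr;) {
            Chunk* next = c->next;
            std::free(c);
            c = next;
        }
        delete shared_;
    }

    std::size_t chunk_capacity_;
    SharedData* shared_;

public:
    explicit SharedPool(std::size_t chunk_capacity = 1024)
        : chunk_capacity_(chunk_capacity), shared_(new SharedData{nullptr, 1}) {}
    SharedPool(const SharedPool& rhs)
        : chunk_capacity_(rhs.chunk_capacity_), shared_(rhs.shared_) {
        ++shared_->refcount;
    }
    SharedPool& operator=(const SharedPool& rhs) {
        ++rhs.shared_->refcount;  // increment first so self-assignment is safe
        release();
        chunk_capacity_ = rhs.chunk_capacity_;
        shared_ = rhs.shared_;
        return *this;
    }
    ~SharedPool() { release(); }

    void* Malloc(std::size_t size) {
        if (size == 0) return nullptr;
        size = align(size);
        Chunk* h = shared_->head;
        if (h == nullptr || h->size + size > h->capacity) {
            if (!add_chunk(chunk_capacity_ > size ? chunk_capacity_ : size)) return nullptr;
            h = shared_->head;
        }
        void* p = buffer(h) + h->size;
        h->size += size;
        return p;
    }
    std::size_t Size() const {
        std::size_t total = 0;
        for (Chunk* c = shared_->head; c != nullptr; c = c->next) total += c->size;
        return total;
    }
    bool IsShared() const { return shared_->refcount > 1; }
    bool operator==(const SharedPool& rhs) const { return shared_ == rhs.shared_; }
};
```

Because copies compare equal exactly when they point at the same shared block, memory obtained through one copy can safely be handed to a container using another, which mirrors why the new `operator==` compares `shared_` pointers.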
--- a/src/3rdparty/rapidjson/cursorstreamwrapper.h
+++ b/src/3rdparty/rapidjson/cursorstreamwrapper.h
@@ -1,6 +1,6 @@
 // Tencent is pleased to support the open source community by making RapidJSON available.
 //
-// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip. All rights reserved.
+// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip.
 //
 // Licensed under the MIT License (the "License"); you may not use this file except
 // in compliance with the License. You may obtain a copy of the License at
--- a/src/3rdparty/rapidjson/document.h
+++ b/src/3rdparty/rapidjson/document.h
--- a/src/3rdparty/rapidjson/encodedstream.h
+++ b/src/3rdparty/rapidjson/encodedstream.h
@@ -1,15 +1,15 @@
 // Tencent is pleased to support the open source community by making RapidJSON available.
-// 
-// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip. All rights reserved.
+//
+// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip.
 //
 // Licensed under the MIT License (the "License"); you may not use this file except
 // in compliance with the License. You may obtain a copy of the License at
 //
 // http://opensource.org/licenses/MIT
 //
-// Unless required by applicable law or agreed to in writing, software distributed 
-// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR 
-// CONDITIONS OF ANY KIND, either express or implied. See the License for the 
+// Unless required by applicable law or agreed to in writing, software distributed
+// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
+// CONDITIONS OF ANY KIND, either express or implied. See the License for the
 // specific language governing permissions and limitations under the License.

 #ifndef RAPIDJSON_ENCODEDSTREAM_H_
@@ -41,7 +41,7 @@ class EncodedInputStream {
 public:
    typedef typename Encoding::Ch Ch;

-    EncodedInputStream(InputByteStream& is) : is_(is) { 
+    EncodedInputStream(InputByteStream& is) : is_(is) {
        current_ = Encoding::TakeBOM(is_);
    }

@@ -51,7 +51,7 @@ public:

    // Not implemented
    void Put(Ch) { RAPIDJSON_ASSERT(false); }
-    void Flush() { RAPIDJSON_ASSERT(false); } 
+    void Flush() { RAPIDJSON_ASSERT(false); }
    Ch* PutBegin() { RAPIDJSON_ASSERT(false); return 0; }
    size_t PutEnd(Ch*) { RAPIDJSON_ASSERT(false); return 0; }

@@ -80,7 +80,7 @@ public:

    // Not implemented
    void Put(Ch) {}
-    void Flush() {} 
+    void Flush() {}
    Ch* PutBegin() { return 0; }
    size_t PutEnd(Ch*) { return 0; }

@@ -102,7 +102,7 @@ class EncodedOutputStream {
 public:
    typedef typename Encoding::Ch Ch;

-    EncodedOutputStream(OutputByteStream& os, bool putBOM = true) : os_(os) { 
+    EncodedOutputStream(OutputByteStream& os, bool putBOM = true) : os_(os) {
        if (putBOM)
            Encoding::PutBOM(os_);
    }
@@ -143,7 +143,7 @@ public:
        \param type UTF encoding type if it is not detected from the stream.
    */
    AutoUTFInputStream(InputByteStream& is, UTFType type = kUTF8) : is_(&is), type_(type), hasBOM_(false) {
-        RAPIDJSON_ASSERT(type >= kUTF8 && type <= kUTF32BE);        
+        RAPIDJSON_ASSERT(type >= kUTF8 && type <= kUTF32BE);
        DetectType();
        static const TakeFunc f[] = { RAPIDJSON_ENCODINGS_FUNC(Take) };
        takeFunc_ = f[type_];
@@ -159,7 +159,7 @@ public:

    // Not implemented
    void Put(Ch) { RAPIDJSON_ASSERT(false); }
-    void Flush() { RAPIDJSON_ASSERT(false); } 
+    void Flush() { RAPIDJSON_ASSERT(false); }
    Ch* PutBegin() { RAPIDJSON_ASSERT(false); return 0; }
    size_t PutEnd(Ch*) { RAPIDJSON_ASSERT(false); return 0; }

@@ -258,7 +258,7 @@ public:
    UTFType GetType() const { return type_; }

    void Put(Ch c) { putFunc_(*os_, c); }
-    void Flush() { os_->Flush(); } 
+    void Flush() { os_->Flush(); }

    // Not implemented
    Ch Peek() const { RAPIDJSON_ASSERT(false); return 0;}
@@ -271,7 +271,7 @@ private:
    AutoUTFOutputStream(const AutoUTFOutputStream&);
    AutoUTFOutputStream& operator=(const AutoUTFOutputStream&);

-    void PutBOM() { 
+    void PutBOM() {
        typedef void (*PutBOMFunc)(OutputByteStream&);
        static const PutBOMFunc f[] = { RAPIDJSON_ENCODINGS_FUNC(PutBOM) };
        f[type_](*os_);
--- a/src/3rdparty/rapidjson/encodings.h
+++ b/src/3rdparty/rapidjson/encodings.h
@@ -1,15 +1,15 @@
 // Tencent is pleased to support the open source community by making RapidJSON available.
-// 
-// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip. All rights reserved.
+//
+// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip.
 //
 // Licensed under the MIT License (the "License"); you may not use this file except
 // in compliance with the License. You may obtain a copy of the License at
 //
 // http://opensource.org/licenses/MIT
 //
-// Unless required by applicable law or agreed to in writing, software distributed 
-// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR 
-// CONDITIONS OF ANY KIND, either express or implied. See the License for the 
+// Unless required by applicable law or agreed to in writing, software distributed
+// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
+// CONDITIONS OF ANY KIND, either express or implied. See the License for the
 // specific language governing permissions and limitations under the License.

 #ifndef RAPIDJSON_ENCODINGS_H_
@@ -100,7 +100,7 @@ struct UTF8 {

    template<typename OutputStream>
    static void Encode(OutputStream& os, unsigned codepoint) {
-        if (codepoint <= 0x7F) 
+        if (codepoint <= 0x7F)
            os.Put(static_cast<Ch>(codepoint & 0xFF));
        else if (codepoint <= 0x7FF) {
            os.Put(static_cast<Ch>(0xC0 | ((codepoint >> 6) & 0xFF)));
@@ -122,7 +122,7 @@ struct UTF8 {

    template<typename OutputStream>
    static void EncodeUnsafe(OutputStream& os, unsigned codepoint) {
-        if (codepoint <= 0x7F) 
+        if (codepoint <= 0x7F)
            PutUnsafe(os, static_cast<Ch>(codepoint & 0xFF));
        else if (codepoint <= 0x7FF) {
            PutUnsafe(os, static_cast<Ch>(0xC0 | ((codepoint >> 6) & 0xFF)));
@@ -276,7 +276,7 @@ struct UTF16 {
    static void Encode(OutputStream& os, unsigned codepoint) {
        RAPIDJSON_STATIC_ASSERT(sizeof(typename OutputStream::Ch) >= 2);
        if (codepoint <= 0xFFFF) {
-            RAPIDJSON_ASSERT(codepoint < 0xD800 || codepoint > 0xDFFF); // Code point itself cannot be surrogate pair 
+            RAPIDJSON_ASSERT(codepoint < 0xD800 || codepoint > 0xDFFF); // Code point itself cannot be surrogate pair
            os.Put(static_cast<typename OutputStream::Ch>(codepoint));
        }
        else {
@@ -292,7 +292,7 @@ struct UTF16 {
    static void EncodeUnsafe(OutputStream& os, unsigned codepoint) {
        RAPIDJSON_STATIC_ASSERT(sizeof(typename OutputStream::Ch) >= 2);
        if (codepoint <= 0xFFFF) {
-            RAPIDJSON_ASSERT(codepoint < 0xD800 || codepoint > 0xDFFF); // Code point itself cannot be surrogate pair 
+            RAPIDJSON_ASSERT(codepoint < 0xD800 || codepoint > 0xDFFF); // Code point itself cannot be surrogate pair
            PutUnsafe(os, static_cast<typename OutputStream::Ch>(codepoint));
        }
        else {
@@ -406,7 +406,7 @@ struct UTF16BE : UTF16<CharType> {
 ///////////////////////////////////////////////////////////////////////////////
 // UTF32

-//! UTF-32 encoding. 
+//! UTF-32 encoding.
 /*! http://en.wikipedia.org/wiki/UTF-32
    \tparam CharType Type for storing 32-bit UTF-32 data. Default is unsigned. C++11 may use char32_t instead.
    \note implements Encoding concept
@@ -498,7 +498,7 @@ struct UTF32BE : UTF32<CharType> {
    static CharType TakeBOM(InputByteStream& is) {
        RAPIDJSON_STATIC_ASSERT(sizeof(typename InputByteStream::Ch) == 1);
        CharType c = Take(is);
-        return static_cast<uint32_t>(c) == 0x0000FEFFu ? Take(is) : c; 
+        return static_cast<uint32_t>(c) == 0x0000FEFFu ? Take(is) : c;
    }

    template <typename InputByteStream>
@@ -694,13 +694,13 @@ struct Transcoder<Encoding, Encoding> {
        os.Put(is.Take());  // Just copy one code unit. This semantic is different from primary template class.
        return true;
    }
-    
+
    template<typename InputStream, typename OutputStream>
    static RAPIDJSON_FORCEINLINE bool TranscodeUnsafe(InputStream& is, OutputStream& os) {
        PutUnsafe(os, is.Take());  // Just copy one code unit. This semantic is different from primary template class.
        return true;
    }
-    
+
    template<typename InputStream, typename OutputStream>
    static RAPIDJSON_FORCEINLINE bool Validate(InputStream& is, OutputStream& os) {
        return Encoding::Validate(is, os);  // source/target encoding are the same
--- a/src/3rdparty/rapidjson/error/en.h
+++ b/src/3rdparty/rapidjson/error/en.h
@@ -1,15 +1,15 @@
 // Tencent is pleased to support the open source community by making RapidJSON available.
-// 
-// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip. All rights reserved.
+//
+// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip.
 //
 // Licensed under the MIT License (the "License"); you may not use this file except
 // in compliance with the License. You may obtain a copy of the License at
 //
 // http://opensource.org/licenses/MIT
 //
-// Unless required by applicable law or agreed to in writing, software distributed 
-// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR 
-// CONDITIONS OF ANY KIND, either express or implied. See the License for the 
+// Unless required by applicable law or agreed to in writing, software distributed
+// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
+// CONDITIONS OF ANY KIND, either express or implied. See the License for the
 // specific language governing permissions and limitations under the License.

 #ifndef RAPIDJSON_ERROR_EN_H_
@@ -39,13 +39,13 @@ inline const RAPIDJSON_ERROR_CHARTYPE* GetParseError_En(ParseErrorCode parseErro

        case kParseErrorDocumentEmpty:                  return RAPIDJSON_ERROR_STRING("The document is empty.");
        case kParseErrorDocumentRootNotSingular:        return RAPIDJSON_ERROR_STRING("The document root must not be followed by other values.");
-    
+
        case kParseErrorValueInvalid:                   return RAPIDJSON_ERROR_STRING("Invalid value.");
-    
+
        case kParseErrorObjectMissName:                 return RAPIDJSON_ERROR_STRING("Missing a name for object member.");
        case kParseErrorObjectMissColon:                return RAPIDJSON_ERROR_STRING("Missing a colon after a name of object member.");
        case kParseErrorObjectMissCommaOrCurlyBracket:  return RAPIDJSON_ERROR_STRING("Missing a comma or '}' after an object member.");
-    
+
        case kParseErrorArrayMissCommaOrSquareBracket:  return RAPIDJSON_ERROR_STRING("Missing a comma or ']' after an array element.");

        case kParseErrorStringUnicodeEscapeInvalidHex:  return RAPIDJSON_ERROR_STRING("Incorrect hex digit after \\u escape in string.");
@@ -65,6 +65,54 @@ inline const RAPIDJSON_ERROR_CHARTYPE* GetParseError_En(ParseErrorCode parseErro
    }
 }

+//! Maps error code of validation into error message.
+/*!
+    \ingroup RAPIDJSON_ERRORS
+    \param validateErrorCode Error code obtained from validator.
+    \return the error message.
+    \note User can make a copy of this function for localization.
+        Using switch-case is safer for future modification of error codes.
+*/
+inline const RAPIDJSON_ERROR_CHARTYPE* GetValidateError_En(ValidateErrorCode validateErrorCode) {
+    switch (validateErrorCode) {
+        case kValidateErrors:                           return RAPIDJSON_ERROR_STRING("One or more validation errors have occurred");
+        case kValidateErrorNone:                        return RAPIDJSON_ERROR_STRING("No error.");
+
+        case kValidateErrorMultipleOf:                  return RAPIDJSON_ERROR_STRING("Number '%actual' is not a multiple of the 'multipleOf' value '%expected'.");
+        case kValidateErrorMaximum:                     return RAPIDJSON_ERROR_STRING("Number '%actual' is greater than the 'maximum' value '%expected'.");
+        case kValidateErrorExclusiveMaximum:            return RAPIDJSON_ERROR_STRING("Number '%actual' is greater than or equal to the 'exclusiveMaximum' value '%expected'.");
+        case kValidateErrorMinimum:                     return RAPIDJSON_ERROR_STRING("Number '%actual' is less than the 'minimum' value '%expected'.");
+        case kValidateErrorExclusiveMinimum:            return RAPIDJSON_ERROR_STRING("Number '%actual' is less than or equal to the 'exclusiveMinimum' value '%expected'.");
+
+        case kValidateErrorMaxLength:                   return RAPIDJSON_ERROR_STRING("String '%actual' is longer than the 'maxLength' value '%expected'.");
+        case kValidateErrorMinLength:                   return RAPIDJSON_ERROR_STRING("String '%actual' is shorter than the 'minLength' value '%expected'.");
+        case kValidateErrorPattern:                     return RAPIDJSON_ERROR_STRING("String '%actual' does not match the 'pattern' regular expression.");
+
+        case kValidateErrorMaxItems:                    return RAPIDJSON_ERROR_STRING("Array of length '%actual' is longer than the 'maxItems' value '%expected'.");
+        case kValidateErrorMinItems:                    return RAPIDJSON_ERROR_STRING("Array of length '%actual' is shorter than the 'minItems' value '%expected'.");
+        case kValidateErrorUniqueItems:                 return RAPIDJSON_ERROR_STRING("Array has duplicate items at indices '%duplicates' but 'uniqueItems' is true.");
+        case kValidateErrorAdditionalItems:             return RAPIDJSON_ERROR_STRING("Array has an additional item at index '%disallowed' that is not allowed by the schema.");
+
+        case kValidateErrorMaxProperties:               return RAPIDJSON_ERROR_STRING("Object has '%actual' members which is more than 'maxProperties' value '%expected'.");
+        case kValidateErrorMinProperties:               return RAPIDJSON_ERROR_STRING("Object has '%actual' members which is less than 'minProperties' value '%expected'.");
+        case kValidateErrorRequired:                    return RAPIDJSON_ERROR_STRING("Object is missing the following members required by the schema: '%missing'.");
+        case kValidateErrorAdditionalProperties:        return RAPIDJSON_ERROR_STRING("Object has an additional member '%disallowed' that is not allowed by the schema.");
+        case kValidateErrorPatternProperties:           return RAPIDJSON_ERROR_STRING("Object has 'patternProperties' that are not allowed by the schema.");
+        case kValidateErrorDependencies:                return RAPIDJSON_ERROR_STRING("Object has missing property or schema dependencies, refer to following errors.");
+
+        case kValidateErrorEnum:                        return RAPIDJSON_ERROR_STRING("Property has a value that is not one of its allowed enumerated values.");
+        case kValidateErrorType:                        return RAPIDJSON_ERROR_STRING("Property has a type '%actual' that is not in the following list: '%expected'.");
+
+        case kValidateErrorOneOf:                       return RAPIDJSON_ERROR_STRING("Property did not match any of the sub-schemas specified by 'oneOf', refer to following errors.");
+        case kValidateErrorOneOfMatch:                  return RAPIDJSON_ERROR_STRING("Property matched more than one of the sub-schemas specified by 'oneOf'.");
+        case kValidateErrorAllOf:                       return RAPIDJSON_ERROR_STRING("Property did not match all of the sub-schemas specified by 'allOf', refer to following errors.");
+        case kValidateErrorAnyOf:                       return RAPIDJSON_ERROR_STRING("Property did not match any of the sub-schemas specified by 'anyOf', refer to following errors.");
+        case kValidateErrorNot:                         return RAPIDJSON_ERROR_STRING("Property matched the sub-schema specified by 'not'.");
+
+        default:                                        return RAPIDJSON_ERROR_STRING("Unknown error.");
+    }
+}
+
 RAPIDJSON_NAMESPACE_END

 #ifdef __clang__
--- a/src/3rdparty/rapidjson/error/error.h
+++ b/src/3rdparty/rapidjson/error/error.h
@@ -1,15 +1,15 @@
 // Tencent is pleased to support the open source community by making RapidJSON available.
-// 
-// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip. All rights reserved.
+//
+// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip.
 //
 // Licensed under the MIT License (the "License"); you may not use this file except
 // in compliance with the License. You may obtain a copy of the License at
 //
 // http://opensource.org/licenses/MIT
 //
-// Unless required by applicable law or agreed to in writing, software distributed 
-// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR 
-// CONDITIONS OF ANY KIND, either express or implied. See the License for the 
+// Unless required by applicable law or agreed to in writing, software distributed
+// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
+// CONDITIONS OF ANY KIND, either express or implied. See the License for the
 // specific language governing permissions and limitations under the License.

 #ifndef RAPIDJSON_ERROR_ERROR_H_
@@ -152,6 +152,61 @@ private:
 */
 typedef const RAPIDJSON_ERROR_CHARTYPE* (*GetParseErrorFunc)(ParseErrorCode);

+///////////////////////////////////////////////////////////////////////////////
+// ValidateErrorCode
+
+//! Error codes when validating.
+/*! \ingroup RAPIDJSON_ERRORS
+    \see GenericSchemaValidator
+*/
+enum ValidateErrorCode {
+    kValidateErrors    = -1,                   //!< Top level error code when kValidateContinueOnErrorsFlag is set.
+    kValidateErrorNone = 0,                    //!< No error.
+
+    kValidateErrorMultipleOf,                  //!< Number is not a multiple of the 'multipleOf' value.
+    kValidateErrorMaximum,                     //!< Number is greater than the 'maximum' value.
+    kValidateErrorExclusiveMaximum,            //!< Number is greater than or equal to the 'exclusiveMaximum' value.
+    kValidateErrorMinimum,                     //!< Number is less than the 'minimum' value.
+    kValidateErrorExclusiveMinimum,            //!< Number is less than or equal to the 'exclusiveMinimum' value.
+
+    kValidateErrorMaxLength,                   //!< String is longer than the 'maxLength' value.
+    kValidateErrorMinLength,                   //!< String is shorter than the 'minLength' value.
+    kValidateErrorPattern,                     //!< String does not match the 'pattern' regular expression.
+
+    kValidateErrorMaxItems,                    //!< Array is longer than the 'maxItems' value.
+    kValidateErrorMinItems,                    //!< Array is shorter than the 'minItems' value.
+    kValidateErrorUniqueItems,                 //!< Array has duplicate items but 'uniqueItems' is true.
+    kValidateErrorAdditionalItems,             //!< Array has additional items that are not allowed by the schema.
+
+    kValidateErrorMaxProperties,               //!< Object has more members than 'maxProperties' value.
+    kValidateErrorMinProperties,               //!< Object has fewer members than 'minProperties' value.
+    kValidateErrorRequired,                    //!< Object is missing one or more members required by the schema.
+    kValidateErrorAdditionalProperties,        //!< Object has additional members that are not allowed by the schema.
+    kValidateErrorPatternProperties,           //!< See other errors.
+    kValidateErrorDependencies,                //!< Object has missing property or schema dependencies.
+
+    kValidateErrorEnum,                        //!< Property has a value that is not one of its allowed enumerated values.
+    kValidateErrorType,                        //!< Property has a type that is not allowed by the schema.
+
+    kValidateErrorOneOf,                       //!< Property did not match any of the sub-schemas specified by 'oneOf'.
+    kValidateErrorOneOfMatch,                  //!< Property matched more than one of the sub-schemas specified by 'oneOf'.
+    kValidateErrorAllOf,                       //!< Property did not match all of the sub-schemas specified by 'allOf'.
+    kValidateErrorAnyOf,                       //!< Property did not match any of the sub-schemas specified by 'anyOf'.
+    kValidateErrorNot                          //!< Property matched the sub-schema specified by 'not'.
+};
+
+//! Function pointer type of GetValidateError().
+/*! \ingroup RAPIDJSON_ERRORS
+
+    This is the prototype for \c GetValidateError_X(), where \c X is a locale.
+    User can dynamically change locale in runtime, e.g.:
+\code
+    GetValidateErrorFunc GetValidateError = GetValidateError_En; // or whatever
+    const RAPIDJSON_ERROR_CHARTYPE* s = GetValidateError(validator.GetInvalidSchemaCode());
+\endcode
+*/
+typedef const RAPIDJSON_ERROR_CHARTYPE* (*GetValidateErrorFunc)(ValidateErrorCode);
+
 RAPIDJSON_NAMESPACE_END

 #ifdef __clang__
--- a/src/3rdparty/rapidjson/filereadstream.h
+++ b/src/3rdparty/rapidjson/filereadstream.h
@@ -1,15 +1,15 @@
 // Tencent is pleased to support the open source community by making RapidJSON available.
-// 
-// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip. All rights reserved.
+//
+// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip.
 //
 // Licensed under the MIT License (the "License"); you may not use this file except
 // in compliance with the License. You may obtain a copy of the License at
 //
 // http://opensource.org/licenses/MIT
 //
-// Unless required by applicable law or agreed to in writing, software distributed 
-// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR 
-// CONDITIONS OF ANY KIND, either express or implied. See the License for the 
+// Unless required by applicable law or agreed to in writing, software distributed
+// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
+// CONDITIONS OF ANY KIND, either express or implied. See the License for the
 // specific language governing permissions and limitations under the License.

 #ifndef RAPIDJSON_FILEREADSTREAM_H_
@@ -41,7 +41,7 @@ public:
        \param buffer user-supplied buffer.
        \param bufferSize size of buffer in bytes. Must >=4 bytes.
    */
-    FileReadStream(std::FILE* fp, char* buffer, size_t bufferSize) : fp_(fp), buffer_(buffer), bufferSize_(bufferSize), bufferLast_(0), current_(buffer_), readCount_(0), count_(0), eof_(false) { 
+    FileReadStream(std::FILE* fp, char* buffer, size_t bufferSize) : fp_(fp), buffer_(buffer), bufferSize_(bufferSize), bufferLast_(0), current_(buffer_), readCount_(0), count_(0), eof_(false) {
        RAPIDJSON_ASSERT(fp_ != 0);
        RAPIDJSON_ASSERT(bufferSize >= 4);
        Read();
@@ -53,7 +53,7 @@ public:

    // Not implemented
    void Put(Ch) { RAPIDJSON_ASSERT(false); }
-    void Flush() { RAPIDJSON_ASSERT(false); } 
+    void Flush() { RAPIDJSON_ASSERT(false); }
    Ch* PutBegin() { RAPIDJSON_ASSERT(false); return 0; }
    size_t PutEnd(Ch*) { RAPIDJSON_ASSERT(false); return 0; }

--- a/src/3rdparty/rapidjson/filewritestream.h
+++ b/src/3rdparty/rapidjson/filewritestream.h
@@ -1,15 +1,15 @@
 // Tencent is pleased to support the open source community by making RapidJSON available.
-// 
-// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip. All rights reserved.
+//
+// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip.
 //
 // Licensed under the MIT License (the "License"); you may not use this file except
 // in compliance with the License. You may obtain a copy of the License at
 //
 // http://opensource.org/licenses/MIT
 //
-// Unless required by applicable law or agreed to in writing, software distributed 
-// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR 
-// CONDITIONS OF ANY KIND, either express or implied. See the License for the 
+// Unless required by applicable law or agreed to in writing, software distributed
+// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
+// CONDITIONS OF ANY KIND, either express or implied. See the License for the
 // specific language governing permissions and limitations under the License.

 #ifndef RAPIDJSON_FILEWRITESTREAM_H_
@@ -33,11 +33,11 @@ class FileWriteStream {
 public:
    typedef char Ch;    //!< Character type. Only support char.

-    FileWriteStream(std::FILE* fp, char* buffer, size_t bufferSize) : fp_(fp), buffer_(buffer), bufferEnd_(buffer + bufferSize), current_(buffer_) { 
+    FileWriteStream(std::FILE* fp, char* buffer, size_t bufferSize) : fp_(fp), buffer_(buffer), bufferEnd_(buffer + bufferSize), current_(buffer_) {
        RAPIDJSON_ASSERT(fp_ != 0);
    }

-    void Put(char c) { 
+    void Put(char c) {
        if (current_ >= bufferEnd_)
            Flush();

--- a/src/3rdparty/rapidjson/fwd.h
+++ b/src/3rdparty/rapidjson/fwd.h
@@ -1,15 +1,15 @@
 // Tencent is pleased to support the open source community by making RapidJSON available.
-// 
-// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip. All rights reserved.
+//
+// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip.
 //
 // Licensed under the MIT License (the "License"); you may not use this file except
 // in compliance with the License. You may obtain a copy of the License at
 //
 // http://opensource.org/licenses/MIT
 //
-// Unless required by applicable law or agreed to in writing, software distributed 
-// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR 
-// CONDITIONS OF ANY KIND, either express or implied. See the License for the 
+// Unless required by applicable law or agreed to in writing, software distributed
+// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
+// CONDITIONS OF ANY KIND, either express or implied. See the License for the
 // specific language governing permissions and limitations under the License.

 #ifndef RAPIDJSON_FWD_H_
@@ -101,8 +101,8 @@ class PrettyWriter;

 // document.h

-template <typename Encoding, typename Allocator> 
-struct GenericMember;
+template <typename Encoding, typename Allocator>
+class GenericMember;

 template <bool Const, typename Encoding, typename Allocator>
 class GenericMemberIterator;
@@ -110,7 +110,7 @@ class GenericMemberIterator;
 template<typename CharType>
 struct GenericStringRef;

-template <typename Encoding, typename Allocator> 
+template <typename Encoding, typename Allocator>
 class GenericValue;

 typedef GenericValue<UTF8<char>, MemoryPoolAllocator<CrtAllocator> > Value;
--- a/src/3rdparty/rapidjson/internal/biginteger.h
+++ b/src/3rdparty/rapidjson/internal/biginteger.h
@@ -1,15 +1,15 @@
 // Tencent is pleased to support the open source community by making RapidJSON available.
-// 
-// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip. All rights reserved.
+//
+// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip.
 //
 // Licensed under the MIT License (the "License"); you may not use this file except
 // in compliance with the License. You may obtain a copy of the License at
 //
 // http://opensource.org/licenses/MIT
 //
-// Unless required by applicable law or agreed to in writing, software distributed 
-// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR 
-// CONDITIONS OF ANY KIND, either express or implied. See the License for the 
+// Unless required by applicable law or agreed to in writing, software distributed
+// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
+// CONDITIONS OF ANY KIND, either express or implied. See the License for the
 // specific language governing permissions and limitations under the License.

 #ifndef RAPIDJSON_BIGINTEGER_H_
@@ -17,7 +17,7 @@

 #include "../rapidjson.h"

-#if defined(_MSC_VER) && !__INTEL_COMPILER && defined(_M_AMD64)
+#if defined(_MSC_VER) && !defined(__INTEL_COMPILER) && defined(_M_AMD64)
 #include <intrin.h> // for _umul128
 #pragma intrinsic(_umul128)
 #endif
@@ -37,7 +37,8 @@ public:
        digits_[0] = u;
    }

-    BigInteger(const char* decimals, size_t length) : count_(1) {
+    template<typename Ch>
+    BigInteger(const Ch* decimals, size_t length) : count_(1) {
        RAPIDJSON_ASSERT(length > 0);
        digits_[0] = 0;
        size_t i = 0;
@@ -51,7 +52,7 @@ public:
        if (length > 0)
            AppendDecimal64(decimals + i, decimals + i + length);
    }
-    
+
    BigInteger& operator=(const BigInteger &rhs)
    {
        if (this != &rhs) {
@@ -60,9 +61,9 @@ public:
        }
        return *this;
    }
-    
+
    BigInteger& operator=(uint64_t u) {
-        digits_[0] = u;            
+        digits_[0] = u;
        count_ = 1;
        return *this;
    }
@@ -95,7 +96,7 @@ public:
            digits_[i] = MulAdd64(digits_[i], u, k, &hi);
            k = hi;
        }
-        
+
        if (k > 0)
            PushBack(k);

@@ -118,7 +119,7 @@ public:
            digits_[i] = (p0 & 0xFFFFFFFF) | (p1 << 32);
            k = p1 >> 32;
        }
-        
+
        if (k > 0)
            PushBack(k);

@@ -221,7 +222,8 @@ public:
    bool IsZero() const { return count_ == 1 && digits_[0] == 0; }

 private:
-    void AppendDecimal64(const char* begin, const char* end) {
+    template<typename Ch>
+    void AppendDecimal64(const Ch* begin, const Ch* end) {
        uint64_t u = ParseUint64(begin, end);
        if (IsZero())
            *this = u;
@@ -236,11 +238,12 @@ private:
        digits_[count_++] = digit;
    }

-    static uint64_t ParseUint64(const char* begin, const char* end) {
+    template<typename Ch>
+    static uint64_t ParseUint64(const Ch* begin, const Ch* end) {
        uint64_t r = 0;
-        for (const char* p = begin; p != end; ++p) {
-            RAPIDJSON_ASSERT(*p >= '0' && *p <= '9');
-            r = r * 10u + static_cast<unsigned>(*p - '0');
+        for (const Ch* p = begin; p != end; ++p) {
+            RAPIDJSON_ASSERT(*p >= Ch('0') && *p <= Ch('9'));
+            r = r * 10u + static_cast<unsigned>(*p - Ch('0'));
        }
        return r;
    }
--- a/src/3rdparty/rapidjson/internal/clzll.h
+++ b/src/3rdparty/rapidjson/internal/clzll.h
@@ -0,0 +1,71 @@
+// Tencent is pleased to support the open source community by making RapidJSON available.
+//
+// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip.
+//
+// Licensed under the MIT License (the "License"); you may not use this file except
+// in compliance with the License. You may obtain a copy of the License at
+//
+// http://opensource.org/licenses/MIT
+//
+// Unless required by applicable law or agreed to in writing, software distributed
+// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
+// CONDITIONS OF ANY KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations under the License.
+
+#ifndef RAPIDJSON_CLZLL_H_
+#define RAPIDJSON_CLZLL_H_
+
+#include "../rapidjson.h"
+
+#if defined(_MSC_VER) && !defined(UNDER_CE)
+#include <intrin.h>
+#if defined(_WIN64)
+#pragma intrinsic(_BitScanReverse64)
+#else
+#pragma intrinsic(_BitScanReverse)
+#endif
+#endif
+
+RAPIDJSON_NAMESPACE_BEGIN
+namespace internal {
+
+inline uint32_t clzll(uint64_t x) {
+    // Passing 0 to __builtin_clzll is UB in GCC and results in an
+    // infinite loop in the software implementation.
+    RAPIDJSON_ASSERT(x != 0);
+
+#if defined(_MSC_VER) && !defined(UNDER_CE)
+    unsigned long r = 0;
+#if defined(_WIN64)
+    _BitScanReverse64(&r, x);
+#else
+    // Scan the high 32 bits.
+    if (_BitScanReverse(&r, static_cast<uint32_t>(x >> 32)))
+        return 63 - (r + 32);
+
+    // Scan the low 32 bits.
+    _BitScanReverse(&r, static_cast<uint32_t>(x & 0xFFFFFFFF));
+#endif // _WIN64
+
+    return 63 - r;
+#elif (defined(__GNUC__) && __GNUC__ >= 4) || RAPIDJSON_HAS_BUILTIN(__builtin_clzll)
+    // __builtin_clzll wrapper
+    return static_cast<uint32_t>(__builtin_clzll(x));
+#else
+    // naive version
+    uint32_t r = 0;
+    while (!(x & (static_cast<uint64_t>(1) << 63))) {
+        x <<= 1;
+        ++r;
+    }
+
+    return r;
+#endif // _MSC_VER
+}
+
+#define RAPIDJSON_CLZLL RAPIDJSON_NAMESPACE::internal::clzll
+
+} // namespace internal
+RAPIDJSON_NAMESPACE_END
+
+#endif // RAPIDJSON_CLZLL_H_
--- a/src/3rdparty/rapidjson/internal/diyfp.h
+++ b/src/3rdparty/rapidjson/internal/diyfp.h
@@ -1,6 +1,6 @@
 // Tencent is pleased to support the open source community by making RapidJSON available.
 //
-// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip. All rights reserved.
+// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip.
 //
 // Licensed under the MIT License (the "License"); you may not use this file except
 // in compliance with the License. You may obtain a copy of the License at
@@ -20,11 +20,11 @@
 #define RAPIDJSON_DIYFP_H_

 #include "../rapidjson.h"
+#include "clzll.h"
 #include <limits>

 #if defined(_MSC_VER) && defined(_M_AMD64) && !defined(__INTEL_COMPILER)
 #include <intrin.h>
-#pragma intrinsic(_BitScanReverse64)
 #pragma intrinsic(_umul128)
 #endif

@@ -100,22 +100,8 @@ struct DiyFp {
    }

    DiyFp Normalize() const {
-        RAPIDJSON_ASSERT(f != 0); // https://stackoverflow.com/a/26809183/291737
-#if defined(_MSC_VER) && defined(_M_AMD64)
-        unsigned long index;
-        _BitScanReverse64(&index, f);
-        return DiyFp(f << (63 - index), e - (63 - index));
-#elif defined(__GNUC__) && __GNUC__ >= 4
-        int s = __builtin_clzll(f);
+        int s = static_cast<int>(clzll(f));
        return DiyFp(f << s, e - s);
-#else
-        DiyFp res = *this;
-        while (!(res.f & (static_cast<uint64_t>(1) << 63))) {
-            res.f <<= 1;
-            res.e--;
-        }
-        return res;
-#endif
    }

    DiyFp NormalizeBoundary() const {
--- a/src/3rdparty/rapidjson/internal/dtoa.h
+++ b/src/3rdparty/rapidjson/internal/dtoa.h
@@ -1,15 +1,15 @@
 // Tencent is pleased to support the open source community by making RapidJSON available.
-// 
-// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip. All rights reserved.
+//
+// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip.
 //
 // Licensed under the MIT License (the "License"); you may not use this file except
 // in compliance with the License. You may obtain a copy of the License at
 //
 // http://opensource.org/licenses/MIT
 //
-// Unless required by applicable law or agreed to in writing, software distributed 
-// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR 
-// CONDITIONS OF ANY KIND, either express or implied. See the License for the 
+// Unless required by applicable law or agreed to in writing, software distributed
+// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
+// CONDITIONS OF ANY KIND, either express or implied. See the License for the
 // specific language governing permissions and limitations under the License.

 // This is a C++ header-only implementation of Grisu2 algorithm from the publication:
@@ -58,7 +58,11 @@ inline int CountDecimalDigit32(uint32_t n) {
 }

 inline void DigitGen(const DiyFp& W, const DiyFp& Mp, uint64_t delta, char* buffer, int* len, int* K) {
-    static const uint32_t kPow10[] = { 1, 10, 100, 1000, 10000, 100000, 1000000, 10000000, 100000000, 1000000000 };
+    static const uint64_t kPow10[] = { 1U, 10U, 100U, 1000U, 10000U, 100000U, 1000000U, 10000000U, 100000000U,
+                                       1000000000U, 10000000000U, 100000000000U, 1000000000000U,
+                                       10000000000000U, 100000000000000U, 1000000000000000U,
+                                       10000000000000000U, 100000000000000000U, 1000000000000000000U,
+                                       10000000000000000000U };
    const DiyFp one(uint64_t(1) << -Mp.e, Mp.e);
    const DiyFp wp_w = Mp - W;
    uint32_t p1 = static_cast<uint32_t>(Mp.f >> -one.e);
@@ -86,7 +90,7 @@ inline void DigitGen(const DiyFp& W, const DiyFp& Mp, uint64_t delta, char* buff
        uint64_t tmp = (static_cast<uint64_t>(p1) << -one.e) + p2;
        if (tmp <= delta) {
            *K += kappa;
-            GrisuRound(buffer, *len, delta, tmp, static_cast<uint64_t>(kPow10[kappa]) << -one.e, wp_w.f);
+            GrisuRound(buffer, *len, delta, tmp, kPow10[kappa] << -one.e, wp_w.f);
            return;
        }
    }
@@ -103,7 +107,7 @@ inline void DigitGen(const DiyFp& W, const DiyFp& Mp, uint64_t delta, char* buff
        if (p2 < delta) {
            *K += kappa;
            int index = -kappa;
-            GrisuRound(buffer, *len, delta, p2, one.f, wp_w.f * (index < 9 ? kPow10[index] : 0));
+            GrisuRound(buffer, *len, delta, p2, one.f, wp_w.f * (index < 20 ? kPow10[index] : 0));
            return;
        }
    }
--- a/src/3rdparty/rapidjson/internal/ieee754.h
+++ b/src/3rdparty/rapidjson/internal/ieee754.h
@@ -1,15 +1,15 @@
 // Tencent is pleased to support the open source community by making RapidJSON available.
-// 
-// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip. All rights reserved.
+//
+// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip.
 //
 // Licensed under the MIT License (the "License"); you may not use this file except
 // in compliance with the License. You may obtain a copy of the License at
 //
 // http://opensource.org/licenses/MIT
 //
-// Unless required by applicable law or agreed to in writing, software distributed 
-// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR 
-// CONDITIONS OF ANY KIND, either express or implied. See the License for the 
+// Unless required by applicable law or agreed to in writing, software distributed
+// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
+// CONDITIONS OF ANY KIND, either express or implied. See the License for the
 // specific language governing permissions and limitations under the License.

 #ifndef RAPIDJSON_IEEE754_
--- a/src/3rdparty/rapidjson/internal/itoa.h
+++ b/src/3rdparty/rapidjson/internal/itoa.h
@@ -1,6 +1,6 @@
 // Tencent is pleased to support the open source community by making RapidJSON available.
 //
-// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip. All rights reserved.
+// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip.
 //
 // Licensed under the MIT License (the "License"); you may not use this file except
 // in compliance with the License. You may obtain a copy of the License at
--- a/src/3rdparty/rapidjson/internal/meta.h
+++ b/src/3rdparty/rapidjson/internal/meta.h
@@ -1,15 +1,15 @@
 // Tencent is pleased to support the open source community by making RapidJSON available.
-// 
-// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip. All rights reserved.
+//
+// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip.
 //
 // Licensed under the MIT License (the "License"); you may not use this file except
 // in compliance with the License. You may obtain a copy of the License at
 //
 // http://opensource.org/licenses/MIT
 //
-// Unless required by applicable law or agreed to in writing, software distributed 
-// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR 
-// CONDITIONS OF ANY KIND, either express or implied. See the License for the 
+// Unless required by applicable law or agreed to in writing, software distributed
+// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
+// CONDITIONS OF ANY KIND, either express or implied. See the License for the
 // specific language governing permissions and limitations under the License.

 #ifndef RAPIDJSON_INTERNAL_META_H_
--- a/src/3rdparty/rapidjson/internal/pow10.h
+++ b/src/3rdparty/rapidjson/internal/pow10.h
@@ -1,15 +1,15 @@
 // Tencent is pleased to support the open source community by making RapidJSON available.
-// 
-// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip. All rights reserved.
+//
+// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip.
 //
 // Licensed under the MIT License (the "License"); you may not use this file except
 // in compliance with the License. You may obtain a copy of the License at
 //
 // http://opensource.org/licenses/MIT
 //
-// Unless required by applicable law or agreed to in writing, software distributed 
-// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR 
-// CONDITIONS OF ANY KIND, either express or implied. See the License for the 
+// Unless required by applicable law or agreed to in writing, software distributed
+// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
+// CONDITIONS OF ANY KIND, either express or implied. See the License for the
 // specific language governing permissions and limitations under the License.

 #ifndef RAPIDJSON_POW10_
@@ -27,8 +27,8 @@ namespace internal {
 */
 inline double Pow10(int n) {
    static const double e[] = { // 1e-0...1e308: 309 * 8 bytes = 2472 bytes
-        1e+0,  
-        1e+1,  1e+2,  1e+3,  1e+4,  1e+5,  1e+6,  1e+7,  1e+8,  1e+9,  1e+10, 1e+11, 1e+12, 1e+13, 1e+14, 1e+15, 1e+16, 1e+17, 1e+18, 1e+19, 1e+20, 
+        1e+0,
+        1e+1,  1e+2,  1e+3,  1e+4,  1e+5,  1e+6,  1e+7,  1e+8,  1e+9,  1e+10, 1e+11, 1e+12, 1e+13, 1e+14, 1e+15, 1e+16, 1e+17, 1e+18, 1e+19, 1e+20,
        1e+21, 1e+22, 1e+23, 1e+24, 1e+25, 1e+26, 1e+27, 1e+28, 1e+29, 1e+30, 1e+31, 1e+32, 1e+33, 1e+34, 1e+35, 1e+36, 1e+37, 1e+38, 1e+39, 1e+40,
        1e+41, 1e+42, 1e+43, 1e+44, 1e+45, 1e+46, 1e+47, 1e+48, 1e+49, 1e+50, 1e+51, 1e+52, 1e+53, 1e+54, 1e+55, 1e+56, 1e+57, 1e+58, 1e+59, 1e+60,
        1e+61, 1e+62, 1e+63, 1e+64, 1e+65, 1e+66, 1e+67, 1e+68, 1e+69, 1e+70, 1e+71, 1e+72, 1e+73, 1e+74, 1e+75, 1e+76, 1e+77, 1e+78, 1e+79, 1e+80,
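`Pow10` trades a 309-entry constant table (1e0 through 1e308) for an O(1) lookup. The literal constants rely on the compiler rounding each one correctly. For comparison, the following hedged sketch fills the same 309 entries once at startup by parsing decimal strings. Repeated multiplication by 10 is avoided because it may accumulate rounding error. The sketch assumes the C library's `strtod` rounds correctly, which the C++ standard does not guarantee. `BuildPow10Table` and `Pow10FromTable` are hypothetical names.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Fill 1e0 .. 1e308 by parsing "1eN" strings (assumes correctly rounded strtod).
std::vector<double> BuildPow10Table() {
    std::vector<double> table;
    table.reserve(309);
    for (int n = 0; n <= 308; ++n) {
        const std::string text = "1e" + std::to_string(n);
        table.push_back(std::strtod(text.c_str(), nullptr));
    }
    return table;
}

// Same domain as the literal table above: n must be in [0, 308].
double Pow10FromTable(int n) {
    static const std::vector<double> table = BuildPow10Table();
    assert(n >= 0 && n <= 308);
    return table[static_cast<std::size_t>(n)];
}
```

The literal table in the header avoids both the startup cost and the dependency on `strtod`. That is a reasonable choice for a header-only library.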
--- a/src/3rdparty/rapidjson/internal/regex.h
+++ b/src/3rdparty/rapidjson/internal/regex.h
@@ -1,15 +1,15 @@
 // Tencent is pleased to support the open source community by making RapidJSON available.
-// 
-// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip. All rights reserved.
+//
+// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip.
 //
 // Licensed under the MIT License (the "License"); you may not use this file except
 // in compliance with the License. You may obtain a copy of the License at
 //
 // http://opensource.org/licenses/MIT
 //
-// Unless required by applicable law or agreed to in writing, software distributed 
-// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR 
-// CONDITIONS OF ANY KIND, either express or implied. See the License for the 
+// Unless required by applicable law or agreed to in writing, software distributed
+// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
+// CONDITIONS OF ANY KIND, either express or implied. See the License for the
 // specific language governing permissions and limitations under the License.

 #ifndef RAPIDJSON_INTERNAL_REGEX_H_
@@ -23,7 +23,6 @@
 RAPIDJSON_DIAG_PUSH
 RAPIDJSON_DIAG_OFF(padded)
 RAPIDJSON_DIAG_OFF(switch-enum)
-RAPIDJSON_DIAG_OFF(implicit-fallthrough)
 #elif defined(_MSC_VER)
 RAPIDJSON_DIAG_PUSH
 RAPIDJSON_DIAG_OFF(4512) // assignment operator could not be generated
@@ -32,9 +31,6 @@ RAPIDJSON_DIAG_OFF(4512) // assignment operator could not be generated
 #ifdef __GNUC__
 RAPIDJSON_DIAG_PUSH
 RAPIDJSON_DIAG_OFF(effc++)
-#if __GNUC__ >= 7
-RAPIDJSON_DIAG_OFF(implicit-fallthrough)
-#endif
 #endif

 #ifndef RAPIDJSON_REGEX_VERBOSE
@@ -106,9 +102,9 @@ class GenericRegexSearch;
    - \c \\t Tab (U+0009)
    - \c \\v Vertical tab (U+000B)

-    \note This is a Thompson NFA engine, implemented with reference to 
-        Cox, Russ. "Regular Expression Matching Can Be Simple And Fast (but is slow in Java, Perl, PHP, Python, Ruby,...).", 
-        https://swtch.com/~rsc/regexp/regexp1.html 
+    \note This is a Thompson NFA engine, implemented with reference to
+        Cox, Russ. "Regular Expression Matching Can Be Simple And Fast (but is slow in Java, Perl, PHP, Python, Ruby,...).",
+        https://swtch.com/~rsc/regexp/regexp1.html
 */
 template <typename Encoding, typename Allocator = CrtAllocator>
 class GenericRegex {
@@ -117,9 +113,9 @@ public:
    typedef typename Encoding::Ch Ch;
    template <typename, typename> friend class GenericRegexSearch;

-    GenericRegex(const Ch* source, Allocator* allocator = 0) : 
-        ownAllocator_(allocator ? 0 : RAPIDJSON_NEW(Allocator)()), allocator_(allocator ? allocator : ownAllocator_), 
-        states_(allocator_, 256), ranges_(allocator_, 256), root_(kRegexInvalidState), stateCount_(), rangeCount_(), 
+    GenericRegex(const Ch* source, Allocator* allocator = 0) :
+        ownAllocator_(allocator ? 0 : RAPIDJSON_NEW(Allocator)()), allocator_(allocator ? allocator : ownAllocator_),
+        states_(allocator_, 256), ranges_(allocator_, 256), root_(kRegexInvalidState), stateCount_(), rangeCount_(),
        anchorBegin_(), anchorEnd_()
    {
        GenericStringStream<Encoding> ss(source);
@@ -151,7 +147,7 @@ private:
    static const unsigned kRangeNegationFlag = 0x80000000;

    struct Range {
-        unsigned start; // 
+        unsigned start; //
        unsigned end;
        SizeType next;
    };
@@ -291,6 +287,7 @@ private:
                    if (!CharacterEscape(ds, &codepoint))
                        return; // Unsupported escape character
                    // fall through to default
+                    RAPIDJSON_DELIBERATE_FALLTHROUGH;

                default: // Pattern character
                    PushOperand(operandStack, codepoint);
@@ -405,7 +402,7 @@ private:
                }
                return false;

-            default: 
+            default:
                // syntax error (e.g. unclosed kLeftParenthesis)
                return false;
        }
@@ -520,6 +517,7 @@ private:
                else if (!CharacterEscape(ds, &codepoint))
                    return false;
                // fall through to default
+                RAPIDJSON_DELIBERATE_FALLTHROUGH;

            default:
                switch (step) {
@@ -529,6 +527,7 @@ private:
                        break;
                    }
                    // fall through to step 0 for other characters
+                    RAPIDJSON_DELIBERATE_FALLTHROUGH;

                case 0:
                    {
@@ -551,7 +550,7 @@ private:
        }
        return false;
    }
-    
+
    SizeType NewRange(unsigned codepoint) {
        Range* r = ranges_.template Push<Range>();
        r->start = r->end = codepoint;
@@ -609,7 +608,7 @@ public:
    typedef typename RegexType::EncodingType Encoding;
    typedef typename Encoding::Ch Ch;

-    GenericRegexSearch(const RegexType& regex, Allocator* allocator = 0) : 
+    GenericRegexSearch(const RegexType& regex, Allocator* allocator = 0) :
        regex_(regex), allocator_(allocator), ownAllocator_(0),
        state0_(allocator, 0), state1_(allocator, 0), stateSet_()
    {
@@ -668,7 +667,7 @@ private:
            for (const SizeType* s = current->template Bottom<SizeType>(); s != current->template End<SizeType>(); ++s) {
                const State& sr = regex_.GetState(*s);
                if (sr.codepoint == codepoint ||
-                    sr.codepoint == RegexType::kAnyCharacterClass || 
+                    sr.codepoint == RegexType::kAnyCharacterClass ||
                    (sr.codepoint == RegexType::kRangeCharacterClass && MatchRange(sr.rangeStart, codepoint)))
                {
                    matched = AddState(*next, sr.out) || matched;
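The `regex.h` hunks remove the blanket `implicit-fallthrough` suppression and instead mark each intentional fall-through with `RAPIDJSON_DELIBERATE_FALLTHROUGH`. The following is a minimal sketch of that pattern. `SKETCH_FALLTHROUGH` and `StepsFor` are hypothetical names. RapidJSON's own macro definition is not part of this diff and may differ from the one below.

```cpp
#include <cassert>

// In C++17 and later, the standard [[fallthrough]] attribute tells
// -Wimplicit-fallthrough that the fall-through is intended; older
// dialects get a harmless empty statement instead.
#if defined(__cplusplus) && __cplusplus >= 201703L
#define SKETCH_FALLTHROUGH [[fallthrough]]
#else
#define SKETCH_FALLTHROUGH do {} while (0)
#endif

// Loosely modeled on the parser above: a supported escape is decoded, then
// deliberately falls through to the ordinary pattern-character path.
// Returns the number of steps taken, or -1 for an unsupported escape.
int StepsFor(bool escaped, char c) {
    int steps = 0;
    switch (escaped ? 1 : 0) {
        case 1:
            if (c != 'n' && c != 't')
                return -1;       // unsupported escape in this sketch
            ++steps;             // escape decoded
            SKETCH_FALLTHROUGH;  // intentionally continue into default
        default:
            ++steps;             // pushed as an ordinary operand
            break;
    }
    return steps;
}
```

Annotating each site keeps the warning active for every other `switch`, so an accidental missing `break` is still reported.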
--- a/src/3rdparty/rapidjson/internal/stack.h
+++ b/src/3rdparty/rapidjson/internal/stack.h
@@ -1,15 +1,15 @@
 // Tencent is pleased to support the open source community by making RapidJSON available.
-// 
-// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip. All rights reserved.
+//
+// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip.
 //
 // Licensed under the MIT License (the "License"); you may not use this file except
 // in compliance with the License. You may obtain a copy of the License at
 //
 // http://opensource.org/licenses/MIT
 //
-// Unless required by applicable law or agreed to in writing, software distributed 
-// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR 
-// CONDITIONS OF ANY KIND, either express or implied. See the License for the 
+// Unless required by applicable law or agreed to in writing, software distributed
+// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
+// CONDITIONS OF ANY KIND, either express or implied. See the License for the
 // specific language governing permissions and limitations under the License.

 #ifndef RAPIDJSON_INTERNAL_STACK_H_
@@ -98,7 +98,7 @@ public:

    void Clear() { stackTop_ = stack_; }

-    void ShrinkToFit() { 
+    void ShrinkToFit() {
        if (Empty()) {
            // If the stack is empty, completely deallocate the memory.
            Allocator::Free(stack_); // NOLINT (+clang-analyzer-unix.Malloc)
@@ -142,7 +142,7 @@ public:
    }

    template<typename T>
-    T* Top() { 
+    T* Top() {
        RAPIDJSON_ASSERT(GetSize() >= sizeof(T));
        return reinterpret_cast<T*>(stackTop_ - sizeof(T));
    }
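`internal::Stack` stores heterogeneous values as raw bytes, and `Top<T>()` reinterprets the last `sizeof(T)` bytes in place. The following is a hedged sketch of the same idea built on `std::vector<char>`. `ByteStack` is a hypothetical class, not RapidJSON's API. It copies values in and out with `memcpy` instead of returning a pointer, which trades in-place access for freedom from alignment and aliasing concerns.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

class ByteStack {
public:
    template <typename T>
    void Push(const T& value) {
        static_assert(std::is_trivially_copyable<T>::value, "raw-byte storage only");
        const std::size_t old = bytes_.size();
        bytes_.resize(old + sizeof(T));
        std::memcpy(bytes_.data() + old, &value, sizeof(T));
    }

    template <typename T>
    T Top() const {
        static_assert(std::is_trivially_copyable<T>::value, "raw-byte storage only");
        assert(bytes_.size() >= sizeof(T)); // same precondition as Stack::Top
        T value;
        std::memcpy(&value, bytes_.data() + bytes_.size() - sizeof(T), sizeof(T));
        return value;
    }

    template <typename T>
    T Pop() {
        T value = Top<T>();
        bytes_.resize(bytes_.size() - sizeof(T));
        return value;
    }

    void Clear() { bytes_.clear(); }

    // Similar intent to Stack::ShrinkToFit; note shrink_to_fit is only a
    // non-binding request to release spare capacity.
    void ShrinkToFit() { bytes_.shrink_to_fit(); }

    std::size_t GetSize() const { return bytes_.size(); }

private:
    std::vector<char> bytes_;
};
```

Callers must pop values with the same types they pushed, in reverse order, exactly as with the original `Stack`.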
@@ -0,0 +1 @@
+epee - is a small library of helpers, wrappers, tools and and so on, used to make my life easier.