# Efficient and performance-portable vector software
[//]: # (placeholder, do not remove)
Highway is a C++ library that provides portable SIMD/vector intrinsics.

[Documentation](https://google.github.io/highway/en/master/)

Previously licensed under Apache 2, now dual-licensed as Apache 2 / BSD-3.
## Why
We are passionate about high-performance software. We see major untapped
potential in CPUs (servers, mobile, desktops). Highway is for engineers who want
to reliably and economically push the boundaries of what is possible in
software.
## How
CPUs provide SIMD/vector instructions that apply the same operation to multiple
data items. This can reduce energy usage, e.g. *fivefold*, because fewer
instructions are executed. We also often see *5-10x* speedups.
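
To make "the same operation on multiple data items" concrete, here is a minimal sketch in plain standard C++. It deliberately does not use Highway's API. The names `kSketchLanes` and `AddArrays` are hypothetical, and the lane count of 4 is an assumption chosen only for illustration. The inner fixed-size loop stands in for what a single SIMD instruction would do in one step.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical lane count, chosen only for illustration. Real vector widths
// depend on the target instruction set.
constexpr std::size_t kSketchLanes = 4;

// Writes a[i] + b[i] to out[i] for every i in [0, count).
void AddArrays(const float* a, const float* b, float* out, std::size_t count) {
  assert(count == 0 || (a != nullptr && b != nullptr && out != nullptr));
  std::size_t i = 0;
  // Main loop: each iteration handles kSketchLanes items. A SIMD instruction
  // would perform the whole inner loop as one operation.
  for (; i + kSketchLanes <= count; i += kSketchLanes) {
    for (std::size_t lane = 0; lane < kSketchLanes; ++lane) {
      out[i + lane] = a[i + lane] + b[i + lane];
    }
  }
  // Remainder loop: handles the final count % kSketchLanes items one by one.
  for (; i < count; ++i) {
    out[i] = a[i] + b[i];
  }
}
```

With six inputs, the main loop covers the first four items and the remainder loop covers the last two. Fewer loop iterations is the source of the "fewer instructions executed" effect described above.
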

Highway makes SIMD/vector programming practical and workable according to these
guiding principles:

**Does what you expect**: Highway is a C++ library with carefully-chosen
functions that map well to CPU instructions without extensive compiler
transformations. The resulting code is more predictable and robust to code
changes/compiler updates than autovectorization.

**Works on widely-used platforms**: Highway supports five architectures; the
same application code can target various instruction sets, including those with
'scalable' vectors (size unknown at compile time). Highway only requires C++11
and supports four families of compilers. If you would like to use Highway on
other platforms, please raise an issue.

**Flexible to deploy**: Applications using Highway can run on heterogeneous
clouds or client devices, choosing the best available instruction set at
runtime. Alternatively, developers may choose to target a single instruction set
without any runtime overhead. In both cases, the application code is the same
except for swapping `HWY_STATIC_DISPATCH` with `HWY_DYNAMIC_DISPATCH` plus one
line of code.
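
The difference between choosing an implementation at runtime and fixing it at compile time can be sketched in plain standard C++. This is not how `HWY_STATIC_DISPATCH` or `HWY_DYNAMIC_DISPATCH` are implemented, and it does not use Highway's API. The capability bits and all function names (`kHasBaseline`, `kHasWide`, `SumBaseline`, `SumWide`, `ChooseSum`, `SumStatic`) are hypothetical, and `SumWide` merely stands in for code that would be compiled for a wider instruction set.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical capability bits. A real program would query the CPU instead.
constexpr std::uint32_t kHasBaseline = 1u << 0;
constexpr std::uint32_t kHasWide = 1u << 1;

using SumFunc = std::int64_t (*)(const std::int32_t*, std::size_t);

// Baseline implementation: one item per iteration.
std::int64_t SumBaseline(const std::int32_t* v, std::size_t n) {
  assert(n == 0 || v != nullptr);
  std::int64_t total = 0;
  for (std::size_t i = 0; i < n; ++i) total += v[i];
  return total;
}

// Stand-in for a version built for a wider instruction set: four independent
// accumulators, then a scalar loop for the leftover items. Same result.
std::int64_t SumWide(const std::int32_t* v, std::size_t n) {
  assert(n == 0 || v != nullptr);
  std::int64_t acc[4] = {0, 0, 0, 0};
  std::size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    for (std::size_t lane = 0; lane < 4; ++lane) acc[lane] += v[i + lane];
  }
  std::int64_t total = acc[0] + acc[1] + acc[2] + acc[3];
  for (; i < n; ++i) total += v[i];
  return total;
}

// Runtime choice: pick the best implementation the machine supports.
SumFunc ChooseSum(std::uint32_t supported) {
  return (supported & kHasWide) ? &SumWide : &SumBaseline;
}

// Compile-time choice: the callee is fixed, so there is no runtime check.
inline std::int64_t SumStatic(const std::int32_t* v, std::size_t n) {
  return SumBaseline(v, n);
}
```

A caller would typically invoke `ChooseSum` once, cache the returned pointer, and call through it afterwards, so the selection cost is paid only once. The calling code looks the same in both variants apart from which entry point it uses.
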

**Suitable for a variety of domains**: Highway provides an extensive set of
operations, used for image processing (floating-point), compression, video
analysis, linear algebra, cryptography, sorting and random generation. We
recognise that new use-cases may require additional ops and are happy to add
them where it makes sense (e.g. no performance cliffs on some architectures). If
you would like to discuss, please file an issue.

**Rewards data-parallel design**: Highway provides tools such as Gather,
MaskedLoad, and FixedTag to enable speedups for legacy data structures. However,
the biggest gains are unlocked by designing algorithms and data structures for
scalable vectors. Helpful techniques include batching, structure-of-array
layouts, and aligned/padded allocations.
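
As an illustration of the structure-of-array and padding techniques just mentioned, here is a minimal sketch in plain standard C++ that does not use Highway's API. All names (`kPadLanes`, `PointAoS`, `PointsSoA`, `RoundUpToLanes`, `ToSoA`, `DotXY`) are hypothetical, and the lane count of 8 is an assumption. The sketch covers padding only: `std::vector` does not guarantee SIMD-friendly alignment, so an aligned allocator would be needed for that part.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical lane count used for padding, for illustration only.
constexpr std::size_t kPadLanes = 8;

// Array-of-structures element: x and y are interleaved in memory.
struct PointAoS {
  float x;
  float y;
};

// Structure-of-arrays: all x values are contiguous, as are all y values.
struct PointsSoA {
  std::vector<float> x;
  std::vector<float> y;
  std::size_t count = 0;  // number of valid (non-padding) points
};

// Rounds n up to the next multiple of kPadLanes.
constexpr std::size_t RoundUpToLanes(std::size_t n) {
  return (n + kPadLanes - 1) / kPadLanes * kPadLanes;
}

// Converts AoS to SoA and zero-pads both arrays to a whole number of lanes.
PointsSoA ToSoA(const std::vector<PointAoS>& in) {
  PointsSoA out;
  out.count = in.size();
  const std::size_t padded = RoundUpToLanes(in.size());
  out.x.assign(padded, 0.0f);
  out.y.assign(padded, 0.0f);
  for (std::size_t i = 0; i < in.size(); ++i) {
    out.x[i] = in[i].x;
    out.y[i] = in[i].y;
  }
  return out;
}

// Sums x[i] * y[i]. Thanks to the padding, the loop only ever processes whole
// groups of kPadLanes items; the zero padding contributes nothing to the sum.
float DotXY(const PointsSoA& p) {
  assert(p.x.size() == p.y.size() && p.x.size() % kPadLanes == 0);
  float sum = 0.0f;
  for (std::size_t i = 0; i < p.x.size(); i += kPadLanes) {
    for (std::size_t lane = 0; lane < kPadLanes; ++lane) {
      sum += p.x[i + lane] * p.y[i + lane];
    }
  }
  return sum;
}
```

The design choice here is to trade a little memory (at most `kPadLanes - 1` extra elements per array) for a loop with no remainder handling, which works only because the padding value is neutral for the operation being computed.
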
## Examples
Online demos using Compiler Explorer:

- [multiple targets with dynamic dispatch](https://gcc.godbolt.org/z/KM3ben7ET)
  (more complicated, but flexible and uses best available SIMD)
- [single target using -m flags](https://gcc.godbolt.org/z/rGnjMevKG)
  (simpler, but requires/only uses the instruction set enabled by compiler
  flags)

We observe that Highway is referenced in the following open source projects,
found via sourcegraph.com. Most are GitHub repositories. If you would like to
add your project or link to it directly, feel free to raise an issue or contact
us via the email below.
* Browsers: Chromium (+Vivaldi), Firefox (+floorp / foxhound / librewolf / Waterfox)
* Cryptography: google/distributed_point_functions
* Data structures: bkille/BitLib
* Image codecs: eustas/2im, [Grok JPEG 2000](https://github.com/GrokImageCompression/grok), [JPEG XL](https://github.com/libjxl/libjxl), OpenHTJ2K, [JPEGenc](https://github.com/osamu620/JPEGenc)
* Image processing: cloudinary/ssimulacra2, m-ab-s/media-autobuild_suite, [libvips](https://github.com/libvips/libvips)
* Image viewers: AlienCowEatCake/ImageViewer, mirillis/jpegxl-wic,
[Lux panorama/image viewer](https://bitbucket.org/kfj/pv/)
* Information retrieval: [iresearch database index](https://github.com/iresearch-toolkit/iresearch/blob/e7638e7a4b99136ca41f82be6edccf01351a7223/core/utils/simd_utils.hpp), michaeljclark/zvec
* Machine learning: TensorFlow, NumPy, zpye/SimpleInfer
* Voxels: rools/voxl

Other:

* [Evaluation of C++ SIMD Libraries](https://www.mnm-team.org/pub/Fopras/rock23/):
  "Highway excelled with a strong performance across multiple SIMD extensions
  [..]. Thus, Highway may currently be the most suitable SIMD library for many
  software projects."
* [zimt](https://github.com/kfjahnke/zimt): C++11 template library to process n-dimensional arrays with multi-threaded SIMD code
* [vectorized Quicksort](https://github.com/google/highway/tree/master/hwy/contrib/sort) ([paper](https://arxiv.org/abs/2205.05982))

If you'd like to get Highway, in addition to cloning from this GitHub repository
or using it as a Git submodule, you can also find it in the following package
managers or repositories: alpinelinux, conan-io, conda-forge, DragonFlyBSD,
freebsd, ghostbsd, microsoft/vcpkg, MidnightBSD, MSYS2, NetBSD, openSUSE,
opnsense, Xilinx/Vitis_Libraries. See also the list at
https://repology.org/project/highway-simd-library/versions .
## Current status
### Targets
Highway supports 22 targets, listed in alphabetical order of platform:
- Any: `EMU128`, `SCALAR`;
- Arm: `NEON` (Armv7+), `SVE`, `SVE2`, `SVE_256`, `SVE2_128`;
- IBM Z: `Z14`, `Z15`;
- POWER: `PPC8` (v2.07), `PPC9` (v3.0), `PPC10` (v3.1B, not yet supported
due to compiler bugs, see #1207; also requires QEMU 7.2);
- RISC-V: `RVV` (1.0);
- WebAssembly: `WASM`, `WASM_EMU256` (a 2x unrolled version of wasm128,
enabled if `HWY_WANT_WASM2` is defined. This will remain supported until it
is potentially superseded by a future version of WASM.);
- x86:
  - `SSE2`
  - `SSSE3` (~Intel Core)
  - `SSE4` (~Nehalem, also includes AES + CLMUL)
  - `AVX2` (~Haswell, also includes BMI2 + F16 + FMA)
  - `AVX3` (~Skylake, AVX-512F/BW/CD/DQ/VL)
  - `AVX3_DL` (~Icelake, includes BitAlg + CLMUL + GFNI + VAES + VBMI +
    VBMI2 + VNNI + VPOPCNT; requires opt-in by defining `HWY_WANT_AVX3_DL`
    unless compiling for static dispatch)
  - `AVX3_ZEN4` (like AVX3_DL but optimized for AMD Zen4; requires opt-in by
    defining `HWY_WANT_AVX3_ZEN4` if compiling for static dispatch)
  - `AVX3_SPR` (~Sapphire Rapids, includes AVX-512FP16)

Our policy is that unless otherwise specified, targets will remain supported as
long as they can be (cross-)compiled with currently supported Clang or GCC, and
tested using QEMU. If the target can be compiled with LLVM trunk and tested
using our version of QEMU without extra flags, then it is eligible for inclusion
in our continuous testing infrastructure. Otherwise, the target will be manually
tested before releases with selected versions/configurations of Clang and GCC.

SVE was initially tested using farm_sve (see acknowledgments).
### Versioning
Highway releases aim to follow the semver.org system (MAJOR.MINOR.PATCH),
incrementing MINOR after backward-compatible additions and PATCH after
backward-compatible fixes. We recommend using releases (rather than the Git tip)
because they are tested more extensively (see below).

The current version 1.0 signals an increased focus on backwards compatibility.
Applications using documented functionality will remain compatible with future
updates that have the same major version number.
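
The compatibility rule above can be expressed as a small check. The following is a hedged sketch in plain standard C++, not part of Highway: the `Version` struct and `IsCompatibleUpdate` are hypothetical names, and the version numbers in the usage note are made up. It assumes the semver interpretation stated above, namely that an installed release is a compatible update if it has the same major number and is not older than the release the application was built against.

```cpp
#include <cassert>

// Hypothetical MAJOR.MINOR.PATCH triple.
struct Version {
  int major_version;
  int minor_version;
  int patch_version;
};

// Returns true if `installed` is the same release as `built_for` or a later
// one with the same major version number.
bool IsCompatibleUpdate(const Version& built_for, const Version& installed) {
  assert(built_for.major_version >= 0 && installed.major_version >= 0);
  // A different major version may contain breaking changes.
  if (installed.major_version != built_for.major_version) return false;
  // Within one major version, a higher minor only adds functionality.
  if (installed.minor_version != built_for.minor_version) {
    return installed.minor_version > built_for.minor_version;
  }
  // Same minor: any equal or later patch only adds fixes.
  return installed.patch_version >= built_for.patch_version;
}
```

For example, an application built against a hypothetical 1.0.0 would accept 1.2.3 but reject 2.0.0, and one built against 1.2.0 would reject 1.1.9 because that release predates functionality it may rely on.
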
### Testing
Continuous integration tests build with a recent version of Clang (running on
native x86, or QEMU for RISC-V and Arm) and MSVC 2019 (v19.28, running on native
x86).

Before releases, we also test on x86 with Clang and GCC, and Armv7/8 via GCC
cross-compile. See the [testing process](g3doc/release_testing_process.md) for
details.
### Related modules
The `contrib` directory cont