<hr>
<h3>About CUB</h3>
CUB provides state-of-the-art, reusable software components for every layer
of the CUDA programming model:
- [<b><em>Device-wide primitives</em></b>](https://nvlabs.github.com/cub/group___device_module.html)
  - Sort, prefix scan, reduction, histogram, etc.
  - Compatible with CUDA dynamic parallelism
- [<b><em>Block-wide "collective" primitives</em></b>](https://nvlabs.github.com/cub/group___block_module.html)
  - I/O, sort, prefix scan, reduction, histogram, etc.
  - Compatible with arbitrary thread block sizes and types
- [<b><em>Warp-wide "collective" primitives</em></b>](https://nvlabs.github.com/cub/group___warp_module.html)
  - Warp-wide prefix scan, reduction, etc.
  - Safe and architecture-specific
- [<b><em>Thread and resource utilities</em></b>](https://nvlabs.github.com/cub/group___thread_module.html)
  - PTX intrinsics, device reflection, texture-caching iterators, caching memory allocators, etc.
![Orientation of collective primitives within the CUDA software stack](http://nvlabs.github.com/cub/cub_overview.png)
CUB is included in the NVIDIA HPC SDK and the CUDA Toolkit.
We recommend the [CUB Project Website](http://nvlabs.github.com/cub) for further information and examples.
<br><hr>
<h3>A Simple Example</h3>
```C++
#include <cub/cub.cuh>

// Block-sorting CUDA kernel
__global__ void BlockSortKernel(int *d_in, int *d_out)
{
    using namespace cub;

    // Specialize BlockRadixSort, BlockLoad, and BlockStore for 128 threads
    // owning 16 integer items each
    typedef BlockRadixSort<int, 128, 16>                     BlockRadixSort;
    typedef BlockLoad<int, 128, 16, BLOCK_LOAD_TRANSPOSE>    BlockLoad;
    typedef BlockStore<int, 128, 16, BLOCK_STORE_TRANSPOSE>  BlockStore;

    // Allocate shared memory
    __shared__ union {
        typename BlockRadixSort::TempStorage  sort;
        typename BlockLoad::TempStorage       load;
        typename BlockStore::TempStorage      store;
    } temp_storage;

    int block_offset = blockIdx.x * (128 * 16);  // Offset for this block's segment

    // Obtain a segment of 2048 consecutive keys that are blocked across threads
    int thread_keys[16];
    BlockLoad(temp_storage.load).Load(d_in + block_offset, thread_keys);
    __syncthreads();

    // Collectively sort the keys
    BlockRadixSort(temp_storage.sort).Sort(thread_keys);
    __syncthreads();

    // Store the sorted segment
    BlockStore(temp_storage.store).Store(d_out + block_offset, thread_keys);
}
```
Each thread block uses `cub::BlockRadixSort` to collectively sort
its own input segment. The class is specialized by the
data type being sorted, by the number of threads per block, by the number of
keys per thread, and implicitly by the targeted compilation architecture.
The `cub::BlockLoad` and `cub::BlockStore` classes are similarly specialized.
Furthermore, to provide coalesced accesses to device memory, these primitives are
configured to access memory using a striped access pattern (where consecutive threads
simultaneously access consecutive items) and then <em>transpose</em> the keys into
a [<em>blocked arrangement</em>](index.html#sec4sec3) of elements across threads.
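
The two arrangements can be modeled on the host with plain index arithmetic. The sketch below is only an illustration of the access pattern, not CUB's implementation: the names `StripedIndex`, `BlockedIndex`, and `LoadStripedThenTranspose` are hypothetical, and the `staging` vector stands in for the shared memory that the real primitives use for the exchange.

```C++
#include <cassert>
#include <cstddef>
#include <vector>

// Striped arrangement: in round `item`, consecutive threads touch consecutive
// elements, which is what makes the device-memory access coalesced.
constexpr std::size_t StripedIndex(std::size_t thread, std::size_t item,
                                   std::size_t block_threads)
{
    return item * block_threads + thread;
}

// Blocked arrangement: each thread owns `items_per_thread` consecutive elements.
constexpr std::size_t BlockedIndex(std::size_t thread, std::size_t item,
                                   std::size_t items_per_thread)
{
    return thread * items_per_thread + item;
}

// Host-side model (an assumption for illustration) of a striped load followed
// by a transpose into a blocked arrangement. Returns keys[thread][item].
std::vector<std::vector<int>> LoadStripedThenTranspose(
    const std::vector<int>& segment,
    std::size_t block_threads,
    std::size_t items_per_thread)
{
    assert(segment.size() == block_threads * items_per_thread);

    // Step 1: every thread reads its striped elements into a staging buffer.
    std::vector<int> staging(segment.size());
    for (std::size_t item = 0; item < items_per_thread; ++item)
        for (std::size_t thread = 0; thread < block_threads; ++thread)
        {
            std::size_t idx = StripedIndex(thread, item, block_threads);
            staging[idx] = segment[idx];
        }

    // Step 2: every thread picks up its blocked elements from the staging buffer.
    std::vector<std::vector<int>> keys(block_threads,
                                       std::vector<int>(items_per_thread));
    for (std::size_t thread = 0; thread < block_threads; ++thread)
        for (std::size_t item = 0; item < items_per_thread; ++item)
            keys[thread][item] = staging[BlockedIndex(thread, item, items_per_thread)];

    return keys;
}
```

With 4 threads and 2 items per thread, thread 1 reads elements 1 and 5 during the striped step but ends up owning elements 2 and 3 after the transpose.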
Once specialized, these classes expose opaque `TempStorage` member types.
The thread block uses these storage types to statically allocate the union of
shared memory needed by the thread block. (Alternatively, these storage types
could be aliased to global memory allocations.)
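
Placing the three `TempStorage` members in a union means the block reserves only as much shared memory as the largest member needs. Reusing the same bytes is safe in the kernel above because a `__syncthreads()` separates each phase. The host-only sketch below illustrates the size difference. The three structs and their sizes are hypothetical stand-ins, since the real `TempStorage` layouts are opaque and depend on the specialization and target architecture.

```C++
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the opaque TempStorage types (sizes are invented).
struct LoadStorage  { int exchange[128 * 16]; };
struct SortStorage  { int exchange[128 * 16]; unsigned digit_counters[128 * 16]; };
struct StoreStorage { int exchange[128 * 16]; };

// Union layout, as in the kernel: all phases share the same bytes.
union UnionStorage
{
    LoadStorage  load;
    SortStorage  sort;
    StoreStorage store;
};

// Alternative layout for comparison: every phase gets its own bytes.
struct SeparateStorage
{
    LoadStorage  load;
    SortStorage  sort;
    StoreStorage store;
};

constexpr std::size_t kUnionBytes    = sizeof(UnionStorage);
constexpr std::size_t kSeparateBytes = sizeof(SeparateStorage);

// A union is at least as large as its largest member ...
static_assert(kUnionBytes >= std::max(sizeof(LoadStorage),
                                      std::max(sizeof(SortStorage), sizeof(StoreStorage))),
              "union must hold its largest member");
// ... while a struct needs room for all members at once.
static_assert(kSeparateBytes >= sizeof(LoadStorage) + sizeof(SortStorage) + sizeof(StoreStorage),
              "struct must hold every member");

// Hypothetical helper: check a footprint against a caller-supplied
// shared-memory budget (the budget value itself is device-dependent).
void RequireFits(std::size_t bytes, std::size_t budget_bytes)
{
    assert(bytes <= budget_bytes);
}
```

With these invented sizes the union takes exactly as much space as `SortStorage`, while the struct takes the sum of all three members.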
<br><hr>
<h3>Releases</h3>
CUB is distributed with the NVIDIA HPC SDK and the CUDA Toolkit in addition
to GitHub.
See the [changelog](CHANGELOG.md) for details about specific releases.
| CUB Release | Included In |
| ------------------------- | ------------------------------ |
| 1.9.10 | NVIDIA HPC SDK 20.5 |
| 1.9.9 | CUDA Toolkit 11.0 |
| 1.9.8-1 | NVIDIA HPC SDK 20.3 |
| 1.9.8 | CUDA Toolkit 11.0 Early Access |
| 1.8.0 | |
| 1.7.5 | Thrust 1.9.2 |
| 1.7.4 | Thrust 1.9.1-2 |
| 1.7.3 | |
| 1.7.2 | |
| 1.7.1 | |
| 1.7.0 | Thrust 1.9.0-5 |
| 1.6.4 | |
| 1.6.3 | |
| 1.6.2 (previously 1.5.5) | |
| 1.6.1 (previously 1.5.4) | |
| 1.6.0 (previously 1.5.3) | |
| 1.5.2 | |
| 1.5.1 | |
| 1.5.0 | |
| 1.4.1 | |
| 1.4.0 | |
| 1.3.2 | |
| 1.3.1 | |
| 1.3.0 | |
| 1.2.3 | |
| 1.2.2 | |
| 1.2.0 | |
| 1.1.1 | |
| 1.0.2 | |
| 1.0.1 | |
| 0.9.4 | |
| 0.9.2 | |
| 0.9.1 | |
| 0.9.0 | |
<br><hr>
<h3>Development Model</h3>
For information on the development model, see [this document](DEVELOPMENT_MODEL.md).
<br><hr>
<h3>Open Source License</h3>
CUB is available under the "New BSD" open-source license:
```
Copyright (c) 2010-2011, Duane Merrill. All rights reserved.
Copyright (c) 2011-2018, NVIDIA CORPORATION. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of the NVIDIA CORPORATION nor the
names of its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL NVIDIA CORPORATION BE LIABLE FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
```