# BERT For PyTorch
This repository provides a script and recipe to train the BERT model for PyTorch to achieve state-of-the-art accuracy, and is tested and maintained by NVIDIA.
## Table Of Contents
- [Model overview](#model-overview)
    * [Model architecture](#model-architecture)
    * [Default configuration](#default-configuration)
    * [Feature support matrix](#feature-support-matrix)
        * [Features](#features)
    * [Mixed precision training](#mixed-precision-training)
        * [Enabling mixed precision](#enabling-mixed-precision)
        * [Enabling TF32](#enabling-tf32)
    * [Glossary](#glossary)
- [Setup](#setup)
    * [Requirements](#requirements)
- [Quick Start Guide](#quick-start-guide)
- [Advanced](#advanced)
    * [Scripts and sample code](#scripts-and-sample-code)
    * [Parameters](#parameters)
        * [Pre-training parameters](#pre-training-parameters)
        * [Fine-tuning parameters](#fine-tuning-parameters)
        * [Multi-node](#multi-node)
    * [Command-line options](#command-line-options)
    * [Getting the data](#getting-the-data)
        * [Dataset guidelines](#dataset-guidelines)
        * [Multi-dataset](#multi-dataset)
    * [Training process](#training-process)
        * [Pre-training](#pre-training)
        * [Fine-tuning](#fine-tuning)
    * [Inference process](#inference-process)
        * [Fine-tuning inference](#fine-tuning-inference)
    * [Deploying BERT using NVIDIA Triton Inference Server](#deploying-bert-using-nvidia-triton-inference-server)
- [Performance](#performance)
    * [Benchmarking](#benchmarking)
        * [Training performance benchmark](#training-performance-benchmark)
        * [Inference performance benchmark](#inference-performance-benchmark)
    * [Results](#results)
        * [Training accuracy results](#training-accuracy-results)
            * [Pre-training loss results: NVIDIA DGX A100 (8x A100 40GB)](#pre-training-loss-results-nvidia-dgx-a100-8x-a100-40gb)
            * [Pre-training loss results: NVIDIA DGX-2H V100 (16x V100 32GB)](#pre-training-loss-results-nvidia-dgx-2h-v100-16x-v100-32gb)
            * [Pre-training loss results](#pre-training-loss-results)
            * [Pre-training loss curves](#pre-training-loss-curves)
            * [Fine-tuning accuracy results: NVIDIA DGX A100 (8x A100 40GB)](#fine-tuning-accuracy-results-nvidia-dgx-a100-8x-a100-40gb)
            * [Fine-tuning accuracy results: NVIDIA DGX-2 (16x V100 32G)](#fine-tuning-accuracy-results-nvidia-dgx-2-16x-v100-32g)
            * [Fine-tuning accuracy results: NVIDIA DGX-1 (8x V100 16G)](#fine-tuning-accuracy-results-nvidia-dgx-1-8x-v100-16g)
        * [Training stability test](#training-stability-test)
            * [Pre-training stability test](#pre-training-stability-test)
            * [Fine-tuning stability test](#fine-tuning-stability-test)
        * [Training performance results](#training-performance-results)
            * [Training performance: NVIDIA DGX A100 (8x A100 40GB)](#training-performance-nvidia-dgx-a100-8x-a100-40gb)
                * [Pre-training NVIDIA DGX A100 (8x A100 40GB)](#pre-training-nvidia-dgx-a100-8x-a100-40gb)
                * [Fine-tuning NVIDIA DGX A100 (8x A100 40GB)](#fine-tuning-nvidia-dgx-a100-8x-a100-40gb)
            * [Training performance: NVIDIA DGX-2 (16x V100 32G)](#training-performance-nvidia-dgx-2-16x-v100-32g)
                * [Pre-training NVIDIA DGX-2 With 32G](#pre-training-nvidia-dgx-2-with-32g)
                * [Pre-training on multiple NVIDIA DGX-2H With 32G](#pre-training-on-multiple-nvidia-dgx-2h-with-32g)
                * [Fine-tuning NVIDIA DGX-2 With 32G](#fine-tuning-nvidia-dgx-2-with-32g)
            * [Training performance: NVIDIA DGX-1 (8x V100 32G)](#training-performance-nvidia-dgx-1-8x-v100-32g)
                * [Pre-training NVIDIA DGX-1 With 32G](#pre-training-nvidia-dgx-1-with-32g)
                * [Fine-tuning NVIDIA DGX-1 With 32G](#fine-tuning-nvidia-dgx-1-with-32g)
            * [Training performance: NVIDIA DGX-1 (8x V100 16G)](#training-performance-nvidia-dgx-1-8x-v100-16g)
                * [Pre-training NVIDIA DGX-1 With 16G](#pre-training-nvidia-dgx-1-with-16g)
                * [Pre-training on multiple NVIDIA DGX-1 With 16G](#pre-training-on-multiple-nvidia-dgx-1-with-16g)
                * [Fine-tuning NVIDIA DGX-1 With 16G](#fine-tuning-nvidia-dgx-1-with-16g)
        * [Inference performance results](#inference-performance-results)
            * [Inference performance: NVIDIA DGX A100 (1x A100 40GB)](#inference-performance-nvidia-dgx-a100-1x-a100-40gb)
                * [Fine-tuning inference on NVIDIA DGX A100 (1x A100 40GB)](#fine-tuning-inference-on-nvidia-dgx-a100-1x-a100-40gb)
            * [Inference performance: NVIDIA DGX-2 (1x V100 32G)](#inference-performance-nvidia-dgx-2-1x-v100-32g)
                * [Fine-tuning inference on NVIDIA DGX-2 with 32G](#fine-tuning-inference-on-nvidia-dgx-2-with-32g)
            * [Inference performance: NVIDIA DGX-1 (1x V100 32G)](#inference-performance-nvidia-dgx-1-1x-v100-32g)
                * [Fine-tuning inference on NVIDIA DGX-1 with 32G](#fine-tuning-inference-on-nvidia-dgx-1-with-32g)
            * [Inference performance: NVIDIA DGX-1 (1x V100 16G)](#inference-performance-nvidia-dgx-1-1x-v100-16g)
                * [Fine-tuning inference on NVIDIA DGX-1 with 16G](#fine-tuning-inference-on-nvidia-dgx-1-with-16g)
- [Release notes](#release-notes)
    * [Changelog](#changelog)
    * [Known issues](#known-issues)
## Model overview
BERT, or Bidirectional Encoder Representations from Transformers, is a new method of pre-training language representations which obtains state-of-the-art results on a wide array of Natural Language Processing (NLP) tasks. This model is based on the [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) paper. NVIDIA's implementation of BERT is an optimized version of the [Hugging Face implementation](https://github.com/huggingface/pytorch-pretrained-BERT), leveraging mixed precision arithmetic and Tensor Cores on Volta V100 and Ampere A100 GPUs for faster training times while maintaining target accuracy.
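As a brief, hedged illustration of what "pre-training" means here: BERT's masked-LM objective corrupts a fraction of the input tokens and trains the model to recover them. The sketch below follows the 80/10/10 masking rule from the paper; the vocabulary size and `[MASK]` token ID are assumptions matching the standard uncased WordPiece vocabulary, not values taken from this repository's scripts.

```python
import torch

VOCAB_SIZE = 30522  # assumed: standard BERT uncased WordPiece vocabulary size
MASK_ID = 103       # assumed: [MASK] token ID in that vocabulary

def mask_tokens(input_ids: torch.Tensor, mlm_prob: float = 0.15):
    """Corrupt a batch of token IDs for the masked-LM objective.

    Returns (corrupted_inputs, labels); positions not selected for
    prediction get label -100 so cross-entropy ignores them.
    """
    labels = input_ids.clone()
    # Select ~15% of positions to predict.
    selected = torch.bernoulli(torch.full(input_ids.shape, mlm_prob)).bool()
    labels[~selected] = -100

    corrupted = input_ids.clone()
    # 80% of selected positions become [MASK] ...
    to_mask = torch.bernoulli(torch.full(input_ids.shape, 0.8)).bool() & selected
    corrupted[to_mask] = MASK_ID
    # ... 10% become a random token, and the remaining 10% stay unchanged.
    to_random = torch.bernoulli(torch.full(input_ids.shape, 0.5)).bool() & selected & ~to_mask
    corrupted[to_random] = torch.randint(VOCAB_SIZE, input_ids.shape)[to_random]
    return corrupted, labels
```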
This repository contains scripts to interactively launch data download, training, benchmarking and inference routines in a Docker container for both pre-training and fine-tuning for tasks such as question answering. The major differences between the original implementation of the paper and this version of BERT are as follows:
- Scripts to download Wikipedia and BookCorpus datasets
- Scripts to preprocess downloaded data or a custom corpus into inputs and targets for pre-training in a modular fashion
- Fused [LAMB](https://arxiv.org/pdf/1904.00962.pdf) optimizer to support training with larger batches
- Fused Adam optimizer for fine-tuning tasks
- Fused CUDA kernels for LayerNorm for better performance
- Automatic mixed precision (AMP) training support (see the sketch after this list)
- Scripts to launch training on multiple nodes
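To illustrate the AMP bullet above, here is a minimal sketch of mixed precision training with native PyTorch's `torch.cuda.amp`. The toy model and random batch are placeholders, and this repository's own scripts may enable mixed precision differently (for example, through NVIDIA Apex).

```python
import torch

# Placeholder model and optimizer; in this repo these would be BERT and FusedLAMB/Adam.
model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()  # scales the loss to avoid FP16 gradient underflow

for step in range(10):
    x = torch.randn(32, 1024, device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():   # run the forward pass in mixed precision
        loss = model(x).float().pow(2).mean()
    scaler.scale(loss).backward()     # backward on the scaled loss
    scaler.step(optimizer)            # unscales gradients, then calls optimizer.step()
    scaler.update()                   # adjust the loss scale for the next iteration
```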
Other publicly available implementations of BERT include:
1. [NVIDIA TensorFlow](https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/LanguageModeling/BERT)
2. [Hugging Face](https://github.com/huggingface/pytorch-pretrained-BERT)
3. [codertimo](https://github.com/codertimo/BERT-pytorch)
4. [gluon-nlp](https://github.com/dmlc/gluon-nlp/tree/master/scripts/bert)
5. [Google's implementation](https://github.com/google-research/bert)
This model trains with mixed precision using Tensor Cores on Volta and Ampere GPUs and provides a push-button solution to pre-training on a corpus of your choice. As a result, researchers can get results 4x faster than training without Tensor Cores. This model is tested against each NGC monthly container release to ensure consistent accuracy and performance over time.
### Model architecture
The BERT model uses the same architecture as the encoder of the Transformer.
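To make the encoder-stack shape concrete, here is a minimal sketch using stock `torch.nn` modules with the published BERT-Large hyperparameters (24 layers, hidden size 1024, 16 attention heads, feed-forward size 4096, GELU activations). It is an illustration only, not the repository's modeling code, which additionally implements the token/position/segment embeddings and the pre-training heads.

```python
import torch

# One Transformer encoder block with BERT-Large dimensions.
layer = torch.nn.TransformerEncoderLayer(
    d_model=1024,          # hidden size H
    nhead=16,              # attention heads A
    dim_feedforward=4096,  # FFN inner size, 4 * H
    activation="gelu",
)
# BERT-Large stacks 24 such blocks.
encoder = torch.nn.TransformerEncoder(layer, num_layers=24)

embeddings = torch.randn(512, 8, 1024)  # (seq_len, batch, hidden) layout
hidden_states = encoder(embeddings)     # output keeps the input shape
print(hidden_states.shape)              # torch.Size([512, 8, 1024])
```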