# Core ML Stable Diffusion
Run Stable Diffusion on Apple Silicon with Core ML
This repository comprises:
- `python_coreml_stable_diffusion`, a Python package for converting PyTorch models to Core ML format and performing image generation with Hugging Face [diffusers](https://github.com/huggingface/diffusers) in Python
- `StableDiffusion`, a Swift package that developers can add to their Xcode projects as a dependency to deploy image generation capabilities in their apps. The Swift package relies on the Core ML model files generated by `python_coreml_stable_diffusion`
Please refer to the [System Requirements](#system-requirements) section before getting started. If you run into issues during installation or runtime, please refer to the [FAQ](#faq) section.
<img src="assets/readme_reel.png" alt="Example images generated with Core ML Stable Diffusion">
## <a name="system-requirements"></a> System Requirements
<details>
<summary> Details (Click to expand) </summary>

Model Conversion:

macOS | Python | coremltools |
:------:|:------:|:-----------:|
13.1 | 3.8 | 7.0 |

Project Build:

macOS | Xcode | Swift |
:------:|:-----:|:-----:|
13.1 | 14.3 | 5.8 |

Target Device Runtime:

macOS | iPadOS, iOS |
:------:|:-----------:|
13.1 | 16.2 |

Target Device Runtime ([With Memory Improvements](#compression-6-bits-and-higher)):

macOS | iPadOS, iOS |
:------:|:-----------:|
14.0 | 17.0 |

Target Device Hardware Generation:

Mac | iPad | iPhone |
:------:|:-------:|:-------:|
M1 | M1 | A14 |

</details>
## <a name="performance-benchmark"></a> Performance Benchmarks
<details>
<summary> Details (Click to expand) </summary>

[`stabilityai/stable-diffusion-2-1-base`](https://huggingface.co/apple/coreml-stable-diffusion-2-1-base) (512x512)

| Device | `--compute-unit`| `--attention-implementation` | End-to-End Latency (s) | Diffusion Speed (iter/s) |
| --------------------- | --------------- | ---------------------------- | ---------------------- | ------------------------ |
| iPhone 12 Mini | `CPU_AND_NE` | `SPLIT_EINSUM_V2` | 18.5* | 1.44 |
| iPhone 12 Pro Max | `CPU_AND_NE` | `SPLIT_EINSUM_V2` | 15.4 | 1.45 |
| iPhone 13 | `CPU_AND_NE` | `SPLIT_EINSUM_V2` | 10.8* | 2.53 |
| iPhone 13 Pro Max | `CPU_AND_NE` | `SPLIT_EINSUM_V2` | 10.4 | 2.55 |
| iPhone 14 | `CPU_AND_NE` | `SPLIT_EINSUM_V2` | 8.6 | 2.57 |
| iPhone 14 Pro Max | `CPU_AND_NE` | `SPLIT_EINSUM_V2` | 7.9 | 2.69 |
| iPad Pro (M1) | `CPU_AND_NE` | `SPLIT_EINSUM_V2` | 11.2 | 2.19 |
| iPad Pro (M2) | `CPU_AND_NE` | `SPLIT_EINSUM_V2` | 7.0 | 3.07 |
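
As a quick, illustrative consistency check on the table above (not part of the benchmark methodology): with 20 inference steps, the diffusion stage alone takes roughly `steps / speed` seconds, and the difference from the end-to-end latency is the text-encoder, VAE-decoder, and pipeline overhead. A minimal Python sketch:

```python
# Rough sanity check: diffusion time = steps / (iterations per second).
# The remainder of the end-to-end latency is non-diffusion overhead
# (text encoding, VAE decoding, Swift-side bookkeeping, etc.).

STEPS = 20  # inference steps used in this benchmark

def diffusion_seconds(iters_per_second: float, steps: int = STEPS) -> float:
    """Time spent in the denoising loop alone."""
    return steps / iters_per_second

def overhead_seconds(end_to_end_s: float, iters_per_second: float) -> float:
    """End-to-end latency minus the denoising loop."""
    return end_to_end_s - diffusion_seconds(iters_per_second)

# iPhone 14: 8.6 s end-to-end at 2.57 iter/s
print(round(diffusion_seconds(2.57), 2))      # ~7.78 s in diffusion
print(round(overhead_seconds(8.6, 2.57), 2))  # ~0.82 s of overhead
```
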
<details>
<summary> Details (Click to expand) </summary>

- This benchmark was conducted by Apple and Hugging Face using public beta versions of iOS 17.0, iPadOS 17.0 and macOS 14.0 Seed 8 in August 2023.
- The performance data was collected using the `benchmark` branch of the [Diffusers app](https://github.com/huggingface/swift-coreml-diffusers).
- Swift code is not fully optimized, introducing up to ~10% overhead unrelated to Core ML model execution.
- The median latency across 5 back-to-back end-to-end executions is reported.
- The image generation procedure follows the standard configuration: 20 inference steps, 512x512 output image resolution, 77 text token sequence length, classifier-free guidance (batch size of 2 for the unet).
- The actual prompt length does not impact performance: the Core ML model is converted with a static shape and always computes the forward pass over all 77 token positions (`tokenizer.model_max_length`), regardless of the length of the input text.
- Weights are compressed to 6 bit precision. Please refer to [this section](#compression-6-bits-and-higher) for details.
- Activations are in float16 precision for both the GPU and the Neural Engine.
- `*` indicates that the [reduceMemory](https://github.com/apple/ml-stable-diffusion/blob/main/swift/StableDiffusion/pipeline/StableDiffusionPipeline.swift#L91) option was enabled which loads and unloads models just-in-time to avoid memory shortage. This added up to 2 seconds to the end-to-end latency.
- In the benchmark table, we report the best performing `--compute-unit` and `--attention-implementation` values per device. The former does not modify the Core ML model and can be applied during runtime. The latter modifies the Core ML model. Note that the best performing compute unit is model version and hardware-specific.
- Note that the performance optimizations in this repository (e.g. `--attention-implementation`) are generally applicable to Transformers and not customized to Stable Diffusion. Better performance may be observed upon custom kernel tuning. Therefore, these numbers do not represent **peak** HW capability.
- Performance may vary across different versions of Stable Diffusion due to architecture changes in the model itself. Each reported number is specific to the model version mentioned in that context.
- Performance may vary due to factors like increased system load from other applications or suboptimal device thermal state.
</details>
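
Two of the configuration details above can be made concrete with a short sketch (illustrative Python, not code from this repository; the pad token id of 0 and guidance scale of 7.5 are assumed defaults): token sequences are always padded to the static length of 77, and classifier-free guidance runs the unet on a batch of 2 (unconditional and conditional) and blends the two noise predictions.

```python
MAX_LENGTH = 77  # tokenizer.model_max_length for the CLIP text encoder

def pad_token_ids(ids, pad_id=0, max_length=MAX_LENGTH):
    """Pad (or truncate) token ids to the static sequence length the
    converted Core ML model expects; compute cost is therefore constant."""
    return (list(ids) + [pad_id] * max_length)[:max_length]

def cfg_blend(noise_uncond, noise_cond, guidance_scale=7.5):
    """Classifier-free guidance: blend the unconditional and conditional
    noise predictions produced by the batch-of-2 unet forward pass."""
    return [u + guidance_scale * (c - u)
            for u, c in zip(noise_uncond, noise_cond)]

padded = pad_token_ids([49406, 320, 1125, 49407])  # hypothetical prompt ids
print(len(padded))                         # always 77, regardless of prompt
print(cfg_blend([0.0, 1.0], [1.0, 1.0]))   # [7.5, 1.0]
```
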
[`stabilityai/stable-diffusion-xl-base-1.0-ios`](https://huggingface.co/apple/coreml-stable-diffusion-xl-base-ios) (768x768)

| Device | `--compute-unit`| `--attention-implementation` | End-to-End Latency (s) | Diffusion Speed (iter/s) |
| --------------------- | --------------- | ---------------------------- | ---------------------- | ------------------------ |
| iPhone 12 Pro | `CPU_AND_NE` | `SPLIT_EINSUM` | 116* | 0.50 |
| iPhone 13 Pro Max | `CPU_AND_NE` | `SPLIT_EINSUM` | 86* | 0.68 |
| iPhone 14 Pro Max | `CPU_AND_NE` | `SPLIT_EINSUM` | 77* | 0.83 |
| iPhone 15 Pro Max | `CPU_AND_NE` | `SPLIT_EINSUM` | 31 | 0.85 |
| iPad Pro (M1) | `CPU_AND_NE` | `SPLIT_EINSUM` | 36 | 0.69 |
| iPad Pro (M2) | `CPU_AND_NE` | `SPLIT_EINSUM` | 27 | 0.98 |
<details>
<summary> Details (Click to expand) </summary>

- This benchmark was conducted by Apple and Hugging Face using iOS 17.0.2 and iPadOS 17.0.2 in September 2023.
- The performance data was collected using the `benchmark` branch of the [Diffusers app](https://github.com/huggingface/swift-coreml-diffusers).
- The median latency across 5 back-to-back end-to-end executions is reported.
- The image generation procedure follows this configuration: 20 inference steps, 768x768 output image resolution, 77 text token sequence length, classifier-free guidance (batch size of 2 for the unet).
- `Unet.mlmodelc` is compressed to 4.04 bit precision following the [Mixed-Bit Palettization](#compression-lower-than-6-bits) algorithm recipe published [here](https://huggingface.co/apple/coreml-stable-diffusion-mixed-bit-palettization/blob/main/recipes/stabilityai-stable-diffusion-xl-base-1.0_palettization_recipe.json)
- All models except for `Unet.mlmodelc` are compressed to 16 bit precision
- [madebyollin/sdxl-vae-fp16-fix](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix) by [@madebyollin](https://github.com/madebyollin) was used as the source PyTorch model for `VAEDecoder.mlmodelc` in order to enable float16 weight and activation quantization for the VAE model.
- `--attention-implementation SPLIT_EINSUM` is reported because it was the best performing option per device, following the convention of the previous benchmark. As in the previous table, `*` indicates that the [reduceMemory](https://github.com/apple/ml-stable-diffusion/blob/main/swift/StableDiffusion/pipeline/StableDiffusionPipeline.swift#L91) option was enabled to avoid memory shortage.
</details>
</details>