# Core ML Stable Diffusion
Run Stable Diffusion on Apple Silicon with Core ML
This repository comprises:
- `python_coreml_stable_diffusion`, a Python package for converting PyTorch models to Core ML format and performing image generation with Hugging Face [diffusers](https://github.com/huggingface/diffusers) in Python
- `StableDiffusion`, a Swift package that developers can add to their Xcode projects as a dependency to deploy image generation capabilities in their apps. The Swift package relies on the Core ML model files generated by `python_coreml_stable_diffusion`
Please refer to the [System Requirements](#system-requirements) section before getting started. If you run into issues during installation or runtime, please refer to the [FAQ](#faq) section.
<img src="assets/readme_reel.png" alt="Example images generated with Core ML Stable Diffusion">
## <a name="system-requirements"></a> System Requirements
<details>
<summary> Details (Click to expand) </summary>

Model Conversion:

macOS | Python | coremltools |
:------:|:------:|:-----------:|
13.1 | 3.8 | 7.0 |

Project Build:

macOS | Xcode | Swift |
:------:|:-----:|:-----:|
13.1 | 14.3 | 5.8 |

Target Device Runtime:

macOS | iPadOS, iOS |
:------:|:-----------:|
13.1 | 16.2 |

Target Device Runtime ([With Memory Improvements](#compression-6-bits-and-higher)):

macOS | iPadOS, iOS |
:------:|:-----------:|
14.0 | 17.0 |

Target Device Hardware Generation:

Mac | iPad | iPhone |
:------:|:-------:|:-------:|
M1 | M1 | A14 |

</details>
## <a name="performance-benchmark"></a> Performance Benchmarks
<details>
<summary> Details (Click to expand) </summary>

[`stabilityai/stable-diffusion-2-1-base`](https://huggingface.co/apple/coreml-stable-diffusion-2-1-base) (512x512)

| Device | `--compute-unit`| `--attention-implementation` | End-to-End Latency (s) | Diffusion Speed (iter/s) |
| --------------------- | --------------- | ---------------------------- | ---------------------- | ------------------------ |
| iPhone 12 Mini | `CPU_AND_NE` | `SPLIT_EINSUM_V2` | 18.5* | 1.44 |
| iPhone 12 Pro Max | `CPU_AND_NE` | `SPLIT_EINSUM_V2` | 15.4 | 1.45 |
| iPhone 13 | `CPU_AND_NE` | `SPLIT_EINSUM_V2` | 10.8* | 2.53 |
| iPhone 13 Pro Max | `CPU_AND_NE` | `SPLIT_EINSUM_V2` | 10.4 | 2.55 |
| iPhone 14 | `CPU_AND_NE` | `SPLIT_EINSUM_V2` | 8.6 | 2.57 |
| iPhone 14 Pro Max | `CPU_AND_NE` | `SPLIT_EINSUM_V2` | 7.9 | 2.69 |
| iPad Pro (M1) | `CPU_AND_NE` | `SPLIT_EINSUM_V2` | 11.2 | 2.19 |
| iPad Pro (M2) | `CPU_AND_NE` | `SPLIT_EINSUM_V2` | 7.0 | 3.07 |
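
As a quick, illustrative consistency check on the table above (not part of the benchmark methodology): with 20 inference steps, the diffusion stage alone takes roughly `steps / speed` seconds, and the difference from the end-to-end latency is the text-encoder, VAE-decoder, and pipeline overhead. A minimal Python sketch:

```python
# Rough sanity check: diffusion time = steps / (iterations per second).
# The remainder of the end-to-end latency is non-diffusion overhead
# (text encoding, VAE decoding, Swift-side bookkeeping, etc.).

STEPS = 20  # inference steps used in this benchmark

def diffusion_seconds(iters_per_second: float, steps: int = STEPS) -> float:
    """Time spent in the denoising loop alone."""
    return steps / iters_per_second

def overhead_seconds(end_to_end_s: float, iters_per_second: float) -> float:
    """End-to-end latency minus the denoising loop."""
    return end_to_end_s - diffusion_seconds(iters_per_second)

# iPhone 14: 8.6 s end-to-end at 2.57 iter/s
print(round(diffusion_seconds(2.57), 2))      # ~7.78 s in diffusion
print(round(overhead_seconds(8.6, 2.57), 2))  # ~0.82 s of overhead
```
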
<details>
<summary> Details (Click to expand) </summary>

- This benchmark was conducted by Apple and Hugging Face using public beta versions of iOS 17.0, iPadOS 17.0 and macOS 14.0 Seed 8 in August 2023.
- The performance data was collected using the `benchmark` branch of the [Diffusers app](https://github.com/huggingface/swift-coreml-diffusers).
- Swift code is not fully optimized, introducing up to ~10% overhead unrelated to Core ML model execution.
- The median latency across 5 back-to-back end-to-end executions is reported.
- The image generation procedure follows the standard configuration: 20 inference steps, 512x512 output image resolution, 77 text token sequence length, classifier-free guidance (batch size of 2 for the unet).
- The actual prompt length does not impact performance: the Core ML model is converted with a static shape and always computes the forward pass over all 77 token positions (`tokenizer.model_max_length`), regardless of the length of the input text.
- Weights are compressed to 6 bit precision. Please refer to [this section](#compression-6-bits-and-higher) for details.
- Activations are in float16 precision for both the GPU and the Neural Engine.
- `*` indicates that the [reduceMemory](https://github.com/apple/ml-stable-diffusion/blob/main/swift/StableDiffusion/pipeline/StableDiffusionPipeline.swift#L91) option was enabled which loads and unloads models just-in-time to avoid memory shortage. This added up to 2 seconds to the end-to-end latency.
- In the benchmark table, we report the best performing `--compute-unit` and `--attention-implementation` values per device. The former does not modify the Core ML model and can be applied during runtime. The latter modifies the Core ML model. Note that the best performing compute unit is model version and hardware-specific.
- Note that the performance optimizations in this repository (e.g. `--attention-implementation`) are generally applicable to Transformers and not customized to Stable Diffusion. Better performance may be observed upon custom kernel tuning. Therefore, these numbers do not represent **peak** HW capability.
- Performance may vary across different versions of Stable Diffusion due to architecture changes in the model itself. Each reported number is specific to the model version mentioned in that context.
- Performance may vary due to factors like increased system load from other applications or suboptimal device thermal state.
</details>
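
Two of the configuration details above can be made concrete with a short sketch (illustrative Python, not code from this repository; the pad token id of 0 and guidance scale of 7.5 are assumed defaults): token sequences are always padded to the static length of 77, and classifier-free guidance runs the unet on a batch of 2 (unconditional and conditional) and blends the two noise predictions.

```python
MAX_LENGTH = 77  # tokenizer.model_max_length for the CLIP text encoder

def pad_token_ids(ids, pad_id=0, max_length=MAX_LENGTH):
    """Pad (or truncate) token ids to the static sequence length the
    converted Core ML model expects; compute cost is therefore constant."""
    return (list(ids) + [pad_id] * max_length)[:max_length]

def cfg_blend(noise_uncond, noise_cond, guidance_scale=7.5):
    """Classifier-free guidance: blend the unconditional and conditional
    noise predictions produced by the batch-of-2 unet forward pass."""
    return [u + guidance_scale * (c - u)
            for u, c in zip(noise_uncond, noise_cond)]

padded = pad_token_ids([49406, 320, 1125, 49407])  # hypothetical prompt ids
print(len(padded))                         # always 77, regardless of prompt
print(cfg_blend([0.0, 1.0], [1.0, 1.0]))   # [7.5, 1.0]
```
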
[`stabilityai/stable-diffusion-xl-base-1.0-ios`](https://huggingface.co/apple/coreml-stable-diffusion-xl-base-ios) (768x768)

| Device | `--compute-unit`| `--attention-implementation` | End-to-End Latency (s) | Diffusion Speed (iter/s) |
| --------------------- | --------------- | ---------------------------- | ---------------------- | ------------------------ |
| iPhone 12 Pro | `CPU_AND_NE` | `SPLIT_EINSUM` | 116* | 0.50 |
| iPhone 13 Pro Max | `CPU_AND_NE` | `SPLIT_EINSUM` | 86* | 0.68 |
| iPhone 14 Pro Max | `CPU_AND_NE` | `SPLIT_EINSUM` | 77* | 0.83 |
| iPhone 15 Pro Max | `CPU_AND_NE` | `SPLIT_EINSUM` | 31 | 0.85 |
| iPad Pro (M1) | `CPU_AND_NE` | `SPLIT_EINSUM` | 36 | 0.69 |
| iPad Pro (M2) | `CPU_AND_NE` | `SPLIT_EINSUM` | 27 | 0.98 |
<details>
<summary> Details (Click to expand) </summary>

- This benchmark was conducted by Apple and Hugging Face using iOS 17.0.2 and iPadOS 17.0.2 in September 2023.
- The performance data was collected using the `benchmark` branch of the [Diffusers app](https://github.com/huggingface/swift-coreml-diffusers).
- The median latency across 5 back-to-back end-to-end executions is reported.
- The image generation procedure follows this configuration: 20 inference steps, 768x768 output image resolution, 77 text token sequence length, classifier-free guidance (batch size of 2 for the unet).
- `Unet.mlmodelc` is compressed to 4.04 bit precision following the [Mixed-Bit Palettization](#compression-lower-than-6-bits) algorithm recipe published [here](https://huggingface.co/apple/coreml-stable-diffusion-mixed-bit-palettization/blob/main/recipes/stabilityai-stable-diffusion-xl-base-1.0_palettization_recipe.json)
- All models except for `Unet.mlmodelc` are compressed to 16 bit precision
- [madebyollin/sdxl-vae-fp16-fix](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix) by [@madebyollin](https://github.com/madebyollin) was used as the source PyTorch model for `VAEDecoder.mlmodelc` in order to enable float16 weight and activation quantization for the VAE model.
- `--attention-implementation SPLIT_EINSUM` is reported because it was the best performing option per device, following the convention of the previous benchmark. As in the previous table, `*` indicates that the [reduceMemory](https://github.com/apple/ml-stable-diffusion/blob/main/swift/StableDiffusion/pipeline/StableDiffusionPipeline.swift#L91) option was enabled to avoid memory shortage.
</details>
</details>