Copyright © 2013 ARM. All rights reserved.
ARM DEN0018A (ID071613)
NEON™
Programmer’s Guide
Version: 1.0
ARM DEN0018A Copyright © 2013 ARM. All rights reserved. ii
ID071613 Non-Confidential
Release Information
The following changes have been made to this book.
Proprietary Notice
This document is protected by copyright and other related rights and the practice or implementation of the information
contained in this document may be protected by one or more patents or pending patent applications. No part of this
document may be reproduced in any form by any means without the express prior written permission of ARM. No
license, express or implied, by estoppel or otherwise to any intellectual property rights is granted by this document
unless specifically stated.
Your access to the information in this document is conditional upon your acceptance that you will not use or permit
others to use the information for the purposes of determining whether implementations infringe any third party patents.
THIS DOCUMENT IS PROVIDED “AS IS”. ARM PROVIDES NO REPRESENTATIONS AND NO
WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
WARRANTIES OF MERCHANTABILITY, SATISFACTORY QUALITY, NON-INFRINGEMENT OR FITNESS
FOR A PARTICULAR PURPOSE WITH RESPECT TO THE DOCUMENT. For the avoidance of doubt, ARM makes
no representation with respect to, and has undertaken no analysis to identify or understand the scope and content of,
third party patents, copyrights, trade secrets, or other rights.
TO THE EXTENT NOT PROHIBITED BY LAW, IN NO EVENT WILL ARM BE LIABLE FOR ANY DAMAGES,
INCLUDING WITHOUT LIMITATION ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, PUNITIVE, OR
CONSEQUENTIAL DAMAGES, HOWEVER CAUSED AND REGARDLESS OF THE THEORY OF LIABILITY,
ARISING OUT OF ANY USE OF THIS DOCUMENT, EVEN IF ARM HAS BEEN ADVISED OF THE POSSIBILITY
OF SUCH DAMAGES.
This document may include technical inaccuracies or typographical errors.
This document consists solely of commercial items. You shall be responsible for ensuring that any use, duplication or
disclosure of this document complies fully with any relevant export laws and regulations to assure that this document
or any portion thereof is not exported, directly or indirectly, in violation of such export laws. Use of the word “partner”
is not intended to create or refer to any partnership relationship with any other company. ARM may make changes to
this document at any time and without notice.
If any of the provisions contained in these terms conflict with any of the provisions of any signed written agreement
covering this document with ARM, then the signed written agreement prevails over and supersedes the conflicting
provisions of these terms.
Words and logos marked with ™ or ® are registered trademarks or trademarks of ARM Limited or its affiliates in the EU
and/or elsewhere. All rights reserved. Other brands and names mentioned in this document may be the trademarks of
their respective owners. Please follow ARM’s trademark usage guidelines at
http://www.arm.com/about/trademark-usage-guidelines.php.
Copyright © 2013, ARM Limited or its affiliates. All rights reserved.
ARM Limited. Company 02557590 registered in England.
110 Fulbourn Road, Cambridge, England CB1 9NJ.
LES-PRE-20318 v0.1
Web Address
http://www.arm.com
Change history
Date            Issue   Confidentiality    Change
28 June 2013    A       Non-Confidential   First release
Contents
NEON Programmer’s Guide
Preface
References ................................................................................................................ vii
Typographical conventions ....................................................................................... viii
Feedback on this book ................................................................................................ ix
Glossary ....................................................................................................................... x
Chapter 1 Introduction
1.1 Data processing technologies .................................................................................. 1-2
1.2 Comparison between ARM NEON technology and other implementations ............. 1-4
1.3 Architecture support for NEON technology .............................................................. 1-7
1.4 Fundamentals of NEON technology ...................................................................... 1-10
Chapter 2 Compiling NEON Instructions
2.1 Vectorization ............................................................................................................ 2-2
2.2 Generating NEON code using the vectorizing compiler .......................................... 2-9
2.3 Vectorizing examples ............................................................................................. 2-11
2.4 NEON assembler and ABI restrictions ................................................................... 2-17
2.5 NEON libraries ....................................................................................................... 2-19
2.6 Intrinsics ................................................................................................................. 2-20
2.7 Detecting presence of a NEON unit ....................................................................... 2-21
2.8 Writing code to imply SIMD ................................................................................... 2-22
2.9 GCC command line options ................................................................................... 2-24
Chapter 3 NEON Instruction Set Architecture
3.1 Introduction to the NEON instruction syntax ............................................................ 3-2
3.2 Instruction syntax ..................................................................................................... 3-4
3.3 Specifying data types ............................................................................................... 3-8
3.4 Packing and unpacking data .................................................................................... 3-9
3.5 Alignment ............................................................................................................... 3-10
3.6 Saturation arithmetic .............................................................................................. 3-11
3.7 Floating-point operations ....................................................................................... 3-12
3.8 Flush-to-zero mode ................................................................................................ 3-13
3.9 Shift operations ...................................................................................................... 3-14
3.10 Polynomials ........................................................................................................... 3-17
3.11 Instructions to permute vectors .............................................................................. 3-19
Chapter 4 NEON Intrinsics
4.1 Introduction .............................................................................................................. 4-2
4.2 Vector data types for NEON intrinsics ..................................................................... 4-3
4.3 Prototype of NEON Intrinsics ................................................................................... 4-5
4.4 Using NEON intrinsics ............................................................................................. 4-6
4.5 Variables and constants in NEON code .................................................................. 4-8
4.6 Accessing vector types from C ................................................................................ 4-9
4.7 Loading data from memory into vectors ................................................................ 4-10
4.8 Constructing a vector from a literal bit pattern ....................................................... 4-11
4.9 Constructing multiple vectors from interleaved memory ........................................ 4-12
4.10 Loading a single lane of a vector from memory ..................................................... 4-13
4.11 Programming using NEON intrinsics ..................................................................... 4-14
4.12 Instructions without an equivalent intrinsic ............................................................ 4-16
Chapter 5 Optimizing NEON Code
5.1 Optimizing NEON assembler code .......................................................................... 5-2
5.2 Scheduling ............................................................................................................... 5-4
Chapter 6 NEON Code Examples with Intrinsics
6.1 Swapping color channels ......................................................................................... 6-2
6.2 Handling non-multiple array lengths ........................................................................ 6-8
Chapter 7 NEON Code Examples with Mixed Operations
7.1 Matrix multiplication ................................................................................................. 7-2
7.2 Cross product .......................................................................................................... 7-6
Chapter 8 NEON Code Examples with Optimization
8.1 Converting color depth ............................................................................................. 8-2
8.2 Median filter ............................................................................................................. 8-5
8.3 FIR filter ................................................................................................................. 8-21
Appendix A NEON Microarchitecture
A.1 The Cortex-A5 processor ......................................................................................... A-2
A.2 The Cortex-A7 processor ......................................................................................... A-4
A.3 The Cortex-A8 processor ......................................................................................... A-5
A.4 The Cortex-A9 processor ......................................................................................... A-9
A.5 The Cortex-A15 processor ..................................................................................... A-11
Appendix B Operating System Support
B.1 FPSCR, the floating-point status and control register .............................................. B-2
B.2 FPEXC, the floating-point exception register ........................................................... B-4
B.3 FPSID, the floating-point system ID register ............................................................ B-5
B.4 MVFR0/1 Media and VFP Feature Registers .......................................................... B-6
Appendix C NEON and VFP Instruction Summary
C.1 List of all NEON and VFP instructions ..................................................................... C-2
C.2 List of doubling instructions ..................................................................................... C-7
C.3 List of halving instructions ........................................................................................ C-8
C.4 List of widening or long instructions ......................................................................... C-9
C.5 List of narrowing instructions ................................................................................. C-10
C.6 List of rounding instructions ................................................................................... C-11
C.7 List of saturating instructions ................................................................................. C-12
C.8 NEON general data processing instructions .......................................................... C-14
C.9 NEON shift instructions .......................................................................................... C-25
C.10 NEON logical and compare operations ................................................................. C-31
C.11 NEON arithmetic instructions ................................................................................. C-41
C.12 NEON multiply instructions .................................................................................... C-55
C.13 NEON load and store instructions ......................................................................... C-60
C.14 VFP instructions ..................................................................................................... C-67
C.15 NEON and VFP pseudo-instructions ..................................................................... C-73
Appendix D NEON Intrinsics Reference
D.1 NEON intrinsics description ..................................................................................... D-2
D.2 Intrinsics type conversion ........................................................................................ D-3
D.3 Arithmetic ................................................................................................................. D-8
D.4 Multiply ................................................................................................................... D-24
D.5 Data processing ..................................................................................................... D-50
D.6 Logical and compare ............................................................................................. D-74
D.7 Shift ........................................................................................................................ D-93
D.8 Floating-point ....................................................................................................... D-114
D.9 Load and store ..................................................................................................... D-120
D.10 Permutation ......................................................................................................... D-151
D.11 Miscellaneous ...................................................................................................... D-166