

NEON support in the RealView compiler

William Munns

18 June 2007

Introduction

This paper provides a simple introduction to the NEON™ Vector-SIMD architecture. It continues by looking at the compiler support for SIMD, both through automatic recognition and through the use of intrinsic functions.

NEON is a hybrid 64/128 bit SIMD architecture extension to the ARM v7-A profile, targeted at multimedia applications. Positioning NEON within the processor allows it to share the CPU resources for integer operation, loop control, and caching, significantly reducing the area and power cost compared with a CPU plus hardware accelerator combination. SIMD (Single Instruction Multiple Data) is where one instruction acts on multiple data items, usually carrying out the same operation for all data.
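As a minimal sketch of the SIMD idea, the function below adds eight 8 bit values with a single NEON add where NEON is available, and with the equivalent scalar loop elsewhere. The function name add8_u8 and the scalar fallback are illustrative assumptions and are not taken from the RealView tools.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: dst[i] = a[i] + b[i] for eight unsigned bytes.
 * Each lane wraps modulo 256, exactly like the scalar uint8_t addition. */
void add8_u8(uint8_t *dst, const uint8_t *a, const uint8_t *b)
{
    assert(dst != NULL && a != NULL && b != NULL);
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* SIMD path: one load per operand, one add covering all eight lanes. */
    uint8x8_t va = vld1_u8(a);
    uint8x8_t vb = vld1_u8(b);
    vst1_u8(dst, vadd_u8(va, vb));
#else
    /* Scalar path: the same work as eight separate additions. */
    for (int i = 0; i < 8; ++i)
        dst[i] = (uint8_t)(a[i] + b[i]);
#endif
}
```

Both paths produce the same eight results; the difference is only in how many instructions are needed to produce them.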

Using NEON instead of a CPU plus hardware accelerator combination also saves software development time: it gives a much simpler programming model, and the programmer is not forced to search for ad-hoc concurrency and scheduling points.

On the ARM Cortex™-A8 the NEON unit is positioned in the pipeline so that loads can come directly from the L2 cache. This means that a much larger dataset can be held in the cache than would be allowed when executing ARM or Thumb®-2 code.

The NEON instruction set was designed to be an easy target for a compiler, including low cost promotion/demotion and structure loads capable of accessing data from their natural locations rather than forcing alignment to the vector size.
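The two features named above can be sketched with intrinsics from arm_neon.h: a structure load (vld3_u8) that de-interleaves packed R, G, B bytes straight from their natural layout, and widening adds (vaddl_u8, vaddw_u8) that promote 8 bit lanes to 16 bits as part of the addition. The function name rgb_sum8, the pixel layout, and the scalar fallback are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: for 8 packed RGB pixels (24 bytes, R G B R G B ...),
 * write out[i] = R[i] + G[i] + B[i]. The sum can reach 765, so it needs 16 bits. */
void rgb_sum8(uint16_t out[8], const uint8_t rgb[24])
{
    assert(out != NULL && rgb != NULL);
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Structure load: val[0] = all R, val[1] = all G, val[2] = all B. */
    uint8x8x3_t px = vld3_u8(rgb);
    /* Promotion: 8 bit + 8 bit -> 16 bit lanes in one widening add. */
    uint16x8_t sum = vaddl_u8(px.val[0], px.val[1]);
    /* 16 bit + widened 8 bit. */
    sum = vaddw_u8(sum, px.val[2]);
    vst1q_u16(out, sum);
#else
    /* Scalar fallback with the same result. */
    for (int i = 0; i < 8; ++i)
        out[i] = (uint16_t)(rgb[3 * i] + rgb[3 * i + 1] + rgb[3 * i + 2]);
#endif
}
```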

The RealView Development Tools® Suite version 3.1 supports NEON both in the standard release, using intrinsic functions and assembler, and through the vectorizing compiler add-on, which can recognise code sequences and automatically generate SIMD code. The vectorizing compiler greatly reduces porting time, as well as reducing the requirement for deep architectural knowledge.
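A code sequence of the kind a vectorizing compiler can be expected to recognise is sketched below: a plain C loop with no intrinsics, independent iterations, and pointers declared as non-overlapping. Whether a particular compiler release actually turns this loop into SIMD code is an assumption, as are the function name add_arrays and the use of the C99 restrict qualifier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example loop: dst[i] = a[i] + b[i] for n 16 bit elements.
 * No iteration depends on another, and restrict promises the compiler
 * that the three arrays do not overlap, so several elements could be
 * processed by one SIMD instruction. */
void add_arrays(int16_t *restrict dst,
                const int16_t *restrict a,
                const int16_t *restrict b,
                size_t n)
{
    assert(dst != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n; ++i)
        dst[i] = (int16_t)(a[i] + b[i]);
}
```

The source stays portable C, which is where the reduction in porting time comes from: the same function still compiles and runs on a target without NEON.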

© 2007 ARM Limited. All Rights Reserved. ARM and RealView logo are registered trademarks of ARM Ltd. All other trademarks are the property of their respective owners and are acknowledged.


Writing NEON code using the standard RealView compiler

The standard tools shipped with RealView Development Suite 3.1 have support for NEON directly in the assembler and embedded assembler. The compiler also provides NEON support using pseudo-functions called intrinsics. Intrinsic functions compile into one or more NEON instructions which are inserted at the call site. There is at least one intrinsic for each NEON instruction, with multiple intrinsic functions where needed for signed and unsigned types.
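Saturating addition is one case where separate signed and unsigned intrinsics are needed, because the same bit patterns give different answers. The sketch below uses vqadd_u8 and vqadd_s8 from arm_neon.h; the wrapper names sat_add_u8 and sat_add_s8, the demo function, and the scalar fallback are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical wrapper: unsigned saturating add, clamps to 0..255. */
uint8_t sat_add_u8(uint8_t a, uint8_t b)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    return vget_lane_u8(vqadd_u8(vdup_n_u8(a), vdup_n_u8(b)), 0);
#else
    unsigned s = (unsigned)a + (unsigned)b;
    return (uint8_t)(s > 255u ? 255u : s);
#endif
}

/* Hypothetical wrapper: signed saturating add, clamps to -128..127. */
int8_t sat_add_s8(int8_t a, int8_t b)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    return vget_lane_s8(vqadd_s8(vdup_n_s8(a), vdup_n_s8(b)), 0);
#else
    int s = (int)a + (int)b;
    if (s > 127)
        s = 127;
    if (s < -128)
        s = -128;
    return (int8_t)s;
#endif
}

/* Same input bits (0x7F and 0x01), different results per type. */
void signed_unsigned_demo(void)
{
    assert(sat_add_u8(0x7F, 0x01) == 0x80); /* 127 + 1 = 128 fits in uint8_t */
    assert(sat_add_s8(0x7F, 0x01) == 0x7F); /* 127 + 1 saturates at 127 */
}
```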

Using intrinsics, rather than programming in assembly language directly, allows the compiler to schedule registers, as well as giving the programmer easy access to C variables and arrays.

Using vector registers directly from assembler can lead to programming errors, such as a 64 bit vector containing 8 bit data being operated upon by a 16 bit adder. These kinds of faults can be very difficult to track down, as only particular corner cases trigger an erroneous result. In the addition example above, the output differs only if one of the data items overflows into another. Intrinsics are type-safe and do not allow accidental mixing of signed and unsigned data, or of data of differing widths.
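The corner case can be reproduced in portable C by modelling a 64 bit register as a uint64_t and adding it once as eight 8 bit lanes and once as four 16 bit lanes. This is a simulation written for illustration, not NEON code; all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Add two 64 bit values as eight independent 8 bit lanes. */
uint64_t add_lanes8(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; ++i) {
        uint8_t x = (uint8_t)(a >> (8 * i));
        uint8_t y = (uint8_t)(b >> (8 * i));
        r |= (uint64_t)(uint8_t)(x + y) << (8 * i);
    }
    return r;
}

/* Add the same 64 bit values as four independent 16 bit lanes. */
uint64_t add_lanes16(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 4; ++i) {
        uint16_t x = (uint16_t)(a >> (16 * i));
        uint16_t y = (uint16_t)(b >> (16 * i));
        r |= (uint64_t)(uint16_t)(x + y) << (16 * i);
    }
    return r;
}

void lane_width_demo(void)
{
    /* Small values: no lane overflows, so the wrong adder goes unnoticed. */
    assert(add_lanes8(0x0101010101010101u, 0x0202020202020202u) ==
           add_lanes16(0x0101010101010101u, 0x0202020202020202u));

    /* 0xFF + 0x01 in lane 0: the 8 bit add wraps to 0x00, while the
     * 16 bit add carries into the neighbouring byte. */
    assert(add_lanes8(0xFFu, 0x01u) == 0x0000u);
    assert(add_lanes16(0xFFu, 0x01u) == 0x0100u);
}
```

Only the second input pair exposes the fault, which is why such errors tend to survive ordinary testing.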

Accessing vector types from C

The header file arm_neon.h is required to use the intrinsics and defines C style types for vector operations. The C types are written in the form:

uint8x16_t    Unsigned integers, 8 bits, vector of 16 items - 128 bit "Q" register

int16x4_t     Signed integers, 16 bits, vector of 4 items - 64 bit "D" register

As there is a basic incompatibility between scalar (ARM) and vector (NEON) types it is impossible to assign a scalar to a vector, even if they have the same bit length. Scalar values and pointers can only be used with NEON instructions that use scalars directly.

Example: Extract an unsigned 32 bit integer from lane 0 of a NEON vector

result = vget_lane_u32(vec64a, 0);
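A slightly fuller sketch of moving values between scalar and vector types is given below. It uses vld1_u32 and vdup_n_u32 to get data into a vector and vget_lane_u32 to get a scalar back out. The function names and the scalar fallback are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: load two 32 bit values and return lane 0. */
uint32_t first_lane(const uint32_t src[2])
{
    assert(src != NULL);
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    uint32x2_t vec64a = vld1_u32(src);  /* pointer used by a load intrinsic */
    return vget_lane_u32(vec64a, 0);    /* lane index is a constant */
#else
    return src[0];
#endif
}

/* Hypothetical helper: copy a scalar into every lane, then read lane 1.
 * A direct assignment of the scalar to a vector variable is not possible;
 * the duplicate intrinsic is one of the operations that takes a scalar. */
uint32_t second_lane_of_splat(uint32_t scalar)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    uint32x2_t vec64a = vdup_n_u32(scalar);
    return vget_lane_u32(vec64a, 1);
#else
    return scalar;
#endif
}
```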

Vector types are not operable using standard C operators except for assignment, so the appropriate VADD intrinsic should be used rather than the “+” operator.
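As a sketch of how a C expression maps onto intrinsics, the function below computes a + b * c per lane with vmul_u32 and vadd_u32 in place of the operators. The function name mul_add2_u32 and the scalar fallback are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: dst[i] = a[i] + b[i] * c[i] for two 32 bit lanes. */
void mul_add2_u32(uint32_t dst[2], const uint32_t a[2],
                  const uint32_t b[2], const uint32_t c[2])
{
    assert(dst != NULL && a != NULL && b != NULL && c != NULL);
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    uint32x2_t va = vld1_u32(a);
    uint32x2_t vb = vld1_u32(b);
    uint32x2_t vc = vld1_u32(c);
    uint32x2_t prod = vmul_u32(vb, vc);  /* intrinsic in place of the "*" operator */
    uint32x2_t sum = vadd_u32(va, prod); /* intrinsic in place of the "+" operator */
    vst1_u32(dst, sum);
#else
    for (int i = 0; i < 2; ++i)
        dst[i] = a[i] + b[i] * c[i];
#endif
}
```

Plain assignment between variables of the same vector type, as in the prod and sum lines, is the one operator that remains available.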

Where there are vector types which differ only in number of elements (uint32x2_t, uint32x4_t) there are specific instructions to ‘assign’ the top or bottom vector elements of a 128 bit value to a 64 bit value and vice-versa. This operation does not use any code space as long as the registers can be scheduled as aliases.

Example: Use the bottom 64 bits of a 128 bit register

vec64 = vget_low_u32(vec128);
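Both directions can be sketched together: vget_low_u32 and vget_high_u32 split a 128 bit value into two 64 bit halves, and vcombine_u32 joins two halves back into a 128 bit value. The function name swap_halves and the scalar fallback are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: exchange the bottom and top 64 bits of a 128 bit
 * vector, so {s0, s1, s2, s3} becomes {s2, s3, s0, s1}. */
void swap_halves(uint32_t dst[4], const uint32_t src[4])
{
    assert(dst != NULL && src != NULL);
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    uint32x4_t vec128 = vld1q_u32(src);
    uint32x2_t lo = vget_low_u32(vec128);   /* lanes 0 and 1 */
    uint32x2_t hi = vget_high_u32(vec128);  /* lanes 2 and 3 */
    /* vcombine takes the new low half first, then the new high half. */
    vst1q_u32(dst, vcombine_u32(hi, lo));
#else
    /* Copy through temporaries so that dst may alias src. */
    uint32_t s0 = src[0], s1 = src[1], s2 = src[2], s3 = src[3];
    dst[0] = s2;
    dst[1] = s3;
    dst[2] = s0;
    dst[3] = s1;
#endif
}
```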
