PFFFT: a pretty fast FFT.
TL;DR
--
PFFFT does 1D Fast Fourier Transforms of single-precision real and
complex vectors. It tries to do it fast, it tries to be correct, and it
tries to be small. Computations take advantage of SSE1 instructions
on x86 cpus, Altivec on powerpc cpus, and NEON on ARM cpus. The
license is BSD-like.
Why does it exist:
--
I was in search of a well-performing FFT library, preferably very
small and with a very liberal license.
When one says "fft library", FFTW ("Fastest Fourier Transform in the
West") is probably the first name that comes to mind -- I guess that
99% of open-source projects that need an FFT do use FFTW, and are happy
with it. However, it is quite a large library, which does everything
fft related (2d transforms, 3d transforms, other transformations such
as discrete cosine, or fast Hartley). And it is licensed under the
GNU GPL, which means that it cannot be used in non-open-source
products (unless a separate commercial license is purchased).
An alternative to FFTW that is really small is the venerable FFTPACK
v4, which is available on NETLIB. A more recent version (v5) exists,
but it is larger as it deals with multi-dimensional transforms. This
is a library written in FORTRAN 77, a language that is now
considered a bit antiquated by many. FFTPACKv4 was written in 1985
by Dr Paul Swarztrauber of NCAR, more than 25 years ago! And despite
its age, benchmarks show that it is still a very well-performing FFT
library, see for example the 1d single precision benchmarks here:
http://www.fftw.org/speed/opteron-2.2GHz-32bit/ . It is however not
competitive with the fastest ones, such as FFTW, Intel MKL, AMD ACML,
Apple vDSP. The reason is that those libraries take
advantage of the SSE SIMD instructions available on Intel CPUs
since the days of the Pentium III. These instructions deal
with small vectors of 4 floats at a time, instead of a single float
for a traditional FPU, so when using these instructions one may expect
up to a 4-fold performance improvement.
The idea was to take this Fortran FFTPACK v4 code, translate it to C,
modify it to use those SSE instructions, and check that the
final performance is not completely ridiculous when compared to other
SIMD FFT libraries. Translation to C was performed with f2c (
http://www.netlib.org/f2c/ ). The resulting file was edited a bit in
order to remove the thousands of gotos that were introduced by
f2c. You will find the fftpack.h and fftpack.c sources in the
repository; this is a complete translation of
http://www.netlib.org/fftpack/ , with the discrete cosine transform
and the test program. There is no license information in the netlib
repository, but it was confirmed to me by the fftpack v5 curators that
the same terms do apply to fftpack v4:
http://www.cisl.ucar.edu/css/software/fftpack5/ftpk.html . This is a
"BSD-like" license, which is compatible with proprietary projects.
Adapting fftpack to deal with the SIMD 4-element vectors instead of
scalar single precision numbers was more complex than I originally
thought, especially with the real transforms, and I ended up writing
more code than I planned.
The code:
--
Only two files, in good old C: pffft.c and pffft.h. The API is
very simple, just make sure that you read the comments in pffft.h.
Comparison with other FFTs:
--
The idea was not to break speed records, but to get a decently fast
fft that is at least 50% as fast as the fastest FFT -- especially on
the slowest computers. I'm more focused on getting the best performance
on slow cpus (Atom, Intel Core 1, old Athlons, ARM Cortex-A9...) than
on getting top performance on today's fastest cpus.
It can be used in a real-time context as the fft functions do not
perform any memory allocation -- that is why they accept a 'work'
array in their arguments.
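
The caller-supplied scratch buffer pattern can be sketched as follows.
This is only an illustration: reverse_block is a hypothetical stand-in
routine, not part of pffft.h, and it reverses a block instead of
computing an FFT. The point is that all temporary storage comes from
the 'work' argument, so the function itself never allocates.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a transform routine (NOT the PFFFT API).
 * It reverses n floats from 'in' into 'out'. All temporary storage is
 * taken from the caller-supplied 'work' buffer (at least n floats), so
 * nothing is allocated here and 'in' may alias 'out'. */
void reverse_block(const float *in, float *out, float *work, size_t n)
{
    size_t i;
    /* Stage the input in the scratch buffer first, so that writing to
     * 'out' cannot clobber samples that are still needed. */
    memcpy(work, in, n * sizeof *work);
    for (i = 0; i < n; ++i)
        out[i] = work[n - 1 - i];
}
```

In a real-time setting the work buffer would typically be allocated
once during setup and reused for every call on the audio thread.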
It is also a bit focused on performing 1D convolutions, that is why it
provides "unordered" FFTs, and a Fourier domain convolution
operation.
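
To make the idea of a Fourier domain convolution concrete, here is a
minimal sketch that does not use PFFFT at all. It assumes a fixed
length of 4 and a naive DFT (for N=4 the twiddle factors are 1, -i, -1
and i, so no trigonometry is needed). The names cpx, dft4 and
circular_convolve4 are made up for this example.

```c
typedef struct { float re, im; } cpx;

static cpx cmul(cpx a, cpx b)
{
    cpx r;
    r.re = a.re * b.re - a.im * b.im;
    r.im = a.re * b.im + a.im * b.re;
    return r;
}

/* Naive unscaled length-4 DFT. sign = -1: forward, sign = +1: backward. */
static void dft4(const cpx in[4], cpx out[4], int sign)
{
    /* w[k] = exp(sign * i * 2*pi*k/4), i.e. the powers of +i or -i */
    const cpx w[4] = { {1.0f, 0.0f}, {0.0f, (float)sign},
                       {-1.0f, 0.0f}, {0.0f, (float)-sign} };
    int k, n;
    for (k = 0; k < 4; ++k) {
        cpx acc = {0.0f, 0.0f};
        for (n = 0; n < 4; ++n) {
            cpx t = cmul(in[n], w[(k * n) % 4]);
            acc.re += t.re;
            acc.im += t.im;
        }
        out[k] = acc;
    }
}

/* Circular convolution of two real length-4 signals:
 * transform both, multiply bin by bin, transform back, scale by 1/4. */
void circular_convolve4(const float a[4], const float b[4], float out[4])
{
    cpx ta[4], tb[4], fa[4], fb[4], fab[4], r[4];
    int i;
    for (i = 0; i < 4; ++i) {
        ta[i].re = a[i]; ta[i].im = 0.0f;
        tb[i].re = b[i]; tb[i].im = 0.0f;
    }
    dft4(ta, fa, -1);
    dft4(tb, fb, -1);
    for (i = 0; i < 4; ++i)
        fab[i] = cmul(fa[i], fb[i]);   /* pointwise product of spectra */
    dft4(fab, r, +1);
    for (i = 0; i < 4; ++i)
        out[i] = r[i].re / 4.0f;       /* the backward DFT is unscaled */
}
```

The product is taken bin by bin, so the order in which the bins are
stored does not matter as long as both spectra use the same order and
the backward transform expects it. That is why an "unordered" transform
is sufficient for convolution.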
Benchmark results (cpu tested: core i7 2600, core 2 quad, core 1 duo, atom N270, cortex-A9)
--
The benchmark shows the performance of various fft implementations measured in
MFlops, with the number of floating point operations being defined as 5*N*log2(N)
for a length N complex fft, and 2.5*N*log2(N) for a real fft.
See http://www.fftw.org/speed/method.html for an explanation of these formulas.
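
As a sketch of how a timing maps to these numbers, the helpers below
apply the two formulas above. The function names are made up for this
example, and for simplicity the integer log2 assumes a power-of-two
length; general lengths such as 96 or 2400 would use log2() from
<math.h> instead.

```c
#include <stddef.h>

/* log2 of n, assuming n is a power of two (assumption of this sketch) */
static int ilog2_pow2(size_t n)
{
    int k = 0;
    while (n > 1) {
        n >>= 1;
        ++k;
    }
    return k;
}

/* Nominal operation count used by the benchmark:
 * 5*N*log2(N) for a complex fft, 2.5*N*log2(N) for a real fft. */
double nominal_flops(size_t n, int is_complex)
{
    double per_point = is_complex ? 5.0 : 2.5;
    return per_point * (double)n * (double)ilog2_pow2(n);
}

/* MFlops figure for one transform that took 'seconds' to run */
double mflops(size_t n, int is_complex, double seconds)
{
    return nominal_flops(n, is_complex) / (seconds * 1e6);
}
```

For example, a complex fft of length 1024 counts as 5*1024*10 = 51200
operations, so a figure of 15467 MFlops corresponds to roughly 3.3
microseconds per transform.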
MacOS Lion, gcc 4.2, 64-bit, fftw 3.3 on a 3.4 GHz core i7 2600
Built with:
gcc-4.2 -o test_pffft -arch x86_64 -O3 -Wall -W pffft.c test_pffft.c fftpack.c -L/usr/local/lib -I/usr/local/include/ -DHAVE_VECLIB -framework veclib -DHAVE_FFTW -lfftw3f
| input len |real FFTPack| real vDSP | real FFTW | real PFFFT | |cplx FFTPack| cplx vDSP | cplx FFTW | cplx PFFFT |
|-----------+------------+------------+------------+------------| |------------+------------+------------+------------|
| 64 | 2816 | 8596 | 7329 | 8187 | | 2887 | 14898 | 14668 | 11108 |
| 96 | 3298 | n/a | 8378 | 7727 | | 3953 | n/a | 15680 | 10878 |
| 128 | 3507 | 11575 | 9266 | 10108 | | 4233 | 17598 | 16427 | 12000 |
| 160 | 3391 | n/a | 9838 | 10711 | | 4220 | n/a | 16653 | 11187 |
| 192 | 3919 | n/a | 9868 | 10956 | | 4297 | n/a | 15770 | 12540 |
| 256 | 4283 | 13179 | 10694 | 13128 | | 4545 | 19550 | 16350 | 13822 |
| 384 | 3136 | n/a | 10810 | 12061 | | 3600 | n/a | 16103 | 13240 |
| 480 | 3477 | n/a | 10632 | 12074 | | 3536 | n/a | 11630 | 12522 |
| 512 | 3783 | 15141 | 11267 | 13838 | | 3649 | 20002 | 16560 | 13580 |
| 640 | 3639 | n/a | 11164 | 13946 | | 3695 | n/a | 15416 | 13890 |
| 768 | 3800 | n/a | 11245 | 13495 | | 3590 | n/a | 15802 | 14552 |
| 800 | 3440 | n/a | 10499 | 13301 | | 3659 | n/a | 12056 | 13268 |
| 1024 | 3924 | 15605 | 11450 | 15339 | | 3769 | 20963 | 13941 | 15467 |
| 2048 | 4518 | 16195 | 11551 | 15532 | | 4258 | 20413 | 13723 | 15042 |
| 2400 | 4294 | n/a | 10685 | 13078 | | 4093 | n/a | 12777 | 13119 |
| 4096 | 4750 | 16596 | 11672 | 15817 | | 4157 | 19662 | 14316 | 14336 |
| 8192 | 3820 | 16227 | 11084 | 12555 | | 3691 | 18132 | 12102 | 13813 |
| 9216 | 3864 | n/a | 10254 | 12870 | | 3586 | n/a | 12119 | 13994 |
| 16384 | 3822 | 15123 | 10454 | 12822 | | 3613 | 16874 | 12370 | 13881 |
| 32768 | 4175 | 14512 | 10662 | 11095 | | 3881 | 14702 | 11619 | 11524 |
| 262144 | 3317 | 11429 | 6269 | 9517 | | 2810 | 11729 | 7757 | 10179 |
| 1048576 | 2913 | 10551 | 4730 | 5867 | | 2661 | 7881 | 3520 | 5350 |
|-----------+------------+------------+------------+------------| |------------+------------+------------+------------|
Debian 6, gcc 4.4.5, 64-bit, fftw 3.3.1 on a 3.4 GHz core i7 2600
Built with:
gcc -o test_pffft -DHAVE_FFTW -msse2 -O3 -Wall -W pffft.c test_pffft.c fftpack.c -L$HOME/local/lib -I$HOME/local/include/ -lfftw3f -lm
| N (input length) | real FFTPack | real FFTW |