An analysis of Next Generation Threads on IA64
Ian Wienand
September 9, 2003
Abstract
For a long time Linux threading support has been provided solely by the LinuxThreads library. This library is
now showing its age and has often been criticised for its lacklustre performance and lack of POSIX compliance.
High performance threads are important to the success of the IA64 architecture, as many of the CPU and memory
intensive applications it is targeted at use threads extensively. Recently, new efforts such as IBM's NGPT and
Ulrich Drepper's NPTL have sought to bring high performance POSIX threads to Linux. We compare and contrast
the implementation of the old and new libraries and benchmark their performance on Itanium, Pentium and
PowerPC based hardware. Our results show that the next generation libraries drastically improve performance
on key measures.
This work is supported by UNSW and HP through the Gelato Federation.
http://www.gelato.org
http://www.gelato.unsw.edu.au
1 Introduction
1.1 The IA64 Architecture
The IA64 architecture is the result of a collaboration between HP and Intel to produce a next generation of 64-bit
processors. It combines traditional design principles from RISC, CISC and VLIW designs into a unique package
intended to overcome the limitations of these architectures and to scale into the future.
Intel's latest incarnation of the IA64 architecture is called the Itanium 2.
The Linux port to the IA64 architecture has been actively developed since 1998 [10] and is currently considered
to be stable for production use.
1.2 Why is threading important?
Since the standardisation of POSIX threads (Pthreads) [8] many applications have been developed to take advantage
of the parallelism afforded by threads. For example, the latest release of the Apache web server, Apache 2,
uses POSIX threads to increase scalability [1]; Java Virtual Machines make extensive use of threads; and the
widely deployed Open Source databases MySQL and PostgreSQL use POSIX threads.
It should be noted that Pthreads abstractions are best applied to a uniprocessor or SMP based system; clusters
generally use message-passing libraries such as MPI.
2 Overview of Threading Libraries
Multithreading is the ability of an operating system to support multiple threads of execution within a single process
[16].
There are a number of ways to organise a threading library (see also Table 1):
1. Kernel or 1:1 threads refer to a model where the kernel is aware of each thread within a process and
participates in its life cycle (creation, scheduling, removal). This requires support from the kernel as it must
understand the relationship between processes and threads. However, it requires minimal library support.
Threading Model   Kernel Support   Library Support   Sample Implementations
Kernel            Extensive        Minimal           LinuxThreads, NPTL
Userspace         None             Extensive         GNU Pth
Hybrid            Some             Some              Solaris, NGPT
Table 1: A comparison of threading models
2. Userspace threads refer to a model where the kernel only sees one process and a userspace library provides
the support for threads within that process. This requires no special kernel support but extensive library
support.
3. Hybrid or M:N threads refer to a model somewhere in between kernel and userspace threads, where userspace
threads may either map to a kernel thread or be managed by the userspace library. This requires both
kernel and library support.
Before the Linux kernel fully supported multiple threads there were several userspace libraries available;
however, these were generally short lived once LinuxThreads was introduced.
2.1 LinuxThreads
LinuxThreads was originally written by Xavier Leroy and released in 1996, around the time the 2.0 kernel was
released. It has remained the dominant Linux thread library for around 8 years and probably has a fair bit of
life left in it yet. LinuxThreads unfortunately deviates from the Pthreads standard in some respects and has some
serious architectural flaws that fundamentally limit its performance.
The Linux kernel provides a single interface for creating processes and threads: the clone() system call.
clone() was probably first suggested in the Plan9 Operating System [14] and is closely related to the IRIX
sproc() call. Rather than providing two separate interfaces for copying a process and for creating a thread, it
was realised that fork() is simply a special case of thread creation where more of the process state is copied. By
passing a series of flags to clone(), varying levels of process state can be copied or shared (the major difference
when creating a thread is that when a process fork()s it receives a new address space, whilst a thread does not).
The clone() interface provided by Linux betrays its creator's desired threading model, and indeed
LinuxThreads is a 1:1 implementation.
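The effect of the clone() flags can be seen in a small experiment. The following is a minimal sketch of our own, not code from LinuxThreads or the kernel: it calls the glibc clone() wrapper with and without CLONE_VM and reports whether a write made by the child is visible to the parent. The names THREAD_LIKE_FLAGS, shared_counter, child_fn and run_child are hypothetical, and the sketch assumes an architecture whose stack grows downwards (such as x86); on IA64, glibc provides __clone2() instead of clone().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical flag set for a thread-like child: share the address space,
 * filesystem information, open files and signal handlers with the parent. */
#define THREAD_LIKE_FLAGS (CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND)

/* Written by the child; volatile so the parent always re-reads memory. */
static volatile int shared_counter;

static int child_fn(void *arg)
{
    (void)arg;
    shared_counter = 42;   /* visible to the parent only if memory is shared */
    return 0;
}

/* Run child_fn in a clone()d child created with the given flags, wait for it,
 * and return the value of shared_counter as seen by the parent (-1 on error). */
static int run_child(int flags)
{
    const size_t stack_size = 64 * 1024;
    char *stack = malloc(stack_size);
    pid_t pid;
    int seen;

    if (stack == NULL)
        return -1;
    shared_counter = 0;

    /* Assumption: the stack grows downwards, so pass the top of the buffer.
     * SIGCHLD is added so that the parent can waitpid() for the child. */
    pid = clone(child_fn, stack + stack_size, flags | SIGCHLD, NULL);
    if (pid == -1) {
        free(stack);
        return -1;
    }
    if (waitpid(pid, NULL, 0) == -1)
        return -1;         /* child may still be running: do not free its stack */

    seen = shared_counter;
    free(stack);
    return seen;
}
```

With THREAD_LIKE_FLAGS the parent should observe the child's write (run_child returns 42); with no sharing flags the child behaves like a fork()ed process, writes to its own copy of memory, and the parent still sees 0.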
LinuxThreads has a number of architectural limitations that hamper its performance [4] [9]:
• Signals: The LinuxThreads signal infrastructure was initially hampered by a lack of kernel support and
consequently deviated from the POSIX standard. In brief, POSIX states that any signal sent to a process can be
handled by any of its threads that does not have the signal blocked. Since using clone() makes each thread
a unique process as far as the Linux kernel is concerned, a signal is delivered to one particular thread; if that
thread has the signal blocked, the signal stays queued for it instead of being handled by another thread. This
also causes problems with signals that are required to stop the entire process, such as SIGKILL or SIGINT,
which require special kernel support.
LinuxThreads also uses signals to implement some parts of thread synchronisation, which leads to problems
with latency and complicates signal handling even further.
• Limited number of threads: The number of active threads is maintained in an array of limited size. By
default this limits you to 1024 threads. Over the years of development the requirement to search this list
was reduced with the implementation of thread registers (see footnote 1), but the limitation nonetheless remained.
This limit was reasonable on older kernels, as the scheduler would not have coped well with this many
threads. Further, the /proc interface was not designed to deal with such high numbers of threads, and
the fact that the kernel associated a unique PID with each thread meant /proc (and associated tools such as
top) would become almost unusable. The fact that getpid() returned a unique PID for each thread was also
inconsistent with other POSIX implementations.
Footnote 1: A thread register is a processor register reserved to point to the current thread. This register is
updated by the kernel on context switch and allows a thread to find information about itself quickly and easily.
This allows thread local storage (TLS) [3], the most useful application of which is the __thread attribute for
variables, which allocates a variable privately for each thread.
• Manager thread: Thread creation and termination require the intervention of a manager thread, which
does things like allocate the stack for the thread and clean up on termination. When spawning many threads
this design is an obvious bottleneck.
The manager thread also shows up in debugging sessions and, if somehow killed, leaves the process in a
state that requires manual cleanup. It also causes problems with process accounting; for example, the time
application will not return correct values for multithreaded programs.
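The getpid() behaviour described in the list above is easy to check on a given system. The sketch below is our own illustration (the names struct ids, record_ids and compare_ids are hypothetical): it records getpid() and the kernel thread ID, obtained through the gettid system call, in both the initial thread and a second thread. On a POSIX-conforming library the two PIDs should match while the thread IDs differ; under the LinuxThreads behaviour described above, the PIDs themselves would differ.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Process ID and kernel thread ID as seen from one thread. */
struct ids {
    pid_t pid;
    pid_t tid;
};

/* Thread start routine: store this thread's view of its IDs in *arg. */
static void *record_ids(void *arg)
{
    struct ids *out = arg;

    out->pid = getpid();
    out->tid = (pid_t)syscall(SYS_gettid);
    return NULL;
}

/* Record the IDs of the calling thread and of a freshly created thread.
 * Returns 0 on success and -1 if the thread could not be created or joined. */
static int compare_ids(struct ids *caller_ids, struct ids *worker_ids)
{
    pthread_t worker;

    record_ids(caller_ids);
    if (pthread_create(&worker, NULL, record_ids, worker_ids) != 0)
        return -1;
    if (pthread_join(worker, NULL) != 0)
        return -1;
    return 0;
}
```

A caller would compare caller_ids->pid with worker_ids->pid to see whether the library presents the threads as a single process.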
LinuxThreads has undergone much development over its life span and provides a reasonable implementation
despite its limitations.
2.2 Native POSIX Threading Library
The Native POSIX Threading Library (NPTL) is the next generation of POSIX threading for Linux. It has been
made possible by significant kernel support developed over the life of the 2.5 development series, and provides
significant performance increases across the board. Development was announced in September 2002; the first
distribution to include support was Red Hat Linux 9 in early April 2003.
Whereas LinuxThreads was forced to work around a lack of kernel support, the clear requirement for high
performance POSIX threads had its effect on kernel developers and significant support has been provided in the
2.5 development series. Below we discuss the most important of these changes and how they integrate with NPTL.
2.2.1 Futexes
Futexes were introduced by Rusty Russell into the 2.5.7 series kernel and have become an integral part of many
applications. Full details about the implementation of futexes can be found in [7].
Just as we can categorise threading models via the level of kernel involvement, we can frame synchronisation
primitives the same way. Traditional System V IPC synchronisation techniques such as semaphores and
msgqueues are implemented completely in-kernel and always require a system call when modified.
Pure userspace locking can be provided on an ad-hoc basis with some combination of shared memory, atomic
operations and process control, but fails as a generic solution. Whilst the actual locking may avoid system calls,
the kernel does not explicitly know about waiting threads and so cannot make optimal scheduling decisions.
Futexes aim to provide the best of both worlds: a standardised interface whose best case requires no kernel
intervention, and an efficient waiting mechanism when one is required. Futexes require user level atomic
operations; however, these are well supported on most modern architectures.
The interface to the futex operations is quite straightforward. The futex system call is prototyped as
long sys_futex(u32 *uaddr, int op, int val, struct timespec *utime, u32 *uaddr2)
uaddr is the userspace address that is being used to hold the futex value. op and the other arguments vary as
below:
• FUTEX_WAIT: puts the current process on the wait queue for this futex. First val is checked to make sure
it is the same as the value stored at uaddr and, assuming it is, the process is then queued on the futex. The
optional utime argument gives a timeout (so timed waits can be implemented).
• FUTEX_WAKE: wakes up val waiters.
• FUTEX_REQUEUE (since 2.5.70): requeues threads waiting on uaddr to uaddr2. val takes the number
of processes to wake up, whilst utime is overloaded to be the number of waiters to move between the
queues. Requeuing allows you to avoid swarming; imagine having two locks a and b, where there are n
waiters on a. Once a is unlocked, all n swarm trying to get lock b, however only one will get it. The other
n-1 waiters will immediately go onto the wait queue of the second lock. Requeuing instead wakes only val
waiters and moves the rest directly onto the queue for b without waking them.
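The three operations above can be reached from userspace through the generic syscall() wrapper, as glibc exports no futex() function. The following is a minimal sketch under our own assumptions: the wrapper names futex_wait, futex_wake and futex_requeue are hypothetical, and current kernels take a sixth argument (val3) that did not appear in the prototype above and is passed here as 0.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Sleep on *uaddr, but only if it still holds `expected`. A NULL timeout
 * waits indefinitely; otherwise the timeout is relative. Returns 0 when
 * woken, or -1 with errno set (EAGAIN if the value did not match,
 * ETIMEDOUT if the timeout expired). */
static long futex_wait(uint32_t *uaddr, uint32_t expected,
                       const struct timespec *timeout)
{
    return syscall(SYS_futex, uaddr, FUTEX_WAIT, expected, timeout, NULL, 0);
}

/* Wake at most `count` waiters sleeping on *uaddr.
 * Returns the number of waiters actually woken. */
static long futex_wake(uint32_t *uaddr, int count)
{
    return syscall(SYS_futex, uaddr, FUTEX_WAKE, count, NULL, NULL, 0);
}

/* Wake at most `wake_count` waiters on *from and move at most
 * `requeue_count` of the remaining waiters onto the queue for *to.
 * The requeue count travels in the argument slot used for the timeout. */
static long futex_requeue(uint32_t *from, uint32_t *to,
                          int wake_count, int requeue_count)
{
    return syscall(SYS_futex, from, FUTEX_REQUEUE, wake_count,
                   (long)requeue_count, to, 0);
}
```

Because FUTEX_WAIT re-checks the value inside the kernel, a waiter that loses a race with a wake-up returns immediately with EAGAIN rather than sleeping forever, which is what makes the wait safe to combine with userspace atomic operations.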
To fully understand the process, we can look at the locking primitives used to implement mutexes in NPTL.
A process locking a mutex through the POSIX standard pthread_mutex_lock() interface will end up executing
something like Algorithm 1. Note that in the uncontested case there is no need for a system call or even a context
switch. In the contested case, we wait on the futex and, when woken, test whether we have the lock (n.b. the real
code makes sure the appropriate parts are atomic).
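To illustrate the uncontested and contested paths, the following is a minimal sketch of a futex-based lock. It is not Algorithm 1 and not the NPTL source; it uses a common three-state scheme (0 unlocked, 1 locked, 2 locked with possible waiters) built on C11 atomics. The names sketch_mutex, sketch_lock, sketch_unlock and run_counter_demo are hypothetical, and the code assumes that an _Atomic uint32_t has the same representation as a plain uint32_t, as is the case with GCC on Linux.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* state: 0 = unlocked, 1 = locked, 2 = locked and possibly contended */
typedef struct {
    _Atomic uint32_t state;
} sketch_mutex;

static void sketch_lock(sketch_mutex *m)
{
    uint32_t c = 0;

    /* Uncontested case: 0 -> 1 with one atomic operation, no system call. */
    if (atomic_compare_exchange_strong(&m->state, &c, 1))
        return;

    /* Contested case: mark the lock as contended, then sleep until the
     * exchange shows that we were the one to take it (old value 0). */
    if (c != 2)
        c = atomic_exchange(&m->state, 2);
    while (c != 0) {
        /* Sleeps only if state is still 2; otherwise returns at once. */
        syscall(SYS_futex, (uint32_t *)&m->state, FUTEX_WAIT, 2, NULL, NULL, 0);
        c = atomic_exchange(&m->state, 2);
    }
}

static void sketch_unlock(sketch_mutex *m)
{
    /* Only enter the kernel if somebody may be waiting. */
    if (atomic_exchange(&m->state, 0) == 2)
        syscall(SYS_futex, (uint32_t *)&m->state, FUTEX_WAKE, 1, NULL, NULL, 0);
}

struct job {
    sketch_mutex *lock;
    long *counter;
    long iterations;
};

static void *worker(void *arg)
{
    struct job *job = arg;

    for (long i = 0; i < job->iterations; i++) {
        sketch_lock(job->lock);
        (*job->counter)++;          /* protected by the lock */
        sketch_unlock(job->lock);
    }
    return NULL;
}

/* Increment a shared counter from `nthreads` threads and return its final
 * value, or -1 on error. Without the lock the result would be unreliable. */
static long run_counter_demo(int nthreads, long iterations)
{
    enum { MAX_THREADS = 16 };
    pthread_t tids[MAX_THREADS];
    sketch_mutex lock = { 0 };
    long counter = 0;
    struct job job = { &lock, &counter, iterations };
    int created = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;
    while (created < nthreads &&
           pthread_create(&tids[created], NULL, worker, &job) == 0)
        created++;
    for (int i = 0; i < created; i++)
        pthread_join(tids[i], NULL);
    return created == nthreads ? counter : -1;
}
```

The third state exists so that sketch_unlock() can skip the FUTEX_WAKE system call when no thread has ever had to wait, which keeps both the lock and unlock paths in userspace for the uncontested case.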