An analysis of Next Generation Threads on IA64

Ian Wienand

September 9, 2003

Abstract

For a long time Linux threading support has been solely via the LinuxThreads library. This library is now showing its age and has often been criticised for its lacklustre performance and lack of POSIX compliance. High performance threads are important to the success of the IA64 architecture, as many of the CPU and memory intensive applications it is targeted at use threads extensively. Recently, new efforts such as IBM’s NGPT and Ulrich Drepper’s NPTL have sought to bring high performance POSIX threads to Linux. We compare and contrast the implementation of the old and new libraries and benchmark their performance on Itanium, Pentium and PowerPC based hardware. Our results show that the next generation libraries drastically improve performance on key measures.

This work is supported by UNSW and HP through the Gelato Federation.

http://www.gelato.org

http://www.gelato.unsw.edu.au

1 Introduction

1.1 The IA64 Architecture

The IA64 architecture is the result of a collaboration between HP and Intel to produce a next generation of 64-bit processors. The IA64 architecture combines traditional design principles from RISC, CISC and VLIW designs into a unique package designed to overcome the limitations of these architectures and to scale into the future. Intel’s latest incarnation of the IA64 architecture is called the Itanium2.

The Linux port to the IA64 architecture has been actively developed since 1998 [10] and is currently considered to be stable for production use.

1.2 Why is threading important?

Since the standardisation of POSIX threads (Pthreads) [8], many applications have been developed to take advantage of the parallelism afforded by threads. For example, Apache 2, the latest release of the Apache web server, uses POSIX threads to increase scalability [1]; Java Virtual Machines make extensive use of threads; and the widely deployed Open Source databases MySQL and PostgreSQL use POSIX threads.

It should be noted that Pthreads abstractions are best applied to a uniprocessor or SMP based system; clusters generally use more appropriate libraries such as OpenMP and MPI.

2 Overview of Threading Libraries

Multithreading is the ability of an operating system to support multiple threads of execution within a single process [16].

There are a number of ways to organise a threading library (also see Table 1):

1. Kernel or 1:1 threads refer to a model where the kernel is aware of each thread within a process and participates in its life-cycle (creation, scheduling, removal). This requires support from the kernel as it must understand the relationship between processes and threads. However, it requires minimal library support.

2. Userspace threads refer to a model where the kernel only sees one process and a userspace library provides the support for threads within that process. This requires no special kernel support but extensive library support.

3. Hybrid or M:N threads refer to a model somewhere in between kernel and userspace threads where userspace threads may map either to a kernel thread or be managed by the userspace library. This requires both kernel and library support.

Threading Model    Kernel Support    Library Support    Sample Implementations
Kernel             Extensive         Minimal            LinuxThreads, NPTL
Userspace          None              Extensive          GNU Pth
Hybrid             Some              Some               Solaris, NGPT

Table 1: A comparison of threading models

Before the Linux kernel fully supported multiple threads there were several userspace libraries available; however, these were generally short lived once LinuxThreads was introduced.

2.1 LinuxThreads

LinuxThreads was originally written by Xavier Leroy and released in 1996, around the time the 2.0 kernel was released. It has remained the dominant Linux thread library for around 8 years and probably has a fair bit of life left in it yet. LinuxThreads unfortunately deviates from the Pthreads standard in some respects and has some serious architectural flaws that fundamentally limit its performance.

The Linux kernel provides a single interface for creating processes and threads: the clone() system call. clone() was probably first suggested in the Plan9 Operating System [14] and is closely related to the IRIX sproc() call. As opposed to providing two unique interfaces for copying a process and for creating a thread, it was realised that fork() is simply a special case of thread creation where more of the process state is copied. By passing a series of flags to clone(), varying levels of process state can be copied (the major difference being that a process created by fork() receives a new address space, whilst a thread does not).

The clone() interface provided by Linux obviously betrays its creator’s desired threading model and indeed LinuxThreads is a 1:1 implementation.
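
The effect of the clone() flags can be illustrated with a small sketch. It is not code from LinuxThreads; it assumes Linux with glibc's clone() wrapper and an architecture where the stack grows downward (such as x86-64). The names shared_value, child_fn and value_seen_by_parent are hypothetical. With CLONE_VM the child shares the parent's address space, as a thread would; without it the child gets its own copy, as after fork().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (64 * 1024)

/* Written by the child; read back by the parent after the child exits. */
static volatile int shared_value;

static int child_fn(void *arg)
{
    (void)arg;
    shared_value = 42;   /* visible to the parent only if memory is shared */
    return 0;            /* becomes the child's exit status */
}

/* Runs child_fn in a clone()d child and returns the value the parent
 * observes afterwards. Pass CLONE_VM for thread-like sharing, 0 for a
 * fork-like private copy of the address space. */
int value_seen_by_parent(int extra_flags)
{
    char *stack = malloc(CHILD_STACK_SIZE);
    assert(stack != NULL);

    shared_value = 0;

    /* The child needs its own stack; pass the top because it grows down.
     * SIGCHLD as the exit signal lets the parent reap it with waitpid(). */
    pid_t pid = clone(child_fn, stack + CHILD_STACK_SIZE,
                      extra_flags | SIGCHLD, NULL);
    assert(pid > 0);

    pid_t reaped = waitpid(pid, NULL, 0);
    assert(reaped == pid);

    free(stack);
    return shared_value;
}
```

Calling value_seen_by_parent(CLONE_VM) should observe the child's write, while value_seen_by_parent(0) should not, since the child then modifies only its private copy. A real thread library passes further flags (file descriptors, signal handlers and so on); only the address-space flag is shown here.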

LinuxThreads has a number of architectural limitations that hamper its performance [4] [9]:

• Signals: The LinuxThreads signal infrastructure was initially hampered by a lack of kernel support and consequently deviated from POSIX standards. In brief, POSIX states that any signal sent to a process can be handled by any of its threads that does not have the signal blocked. Since using clone() makes each thread a unique process as far as the Linux kernel is concerned, a signal that arrives at a thread which has it blocked is queued for that thread rather than being handed to another thread that could handle it. This also causes problems with signals that are required to stop the entire process, such as SIGKILL and SIGINT, which require special kernel support.

LinuxThreads also uses signals to implement some parts of thread synchronisation, which leads to problems with latency and complicates signal handling even further.

• Limited number of threads: The number of active threads is maintained in an array of limited size. By default this limits you to 1024 threads. Over the years of development the requirement to search this list was reduced with the implementation of thread registers (see the note below), but the limitation nonetheless remained.

This limit was reasonable on older kernels, as the scheduler would not have dealt with this many threads reasonably. Further, the /proc interface was not designed to deal with such high numbers of threads, and the fact that the kernel associated a unique PID with each thread meant /proc (and associated tools such as top) would become almost unusable. The fact that getpid() returned a unique PID for each thread also did not correlate with other POSIX implementations.

Note: A thread register is a processor register reserved to point to the current thread. This register is updated by the kernel on context switch and allows a thread to always find out information about itself quickly and easily. This allows thread local storage (TLS) [3], the most useful application of which is the __thread attribute for variables, which allocates a variable privately for each thread.

• Manager Thread: Thread creation and termination require the intervention of a manager thread, which does things like allocate the stack for the thread and clean up on termination. When spawning many threads this design is an obvious bottleneck.

The manager thread also shows up in debugging sessions and, if somehow killed, leaves the process in a state that requires manual cleanup. It also causes problems with process accounting; for example, the time application will not return correct values for multithreaded programs.
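
Two of the points above, the PID reported by getpid() and the per-thread storage provided by __thread, can be checked with a short sketch. It assumes a Linux system whose Pthreads library follows the POSIX behaviour (for example glibc's NPTL) and a compiler that supports __thread (GCC or Clang), built with -pthread. The names per_thread_counter, worker and check_thread_identity are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

/* __thread gives every thread its own private copy of this variable. */
static __thread int per_thread_counter = 0;

struct report {
    pid_t pid;      /* PID as seen from inside the worker thread */
    int counter;    /* worker's private counter after its update */
};

static void *worker(void *arg)
{
    struct report *r = arg;
    per_thread_counter += 5;          /* touches only the worker's copy */
    r->pid = getpid();
    r->counter = per_thread_counter;
    return NULL;
}

/* Returns 1 when the worker reported the caller's PID and the two copies
 * of per_thread_counter stayed independent, 0 otherwise. */
int check_thread_identity(void)
{
    struct report r = { 0, 0 };
    pthread_t t;

    per_thread_counter = 100;         /* the calling thread's copy */

    int rc = pthread_create(&t, NULL, worker, &r);
    assert(rc == 0);
    pthread_join(t, NULL);

    return r.pid == getpid()
        && r.counter == 5
        && per_thread_counter == 100;
}
```

Under a library where each thread is a separate process with its own PID, as described for LinuxThreads, the PID comparison would be expected to fail.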

LinuxThreads has undergone much development over its life span and provides a reasonable implementation despite its limitations.

2.2 Native POSIX Threading Library

The Native POSIX Threading Library (NPTL) is the next generation of POSIX threading for Linux. It has been made possible by significant kernel support developed over the life of the 2.5 development series, and provides significant performance increases across the board. Development was announced in September 2002; the first distribution to include support was Red Hat 9 in early April 2003.

Whereas LinuxThreads was forced to work around a lack of kernel support, the clear requirement for high performance POSIX threads had its effect on kernel developers and significant support has been provided in the 2.5 development series. Below we discuss the most important of these changes and how they integrate with NPTL.

2.2.1 Futexes

Futexes were introduced by Rusty Russell in the 2.5.7 kernel and have become an integral part of many applications. Full details about the implementation of futexes can be found in [7].

Just as we can categorise threading models via the level of kernel involvement, we can frame synchronisation primitives the same way. Traditional System V IPC synchronisation techniques such as semaphores and msgqueues are implemented completely in-kernel and always require a system call when modified.

Pure userspace locking can be provided on an ad-hoc basis with some combination of shared memory, atomic operations and process control, but fails as a generic solution. Whilst the actual locking may avoid system calls, the kernel does not explicitly know about waiting threads and so cannot make optimal scheduling decisions. Futexes aim to provide the best of both worlds: a standardised interface where the best case requires no kernel intervention, and an efficient waiting mechanism when required. Futexes require user level atomic operations; however, these are well supported on most modern architectures.

The actual interface to the futex operations is quite straightforward. The futex system call is prototyped as

long sys_futex(u32 *uaddr, int op, int val, struct timespec *utime, u32 *uaddr2)

where uaddr is the userspace address that is being used to hold the futex value and op selects the operation; the other arguments vary as below:

• FUTEX_WAIT: puts the current process on the wait queue for this futex. First val is checked to make sure it is the same as the value in uaddr and, assuming it is, the process is then queued on the futex. The optional utime argument gives a timeout (so timed waits can be implemented).

• FUTEX_WAKE: wakes up val number of waiters.

• FUTEX_REQUEUE (since 2.5.70): will requeue threads waiting on uaddr onto uaddr2. val takes the number of processes to wake up, whilst utime is overloaded to be the number of waiters to move between the queues. Requeuing allows you to avoid swarming; imagine having two locks, where there are n waiters on the first. Once the first is unlocked, all n swarm trying to get the second lock, however only one will get it. The other n-1 waiters will immediately go onto the wait queue of the second lock.
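
The FUTEX_WAIT and FUTEX_WAKE operations listed above can be exercised from a single thread. The sketch below assumes a 64-bit Linux system with kernel headers installed; glibc provides no futex() function, so the call goes through syscall(2). The wrapper futex() and the three helper functions are hypothetical names introduced for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Thin wrapper mirroring the prototype in the text. */
static long futex(uint32_t *uaddr, int op, uint32_t val,
                  const struct timespec *utime, uint32_t *uaddr2)
{
    assert(uaddr != NULL);
    return syscall(SYS_futex, uaddr, op, val, utime, uaddr2, 0);
}

/* FUTEX_WAIT refuses to sleep when the futex word no longer equals val.
 * Returns the resulting errno (0 if the call unexpectedly succeeded). */
int wait_with_stale_value(void)
{
    uint32_t word = 1;
    long rc = futex(&word, FUTEX_WAIT, 0, NULL, NULL); /* expects 0, sees 1 */
    return rc == -1 ? errno : 0;
}

/* FUTEX_WAIT with a matching value and a 10 ms relative timeout: the
 * caller is queued, nobody wakes it, and the wait times out. */
int wait_until_timeout(void)
{
    uint32_t word = 0;
    struct timespec ts = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };
    long rc = futex(&word, FUTEX_WAIT, 0, &ts, NULL);
    return rc == -1 ? errno : 0;
}

/* FUTEX_WAKE returns how many waiters were woken; none are queued here. */
long wake_with_no_waiters(void)
{
    uint32_t word = 0;
    return futex(&word, FUTEX_WAKE, 1, NULL, NULL);
}
```

The value check in FUTEX_WAIT is what lets userspace close the race between testing the futex word and going to sleep: if another thread changed the word in between, the kernel returns immediately instead of queueing the caller.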

To fully understand the process, we can look at the locking primitives used to implement mutexes in NPTL. A process locking a mutex through the POSIX standard pthread_mutex_lock() interface will end up executing something like Algorithm 1. Note that in the uncontested case there is no need for a system call or even a context switch. In the contested case, we wait on the futex and, when woken, test if we have the lock (n.b. the real code obviously makes sure the appropriate parts are atomic).
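
The lock path described above can be sketched in C. This is not NPTL's actual code: it is a simplified three-state futex mutex in a commonly used style, assuming Linux, the GCC/Clang __atomic builtins, and hypothetical names (sketch_mutex, futex_op, sketch_mutex_lock, sketch_mutex_unlock).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* state: 0 = unlocked, 1 = locked, 2 = locked and there may be waiters */
typedef struct {
    uint32_t state;
} sketch_mutex;

static long futex_op(uint32_t *uaddr, int op, uint32_t val)
{
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

void sketch_mutex_lock(sketch_mutex *m)
{
    uint32_t expected = 0;

    /* Uncontested case: one atomic 0 -> 1 transition, no system call. */
    if (__atomic_compare_exchange_n(&m->state, &expected, 1, 0,
                                    __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
        return;

    /* Contested case: mark the lock as having waiters. If the previous
     * value was 0 we now own it; otherwise sleep until woken and retry.
     * FUTEX_WAIT only sleeps while the word is still 2. */
    while (__atomic_exchange_n(&m->state, 2, __ATOMIC_ACQUIRE) != 0)
        futex_op(&m->state, FUTEX_WAIT, 2);
}

void sketch_mutex_unlock(sketch_mutex *m)
{
    uint32_t old = __atomic_exchange_n(&m->state, 0, __ATOMIC_RELEASE);
    assert(old != 0);    /* unlocking an unlocked mutex is a caller bug */

    /* Enter the kernel only if someone may be sleeping on the futex. */
    if (old == 2)
        futex_op(&m->state, FUTEX_WAKE, 1);
}
```

A thread that acquires the lock through the contested path leaves the state at 2 even if it was the last waiter, which costs one unnecessary FUTEX_WAKE on unlock; the sketch accepts that cost to stay short.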
