any buy-in by other parties. If developers of boutique vir-
tual I/O mechanisms are familiar with Linux, it might guide
them to map the Linux API neatly onto their own ABI. But
“if” and “might” are insufficient: We can be more ambitious
than this.
Experience has shown that boutique transport mechanisms
tend to be particular not only to a given hypervisor and
architecture, but often to each particular kind of device. So
the next obvious step in our attempt to guide towards uni-
formity is to provide a common ABI for general publication
and use of buffers. Deliberately, our virtio ring implemen-
tation is not at all revolutionary: developers should look at
this code and see nothing to dislike.
Finally, we provide two complete ABI implementations,
using the virtio ring infrastructure and the Linux API for
virtual I/O devices. These implement the final part of vir-
tual I/O: device probing and configuration. Importantly,
they demonstrate how simple it is to use the Linux virtual
I/O API to provide feature negotiation in a forward and
backward compatible manner so that future Linux driver
features can be detected and used by any host implementa-
tion.
The explicit separation of drivers, transport and configuration
represents a change in thinking from current implementations.
For example, you can’t really use Xen’s Linux network driver
in a new hypervisor unless you support the XenBus probing and
configuration system.
3. VIRTIO: A LINUX-INTERNAL ABSTRACTION API
If we want to reduce duplication in virtual device drivers,
we need a decent abstraction so drivers can share code. One
method is to provide a set of common helpers which virtual
drivers can use, but more ambitious is to use common drivers
and an operations structure: a series of function pointers
which are handed to the generic driver to interface with any
of several transport implementations. The task is to create a
transport abstraction for all virtual devices which is simple,
close to optimal for an efficient transport, and yet allows a
shim to existing transports without undue pain.
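The operations-structure approach can be illustrated with a minimal user-space sketch. Everything here is hypothetical (the toy_ names, the single send operation, the two toy transports) and is not the kernel's virtio interface; it only shows how one generic driver routine can run unchanged over several transports through a table of function pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical transport interface for illustration; not the virtio API. */
struct toy_transport_ops {
    /* Accept up to len bytes of msg; return the number of bytes accepted. */
    size_t (*send)(void *state, const char *msg, size_t len);
};

/* Toy transport 1: appends bytes to a fixed-size in-memory log. */
struct toy_log {
    char buf[64];
    size_t used;
};

static size_t toy_log_send(void *state, const char *msg, size_t len)
{
    struct toy_log *log = state;
    size_t room = sizeof(log->buf) - log->used;

    if (len > room)
        len = room;                 /* truncate to the space left */
    memcpy(log->buf + log->used, msg, len);
    log->used += len;
    return len;
}

/* Toy transport 2: discards the data and only counts bytes. */
struct toy_counter {
    size_t total;
};

static size_t toy_counter_send(void *state, const char *msg, size_t len)
{
    struct toy_counter *c = state;

    (void)msg;
    c->total += len;
    return len;
}

static const struct toy_transport_ops toy_log_ops = { .send = toy_log_send };
static const struct toy_transport_ops toy_counter_ops = { .send = toy_counter_send };

/* The "generic driver": it only ever sees the ops table and opaque state. */
static size_t toy_driver_write(const struct toy_transport_ops *ops,
                               void *state, const char *msg)
{
    assert(ops != NULL && ops->send != NULL);
    return ops->send(state, msg, strlen(msg));
}
```

Calling toy_driver_write with &toy_log_ops or &toy_counter_ops exercises the same driver code over either transport, which is the property the common-driver design relies on.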
The current result (integrated in 2.6.24) is that virtio drivers
register themselves to handle a particular 32-bit device type,
optionally restricting to a specific 32-bit vendor field. The
driver’s probe function is called when a suitable virtio device
is found: the struct virtio_device passed in has a vir-
tio_config_ops pointer which the driver uses to unpack the
device configuration.
The configuration operations can be divided into four parts:
reading and writing feature bits, reading and writing the
configuration space, reading and writing the status bits and
device reset. The driver looks for device-type-specific feature
bits corresponding to features it wants to use, such
as the VIRTIO_NET_F_CSUM feature bit indicating whether a
network device supports checksum offload. Feature bits are
explicitly acknowledged: the host knows what feature bits
are acked by the guest, and hence what features that driver
understands.
The second part is the configuration space: this is effec-
tively a structure associated with the virtual device contain-
ing device-specific information. This can be both read and
written by the guest. For example, network devices have a
VIRTIO_NET_F_MAC feature bit, which indicates that the host
wants the device to have a particular MAC address, and the
configuration space contains the value.
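Reading a field such as the MAC address out of the configuration space amounts to a bounds-checked copy at a known offset. The sketch below assumes a hypothetical six-byte layout and a hypothetical toy_config_get helper; the real layout is whatever host and guest have agreed on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical configuration layout for a toy network device. */
struct toy_net_config {
    uint8_t mac[6];
};

/*
 * Copy len bytes starting at offset out of a configuration space of
 * space_len bytes. Returns false, copying nothing, if the range does
 * not fit inside the space.
 */
static bool toy_config_get(const void *space, size_t space_len,
                           size_t offset, void *out, size_t len)
{
    assert(space != NULL && out != NULL);
    if (offset > space_len || len > space_len - offset)
        return false;
    memcpy(out, (const uint8_t *)space + offset, len);
    return true;
}
```

A driver would only perform this read after seeing the corresponding feature bit, since without it the field's contents carry no meaning.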
These mechanisms give us room to grow in future, and for
hosts to add features to devices with the only requirement
being that the feature bit numbers and configuration space
layout be agreed upon.
There are also operations to set and get an 8-bit device
status word which the guest uses to indicate the status of
device probe; when the VIRTIO_CONFIG_S_DRIVER_OK bit is set,
it shows that the guest driver has completed feature probing.
At this point the host knows what features the driver understands
and wants to use.
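The status word can be pictured as a small set of flags that the guest ORs in as probing proceeds. In this sketch only the idea of a "driver OK" bit comes from the text; the other flag names, all numeric values and the toy_ helpers are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed flag values for this sketch. */
#define TOY_S_ACKNOWLEDGE 0x01u  /* guest has noticed the device */
#define TOY_S_DRIVER      0x02u  /* guest has a driver for it */
#define TOY_S_DRIVER_OK   0x04u  /* driver has finished feature probing */

struct toy_status_dev {
    uint8_t status;              /* the 8-bit status word */
};

static uint8_t toy_get_status(const struct toy_status_dev *dev)
{
    return dev->status;
}

/* Guest side: set additional status bits without clearing earlier ones. */
static void toy_add_status(struct toy_status_dev *dev, uint8_t bits)
{
    dev->status = (uint8_t)(dev->status | bits);
}

/* Host side: acked features are only final once DRIVER_OK is set. */
static bool toy_host_may_use_features(const struct toy_status_dev *dev)
{
    return (toy_get_status(dev) & TOY_S_DRIVER_OK) != 0;
}
```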
Finally, the reset operation is expected to reset the device,
its configuration and status bits. This is necessary for modular
drivers which may be removed and then re-added, thus
encountering a previously initialized device. It also avoids
the problem of removing buffers from a device on driver
shutdown: after reset the buffers can be freed in the sure
knowledge that the device won’t overwrite them. It could
also be used to attempt driver recovery in the guest.
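The shutdown ordering this implies (reset first, free buffers afterwards) can be sketched with a toy device model. All names and fields here are hypothetical; in particular the in_flight counter merely stands in for "buffers the device might still write to".

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical device state visible to the driver. */
struct toy_device {
    uint32_t acked_features;
    uint8_t status;
    unsigned int in_flight;      /* buffers the device may still touch */
};

/* Hypothetical driver state: the buffers it has posted to the device. */
struct toy_driver {
    struct toy_device *dev;
    void *bufs[4];
    unsigned int num_bufs;
};

/* Reset returns the device to its initial state and drops all buffers. */
static void toy_reset(struct toy_device *dev)
{
    dev->acked_features = 0;
    dev->status = 0;
    dev->in_flight = 0;
}

/* Driver removal: reset first, so that freeing the buffers is safe. */
static unsigned int toy_driver_remove(struct toy_driver *drv)
{
    unsigned int i, freed = 0;

    toy_reset(drv->dev);
    assert(drv->dev->in_flight == 0);   /* device can no longer write */
    for (i = 0; i < drv->num_bufs; i++) {
        free(drv->bufs[i]);
        drv->bufs[i] = NULL;
        freed++;
    }
    drv->num_bufs = 0;
    return freed;
}
```

After toy_driver_remove returns, a freshly loaded driver would see the same zeroed state as on first boot, which is the situation the reset operation is meant to guarantee.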
3.1 Virtqueues: A Transport Abstraction
Our configuration API is important, but the performance-
critical part of the API is the actual I/O mechanism. Our
abstraction for this is a virtqueue: the configuration oper-
ations have a find_vq which returns a populated structure
for the queue, given the virtio device and an index number.
Some devices have only one queue, such as the virtio block
device, but others such as networking and console devices
have a queue for input and one for output.
A virtqueue is simply a queue into which buffers are posted
by the guest for consumption by the host. Each buffer is
a scatter-gather array consisting of readable and writable
parts: the structure of the data is dependent on the device
type. The virtqueue operations structure looks like so:
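struct virtqueue_ops {
    /* Post a buffer: sg[] holds out_num readable entries followed by
     * in_num writable ones; data is the token handed back by get_buf. */
    int (*add_buf)(struct virtqueue *vq,
                   struct scatterlist sg[],
                   unsigned int out_num,
                   unsigned int in_num,
                   void *data);
    /* Notify the other side that buffers have been added. */
    void (*kick)(struct virtqueue *vq);
    /* Return the token of a used buffer (NULL if none); *len is set
     * to the number of bytes written into it. */
    void *(*get_buf)(struct virtqueue *vq,
                     unsigned int *len);
    /* Suppress, then re-enable, callbacks for used buffers. */
    void (*disable_cb)(struct virtqueue *vq);
    bool (*enable_cb)(struct virtqueue *vq);
};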
The add_buf call is used to add a new buffer to the queue;
the data argument is a driver-supplied non-NULL token which
is returned when the buffer has been consumed. The kick
call notifies the other side (i.e., the host) when buffers have