Introduction to Xen Virtual Machine

Posted 2007-12-17 23:54:38 / Category: Virtualization

[http://www.linuxjournal.com/article/8540]
September 1st, 2005 by Rami Rosen in Linux Journal
 
Everyone's talking about Xen, but the code is complex. Here's a starting point.

This article is intended mainly for developers who are new to Xen and who want to know more about it. The first two sections, however, are general and do not deal with code.

The Xen VMM (virtual machine monitor) is an open-source project that is being developed in the computer laboratory of the University of Cambridge, UK. It enables us to create many virtual machines, each of which runs an instance of an operating system.

These guest operating systems can be a patched Linux kernel, version 2.4 or 2.6, or a patched NetBSD/FreeBSD kernel. User applications can run on guest OSes as they are, without any change in code. Sun also is working on a Solaris-on-Xen port.

I have been following the Xen project closely for more than a year. My interest in Xen began after I read about it in the OLS (Ottawa Linux Symposium) 2004 proceedings. It increased after hearing an interesting lecture on the subject at a local UNIX group meeting.

Full virtualization has been done with some hardware emulators; one of the popular open-source projects is the Bochs IA-32 Emulator. Another known project is qemu. The disadvantage of hardware emulators is their performance.

The idea behind the Xen Project (para-virtualization) is not new. The performance metrics and the high efficiency it achieves, however, can be seen as a breakthrough. The overhead of running Xen is very small indeed, about 3%.

As noted at the beginning, Xen currently requires a patched guest kernel. Future processors, however, will support virtualization in hardware, so that an unpatched kernel can run as a guest. For example, both Intel VT and AMD Pacifica processors will include such support.

In August 2005, XenSource, a commercial company that develops virtualization solutions based on Xen, announced at the Intel Developer Forum (IDF) that it had used Intel VT-enabled platforms with Xen to virtualize both Linux and Microsoft Windows XP SP2.

Xen with Intel VT or Xen with AMD Pacifica would be competitive with if not superior to other virtualization methods, as well as to native operation.

In the same arena, VMware is a commercial company that develops the ESX server, a virtualization solution not based on Xen. VMware announced in early August 2005 that it will be providing its partners with access to VMware ESX Server source code and interfaces under a new program called VMware Community Source.

A clear advantage of VMware is that it does not require a patch on the guest OS. The VMware solution also enables the guest OS to be Windows. The VMware solution is probably slower than Xen, though, because it uses shadow page tables whereas Xen uses both direct and shadow page tables.

Xen already is bundled in some distributions, including Fedora Core 4, Debian and SuSE Professional 9.3, and it will be included in RHEL5. The Fedora Project has RPMs for installing Xen, and other Linux distros have prepared installation packages for Xen as well.

In addition, there is a port of Xen to IA-64. Plus, an interesting Master's Thesis already has been written on the topic, "HPC Virtualization with Xen on Itanium".

Support for other processors is in progress. The Xen team is working on an x86_64 port, while IBM is working on Power5 support.

The Xen Web site has some versions available for download, both the 2.0.* version and the xen-unstable version, also termed xen-3.0-devel. You also can use the Mercurial source code management system to download the latest version.

I installed xen-3.0-devel because, at the time, the 2.0.* version did not have the AGP support I needed. This may have changed since my installation. I found the installation process to be quite simple. You should run make world and make install, update the bootloader conf file and that's it: you're ready to boot into Xen. You should follow the instructions in the user manual for best results.

The Return of the Ring

The protection model of the Intel x86 CPU is built from four rings: ring 0 is for the OS and ring 3 is for user applications. Rings 1 and 2 are not used except in rare cases, such as OS/2; see the IA-32 Intel Architecture Software Developer's Manual, Volume 1: Basic Architecture, section 4.5 (privilege levels).

In Xen, a "hypervisor" runs in ring 0, while guest OSes run in ring 1 and applications run in ring 3. The x86_64 architecture is a little different in this respect: both guest kernel and applications run in ring 3 (see Xen 3.0 and the Art of Virtualization, section 4.1, in the OLS 2005 proceedings).

Xen itself is called a hypervisor because it operates at a higher privilege level than the supervisor code of the guest operating systems that it hosts.

At boot time, Xen is loaded into memory in ring 0. It starts a patched kernel in ring 1; this is called domain 0. From this domain you can create other domains, destroy them, perform migrations of domains, set parameters to a domain and more. The domains you create also run their kernels in ring 1. User applications run in ring 3. See Figure 1, illustrating the x86 protection rings in Xen.

Figure 1
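
The ring assignments described above can be written down as a small lookup table. The following is only a toy model in Python; the names RING_LAYOUT and more_privileged are made up for illustration and do not appear in the Xen source.

```python
# Toy model of the privilege-ring layout described in the text.
# Lower ring number means higher privilege.
RING_LAYOUT = {
    "x86_32": {"hypervisor": 0, "guest_kernel": 1, "application": 3},
    "x86_64": {"hypervisor": 0, "guest_kernel": 3, "application": 3},
}


def more_privileged(arch, a, b):
    """Return True if component a runs in a strictly more privileged ring than b."""
    layout = RING_LAYOUT[arch]
    return layout[a] < layout[b]


if __name__ == "__main__":
    for arch, layout in RING_LAYOUT.items():
        print(arch, layout)
```

On 32-bit x86 the guest kernel is separated from its applications by the ring number; on x86_64 both share ring 3, so the table alone shows that a different protection mechanism is needed there.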

Currently, domain 0 can be a patched 2.4 or 2.6 Linux kernel. According to the Xen developer mailing list, however, it seems that in the future, domain 0 will support only a 2.6 kernel patch. Much of the work of building domain 0 is done in the construct_dom0() method, in xen/arch/x86/domain_build.c.

The physical device drivers run only in the privileged domain, domain 0. Xen relies on Linux or another patched OS kernel to provide virtually all of its device support. The advantage of this is it liberates the Xen development team from having to write its own device drivers.

Using Xen on a processor that has a tagged TLB improves performance. A tagged TLB enables attaching an address space identifier (ASID) to the TLB entries. With this feature, there is no need to flush the TLB when the processor switches between the hypervisor and the guest OSes, and this reduces the cost of memory operations.
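
To see why the ASID tag removes the need for flushing, here is a toy simulation in Python. It is a sketch only: ToyTLB and demo are hypothetical names, and a real TLB is a hardware structure with limited capacity and replacement policies that this model ignores.

```python
class ToyTLB:
    """Toy translation cache; with tagged=True each entry carries an ASID."""

    def __init__(self, tagged):
        self.tagged = tagged
        self.entries = {}       # (tag, virtual_page) -> physical_frame
        self.current_asid = 0   # ASID 0 stands for the hypervisor here
        self.flushes = 0

    def _key(self, vpage):
        # An untagged TLB cannot tell address spaces apart, so the tag is None.
        return (self.current_asid if self.tagged else None, vpage)

    def switch_to(self, asid):
        # Without tags, stale entries must be thrown away on every switch.
        if asid != self.current_asid and not self.tagged:
            self.entries.clear()
            self.flushes += 1
        self.current_asid = asid

    def insert(self, vpage, frame):
        self.entries[self._key(vpage)] = frame

    def lookup(self, vpage):
        return self.entries.get(self._key(vpage))


def demo(tagged):
    """Guest 1 caches a mapping, traps into the hypervisor, then resumes."""
    tlb = ToyTLB(tagged)
    tlb.switch_to(1)
    tlb.insert(0x10, 0xA0)
    tlb.switch_to(0)
    tlb.switch_to(1)
    return tlb.lookup(0x10), tlb.flushes
```

With tagged=True the guest's mapping survives the round trip through the hypervisor and no flush is counted; with tagged=False every switch flushes and the lookup misses.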

Some manufacturers offer this tagged TLB feature. For example, a document titled "AMD64 Virtualization Codenamed 'Pacifica' Technology Secure Virtual Machine Architecture Reference Manual" was published in May 2005. According to it, this architecture uses a tagged TLB.

Next up is an overview of the Xend and XCS layers. These are the management layers that enable users to manage and control both the domains and Xen. Following that is a discussion of the communication mechanism between domains and of virtual devices. The Xen Project source code is quite complex, and I hope this may be a starting point for delving into it.

The Xend Daemon

First, what is the Xend daemon? It is the Xen controller daemon, meaning it handles creating new domains, destroying extant domains, migration and many other domain management tasks. A large part of its activity is based on running an HTTP server. The default port of the HTTP socket is 8000, which can be configured. Various requests for controlling the domains are handled by sending HTTP requests for domain creation, domain shutdown, domain save and restore, live migration and more. A large part of the Xend code is written in Python, and it also uses calls to C methods from within Python scripts.
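
The general pattern of a management daemon that answers control requests over HTTP can be sketched with the Python standard library. Everything specific here is an assumption made for illustration: the /domains path, the JSON reply and the DOMAINS table are invented and are not Xend's actual URL scheme or wire format.

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory state; a real daemon would ask the hypervisor.
DOMAINS = {0: "Domain-0"}


class ToyControlHandler(BaseHTTPRequestHandler):
    """Answers GET /domains with the known domain names (made-up URL)."""

    def do_GET(self):
        if self.path == "/domains":
            status, body = 200, json.dumps(sorted(DOMAINS.values())).encode()
        else:
            status, body = 404, b"[]"
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet


def query_domains(port=0):
    """Start the toy server, send one request to it, and return the reply."""
    # Port 0 lets the OS pick a free port; the article's default is 8000.
    server = HTTPServer(("127.0.0.1", port), ToyControlHandler)
    worker = threading.Thread(target=server.handle_request, daemon=True)
    worker.start()
    conn = HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
    conn.request("GET", "/domains")
    reply = json.loads(conn.getresponse().read())
    conn.close()
    worker.join(timeout=5)
    server.server_close()
    return reply
```

Calling query_domains() exercises one full request and response cycle against the toy server and returns the decoded list of domain names.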

We start the Xend daemon by running from the command line, after booting into Xen, xend start. What exactly does this command involve? First, Xend requires Python 2.3 to support its logging functions.

The work of the Xend daemon is based on interaction with an XCS server, the control switch. So, when we start the Xend daemon, we check to see if the XCS is up and running. If it is not, we try to start XCS. This step is discussed more fully later in this article.

The SrvDaemon is, in fact, the Xend main program; starting the Xend daemon creates an instance of the SrvDaemon class (tools/python/xen/xend/server/SrvDaemon.py). Two log files are created here, /var/log/xend.log and /var/log/xend-debug.log.

We next create a Channel Factory in the createFactories() method. The Channel Factory has a notifier object embedded inside. Much of the work of the Xend daemon is based on messages received by this notifier. This factory creates a thread that reads the notifier in an endless loop. The notifier delegates the read request to the XCS server; see xu_notifier_read() in xen/lowlevel/xu.c. This method sends the read request to the XCS server by calling xcs_data_read().
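
The reader-thread pattern described above can be sketched as follows. This is not the Xend code: ToyNotifier and start_reader are hypothetical names, a queue.Queue stands in for the XCS data connection, and the None sentinel exists only so the sketch can terminate (the loop in the text is endless).

```python
import queue
import threading


class ToyNotifier:
    """Stands in for the notifier; a queue replaces the XCS data connection."""

    def __init__(self):
        self._messages = queue.Queue()

    def post(self, msg):
        self._messages.put(msg)

    def read(self):
        # Blocks until a message is available, like a blocking socket read.
        return self._messages.get()


def start_reader(notifier, handler):
    """Start a thread that reads the notifier in a loop and dispatches messages."""

    def loop():
        while True:
            msg = notifier.read()
            if msg is None:  # sentinel used only to stop this sketch
                break
            handler(msg)

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread


if __name__ == "__main__":
    received = []
    notifier = ToyNotifier()
    reader = start_reader(notifier, received.append)
    for message in ("domain-created", "device-attached", None):
        notifier.post(message)
    reader.join(timeout=5)
    print(received)
```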

Creating a Domain

The creation of a domain is accomplished by using a hypercall (DOM0_CREATEDOMAIN). What is a hypercall? In the Linux kernel, a system call is the mechanism by which user-space code calls a method in the kernel; this is done by an interrupt (Int 0x80). In Xen, the analogous call is a hypervisor call, through which domain 0 calls a method in the hypervisor. This also is accomplished by an interrupt (Int 0x82). The hypervisor accesses each domain by its virtual CPU, struct vcpu in include/xen/sched.h.
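
Both system calls and hypercalls boil down to a numbered dispatch table on the privileged side of the trap. The Python analogy below illustrates only that idea; the call number 1, the create_domain handler and the -ENOSYS return for unknown numbers are assumptions for the sketch, not Xen's real hypercall numbering or ABI.

```python
import errno

HYPERCALL_TABLE = {}  # call number -> handler, filled in by the decorator
_domains = []         # toy stand-in for the hypervisor's domain list


def hypercall(number):
    """Register the decorated function under the given call number."""

    def register(func):
        HYPERCALL_TABLE[number] = func
        return func

    return register


@hypercall(1)  # hypothetical number, chosen only for this sketch
def create_domain(name):
    _domains.append(name)
    return len(_domains)  # toy domain id; domain 0 is assumed to exist already


def do_hypercall(number, *args):
    """What the trap handler would do: look up the number and call the handler."""
    handler = HYPERCALL_TABLE.get(number)
    if handler is None:
        return -errno.ENOSYS  # assumed convention for an unknown call
    return handler(*args)
```

A caller would write do_hypercall(1, "guest") in place of the software interrupt; the point is that the caller names an operation by number and never jumps to privileged code directly.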

The XendDomain class and the XendDomainInfo class play a significant part in creating and destroying domains. The domain_create() method in the XendDomain class is called when we create a new domain; it starts the process of creating a domain.

The XendDomainInfo class and its methods are responsible for the actual construction of a domain. The construction process includes setting up the devices in the new domain. This involves a lot of messaging between the front end device drivers in the domain and the back end device drivers in the back end domain. We talk about the back end and front end device drivers later.

The XCS Server

The XCS server opens two TCP sockets, the control connection and the data connection. The difference between them is that the control connection is synchronous while the data connection is asynchronous. The notifier object mentioned earlier, for example, is a client of the XCS server.

A connection to the XCS server is represented by an object of type connection_t. After a connection is bound, it is added to a list of connections, connection_list, which is iterated every five seconds to see whether new control or data messages arrived. Messages, which can be control or data messages, are handled by handle_control_message() or by handle_data_message(), respectively.
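
The polling and dispatch step described above might look roughly like this. XCS itself is written in C; this Python sketch borrows the names connection_list, handle_control_message and handle_data_message from the text, while the Connection class, its fields and the log argument are invented for illustration.

```python
from collections import deque


class Connection:
    """Toy counterpart of connection_t: a kind plus a queue of pending messages."""

    def __init__(self, kind):
        self.kind = kind      # "control" or "data"
        self.inbox = deque()


def handle_control_message(msg, log):
    log.append(("control", msg))


def handle_data_message(msg, log):
    log.append(("data", msg))


def poll_connections(connection_list, log):
    """One pass over the list; the text says XCS repeats this every five seconds."""
    for conn in connection_list:
        while conn.inbox:
            msg = conn.inbox.popleft()
            if conn.kind == "control":
                handle_control_message(msg, log)
            else:
                handle_data_message(msg, log)


if __name__ == "__main__":
    ctrl, data = Connection("control"), Connection("data")
    ctrl.inbox.append("bind")
    data.inbox.extend(["packet-1", "packet-2"])
    log = []
    poll_connections([ctrl, data], log)
    print(log)
```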

Creating Virtual Devices When Creating a Domain

The create() method in XendDomainInfo starts a chain of actions to create a domain. The virtual devices of the domain are created first. The create() method calls create_blkif() to create a block device interface (blkif); this is a must even if the VM doesn't use a disk. The other virtual devices are created by create_configured_devices(), which eventually calls the createDevice() method of the DevController class (see controller.py). This method calls the newDevice() method of the corresponding class. All the device classes inherit from Dev, which is an abstract class representing a device attached to a device controller. Its attach() abstract (empty) method is implemented in each subclass of the Dev class; this method attaches the device to its front end and back end. Figure 2 shows the devices hierarchy, and Figure 3 shows the device controller hierarchy.

Figure 2

Figure 3
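
The relationship between a device controller and its devices can be sketched in a few lines of Python. The names Dev, DevController, createDevice(), newDevice() and attach() come from the text; the constructor arguments, the device_class attribute and the two Toy subclasses are assumptions added for the sketch and may not match controller.py.

```python
from abc import ABC, abstractmethod


class Dev(ABC):
    """Abstract device attached to a device controller."""

    def __init__(self, controller, config):
        self.controller = controller
        self.config = config
        self.attached = False

    @abstractmethod
    def attach(self):
        """Attach the device to its front end and back end."""


class ToyBlkDev(Dev):
    """Hypothetical concrete device; a real one would message the back end."""

    def attach(self):
        self.attached = True


class DevController:
    device_class = Dev  # subclasses name the concrete device type

    def __init__(self):
        self.devices = []

    def newDevice(self, config):
        return self.device_class(self, config)

    def createDevice(self, config):
        dev = self.newDevice(config)
        dev.attach()
        self.devices.append(dev)
        return dev


class ToyBlkifController(DevController):
    device_class = ToyBlkDev
```

Because attach() is abstract, calling createDevice() on the base DevController raises TypeError, whereas ToyBlkifController().createDevice({"uname": "phy:hda1"}) returns an attached ToyBlkDev. The config dictionary is a made-up example.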

Domain 0 runs the back end drivers, and the newly created domain runs the front end drivers. A lot of messages pass between the back end and front end drivers. The front end driver is a virtual driver in the sense that it does not use specific hardware details; the code resides in drivers/xen, in the sparse tree.

Event channels and shared-memory rings are the means of communication among domains. For example, in the case of the netfront device (netfront.c), which is the network card front end interface, np->tx and np->rx are the shared memory pages, one for the transmit buffer and one for the receive buffer. In send_interface_connect(), we tell the netback end to bring up the interface. The connect message travels through the event channel to the netif_connect() method of the back end, interface.c. The netif_connect() method calls get_vm_area(2*PAGE_SIZE, VM_IOREMAP). The get_vm_area() method searches in the kernel virtual mapping area for an area whose size equals two pages.

In the blkif case, which is the block device front end interface, blkif_connect() also calls get_vm_area(). In this case, however, it uses only one page of memory.
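
A shared-memory ring of the kind mentioned above is, at its core, a fixed array of slots with a producer index and a consumer index. The sketch below shows that idea in plain Python; ToyRing is a hypothetical name, the real rings live in pages shared between domains and are written in C, and the event-channel notification that tells the other side to look at the ring is left out.

```python
class ToyRing:
    """Single-producer, single-consumer ring with free-running indices."""

    def __init__(self, size):
        if size <= 0 or size & (size - 1):
            raise ValueError("size must be a power of two")
        self.size = size
        self.slots = [None] * size
        self.prod = 0  # total number of items ever produced
        self.cons = 0  # total number of items ever consumed

    def put(self, item):
        """Producer side: returns False when the ring is full."""
        if self.prod - self.cons == self.size:
            return False
        self.slots[self.prod & (self.size - 1)] = item
        self.prod += 1
        return True

    def get(self):
        """Consumer side: returns None when the ring is empty."""
        if self.cons == self.prod:
            return None
        item = self.slots[self.cons & (self.size - 1)]
        self.cons += 1
        return item
```

One ring per direction, as with np->tx and np->rx, lets each side be the only writer of its own index, which is why this layout needs no lock between producer and consumer.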

The interrupts associated with virtual devices are virtual interrupts. When you run cat /proc/interrupts from domainU, look at the interrupts with numbers higher than 256; they are labeled "Dynamic-irq".

How are IRQs redirected to the guest OS? The do_IRQ() method was changed to support IRQs for the guest OS. This method calls __do_IRQ_guest() if the IRQ is for the guest OS (see xen/arch/x86/irq.c). The __do_IRQ_guest() method uses the event channel mechanism to send the interrupt to the guest OS, through the send_guest_pirq() method in event_channel.c.
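
The control flow of that redirection can be modelled in a few lines. This is a Python toy, not a translation of irq.c: the function names echo do_IRQ(), __do_IRQ_guest() and send_guest_pirq() from the text, but ToyDomain, the guest_bindings table and the host_handlers dictionary are invented for the sketch.

```python
class ToyDomain:
    """A guest with a list standing in for its pending event-channel events."""

    def __init__(self, name):
        self.name = name
        self.pending_events = []


def send_guest_pirq(domain, irq):
    # In Xen this goes through the event channel; here we just record it.
    domain.pending_events.append(irq)


def do_irq_guest(irq, guest_bindings):
    """Forward the IRQ to every guest bound to it."""
    for domain in guest_bindings[irq]:
        send_guest_pirq(domain, irq)


def do_irq(irq, guest_bindings, host_handlers):
    """Route an IRQ either to bound guests or to a hypervisor-side handler."""
    if irq in guest_bindings:
        do_irq_guest(irq, guest_bindings)
        return "guest"
    handler = host_handlers.get(irq)
    if handler is not None:
        handler()
    return "host"
```

Binding IRQ 11 to a ToyDomain and calling do_irq(11, bindings, {}) leaves 11 in that domain's pending_events, which is the toy equivalent of the guest later seeing the interrupt as an event.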

Conclusion

The Xen Project is an interesting and promising project that has received increasing notice over the past year. The code is complex, especially the virtual memory management, the live migration implementation and the grant tables mechanism. This article is an introductory article, however, and does not deal with these topics. I hope, though, that it has provided a starting point to those who want to learn more and delve into the code.

Note: This article refers to Xen-unstable, xen-3.0-devel, which is the basis for Xen-3.0, which should be released soon. The kernel referred to for dom0/domU is a 2.6.* kernel. Whenever the term class is used, it refers to a Python class.

Rami Rosen is a Computer Science graduate of Technion, the Israel Institute of Technology, located in Haifa. He works as a Linux kernel programmer for a networking start-up, and he can be reached at ramirose@gmail.com. In his spare time he likes running, solving cryptic puzzles and convincing and helping everyone he knows to move to this wonderful operating system, Linux.
