# How to create a custom Ubuntu image

This guide describes how to modify an official Ubuntu cloud image so that it
includes the additional utilities needed by the integration tests.

## Create the image

The following steps extend an official Ubuntu image. They can be applied to
other distributions as well, with a few changes regarding package management.

### Get latest Ubuntu cloud image

```bash
wget https://cloud-images.ubuntu.com/focal/current/focal-server-cloudimg-amd64.img
```

### Check the file format is QCOW2

```bash
file focal-server-cloudimg-amd64.img
focal-server-cloudimg-amd64.img: QEMU QCOW2 Image (v2), 2361393152 bytes
```
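
If `file` is not available, the same check can be done by looking at the first four bytes of the image, since QCOW2 files start with the magic bytes `QFI\xfb`. The following is a minimal sketch of that idea; the function name `is_qcow2` is made up for illustration.

```bash
# Return success if the file starts with the QCOW2 magic bytes "QFI\xfb".
# $1: path to the image file.
is_qcow2() {
    # Dump the first 4 bytes as hex and strip spaces and newlines.
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    [ "$magic" = "514649fb" ]
}
```

For example, `is_qcow2 focal-server-cloudimg-amd64.img && echo "QCOW2 image"`. Running `qemu-img info` on the image is another way to see the detected format.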

### Convert QCOW2 into RAW

```bash
qemu-img convert -p -f qcow2 -O raw focal-server-cloudimg-amd64.img focal-server-cloudimg-amd64.raw
```

### Identify the Linux partition

The goal is to mount the image rootfs so that it can be modified as needed.
That's why we need to identify where the Linux filesystem partition is located
in the image.

```bash
sudo fdisk -l focal-server-cloudimg-amd64.raw
Disk focal-server-cloudimg-amd64.raw: 2.2 GiB, 2361393152 bytes, 4612096 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: A1171ABA-2BEA-4218-A467-1B2B607E5953

Device                             Start     End Sectors  Size Type
focal-server-cloudimg-amd64.raw1  227328 4612062 4384735  2.1G Linux filesystem
focal-server-cloudimg-amd64.raw14   2048   10239    8192    4M BIOS boot
focal-server-cloudimg-amd64.raw15  10240  227327  217088  106M EFI System

Partition table entries are not in disk order.
```

### Mount the Linux partition

The mount offset is the start sector of the Linux filesystem partition (227328)
multiplied by the sector size (512 bytes).

```bash
mkdir -p /mnt
sudo mount -o loop,offset=$((227328 * 512)) focal-server-cloudimg-amd64.raw /mnt
```
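
Rather than copying the start sector by hand, the offset can be derived from the `fdisk` listing. The sketch below assumes 512-byte sectors and a listing in the format shown above; the helper name `linux_partition_offset` is hypothetical.

```bash
# Read an `fdisk -l` listing on stdin and print the byte offset of the first
# partition whose type is "Linux filesystem". Assumes 512-byte sectors.
linux_partition_offset() {
    # The second column of the partition line is the start sector.
    start=$(awk '/Linux filesystem/ { print $2; exit }')
    [ -n "$start" ] || return 1
    echo $((start * 512))
}
```

A possible use is `offset=$(sudo fdisk -l focal-server-cloudimg-amd64.raw | linux_partition_offset)`, followed by `sudo mount -o loop,offset=$offset focal-server-cloudimg-amd64.raw /mnt`.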

### Set up DNS

The next step changes the root directory to the rootfs contained in the cloud
image. For DNS to work inside the new root directory, first bind-mount the host
`/etc/resolv.conf` onto the mounted Linux partition of the cloud image.

```bash
sudo mount -o bind /etc/resolv.conf /mnt/etc/resolv.conf
```

### Change root directory

Changing the root directory allows us to install new packages into the rootfs
contained in the cloud image.

```bash
sudo chroot /mnt
mount -t proc proc /proc
mount -t devpts devpts /dev/pts
```

### Install needed packages

In the context of Cloud Hypervisor's integration tests, we need several
utilities. Here is how to install them on an Ubuntu image. This step is
specific to Ubuntu distributions.

```bash
apt update
apt install fio iperf iperf3 socat stress cpuid tpm2-tools
```

### Remove counterproductive packages

* snapd:

Removing this package prevents snapd from trying to mount a squashfs filesystem
when the kernel might not support it. This might be the case when the image is
used with direct kernel boot. This step is specific to Ubuntu distributions.

* pollinate:

This package can fail and prevent the SSH daemon from starting.
See #2113 for details.

```bash
apt remove --purge snapd pollinate
```

### Clean up the image

Leave no trace in the image before unmounting its content.

```bash
umount /dev/pts
umount /proc
history -c
exit
# Back on the host: the mounts were created with sudo, so unmount with sudo too.
sudo umount /mnt/etc/resolv.conf
sudo umount /mnt
```

### Rename the image

Renaming the image makes it clear that this is a modified image.

```bash
mv focal-server-cloudimg-amd64.raw focal-server-cloudimg-amd64-custom-$(date "+%Y%m%d")-0.raw
```

The `-0` suffix is the revision. It only needs to change if multiple images are
updated on the same day.
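
Since the same dated name is needed again when converting back to QCOW2, it can help to compute it once and reuse it, which also avoids a mismatch if the date changes between two commands. This is only a sketch; the helper `custom_image_name` and its arguments are assumptions, not part of the existing tooling.

```bash
# Print the custom image name, without extension.
# $1: base image name, $2: revision (default 0), $3: date stamp (default today).
custom_image_name() {
    base=$1
    revision=${2:-0}
    stamp=${3:-$(date "+%Y%m%d")}
    echo "${base}-custom-${stamp}-${revision}"
}
```

For instance, `name=$(custom_image_name focal-server-cloudimg-amd64)` followed by `mv focal-server-cloudimg-amd64.raw "$name.raw"` keeps the `.raw` and `.qcow2` file names consistent.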

### Create QCOW2 from RAW

The last step is to create the QCOW2 image back from the modified RAW image.

```bash
qemu-img convert -p -f raw -O qcow2 focal-server-cloudimg-amd64-custom-$(date "+%Y%m%d")-0.raw focal-server-cloudimg-amd64-custom-$(date "+%Y%m%d")-0.qcow2
```

## Switch CI to use the new image

### Upload to Azure storage

The next step is to update both images (QCOW2 and RAW) stored in the Azure
storage account, replacing them with the newly created ones. This makes the new
images available to the integration tests. This is usually done through the web
interface.

### Update integration tests

The last step is to update the integration tests to work with the new image.
The key point is to identify where the Linux filesystem partition is located,
as we might need to update the direct kernel boot command line, replacing
`/dev/vda1` with the appropriate partition number.

Update all references to the previous image name to the new one.
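
One way to find those references is a recursive search for the previous image name from the root of the source tree. The sketch below is illustrative only; the helper name `list_image_references` is hypothetical and the directory to search depends on your checkout.

```bash
# Print every file under a directory that mentions the given image name.
# $1: image name to look for (fixed string), $2: directory to search.
list_image_references() {
    grep -rlF -- "$1" "$2"
}
```

Call it with the previous image name and `.` as the directory, then update each file it reports.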

## NVIDIA image for VFIO baremetal CI

This section describes how to create a cloud image that contains the NVIDIA
drivers needed by our VFIO baremetal CI.

### Download base image

We usually start from one of the custom cloud images we have previously
created, but a stock cloud image can be used as well.

```bash
wget https://ch-images.azureedge.net/jammy-server-cloudimg-amd64-custom-20230119-0.raw
mv jammy-server-cloudimg-amd64-custom-20230119-0.raw jammy-server-cloudimg-amd64-nvidia.raw
```

### Extend the image size

The NVIDIA drivers take up a lot of space, which is why we must resize the image
before proceeding any further.

```bash
qemu-img resize jammy-server-cloudimg-amd64-nvidia.raw 5G
```

### Resize the partition

We use `parted` for fixing the GPT after the image was resized, as well as for
resizing the `Linux` partition.

```bash
sudo parted jammy-server-cloudimg-amd64-nvidia.raw

(parted) print
Warning: Not all of the space available to jammy-server-cloudimg-amd64-nvidia.raw
appears to be used, you can fix the GPT to use all of the space (an extra 5873664
blocks) or continue with the current setting?
Fix/Ignore? Fix
Model:  (file)
Disk jammy-server-cloudimg-amd64-nvidia.raw: 5369MB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name  Flags
14      1049kB  5243kB  4194kB                     bios_grub
15      5243kB  116MB   111MB   fat32              boot, esp
 1      116MB   2361MB  2245MB  ext4

(parted) resizepart 1 5369MB
(parted) print
Model:  (file)
Disk jammy-server-cloudimg-amd64-nvidia.raw: 5369MB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name  Flags
14      1049kB  5243kB  4194kB                     bios_grub
15      5243kB  116MB   111MB   fat32              boot, esp
 1      116MB   5369MB  5252MB  ext4

(parted) quit
```
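
The "extra 5873664 blocks" reported by `parted` can be cross-checked with a little arithmetic. The sketch below assumes the image was 2361393152 bytes before the resize (which matches partition 1 ending at 2361MB), that `5G` means 5 GiB, and that blocks are 512-byte sectors; the function name `added_sectors` is made up for illustration.

```bash
# Print how many 512-byte sectors a resize from old_bytes to new_bytes adds.
# $1: old size in bytes, $2: new size in bytes.
added_sectors() {
    echo $(( ($2 - $1) / 512 ))
}
```

With those assumptions, `added_sectors 2361393152 $((5 * 1024 * 1024 * 1024))` gives the number of sectors that the GPT fix makes available.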

### Create a macvtap interface

Follow the [macvtap documentation](macvtap-bridge.md) to set up a macvtap
interface that provides your VM with proper connectivity.

### Boot the image

It is particularly important to boot with a `cloud-init` disk attached to the
VM, as it will automatically resize the Linux `ext4` filesystem to fill the
partition that we have previously resized.

```bash
./cloud-hypervisor \
	--kernel hypervisor-fw \
	--disk path=jammy-server-cloudimg-amd64-nvidia.raw path=/tmp/ubuntu-cloudinit.img \
	--cpus boot=4 \
	--memory size=4G \
	--net fd=3,mac=$mac 3<>"$tapdevice"
```
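
The command above expects `$mac` and `$tapdevice` to be set already. As a rough sketch, both can be read from sysfs for an existing macvtap interface, since the character device for a macvtap interface is `/dev/tapN`, where `N` is the interface index. The helper name `macvtap_params` and the interface name `macvtap0` used below are assumptions.

```bash
# Print "<tap device path> <MAC address>" for a macvtap interface.
# $1: interface name, $2: sysfs root (defaults to /sys).
macvtap_params() {
    sysfs=${2:-/sys}
    tap_index=$(cat "$sysfs/class/net/$1/ifindex") || return 1
    tap_mac=$(cat "$sysfs/class/net/$1/address") || return 1
    echo "/dev/tap$tap_index $tap_mac"
}
```

For example, `set -- $(macvtap_params macvtap0)` followed by `tapdevice=$1` and `mac=$2` prepares the two variables before launching the VM.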

### Bring up connectivity

If your network has a DHCP server, run the following from your VM:

```bash
sudo dhclient
```

Otherwise, assign an IP address manually (the addresses below depend on your
actual network) and set the DNS server IP address as well.

```bash
sudo ip addr add 192.168.2.10/24 dev ens4
sudo ip link set up dev ens4
sudo ip route add default via 192.168.2.1
sudo resolvectl dns ens4 8.8.8.8
```

#### Check connectivity and update the image

```bash
sudo apt update
sudo apt upgrade
```

### Install NVIDIA drivers

The following steps and commands are taken from the
[NVIDIA official documentation](https://docs.nvidia.com/datacenter/tesla/tesla-installation-notes/index.html#ubuntu-lts)
about Tesla compute cards.

```bash
distribution=$(. /etc/os-release;echo $ID$VERSION_ID | sed -e 's/\.//g')
wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt-key del 7fa2af80
sudo apt update
sudo apt -y install cuda-drivers
```
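
The `distribution` value is simply the `ID` and `VERSION_ID` fields of `/etc/os-release` joined together with the dot removed, so an Ubuntu 22.04 guest should yield `ubuntu2204`. The sketch below wraps the same logic in a function that takes the file path as an argument, which makes it easy to check before downloading anything; the name `repo_distribution` is hypothetical.

```bash
# Print the repository name component derived from an os-release file,
# for example "ubuntu2204" for ID=ubuntu and VERSION_ID="22.04".
# $1: path to the os-release file (defaults to /etc/os-release).
repo_distribution() {
    (
        # Source the file in a subshell so its variables do not leak out.
        . "${1:-/etc/os-release}"
        echo "$ID$VERSION_ID" | sed -e 's/\.//g'
    )
}
```

Running `repo_distribution` without arguments inside the VM prints the value that ends up in the download URL.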

### Check the `nvidia-smi` tool

Quickly validate that you can find and run the `nvidia-smi` command from your
VM. At this point it should fail, since no NVIDIA card has been passed through
to the VM and therefore no NVIDIA driver is loaded.
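
A minimal way to separate "the tool is installed" from "the tool runs successfully" is to look it up on `PATH` first. The helper below is a generic sketch, and its name `command_available` is made up for illustration.

```bash
# Return success if the named command can be found on PATH.
command_available() {
    command -v "$1" >/dev/null 2>&1
}
```

For example, `command_available nvidia-smi && echo "nvidia-smi is installed"` confirms the installation, after which running `nvidia-smi` itself is expected to report an error until a card is passed through.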

### Work around the LA57 reboot issue

Add `reboot=a` to `GRUB_CMDLINE_LINUX` in `/etc/default/grub` so that the VM
uses the ACPI reboot type. This resolves a reboot issue when running on systems
with 5-level paging.

```bash
sudo vim /etc/default/grub
sudo update-grub
sudo reboot
```
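
If you prefer not to edit the file interactively, the change can be scripted. The following is a sketch only: it assumes the file contains a double-quoted `GRUB_CMDLINE_LINUX="..."` line, and the function name `add_reboot_acpi` is hypothetical.

```bash
# Append reboot=a to the GRUB_CMDLINE_LINUX line of a grub defaults file.
# $1: path to the file (for example /etc/default/grub).
add_reboot_acpi() {
    grub_file=$1
    # Nothing to do if the option is already on the GRUB_CMDLINE_LINUX line.
    if grep -q '^GRUB_CMDLINE_LINUX=".*reboot=a' "$grub_file"; then
        return 0
    fi
    # Insert " reboot=a" just before the closing quote, leaving
    # GRUB_CMDLINE_LINUX_DEFAULT and every other line untouched.
    sed -e 's/^\(GRUB_CMDLINE_LINUX=".*\)"$/\1 reboot=a"/' "$grub_file" > "$grub_file.new" &&
        mv "$grub_file.new" "$grub_file"
}
```

The function has to run as root when pointed at `/etc/default/grub`, and `sudo update-grub` is still needed afterwards.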

### Remove previous logins

Since our integration tests rely on past logins to count the number of reboots,
we must make sure the list is cleared.

```bash
# Run these from a root shell: the redirections are performed by the current
# shell, and these files are normally writable only by root.
>/var/log/lastlog
>/var/log/wtmp
>/var/log/btmp
```

### Clear history

```bash
history -c
rm /home/cloud/.bash_history
```

### Reset cloud-init

This is mandatory, as we want `cloud-init` provisioning to work again when a new
VM is booted with this image.

```bash
sudo cloud-init clean
```
330