# How to create a custom Ubuntu image

This quick guide details how to modify an official Ubuntu cloud image, in this
case to add the utilities needed by the integration tests.

## Create the image

Let's go through the steps to extend an official Ubuntu image. These steps can
be applied to other distributions (with a few changes regarding package
management).

### Get latest Ubuntu cloud image

```bash
wget https://cloud-images.ubuntu.com/focal/current/focal-server-cloudimg-amd64.img
```

### Check the file format is QCOW2

```bash
file focal-server-cloudimg-amd64.img
focal-server-cloudimg-amd64.img: QEMU QCOW2 Image (v2), 2361393152 bytes
```

### Convert QCOW2 into RAW

```bash
qemu-img convert -p -f qcow2 -O raw focal-server-cloudimg-amd64.img focal-server-cloudimg-amd64.raw
```

### Identify the Linux partition

The goal is to mount the image rootfs so that it can be modified as needed.
To do so, we need to identify where the Linux filesystem partition is located
in the image.

```bash
sudo fdisk -l focal-server-cloudimg-amd64.raw
Disk focal-server-cloudimg-amd64.raw: 2.2 GiB, 2361393152 bytes, 4612096 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: A1171ABA-2BEA-4218-A467-1B2B607E5953

Device                              Start     End Sectors  Size Type
focal-server-cloudimg-amd64.raw1   227328 4612062 4384735  2.1G Linux filesystem
focal-server-cloudimg-amd64.raw14    2048   10239    8192    4M BIOS boot
focal-server-cloudimg-amd64.raw15   10240  227327  217088  106M EFI System

Partition table entries are not in disk order.
```

### Mount the Linux partition

The mount offset is the start sector of the Linux filesystem partition
multiplied by the sector size (512 bytes).

```bash
mkdir -p /mnt
sudo mount -o loop,offset=$((227328 * 512)) focal-server-cloudimg-amd64.raw /mnt
```

### Set up DNS

The next step changes the root directory to the rootfs contained in the cloud
image. For DNS to work inside that root directory, first bind-mount the host
`/etc/resolv.conf` onto the mounted Linux partition of the cloud image.

```bash
sudo mount -o bind /etc/resolv.conf /mnt/etc/resolv.conf
```

### Change root directory

Changing the root directory allows us to install new packages into the rootfs
contained in the cloud image.

```bash
sudo chroot /mnt
mount -t proc proc /proc
mount -t devpts devpts /dev/pts
```

### Install needed packages

In the context of Cloud Hypervisor's integration tests, we need several
utilities. Here is how to install them on an Ubuntu image. This step is
specific to Ubuntu distributions.

```bash
apt update
apt install fio iperf iperf3 socat stress cpuid tpm2-tools
```

### Remove counterproductive packages

* snapd:

Removing this package prevents snapd from trying to mount a squashfs filesystem
when the kernel might not support it. This might be the case when the image is
used with direct kernel boot. This step is specific to Ubuntu distributions.

* pollinate:

Remove this package, which can fail and prevent the SSH daemon from starting.
See #2113 for details.

```bash
apt remove --purge snapd pollinate
```

### Clean up the image

Leave no trace in the image before unmounting its content.

```bash
umount /dev/pts
umount /proc
history -c
exit
sudo umount /mnt/etc/resolv.conf
sudo umount /mnt
```

### Rename the image

Renaming the image makes it clear that it has been modified.

```bash
mv focal-server-cloudimg-amd64.raw focal-server-cloudimg-amd64-custom-$(date "+%Y%m%d")-0.raw
```

The `-0` suffix is the revision. It only needs to change if multiple images are
updated on the same day.

### Create QCOW2 from RAW

The last step is to convert the modified RAW image back to QCOW2.

```bash
qemu-img convert -p -f raw -O qcow2 focal-server-cloudimg-amd64-custom-$(date "+%Y%m%d")-0.raw focal-server-cloudimg-amd64-custom-$(date "+%Y%m%d")-0.qcow2
```

## Switch CI to use the new image

### Upload to Azure storage

The next step is to update both images (QCOW2 and RAW) stored in the Azure
storage account, replacing them with the newly created ones. This makes the new
images available to the integration tests. This is usually done through the web
interface.

### Update integration tests

The last step is to update the integration tests to work with the new image.
The key point is to identify where the Linux filesystem partition is located,
as we might need to update the direct kernel boot command line, replacing
`/dev/vda1` with the appropriate partition number.

Update all references to the previous image name to the new one.

## NVIDIA image for VFIO bare-metal CI

This section describes how to create a cloud image that contains the NVIDIA
drivers needed by our VFIO bare-metal CI.

### Download base image

We usually start from one of the custom cloud images we have previously
created, but a stock cloud image works as well.

```bash
wget https://ch-images.azureedge.net/jammy-server-cloudimg-amd64-custom-20230119-0.raw
mv jammy-server-cloudimg-amd64-custom-20230119-0.raw jammy-server-cloudimg-amd64-nvidia.raw
```

### Extend the image size

The NVIDIA drivers consume a lot of space, which is why we must resize the
image before proceeding any further.

```bash
qemu-img resize jammy-server-cloudimg-amd64-nvidia.raw 5G
```

### Resize the partition

We use `parted` to fix the GPT after the image was resized, as well as to
resize the `Linux` partition.

```bash
sudo parted jammy-server-cloudimg-amd64-nvidia.raw

(parted) print
Warning: Not all of the space available to jammy-server-cloudimg-amd64-nvidia.raw
appears to be used, you can fix the GPT to use all of the space (an extra 5873664
blocks) or continue with the current setting?
Fix/Ignore? Fix
Model:  (file)
Disk jammy-server-cloudimg-amd64-nvidia.raw: 5369MB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name  Flags
14      1049kB  5243kB  4194kB                     bios_grub
15      5243kB  116MB   111MB   fat32              boot, esp
 1      116MB   2361MB  2245MB  ext4

(parted) resizepart 1 5369MB
(parted) print
Model:  (file)
Disk jammy-server-cloudimg-amd64-nvidia.raw: 5369MB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name  Flags
14      1049kB  5243kB  4194kB                     bios_grub
15      5243kB  116MB   111MB   fat32              boot, esp
 1      116MB   5369MB  5252MB  ext4

(parted) quit
```

### Create a macvtap interface

Rely on the following [documentation](macvtap-bridge.md) to set up a
macvtap interface to provide your VM with proper connectivity.

### Boot the image

It is particularly important to boot with a `cloud-init` disk attached to the
VM, as it will automatically resize the Linux `ext4` filesystem to match the
partition that we have previously resized.

```bash
./cloud-hypervisor \
    --kernel hypervisor-fw \
    --disk path=jammy-server-cloudimg-amd64-nvidia.raw path=/tmp/ubuntu-cloudinit.img \
    --cpus boot=4 \
    --memory size=4G \
    --net fd=3,mac=$mac 3<>$"$tapdevice"
```

### Bring up connectivity

If your network has a DHCP server, run the following from your VM:

```bash
sudo dhclient
```

If that's not the case, assign an IP address manually (the IP addresses depend
on your actual network) and set the DNS server IP address as well.

```bash
sudo ip addr add 192.168.2.10/24 dev ens4
sudo ip link set up dev ens4
sudo ip route add default via 192.168.2.1
sudo resolvectl dns ens4 8.8.8.8
```

#### Check connectivity and update the image

```bash
sudo apt update
sudo apt upgrade
```

### Install NVIDIA drivers

The following steps and commands are taken from the
[NVIDIA official documentation](https://docs.nvidia.com/datacenter/tesla/tesla-installation-notes/index.html#ubuntu-lts)
about Tesla compute cards.

```bash
distribution=$(. /etc/os-release;echo $ID$VERSION_ID | sed -e 's/\.//g')
wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt-key del 7fa2af80
sudo apt update
sudo apt -y install cuda-drivers
```

### Check the `nvidia-smi` tool

Quickly validate that you can find and run the `nvidia-smi` command from your
VM. At this point it should fail, given that no NVIDIA card has been passed
through to the VM and therefore no NVIDIA driver is loaded.

### Work around the LA57 reboot issue

Add `reboot=a` to `GRUB_CMDLINE_LINUX` in `/etc/default/grub` so that the VM
is booted with the ACPI reboot type. This resolves a reboot issue when
running on 5-level paging systems.

```bash
sudo vim /etc/default/grub
sudo update-grub
sudo reboot
```

### Remove previous logins

Since our integration tests rely on past logins to count the number of reboots,
we must make sure to clear the list.

```bash
sudo truncate -s 0 /var/log/lastlog
sudo truncate -s 0 /var/log/wtmp
sudo truncate -s 0 /var/log/btmp
```

### Clear history

```bash
history -c
rm /home/cloud/.bash_history
```

### Reset cloud-init

This is mandatory, as we want `cloud-init` provisioning to run again when a new
VM is booted with this image.

```bash
sudo cloud-init clean
```
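
As a recap of the "Remove previous logins" step, here is a minimal sketch of a
helper that empties the login record files. The function name
`clear_login_records` and its optional root-directory argument are hypothetical
and not part of the original procedure. They are only there so the logic can be
tried against a scratch directory before touching a real image.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of the original guide): empty the login
# record files under an optional root directory.
# With no argument it acts on /var/log of the running system.
clear_login_records() {
    local root="${1:-}"
    local name
    for name in lastlog wtmp btmp; do
        # Only truncate files that already exist; never create new ones.
        if [ -e "$root/var/log/$name" ]; then
            : > "$root/var/log/$name"
        fi
    done
}
```

Inside the VM this would need to run as root with no argument. Passing a
directory such as a temporary copy of `/var/log`'s parent lets you check the
behaviour without modifying the system files.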