Saturday, October 28, 2017

So Amazon Develops Its Own NIC - AWS Enhanced Networking!

Ethernet adapter types available for EC2
AWS EC2 instances offer 3 types of network adapters (depending upon the instance type) - the default VIF, which offers low (~100 Mbps) to moderate (~300 Mbps) network throughput, and two adapters that support enhanced networking (~10 Gbps or greater) - the Intel 82599 Virtual Function adapter and the next-generation Elastic Network Adapter (ENA). Enhanced networking requires specialized H/W support (the above-mentioned physical network adapters installed on the EC2 instance host), so this feature is available only on specific EC2 instance types.
Enhanced Networking
The VIF adapter present in an EC2 instance is provided by the underlying virtualization layer, i.e. the Xen hypervisor in AWS. This adapter employs the usual network virtualization technique, which involves significant overheads because it relies on the traditional interrupt-based (IRQ) approach inherent in the PCIe NIC design.
In an attempt to support higher network throughput, Amazon Web Services introduced enhanced networking, for which it relied on PCIe NICs equipped with SR-IOV technology that allows a VM (EC2 instance) to bypass the underlying hypervisor (Xen) and use direct memory access (DMA) instead of interrupting the CPU (more about SR-IOV in the last section of this article).
Initially, Amazon offered 10 Gbps network throughput for a few EC2 instance types using Intel's 82599 VF PCIe NIC. In January 2015, Amazon bought the Israel-based chip maker Annapurna Labs; based upon the newly acquired company's flagship product, Alpine, Amazon launched its own new-generation PCIe NIC that supports up to 25 Gbps of network throughput. Amazon christened this NIC the Elastic Network Adapter (ENA). ENA is available only for specific EC2 instance types.
The following table, based on a similar table from the Amazon Knowledge Center page on Enhanced Networking, summarizes the network adapter types and the EC2 instance types they are available for as of today -

How do I enable enhanced networking on EC2 instances
Refer to the Amazon documentation on enabling Enhanced Networking for EC2 instances.
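As a quick sanity check (a sketch only - the instance ID below is a placeholder and a configured AWS CLI is assumed), you can verify which driver backs an instance's interface and whether the ENA attribute is set:

# On the instance - vif means plain Xen networking, ixgbevf or ena means enhanced networking
$ ethtool -i eth0 | grep driver

# From a machine with the AWS CLI - check and, if needed, set the ENA support attribute
# (the instance must be stopped before changing the attribute)
$ aws ec2 describe-instances --instance-ids i-0123456789abcdef0 --query "Reservations[].Instances[].EnaSupport"
$ aws ec2 modify-instance-attribute --instance-id i-0123456789abcdef0 --ena-support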

The technology behind Enhanced Networking - SR-IOV

Assigning a dedicated PCI network H/W adapter (or port) to each VM on a host can give line-rate throughput, but it is not feasible at scale; software-based sharing of an I/O device (I/O virtualization), on the other hand, imposes significant overheads and thus cannot fully exploit the capabilities of the physical device.
The Single Root - Input Output Virtualization (SR-IOV) specification, released by PCI-SIG in 2007, is one of the technologies used to achieve Network Function Virtualization (NFV). It gets its name "Single Root" from the PCIe Root Complex. SR-IOV enables the physical network adapter to be shared directly with the VMs, bypassing the hypervisor, as shown below.
SR-IOV architecture offers two function types -
  • Physical Functions (PFs) - A NIC feature for configuring and managing SR-IOV functionality on the device. It is exploited by the PF device driver that is part of the hypervisor.
  • Virtual Functions (VFs) - A PCIe function used by a VM to communicate directly with the physical NIC. The hypervisor, using the PF device driver, assigns VFs to VMs; each VM then uses a native virtual function device driver to talk to the NIC directly.
So when a data packet is received by the NIC, the classifier on the SR-IOV capable NIC - as configured by the PF device driver - places it in the appropriate queue mapped to the corresponding virtual function and its target VM.
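Outside of AWS, you can see the same PF/VF split on any SR-IOV capable NIC under Linux. A minimal sketch (assuming the adapter is enumerated as eth0 and its driver exposes the standard sysfs knob):

# Ask the PF driver to create 4 virtual functions (needs root and SR-IOV support in the NIC and BIOS)
$ echo 4 | sudo tee /sys/class/net/eth0/device/sriov_numvfs

# Each VF now appears as its own PCIe function that can be passed through to a VM
$ lspci | grep -i "virtual function"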
Question to ponder upon ...
Since SR-IOV lets a VM map directly to a PCI port, bypassing the hypervisor, how does Amazon achieve inter-EC2 network switching or implement Security Groups and ACLs?
---------------------------------------------------------------------------------------------
Resources:
Amazon's docs, blogs & videos:
  1. Read #9 "The importance of the network" @ All Things Distributed
  2. How do I enable and configure enhanced networking on my EC2 instances?
  3. Enhanced Networking - FAQs
  4. AWS re:Invent 2016: Optimizing Network Performance for Amazon EC2 Instances
  5. AWS re:Invent 2016: Optimizing Network Performance @ YouTube
  6. AWS re:Invent 2016: James Hamilton @ YouTube (between 23:00-36:00 minutes)
  7. Elastic Network Adapter – High Performance Network Interface for Amazon EC2
  8. Amazon EC2 Instance Types
Independent blogs & videos:
  1. How did they build that — EC2 Enhanced Networking?
  2. Single Root I/O Virtualization (SR-IOV) Primer @ RedHat
  3. An Introduction to SR-IOV Technology @ Intel
  4. Single Root I/O Virtualization (SR-IOV) @ VMWare
  5. Accelerating the NFV Data Plane: SR-IOV and DPDK… in my own words
  6. Red Hat Enterprise Linux OpenStack Platform 6: SR-IOV Networking – Part I: Understanding the Basics
  7. Kernel bypass
  8. Network function virtualization (NFV)
  9. PCI Express Architecture In a Nutshell
  10. Amazon buys secretive chip maker Annapurna Labs for $350 million
  11. The chip company Amazon bought for $350 million has a new product that could terrify Intel
  12. Annapurna Labs
  13. Intel VMDq Explanation by Patrick Kutch @ YouTube
  14. Intel SR-IOV Explanation by Patrick Kutch @ YouTube
  15. What is SR-IOV?

Saturday, October 7, 2017

Drooling Over Docker #2 - Understanding Union File Systems

To understand the composition of Container images better, let us take a detour and learn about Union File Systems first.
File Systems and Mounting in Unix/GNU Linux
In Unix (and GNU Linux), everything is a file - apart from regular data files, even system devices are exposed through the file system namespace. For example, a hard disk can be seen as a file named sda in the directory for device files, /dev, and can be accessed through its absolute path /dev/sda.
Even disk partitions are seen as device files within the rootfs, e.g. /dev/sda1 is the first primary partition of the disk represented by /dev/sda. Such a partition is further formatted with a file system (ext3, ext4, etc.) so that it can store files and directories within it.
Now, if we have a formatted partition /dev/sda1 that holds data in files organized within a few sub-directories, and we want to read from or write to those files, we have to attach this file system (on /dev/sda1) to some directory in the logical file system tree. This process is known as mounting a file system (onto an existing directory).
#mount -t ext4 /dev/sda1 /mnt
Union File System - what is that!?
The keyword here is UNION as in SET THEORY.
If you experiment and mount two separate file systems at the same mount point, one after the other, you'll only get to see the files from the file system that was mounted last.
Try out these commands in your Linux terminal and see for yourself -
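Here is a minimal sketch of that experiment using two small tmpfs file systems (the mount point and file names are just placeholders):

$ sudo mkdir -p /mnt/test
$ sudo mount -t tmpfs first /mnt/test && sudo touch /mnt/test/from-first
$ sudo mount -t tmpfs second /mnt/test && sudo touch /mnt/test/from-second
$ ls /mnt/test      # only from-second is visible; the first file system is hidden underneath
$ sudo umount /mnt/test
$ ls /mnt/test      # the first file system, with from-first, is visible again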
Unlike a plain mount, a union mount provides a UNION of both file systems mounted at the same mount point -
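As a sketch of the same idea with an actual union mount (shown here with OverlayFS, which ships in the mainline kernel; aufs syntax is similar but its module is not available everywhere, and the directory names are placeholders):

$ sudo mkdir -p /ro /rw /work /union
$ sudo mount -t overlay overlay -o lowerdir=/ro,upperdir=/rw,workdir=/work /union
$ ls /union         # shows the UNION of the files in /ro and /rw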
After these examples, consider that you have a read-only file system and you need to modify a certain file in it. Along the lines of the example above, a Union File System can help here. We can create another, read-write file system - either on disk or in RAM, as the case may be - and mount both file systems at another mount point using a Union File System. This mount point now gives access to all the files in both the ro and rw file systems. If you want to modify a file residing on the ro file system, the Union File System driver finds that file and performs a CoW (Copy on Write): it makes a copy of the file in the rw file system that overrides the copy on the ro file system, and this new copy is then updated with the new contents. Any new files, e.g. from a software installation, also go into the rw file system.
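Continuing the same sketch (still assuming the /ro, /rw and /union names): if a file, say file1, existed in the read-only branch before the union was mounted, writing to it through the union triggers the copy-up -

$ echo "new contents" | sudo tee /union/file1
$ ls /rw            # file1 has been copied up; the modified copy now lives in the rw branch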
Please check Link1 and Link2 for some very good examples on the points discussed so far. I have also given these links in the resources section at the end of the article.
A use case for Union File System - Knoppix
Again, what if the file system we are attempting to mount is read-only (e.g. from a CD-ROM) and we intend to change its contents by editing/removing existing files or adding new files to the file system? Is it possible?
Let us look at the example of Knoppix - a Live CD version of Linux that could boot your machine and allow you to work on your system. You could change system settings or download additional software while the Knoppix OS was running in memory, and even save these changes for subsequent runs.
The possibility of making Knoppix settings persistent made Knoppix a good portable desktop OS - all it required was a Live Knoppix CD and a USB drive. You could boot a system through Knoppix, load your saved settings from the USB drive, and get the same desktop environment on any machine.
With support from Union File Systems (Knoppix 3.9 brought in UnionFS, and later versions used aufs), Knoppix mounted multiple file systems on top of each other (in the logical space, as mounting happens in the logical space) - here, mounting a rw RAM disk on top of the ro CD file system, forming a UNION of the two.
If you write to a file that is in the ro area, the aufs driver copies it into the rw area and performs the write operation there. The next time you access the file, you get the modified version from the rw area (which hides the same-named file in the ro area). aufs does this work for you transparently - you keep working as if it were a writable system.
How Union File Systems help Docker Containers
Docker Containers bring you immutable (unchanging over time) software in the form of layers. At build time, you stack multiple such immutable layers of software to get the desired applications with their dependencies. Many a time a software layer overrides the functionality provided by the lower layers in the stack - this is only possible by implementing a Union File System. Also, at run time, you may download additional software within the container, say a newer version of some package; such a case triggers a CoW, and the updated copy is written to the rw layer of the stack. Any newly installed software also settles in this rw top layer. I'll discuss more about Docker layers in the next chapter...
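You can see this layering, and the union-type storage driver in use, on any Docker host; for instance (the image name is just an example):

$ docker history ubuntu:16.04            # one line per immutable layer in the image
$ docker info | grep "Storage Driver"    # aufs, overlay2, devicemapper, etc.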

Contd...
---------------------------------------------------------------------------------------------------------
  1. DON'T MISS THIS VIDEO FROM VMware about Containers
  2. Knoppix 3.8 and UnionFS. Wow. Just Wow. by Kyle Rankin
  3. Knoppix Hacks: Tips and Tools for Hacking, Repairing, and Enjoying Your PC - Hack#25
  4. Docker Storage: An Introduction
  5. Union file systems: Implementations, part I
  6. Digging into Docker layers
  7. Docker Container’s Filesystem Demystified
  8. Lightweight Virtualization LXC containers & AUFS
  9. Why to use AuFS instead of UnionFS
  10. Union Filesystem - FreeBSD
  11. Linux AuFS Examples: Another Union File System Tutorial
  12. Why does Docker need a Union File System
  13. Manage data in Docker
  14. Select a storage driver
  15. Use the AUFS storage driver
  16. Use the BTRFS storage driver
  17. Use the Device Mapper storage driver
  18. Use the OverlayFS storage driver
  19. Filesystems in LiveCD by Junjiro R. Okajima 
  20. AuFS2 - ReadMe
  21. AuFS4 - ReadMe
  22. AuFS - Ubuntu Man Page
  23. AUFS: How to create a read/write branch of only part of a directory tree?
  24. Unionfs: User- and Community-Oriented Development of a Unification File System
  25. UnionMount and Union-type Filesystem (Google Translated from Japanese)

Thursday, October 5, 2017

Drooling Over Docker  #1 - The Genesis of Containers

“I once heard that hypervisors are the living proof of operating system’s incompetence.” — Glauber Costa, LinuxCon Europe 2012
Effectively handling resource contention has been a primary challenge for Operating Systems.
Consider the following resource-allocation challenges:
  • One process demands more memory from the Operating System, which in turn pages out (read: punishes) memory used by some other process.
  • It is possible that one application, running a group of related processes, gets a better share of CPU cycles than another application running with a smaller number of processes.
  • A buggy process (or a fork bomb) mounts a denial-of-service attack that leaves kernel resources exhausted, thus impacting other, unrelated processes.
Hypervisors and Containers attempt to overcome these (and more) challenges by isolating process execution space and controlling system resources in their own ways, which we'll explore further in this Docker series; our focus here, however, will remain on Containers.

Genesis of Containers

A Container, as we know it today (Docker, rkt, etc.), is a group of processes running on the underlying Operating System in its own isolated process space. It can be configured with its own share of system resources, and it has its own file system that, though mounted on some directory of the host Operating System's file system, is seen by the processes running inside the container as a separate file-system tree with its own root directory. This lightweight execution unit carries its own copy of the software, shared libraries, and any other files required to run the software it is prepared to run.
Running containerized software only became possible because of the development of some underlying technologies in recent years.
Three such major technologies are -
#1 — CGroups — Controlling What You Can USE
  • In 2006, a team of engineers at Google developed this Linux Kernel feature.
  • Cgroups (Control Groups) feature was eventually introduced in Linux Kernel version 2.6.24 — released in January 2008.
  • It allows one to limit (apply quotas on), account for, and isolate the use of computing resources (CPU, memory, disk I/O, network, etc.).
  • Container management software makes use of cgroups for effective resource control over Containers (see the sketch after the diagram below).
The following diagram shows limited system resources (CPU, Memory, N/W, Storage, etc.) being distributed amongst separate Control Groups of processes.

The above picture is taken from Máirín Duffy's blog.
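As a minimal sketch of what container runtimes do under the hood (assuming a cgroup v1 hierarchy mounted at /sys/fs/cgroup, the default at the time of writing):

# Create a memory cgroup, cap it at 100 MB, and move the current shell into it
$ sudo mkdir /sys/fs/cgroup/memory/demo
$ echo 104857600 | sudo tee /sys/fs/cgroup/memory/demo/memory.limit_in_bytes
$ echo $$ | sudo tee /sys/fs/cgroup/memory/demo/cgroup.procs
# Every process started from this shell is now accounted against, and limited by, the 100 MB quota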
#2 — Namespaces — Controlling What You Can SEE
  • Initial version released in 2002 with Linux Kernel version 2.4.19.
  • Functionality usable by Containers was added with Linux Kernel version 3.8.
  • Using Namespaces, it became possible to isolate and virtualize resources for processes, e.g. separate process IDs, hostnames, user IDs, network access, IPC, and file systems.
  • Namespaces are a fundamental aspect of containers on Linux.
  • Namespaces provide the much-needed process virtualization space for Containers (a quick demonstration follows this list).
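A quick way to feel this isolation is the unshare utility from util-linux (a sketch; requires root):

$ sudo unshare --pid --fork --mount-proc bash   # start a shell in new PID and mount namespaces
# ps aux    (run inside the new shell: only bash and ps are visible, with bash as PID 1)
# exit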
#3 — Union File Systems - Managing Software Image for Containers
  • A file system that allows a collection of different file systems and directories (called branches) to be transparently overlaid (a UNION of them) into a single logical file system.
  • Union File Systems give Containers a way to manage their software as a layered image, mounting the layers by doing a UNION of the underlying file system branches.
Note — This concept deserves a separate detailed description and I would cover it in a chapter of its own.
CGroups and Namespaces, when used together, provide an isolated environment within a Linux system where the running processes can only see their own root directory, related processes, their own user IDs, and their own network interfaces (a shared namespace), and get a controlled share of CPU, memory, network, and I/O usage (a shared cgroup). These functionalities were combined with an easy-to-manage command line interface under the name LXC (Linux Containers).
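LXC exposes this combination through a small set of commands; a sketch (the container name and the download-template arguments are placeholders, and the template needs network access):

$ sudo lxc-create -t download -n demo -- -d ubuntu -r xenial -a amd64
$ sudo lxc-start -n demo
$ sudo lxc-attach -n demo      # a shell inside the container's namespaces and cgroups
$ sudo lxc-stop -n demo && sudo lxc-destroy -n demo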

Arrival of Docker

In 2013, Docker Inc. introduced Docker — a technology to create and run Docker Containers.
Docker makes use of CGroups, Namespaces, & Union File Systems to package an application and its dependencies to make a Docker image which can be run as a Container on a Linux Operating System — a much more portable and efficient method of running an application than running it on a Virtual Machine.
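As a small, self-contained illustration (the base image, file names, and tag are just examples):

# A tiny image: the Alpine base layer plus one extra layer carrying our script
$ echo 'echo hello from a container' > hello.sh
$ printf 'FROM alpine:3.6\nCOPY hello.sh /hello.sh\nCMD ["/bin/sh", "/hello.sh"]\n' > Dockerfile
$ docker build -t hello-demo .
$ docker run --rm hello-demo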
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
  1. Resource Isolation: The Failure of Operating Systems & How We Can Fix It — Glauber Costa, Parallels
  2. Evolution of Linux Containers and Future — Imesh Gunaratne
  3. The failure of operating systems and how we can fix it — Michael Kerrisk
  4. How Linux Kernel Cgroups And Namespaces Made Modern Containers Possible — Duncan Macrae
  5. Cgroups @ Wikipedia
  6. Introduction to Control Groups (CGROUPS) @ RedHat
  7. How I Used CGroups to Manage System Resources In Oracle Linux 6
  8. Linux Kernel Documentation on CGroups
  9. Namespaces in operation
  10. Resource management: Linux kernel Namespaces and cgroups by Rami Rosen
  11. Union File System
  12. Docker — Wikipedia
  13. What is Docker?

Drooling Over Docker #4 — Installing Docker CE on Linux

Choosing the right product
Docker engine comes in 2 avatars — Docker Community Edition (CE) and Docker Enterprise Edition (EE). While the...