Welcome Back!

Update

This information does not work with the new 1.1.0+ Agents. You will need to check out 1.0.2 if you want to use this.

Last time I was trying to get my Odroids and Raspberry Pis working as a Docker Swarm built on the newly integrated tools baked into Docker 1.12. That wasn't working quite right, as I explained in that post, primarily because of the overlay network: the Odroids couldn't quite route traffic correctly between all the hosts. So I once again attempted Kubernetes, and ran into the exact same problems as before. Though I could get it installed and deploy containers across all the boxes, DNS would cease to function after a reboot, and the cluster doesn't survive a reboot of the Master node very well. I do like the interface for Kubernetes, but it really needs to make Multi-Master configurations easier to set up, and I don't like how many changes to the underlying host OS it requires to run properly.

I then found Rancher and saw that a lot of work has gone into getting Rancher running on Arm. I also found this blog post at Withinboredom that helped explain how to use Imikushin's git repositories to fix the Rancher source. Rancher requires no host modifications, running entirely within Docker images, and while Multi-Master setup isn't necessarily "click-finish", it does seem largely aided by its web interface. This seemed like an excellent idea, since in addition to my Pis and my Odroids I have an Asus Chromebox running my Minecraft server (too resource intensive for the Arm systems) that would be nice to include in the cluster.

Therefore, I grabbed the official rancher/server image, ran it on my Chromebox, and proceeded to try to register the Arm devices as Agents. However, Imikushin's pre-built images didn't work for me, and even after compiling my own agents using his instructions, plus a couple of fixes from the comment section, I could register my agents but the network components didn't work.

Very long story short, Rancher is set up in such a way that the Agents learn from the Server, at startup, where to download their binaries, and in this environment my amd64 Chromebox is acting as the Server, so it was sending the agents off to grab x86 binaries instead of Arm ones.

After several days of trying to "trick" the Agents into automatically grabbing the right files hosted at Imikushin's various git repositories, setting environment variables, even trying to give up and build an Arm Rancher Server (didn't work), and several more days trying to compile cattle.jar myself with the multi-arch fixes I found on git, I finally figured out that Imikushin has released cattle.jar builds that already contain the multi-arch configuration fixes; they just need to be placed into the Server container, and the container bounced.

So, I am going to try to outline everything necessary to get a truly heterogeneous cluster configured and running on Rancher, which has become my new favorite orchestration tool for Docker. My environment now consists of an Asus Chromebox (amd64), two RPi 3s, two Odroid-C1s, and an Odroid-XU4, with each agent configured with a label to distinguish its architecture for container deployment. I also still intend to add my NanoPC-T1 back into this mix, if I can get a decent kernel on it.

Ultimately, the goal here will be to turn this into a Multi-Server setup, probably with the XU4 and one of the C1s or a Pi 3 acting as additional servers for true HA. Unfortunately, I am currently unable to get the Arm Rancher Server to build properly; I am working through errors with "install_cattle_binaries". Hopefully either Rancher or Imikushin get these commits merged and official releases out soon, or I can figure out how to get #4704 to build properly.

Setting up the devices

To begin our adventure we must do what we must always do before working on any computer system: install it. My current configuration is as follows:

Asus Chromebox: Arch Linux (amd64)
Odroid-XU4: Arch Linux ARM
2x Odroid-C1: Arch Linux ARM
2x Raspberry Pi 3: RancherOS

Update: I have removed RancherOS and just installed Arch Linux on my Raspberry Pis.

I will not go into detail on installing all of the above; we will leave that as an exercise for the reader. You could also use HypriotOS for all of them (except the Chromebox) if you like. The Raspberry Pis and the Chromebox have access to 4.x (or newer) kernels, which is useful. For the Odroid-C1 and the Odroid-XU4 I suggest compiling your own kernel and adding in some useful kernel modules. This isn't overly difficult, but it is time consuming (more than two hours) if you compile directly on the device. If you follow the steps below, I'd recommend doing it as a regular user; you will need the base-devel group installed for Arch Linux, and it also helps to have sudo and git installed (see the one-liner below). If you want to build your kernel using an alternate method, go for it. It is possible to compile kernels newer than 4.x for the XU4, but we won't need to.
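If you are missing those prerequisites, they can be installed in one go on Arch Linux (run as root, or via sudo once it is available):

pacman -S --needed base-devel sudo git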

For the C1:

git clone https://github.com/archlinuxarm/PKGBUILDs.git archlinuxarm
cd archlinuxarm/core/linux-odroid-c1/
vi config # use any editor you like, vi, nano, emacs, etc.
makepkg -si

The configuration items I added were:

CONFIG_XFS_FS=m
CONFIG_XFS_QUOTA=y
CONFIG_XFS_POSIX_ACL=y
CONFIG_XFS_RT=y
CONFIG_CGROUP_PIDS=y
CONFIG_CGROUP_PERF=y
CONFIG_CGROUP_HUGETLB=y
CONFIG_NET_CLS_CGROUP=y
CONFIG_IP_VS=y
CONFIG_EXT3_FS=y
CONFIG_EXT3_FS_XATTR=y
CONFIG_EXT3_FS_POSIX_ACL=y
CONFIG_EXT3_FS_SECURITY=y
CONFIG_IPVLAN=y
CONFIG_DUMMY=y
CONFIG_AUFS_FS=y
CONFIG_OVERLAY_FS=y

During the makepkg it's going to ask you a whole host of questions for related options that enabling these opened up. Use your best judgement; I mostly accepted the defaults. Follow the same procedure above for the XU4, except the directory is archlinuxarm/core/linux-odroid-xu3. Now that our kernel has all the Docker items we might need, we need to install Docker on all of the Arch Linux devices (the Odroids and the Chromebox); the devices running RancherOS already have Docker installed.

pacman -Sy docker
systemctl enable docker

That’s it, at least for the core operating environment. All of our devices should now be running an OS with Docker installed.
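A quick sanity check on each box doesn't hurt; this just confirms the Docker daemon is up and shows which storage driver it picked (see note 6 at the end regarding AUFS vs. overlay):

systemctl start docker   # if you haven't rebooted since enabling it
docker info | grep -i 'storage driver'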

Setting up Rancher Server

Ok, the next step will be to get the Rancher Server running and set it up so that it knows how to handle both x86 and Arm devices. Log in to your x86-based system, in my case the Chromebox:

docker run -d -e CATTLE_USE_LOCAL_ARTIFACTS=false --restart=always -p 8080:8080 \
	--name rancher-server rancher/server
curl -sSL https://github.com/imikushin/cattle/releases/download/v0.164.1-multiarch/cattle.jar \
	> cattle.jar
docker cp cattle.jar rancher-server:/usr/share/cattle/cattle.jar
docker stop rancher-server
docker start rancher-server

Note: Setting CATTLE_USE_LOCAL_ARTIFACTS=false is necessary; otherwise you can only get the x86 or the Arm devices to network properly by manually downloading tar files.

This downloads and installs the v0.164.1-multiarch release of cattle.jar from Imikushin's git repo, which is the latest release at the time of this writing. That's it for the Server; since there is a pre-built image for it, it's quite easy. We should be able to access the UI now at http://chromebox:8080, using whatever name or address you have for your x86 device. Once you can access that page, click "Add Host"; it will ask you some questions and let you add labels for agents. I added "architecture=arm" and "architecture=amd64" to distinguish between my devices. This creates a little docker run command that you can use later, once we have the Agents built and installed on the Arm devices.
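If you want to make sure the Server came back up cleanly with the replacement cattle.jar before moving on, tailing the container logs is a quick sanity check (not strictly required):

docker logs -f rancher-server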

Setting up the Rancher Arm Agent

In this step, we will steal a lot from Rob Landers @ Withinboredom. Log in to one of your Arm devices. In my case I chose the Odroid-XU4, as it's the most powerful, but these steps can be done on any of them and only need to be done on one device, not all of them.

git clone https://github.com/imikushin/s6-builder
cd s6-builder
touch .docker-env.arm
./build.arm.sh
docker run -it rancher/s6-builder:v2.2.4.3_arm /opt/build.sh
cd ../
git clone https://github.com/rancher/agent-instance.git
cd agent-instance/
git remote add imikushin https://github.com/imikushin/agent-instance.git
git checkout v0.8.3
git fetch imikushin
git merge imikushin/arm --no-ff
./build-image.sh
cd ..
git clone https://github.com/rancher/rancher.git
cd rancher/agent
git remote add imikushin https://github.com/imikushin/rancher.git
git fetch imikushin
git merge imikushin/multiarch-hosts
vi run.sh # use any editor you prefer

Note: This is the step that no longer works. You will need to check out version 1.0.2.

Now we need to edit this run.sh file and make two changes. First, find where verify_docker_client_server_version is called (not where it is defined, which is the first result you'll hit if you search for that name) and comment it out by adding a # to the beginning of the line; in my file it was line 453. Then find where var_lib_docker is set to resolve_var_lib_docker and change that line to local var_lib_docker=/var/lib/docker, without quotes, and we can continue on.
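If you would rather script those two edits, something along these lines should do it. This is only a sketch against my copy of run.sh; the exact lines may differ in yours, so double-check the result by hand afterwards:

# Comment out the call (not the definition) of verify_docker_client_server_version
sed -i 's|^\( *\)verify_docker_client_server_version *$|\1#verify_docker_client_server_version|' run.sh
# Hard-code the Docker data directory instead of resolving it
sed -i 's|^\( *local var_lib_docker\)=.*resolve_var_lib_docker.*|\1=/var/lib/docker|' run.sh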

./build-image.sh
docker save rancher/agent-instance_arm:v0.8.3 > agent_instance_arm.tar
docker save rancher/agent_arm:v1.0.2 > agent_arm.tar

If all went well to this point, you now have Docker images saved in your current directory, agent_arm.tar and agent_instance_arm.tar, that you can copy to all of your Arm devices. You can then load them on each of those systems using docker. Note: obviously, if you host your own Docker registry for images you can push them there instead of doing it the manual way.
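For example, shipping them to one of the Pis over SSH (the user and hostname here are just placeholders for whatever your devices use):

scp agent_arm.tar agent_instance_arm.tar alarm@rpi3-1:~/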

docker load < agent_arm.tar
docker load < agent_instance_arm.tar

Setting up the Rancher x86 Agents

Note: This may or may not be necessary. I had some issues after rebooting my x86 server and rebuilt the agents, but it may have been a different problem.

We also need to rebuild the Rancher x86 Agents to be multi-arch aware, otherwise they won't survive a reboot. It is very similar to building the Arm agents above, but much easier:

git clone https://github.com/rancher/rancher.git
cd rancher/agent
git remote add imikushin https://github.com/imikushin/rancher.git
git fetch imikushin
git merge imikushin/multiarch-hosts
./build-image.sh
docker save rancher/agent-instance:v0.8.3 > agent_instance.tar
docker save rancher/agent:v1.0.2 > agent.tar

Now we should be able to start up the agents using the command provided by the server, and you will see them start to populate the Hosts section of the Rancher UI.

sudo docker run -e CATTLE_HOST_LABELS='architecture=arm' -d --privileged \
-v /var/run/docker.sock:/var/run/docker.sock -v /var/lib/rancher:/var/lib/rancher \
rancher/agent:v1.0.2 http://chromeip:8080/v1/scripts/UNIQUE_CODE

Voila! You should now have a working multi-architecture environment. When you deploy services, just be sure to go to the Scheduling tab and define whether the container should deploy to the arm or amd64 hosts, or use a scheduling label as in the sketch below.
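If you drive deployments with rancher-compose instead of the UI, the equivalent (as far as I can tell) is a host-affinity scheduling label on the service; the service and image names here are just examples:

myservice:
  image: someuser/arm-image
  labels:
    io.rancher.scheduler.affinity:host_label: architecture=arm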

Notes:

  1. I have not attempted any other environment besides the default cattle. Kubernetes or Swarm may not work.
  2. In both Swarm and Kubernetes, all nodes will respond on any exposed port for a service, this seems not to be the case with Cattle. Therefore, a Load Balancer of some kind is necessary that will monitor all the devices in case a service gets moved during an upgrade or a host goes down.
  3. I have not erased my cluster and started over from scratch to test the above steps in this exact order. I did a lot of troubleshooting across a few weeks and I may have missed a step. If you get stuck, feel free to comment and I will assist, or join my IRC server (irc.linuxniche.net room: #chat) and ask.
  4. Internal DNS is finicky. I've had a service move to a different server and get a new IP address, and other services that connected to it via DNS didn't see the new address until I bounced them. I have not looked into this carefully enough to know whether it was just a one-off.
  5. cAdvisor still does not work in the Rancher Agents and I’m not sure why. I have validated it downloads the correct binary and the binary runs inside the container but it keeps trying to call it from a “null/.cadvisor” location which is incorrect. Somewhere in the scripts it is copied to .cadvisor but I think a variable is missing.
  6. By default, if you have AUFS support in your kernel, Docker is going to use it. I have had several instances of containers not shutting down and hanging (see here). If you are on an older kernel with the Odroids, I recommend not using the AUFS storage driver and switching to overlay instead; a sketch of that change follows this list.
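A minimal sketch of forcing the overlay driver on Docker 1.12+ via /etc/docker/daemon.json (note that switching storage drivers hides any containers and images created under the old driver):

cat <<'EOF' > /etc/docker/daemon.json
{
  "storage-driver": "overlay"
}
EOF
systemctl restart docker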

Update:

  1. I have since installed and deployed the built-in Rancher Load Balancer; because it runs on the Agent-Instance image we built above, it works on both the Arm and x86 devices. However, the x86 side cannot seem to route to the Arm devices and vice-versa. Something is weird with the overlay network.

I hope this helps some others out there. The pickings really are slim for those of us wanting to run our own development clusters on inexpensive Arm hardware. While everything "sort of" works, nothing works perfectly; still, this is the best I've found so far.

You are reading this blog off this Rancher cluster.

Enjoy!