September 15, 2020

Replacing OpenWrt by solyste

For the past few years, I have been using OpenWrt on a TP-Link WDR3600 in a secondary home. It lets me connect an entire room to the ISP box downstairs over Wifi, and in that room I can rely on cables. That is much simpler than having to set up Wifi on every machine, which sometimes can't be done at all (OpenBSD). The WDR3600 had an unusual uptime: 1502 days. That is more than four years. Even though it was quite stable, I wanted to reinstall it, especially because the userland was really old. I am pretty sure the busybox version (here 1.19.4) was vulnerable.


After working on my exciting project called solyste and creating a system the way I want it to be, going back to OpenWrt would be quite painful, IMHO. So, I finally decided to install solyste on my Turris Omnia, in order to replace OpenWrt on the WDR3600. Omnia hardware specifications are better and I know the platform in detail (from U-Boot to architecture).

When I started working on solyste, many people said the ideas were interesting but, unfortunately, that it was just a concept. Since then, I have worked on improving its resilience so that solyste can be set up in specific and critical environments. These days, it powers some highly critical gateways granting access to iDRAC or iLO networks and other servers managed by IPMI. It is not a concept anymore. In some specific aspects, we also do better than the others (64-bit kernels instead of 32-bit, 64-bit userland instead of 32-bit, improved security with read-only partitions, random default root password).

Here, the requirements are not so drastic: I only need Wifi, IP forwarding, DHCP and a firewall. I will be able to remotely back up my data to that system, using OpenSSH and remote forwarding. The backup filesystem will be encrypted with LUKS. In addition to the daemons you can find on a similar router setup, I will activate Unbound for name resolution and OpenNTPD as the NTP server. Here is the build.conf file:

SOL_NOINTERACTIVE=YES
SOL_OTHERPKGS="openntpd vnstat curl unbound monit cryptsetup libnl wpa_supplicant dnsmasq"
SOL_BUILD_SSHCLIENT=YES
SOL_HOSTNAME="omnia.solysterocks"

The configuration is done. It's time to launch the entire solyste system build:

$ sh build.sh -c build.conf omnia
$ ./scripts/compress-fs omnia

You could ask why I launch ./scripts/compress-fs at the end, and you would be right to. I now use only a SquashFS filesystem for the root partition. Even if it is sometimes less practical, it greatly improves system reliability (power outage, filesystem corruption, data loss, malicious access). You just need to be careful before deploying the system. Why do I like SquashFS so much? It has the following advantages:

In addition, I like the idea of having a "brand new" system after every reboot. If it was running fine just after the installation, you can be pretty sure it will behave exactly the same after the next boot. As for the compression ratio, here is the output for comparison. solfs-omnia/ is the entire system; solfs-omnia.sqsh is the same system compressed with SquashFS and gzip:

$ du -sh solfs-omnia/ solfs-omnia.sqsh
22M     solfs-omnia/
13M     solfs-omnia.sqsh
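As a rough check of the figures above, the ratio can be computed directly from the two sizes reported by du. This is only a small sketch: the function name ratio_percent is made up for the illustration, and it works on the rounded values printed by du -sh, so the result is approximate.

```sh
#!/bin/sh
# Print the size of a compressed image as a percentage of the original tree.
# Both arguments must be in the same unit (here MiB, as printed by du -sh).
# ratio_percent is a hypothetical helper, not part of solyste.
ratio_percent() {
    original=$1
    compressed=$2
    # Integer arithmetic, rounded to the nearest percent.
    echo $(( (compressed * 100 + original / 2) / original ))
}

ratio_percent 22 13   # prints 59
```

In other words, the SquashFS image takes roughly 59% of the space used by the uncompressed tree, a saving of about 41%.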

Then, I had to copy this solfs-omnia.sqsh to a USB key, before switching to the Omnia rescue system. The eMMC was partitioned like this:

$ fdisk -l
Disk /dev/mmcblk0: 7456 MB, 7818182656 bytes, 15269888 sectors
238592 cylinders, 4 heads, 16 sectors/track
Units: sectors of 1 * 512 = 512 bytes

Device       Boot StartCHS    EndCHS        StartLBA     EndLBA    Sectors  Size Id Type
/dev/mmcblk0p1 *  32,0,1      31,3,16           2048    1050623    1048576  512M 83 Linux
/dev/mmcblk0p2    32,0,1      31,3,16        1050624    3147775    2097152 1024M 83 Linux

The first partition will be /boot in Ext4 and the second partition will be the root SquashFS filesystem. As U-Boot on Turris Omnia supports Ext4 and FAT32, I could have also used FAT32 for the first partition. The resulting solyste SquashFS file was simply "copied" to /dev/mmcblk0p2 with cat.
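Since the image is smaller than the partition, a simple way to check the copy is to read back only as many bytes as the image contains and compare them. The following is a minimal sketch of that idea, not the exact commands used here: write_and_verify is a hypothetical helper, and it assumes cat, sync, head -c and cmp are available on the rescue system.

```sh
#!/bin/sh
# Write an image to a target (a block device or, for a dry run, a plain file)
# and check that the first bytes of the target match the image.
# write_and_verify is a hypothetical helper name used for this illustration.
write_and_verify() {
    image=$1
    target=$2
    # Arithmetic expansion strips the padding some wc implementations add.
    size=$(( $(wc -c < "$image") ))
    # The image is simply streamed onto the target, as with cat above.
    cat "$image" > "$target" || return 1
    sync
    # Compare only the first $size bytes: a partition is larger than the image.
    head -c "$size" "$target" | cmp -s - "$image"
}

# Example (destructive, needs root on the real device):
#   write_and_verify solfs-omnia.sqsh /dev/mmcblk0p2 && echo "copy verified"
```

The function returns a non-zero status if the write fails or if the data read back differs from the image.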

When the entire system was formatted in Ext4, the U-Boot variable bootargs was the following:

bootargs=rootfstype=ext4 root=/dev/mmcblk0p1 rootdelay=2 rw console=ttyS0,115200 quiet

It was therefore changed to:

bootargs=rootfstype=squashfs root=/dev/mmcblk0p2 rootdelay=2 console=ttyS0,115200 quiet
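Once the system has booted, the kernel exposes the arguments it actually received in /proc/cmdline, so the new values can be checked from the running system. Here is a small sketch of such a check; cmdline_value is a hypothetical helper written for this example, and it is fed the bootargs string from above rather than the live file.

```sh
#!/bin/sh
# Extract the value of a key=value pair from a kernel command line string.
# cmdline_value is a hypothetical helper, not a solyste or U-Boot tool.
cmdline_value() {
    key=$1
    cmdline=$2
    # Unquoted on purpose: split the command line into words.
    for word in $cmdline; do
        case $word in
            "$key"=*) echo "${word#*=}"; return 0 ;;
        esac
    done
    return 1
}

args='rootfstype=squashfs root=/dev/mmcblk0p2 rootdelay=2 console=ttyS0,115200 quiet'
cmdline_value rootfstype "$args"   # prints squashfs
cmdline_value root "$args"         # prints /dev/mmcblk0p2
```

On the router itself, the second argument would be "$(cat /proc/cmdline)". Note that the rw flag is gone from the new bootargs, which is expected since a SquashFS root is read-only.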

Previously, I was activating OverlayFS for /etc and /root. I could save my specific configurations on a dedicated partition and then mount it over the appropriate directories. Eventually, it became harder to manage, so I dropped this idea after applying it for years. Configurations are now applied directly to the tree after the build. Here is the previous OverlayFS fstab:

$ cat /etc/fstab
LABEL=USERCONF   /conf   ext4      defaults,ro                  0 2
# Overlay directories
overlay          /etc    overlay   lowerdir=/conf/etc:/etc      0 0
overlay          /root   overlay   lowerdir=/conf/root:/root    0 0
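For readers less familiar with OverlayFS, each overlay line in this fstab corresponds to a mount command with the same options. With only lowerdir given (no upperdir), the result is a read-only merge in which the leftmost directory, here the one under /conf, takes precedence. The sketch below just prints the equivalent commands; overlay_mount_cmd is a hypothetical helper and /conf is the mount point taken from the fstab above.

```sh
#!/bin/sh
# Print the mount command equivalent to one overlay line of the fstab.
# overlay_mount_cmd is a hypothetical helper used only for illustration.
overlay_mount_cmd() {
    dir=$1             # directory to cover, e.g. /etc
    conf=${2:-/conf}   # where the USERCONF partition is mounted
    echo "mount -t overlay overlay -o lowerdir=${conf}${dir}:${dir} ${dir}"
}

overlay_mount_cmd /etc    # mount -t overlay overlay -o lowerdir=/conf/etc:/etc /etc
overlay_mount_cmd /root   # mount -t overlay overlay -o lowerdir=/conf/root:/root /root
```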

The first boot was successful. I only needed to tweak one or two files before deploying it in production. It is perfectly stable. After almost one month, here is the output:

$ yss
HOST:   omnia.solysterocks
DATE:   2020-09-12 06:41:59
OS:     Linux
KERNEL: 5.4.58-SOLYSTE
UPTIME: 24 days
LOAD:   0.00 0.00 0.00
RAM:    [*...................] 57(*)/24(o)/1006 MiB [5%(*)/2%(o)]
$ perpls -G
[+ +++ +++]  crond           uptime: 2138077s/2138077s  pids: 224/223
[+ +++ +++]  dnsmasq         uptime: 2138077s/2138077s  pids: 226/225
[+ +++ ---]  getty-ttyS0     uptime: 2138077s/-s  pids: 227/-
[+ +++ ---]  klogd           uptime: 2138077s/-s  pids: 228/-
[+ +++ +++]  monit           uptime: 2138077s/2138077s  pids: 230/229
[+ +++ +++]  openntpd        uptime: 2138077s/2138077s  pids: 232/231
[+ +++ +++]  sshd            uptime: 2138077s/2138077s  pids: 234/233
[+ +++ ---]  syslogd         uptime: 2138077s/-s  pids: 235/-
[+ +++ +++]  unbound         uptime: 2138077s/2138077s  pids: 238/237
[+ +++ +++]  vnstatd         uptime: 2138077s/2138077s  pids: 241/240
[+ +++ ---]  watchdog        uptime: 2138077s/-s  pids: 242/-
[+ +++ +++]  wpa_supplicant  uptime: 2138077s/2138077s  pids: 247/245
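The per-service uptimes are printed in seconds, which is not very readable at this scale. A quick conversion shows that they match the 24 days reported above. This is a small sketch; format_uptime is a hypothetical helper, not part of perp or solyste.

```sh
#!/bin/sh
# Convert a duration in seconds into days, hours, minutes and seconds.
# format_uptime is a hypothetical helper name used for this illustration.
format_uptime() {
    s=$1
    d=$(( s / 86400 ))
    h=$(( s % 86400 / 3600 ))
    m=$(( s % 3600 / 60 ))
    echo "${d}d ${h}h ${m}m $(( s % 60 ))s"
}

format_uptime 2138077   # prints 24d 17h 54m 37s
```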

Actually, writing the firewall and testing it extensively took more than two days, which is a lot compared to the actual solyste build (~12 minutes). But that's another story…

To get started with solyste, visit the following page.