Zerotier & Mikrotik design concept

I’ve been happily using Zerotier for some of my own internal SD-WAN/VPN-style connectivity for a while now and have found it to be a really effective solution that obviates many of the pain points of traditional VPN solutions. I’ve also been a happy Mikrotik owner, specifically on the switching side of the house.

When I discovered that there is a Zerotier package for Mikrotik routers, I decided to do some more research into leveraging the fact that Zerotier isn’t limited to point-to-point connections between participants: Zerotier clients can also act as routers for complete IP subnets.

So a test project came up on the radar: a client needs to replace an aging industrial IoT network infrastructure with robust equipment that can handle (relatively) harsh conditions on some sites. They had been completely shocked by the prices quoted by their existing suppliers, so I looked at the Mikrotik offerings in this space, which seem to meet the requirements.

The next step is working out how to manage this equipment and integrate it with their existing network. The remote sites are all internet-connected, mostly over ADSL with the odd fiber connection here and there, and they connect to the main sites over IPSec VPN tunnels.

Basic overview:

  • Production site for centralized control and reporting

  • Disaster recovery site

  • 200+ remote sites with IPSec tunnels to the Production and DR sites

  • 1-10 devices on each site, generally low bandwidth

  • The primary routers will be staying (and don’t have any Zerotier integration)

New requirements

  • Failover connectivity on the remote sites (4G/5G)

Stage 1 - design concept

So with this in mind, I have the following theoretical design:

Stage 2 - Proof of concept - basic configuration

For the proof of concept I’m scrounging around with the equipment I have on hand, which differs a little from the eventual hardware, but not by much. I have an RB5009, which also runs RouterOS and is a little beefier than the anticipated L009UiGS-RM. Functionally the two are pretty similar, and both have ARM64 CPUs and can therefore run the Zerotier package.

At the office I have my main fiber connection, with firewall/routing handled by a pfSense box, as well as a secondary ADSL line for failover. Both ISP boxes are set to bridge mode. I’m going to use the ADSL line, along with a 4G failover connection, to simulate the remote site, and use the office network as the production network. With this in mind, the basics are going to be:

Office network:

  • Network: 192.168.10.0/24

  • Gateway: 192.168.10.1

  • DNS: 192.168.10.199

Remote network:

  • Network: 10.248.1.0/24

  • Gateway: 10.248.1.254

  • DNS: 1.1.1.1,8.8.8.8

Zerotier network:

  • Network: 10.249.0.0/16
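Before touching any hardware, it's worth checking that the address plan holds together. Here is a small Python sketch that does that. It assumes each remote site gets its own /24 carved out of 10.248.0.0/16, which is the range the return route on the main router covers later on. The sketch checks that the office, Zerotier and remote ranges don't overlap, then counts the available site slots.

```python
import ipaddress

# Address plan from this post. The remote pool is an assumption: one /24 per
# site carved out of 10.248.0.0/16 (the range used for the return route later).
PLAN = {
    "office": "192.168.10.0/24",
    "zerotier": "10.249.0.0/16",
    "remote sites": "10.248.0.0/16",
}


def check_no_overlap(plan):
    """Raise ValueError if any two networks in the plan overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in plan.items()}
    names = list(nets)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if nets[a].overlaps(nets[b]):
                raise ValueError(f"{a} ({nets[a]}) overlaps {b} ({nets[b]})")


def site_subnets(pool="10.248.0.0/16"):
    """Every /24 slot available for remote sites in the pool."""
    return list(ipaddress.ip_network(pool).subnets(new_prefix=24))


check_no_overlap(PLAN)  # passes silently for the plan above
```

With one /24 per site, the pool offers 256 slots. That covers the 200+ sites with room to spare, even if a few slots (such as 10.248.0.0/24) are kept in reserve.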

Zerotier network

The first step involves declaring a new network on the Zerotier portal. The important pieces are:

  • Access control: private (all participants must be authorized manually)

  • IP Auto-assign: disabled (I want to manually set the IP for each participant)

Configure LTE box

In my case, I’ve got an LtAP mini LTE kit and a brand-new, data-enabled SIM from Free.fr to work with. Initially, I was confronted with errors about the modem firmware needing an update and noticed that the device was running RouterOS 6.x, so the first order of business was to upgrade to 7.11. Then I spent some time sorting out the basic configuration steps. There are a few confusing things to deal with here, like selecting the SIM slot, where the two options are labeled up and down.

These should read upper/lower, as I was equating up/down with “bring the interface up” and “take the interface down” rather than with the two SIM slots stacked on top of each other. 🤷‍♂️

Next come the basics: setting the appropriate APN values and the PIN to unlock the SIM, then moving the device into passthrough mode. Important note: once the device is in passthrough mode, that’s it as far as management goes, unless you go to the trouble of setting up a separate VLAN for management. Frankly, I don’t see the point. Once configured, it’s unlikely to change, and if it does, resetting the device and updating the APN is a pretty lightweight operation (although I do need to think about the impact at scale).

In passthrough mode it’s just a dumb pipe that handles the LTE connection based on the last configuration it was given and hands off the IP given by the network to the first MAC address that makes a DHCP request on the ethernet port.

Mikrotik configuration

I have the RB5009 ready to connect to the ADSL box on the first ethernet port (but don’t connect it right away as it will be exposed to the internet with the default configuration), the 4G box ready to be connected to the second port and my computer connected to the last port.

The first step is installing the Zerotier package on the router. Download the Extra packages for the router’s CPU type (they must match the installed RouterOS version), upload the Zerotier package to the Files section in Winbox and reboot the router. Once it has rebooted, verify that there is a Zerotier entry in the left-hand column.

With that in place, I reset the router completely back to zero with:


/system reset-configuration no-defaults=yes skip-backup=yes

Notes:

  • Winbox runs just fine in a Windows 11 VM using Parallels on a Mac; you just need to set the VM’s network to bridged

The initial connection is made using Winbox, since the reset has cleared the router’s IP configuration and you can initially only connect using its MAC address.

On any modern Mikrotik box, the first connection will ask you to change the password. DO THIS!

Then I open a terminal window in Winbox for the rest of the configuration.

Much of the following basic configuration is pulled from the Mikrotik documentation, plus a number of forum posts about the various methods of setting up dual-WAN failover with DHCP-enabled WAN interfaces.

Then it’s just a matter of uploading and running the configuration script. Scripts can be run from the terminal with the “/import” command (e.g. /import file-name=<script name>.rsc). Here’s a walk-through of the script.

To make the script more easily portable, I’ve set up a bunch of variables at the outset. Even values that are only used once are declared there, so there’s no reason to make any changes to the body of the script. I also set the password in the script, which lets me hand the script to Reset Configuration and have it run immediately after the reset.

Most of the variables are pretty much self-explanatory. The routedSubnets are the remote networks (here my office and my house) that are declared as managed routes in the Zerotier network and that this site reaches through it. I’ve put them into an array so subnets can easily be added as required.

The gateways are public DNS servers that the Mikrotik pings to check whether a link really has internet connectivity end to end. The first block is for the ECMP routing rules and the others are for the recursive routing rules. These can be anything that is reliably pingable on the internet.

The zerotierNetworkID is the ID presented in the Zerotier portal as the unique ID for the network.


# set a complex password for the system

/user set admin password="<complicated password>"

:global localSubnet "10.248.1.0/24"

:global localIP "10.248.1.254"

:global localDhcpPool "10.248.1.1-10.248.1.99"

:global internalDNS "192.168.10.199,192.168.2.240"

:global zerotierSubnet "10.249.0.0/16"

:global zerotierNetworkID "<network id>"

# Array of remote subnets to be routed through the zerotier connection

:global routedSubnets { \
    { "192.168.10.0/24"; "office"}; \
    { "192.168.2.0/24";  "house"} \
}

:global wan1Gateway1 "208.67.220.220"; # OpenDNS

:global wan1Gateway2 "1.0.0.1";        # Cloudflare secondary

:global wan1GwRecursive1 "208.67.222.222";  # OpenDNS

:global wan1GwRecursive2 "9.9.9.9";        # Quad 9

:global wan2GwRecursive1 "94.140.14.14";    # AdGuard

:global wan2GwRecursive2 "149.112.112.112"; # Quad 9 Secondary
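With 200+ sites, the per-site part of this header is the only thing that changes, so it can be generated rather than hand-edited. The Python sketch below renders the site-specific `:global` lines. It assumes a hypothetical numbering scheme: site N uses 10.248.N.0/24, with the router at .254 and a DHCP pool of .1-.99, mirroring site 1 above. It writes routedSubnets on a single line to sidestep line continuations.

```python
import ipaddress


def render_site_globals(site_number, routed_subnets):
    """Render the per-site :global header for the RouterOS script.

    Hypothetical numbering: site N -> 10.248.N.0/24, router at .254,
    DHCP pool .1-.99 (the same layout as site 1 in this post).
    routed_subnets is a list of (cidr, name) pairs.
    """
    # ip_network raises ValueError for an impossible site number (e.g. 300)
    subnet = ipaddress.ip_network(f"10.248.{site_number}.0/24")
    base = subnet.network_address
    # RouterOS nested array syntax: { {"cidr"; "name"}; {"cidr"; "name"} }
    pairs = "; ".join(f'{{"{net}"; "{name}"}}' for net, name in routed_subnets)
    return "\n".join([
        f':global localSubnet "{subnet}"',
        f':global localIP "{base + 254}"',
        f':global localDhcpPool "{base + 1}-{base + 99}"',
        f":global routedSubnets {{ {pairs} }}",
    ])


print(render_site_globals(1, [("192.168.10.0/24", "office"), ("192.168.2.0/24", "house")]))
```

For site 1, this reproduces the values used in this post. The password, DNS and gateway variables stay common to all sites.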

Then I create a bridge for all of the LAN interfaces (everything but the first two ports).


### Basic bridging setup ###

# Local bridge for internal LAN ports

/interface bridge add name=local

/interface bridge port add interface=ether3 bridge=local

/interface bridge port add interface=ether4 bridge=local

/interface bridge port add interface=ether5 bridge=local

/interface bridge port add interface=ether6 bridge=local

/interface bridge port add interface=ether7 bridge=local

/interface bridge port add interface=ether8 bridge=local

Next, some interface lists that the various rulesets can reference instead of addressing individual ports.


### Interface lists ###

# Setup interface list for both WAN ports to use for firewall rules

/interface list add name=WAN

/interface list member add list=WAN interface=ether1

/interface list member add list=WAN interface=ether2

# Setup interface list for LAN ports to use for firewall rules

/interface list add name=LAN

/interface list member add list=LAN interface=local

Then configuration of the DNS servers used by the router itself. Note that I’m using ones that are not referenced in any of the recursive routing rules and that I’m using two separate providers (Cloudflare and Google).


# Setup DNS services, required for Zerotier to resolve servers

/ip dns set servers=1.1.1.1,8.8.8.8

Then just a few lines to set up the LAN IP configuration with DHCP.


# Configure internal LAN network subnet and DHCP

/ip address add address="$localIP/24" interface=local

/ip pool add name=lan_pool ranges=$localDhcpPool

/ip dhcp-server add address-pool=lan_pool disabled=no interface=local lease-time=1h name=dhcp_lan

/ip dhcp-server network add address=$localSubnet dns-server=$internalDNS gateway=$localIP

Next up are the basic firewall rules and the NAT enablement.


# Restrict MAC Telnet, MAC Winbox and neighbor discovery to the LAN

/tool mac-server set allowed-interface-list=LAN

/tool mac-server mac-winbox set allowed-interface-list=LAN

/ip neighbor discovery-settings set discover-interface-list=LAN

# Basic firewall rules to lock down access from the WAN ports, but allow ping

/ip firewall filter add chain=input connection-state=established,related action=accept comment="accept established,related";

/ip firewall filter add chain=input connection-state=invalid action=drop;

/ip firewall filter add chain=input in-interface-list=WAN protocol=icmp action=accept comment="allow ICMP";

/ip firewall filter add chain=input in-interface-list=WAN action=drop comment="block everything else";

# NAT configuration and security

/ip firewall nat add chain=srcnat out-interface-list=WAN action=masquerade

/ip firewall filter add chain=forward action=fasttrack-connection connection-state=established,related comment="fast-track for established,related";

/ip firewall filter add chain=forward action=accept connection-state=established,related comment="accept established,related";

/ip firewall filter add chain=forward action=drop connection-state=invalid

/ip firewall filter add chain=forward action=drop connection-state=new connection-nat-state=!dstnat in-interface-list=WAN comment="drop access to clients behind NAT from WAN"

The next part is a little more complex. If a WAN DHCP lease is reset or renewed with a new IP and a different gateway, a script runs to update the matching gateway entries. It also clears the tracked connections marked for either WAN, so existing NAT connections are re-established over the current paths.


# Configure DHCP on WAN interfaces with script to auto-update the router as required on renewals

/ip dhcp-client add interface=ether1 add-default-route=no script=":if (\$bound=1) do={\r\
    \n    /ip/route/set [find where comment=\"ISP1\"] gateway=\$\"gateway-address\"\r\
    \n}\r\
    \n\r\
    \n/ip/firewall/connection/remove [find connection-mark=\"ISP1_conn\"]\r\
    \n/ip/firewall/connection/remove [find connection-mark=\"ISP2_conn\"]\r\
    \n" use-peer-dns=no use-peer-ntp=no

/ip dhcp-client add interface=ether2 add-default-route=no script=":if (\$bound=1) do={\r\
    \n    /ip/route/set [find where comment=\"ISP2\"] gateway=\$\"gateway-address\"\r\
    \n}\r\
    \n\r\
    \n/ip/firewall/connection/remove [find connection-mark=\"ISP1_conn\"]\r\
    \n/ip/firewall/connection/remove [find connection-mark=\"ISP2_conn\"]" use-peer-dns=no use-peer-ntp=no

Then the creation of the required routing tables for the two WAN connections. Note that this process is different between RouterOS 6 and 7 which led to a lot of confusion when trying to get this to work.


# Create routing tables for each WAN interface

/routing table add fib name=to_ISP1

/routing table add fib name=to_ISP2

Then onto all of the recursive route configuration that enables the intelligent failover.


/ip route

# recursive routes for ECMP default gateways, dst-address are public DNS servers

add distance=1 dst-address="$wan1Gateway1/32" gateway=ether1 scope=10 target-scope=10 comment=ISP1

add distance=1 dst-address="$wan1Gateway2/32" gateway=ether2 scope=10 target-scope=10 comment=ISP2

# ECMP default gateways

add check-gateway=ping distance=1 dst-address=0.0.0.0/0 gateway=$wan1Gateway1 scope=10 target-scope=11

add check-gateway=ping distance=1 dst-address=0.0.0.0/0 gateway=$wan1Gateway2 scope=10 target-scope=11

# recursive routes for default gateways, dst-address are public DNS servers

add dst-address="$wan1GwRecursive1/32" gateway=ether1 scope=10 comment="ISP1"

add dst-address="$wan1GwRecursive2/32" gateway=ether1 scope=10 comment="ISP1"

add dst-address="$wan2GwRecursive1/32" gateway=ether2 scope=10 comment="ISP2"

add dst-address="$wan2GwRecursive2/32" gateway=ether2 scope=10 comment="ISP2"

# load-balanced w/ auto failover default gateways

add check-gateway=ping distance=1 dst-address=0.0.0.0/0 gateway=$wan1GwRecursive1 routing-table=to_ISP1 scope=10 target-scope=11

add check-gateway=ping distance=2 dst-address=0.0.0.0/0 gateway=$wan1GwRecursive2 routing-table=to_ISP1 scope=10 target-scope=11

add check-gateway=ping distance=1 dst-address=0.0.0.0/0 gateway=$wan2GwRecursive1 routing-table=to_ISP2 scope=10 target-scope=11

add check-gateway=ping distance=2 dst-address=0.0.0.0/0 gateway=$wan2GwRecursive2 routing-table=to_ISP2 scope=10 target-scope=11
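To reason about what check-gateway=ping does to these routes, here is a simplified Python model. It is not RouterOS code. A default route stays active only while its recursive gateway answers pings. Among the surviving routes in a table, the lowest distance wins, and ties give ECMP. The model ignores ping timers, the /32 host routes and the DHCP gateway updates. The table contents mirror the script above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DefaultRoute:
    table: str       # "main", "to_ISP1" or "to_ISP2"
    distance: int
    check_host: str  # recursive gateway pinged by check-gateway=ping
    interface: str   # WAN port its /32 host route resolves through


# Same tables, distances and check hosts as the script above.
ROUTES = [
    DefaultRoute("main", 1, "208.67.220.220", "ether1"),
    DefaultRoute("main", 1, "1.0.0.1", "ether2"),
    DefaultRoute("to_ISP1", 1, "208.67.222.222", "ether1"),
    DefaultRoute("to_ISP1", 2, "9.9.9.9", "ether1"),
    DefaultRoute("to_ISP2", 1, "94.140.14.14", "ether2"),
    DefaultRoute("to_ISP2", 2, "149.112.112.112", "ether2"),
]


def active_routes(table, reachable, routes=ROUTES):
    """Routes kept active in `table`: drop those whose check host fails the
    ping check, then keep every survivor at the lowest distance (ECMP on ties)."""
    alive = [r for r in routes if r.table == table and r.check_host in reachable]
    if not alive:
        return []
    best = min(r.distance for r in alive)
    return [r for r in alive if r.distance == best]
```

In this model, if both ISP1 check hosts stop answering, the main table falls back to ether2 alone. The to_ISP1 table, however, has no usable default route, because both of its entries leave via ether1. It's worth testing how marked traffic actually behaves during an outage.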

Then the creation of some lists that I’ll be using in the firewall and routing rules. Here I leverage the array of subnets and associated names to loop over them.


/ip firewall address-list add address=$localSubnet list=local

/ip firewall address-list add address=$zerotierSubnet list=zerotier 

:foreach net in=$routedSubnets do={

  /ip firewall address-list add address=($net->0) list=($net->1);

}

Then we leverage the RouterOS Mangle rules, which allow us to mark packets according to the routing we want to apply. Again, I use the subnet array, this time to accept packets bound for those destinations immediately, so they are never marked by the load-balancing rules that follow.


### Firewall Mangle Rules ###

/ip firewall mangle

add action=accept chain=prerouting comment="bridge access" dst-address-list=local in-interface-list=LAN

# WAN to LAN

add action=mark-connection chain=prerouting connection-mark=no-mark connection-state=established,related in-interface=ether1 new-connection-mark=ISP1_conn passthrough=yes

add action=mark-connection chain=prerouting connection-mark=no-mark connection-state=established,related in-interface=ether2 new-connection-mark=ISP2_conn passthrough=yes

    

### PCC mangles ###

# allow direct routes for internal routes

add action=accept chain=prerouting dst-address-list=zerotier

add action=accept chain=prerouting dst-address-list=ztrouted

:foreach net in=$routedSubnets do={

  /ip firewall mangle add action=accept chain=prerouting dst-address-list=($net->1);

}

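Then the PCC (per-connection classifier) rules, which split new internet-bound connections from the LAN between the two WAN interfaces, followed by the routing marks that send each marked connection to its WAN’s routing table.


# PCC connection marks for internet-bound traffic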

add action=mark-connection chain=prerouting connection-mark=no-mark dst-address-list=!local dst-address-type=!local in-interface-list=LAN new-connection-mark=ISP1_conn passthrough=yes per-connection-classifier=both-addresses-and-ports:2/0

add action=mark-connection chain=prerouting connection-mark=no-mark dst-address-list=!local dst-address-type=!local in-interface-list=LAN new-connection-mark=ISP2_conn passthrough=yes per-connection-classifier=both-addresses-and-ports:2/1

    

add action=mark-routing chain=prerouting connection-mark=ISP1_conn in-interface-list=LAN new-routing-mark=to_ISP1 passthrough=yes

add action=mark-routing chain=prerouting connection-mark=ISP2_conn in-interface-list=LAN new-routing-mark=to_ISP2 passthrough=yes

add action=mark-routing chain=output connection-mark=ISP1_conn dst-address-list=!local new-routing-mark=to_ISP1 passthrough=yes

add action=mark-routing chain=output connection-mark=ISP2_conn dst-address-list=!local new-routing-mark=to_ISP2 passthrough=yes
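The per-connection-classifier idea fits in a few lines. Here is an illustrative Python sketch. RouterOS uses its own internal hash, and SHA-256 stands in for it here only to show the mechanism: hash the connection's addresses and ports, take the result modulo the denominator (2), and compare the remainder with the rule's value (0 or 1). Because the mark is stored on the connection (the rules match connection-mark=no-mark), every packet of a flow stays on the same WAN.

```python
import hashlib


def pcc_bucket(src_ip, src_port, dst_ip, dst_port, denominator=2):
    """Illustrative stand-in for per-connection-classifier=both-addresses-and-ports.
    SHA-256 is NOT the RouterOS hash; it just shows hash-then-modulo."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % denominator


def connection_mark(src_ip, src_port, dst_ip, dst_port):
    # remainder 0 matches ...:2/0 -> ISP1_conn, remainder 1 matches ...:2/1 -> ISP2_conn
    if pcc_bucket(src_ip, src_port, dst_ip, dst_port) == 0:
        return "ISP1_conn"
    return "ISP2_conn"
```

Hashing per connection rather than per packet is the design choice that keeps NATed sessions intact. Individual connections are never split across two public IPs; only the population of connections is spread over both links.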

Finally we get to the Zerotier stage, which (after everything else) is trivial. We enable the service, add a network connection, wait a few seconds and add the associated firewall rules.


/zerotier/enable zt1

/zerotier/interface/add network=$zerotierNetworkID instance=zt1

:delay 4000ms;

/ip firewall filter add action=accept chain=forward in-interface=zerotier1 place-before=0

/ip firewall filter add action=accept chain=input in-interface=zerotier1 place-before=0

At this stage, I have a fully functional router with dual-WAN failover ready to be integrated in the rest of my network.

Stage 3 - Mikrotik Zerotier integration

The final local steps are to approve this router to participate in the Zerotier network and set up the required static routes.

At this stage we have the Zerotier client on the router requesting access to the network and being denied (the network is private).

Back in the Zerotier portal, there is an entry waiting for approval.

Tick the authorize checkbox and manually assign an IP, 10.249.0.1 in this case (Zerotier can do this automatically, but I’m doing it manually to more easily keep track of the association between Zerotier IPs and the remote subnets).

Then I need to tell the other Zerotier network members that this is the IP to use for routing towards the local subnet, 10.248.1.0/24. This is done at the top of the page, in the section named “Managed Routes”.
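Clicking through the portal is fine for one site, but not for 200+. Below is a Python sketch against the ZeroTier Central REST API. It assumes the v1 endpoints as I understand them: POST /api/v1/network/{networkId}/member/{memberId} with config.authorized and config.ipAssignments, and config.routes (target/via pairs) on the network object. Check the current API documentation before relying on it. The token, network ID and member ID are placeholders, and register_site is a hypothetical helper name. The routes list appears to be sent as a whole, so the sketch reads the existing routes and merges into them rather than overwriting.

```python
import json
import urllib.request

API_BASE = "https://api.zerotier.com/api/v1"  # ZeroTier Central API (verify in docs)


def _call(method, path, token, payload=None):
    """Minimal JSON request helper using only the standard library."""
    data = json.dumps(payload).encode() if payload is not None else None
    req = urllib.request.Request(
        API_BASE + path,
        data=data,
        method=method,
        headers={"Authorization": f"token {token}", "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def member_payload(zt_ip, name=None):
    """Authorize a member and pin its IP (auto-assign is disabled on this network)."""
    payload = {"config": {"authorized": True, "ipAssignments": [zt_ip]}}
    if name:
        payload["name"] = name
    return payload


def merge_routes(existing, target, via):
    """Add or update one managed route while keeping every other route."""
    kept = [r for r in existing if r.get("target") != target]
    return kept + [{"target": target, "via": via}]


def register_site(token, network_id, member_id, zt_ip, site_subnet, name=None):
    """Hypothetical helper: authorize the site router, then route its LAN via it."""
    _call("POST", f"/network/{network_id}/member/{member_id}", token,
          member_payload(zt_ip, name))
    network = _call("GET", f"/network/{network_id}", token)
    routes = merge_routes(network["config"].get("routes", []), site_subnet, zt_ip)
    _call("POST", f"/network/{network_id}", token, {"config": {"routes": routes}})
```

For this site, the call would be register_site(token, "<network id>", "<member id>", "10.249.0.1", "10.248.1.0/24"). The member ID is the node ID shown next to the pending entry in the portal.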

Stage 4 - Central site Zerotier integration

This part is a little more complicated because my main router is not a Mikrotik but a pfSense box. Here I need a router with Zerotier installed and joined to the network, plus some static routes on the existing router.

I tried installing RouterOS on a physical server and using the x86 Zerotier container, but without much luck: for some reason the container downloads from the registry were incredibly slow and the container never started. In this case, there’s no real advantage to having a full RouterOS install for this part, so I simply installed a bare-bones Ubuntu 22.04 Server VM with a fixed IP.

The configuration steps are trivial:


sudo snap install zerotier

sudo zerotier join <network id>

sudo vi /etc/sysctl.conf

	# uncomment net.ipv4.ip_forward=1

	# uncomment net.ipv6.conf.all.forwarding=1

sudo sysctl -p
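A quick way to confirm that `sysctl -p` took effect is to read the live values from /proc/sys/net/ipv4/ip_forward and /proc/sys/net/ipv6/conf/all/forwarding. Here is a small Python sketch that does so. The proc_net parameter only exists so the check can be pointed at another directory for testing.

```python
from pathlib import Path


def forwarding_status(proc_net="/proc/sys/net"):
    """Report whether IPv4/IPv6 forwarding is currently enabled on this Linux host.
    The /proc files hold the live kernel values that `sysctl -p` applies."""
    root = Path(proc_net)
    ipv4 = (root / "ipv4" / "ip_forward").read_text().strip() == "1"
    ipv6 = (root / "ipv6" / "conf" / "all" / "forwarding").read_text().strip() == "1"
    return {"ipv4": ipv4, "ipv6": ipv6}
```

On the Ubuntu VM, forwarding_status() should report True for both once the two lines are uncommented and reloaded.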

Then back to the Zerotier portal to authorize this device and assign it an IP. I used an IP outside the range expected for the remote sites, 10.249.1.1 in this case. Then I need to add another managed route using this IP as the gateway for the subnet 192.168.10.0/24.

At this stage, we have defined half of the required routing. Devices on the remote site can now send packets to the office network:

  • 10.248.1.x -> 10.248.1.254 -> 10.249.0.1 -> 10.249.1.1 -> 192.168.10.0/24

However, there’s no return route yet. For that, I need static routes on my main router for 10.248.0.0/16 and 10.249.0.0/16, using the Linux machine as the gateway. I could instead create static routes on the individual machines in this network if there aren’t too many, but it’s cleaner to do this on the existing router.

Depending on your existing router this may be simple or not. RTFM. But for pfSense you need to do the following:

Enable System > Advanced > Firewall & NAT > Static route filtering: Bypass firewall rules for traffic on the same interface

Firewall > Rules > Floating > Add

  • Pass, Quick, Interface LAN, Direction any, Address family IPv4, Protocol Any

  • Source LAN net, Destination any

System > Routing > Gateways : Create a gateway on the LAN interface with the IP address of the Linux box

System > Routing > Static Routes : Create a static route for each of the /16 subnets pointing to the Linux box as the gateway

With that in place:


traceroute 10.248.1.99

traceroute to 10.248.1.99 (10.248.1.99), 64 hops max, 52 byte packets

 1  gwoffice (192.168.10.1)  4.834 ms  0.365 ms  0.322 ms

 2  192.168.10.254 (192.168.10.254)  2.520 ms  0.783 ms  0.538 ms

 3  10.249.0.1 (10.249.0.1)  124.529 ms  131.270 ms  100.607 ms

 4  10.248.1.99 (10.248.1.99)  39.097 ms  40.377 ms  37.542 ms
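The traceroute is just longest-prefix matching at work on the pfSense box, sketched below in Python. The Linux VM's address, 192.168.10.254, is taken from hop 2 of the traceroute. "WAN" is a placeholder for the existing default route. A packet for 10.248.1.99 matches both the default route and 10.248.0.0/16, and the more specific /16 wins, so it goes to the Zerotier VM.

```python
import ipaddress

# pfSense routing table as configured above: the LAN is directly connected, the two
# /16s point at the Zerotier VM (192.168.10.254 per the traceroute), "WAN" is a placeholder.
PFSENSE_ROUTES = {
    "192.168.10.0/24": "connected (LAN)",
    "10.248.0.0/16": "192.168.10.254",
    "10.249.0.0/16": "192.168.10.254",
    "0.0.0.0/0": "WAN",
}


def next_hop(destination, routes=PFSENSE_ROUTES):
    """Pick the matching route with the longest prefix, as a router does."""
    addr = ipaddress.ip_address(destination)
    matches = [ipaddress.ip_network(n) for n in routes if addr in ipaddress.ip_network(n)]
    best = max(matches, key=lambda net: net.prefixlen)
    return routes[str(best)]
```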


Final thoughts

At this point I have a fully functional solution that is looking pretty good. In fact I’m seriously thinking about reworking my internal networking connections to use this architecture and replace a couple of pfSense boxes with Mikrotik routers. I just need to dig into how they work with OpenVPN site-to-site connections so I can handle the connectivity to sites where I can’t replace the pfSense boxes.

I’m still not convinced by the x86 RouterOS bare metal setup, but I’m going to see if the CHR behaves better with the Docker registry. But frankly for the needs of the central routing pivot machine, adding the whole RouterOS stack is overkill.

One other issue: while poking around the Zerotier portal, I noticed that there seems to be a limit of 128 static routes per network, so I may need to spread the 200+ sites across several networks. That’s just an implementation detail, though (one central VM with connections to multiple networks, or multiple VMs?).
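The arithmetic for splitting sites across networks is simple. Here is a Python sketch that uses the 128-route figure observed in the portal. It assumes each network also carries two shared routes: the Zerotier subnet itself and the office route via the central VM. Adjust shared_routes if, for example, the house subnet is routed as well.

```python
import math


def sites_per_network(route_limit=128, shared_routes=2):
    """Managed-route slots left for site subnets once the shared routes are in.
    shared_routes=2 assumes the Zerotier subnet plus the office route."""
    return route_limit - shared_routes


def networks_needed(site_count, route_limit=128, shared_routes=2):
    return math.ceil(site_count / sites_per_network(route_limit, shared_routes))


def split_sites(sites, route_limit=128, shared_routes=2):
    """Chunk site subnets into groups, one group per Zerotier network."""
    size = sites_per_network(route_limit, shared_routes)
    return [sites[i:i + size] for i in range(0, len(sites), size)]
```

Under these assumptions, 200 sites need two networks: 126 site routes in the first and 74 in the second.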

I’ve posted the complete script as a gist on GitHub. If you update the variables to match your environment, upload the script and run it as part of a configuration reset, it should handle everything.