Short Note: docker-compose and Tailscale connectivity issues

TL;DR
This note covers connectivity issues between a container started by Docker Compose and services exposed via Tailscale, which appeared after OS package updates. After troubleshooting, the cause turned out to be that Docker Compose was not running the container in bridged network mode. The fix was to modify the docker-compose.yml file and switch to the docker compose command.

I had a very old Caddy service set up using docker-compose that had been "just working" for years. After the weekly GH Actions, Ansible-driven update of all apps (including docker / docker-compose / tailscale), I noticed all sorts of alerts popping up because Caddy couldn't access services exposed via Tailscale on other nodes.

Apparently a couple of things changed:

  1. Tailscale was updated

  2. Docker was updated

  3. Docker Compose was updated

My first reaction was to restart docker on all nodes. That fixed the other services (probably some kind of breaking change), but no amount of restarting helped to recover Caddy.

It was a very simple docker compose setup, just this:

version: '3'

services:
  caddy:
    image: milanaleksic/caddy-cloudflare:2.7.6
    ports:
      - 80:80
      - 443:443
    volumes:
      - ./data/caddy-config:/config
      - ./data/caddy-data:/data
      - ./config:/etc/caddy

And Caddy was reporting a bunch of errors like this, one for each attempt to access a URL that was proxied to somewhere inside the Tailscale network:

... dial tcp 100.85.131.92:22487: i/o timeout ...

What made this issue extremely interesting is that I had to try various things before I figured out what was going on:

From this node, a normal curl 100.85.131.92:22487 just worked.

When I started a simple docker container, curl from inside it also just worked.

Even when I manually started the docker image from above, milanaleksic/caddy-cloudflare:2.7.6, curl also just worked!
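The checks above can be sketched roughly as two small shell helpers. This is only an illustration: the function names are made up for this note, and the curlimages/curl image is an assumption (any image that ships curl would do), not necessarily what I used at the time.

```shell
# Probe a TCP/HTTP endpoint from the host and from a throwaway container.
# Both helper names are hypothetical, introduced only for this sketch.

probe_from_host() {
  # -m 5: give up after 5 seconds instead of hanging on an i/o timeout
  curl -sS -m 5 -o /dev/null "$1"
}

probe_from_container() {
  # One-off container started with plain `docker run`, not by compose.
  # Assumption: the curlimages/curl image, whose entrypoint is curl itself.
  docker run --rm curlimages/curl:latest -sS -m 5 -o /dev/null "$1"
}

# Usage:
#   probe_from_host 100.85.131.92:22487 && echo "host: ok"
#   probe_from_container 100.85.131.92:22487 && echo "container: ok"
```

If both probes succeed while the compose-started container still times out, the difference has to be in how compose starts the container rather than in Tailscale itself.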

I thought I was going crazy, so I had to bring out the big guns and diff the outputs of the docker inspect container1 and docker inspect container2 commands (where the two containers were the one I started manually and the one compose started). And the problem exposed itself: docker-compose didn't run the network in bridged mode. That was the difference between the manually started container and the one started by docker-compose.
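A minimal sketch of that comparison, assuming the two containers are called container1 and container2 as above. The helper names are hypothetical, and picking HostConfig.NetworkMode as the field to look at is my assumption about where the difference is easiest to spot; the full diff is what actually revealed it.

```shell
# Compare the `docker inspect` output of two containers.
# Helper names are hypothetical; temp files keep this plain POSIX sh.

inspect_diff() {
  a=$(mktemp) && b=$(mktemp) || return 2
  docker inspect "$1" > "$a"
  docker inspect "$2" > "$b"
  # diff exits with 1 when the outputs differ, which is the expected case here
  diff "$a" "$b"
  status=$?
  rm -f "$a" "$b"
  return "$status"
}

network_mode() {
  # Print only the network mode the container was started with
  docker inspect --format '{{.HostConfig.NetworkMode}}' "$1"
}

# Usage:
#   inspect_diff container1 container2 | less
#   network_mode container1
#   network_mode container2
```

The full diff is noisy (IDs, timestamps, and names always differ), so narrowing it down to a single field helps once you know roughly where to look.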

Solution

What I ended up doing was changing the docker-compose.yml to:

- version: '3'
- 
services:
  caddy:
    image: milanaleksic/caddy-cloudflare:2.7.6
+     networks:
+       - caddy
    ports:
      - 80:80
      - 443:443
    volumes:
      - ./data/caddy-config:/config
      - ./data/caddy-data:/data
      - ./config:/etc/caddy
+ 
+ networks:
+   caddy:
+     driver: bridge

To make this new docker-compose.yml file work, I also had to stop using docker-compose to run it, since it has been deprecated for a while now, and run the docker compose command instead (the former was written in Python, and the feature has since been migrated into the Go-based docker CLI).
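For illustration, switching over could look roughly like the helper below. The function name is hypothetical, and the exact sequence (down, then up) is an assumption about a clean way to recreate the stack, not a record of the commands I typed.

```shell
# Recreate the stack with the Compose v2 plugin (`docker compose`).
# Run from the directory that holds docker-compose.yml.

recreate_stack() {
  # Fails early if the v2 plugin is not installed
  docker compose version &&
  # Stop and remove what the project currently runs
  docker compose down &&
  # Recreate the containers and networks from the updated file
  docker compose up -d
}

# Usage:
#   cd /path/to/caddy && recreate_stack
```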

💣, it just works now!