<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Milan's blog]]></title><description><![CDATA[Milan's blog]]></description><link>https://blog.aleksic.dev</link><generator>RSS for Node</generator><lastBuildDate>Tue, 14 Apr 2026 21:20:15 GMT</lastBuildDate><atom:link href="https://blog.aleksic.dev/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Short Note: faster loading of credentials from Bitwarden using rbw]]></title><description><![CDATA[I have been playing around with the https://ergaster.org/posts/2025/07/28-direnv-bitwarden-integration/. It is a necessary read before you read this note since it explains the problem in quite nice details and builds the proposed solution step by ste...]]></description><link>https://blog.aleksic.dev/short-note-faster-loading-of-credentials-from-bitwarden-using-rbw</link><guid isPermaLink="true">https://blog.aleksic.dev/short-note-faster-loading-of-credentials-from-bitwarden-using-rbw</guid><category><![CDATA[direnv]]></category><category><![CDATA[Bitwarden]]></category><category><![CDATA[cli]]></category><dc:creator><![CDATA[Milan Aleksić]]></dc:creator><pubDate>Thu, 21 Aug 2025 07:19:53 GMT</pubDate><content:encoded><![CDATA[<p>I have been playing around with the <a target="_blank" href="https://ergaster.org/posts/2025/07/28-direnv-bitwarden-integration/">https://ergaster.org/posts/2025/07/28-direnv-bitwarden-integration/</a>. It is a necessary read before you read this note since it explains the problem in quite nice details and builds the proposed solution step by step.</p>
<p>It works just as advertised, but it is a bit slow for me - when I have a list of credentials to load in a project, it takes quite a bit of time. For example, in my homelab I have around 8 secret variables that I want to export each time.</p>
<p>So, I found a new solution using a very similar approach - instead of using <code>bw</code> (which takes ~3 seconds per exported credential), the solution uses <code>rbw</code> (<a target="_blank" href="https://github.com/doy/rbw">https://github.com/doy/rbw</a>), which takes just a second longer to load the entire project - but if you use IDs to fetch secrets, it is instantaneous from that point on.</p>
<p>The approach proposed below migrates the source of truth about the list of credentials from the envrc file into Bitwarden / Vaultwarden (I use the latter). Now I can create a folder and put secrets in it on my server, and the envrc will just pull them all in, without me having to list them one by one in the file. This allows me to more easily “scale” project setups by changing things in just one place.</p>
<p>An additional benefit I noticed with <code>rbw</code> is that it is present in nix packages, so I can use my Jetify devbox flow out of the box (<code>bw</code> is there as well, but it just doesn’t compile or is broken most of the time on my Apple Silicon MBP). I guess Rust code is easier to deploy across platforms?</p>
<h2 id="heading-solution">Solution</h2>
<p><strong>EDIT (01/09/2025):</strong> I have adapted the code to reflect longer-term usage of the tool. Apparently, if the vault is locked, the rbw-agent will be spawned (that’s OK), but (FD?) leaks will remain and direnv will never finish loading. The solution is simple: once finished, just terminate the newly created agent (if one exists), which will lock the vault, remove the leak and allow direnv to proceed with loading the <code>.envrc</code> file.</p>
<p>My <code>.envrc</code> file:</p>
<pre><code class="lang-plaintext">rbw_export_folder personal homelab
</code></pre>
<p>The function (defined in <code>~/.config/direnv/lib/bw_to_env.sh</code>) looks like this:</p>
<pre><code class="lang-bash"><span class="hljs-meta">#!/usr/bin/env bash</span>

<span class="hljs-function"><span class="hljs-title">rbw_export_folder</span></span>() {
  <span class="hljs-keyword">if</span> [[ <span class="hljs-string">"<span class="hljs-variable">$#</span>"</span> -lt 2 ]]; <span class="hljs-keyword">then</span>
    <span class="hljs-built_in">echo</span> <span class="hljs-string">"You must specify profile for rbw, and a folder as two arguments"</span> &gt;&amp;2
    <span class="hljs-built_in">return</span>
  <span class="hljs-keyword">fi</span>

  <span class="hljs-built_in">local</span> profile=<span class="hljs-variable">$1</span>
  <span class="hljs-built_in">local</span> folder=<span class="hljs-variable">$2</span>
  <span class="hljs-built_in">echo</span> <span class="hljs-string">"🔍 Exporting secrets from profile=<span class="hljs-variable">$profile</span>, folder: <span class="hljs-variable">$folder</span>"</span>

  <span class="hljs-built_in">local</span> existing_agents=$(pgrep -f <span class="hljs-string">"rbw-agent"</span> 2&gt;/dev/null || <span class="hljs-literal">true</span>)
  <span class="hljs-keyword">while</span> <span class="hljs-built_in">read</span> -r folder name id; <span class="hljs-keyword">do</span>
    <span class="hljs-built_in">export</span> <span class="hljs-string">"<span class="hljs-variable">$name</span>=<span class="hljs-subst">$(RBW_PROFILE=$profile rbw get <span class="hljs-string">"<span class="hljs-variable">$id</span>"</span>)</span>"</span>
    <span class="hljs-built_in">echo</span> <span class="hljs-string">"✅️ Exported <span class="hljs-variable">$name</span>"</span>
  <span class="hljs-keyword">done</span> &lt; &lt;(RBW_PROFILE=<span class="hljs-variable">$profile</span> rbw list --fields folder --fields name --fields id 2&gt;/dev/null | grep <span class="hljs-string">"^<span class="hljs-variable">${folder}</span>"</span>)

  <span class="hljs-comment"># Kill only the NEW rbw-agent for this profile</span>
  <span class="hljs-built_in">local</span> current_agents=$(pgrep -f <span class="hljs-string">"rbw-agent"</span> 2&gt;/dev/null || <span class="hljs-literal">true</span>)
  <span class="hljs-keyword">for</span> pid <span class="hljs-keyword">in</span> <span class="hljs-variable">$current_agents</span>; <span class="hljs-keyword">do</span>
    <span class="hljs-keyword">if</span> [[ ! <span class="hljs-string">"<span class="hljs-variable">$existing_agents</span>"</span> =~ <span class="hljs-variable">$pid</span> ]]; <span class="hljs-keyword">then</span>
      <span class="hljs-built_in">kill</span> <span class="hljs-string">"<span class="hljs-variable">$pid</span>"</span> 2&gt;/dev/null || <span class="hljs-literal">true</span>
      <span class="hljs-built_in">echo</span> <span class="hljs-string">"🔒 Locked the vault again (rbw-agent <span class="hljs-variable">$pid</span> stopped)"</span>
    <span class="hljs-keyword">fi</span>
  <span class="hljs-keyword">done</span>
}
</code></pre>
<p>When I enter my folder the output is like this:</p>
<pre><code class="lang-bash">🔍 Exporting secrets from profile=personal, folder: homelab
✅️ Exported ANSIBLE_VAULT_PASSWORD
✅️ Exported RESTIC_PASSWORD
✅️ Exported RESTIC_REPOSITORY
✅️ Exported SOL_PASSWORD
✅️ Exported SOL_USERNAME
✅️ Exported VAULT_ADDR
✅️ Exported VAULT_TOKEN
🔒 Locked the vault again (rbw-agent 92468 stopped)
milan@mbp ~/SourceCode/personal/homelab →
</code></pre>
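<p>One practical note: the function above assumes the vault can be unlocked when direnv fires. If you prefer to unlock (and sync) up front instead of being prompted mid-load, something like this should work - <code>personal</code> being the profile name from the example above:</p>
<pre><code class="lang-bash"># unlock the vault for the given profile before entering the project directory
RBW_PROFILE=personal rbw unlock
# optionally refresh the local cache from the server
RBW_PROFILE=personal rbw sync
</code></pre>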
]]></content:encoded></item><item><title><![CDATA[Nomad is dead, long live Kubernetes]]></title><description><![CDATA[The time has come: my homelab, for which I have been writing how-tos and occasional status updates on this blog had / has been serving me very well, but k8s ecosystem has reached maturity levels and broad acceptance and my Nomad cluster raises more q...]]></description><link>https://blog.aleksic.dev/nomad-is-dead-long-live-kubernetes</link><guid isPermaLink="true">https://blog.aleksic.dev/nomad-is-dead-long-live-kubernetes</guid><category><![CDATA[Kubernetes]]></category><category><![CDATA[nomad]]></category><category><![CDATA[ArgoCD]]></category><dc:creator><![CDATA[Milan Aleksić]]></dc:creator><pubDate>Sun, 20 Apr 2025 11:44:33 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/d2uHXWTkGn4/upload/cd1c4fa7b9378356206a100a308e180a.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The time has come: my homelab, for which I have been writing how-tos and occasional status updates on this blog, has been serving me very well - but the k8s ecosystem has reached maturity and broad acceptance, and my Nomad cluster raises questions and eyebrows more often than I'd like to admit. I think the time has come to face the music…</p>
<p>Here is my Nomad cluster, spread around Oracle Cloud and my basement.</p>
<blockquote>
<p>Actually, “basement” is not really the case as of today, since we are still converting it into a fitness+movie room, so my remote-working office (in the attic) is the actual “basement” in this story, temporarily</p>
</blockquote>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1745137785080/e9f16111-1795-481c-8346-ff33c49dd7a6.png" alt class="image--center mx-auto" /></p>
<p>Nodes I use:</p>
<ul>
<li><p><code>cloud4</code> is a big arm64 that Oracle gives for free</p>
</li>
<li><p><code>cloud5</code> is a small amd64 that Oracle also gives for free</p>
</li>
<li><p><code>ixion</code> and <code>pluto</code> are RPi 4s</p>
</li>
<li><p><code>oberon</code> is an RPi 5</p>
</li>
<li><p><code>io</code> is a VM on my Proxmox (explained later).</p>
</li>
</ul>
<p>What are the steps I made so far in the migration process?</p>
<h2 id="heading-introduce-an-empty-k8s-cluster">Introduce an empty k8s cluster</h2>
<p>This has been done. I got myself a <a target="_blank" href="https://www.minisforum.com/products/minisforum-um870-slim?variant=49690738131250">Ryzen 7 mini PC</a> last year, configured Proxmox on it and segmented its 64GB/16 vCPUs into:</p>
<ol>
<li><p>one more Linux VM for Nomad, since I was short on resources already (that’s the <code>io</code> from the screenshot)</p>
</li>
<li><p>one “<em>control</em>” k3s node (using Proxmox support for LXC);</p>
</li>
<li><p>one “<em>worker</em>” k3s node where work will be scheduled (LXC).</p>
</li>
</ol>
<h2 id="heading-basic-infra-setup">Basic infra setup</h2>
<p>Here are some of the steps I took so far, after configuring the k3s nodes on my Proxmox using a <a target="_blank" href="https://garrettmills.dev/blog/2022/04/18/Rancher-K3s-Kubernetes-on-Proxmox-Container/">tutorial I found</a>. Not in a strictly realistic order, but close enough to reality that it makes sense…</p>
<p>I installed the latest <strong>Helm</strong> as of today. It's a pretty important part of the modern k8s experience and it just makes sense to have it from the start.</p>
<p>I then installed <code>cert-manager</code>, using Helm as well. Very cool, but it cost me a couple of hairs until I made it work. It made me <em>rethink the entire damn migration</em>, since so many sources talked about different aspects/approaches/ingresses. But I persevered and it… just works now. Certificates are provided via Cloudflare, which was there already, so I can easily access the server.</p>
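<p>For reference, the install itself is the easy part - a minimal sketch of the Helm commands I mean (the Cloudflare-backed issuer configuration is where the hair-pulling happens, and it is specific to your cluster):</p>
<pre><code class="lang-bash">helm repo add jetstack https://charts.jetstack.io
helm repo update
# install cert-manager together with its CRDs into its own namespace
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager --create-namespace \
  --set installCRDs=true
</code></pre>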
<p>Then I <a target="_blank" href="https://argo-cd.readthedocs.io/en/stable/operator-manual/core/#installing">installed an ArgoCD</a> with <a target="_blank" href="https://argo-cd.readthedocs.io/en/stable/operator-manual/ingress/">nginx ingress</a> since my understanding is that it basically became super-popular for GitOps. I think there's also Flux, but we're using ArgoCD on my work, so I thought “no, Milan, you will not <em>again</em> choose another non-mainstream approach just for the sake of a principal”.</p>
<blockquote>
<p>In hindsight, I should’ve used Helm for setting up ArgoCD. I just used their default install scripts. I will migrate later, I guess</p>
</blockquote>
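<p>For the record, those default install scripts boil down to roughly this (the standard manifests from the ArgoCD documentation):</p>
<pre><code class="lang-bash">kubectl create namespace argocd
kubectl apply -n argocd \
  -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
</code></pre>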
<p>Gitea remains my personal self-hosted Git VCS, also for this new ArgoCD use case - so that's where Argo will find its CRDs. It's still deployed in my homelab, just using Ansible, as one of those core things that for some specific reason I couldn't configure as a Nomad job (I forgot why - it’s been like that for years in my homelab, which translates into centuries for normal people).</p>
<p>I then added the latest <strong>HashiCorp Vault</strong>, also using Helm. I am still not happy about this, since I probably simplified the Helm values too much and at this time it still can’t provide secrets for other pods - but that’s just because I needed to learn Vault for a project at work (I needed a service up and running with the API exposed on an ingress, and that's it).</p>
<blockquote>
<p>Interesting tidbit: I used Cursor for the first time in my life for Vault Helm setup 🎉.</p>
</blockquote>
<p>I added <a target="_blank" href="https://tailscale.com/kb/1437/kubernetes-operator-api-server-proxy">tailscale operator</a> so that I can access the cluster out of home as well. This is a cool thing since I can access it even from my Androind smartphone using e.g. <a target="_blank" href="https://play.google.com/store/apps/details?id=io.kubenav.kubenav&amp;hl=en-US&amp;pli=1">kubenav</a> app. This is where I was saying to myself actually</p>
<p>I added <a target="_blank" href="https://grafana.com/docs/alloy/latest/set-up/install/kubernetes/">Grafana Alloy</a> so that it can push all the logs into my existing Grafana Loki (self-hosted in my main Nomad cluster). I used <strong>Promtail</strong> already in my (deprecating) cluster for system stuff and <strong>Vector</strong> for nomad logs, but apparently <strong>Alloy</strong> is the new future-safe thing in Grafana stack so I went with it.</p>
<p>Finally, I thought about what to do with my Grafana and my InfluxDB. I thought for a long time about these… the decision I made is: migrate Grafana into k8s gradually, just as any other app, but don’t carry InfluxDB over. I already have <strong>Prometheus</strong>, so I will go with that for my k8s metrics monitoring. I just exposed it on the internal ingress and registered it in the existing Grafana as a data source.</p>
<h2 id="heading-setup-poc-with-a-small-service">Setup PoC with a small service</h2>
<p>I wasn't sure what to migrate first. Something simple? At some point you realize nothing is simple: even stateless services appear in my ingress <code>Caddyfile</code> via Consul Templates and get registered into Cloudflare DNS for Tailscale IPs and into the local <code>dnsmasq</code> for LAN IP overrides.</p>
<p>I decided to go with <code>n8n</code> service since it is just complex enough to holistically check many things:</p>
<ul>
<li><p>logs must appear in my Grafana Loki;</p>
</li>
<li><p>certificate needs to be issued;</p>
</li>
<li><p>external DNS and Caddy should forward webhooks into n8n;</p>
</li>
<li><p>internal DNS should expose the Web UI;</p>
</li>
<li><p>deployment should work automatically using ArgoCD;</p>
</li>
<li><p><a target="_blank" href="https://github.com/n8n-io/n8n/issues/863#issuecomment-699556998">bugs like this one</a> should be fixed by renaming the service from <code>n8n</code>;</p>
</li>
<li><p>NFS on my ancient Synology should be used for the persistent volume;</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1745149289704/28abd3a9-1374-4f41-9e80-1767c7f50ec3.png" alt class="image--center mx-auto" /></p>
<blockquote>
<p>I still have a small but important optimization pending: instead of using NFS as a backup destination (to which I push the locally mounted files once per day), I should “just trust the cluster” and let it always use NFS directly.</p>
</blockquote>
<h2 id="heading-future-plan">Future plan</h2>
<p>The master plan is to <em>drain</em> the <code>ixion</code> node of all Nomad jobs and introduce it into the cluster as a new k3s node. Basically, let it disappear from Consul and from Nomad node listings.</p>
<p>To do that, I will have to migrate complex work into another node (there’s still space for that extra work, luckily) and migrate simple work into k8s just as I did with <code>n8n</code> service. Finally, I will then be able to introduce <code>ixion</code> as a new worker k3s node into the Kubernetes cluster.</p>
<p>Then, one by one, all the rest. I think this will be done gradually during 2025 when I find time, but the idea is to go into 2026 without consul/nomad. That’s the plan at least…</p>
]]></content:encoded></item><item><title><![CDATA[Simple CPU usage tracking in Linux using SQLite and Python]]></title><description><![CDATA[I figured that over night one of my PCs is spending time on “something”. System logs don’t show anything, app logs neither. So, I spent like 30min and made this tiny script which might be useful for you as well. I connect to my host using SSH, and I ...]]></description><link>https://blog.aleksic.dev/simple-cpu-usage-tracking-in-linux-using-sqlite-and-python</link><guid isPermaLink="true">https://blog.aleksic.dev/simple-cpu-usage-tracking-in-linux-using-sqlite-and-python</guid><category><![CDATA[Linux]]></category><dc:creator><![CDATA[Milan Aleksić]]></dc:creator><pubDate>Sat, 30 Nov 2024 07:19:20 GMT</pubDate><content:encoded><![CDATA[<p>I figured that overnight one of my PCs was spending time on “something”. System logs don’t show anything, app logs neither. So, I spent about 30 minutes and made this tiny script, which might be useful for you as well. I connect to my host using SSH, and I let this run overnight in a tmux session. The day after, I can analyze all the stored data in the SQLite database and find the problematic app.</p>
<p><strong>Requirements</strong>:</p>
<ul>
<li><p>Python 3</p>
</li>
<li><p><code>screen</code> / <code>tmux</code> to keep the session going even when you are not logged in</p>
</li>
<li><p>(optional) SQLite3 system package so that you can read the recorded data on the system</p>
</li>
</ul>
<p><strong>Source code</strong> (<code>activity-tracker.py</code>)</p>
<pre><code class="lang-python"><span class="hljs-comment">#!/usr/bin/env python3</span>
<span class="hljs-keyword">import</span> sys
<span class="hljs-keyword">import</span> time

<span class="hljs-keyword">import</span> datetime
<span class="hljs-keyword">import</span> logging
<span class="hljs-keyword">import</span> os
<span class="hljs-keyword">import</span> sqlite3
<span class="hljs-keyword">import</span> subprocess


<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">ActivityTracker</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self</span>):</span>
        script_directory = os.path.dirname(os.path.abspath(sys.argv[<span class="hljs-number">0</span>]))
        db = os.path.join(script_directory, <span class="hljs-string">'activity-tracker.db'</span>)
        conn = sqlite3.connect(db)
        cursor = conn.cursor()
        cursor.execute(<span class="hljs-string">'''CREATE TABLE IF NOT EXISTS activity_log
                     (date text, app text, pid int, user text, usage real)'''</span>)
        conn.commit()
        self._db = conn

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">sync</span>(<span class="hljs-params">self</span>):</span>
        <span class="hljs-keyword">while</span> <span class="hljs-literal">True</span>:
            command = [<span class="hljs-string">"bash"</span>, <span class="hljs-string">"-c"</span>, <span class="hljs-string">"ps -eo pcpu,pid,user,args | grep -v '[p]s -eo' | grep -v '[a]ctivity-tracker.py' | tail -n +2 | sort -k1 -r -n | head -10"</span>]
            logging.info(<span class="hljs-string">"Running command %s"</span> % <span class="hljs-string">" "</span>.join(command))

            result = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=<span class="hljs-literal">True</span>, check=<span class="hljs-literal">True</span>)
            <span class="hljs-keyword">for</span> line <span class="hljs-keyword">in</span> result.stdout.split(<span class="hljs-string">"\n"</span>):
                components = list(filter(<span class="hljs-keyword">lambda</span> x: x != <span class="hljs-string">""</span>, line.split(<span class="hljs-string">" "</span>)))
                <span class="hljs-keyword">if</span> len(components) == <span class="hljs-number">0</span>:
                    <span class="hljs-keyword">continue</span>
                <span class="hljs-keyword">if</span> len(components) &lt; <span class="hljs-number">4</span>:
                    logging.warning(<span class="hljs-string">"Line has wrong format: %s"</span> % components)
                    <span class="hljs-keyword">continue</span>
                usage = float(components[<span class="hljs-number">0</span>])
                pid = int(components[<span class="hljs-number">1</span>])
                user = components[<span class="hljs-number">2</span>]
                app = <span class="hljs-string">" "</span>.join(components[<span class="hljs-number">3</span>:])
                moment = datetime.datetime.now().isoformat()

                <span class="hljs-comment"># store in db</span>
                cursor = self._db.cursor()
                sql = <span class="hljs-string">"INSERT INTO activity_log (date, app, pid, user, usage) VALUES (?, ?, ?, ?, ?)"</span>
                cursor.execute(sql, (moment, app, pid, user, usage))
                self._db.commit()
            <span class="hljs-keyword">try</span>:
                time.sleep(<span class="hljs-number">10</span>)
            <span class="hljs-keyword">except</span> KeyboardInterrupt:
                logging.info(<span class="hljs-string">"Exiting"</span>)
                self._db.close()
                <span class="hljs-keyword">break</span>


<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">run</span>():</span>
    logging.getLogger().setLevel(logging.INFO)
    ActivityTracker().sync()


<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">'__main__'</span>:
    run()
</code></pre>
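<p>I let it run overnight inside tmux so it survives the SSH session ending - for example:</p>
<pre><code class="lang-bash"># start a detached tmux session running the tracker
tmux new-session -d -s tracker 'python3 activity-tracker.py'
</code></pre>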
<p>Finally, you can track the biggest users using simple <code>watch</code> command:</p>
<pre><code class="lang-plaintext">watch -n 30 "sqlite3 activity-tracker.db -table \
  'select * from activity_log order by usage desc limit 50'"

Every 30.0s: sqlite3 activity-tracker.db -table   'select * from activity_log order by usage desc limit...

+----------------------------+------------------------------------------------------------------------+-------+--------+-------+
|            date            |                             app                                        |  pid  |  user  | usage |
+----------------------------+------------------------------------------------------------------------+-------+--------+-------+
| 2024-11-30T08:02:10.361848 | /opt/nomad/nomad agent -config=/opt/nomad/nomad.conf                   | 86884 | root   | 24.2  |
| 2024-11-30T08:01:59.840202 | /opt/consul/consul agent -config-file=/opt/consul/consul.json -rejoin  | 86428 | consul | 6.6   |
| 2024-11-30T08:02:10.383667 | /opt/consul/consul agent -config-file=/opt/consul/consul.json -rejoin  | 86428 | consul | 3.6   |
...
+----------------------------+------------------------------------------------------------------------+-------+--------+-------+
</code></pre>
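<p>Once enough data has accumulated, an aggregate query is often more telling than the raw rows - for example, total and peak CPU per app over the night (columns as defined by the script above):</p>
<pre><code class="lang-plaintext">sqlite3 activity-tracker.db -table \
  "select app, count(*) as samples, round(sum(usage), 1) as total_usage, max(usage) as peak
   from activity_log group by app order by total_usage desc limit 15"
</code></pre>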
]]></content:encoded></item><item><title><![CDATA[Short Note: when Spring Boot and Privoxy don't like each other]]></title><description><![CDATA[I figured out after full day of work a very peculiar behavior in Spring Boot and wanted to write this one down since it was a pretty annoying one.
Let’s say you are:

using Spring Boot and need to make HTTP client connections (using RestTemplate or R...]]></description><link>https://blog.aleksic.dev/short-note-when-spring-boot-and-privoxy-dont-like-each-other</link><guid isPermaLink="true">https://blog.aleksic.dev/short-note-when-spring-boot-and-privoxy-dont-like-each-other</guid><category><![CDATA[Java]]></category><category><![CDATA[Springboot]]></category><category><![CDATA[Spring]]></category><dc:creator><![CDATA[Milan Aleksić]]></dc:creator><pubDate>Thu, 14 Nov 2024 17:25:03 GMT</pubDate><content:encoded><![CDATA[<p>After a full day of work I figured out a very peculiar behavior in Spring Boot, and wanted to write this one down since it was a pretty annoying one.</p>
<p>Let’s say you are:</p>
<ul>
<li><p>using Spring Boot and need to make HTTP client connections (using <code>RestTemplate</code> or <code>RestClient</code>, etc) to some server</p>
</li>
<li><p>you innocently want to use an out-of-the-box marshaller like Jackson, for example (because why not)</p>
</li>
<li><p>you must utilize HTTP proxy between your app and the target server</p>
<ul>
<li>proxy causing issues is Privoxy, but… hey who knows if some others are (not) impacted</li>
</ul>
</li>
<li><p>that HTTP proxy must (for some very important enterprisey reason) inspect HTTPS connections</p>
</li>
</ul>
<p>You might notice that all clients fail in this scenario with an early EOF: Jetty, Apache HTTP Client, native JDK client… all of them.</p>
<pre><code class="lang-java"><span class="hljs-comment">// run with </span>
<span class="hljs-comment">// -Dhttps.proxyHost=my.very.nice.proxy.local</span>
<span class="hljs-comment">// -Dhttps.proxyPort=my.innocent.proxy</span>

ResponseEntity&lt;LoginResponse&gt; response = restClient.post()
  .uri(<span class="hljs-string">"https://myserver.com"</span>)
  .body(<span class="hljs-keyword">new</span> LoginRequest(<span class="hljs-string">"username"</span>, <span class="hljs-string">"password"</span>))
  .retrieve()
  .toEntity(LoginResponse.class);

<span class="hljs-comment">// Caused by: org.springframework.web.client.ResourceAccessException:</span>
<span class="hljs-comment">// I/O error on POST request for "https://myserver.com":</span>
<span class="hljs-comment">// HttpConnectionOverHTTP@6dc73294::SslEndPoint@7c52e37e[{...}]</span>
<span class="hljs-comment">// Caused by: java.io.EOFException: HttpConnectionOverHTTP@6dc73294::SslEndPoint@7c52e37e[{...}]</span>
</code></pre>
<p>After some time I figured out it was the marshaller that was flushing the connection. This is normally never a problem, mind you - but if HTTPS inspection is being used, the request is sent, and this triggers Privoxy to flip: it doesn’t even try to send the response back and just… EOFs you as well :)</p>
<p>The fix is straightforward: avoid the <code>flush()</code> done by the marshaller and just send the body transformed into a string yourself:</p>
<pre><code class="lang-java"><span class="hljs-comment">// run with </span>
<span class="hljs-comment">// -Dhttps.proxyHost=my.very.nice.proxy.local</span>
<span class="hljs-comment">// -Dhttps.proxyPort=my.innocent.proxy</span>

ResponseEntity&lt;LoginResponse&gt; response = restClient.post()
  .uri(<span class="hljs-string">"https://myserver.com"</span>)
  .body(objectMapper.writeValueAsString(<span class="hljs-keyword">new</span> LoginRequest(<span class="hljs-string">"username"</span>, <span class="hljs-string">"password"</span>)))
  .retrieve()
  .toEntity(LoginResponse.class);
</code></pre>
<h2 id="heading-last-words">Last words</h2>
<p>Is this now a bug in Privoxy or in Spring? I think it’s in Spring, since cURL doesn’t make the assumption that <code>flush</code> is safe. All the clients, used directly, also don’t have this issue - the proxy terminates the connection correctly after sending the response. If you just turn off the marshaller in Spring, the behavior is correct.</p>
]]></content:encoded></item><item><title><![CDATA[Short Note: auto-tag commits on main branch]]></title><description><![CDATA[In case you use some tools (like goreleaser) that depend on running in the case of tagged commits only, but you wish at the same time not to bother with manually tagging your commits but just to push to master… I found a nice & easy way to auto-tag c...]]></description><link>https://blog.aleksic.dev/short-note-auto-tag-commits-on-main-branch</link><guid isPermaLink="true">https://blog.aleksic.dev/short-note-auto-tag-commits-on-main-branch</guid><category><![CDATA[Git]]></category><category><![CDATA[short]]></category><dc:creator><![CDATA[Milan Aleksić]]></dc:creator><pubDate>Sat, 21 Sep 2024 11:23:29 GMT</pubDate><content:encoded><![CDATA[<p>In case you use some tools (like <code>goreleaser</code>) that only run on tagged commits, but at the same time you don’t wish to bother with manually tagging your commits and just want to push to master… I found a nice &amp; easy way to auto-tag commits locally with a “post-commit” Git hook.</p>
<p>TL;DR: If you wish to use this approach, here is the post commit hook:</p>
<pre><code class="lang-bash"><span class="hljs-meta">#!/bin/sh</span>

COMMIT=$(git rev-parse --short HEAD)
DATE=$(git <span class="hljs-built_in">log</span> -1 --format=%<span class="hljs-built_in">cd</span> --date=format:<span class="hljs-string">"%Y%m%d"</span>)
git tag -a <span class="hljs-string">"<span class="hljs-variable">$DATE</span>-<span class="hljs-variable">$COMMIT</span>"</span> -m <span class="hljs-string">"Tagging <span class="hljs-variable">$DATE</span>-<span class="hljs-variable">$COMMIT</span>"</span>
<span class="hljs-built_in">echo</span> <span class="hljs-string">"Code commited and tagged as <span class="hljs-variable">$DATE</span>-<span class="hljs-variable">$COMMIT</span>"</span>

<span class="hljs-built_in">echo</span> <span class="hljs-string">"Removing deprecated un-pushed tags..."</span>
git show-ref --tags | \
  grep -v -F <span class="hljs-string">"<span class="hljs-subst">$(git ls-remote --tags origin | grep -v '\^{}' | cut -f 2)</span>"</span> | \
  grep -v <span class="hljs-string">"<span class="hljs-subst">$(git rev-parse --short HEAD)</span>"</span> | \
  awk -F<span class="hljs-string">'/'</span> <span class="hljs-string">'{print $3}'</span> | \
  xargs -I{} bash -c <span class="hljs-string">"git tag --delete {}"</span>
</code></pre>
<h2 id="heading-installation-of-the-hook">Installation of the hook</h2>
<p>To install it, you need to save this file as <code>.git/hooks/post-commit</code> file and make it executable with <code>chmod +x .git/hooks/post-commit</code>.</p>
<p>What does it actually do:</p>
<h2 id="heading-step-1-create-tag-on-each-commit">Step 1 - create tag on each commit</h2>
<p>The following part will generate a reasonable tag value, like <code>20240921-4318a7c</code>, by combining the current date with the short hash of the last commit (the one that has just been created, and for which the <code>post-commit</code> hook was triggered).</p>
<pre><code class="lang-bash">COMMIT=$(git rev-parse --short HEAD)
DATE=$(git <span class="hljs-built_in">log</span> -1 --format=%<span class="hljs-built_in">cd</span> --date=format:<span class="hljs-string">"%Y%m%d"</span>)
git tag -a <span class="hljs-string">"<span class="hljs-variable">$DATE</span>-<span class="hljs-variable">$COMMIT</span>"</span> -m <span class="hljs-string">"Tagging <span class="hljs-variable">$DATE</span>-<span class="hljs-variable">$COMMIT</span>"</span>
<span class="hljs-built_in">echo</span> <span class="hljs-string">"Code commited and tagged as <span class="hljs-variable">$DATE</span>-<span class="hljs-variable">$COMMIT</span>"</span>
</code></pre>
<h2 id="heading-step-2-remove-deprecated-local-tags">Step 2 - remove “deprecated” local tags</h2>
<p>In case you created 3 commits locally, you most probably want to tag only the last one, right? That’s why we need to list all the tags that are present only locally and remove all the other tags previously created locally by this post-commit hook.</p>
<pre><code class="lang-bash"><span class="hljs-built_in">echo</span> <span class="hljs-string">"Removing deprecated un-pushed tags..."</span>
git show-ref --tags | \
  grep -v -F <span class="hljs-string">"<span class="hljs-subst">$(git ls-remote --tags origin | grep -v '\^{}' | cut -f 2)</span>"</span> | \
  grep -v <span class="hljs-string">"<span class="hljs-subst">$(git rev-parse --short HEAD)</span>"</span> | \
  awk -F<span class="hljs-string">'/'</span> <span class="hljs-string">'{print $3}'</span> | \
  xargs -I{} bash -c <span class="hljs-string">"git tag --delete {}"</span>
</code></pre>
<p>What exactly does this part do:</p>
<ol>
<li><p><code>git show-ref --tags</code> - list all the known tags</p>
</li>
<li><p><code>grep -v -F "$(git ls-remote --tags origin | grep -v '\^{}' | cut -f 2)"</code> - find the tags that are known on the <code>origin</code> remote (you might want to replace <code>origin</code> with the name of your remote in case you don’t use the default one), and create a negative fixed-string filter for those on the incoming stream</p>
</li>
<li><p><code>grep -v "$(git rev-parse --short HEAD)"</code> - identify the current tag (which we want to keep since we’ve just made it) and create a negative filter to exclude it from the incoming stream</p>
</li>
<li><p><code>awk -F'/' '{print $3}'</code> - keep only the last part of each tag ref by removing all the text before the last <code>/</code> character</p>
</li>
<li><p><code>xargs -I{} bash -c "git tag --delete {}"</code> - run the tag deletion of the tags that remain</p>
</li>
</ol>
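<p>One thing the hook does not do is push: the tag only reaches the remote (and triggers tools like <code>goreleaser</code>) once you push it, for example with:</p>
<pre><code class="lang-bash"># push the branch together with the annotated tags pointing at pushed commits
git push --follow-tags
</code></pre>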
]]></content:encoded></item><item><title><![CDATA[Short Note: use DuckDB as (pipe) jq replacement]]></title><description><![CDATA[The usage of the jq can be substituted sometimes by DuckDB with a bit cleaner syntax.
For example, to get properties Id and State.Status from a list of active containers in docker one might do this:
➜ docker inspect f42 | jq '.[] | "\(.Id) \(.State.S...]]></description><link>https://blog.aleksic.dev/short-note-use-duckdb-as-pipe-jq-replacement</link><guid isPermaLink="true">https://blog.aleksic.dev/short-note-use-duckdb-as-pipe-jq-replacement</guid><category><![CDATA[Bash]]></category><category><![CDATA[jq]]></category><dc:creator><![CDATA[Milan Aleksić]]></dc:creator><pubDate>Fri, 17 May 2024 08:03:20 GMT</pubDate><content:encoded><![CDATA[<p>The usage of <code>jq</code> can sometimes be substituted by DuckDB, with a bit cleaner syntax.</p>
<p>For example, to get properties <code>Id</code> and <code>State.Status</code> from a list of active containers in docker one might do this:</p>
<pre><code class="lang-bash">➜ docker inspect f42 | jq <span class="hljs-string">'.[] | "\(.Id) \(.State.Status)"'</span> -r
f42d97a8670846a737a972329cf44e5b092954ef7e30681f73d70963ad971e43 running
</code></pre>
<p>but it might be easier to write a modern SQL JSON:</p>
<pre><code class="lang-bash">➜ docker inspect f42 | mjq <span class="hljs-string">"select Id, State-&gt;&gt;'Status' from JQ"</span>
┌──────────────────────────────────────────────────────────────────┬──────────────────────┐
│                                Id                                │ (State -&gt;&gt; <span class="hljs-string">'Status'</span>) │
│                             varchar                              │       varchar        │
├──────────────────────────────────────────────────────────────────┼──────────────────────┤
│ f42d97a8670846a737a972329cf44e5b092954ef7e30681f73d70963ad971e43 │ running              │
└──────────────────────────────────────────────────────────────────┴──────────────────────┘
</code></pre>
<p>How does <code>mjq</code> work? It's this function basically:</p>
<pre><code class="lang-bash"><span class="hljs-function"><span class="hljs-title">mjq</span></span>() {
  <span class="hljs-built_in">local</span> TFILE=<span class="hljs-string">"/tmp/mjq-<span class="hljs-subst">$((RANDOM % 100)</span>).json"</span>
  cat &gt;  <span class="hljs-variable">${TFILE}</span>
  <span class="hljs-built_in">local</span> SQL=<span class="hljs-string">"<span class="hljs-variable">${1//JQ/read_json('${TFILE}</span>')}"</span>
  duckdb -c <span class="hljs-string">"<span class="hljs-variable">${SQL}</span>"</span>
}
</code></pre>
<p>I used <code>mjq</code> because that’s the prefix I use for my tools. You can pass any SQL you want as the argument, and the <code>JQ</code> string will be replaced with a <code>read_json</code> call over the file holding the piped-in input.</p>
<p>Further ideas:</p>
<ul>
<li>the temp file is kept so that further debugging can be done without re-running the upstream bash command (if you don’t need that, see the variant sketched below). YMMV</li>
</ul>
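<p>If you don’t need the file afterwards, a variant using <code>mktemp</code> avoids the (unlikely, but possible) <code>$RANDOM</code> collisions and cleans up after itself - a sketch:</p>
<pre><code class="lang-bash">mjq() {
  local TFILE
  TFILE="$(mktemp)"  # unique temp file instead of /tmp/mjq-$RANDOM.json
  cat &gt; "${TFILE}"   # capture the piped-in JSON
  duckdb -c "${1//JQ/read_json('${TFILE}')}"
  rm -f "${TFILE}"   # clean up instead of keeping the file for debugging
}
</code></pre>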
<hr />
<p>More information about usage of DuckDB and the motivation for this script: <a target="_blank" href="https://www.pgrs.net/2024/03/21/duckdb-as-the-new-jq/">https://www.pgrs.net/2024/03/21/duckdb-as-the-new-jq/</a></p>
<p>More information about DuckDB JSON extension: <a target="_blank" href="https://duckdb.org/docs/extensions/json.html">https://duckdb.org/docs/extensions/json.html</a></p>
]]></content:encoded></item><item><title><![CDATA[Short Note: docker-compose and Tailscale connectivity issues]]></title><description><![CDATA[TL;DRThe article discusses connectivity issues between Docker Compose and services exposed via Tailscale after the OS package updates. After troubleshooting, the issue was identified as Docker Compose not running container in bridged mode. The soluti...]]></description><link>https://blog.aleksic.dev/short-note-docker-compose-and-tailscale-connectivity-issues</link><guid isPermaLink="true">https://blog.aleksic.dev/short-note-docker-compose-and-tailscale-connectivity-issues</guid><category><![CDATA[Docker]]></category><category><![CDATA[networking]]></category><category><![CDATA[tailscale]]></category><dc:creator><![CDATA[Milan Aleksić]]></dc:creator><pubDate>Fri, 10 May 2024 04:00:35 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/4h0HqC3K4-c/upload/00060303cf0680da9c3f1a5dd70ddc3a.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote>
<p>TL;DR<br />The article discusses connectivity issues between Docker Compose and services exposed via Tailscale after OS package updates. After troubleshooting, the issue was identified as Docker Compose not running the container network in bridged mode. The solution involved modifying the docker-compose.yml file and using the docker compose command.</p>
</blockquote>
<p>I had a very old Caddy service setup using <code>docker-compose</code> that had been "just working" for years, and after the weekly GH Actions Ansible-driven update of all apps (including <code>docker</code> / <code>docker-compose</code> / <code>tailscale</code>) I noticed all sorts of alerts popping up, since Caddy couldn't access services exposed via Tailscale on other nodes.</p>
<p>Apparently a couple of things changed:</p>
<ol>
<li><p>Tailscale was updated</p>
</li>
<li><p>Docker was updated</p>
</li>
<li><p>Docker Compose was updated</p>
</li>
</ol>
<p>Now, the first reaction was to restart Docker on all nodes; that helped fix other services (probably some kind of breaking change), but no amount of restarting helped recover Caddy.</p>
<p>It was a very simple docker compose setup, just this:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">version:</span> <span class="hljs-string">'3'</span>

<span class="hljs-attr">services:</span>
  <span class="hljs-attr">caddy:</span>
    <span class="hljs-attr">image:</span> <span class="hljs-string">milanaleksic/caddy-cloudflare:2.7.6</span>
    <span class="hljs-attr">ports:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-number">80</span><span class="hljs-string">:80</span>
      <span class="hljs-bullet">-</span> <span class="hljs-number">443</span><span class="hljs-string">:443</span>
    <span class="hljs-attr">volumes:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">./data/caddy-config:/config</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">./data/caddy-data:/data</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">./config:/etc/caddy</span>
</code></pre>
<p>And Caddy was reporting a bunch of these (one for each attempt to access URLs which were proxied somewhere internal to the Tailscale network):</p>
<pre><code class="lang-plaintext">... dial tcp 100.85.131.92:22487: i/o timeout ...
</code></pre>
<p>What made this issue extremely interesting is that I had to try various things until I figured out what was going on:</p>
<p><strong>From this node</strong> a normal <code>curl 100.85.131.92:22487</code> just worked.</p>
<p>When I start <strong>a simple docker container</strong> it also just worked.</p>
<p>Even when I manually started the docker image from above (<code>milanaleksic/caddy-cloudflare:2.7.6</code>), <code>curl</code> also just worked!</p>
<p>I thought I was going crazy, but then I had to get out the big guns and run a diff analysis of the outputs of the <code>docker inspect container1</code> and <code>docker inspect container2</code> commands (where the 2 containers were the one that I started manually vs the one compose started). And the problem exposed itself: docker-compose didn't run the network in <em>bridged mode</em>. That was the difference between the manually started container and the one started by <code>docker-compose</code>.</p>
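<p>For the curious, that diff analysis boils down to something like this (the container names here are illustrative):</p>
<pre><code class="lang-bash"># diff the full inspect outputs of the two containers
diff &lt;(docker inspect caddy-manual) &lt;(docker inspect caddy-compose)
# or query the relevant field directly
docker inspect -f '{{.HostConfig.NetworkMode}}' caddy-manual caddy-compose
</code></pre>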
<h3 id="heading-solution">Solution</h3>
<p>What I ended up doing was changing the <code>docker-compose.yml</code> to:</p>
<pre><code class="lang-diff"><span class="hljs-deletion">- version: '3'</span>
<span class="hljs-deletion">- </span>
services:
  caddy:
    image: milanaleksic/caddy-cloudflare:2.7.6
<span class="hljs-addition">+     networks:</span>
<span class="hljs-addition">+       - caddy</span>
    ports:
      - 80:80
      - 443:443
    volumes:
      - ./data/caddy-config:/config
      - ./data/caddy-data:/data
      - ./config:/etc/caddy
<span class="hljs-addition">+ </span>
<span class="hljs-addition">+ networks:</span>
<span class="hljs-addition">+   caddy:</span>
<span class="hljs-addition">+     driver: bridge</span>
</code></pre>
<p>To make this new <code>docker-compose.yml</code> file work I actually had to stop using <code>docker-compose</code> to run it, since it had been deprecated for a while now, and just ran the <code>docker compose</code> command instead (the former was still written in Python, while the feature was migrated into the Go CLI <code>docker</code> command).</p>
<p>💣, it just works now!</p>
]]></content:encoded></item><item><title><![CDATA[Short Note: control diffuser via Stream Deck]]></title><description><![CDATA[What and why?
Well, it took me forever to figure it out, but apparently during the winter months I have problems with my nose not because I became super-sensitive as I age, but simply because humidity at my home office is not as good as it is was in ...]]></description><link>https://blog.aleksic.dev/short-note-control-diffuser-via-stream-deck</link><guid isPermaLink="true">https://blog.aleksic.dev/short-note-control-diffuser-via-stream-deck</guid><category><![CDATA[Homelab]]></category><category><![CDATA[stream deck]]></category><category><![CDATA[Home Assistant]]></category><category><![CDATA[iot]]></category><dc:creator><![CDATA[Milan Aleksić]]></dc:creator><pubDate>Sat, 10 Feb 2024 23:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/Q1p7bh3SHj8/upload/3d9a8c7fff858a27a637480951825b43.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-what-and-why">What and why?</h2>
<p>Well, it took me forever to figure it out, but apparently during the winter months I have problems with my nose not because I became super-sensitive as I age, but simply because the humidity at my home office is not as good as it was in the "real" office.</p>
<p>I, therefore, bought a simple smart diffuser by a company "Gologi" which looks cool I guess and fits well into my office. So, basically, how do I turn it on/off? No fancy scheduling required, mind you, just a small on/off switch on my stream deck (I don't want to use their smartphone app).</p>
<h2 id="heading-tldr">TL;DR</h2>
<p>You need to:</p>
<ol>
<li><p>rewire this IoT device into Tuya cloud via Tuya smart app to get the remote access capability</p>
</li>
<li><p>connect Home Assistant with Tuya smart app (not via the Tuya cloud API, that's enterprise-level pricing)</p>
</li>
<li><p>add Home Assistant control buttons to Stream Deck</p>
</li>
<li><p>profit!</p>
</li>
</ol>
<h2 id="heading-step-by-step">Step by step</h2>
<h3 id="heading-tuya-smart-app">Tuya smart app</h3>
<p>Gologi has a <a target="_blank" href="https://play.google.com/store/apps/details?id=com.app.gologi&amp;pli=1">smart app already that can control the humidifer</a>, but if you use that one you can't actually do anything in regards to the cloud-driven control.</p>
<p>You need to install another smartphone app: <a target="_blank" href="https://play.google.com/store/apps/details?id=com.tuya.smart">Tuya Smart App</a>. Then, you need to go through the "add the device" flow to be able to get control over it from the Tuya app.</p>
<p>Interestingly, even after I uninstalled the Gologi app I got the same interface on my smartphone to handle the Gologi humidifier from within Tuya 💣 . This brings me to the conclusion that Gologi is just one of the brands that built their IoT integration on the Tuya cloud, and that's why it was so easy to add it into the Tuya app.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1707730137418/0f9de54f-a731-44a6-86b0-5c773e7b4f5b.jpeg" alt class="image--center mx-auto" /></p>
<p>Now, do you see the "Stop diffuser" and "Start diffuser" buttons at the top? Those are there because I have added Tuya "Scenes" that trigger the On/Off switch of the diffuser. These will become important momentarily, since they allow the actions to be triggered from Home Assistant <em>even though HA doesn't support this type of Tuya device</em>, so you'd better add them:</p>
<ol>
<li><p>Open Tuya App</p>
</li>
<li><p>Go to "Scene"</p>
</li>
<li><p>Add a "Tap-to-run" action</p>
</li>
<li><p>Choose "Control Single Device" -&gt; Smart diffuser -&gt; "Switch" -&gt; "ON"</p>
</li>
<li><p>Do the same thing for "OFF" switch</p>
</li>
</ol>
<h3 id="heading-home-assistant-integration">Home Assistant integration</h3>
<p>You don't necessarily have to use HA, but I have found it is so ubiquitous these days that you are probably missing a lot if you don't have it in your home network <em>and still want to play around with IoT devices.</em> YMMV if you are using some other hub.</p>
<p>Within Home Assistant you just need to add Tuya integration. That's it.</p>
<blockquote>
<p>Important: If your HA is older than 2024.0 please upgrade, since earlier versions demanded a more complex setup using Tuya Cloud and developer accounts; newer versions integrate on a higher level without any need to go into the cloud - it just uses the Tuya Smart app QR code scanner to integrate, easy-peasy!</p>
</blockquote>
<p>The Scene section, after adding the Tuya integration, shows "2 entities" - those are the 2 scenes we've added in the Tuya Smart app!</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1707730604969/7c1d9800-a9b3-40cb-b417-e62a955e288f.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-stream-deck-setup">Stream Deck setup</h3>
<p>Stream Deck is amazing: even though it was meant for streamers (obviously), it's an extremely hack-friendly device and can execute anything you can think of. In this case I want to have a panel setup like the one shown on the screenshot below, with one ON and one OFF button.</p>
<p>Some important parts:</p>
<ol>
<li><p>you need to install official HA plugin for the stream deck</p>
</li>
<li><p>you need to provide a "long-lived access token" from HA and expose (if you haven't already) the home assistant websocket URL</p>
</li>
<li><p>both Keypad Appearance and the Keypad Action need to be selected and configured, otherwise button will do nothing</p>
</li>
</ol>
<blockquote>
<p>Note: in my case the HA is behind Tailscale, but since Stream Deck talks to the server software within your connected computer, and then that software talks to the HA, you do not need to expose HA to Stream Deck via a public API, you can just use internal IP</p>
</blockquote>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1707730845601/0a2d6ebf-9d5c-441c-b8f8-7d5f08416735.png" alt class="image--center mx-auto" /></p>
<p>And the "Short press" Keypad action is configured like this:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1707730823061/c4dc5588-7e97-4623-89d6-1e18ddd554cd.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-conclusion">Conclusion</h3>
<p>This was kind of a multi-day exercise to figure out what works and what doesn't. There are, like with any integration, many moving pieces, each expecting no breaking API changes: Tuya Cloud (a cloud service... I hope it remains stable), the smartphone app, the HA plugin, HA itself... but that's just a normal day in IT, I guess.</p>
<p>It's amazing how a little thing like a Stream Deck button helps remove context-switching tasks from your daily life. I'm very happy with my Stream Deck, I am continuously thinking "what else can I use it for", and I would definitely recommend it for your homelab / office setup since it is a very extensible and super useful tool.</p>
]]></content:encoded></item><item><title><![CDATA[When Nomad misses a (heart)beat]]></title><description><![CDATA[In my homelab I have a hybrid setup (nodes both in the cloud and in the basement), and I use Tailscale to bridge the physical gap in the network.
What I have noticed, though (actually, for a while already, just didn't bother to investigate) is the fo...]]></description><link>https://blog.aleksic.dev/when-nomad-misses-a-heartbeat</link><guid isPermaLink="true">https://blog.aleksic.dev/when-nomad-misses-a-heartbeat</guid><category><![CDATA[nomad]]></category><category><![CDATA[distributed system]]></category><dc:creator><![CDATA[Milan Aleksić]]></dc:creator><pubDate>Sun, 26 Mar 2023 16:58:53 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/Y9mWkERHYCU/upload/f47161627d899fa7d47ec9ed24d0c098.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In my homelab I have a hybrid setup (nodes both in the cloud and in the basement), and I use <em>Tailscale</em> to bridge the physical gap in the network.</p>
<p>What I have noticed, though (actually, for a while already, just didn't bother to investigate) is the following mystery:</p>
<ol>
<li><p>One node receives a lot of work to do (think: request for multi-platform Docker build via Gitea Actions)</p>
</li>
<li><p>Docker containers got restarted on that node en masse</p>
</li>
<li><p>Nomad restarts all the jobs and everything "just works" again</p>
</li>
</ol>
<p>Now, because of point 3) I never really had an incentive to find and fix the problem since Nomad just stabilizes the system rather quickly (1 min for example). This problem was occurring and reoccurring for <em>months</em>, but I didn't care much (pets vs cattle and all that).</p>
<p>What eventually turned me around is the fact that this problem occurred during my introduction of a new CI/CD platform (I am replacing Drone CI with Gitea Actions), and debugging long builds that <em>also fail because the Docker containers running those builds die</em> is not the best use of my free time.</p>
<p>Now, my assumption was that the node would just work with the basic Linux (Debian) setup without any system tinkering (Ansible sets up the ssh server, my user account with a bunch of dotfiles, etc, but that's all just normal customization everyone does). That was a standard, but a lousy assumption.</p>
<p>Down the rabbit hole, we go...</p>
<h2 id="heading-is-it-oom">Is it OOM?</h2>
<p>I've noticed in my <code>dmesg</code> output [1] that some Docker processes were taken down by the OOM killer, so I immediately added swap to the system.</p>
<p>It's not that hard to add a swap file to a Debian system, for example, Digital Ocean has nice and very readable articles that handle basic administration tasks for standard Linux packages, so they have <a target="_blank" href="https://www.digitalocean.com/community/tutorials/how-to-add-swap-space-on-ubuntu-20-04">an article for the swap as well</a>.</p>
<blockquote>
<p>Now, it is said that in the cloud age one shouldn't really depend on swap and the machine workload should just be stable enough to work out of RAM (because of various reasons), but we are talking about a small 16GB laptop [2] so it makes sense <em>for me</em> to still resort to the swap for those peak moments.</p>
</blockquote>
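<p>The gist of that swap setup, for the record (the 4G size is arbitrary - pick what fits your machine):</p>
<pre><code class="lang-bash">sudo fallocate -l 4G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
# make the swap file survive reboots
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
</code></pre>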
<p>This helped a bit and made Docker work much more stably. Containers still disappeared, though, so it definitely wasn't the root cause - it was just a reason for the failures to happen (even) more often.</p>
<h2 id="heading-is-it-the-docker-service">Is it the Docker service?</h2>
<p>Now, the next suspect was Docker Daemon itself.</p>
<p>I've noticed this strange message appearing over and over again in the logs:</p>
<pre><code class="lang-plaintext">Your kernel does not support memory swappiness capabilities or the cgroup is not mounted. Memory swappiness discarded
</code></pre>
<p>But this is a standard Debian Linux, why wouldn't that <code>cgroup</code> be enabled/mounted? Weird. But, diving into the Net, we find that Docker indeed says this is a normal thing - they even have <a target="_blank" href="https://docs.docker.com/engine/install/troubleshoot/#kernel-cgroup-swap-limit-capabilities">a part of the docs just for this specific message</a>.</p>
<p>Needless to say, their suggestion unfortunately didn't work for the current stable Debian (11). I had to research further and found this <a target="_blank" href="https://github.com/canonical/microk8s/issues/1691#issuecomment-977543458">great thread for microk8s</a> that exposed <a target="_blank" href="https://github.com/canonical/microk8s/issues/1691#issuecomment-977543458">a change in Debian</a>; later in that thread came the way to work around it until cgroups v2 are fully supported, and finally my hero, who had <a target="_blank" href="https://github.com/canonical/microk8s/issues/1691#issuecomment-1279774327">the same exact problem</a> - the solution presented itself and finally got rid of the message in the logs:</p>
<pre><code class="lang-ini"><span class="hljs-comment"># set in /etc/default/grub</span>
<span class="hljs-attr">GRUB_CMDLINE_LINUX</span>=<span class="hljs-string">"cgroup_enable=memory cgroup_memory=1 systemd.unified_cgroup_hierarchy=0"</span>
</code></pre>
<p>Then just do an <code>update-grub</code> and restart the laptop [3]. Of course, this removed the predominant error message, but (as you probably guessed) the containers continued to die under large pressure.</p>
<h2 id="heading-is-it-nomad">Is it Nomad?</h2>
<p>But then I figured out something (at that point, completely obvious): the containers that were restarted were <em>all</em> <em>Nomad jobs</em>. No other container running on that Docker daemon was ever restarted. 🤦</p>
<p>I refocused now on the Nomad setup: what could have gone wrong there?</p>
<p>Nomad has very complex machinery behind the job scheduling simplicity and they have great documentation.</p>
<p>What I had missed so far is the fact that, although no containers failed on their own, they were effectively killed by the scheduler if the Nomad agent on that node doesn't manage to communicate with the server via its heartbeat mechanism in time.</p>
<p>More research and another great GitHub issue thread, and here we go: the Nomad team lead simply says that <a target="_blank" href="https://github.com/hashicorp/nomad/issues/3289#issuecomment-332884234">there is a way to work around</a> this problem:</p>
<blockquote>
<blockquote>
<p>Just curious is there any way to increase heartbeat manually?</p>
</blockquote>
<p>There are a few heartbeat related settings on the server: <a target="_blank" href="https://www.nomadproject.io/docs/agent/configuration/server.html#heartbeat_grace">https://www.nomadproject.io/docs/agent/configuration/server.html#heartbeat_grace</a></p>
</blockquote>
<p>So, the solution for my network setup was to increase the heartbeat grace to something longer than the default (the default value of <code>10s</code> for <code>heartbeat_grace</code> in the <code>server</code> block of the server Nomad configuration was replaced with an <em>extremely</em> large <code>120s</code>, but I'm done with tiny hammers at this point).</p>
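<p>Concretely, that amounts to something along these lines in the Nomad server configuration (only <code>heartbeat_grace</code> is the point here; the rest is illustrative):</p>
<pre><code># server block of the Nomad server configuration
server {
  enabled         = true
  heartbeat_grace = "120s"
}
</code></pre>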
<p>Increasing the grace setting is probably the most straightforward way to give clients more time to recover in cases when the CPU is under very heavy load. In my case, I also assume that the heartbeat going over the Tailscale VPN, and the server being a rather old and weak Intel chip, <em>don't help</em>.</p>
<p>Many Gitea Actions later, with very heavy CPU &amp; memory workloads, Nomad still hasn't decided to shut any node down.</p>
<p>So far, so good: no more restarts noticed. Let's hope it stays stable! I definitely don't want to see this problem occurring ever again.</p>
<hr />
<ol>
<li><p>Or was it <code>journalctl -xn</code>? Not sure - this was a long time ago at this point. But there were definitely OOM strings in the logs, and I did have that problem</p>
</li>
<li><p>I know, I should replace it with an Intel NUC. I am just waiting for the damn thing to fail... it has been working well enough for 6 years already, and I don't like replacing stuff that just works</p>
</li>
<li><p>Or, in my case, use my Telegram bot that runs <a target="_blank" href="https://github.com/milanaleksic/laptop-booter/">laptop-booter</a> to go through Intel AMT for a power cycle and then Dropbear SSH port to run <code>cryptroot-unlock</code> for my full-disk-encryption setup (but you know, that's just me)</p>
</li>
</ol>
<hr />
]]></content:encoded></item><item><title><![CDATA[Short Note: Sync Cloudflare DNS targeting Caddy within Nomad]]></title><description><![CDATA[What
Let's say, for the sake of this short note, that you have:

Cloudflare DNS which you use to expose your service to the world (or to your internal tailscale/VPN/home network)
Caddy is your reverse proxy of choice
Nomad is your deployment system o...]]></description><link>https://blog.aleksic.dev/short-note-sync-cloudflare-dns-targeting-caddy-within-nomad</link><guid isPermaLink="true">https://blog.aleksic.dev/short-note-sync-cloudflare-dns-targeting-caddy-within-nomad</guid><category><![CDATA[nomad]]></category><category><![CDATA[Python]]></category><category><![CDATA[cloudflare]]></category><category><![CDATA[Caddy]]></category><dc:creator><![CDATA[Milan Aleksić]]></dc:creator><pubDate>Thu, 23 Jun 2022 06:38:05 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/unsplash/Oz_J_FXKvIs/upload/v1655969152366/3TalVqC0j.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-what">What</h2>
<p>Let's say, for the sake of this short note, that you have:</p>
<ul>
<li><strong>Cloudflare DNS</strong>, which you use to expose your services to the world (or to your internal tailscale/VPN/home network)</li>
<li><strong>Caddy</strong> as your reverse proxy of choice</li>
<li><strong>Nomad</strong> as your deployment system of choice</li>
</ul>
<p>Well, here is how I synchronize DNS records on Cloudflare based on service discovery in Nomad / Consul, using some shortcuts and a very small Python script...</p>
<p>The result of this process is that if I have a service named "chronograf", my DNS record <code>chronograf.mycooldomain.com</code> gets updated.</p>
<h2 id="heading-how">How</h2>
<p>First, the Nomad job definition:</p>
<pre><code>job <span class="hljs-string">"internal-proxy"</span> {
  datacenters = [<span class="hljs-string">"DC1"</span>]
  <span class="hljs-built_in">type</span>        = <span class="hljs-string">"service"</span>

  constraint {
    attribute = <span class="hljs-string">"<span class="hljs-variable">${attr.unique.hostname}</span>"</span>
    value     = <span class="hljs-string">"pluto"</span>
  }

  group <span class="hljs-string">"main"</span> {
    ephemeral_disk {
      migrate = <span class="hljs-literal">true</span>
      size    = 150
      sticky  = <span class="hljs-literal">true</span>
    }

    task <span class="hljs-string">"caddy"</span> {
        <span class="hljs-comment"># define the Caddy task. not relevant for this short note, </span>
        <span class="hljs-comment">#perhaps a nice topic for another one</span>
    }

    task <span class="hljs-string">"syncer"</span> {
      driver = <span class="hljs-string">"docker"</span>

      config {
        image   = <span class="hljs-string">"python:3.10.4-slim-bullseye"</span>
        volumes = [
          <span class="hljs-string">"local/:/etc/dns-sync"</span>
        ]
        <span class="hljs-built_in">command</span> = <span class="hljs-string">"python3"</span>
        args    = [<span class="hljs-string">"/etc/dns-sync/syncer.py"</span>]
      }

      env {
        ZONE_ID_FILE      = <span class="hljs-string">"/etc/dns-sync/cf_zone_id"</span>
        CF_API_TOKEN_FILE = <span class="hljs-string">"/etc/dns-sync/cf_api_key"</span>
        DNS_MAPPING_FILE  = <span class="hljs-string">"/etc/dns-sync/records.txt"</span>
      }

      template {
        data        = &lt;&lt;EOF
[[ fileContents <span class="hljs-string">"syncer.py"</span> ]]
EOF
        destination = <span class="hljs-string">"local/syncer.py"</span>
      }

      template {
        data          = <span class="hljs-string">"{{ key \"cloudFlare/zoneId\" }}"</span>
        destination   = <span class="hljs-string">"local/cf_zone_id"</span>
        change_mode   = <span class="hljs-string">"signal"</span>
        change_signal = <span class="hljs-string">"SIGUSR1"</span>
      }

      template {
        data          = <span class="hljs-string">"{{ key \"cloudFlare/cfApi\" }}"</span>
        destination   = <span class="hljs-string">"local/cf_api_key"</span>
        change_mode   = <span class="hljs-string">"signal"</span>
        change_signal = <span class="hljs-string">"SIGUSR1"</span>
      }

      template {
        data          = &lt;&lt;EOF
{{range <span class="hljs-variable">$tag</span>, <span class="hljs-variable">$services</span> := services | byTag}}{{ <span class="hljs-keyword">if</span> eq <span class="hljs-variable">$tag</span> <span class="hljs-string">"expose-internal"</span> }}{{range <span class="hljs-variable">$services</span>}}{{ .Name }}.{{ key <span class="hljs-string">"cloudFlare/domain"</span> }}|{{ with node <span class="hljs-string">"pluto"</span> }}{{ .Node.Address }}{{ end }}
{{end}}{{end}}{{end}}
EOF
        destination   = <span class="hljs-string">"local/records.txt"</span>
        change_mode   = <span class="hljs-string">"signal"</span>
        change_signal = <span class="hljs-string">"SIGUSR1"</span>
      }

      resources {
        cpu    = 100
        memory = 100
      }
    }
  }
}
</code></pre><p>Some assumptions I made in the job above:</p>
<ul>
<li>I am using Consul as the key/value store, and I rely on Consul templating to discover the services I am interested in</li>
<li>I only want to expose <a target="_blank" href="https://www.nomadproject.io/docs/job-specification/service#tags">services that are tagged</a> with <code>expose-internal</code></li>
<li>operationally, I "know" that this will be deployed only on a node named <code>pluto</code></li>
<li>my <em>personal Cloudflare API token</em> and the <em>zone ID</em> I plan on syncing are stored in the Consul K/V store</li>
<li>the domain under which I expose services is in the Consul K/V store as well</li>
<li>I construct files which Consul updates automatically when/if the K/V settings change OR a new service mapping becomes available (see the rendered example below)</li>
</ul>
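<p>To make that last point concrete: after Consul renders the template, <code>records.txt</code> is just a list of <code>fqdn|target</code> pairs, one per tagged service (the hostnames and address below are illustrative):</p>
<pre><code>chronograf.mycooldomain.com|192.168.1.20
grafana.mycooldomain.com|192.168.1.20
</code></pre>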
<p>Now, the script that does the Nomad -&gt; Cloudflare syncing, in pure Python 3 (standard library only):</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> argparse
<span class="hljs-keyword">import</span> json
<span class="hljs-keyword">import</span> logging
<span class="hljs-keyword">import</span> os
<span class="hljs-keyword">import</span> signal
<span class="hljs-keyword">import</span> sys
<span class="hljs-keyword">import</span> urllib.parse
<span class="hljs-keyword">import</span> urllib.request
<span class="hljs-keyword">from</span> typing <span class="hljs-keyword">import</span> Dict, List, Optional


<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">read_secret_multi</span>(<span class="hljs-params">secret_name_file: str</span>) -&gt; List[str]:</span>
    <span class="hljs-keyword">with</span> open(read_env_or_fail(secret_name_file), <span class="hljs-string">'r'</span>) <span class="hljs-keyword">as</span> env_file_file:
        <span class="hljs-keyword">return</span> [x.strip() <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> env_file_file.readlines()]


<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">read_secret</span>(<span class="hljs-params">secret_name_file: str</span>) -&gt; str:</span>
    <span class="hljs-keyword">with</span> open(read_env_or_fail(secret_name_file), <span class="hljs-string">'r'</span>) <span class="hljs-keyword">as</span> env_file_file:
        <span class="hljs-keyword">return</span> env_file_file.readline()


<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">read_env_or_fail</span>(<span class="hljs-params">secret_name_file</span>):</span>
    env_file = os.getenv(secret_name_file, <span class="hljs-literal">None</span>)
    <span class="hljs-keyword">if</span> env_file <span class="hljs-keyword">is</span> <span class="hljs-literal">None</span>:
        logging.error(<span class="hljs-string">f"Environment variable <span class="hljs-subst">{secret_name_file}</span> was not present, giving up"</span>)
        sys.exit(<span class="hljs-number">1</span>)
    <span class="hljs-keyword">return</span> env_file


<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Syncer</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self, zone_id: str, cf_api_token: str, dns_records: List[str]</span>):</span>
        self.zone_id = zone_id
        self.cf_api_token = cf_api_token
        self.mappings = {x.split(<span class="hljs-string">'|'</span>)[<span class="hljs-number">0</span>]: x.split(<span class="hljs-string">'|'</span>)[<span class="hljs-number">1</span>] <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> dns_records <span class="hljs-keyword">if</span> x}

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">cf_api</span>(<span class="hljs-params">self, path: str, method: Optional[str] = <span class="hljs-string">'GET'</span>, data: Optional[Dict] = None</span>):</span>
        req = urllib.request.Request(
            url=<span class="hljs-string">f'https://api.cloudflare.com/client/v4/<span class="hljs-subst">{path}</span>'</span>,
            headers={
                <span class="hljs-string">"Authorization"</span>: <span class="hljs-string">f"Bearer <span class="hljs-subst">{self.cf_api_token}</span>"</span>,
                <span class="hljs-string">"Content-Type"</span>: <span class="hljs-string">"application/json"</span>,
            },
            data=json.dumps(data).encode(<span class="hljs-string">'utf-8'</span>) <span class="hljs-keyword">if</span> data <span class="hljs-keyword">else</span> <span class="hljs-literal">None</span>,
            method=method,
        )
        <span class="hljs-keyword">with</span> urllib.request.urlopen(req) <span class="hljs-keyword">as</span> f:
            <span class="hljs-keyword">return</span> json.loads(f.read().decode(<span class="hljs-string">'utf-8'</span>))

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">sync</span>(<span class="hljs-params">self</span>):</span>
        logging.info(<span class="hljs-string">"Syncing records..."</span>)
        records = self.cf_api(<span class="hljs-string">f'zones/<span class="hljs-subst">{self.zone_id}</span>/dns_records'</span>)[<span class="hljs-string">'result'</span>]
        cf_records = {rec[<span class="hljs-string">'name'</span>]: rec <span class="hljs-keyword">for</span> rec <span class="hljs-keyword">in</span> records}
        <span class="hljs-keyword">for</span> our_mapping, target <span class="hljs-keyword">in</span> self.mappings.items():
            cf_mapping: Dict = cf_records.get(our_mapping, <span class="hljs-literal">None</span>)
            <span class="hljs-keyword">if</span> cf_mapping:
                <span class="hljs-keyword">if</span> cf_mapping[<span class="hljs-string">'content'</span>] == target <span class="hljs-keyword">and</span> cf_mapping[<span class="hljs-string">'type'</span>] == <span class="hljs-string">'A'</span>:
                    logging.info(<span class="hljs-string">f"Identical mapping on CF for: <span class="hljs-subst">{our_mapping}</span>, ignoring"</span>)
                <span class="hljs-keyword">else</span>:
                    logging.info(<span class="hljs-string">f"Updating existing mapping on CF for: <span class="hljs-subst">{our_mapping}</span> -&gt; <span class="hljs-subst">{target}</span>"</span>)
                    self.cf_api(path=<span class="hljs-string">f"zones/<span class="hljs-subst">{self.zone_id}</span>/dns_records/<span class="hljs-subst">{cf_mapping[<span class="hljs-string">'id'</span>]}</span>"</span>, method=<span class="hljs-string">'PATCH'</span>,
                                data=self.make_cf_record_dto(our_mapping, target))
            <span class="hljs-keyword">else</span>:
                logging.info(<span class="hljs-string">f"Adding new mapping on CF for: <span class="hljs-subst">{our_mapping}</span> -&gt; <span class="hljs-subst">{target}</span>"</span>)
                self.cf_api(path=<span class="hljs-string">f'zones/<span class="hljs-subst">{self.zone_id}</span>/dns_records'</span>, method=<span class="hljs-string">'POST'</span>,
                            data=self.make_cf_record_dto(our_mapping, target))

<span class="hljs-meta">    @staticmethod</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">make_cf_record_dto</span>(<span class="hljs-params">our_mapping, target</span>):</span>
        <span class="hljs-keyword">return</span> {
            <span class="hljs-string">"type"</span>: <span class="hljs-string">"A"</span>,
            <span class="hljs-string">"name"</span>: our_mapping,
            <span class="hljs-string">"content"</span>: target,
            <span class="hljs-string">"proxied"</span>: <span class="hljs-literal">False</span>
        }


<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">run</span>():</span>
    Syncer(
        zone_id=read_secret(<span class="hljs-string">"ZONE_ID_FILE"</span>),
        cf_api_token=read_secret(<span class="hljs-string">"CF_API_TOKEN_FILE"</span>),
        dns_records=read_secret_multi(<span class="hljs-string">"DNS_MAPPING_FILE"</span>),
    ).sync()


<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">'__main__'</span>:
    parser = argparse.ArgumentParser(prog=<span class="hljs-string">'syncer.py'</span>, description=<span class="hljs-string">"Automation script for Cloudflare record syncing"</span>)
    parser.add_argument(<span class="hljs-string">'--debug'</span>, default=<span class="hljs-literal">False</span>, required=<span class="hljs-literal">False</span>, action=<span class="hljs-string">'store_true'</span>, dest=<span class="hljs-string">"debug"</span>,
                        help=<span class="hljs-string">'debug flag'</span>)
    args = parser.parse_args()
    <span class="hljs-keyword">if</span> args.debug:
        logging.getLogger().setLevel(logging.DEBUG)
    <span class="hljs-keyword">else</span>:
        logging.getLogger().setLevel(logging.INFO)

    <span class="hljs-comment"># initial run</span>
    run()
    <span class="hljs-comment"># subsequent run</span>
    signal.signal(signal.SIGUSR1, <span class="hljs-keyword">lambda</span> sig, frame: run())
    <span class="hljs-comment"># exit gracefully</span>
    signal.signal(signal.SIGINT, <span class="hljs-keyword">lambda</span> sig, frame: sys.exit(<span class="hljs-number">0</span>))
    signal.signal(signal.SIGTERM, <span class="hljs-keyword">lambda</span> sig, frame: sys.exit(<span class="hljs-number">0</span>))
    <span class="hljs-keyword">while</span> <span class="hljs-literal">True</span>:
        logging.info(<span class="hljs-string">'Waiting for signals...'</span>)
        signal.pause()
</code></pre>
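<p>If you want to try the script outside of Nomad, a quick smoke test could look like this (the paths are illustrative; each file contains what the corresponding template above would render):</p>
<pre><code class="lang-bash">export ZONE_ID_FILE=/tmp/cf_zone_id \
       CF_API_TOKEN_FILE=/tmp/cf_api_key \
       DNS_MAPPING_FILE=/tmp/records.txt
python3 syncer.py --debug
# from another shell: force a re-sync, just like Nomad's change_signal does
kill -USR1 "$(pgrep -f syncer.py)"
</code></pre>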
]]></content:encoded></item><item><title><![CDATA[Using Ansible & Nomad for a homelab (part 2)]]></title><description><![CDATA[This is a continuation of my previous article "Using Ansible & Nomad for a homelab (part 1)" which you'd probably want to read first to follow up where I left off there.
Nomad
Nomad is a well-known workload orchestrator. I have decided to automate my...]]></description><link>https://blog.aleksic.dev/using-ansible-and-nomad-for-a-homelab-part-2</link><guid isPermaLink="true">https://blog.aleksic.dev/using-ansible-and-nomad-for-a-homelab-part-2</guid><category><![CDATA[automation]]></category><category><![CDATA[smart home]]></category><dc:creator><![CDATA[Milan Aleksić]]></dc:creator><pubDate>Thu, 17 Mar 2022 09:35:10 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/unsplash/lVZjvw-u9V8/upload/v1647420429831/71AtmIhsZ.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This is a continuation of my previous article "<a target="_blank" href="https://blog.aleksic.dev/using-ansible-and-nomad-for-a-homelab-part-1">Using Ansible &amp; Nomad for a homelab (part 1)</a>" which you'd probably want to read first to follow up where I left off there.</p>
<h1 id="heading-nomad">Nomad</h1>
<p><a target="_blank" href="https://www.nomadproject.io/">Nomad</a> is a well-known workload orchestrator. I have decided to automate my homelab cluster using it. I will through this blog post try to walk you through some discoveries I made on the way during the previous couple of months.</p>
<p>Features that drove me to Nomad:</p>
<ul>
<li>Conciseness,</li>
<li>An evolvable setup (constraints and static ports are there for simple setups, for example),</li>
<li>I already had knowledge of Kubernetes and wanted to try something else,</li>
<li>I had experience with Terraform and Consul, so I was confident Nomad would be a good choice to at least try out.</li>
</ul>
<p>I wanted to share with you how I configured a couple of different services, just so you can get a feeling for the freedom that Nomad gives. </p>
<p>For experienced DevOps people some of the choices will be <em>painful</em> because of the shortcuts taken - but that is exactly the point: what I will try to show is that Nomad is a perfect match for an ad-hoc homelab and that it allows you to <em>evolve</em> it into a more and more serious setup as your knowledge of the principles of workload orchestration in Nomad grows.</p>
<p>I chose to explain the principles by showing some code examples from my own homelab, going from the most trivial examples towards the more complicated ones and expanding the coverage of possibilities as we go. So, I tried to make a story out of it instead of just showing you the end result :) But, if you want to visualize the end result - here it is, the topology of the deployed homelab as of 16/03/2022:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1647421528226/zx3b1Q9WP.png" alt="The current state of the topology of my hybrid Nomad cluster as of 16/03/2022" /></p>
<p>So, without any further ado let's start...</p>
<h2 id="heading-example-a-cron-bash-script">Example: a cron bash script</h2>
<p>It can't get any simpler than this: you have some bash script that you want executed from time to time. In my case, I have some background processes I need to kill (a workaround, until I fix the Go app that creates those zombie processes... somewhere in this decade, I guess).</p>
<p>Nomad has a concept of a <code>sysbatch</code> job type that is basically a glorified cron executor. I combine it with the <code>raw_exec</code> driver, which is meant to run native OS code as directly as possible (no chroot, fast startup, barely any isolation - making it the least recommended of all the drivers).</p>
<p>When moving things from config management tools (like Chef/Ansible) into Nomad, that's probably a very nice thing to have. Later on, you'd probably want to lower the number of jobs defined this way and move towards a fully managed approach (which we will touch on shortly). Still, <em>the freedom</em> of just defining it like this and then, later on, evolving it into the proper approach is <em>the reason</em> why I chose Nomad over k8s. </p>
<pre><code>job <span class="hljs-string">"batler_cleanup_periodic"</span> {
  datacenters <span class="hljs-operator">=</span> [<span class="hljs-string">"KOEK1"</span>]
  <span class="hljs-keyword">type</span>        <span class="hljs-operator">=</span> <span class="hljs-string">"sysbatch"</span>

  periodic {
    cron             <span class="hljs-operator">=</span> <span class="hljs-string">"25 21 * * * *"</span>
    prohibit_overlap <span class="hljs-operator">=</span> <span class="hljs-literal">true</span>
    time_zone        <span class="hljs-operator">=</span> <span class="hljs-string">"UTC"</span>
  }

  constraint {
    attribute <span class="hljs-operator">=</span> <span class="hljs-string">"${attr.unique.hostname}"</span>
    value     <span class="hljs-operator">=</span> <span class="hljs-string">"pluto"</span>
  }

  group <span class="hljs-string">"main"</span> {
    task <span class="hljs-string">"script"</span> {
      driver <span class="hljs-operator">=</span> <span class="hljs-string">"raw_exec"</span>

      config {
        command  <span class="hljs-operator">=</span> <span class="hljs-string">"/usr/bin/bash"</span>
        args     <span class="hljs-operator">=</span> [<span class="hljs-string">"-c"</span>, <span class="hljs-string">"pkill -u batler_remoteexec || true"</span>]
      }

      resources {
        cpu    <span class="hljs-operator">=</span> <span class="hljs-number">500</span>
        <span class="hljs-keyword">memory</span> <span class="hljs-operator">=</span> <span class="hljs-number">128</span>
      }
    }
  }
}
</code></pre><p>You might have also noticed the <code>constraint</code> stanza, where I pin the script to the node where it is currently defined in Chef. This way I was able to evolve <em>node by node</em> from the old solution to the new one. The migration process was therefore:</p>
<ol>
<li>move all the stuff that can't be put into Nomad into Ansible (including the Nomad itself);</li>
<li>for all other things that <em>can be moved</em> to Nomad, move them into the simplest possible abstraction in Nomad (lowest possible hanging fruit being <code>raw_exec</code> task type);</li>
<li>remove chef-client, Ruby, git repo for chef-zero execution etc etc;</li>
<li>move to another node, repeat 1-3;</li>
<li>look how to improve migrated jobs into "better" / more professional implementations, learning as you go.</li>
</ol>
<h2 id="heading-example-excalidraw">Example: excalidraw</h2>
<p><a target="_blank" href="https://github.com/excalidraw/excalidraw/">Excalidraw</a> is a very nice drawing tool I like using from time to time. It's open source and even has a <a target="_blank" href="https://excalidraw.com/">free online version of it</a>. I thought it was awesome and just decided on deploying it myself in the homelab! It's just a simple completely stateless service with a docker container, it can't be easier than that.</p>
<p>Here it is:</p>
<pre><code><span class="hljs-attribute">job</span> <span class="hljs-string">"excalidraw"</span> {

  <span class="hljs-comment"># ... superfluous things, already presented previously, commented out</span>

  <span class="hljs-attribute">group</span> <span class="hljs-string">"main"</span> {
    <span class="hljs-section">network</span> {
      <span class="hljs-attribute">port</span> <span class="hljs-string">"http"</span> {
        <span class="hljs-attribute">static</span>       = <span class="hljs-number">2734</span>
        to           = <span class="hljs-number">80</span>
        host_network = <span class="hljs-string">"tailscale"</span>
      }
    }

    task <span class="hljs-string">"excalidraw"</span> {
      <span class="hljs-attribute">driver</span> = <span class="hljs-string">"docker"</span>

      config {
        <span class="hljs-comment"># SHA of what was latest on 04/11/2022</span>
        <span class="hljs-attribute">image</span>   = <span class="hljs-string">"excalidraw/excalidraw:sha-4bfc5bb"</span>
        ports = [
          <span class="hljs-string">"http"</span>
        ]
      }

      service {
        <span class="hljs-attribute">name</span> = <span class="hljs-string">"excalidraw"</span>
        port = <span class="hljs-string">"http"</span>

        check {
          <span class="hljs-attribute">type</span>     = <span class="hljs-string">"tcp"</span>
          port     = <span class="hljs-string">"http"</span>
          interval = <span class="hljs-string">"10s"</span>
          timeout  = <span class="hljs-string">"2s"</span>
        }
      }
    }
  }
}
</code></pre><p>What you can see here is the next step: how to deploy an online service in Nomad. Since it was previously deployed as a docker-compose service managed by systemd, it was only natural to use the <code>docker</code> driver and put it on the same machine.</p>
<p>You will notice that the <code>network</code> stanza hardcodes the port occupied by the service on the Tailscale network, defined inside the Nomad configuration on that node as:</p>
<pre><code>client {
  host_network "tailscale" {
    <span class="hljs-type">cidr</span> = "100.65.51.119/32"
    reserved_ports = "22"
  }
  // other non-relevant <span class="hljs-keyword">configuration</span> <span class="hljs-keyword">options</span>
}
</code></pre><p>This makes sure that the port will be exposed only on the Tailscale network. Obviously, the Tailscale agent is installed and fully operational at this point, but I used Ansible to set that part up <em>before</em> I even tried running any Nomad job on the machine.</p>
<p>The service is registered in Consul via the <code>service</code> stanza and is nicely exposed in the Consul listing:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1647423139496/T8DIzoO7-.png" alt="CleanShot 2022-03-16 at 10.31.32@2x.png" /></p>
<p>Consul and Nomad work together to make sure that the service is always online via Consul checks, and if anything goes bad (like an OOM kill in Docker because of over-provisioning, which may or may not have happened), the service will just be restarted as if nothing happened. This example does only a raw TCP connection test, but you should go the extra mile and add a full HTTP verification path that the application exposes (if possible, only locally) and which, behind the curtains, makes sure the application is running in a stable fashion.</p>
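<p>For illustration, such an HTTP check could look roughly like this, assuming the application exposes something like a <code>/health</code> endpoint (the path is application-specific and hypothetical here):</p>
<pre><code>check {
  type     = "http"
  port     = "http"
  path     = "/health"
  interval = "10s"
  timeout  = "2s"
}
</code></pre>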
<h2 id="heading-example-resilio">Example: Resilio</h2>
<p>The next example is my Resilio Sync job installation. Now, this one has its share of new concepts I had to understand and apply:</p>
<pre><code><span class="hljs-attribute">job</span> <span class="hljs-string">"resilio"</span> {

  <span class="hljs-comment"># ... superfluous things, already presented previously, commented out</span>

  <span class="hljs-attribute">group</span> <span class="hljs-string">"main"</span> {
    <span class="hljs-section">ephemeral_disk</span> {
      <span class="hljs-attribute">migrate</span> = <span class="hljs-literal">true</span>
      size    = <span class="hljs-number">150</span>
      sticky  = <span class="hljs-literal">true</span>
    }

    volume <span class="hljs-string">"btsync"</span> {
      <span class="hljs-attribute">type</span>      = <span class="hljs-string">"host"</span>
      read_only = <span class="hljs-literal">false</span>
      source    = <span class="hljs-string">"btsync"</span>
    }

    task <span class="hljs-string">"download"</span> {
      <span class="hljs-attribute">driver</span> = <span class="hljs-string">"raw_exec"</span>

      lifecycle {
        <span class="hljs-attribute">hook</span> = <span class="hljs-string">"prestart"</span>
        sidecar = <span class="hljs-literal">false</span>
      }

      artifact {
        <span class="hljs-attribute">source</span> = <span class="hljs-string">"https://download-cdn.resilio.com/[[ consulKey "</span>resilio/version<span class="hljs-string">" ]]/Debian/resilio-sync_[[ consulKey "</span>resilio/version<span class="hljs-string">" ]]-1_arm64.deb"</span>
      }

      config {
        <span class="hljs-attribute">command</span>  = <span class="hljs-string">"/usr/bin/bash"</span>
        args     = [<span class="hljs-string">"-c"</span>, <span class="hljs-string">"7z x -y local/resilio-sync_[[ consulKey "</span>resilio/version<span class="hljs-string">" ]]-1_arm64.deb &amp;&amp; tar xvf data.tar ./usr/bin/rslsync &amp;&amp; rm data.tar &amp;&amp; mv usr/bin/rslsync ../alloc/data/"</span>]
      }
    }

    task <span class="hljs-string">"main"</span> {
      <span class="hljs-attribute">driver</span> = <span class="hljs-string">"raw_exec"</span>

      config {
        <span class="hljs-attribute">command</span>  = <span class="hljs-string">"../alloc/data/rslsync"</span>
        args     = [<span class="hljs-string">"--nodaemon"</span>, <span class="hljs-string">"--config"</span>, <span class="hljs-string">"local/config.json"</span>]
      }

      template {
        <span class="hljs-attribute">data</span> = &lt;&lt;EOF
[[ fileContents <span class="hljs-string">"config.json.tpl"</span> ]]
        EOF
        destination = <span class="hljs-string">"local/config.json"</span>
      }
    }
  }
}
</code></pre><p>The job uses the <code>ephemeral_disk</code> stanza to try (best-effort, so don't count on it 100% of the time) to maintain filesystem state across job re-deployments. I have indeed learned not to depend on it, but to use the NFS share from my NAS for anything that actually needs to persist.</p>
<blockquote>
<p>Little digression, albeit an important one, now: I hope you have a NAS? 
I mean, what kind of homelab cluster do you think you have if you haven't bought (or, in case you are the adventurous type, self-built a Raspberry Pi-based system, soldered to your wall, or whatever else floats your boat) a <a target="_blank" href="https://en.wikipedia.org/wiki/Network-attached_storage">network-attached storage</a> device, aka NAS? 
I bought my ancient <em>Synology DS413j</em> many years ago and just extend it / buy new disks from time to time. Of course, it's ancient, which means I have a chroot Debian Wheezy (that's <a target="_blank" href="https://en.wikipedia.org/wiki/Debian_version_history#Debian_7_(Wheezy)">version 7</a>, folks; 2016 brought its last update), but it still does its job well enough that it doesn't deserve any kind of upgrade. Probably <em>my best buy ever</em> because it enables so many use cases for the homelab...</p>
</blockquote>
<p>Further, you can see how I leverage the <code>volume</code> stanza here to mount a host directory (backed by an Ansible-driven NFS share), defined like this inside the Nomad config file:</p>
<pre><code><span class="hljs-section">client</span> {
  <span class="hljs-attribute">host_volume</span> <span class="hljs-string">"btsync"</span> {
    <span class="hljs-attribute">path</span>      = <span class="hljs-string">"/mnt/btsync"</span>
    read_only = <span class="hljs-literal">false</span>
  }
}
</code></pre><p>This basically allows the Nomad job to access files stored remotely on a shared NAS drive. </p>
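<p>One detail worth noting: the group-level <code>volume</code> stanza only <em>declares</em> the volume. With <code>raw_exec</code> the process can reach the host path directly anyway, but once a job like this moves to the <code>docker</code> driver, the task needs an explicit <code>volume_mount</code>, roughly like this sketch (the destination path is illustrative):</p>
<pre><code>task "main" {
  driver = "docker"

  volume_mount {
    volume      = "btsync"
    destination = "/sync"  # path inside the container, pick your own
    read_only   = false
  }

  # ...
}
</code></pre>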
<blockquote>
<p>Off-topic, but perhaps relevant to complete the picture: if you'd like to know how this mount is set up in the Ansible part of my setup, it's basically a dumbed-down variant of <a target="_blank" href="https://github.com/openstack/ansible-role-systemd_mount/tree/00a542c7fccf56cb6314f04b949c62eede2f6f17">an open-source Ansible role from the OpenStack project</a>.</p>
</blockquote>
<p>Now, since this job <em>doesn't use Docker</em> (yet), I chose to utilize an <em>init container</em> pattern in the form of a Nomad <code>prestart</code> task that fetches the binary for this system's architecture from the official web site and stores it in <code>alloc/data</code> (backed by the <a target="_blank" href="https://www.nomadproject.io/docs/internals/filesystem">ephemeral disk</a>). Then, the main task starts the service and keeps it online.</p>
<blockquote>
<p>An astute reader might ask why I don't check whether an identical file was already downloaded, but I think I will just move to Docker later anyway.</p>
</blockquote>
<p>Finally, you might have noticed weird constructs in the job file: <code>[[ consulKey ... ]]</code>. What are <em>those</em> now? Well, I am using the <a target="_blank" href="https://github.com/hashicorp/levant"><code>levant</code></a> tool, which is a layer around the <code>nomad</code> CLI. I have found this tool very useful since it, at least:</p>
<ul>
<li>allows for a CI/CD setup (you externalize the parts you want pushed from CI/CD, like the image version of a Docker image),</li>
<li>allows templating parts of the job definition that would otherwise have to be hardcoded or embedded (like file contents - in case of files that span hundreds of lines, you really want them out of the job file).</li>
</ul>
<h3 id="heading-config-file-for-resilio-sync">Config file for Resilio sync</h3>
<p>Just to complete the picture, here is the attached <code>config.json.tpl</code> file, showing how I configured one of my Resilio Sync target directories:</p>
<pre><code>{
    <span class="hljs-attr">"device_name":</span> <span class="hljs-string">"<span class="hljs-template-variable">{{ env "attr.unique.hostname" }}</span>"</span>,
    <span class="hljs-attr">"use_gui":</span> <span class="hljs-literal">false</span>,
    <span class="hljs-attr">"log_size":</span> <span class="hljs-number">30</span>,
    <span class="hljs-attr">"listening_port":</span> {{ <span class="hljs-string">env</span> <span class="hljs-string">"NOMAD_PORT_http"</span> }},
    <span class="hljs-attr">"shared_folders":</span> [
      {
        <span class="hljs-attr">"dir":</span> <span class="hljs-string">"<span class="hljs-template-variable">{{ key "resilio/syncs/haumea/dir" }}</span>"</span>,
        <span class="hljs-attr">"overwrite_changes":</span> <span class="hljs-literal">false</span>,
        <span class="hljs-attr">"search_lan":</span> <span class="hljs-literal">true</span>,
        <span class="hljs-attr">"secret":</span> <span class="hljs-string">"<span class="hljs-template-variable">{{ key "resilio/syncs/haumea/secret" }}</span>"</span>,
        <span class="hljs-attr">"use_dht":</span> <span class="hljs-literal">false</span>,
        <span class="hljs-attr">"use_relay_server":</span> <span class="hljs-literal">true</span>,
        <span class="hljs-attr">"use_sync_trash":</span> <span class="hljs-literal">true</span>,
        <span class="hljs-attr">"use_tracker":</span> <span class="hljs-literal">true</span>
      }
    ],
    <span class="hljs-attr">"storage_path":</span> <span class="hljs-string">"../alloc/data"</span>
}
</code></pre><blockquote>
<p>I could've chosen the path of just embedding the template in the Nomad job, but my preference is to always lower the noise in the Nomad job if I can help it; therefore, <code>levant</code> provides a nice <a target="_blank" href="https://github.com/hashicorp/levant/blob/main/docs/templates.md#filecontents"><code>fileContents</code> template function</a>. </p>
</blockquote>
<p>There are no special surprises in that file, but you can see how I combined both:</p>
<ul>
<li>environmental settings (dynamically changed by Nomad when (re)deploying) using <code>env</code> template variables; and </li>
<li>Consul-driven keys, like the location and secret Resilio needs to handshake through its system with all the other clients (my phone, laptop, etc.) about the state of my files. </li>
</ul>
<blockquote>
<p>By the way, using Consul keys lets me easily trigger a redeployment of this Nomad job <em>just by changing the key in the Consul KV store</em>. Although, in this case, it makes little sense, since both the directory and the secret are there for "externalization from the template" reasons, not for security purposes.</p>
</blockquote>
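<p>Triggering such a redeployment from the CLI is a one-liner; for example, pointing the sync at a different directory (the value here is made up):</p>
<pre><code class="lang-bash">consul kv put resilio/syncs/haumea/dir /mnt/btsync/haumea
</code></pre>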
<h2 id="heading-example-thoughttrain">Example: ThoughtTrain</h2>
<p>So far, the Nomad job files were based on well-known off-the-shelf software packages. ThoughtTrain is my own OSS Go app <a target="_blank" href="https://github.com/milanaleksic/thoughttrain">hosted on Github</a>, totally unknown to the broader audience; in short, it's my own variant of "Read it Later".</p>
<p>Deployment is done using <code>levant</code> from <em>within the code repository of that project</em>. This is thus different from the previously mentioned jobs, which are part of the <em>homelab repo</em>. But if you are maintaining the source code of a project, it simply makes sense to keep the infra part inside that same repository. </p>
<p>This time I will not share the code of the Nomad job, since it's <a target="_blank" href="https://github.com/milanaleksic/thoughttrain/blob/f19e99b9413031053d01c1733a73a73918a465d5/.github/workflows/thoughttrain.nomad">open source &amp; inside that project</a>; I just wanted to share the mechanics of the deployment, built on the previous concepts.</p>
<p>The CI platform is a Github Actions <a target="_blank" href="https://github.com/milanaleksic/thoughttrain/blob/f19e99b9413031053d01c1733a73a73918a465d5/.github/workflows/build-workflow.yml">workflow</a>; the build is driven by <code>make</code>, so it might not be the easiest to follow, but effectively it all boils down to a simple concept: when I push a tag to GH, this command is executed.</p>
<pre><code>export VERSION<span class="hljs-operator">=</span><span class="hljs-operator">&lt;</span>TAG<span class="hljs-operator">&gt;</span> <span class="hljs-operator">&amp;</span><span class="hljs-operator">&amp;</span> ./levant deploy \
    <span class="hljs-operator">-</span>log<span class="hljs-operator">-</span>level<span class="hljs-operator">=</span>WARN \
    <span class="hljs-operator">-</span>consul<span class="hljs-operator">-</span><span class="hljs-keyword">address</span><span class="hljs-operator">=</span><span class="hljs-operator">&lt;</span>consul location<span class="hljs-operator">&gt;</span> 
    .github/workflows<span class="hljs-operator">/</span>thoughttrain.nomad
</code></pre><p>I use log level WARN so that the secrets read from Consul are not put in the output.</p>
<p>I put the "consul location" as a secret into my Consul server.</p>
<p>Of course, this can only work because GH Actions temporarily joins my homelab network as a node, using <a target="_blank" href="https://tailscale.com/kb/1111/ephemeral-nodes/">ephemeral tailscale nodes</a> in combination with the <a target="_blank" href="https://github.com/tailscale/github-action">Tailscale GH Action</a>.</p>
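<p>At the time of writing, the workflow step that joins the tailnet is tiny; a sketch of it, assuming an ephemeral, pre-authorized auth key is stored in the repository secrets (the secret name is my own):</p>
<pre><code class="lang-yaml">- name: Connect to the tailnet
  uses: tailscale/github-action@v1
  with:
    # ephemeral + pre-authorized key generated in the Tailscale admin console
    authkey: ${{ secrets.TAILSCALE_AUTHKEY }}
</code></pre>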
<h2 id="heading-example-internal-proxy">Example: internal-proxy</h2>
<p>Finally, the crown jewel of a homelab is a single centralized place where all your services get exposed under domain names. I have chosen <a target="_blank" href="https://caddyserver.com/">Caddy</a> since it's so easy to set up and has a very rich ecosystem supporting various flows: </p>
<ul>
<li>I prefer using Let's Encrypt (even for the internal homelab services) and Caddy just supports it out of the box!</li>
<li>Cloudflare is also supported, albeit with the need to build a custom ARM Docker image (details shortly)</li>
<li>all my markdown notes are just exposed as a readable web site.</li>
</ul>
<blockquote>
<p>I am actually so happy with what Caddy does for me, after 7-8 years of using nginx, that I even plan on writing a mini blog post in the future going through my favorite Caddy features</p>
</blockquote>
<p>Here is the Caddyfile template I use in the <code>internal-proxy</code> Nomad job, shortened a bit to remove redundancy and non-critical parts:</p>
<pre><code><span class="hljs-operator">*</span>.milanaleksic.net {
  encode gzip
  tls milanaleksic@gmail.com {
    dns cloudflare {{ key <span class="hljs-string">"cloudFlare/cfApiMilanaleksicNet"</span> }}
  }

  @chronograf host chronograf.milanaleksic.net
  reverse_proxy @chronograf {{range service <span class="hljs-string">"chronograf"</span>}} {{.Address}}:{{.Port}} {{end}}
}
</code></pre><p>Why do I need the Cloudflare integration? Well, Let's Encrypt uses a DNS verification process (the DNS-01 challenge) to verify that I am the domain owner, so a short handshake between the Let's Encrypt servers and my own DNS records is needed - and Caddy <em>does it all by itself</em>. This fact alone removed the need for the cron job, the Python script, <em>and</em> the more complex certificate-renewal setup I had to maintain for nginx back in the Chef days.</p>
<p>Additionally, please observe how I refer to the location of the <a target="_blank" href="https://www.influxdata.com/time-series-platform/chronograf/">Chronograf</a> service: I do not reserve the port statically in <code>chronograf.nomad</code>:</p>
<pre><code>job <span class="hljs-string">"chronograf"</span> {
  <span class="hljs-comment">// ...</span>
  group <span class="hljs-string">"main"</span> {
      port <span class="hljs-string">"http"</span> {
        to           <span class="hljs-operator">=</span> <span class="hljs-number">8888</span>
        host_network <span class="hljs-operator">=</span> <span class="hljs-string">"tailscale"</span>
      }
    <span class="hljs-comment">// ...</span>
    task <span class="hljs-string">"chronograf"</span> {
      <span class="hljs-comment">// ...</span>
      service {
        name <span class="hljs-operator">=</span> <span class="hljs-string">"chronograf"</span>
        port <span class="hljs-operator">=</span> <span class="hljs-string">"http"</span>

        check {
          <span class="hljs-keyword">type</span>     <span class="hljs-operator">=</span> <span class="hljs-string">"tcp"</span>
          port     <span class="hljs-operator">=</span> <span class="hljs-string">"http"</span>
          interval <span class="hljs-operator">=</span> <span class="hljs-string">"10s"</span>
          timeout  <span class="hljs-operator">=</span> <span class="hljs-string">"2s"</span>
        }
      }
    }
  }
}
</code></pre><p>and I just let Consul and Nomad negotiate which port should be taken. Each time <code>chronograf</code> gets (re)deployed, the port is chosen anew and the reverse proxy gets restarted. Easy-peasy, nothing for me to do there, just like in k8s.</p>
<p>Here is the job specification for <code>internal-proxy</code>:</p>
<pre><code>job <span class="hljs-string">"internal-proxy"</span> {

  <span class="hljs-comment"># ... superfluous things, already presented previously, commented out</span>

  group <span class="hljs-string">"main"</span> {
    task <span class="hljs-string">"caddy"</span> {
      driver <span class="hljs-operator">=</span> <span class="hljs-string">"docker"</span>

      config {
        image   <span class="hljs-operator">=</span> <span class="hljs-string">"milanaleksic/caddy-cloudflare:2.4.6"</span>
        volumes <span class="hljs-operator">=</span> [
          <span class="hljs-string">"../alloc/data/caddy-config:/config"</span>,
          <span class="hljs-string">"../alloc/data/caddy-data:/data"</span>,
          <span class="hljs-string">"local/Caddyfile:/etc/caddy/Caddyfile"</span>
        ]
        ports   <span class="hljs-operator">=</span> [<span class="hljs-string">"http"</span>, <span class="hljs-string">"https"</span>]
      }

      env {
        ACME_AGREE <span class="hljs-operator">=</span> <span class="hljs-string">"true"</span>
      }

      template {
        data <span class="hljs-operator">=</span> <span class="hljs-operator">&lt;</span><span class="hljs-operator">&lt;</span>EOF
[[ fileContents <span class="hljs-string">"Caddyfile.tpl"</span> ]]
        EOF
        destination <span class="hljs-operator">=</span> <span class="hljs-string">"local/Caddyfile"</span>
      }

      service {
        name <span class="hljs-operator">=</span> <span class="hljs-string">"internal-proxy-http"</span>
        port <span class="hljs-operator">=</span> <span class="hljs-string">"http"</span>

        check {
          <span class="hljs-keyword">type</span>         <span class="hljs-operator">=</span> <span class="hljs-string">"tcp"</span>
          port         <span class="hljs-operator">=</span> <span class="hljs-string">"80"</span>
          interval     <span class="hljs-operator">=</span> <span class="hljs-string">"10s"</span>
          timeout      <span class="hljs-operator">=</span> <span class="hljs-string">"2s"</span>
          address_mode <span class="hljs-operator">=</span> <span class="hljs-string">"driver"</span>
        }
      }

      service {
        name <span class="hljs-operator">=</span> <span class="hljs-string">"internal-proxy-https"</span>
        port <span class="hljs-operator">=</span> <span class="hljs-string">"https"</span>

        check {
          <span class="hljs-keyword">type</span>         <span class="hljs-operator">=</span> <span class="hljs-string">"tcp"</span>
          port         <span class="hljs-operator">=</span> <span class="hljs-string">"443"</span>
          interval     <span class="hljs-operator">=</span> <span class="hljs-string">"10s"</span>
          timeout      <span class="hljs-operator">=</span> <span class="hljs-string">"2s"</span>
          address_mode <span class="hljs-operator">=</span> <span class="hljs-string">"driver"</span>
        }
      }
    }
  }
}
</code></pre><p>Now, this job spec is interesting for a couple of reasons.</p>
<h3 id="heading-building-arm-docker-images">Building ARM Docker images</h3>
<p>This is a custom (public) Docker image built for the <code>aarch64</code> architecture on my MacBook Pro.</p>
<blockquote>
<p>by the way, you get to use the <code>aarch64</code> architecture not only on modern cloud ARM servers, but also if you install a 64-bit ARM OS on, for example, a Raspberry Pi 4.</p>
</blockquote>
<p>To build and push this image I utilized the Docker Desktop for Mac "buildx" feature. My build script for this image is basically this:</p>
<pre><code><span class="hljs-meta">#!/usr/bin/env bash</span>

<span class="hljs-comment"># latest on 10/02/2022</span>
<span class="hljs-built_in">export</span> CADDY_VERSION=2.4.6
<span class="hljs-comment"># This depends on Docker for Desktop (Mac), because that one supports multi-arch output</span>
<span class="hljs-comment"># Create that builder with "docker buildx create --name multiarch"</span>
<span class="hljs-comment"># Alternative: use DOCKER_HOST to point to a remote arm / x64 node</span>
docker buildx use multiarch
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  --build-arg CADDY_VERSION=<span class="hljs-variable">${CADDY_VERSION}</span> \
  -t milanaleksic/caddy-cloudflare:<span class="hljs-variable">${CADDY_VERSION}</span> \
  --push .
</code></pre><p>and my <code>Dockerfile</code> is:</p>
<pre><code>ARG CADDY_VERSION<span class="hljs-operator">=</span><span class="hljs-number">0</span><span class="hljs-number">.0</span><span class="hljs-number">.0</span>

FROM caddy:builder AS builder

RUN caddy<span class="hljs-operator">-</span>builder \
    github.com/caddy<span class="hljs-operator">-</span>dns<span class="hljs-operator">/</span>cloudflare

FROM caddy:${CADDY_VERSION}

COPY <span class="hljs-operator">-</span><span class="hljs-operator">-</span><span class="hljs-keyword">from</span><span class="hljs-operator">=</span>builder <span class="hljs-operator">/</span>usr<span class="hljs-operator">/</span>bin<span class="hljs-operator">/</span>caddy <span class="hljs-operator">/</span>usr<span class="hljs-operator">/</span>bin<span class="hljs-operator">/</span>caddy
</code></pre><p>I was previously very skeptical about using Docker containers on a Raspberry Pi. But these new 4s don't mind: things just work, even backed by a cheap SD card. Very nice. I plan on replacing all my binary and script jobs with ARM / x64 Docker images in the future. </p>
<p>But, as I have said many times above - the nice thing with Nomad is that <em>it allows you to start with the low-hanging fruit and slowly build a more professional setup</em>.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>We have reached the end of what I thought was reasonable enough to follow. I actually cut quite a few points along the road, but hopefully this blog post was enough to get you interested. I gained a brand new interest in my homelab after doing this migration. I no longer dread logging into the nodes just to discover that my Chef scripts got broken again after an OS upgrade.</p>
<p>All in all, I think this migration was a very good exercise in complexity, with a lot of things learned on the way. I would definitely recommend it to DevOps enthusiasts as an alternative to a fully-managed k8s cluster.</p>
<p>Actually, even fully acknowledging the likes of <a target="_blank" href="https://homeautomation.wiki/post/k3s-the-hard-way/">k3s</a> allowing a single-binary/SQLite setup, I still prefer the Nomad path just because of the simplicity and clarity of the job specification format compared to YAMLs. But that's just a personal preference; there is nothing stopping you from doing this entire exercise using <a target="_blank" href="https://saltproject.io/">SaltStack</a>+k8s if you prefer it. Just don't forget that there is perhaps a simpler approach that has all the benefits.</p>
<h3 id="heading-bad-sides-of-nomad">Bad sides of Nomad</h3>
<p>There are plenty. Nothing comes without bad sides; there is no perfection.</p>
<p>I, for one, really disliked the fact that I <strong>can't expose a job's task port statically</strong> on IP <code>0.0.0.0</code> (which would expose the port on all interfaces). It's impossible - try it (and if you find a way, please reach out!). It is a must for cloud deployments where you can't know the public IP of your node in advance. Nomad just defaults to the first network interface it encounters (or to specific ones, if you set it up like that). There are <a target="_blank" href="https://github.com/hashicorp/nomad/issues/646">some</a> <a target="_blank" href="https://github.com/hashicorp/nomad/issues/12106">issues</a> and <a target="_blank" href="https://discuss.hashicorp.com/t/how-to-transfer-ports-to-0-0-0-0/33340">discussions</a> but no definitive answer yet. You'd have to use a Consul ingress gateway, which I didn't have time to explore (yet). Currently, I just start another out-of-Nomad Caddy reverse proxy using Ansible that then pokes into the services deployed inside Nomad (sad 🐼).</p>
<p>Personally, I would also prefer that <code>levant</code> gets merged into the <code>nomad</code> binary. There just seems to be no need to externalize that functionality, much like <code>kustomize</code> got merged into <code>kubectl</code> as the subcommand <code>kubectl kustomize</code>. I know the HashiCorp folks know what they're doing, and I am aware of the concept of separation of concerns, but as an end-user <em>I'd just prefer not to need 2 binaries</em>.</p>
<h3 id="heading-further-work-ideas">Further work ideas</h3>
<p>Where will I go further from these examples? Well, I see at least these avenues of improvement:</p>
<ol>
<li><p>utilize Vault instead of Consul for secrets<ul>
<li>this is more for learning purposes than for security reasons, since we are still talking about just a homelab</li>
</ul>
</li>
<li><p>expose metrics from Nomad jobs in Grafana<ul>
<li>still not sure what the best way is here; I did notice at least one attempt using Prometheus, either directly or via a <a target="_blank" href="https://github.com/burdandrei/nomad-monitoring">Telegraf intermediary in front of InfluxDB</a>, which I already have in the cluster...</li>
</ul>
</li>
<li><p>expose Nomad jobs' logs in Loki<ul>
<li>currently I only push Ansible-managed apps and system logs into Loki</li>
</ul>
</li>
<li><p>try out all the interesting services now that workload management is in place: Redis, PlantUML, Grist, PiHole, a private Docker registry...<ul>
<li>this is now way less work than before since, while building all the above examples (and some more I haven't mentioned), I gathered enough know-how to add new things more quickly</li>
</ul>
</li>
<li>try out the Consul Connect service mesh</li>
</ol>
]]></content:encoded></item><item><title><![CDATA[Using Ansible & Nomad for a homelab (part 1)]]></title><description><![CDATA[Part 1 of 2
This article will be split into 2: 

one about the motivation for the migration away from Chef and how I did it using Ansible;
another one about Nomad, hopefully soon.

What?
I have a small cluster of computers which I have been maintaini...]]></description><link>https://blog.aleksic.dev/using-ansible-and-nomad-for-a-homelab-part-1</link><guid isPermaLink="true">https://blog.aleksic.dev/using-ansible-and-nomad-for-a-homelab-part-1</guid><category><![CDATA[ansible]]></category><category><![CDATA[Devops]]></category><dc:creator><![CDATA[Milan Aleksić]]></dc:creator><pubDate>Sat, 26 Feb 2022 09:59:27 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/unsplash/M5tzZtFCOfs/upload/v1645869756571/9koPM9U-e.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-part-1-of-2">Part 1 of 2</h2>
<p>This article will be split into 2: </p>
<ol>
<li>one about the motivation for the migration away from Chef and how I did it using Ansible;</li>
<li>another one about Nomad, hopefully soon.</li>
</ol>
<h2 id="heading-what">What?</h2>
<p>I have a small cluster of computers which I have been maintaining over the last 10 years. I've learned an amazing amount about Linux operations thanks to those first Raspberry Pis, but most of all, the biggest learning asset I acquired was an understanding of the <em>pain</em> of making stuff work <em>manually</em>.</p>
<p>Back then I decided to use Chef as my config management tool (I remember I considered Puppet as well). Don't get me wrong, Ruby is still alive and well (I wouldn't say it's exactly thriving, but there are surely millions of Ruby programmers) - but my engineering career path took me more in the direction of Java as the main language (with short but interesting excursions into Go and Rust) and Python for <em>everything else</em>.</p>
<p>So, the writing had been on the wall for a long time:  </p>
<ul>
<li>my Chef cookbooks had to be vendored and manually maintained, </li>
<li>weird Ruby concoctions had to be used more and more often because I hated maintaining the chef-repo,</li>
<li>it got harder and harder to match the target Chef version with the available system Ruby version (rvm helped there with an unnatural life extension), </li>
<li>I simply didn't have time to maintain my personal deployment framework BADUC.</li>
</ul>
<blockquote>
<p>It stood for "Bastion/Drone/Usher/Consul", where distributed Consul locks were used with node agents and an ingress token-driven service triggered deployments. BTW, don't do this. Don't waste time building something like that, unless you have an abundant amount of free time. It was a major mistake on my side, albeit a nice learning experience. I deleted it all with a laugh on my face - I had spent many hours maintaining it, knowing it would be thrown away at some moment anyway. Just use off-the-shelf platforms like Nomad or k8s.</p>
</blockquote>
<p>I waited for a long time, but I finally did it: the last remnants of the Chef code and setup have been removed and the cluster has moved to an Ansible foundation. It took me around 2 months (a couple of hours per week, as much as family life can provide, I guess).</p>
<p>The cluster is hybrid in every sense (unusual for companies but quite a normal thing for homelabs):</p>
<ul>
<li>part is in the cloud and part on my premises (basement and office);</li>
<li>there are ARM servers (arm5, aarch64, arm7) and there are the "normal" x64 machines;</li>
<li>some software is set up using Ansible, but I mostly try to target Nomad for new services;</li>
<li>some services are deployed from Github Actions and some from within the network using self-hosted Drone. </li>
</ul>
<h2 id="heading-the-path">The path</h2>
<p>Here is my attempt at explaining what I figured out to be the path:</p>
<ul>
<li>figure out a reasonable Linux distribution</li>
<li>define foundational aspects and deploy them using Ansible</li>
<li>everything else should work as Nomad job(s).</li>
</ul>
<h2 id="heading-linux-distribution">Linux distribution</h2>
<p>There is not a lot of thinking needed here - if you wish to mix ARM servers with x64... you just have to go with Debian (or its derivatives, like Ubuntu). </p>
<p>I know you can set up Ansible to work with any distribution, but for a homelab... it just makes no sense: just expect that <code>apt</code>, <code>systemd</code>, and friends are all there.</p>
<p>This also drastically relaxed the Ansible role expectations: my roles are mostly trivial because of this decision.</p>
<h2 id="heading-foundations">Foundations</h2>
<p>I considered the following things the innermost ring (or the <em>castle foundations</em>, if you like metaphors) of the setup:</p>
<ul>
<li>ability to access the SSH port via the default user provided by the distribution using Ansible</li>
<li>run Ansible roles</li>
<li><strong>seal</strong> the system (no public SSH ever again).</li>
</ul>
<p>Your cloud provider or your ISP might provide you with direct public IP access, which is nice. But it makes <em>no sense for a home lab</em> anymore in 2022, with Cloudflare Tunnel, Twingate, Tailscale, etc.</p>
<p>My cloud nodes are currently in the Oracle cloud because of their generous "always free" offering. But, that's just me being a cheap ass.</p>
<h3 id="heading-standard-roles">Standard roles</h3>
<p>How do you prepare a node for the seal? What steps are needed to make it happen? For me, the minimum list of things that must be present on any node in the home lab cluster is:</p>
<ul>
<li>add a new system user for Ansible;</li>
<li>check out dotfiles from internal Git (this step <em>might fail</em>, so it's important to allow for that in case the Git server itself is part of the cluster);</li>
<li>add some "default" packages (from <code>ncdu</code> to <code>tcpdump</code>; everyone has their favorites here, I guess);</li>
<li>change some system files, like the <code>motd</code> message (just to mark the territory and point out which node you are connected to after SSH login) or the <code>sshd</code> config (to block password login), plus all the other system configuration;</li>
<li>start the <strong>Tailscale</strong> agent and join my tailnet using the auto-join feature;</li>
<li>add <strong>promtail</strong> to push logs (Loki will be used as the centralized logging service);</li>
<li>use <strong>collectd</strong> for metric publishing (InfluxDB will be used for the metrics);</li>
<li>start <strong>Consul</strong> agent;</li>
<li>start <strong>Nomad</strong> agent.</li>
</ul>
<p>The actions listed above are implemented as Ansible roles that must run on every server; only once they are applied can a server be declared a "homelab cluster member node".</p>
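<p>To give a flavor of the "join and seal" step, here is a minimal sketch in shell, not my actual role: the auth key variable, the service names, and the firewall tool are placeholders for whatever your roles install.</p>
<pre><code class="lang-bash"># join the tailnet non-interactively using a pre-generated auth key
sudo tailscale up --authkey "${TS_AUTHKEY}"

# make sure the foundational agents survive reboots
sudo systemctl enable --now promtail collectd consul nomad

# "seal": once the node is reachable over the tailnet, close public SSH
sudo ufw deny 22/tcp
</code></pre>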
<h3 id="heading-monitoring">Monitoring</h3>
<p>You might ask yourself why I would consider monitoring foundational.</p>
<p>Well, for me as a backend engineer, observability is a paramount concern when building systems. You don't <em>guess</em> what the system is doing, you observe it.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1645870045384/ucEidfnkl.png" alt="CleanShot 2022-02-26 at 11.07.08@2x.png" /></p>
<p>Logging seems like the first thing to look at, for sure, but I find metrics far more interesting for a home lab. You get to see usage and problems first-hand: for example, situations where a node is over-committed and some swap is needed, or when network partitioning shows itself, and so on.</p>
<p>I was focused on lightweight processes and figured out that the smallest footprint comes from Go-driven apps, so I prefer the Grafana stack (Tempo, Loki, Grafana itself), with InfluxDB for metric storage using a 30-day retention period. I'm quite happy with it and would suggest this stack to everyone.</p>
<p>These Go services are deployed using Nomad, but the agents themselves are, for me, part of the foundational layer, since they should be available and running <em>even when the collector services are down</em>. Of course, if you want to throw money at the problem, you can avoid collectd/promtail and the collector services by using a managed offering, or just go for Datadog - that's an amazing monitoring suite.</p>
<p><strong>Collectd</strong> is something I chose simply because it can be installed on <em>any machine</em>, even my old armv5 Debian box. And InfluxDB supports it out of the box, without any need for additional converters or transformers. Easy on those Raspberry Pi or low-cost cloud node CPUs.</p>
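<p>For the curious, the collectd-to-InfluxDB wiring is tiny. A sketch, assuming InfluxDB's collectd listener is enabled on its default port 25826 (the hostname here is a placeholder):</p>
<pre><code class="lang-bash"># ship metrics over collectd's network plugin straight to InfluxDB
sudo tee -a /etc/collectd/collectd.conf &lt;&lt;'EOF'
LoadPlugin network
&lt;Plugin network&gt;
  Server "influxdb.internal.example" "25826"
&lt;/Plugin&gt;
EOF
sudo systemctl restart collectd
</code></pre>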
<h3 id="heading-tailscale">Tailscale</h3>
<p>The magic here is Tailscale - after the nodes join the tailnet, all of them can talk to each other, regardless of their physical location, NAT, or whatever else sits in front of them.</p>
<p>Nothing comes into the nodes unless it's via the Tailscale network.</p>
<p>If nodes talk to each other, it's via Tailscale.</p>
<p>If there are ingress (publicly open) ports, they are quite limited and exist only on specially marked and isolated nodes. SSH and other administration ports are never left open to the public.</p>
<p>Thank you Tailscale for such an amazing product!</p>
<blockquote>
<p>From the <a target="_blank" href="https://news.ycombinator.com/item?id=30490116">HackerNews geeks</a> I've learned about "Headscale", an OSS implementation of the Tailscale server (the only closed-source piece of the Tailscale setup). I will surely check it out at some point, but for now... Tailscale's free offering is more than generous enough for me. For now 😀</p>
</blockquote>
<h2 id="heading-consulnomad">Consul/Nomad</h2>
<p>I guess you might ask: why not K8s, right? Well, I know K8s well enough, having used it at one of my previous employers, and I'm OK with it. But I wanted something simple for my home lab, and Nomad was the clear winner because of its simplicity - a no-brainer. More about how I use it in the next post.</p>
<p>Introducing Nomad requires a Consul cluster up and running, with agents deployed across the servers.</p>
<p>I have known Consul for a long time and I consider it a nice building block for distributed computing:</p>
<ul>
<li>consistent quorum-driven Key/Value storage;</li>
<li>service discovery;</li>
<li>Connect service mesh (not using it yet).</li>
</ul>
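<p>A quick taste of the first two features from a node's shell (assuming a local Consul agent; the key and the service name are placeholders):</p>
<pre><code class="lang-bash"># quorum-backed key/value storage
consul kv put homelab/backup/retention-days 30
consul kv get homelab/backup/retention-days

# service discovery via Consul's DNS interface (the agent listens on port 8600)
dig @127.0.0.1 -p 8600 gitea.service.consul +short
</code></pre>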
<h2 id="heading-non-foundational-ansible-services">Non-foundational Ansible services</h2>
<p>Some services simply could not be migrated efficiently to Nomad jobs, so they stayed in the "foundational", Ansible-managed setup.</p>
<p>I could have forced them into Nomad, but I decided to cut my losses here - instead of targeting purity, I decided to have something running and come back later to rethink the approach, if the services themselves and/or Nomad evolve enough to remove the problems that blocked me from declaring these workloads as Nomad jobs.</p>
<h3 id="heading-gitea">Gitea</h3>
<p>This is a popular small-scale self-hosted Git server.</p>
<p>There were too many things that ended up being weird when I tried to put it into a Nomad job:</p>
<ul>
<li>the creators block the idea of the process running as the <code>root</code> user (they even left messages in the source code for the smart asses), which is <em>such a pain</em> if you have to use the <code>exec</code> / <code>raw_exec</code> Nomad drivers;</li>
<li>git hooks auto-created by Gitea hardcode the physical location of the binary (there is an admin action to update the location, but that was just a bit ridiculous - the drop that made the glass overflow - so I gave up).</li>
</ul>
<h3 id="heading-postgresql">PostgreSQL</h3>
<p>I prefer running the system-packaged version for the platform, although in theory I should be able to just run a Docker container. PostgreSQL is a database suitable for a wide range of use cases, from Raspberry Pis to huge RAM-rich sharded cloud clusters.</p>
<h3 id="heading-public-ingress-reverse-proxy">Public ingress reverse proxy</h3>
<p>This will be replaced by an official Nomad/Consul Envoy gateway at some point, but for now I had to timebox the effort and keep the cluster on a non-Connect setup.</p>
<p>I chose the Caddy proxy over Nginx (with which I have more operational experience) because it handles HTTPS certificate maintenance without any scripting on my part.</p>
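<p>This is where Caddy shines: the whole reverse proxy, certificates included, fits in a few lines. A minimal sketch assuming Caddy v2 (the domain and upstream port are placeholders):</p>
<pre><code class="lang-bash">sudo tee /etc/caddy/Caddyfile &lt;&lt;'EOF'
# Caddy obtains and renews the certificate by itself
git.example.dev {
    reverse_proxy 127.0.0.1:3000
}
EOF
sudo systemctl reload caddy
</code></pre>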
<h2 id="heading-to-be-continued">To be continued...</h2>
<p>In the next installment: some Nomad remarks and personal experiences, including both the good and the bad parts!</p>
]]></content:encoded></item><item><title><![CDATA[Remote laptop power management using Intel AMT and Dropbear (initramfs) SSH]]></title><description><![CDATA[I like having my own lightweight “datacenter” at home using ARM servers. It was (and still is) highly effective way of learning how to manage multiple servers using Linux: how to setup networking, security, use config management (I use Chef but today...]]></description><link>https://blog.aleksic.dev/remote-laptop-power-management-using-intel-amt-and-dropbear-initramfs-ssh-9b727fa6d58e</link><guid isPermaLink="true">https://blog.aleksic.dev/remote-laptop-power-management-using-intel-amt-and-dropbear-initramfs-ssh-9b727fa6d58e</guid><category><![CDATA[Homelab]]></category><dc:creator><![CDATA[Milan Aleksić]]></dc:creator><pubDate>Thu, 16 Aug 2018 07:11:48 GMT</pubDate><content:encoded><![CDATA[<p>I like having my own lightweight “datacenter” at home using ARM servers. It was (and still is) highly effective way of learning how to manage multiple servers using Linux: how to setup networking, security, use config management (I use Chef but today I would’ve chosen Ansible I guess), remote access patterns, installing and maintaining multiple applications, etc. But, when I got my Dell laptop from my previous job as a departure artifact I got into a small predicament: how to <em>safely, securely</em> and <em>efficiently</em> utilize it in my setup which until then only had cheap small ARM servers, not a 1K€ laptop which can be used for many more things (like running Jupyter notebooks, DroneCI docker containers etc).</p>
<blockquote>
<p>To be clear, in the context of this blog and my requirements, <strong>safe</strong> means fully disk encrypted laptop (if a thief takes it, data can’t be extracted), <strong>efficient</strong> means turning it off and on remotely, <strong>securely</strong> means that only I can do it, using my Yubikey, no matter where I am in the world.</p>
</blockquote>
<p>For quite some time I did it the simple way: the laptop was always ON (with the battery out, as some people on the Internetz suggested for fire safety reasons), but the feeling of doubt was omnipresent: it wastes electricity 99% of the time. I wanted the damn thing on only when I wanted it! In the cloud age things should run only when needed - surely a small laptop in my basement should be trivial!</p>
<p>I did notice that people were able to do a very cool thing: use something called initramfs/Dropbear to access a fully encrypted laptop over SSH. I also figured out that Intel AMT can turn a modern PC on and off remotely. So I googled, wrote some Go code, and decided to write this follow-up post connecting everything I figured out, so that the next person wanting the same thing can do it more simply.</p>
<h3 id="heading-manual-setup-stage">Manual setup stage</h3>
<p>First we need to go over what my setup consists of, so you can judge how applicable it is for you.</p>
<h4 id="heading-fde">FDE</h4>
<p>My laptop has <strong>full disk encryption</strong>, set up during the installation of Debian Stretch (9); derivatives like Ubuntu should also support FDE. AFAIK you can’t turn it on after installation, at least not the default <em>ecryptfs</em>. Set it up as you please, but at least one decryption method should be passphrase-driven, so that you can send the passphrase over SSH as a parameter to the decryption command. I will not explain how to do it, please google it. One hit I got, for example: <a target="_blank" href="https://xo.tc/setting-up-full-disk-encryption-on-debian-9-stretch.html">https://xo.tc/setting-up-full-disk-encryption-on-debian-9-stretch.html</a></p>
<h4 id="heading-intel-amt">Intel AMT</h4>
<p>I have <strong>Intel AMT</strong>, which I had to activate and set an “admin” password for. Depending on your PC vendor, AMT can come activated with an empty password, pre-set with a fixed one, etc. You should set it up and allow power management commands. This effectively means that ports 16992 and/or 16993 are active on an IP that you configure manually. On Linux, you can verify from another machine, using <strong>amttool</strong>, that AMT is correctly set up: you should be able to issue <em>powerup</em> and <em>powerdown</em> commands, and you can do it from any Linux machine. For example: <a target="_blank" href="http://jefflane.org/v2/technology/setting-up-intel-amt-to-act-as-a-remote-kvm-in-linux/">article 1</a> and <a target="_blank" href="https://linux.die.net/man/7/amt-howto">article 2</a></p>
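<p>To make that verification concrete, here is roughly what the amttool check looks like (the IP is a placeholder; amttool reads the password from the <code>AMT_PASSWORD</code> environment variable):</p>
<pre><code class="lang-bash">export AMT_PASSWORD='the-amt-admin-password'

amttool 192.168.1.42 info       # sanity check: query the machine’s power state
amttool 192.168.1.42 powerup    # turn the laptop on
amttool 192.168.1.42 powerdown  # ...and off again
</code></pre>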
<h4 id="heading-dropbear-ssh">Dropbear SSH</h4>
<p><strong>Dropbear</strong> is another very interesting piece of software: a minimalistic collection of basic Linux utilities <em>related to SSH</em>, all of which have minimal dependencies and (important!) can run in the initramfs, which is active after the initial Linux boot sequence completes and before the disk is decrypted.</p>
<blockquote>
<p>Dropbear allows an SSH connection to be established before your Linux distro is fully up! Don’t forget that no <strong>real</strong> drive mount points are available in Dropbear.</p>
</blockquote>
<p>On recent Debian/Ubuntu releases it is not that complicated to set it all up. There are many outdated (and thus needlessly complex) Dropbear initramfs SSH setup blogs to be found on the Internet; a correct and simple one is, for example: <a target="_blank" href="https://hamy.io/post/0005/remote-unlocking-of-luks-encrypted-root-in-ubuntu-debian/">https://hamy.io/post/0005/remote-unlocking-of-luks-encrypted-root-in-ubuntu-debian/</a></p>
<p>Effectively you need to (a minimal sketch follows the list):</p>
<ul>
<li>Install dropbear package for your distro</li>
<li>Add your public key (your home is not accessible, so another path on the system is used for dropbear authorized keys)</li>
<li>(Optional, but recommended) Change the port for SSH</li>
</ul>
<blockquote>
<p>You might need this to know which SSH service you are talking to: the real one (part of your Linux distro) or the Dropbear one running in the initramfs; both can’t be running at the same time</p>
</blockquote>
<ul>
<li>Update/rebuild initramfs</li>
</ul>
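<p>On a Stretch-era Debian, those steps boil down to something like this (a sketch; the key and port are placeholders, and config paths differ slightly between releases):</p>
<pre><code class="lang-bash">sudo apt install dropbear-initramfs

# your home isn’t mounted in the initramfs, so dropbear keeps its own authorized_keys
echo 'ssh-ed25519 AAAA... you@laptop' | \
  sudo tee -a /etc/dropbear-initramfs/authorized_keys

# optional but recommended: run the initramfs SSH on a non-default port
echo 'DROPBEAR_OPTIONS="-p 2222"' | sudo tee -a /etc/dropbear-initramfs/config

sudo update-initramfs -u
</code></pre>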
<h4 id="heading-bastion-host">Bastion host</h4>
<p>I don’t let my servers appear on the Internet, except one: the so-called “dmz” or “bastion” node. This node is used primarily for things like SSH tunneling and nginx proxying to other servers. I recommend having one, since it allows an easier setup (you can also use it to set up a VPN, either with OpenVPN or, more simply, with ngrok, as I explained <a target="_blank" href="https://medium.com/@milanaleksic/ngrok-vs-dynamic-dns-for-remote-linux-home-server-access-1486299502f2?source=linkShare-a5a10abd1eca-1534402393">in another article</a>).</p>
<h4 id="heading-ssh-agent">SSH agent</h4>
<p>I stopped using private keys in the form of files. I bought myself a Yubikey on the last Amazon Prime Day and I’m loving it. I went through the <a target="_blank" href="https://github.com/drduh/YubiKey-Guide">magnificent manual</a> to set up GPG and SSH using the Yubikey, and since then I’ve replaced my previous flow with a Yubikey-driven GPG agent (which can and does work correctly in SSH agent mode as well).</p>
<p>I hope you are using some sort of SSH agent, right? Even if you use private key files, you don’t type in your password every time you need the private key, right? And you would of course never dream of keeping an unencrypted private key lying around? <em>Right</em>?</p>
<h4 id="heading-sanity-check">Sanity check</h4>
<p>Now is the time to verify all is set up correctly (a combined sketch follows the list):</p>
<ul>
<li>Make sure you can connect to AMT via the amttool package from another host;</li>
<li>Make sure your passphrase is correct and that you can type it in and unlock the laptop after boot completes;</li>
<li>Make sure that, after booting and before you unlock the drive, you can connect to the Dropbear SSH port and run the decryption command <code>cryptroot-unlock</code>;</li>
<li>Now rinse &amp; repeat with an SSH tunnel through your bastion host.</li>
</ul>
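<p>Put together, the manual flow looks roughly like this (host names, port and the bastion user are placeholders; on Debian 9 and newer the initramfs ships a <code>cryptroot-unlock</code> helper):</p>
<pre><code class="lang-bash">export AMT_PASSWORD='the-amt-admin-password'
amttool laptop.lan powerup

# wait for the initramfs SSH to come up, then unlock through the bastion
ssh -J me@bastion.example.com -p 2222 root@laptop.lan cryptroot-unlock
# type the LUKS passphrase at the prompt; the real system boots afterwards
</code></pre>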
<h3 id="heading-automation">Automation</h3>
<p>Finally, you might think that all of this is fine and nice, but it takes a lot of steps to turn the machine on: you need to start it, connect remotely to the Dropbear SSH, issue the unlock procedure, etc. That’s correct - it’s tedious. That’s why I wrote <a target="_blank" href="https://github.com/milanaleksic/laptop-booter/blob/master/README.md">https://github.com/milanaleksic/laptop-booter/blob/master/README.md</a> to do all the steps above. I recommend setting up aliases to boot specific servers.</p>
<p>Many things can be improved here! You might for example wish for:</p>
<ul>
<li>support for only part of my flow: for example, avoiding the bastion SSH tunnel, or using a local file for the SSH private key, etc.;</li>
<li>env variables or config files instead of CLI args;</li>
<li>omit some warning messages or introduce logging levels.</li>
</ul>
<p>I can only say: PRs are welcome, and so are requirements in the form of GitHub issues (although, depending on priority, it might take some time for me to do something about them).</p>
]]></content:encoded></item><item><title><![CDATA[InformIT ebook deal of the day → pushbullet]]></title><description><![CDATA[In case you are like me and like reading, www.informit.com has a rather large library of ebooks, some of which they offer in daily deals.
These “daily deals” are pretty good thing because you can end up getting a classic like Fowler’s “Refactoring” o...]]></description><link>https://blog.aleksic.dev/informit-ebook-deal-of-the-day-pushbullet-5aaf7849e954</link><guid isPermaLink="true">https://blog.aleksic.dev/informit-ebook-deal-of-the-day-pushbullet-5aaf7849e954</guid><category><![CDATA[Bash]]></category><dc:creator><![CDATA[Milan Aleksić]]></dc:creator><pubDate>Sun, 11 Sep 2016 09:10:13 GMT</pubDate><content:encoded><![CDATA[<p>In case you are like me and like reading, <a target="_blank" href="http://www.informit.com">www.informit.com</a> has a rather large library of ebooks, some of which they offer in <em>daily deals</em>.</p>
<p>These “daily deals” are pretty good thing because you can end up getting a classic like Fowler’s <a target="_blank" href="https://www.amazon.com/Refactoring-Improving-Design-Existing-Code/dp/0201485672">“Refactoring”</a> or GoF’s <a target="_blank" href="https://www.amazon.com/Design-Patterns-Elements-Reusable-Object-Oriented/dp/B000SEIBB8/ref=sr_1_1?s=books&amp;ie=UTF8&amp;qid=1473582994&amp;sr=1-1&amp;keywords=gang+of+four+design+patterns">“Design Patterns”</a> with a 50% discount.</p>
<p>But the publisher wants you to visit their page every day. They don’t offer a newsletter, for example (and their Twitter is overloaded with corporate information, so it’s hard to spot only the tweets related to the deals of the day).</p>
<p>So, how can we automate getting the information? I’ve made a one-liner bash command that sends <a target="_blank" href="https://www.pushbullet.com">Pushbullet</a> notifications via cron. Let’s go step by step…</p>
<h3 id="heading-matching-the-link">Matching the link</h3>
<p>If you open the HTML source code of the page you can see that the link to the book is the first link that starts with:</p>
<pre><code class="lang-ini">&lt;a <span class="hljs-attr">href</span>=<span class="hljs-string">"/store</span>
</code></pre>
<p>So, the simplest way is to download the page and just extract the first link that matches this expectation. Of course, it will not work forever, but you can always come back and report it here; I might need to update this pattern…</p>
<p>The simplest way I found to extract the link is:</p>
<pre><code class="lang-bash">curl www.informit.com/deals/ 2&gt;/dev/null | \
  sed -n <span class="hljs-string">'s/.*href="".*/http:\/\/www.informit.com\1/p'</span> | \
  head -1
</code></pre>
<p>The script above will:</p>
<ol>
<li><p>download the page, hiding the progress bar</p>
</li>
<li><p>replace each line containing a store link with just the link itself (prefixed with the site’s domain)</p>
</li>
<li><p>show only the first match</p>
</li>
</ol>
<h3 id="heading-sending-the-link">Sending the link</h3>
<p>How do we now deliver the link? You could send an email, but that’s so 1990s! Let’s find a more modern approach!</p>
<p>I have the Pushbullet app on my phone and, interestingly enough, the developers keep it quite cheap (until you start sending <em>a lot of</em> notifications), so let’s use that!</p>
<pre><code class="lang-bash">| xargs -I {} curl \
  --header <span class="hljs-string">"Access-Token: <span class="hljs-variable">$PUSHBULLET_TOKEN</span>"</span> \
  --header <span class="hljs-string">'Content-Type: application/json'</span> \
  --data-binary <span class="hljs-string">'{"body":"Deal of the day is: {}","title":"InformIT deal of the day","type":"note"}'</span> \
  --request POST https://api.pushbullet.com/v2/pushes
</code></pre>
<p>The command is almost a literal copy&amp;paste from the official API documentation - you just need a Bash environment variable <code>PUSHBULLET_TOKEN</code> carrying the private <em>access token</em> provided on the page <a target="_blank" href="https://www.pushbullet.com/#settings">https://www.pushbullet.com/#settings</a>, and that’s it.</p>
<h4 id="heading-putting-it-in-a-script">Putting it in a script</h4>
<p>Everything done until now is just preparation for the last step: we need this check to run daily without us executing the command by hand (otherwise it would’ve been easier to just open the browser, right?).</p>
<p>Let’s save the command we built in the previous 2 steps into a file “<em>/home/informit_to_pushbullet.sh</em>” (with the executable bit turned on, of course).</p>
<pre><code class="lang-bash"><span class="hljs-meta">#!/usr/bin/env bash</span>
PUSHBULLET_TOKEN=<span class="hljs-string">""</span>

<span class="hljs-comment"># command we've built above&gt;</span>
</code></pre>
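<p>For completeness, this is what the assembled script looks like once the two snippets above are stitched together:</p>
<pre><code class="lang-bash">#!/usr/bin/env bash
PUSHBULLET_TOKEN=""

curl www.informit.com/deals/ 2&gt;/dev/null | \
  sed -n 's/.*href="\(\/store[^"]*\)".*/http:\/\/www.informit.com\1/p' | \
  head -1 | \
  xargs -I {} curl \
    --header "Access-Token: $PUSHBULLET_TOKEN" \
    --header 'Content-Type: application/json' \
    --data-binary '{"body":"Deal of the day is: {}","title":"InformIT deal of the day","type":"note"}' \
    --request POST https://api.pushbullet.com/v2/pushes
</code></pre>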
<p>I don’t really do it like this on my machines, though: I tend to extract all the important environment variables into a separate file I keep in Git, but in this article I’m keeping it simple.</p>
<h4 id="heading-using-cron-to-get-daily-notification">Using cron to get daily notification</h4>
<p>I will use a standard <em>crontab</em> for this. Depending on your setup you can do it differently, with <em>supervisord</em> or <em>systemd</em> to name a few options, but I still think crontab is the simplest and most generic way.</p>
<pre><code class="lang-bash">crontab -e
</code></pre>
<p>And, finally, add this line in the editor presented to you and it’s done!</p>
<pre><code class="lang-bash">5 10 * * * /home/informit_to_pushbullet.sh 2&gt;&amp;1 &gt; /tmp/pushbullet.log
</code></pre>
<p>Since I live in Brussels, I chose 10:05 AM for the notification, around 1 hour after the deal is published. You, of course, need to adapt the time to your time zone.</p>
<blockquote>
<p>Of course, while experimenting, you might want to replace “5 10” with “*/1 *” so the notification is sent every minute, until you’re sure everything works.</p>
</blockquote>
]]></content:encoded></item><item><title><![CDATA[Using AWS Lambda to verify site uptime]]></title><description><![CDATA[Recently I had to start looking into the AWS Lambda as it might become part of a portfolio of cloud services we shall start depending on.
As you might already know currently Lambdas can be written only using:

node.js (which I passionately hate);

Ja...]]></description><link>https://blog.aleksic.dev/using-aws-lambda-to-verify-uptime-5a459fc3bef6</link><guid isPermaLink="true">https://blog.aleksic.dev/using-aws-lambda-to-verify-uptime-5a459fc3bef6</guid><category><![CDATA[AWS]]></category><category><![CDATA[lambda]]></category><dc:creator><![CDATA[Milan Aleksić]]></dc:creator><pubDate>Sun, 10 Jul 2016 08:52:40 GMT</pubDate><content:encoded><![CDATA[<p>Recently I had to start looking into the AWS Lambda as it <em>might</em> become part of a portfolio of cloud services we shall start depending on.</p>
<p>As you might already know currently Lambdas can be written only using:</p>
<ul>
<li><p><strong>node.js</strong> (which I passionately hate);</p>
</li>
<li><p><strong>Java</strong> (boring: I’d need to package a jar);</p>
</li>
<li><p><strong>Python</strong> (which I chose not to learn);</p>
</li>
<li><p>no <em>native</em> Go support as of now (although at some point Amazon may react to <a target="_blank" href="https://forums.aws.amazon.com/message.jspa?messageID=677221">this forum post</a> requesting exactly that). Yes, I can “package my Go binary”, but that seems a rather non-native way of doing it.</p>
</li>
</ul>
<p>I wanted the simplest, quickest <a target="_blank" href="https://en.wikipedia.org/wiki/Proof_of_concept">POC</a>… so, node it is.</p>
<p>Having defined the hammer, the search for a perfect nail began :). I chose a very simple use case: <em>testing if my website is still up.</em></p>
<h4 id="heading-lambda-code">Lambda code</h4>
<p>This is what I came up with after a couple of hours.</p>
<p>This code is also copy/paste friendly - I didn’t use any fancy extra library like <a target="_blank" href="https://github.com/caolan/async">async</a>, <a target="_blank" href="https://www.npmjs.com/package/future">Future</a>, or whatever is considered cool by the hipster node developers, sorry about that (but these kinds of choices are exactly why I hate node that much).</p>
<p>What it does is:</p>
<ol>
<li><p>goes through the list of sites, verifies response is as expected</p>
</li>
<li><p>expects all sites to be https (this might not be suitable for your case - you might want to use <strong>require(‘http’)</strong> in place of <strong>https</strong> below)</p>
</li>
<li><p>sends an email if at least one failure is detected</p>
</li>
</ol>
<h4 id="heading-trigger">Trigger</h4>
<p>As a trigger, I chose “<em>CloudWatch Events — Schedule</em>”, which is a cron-like way of triggering Lambdas in AWS. You just set the rate to, for example, 15 minutes (1 minute is the minimum).</p>
<p>Maybe the coolest thing about it is that you get logging automatically, since all <em>CloudWatch</em> rule executions are logged. Cool, right?</p>
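<p>If you prefer the CLI to the console, the same schedule can be wired up in three commands. A sketch (the rule name, region, and account ID are placeholders):</p>
<pre><code class="lang-bash"># create the cron-like rule
aws events put-rule --name site-uptime-check \
  --schedule-expression 'rate(15 minutes)'

# allow CloudWatch Events to invoke the function
aws lambda add-permission --function-name site-uptime-check \
  --statement-id cw-schedule --action lambda:InvokeFunction \
  --principal events.amazonaws.com

# point the rule at the Lambda
aws events put-targets --rule site-uptime-check \
  --targets 'Id=1,Arn=arn:aws:lambda:us-east-1:123456789012:function:site-uptime-check'
</code></pre>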
<h4 id="heading-role">Role</h4>
<p>You might also need to add a suitable role: sending emails is of course not enabled by default, so I ended up writing a quick policy extension like this:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"Statement"</span>: [
    {
      <span class="hljs-attr">"Action"</span>: [
        <span class="hljs-string">"logs:CreateLogGroup"</span>,
        <span class="hljs-string">"logs:CreateLogStream"</span>,
        <span class="hljs-string">"logs:PutLogEvents"</span>
      ],
      <span class="hljs-attr">"Effect"</span>: <span class="hljs-string">"Allow"</span>,
      <span class="hljs-attr">"Resource"</span>: <span class="hljs-string">"arn:aws:logs:*:*:*"</span>
    },
    {
      <span class="hljs-attr">"Action"</span>: [
        <span class="hljs-string">"ses:SendEmail"</span>
      ],
      <span class="hljs-attr">"Effect"</span>: <span class="hljs-string">"Allow"</span>,
      <span class="hljs-attr">"Resource"</span>: <span class="hljs-string">"arn:aws:ses:us-east-1::identity/alarmemail@gmail.com"</span>
    }
  ],
  <span class="hljs-attr">"Version"</span>: <span class="hljs-string">"2012-10-17"</span>
}
</code></pre>
<h4 id="heading-billing">Billing</h4>
<p>It seems the price for Lambdas (well, at least for now) is so low that this kind of script basically ends up being executed for free.</p>
<p>Of course, it goes without saying that you should limit the <strong>execution time</strong> to something reasonable just to be sure, and not trigger it too often - but that’s it: I stay far, far below $1 per month, and that’s good!</p>
<h4 id="heading-profit-and-have-fun">Profit and have fun</h4>
<p>I hope Lambda stays as cheap as it is, since this is a very simple and inexpensive way of doing simple stuff like checking a web site!</p>
<p>I still think it sucks for complex use cases, but… that’s a completely different rant there!</p>
]]></content:encoded></item><item><title><![CDATA[Ngrok vs dynamic DNS for remote Linux home server access]]></title><description><![CDATA[Imagine you don’t want to expose all your Raspberry Pies fully on the Internet (probably you shouldn’t ever do that in fact) but still want to be able to reach them from outside of your home.
Imagine you want to access your NAS to schedule a download...]]></description><link>https://blog.aleksic.dev/ngrok-vs-dynamic-dns-for-remote-linux-home-server-access-1486299502f2</link><guid isPermaLink="true">https://blog.aleksic.dev/ngrok-vs-dynamic-dns-for-remote-linux-home-server-access-1486299502f2</guid><category><![CDATA[Homelab]]></category><dc:creator><![CDATA[Milan Aleksić]]></dc:creator><pubDate>Sat, 04 Jun 2016 08:11:04 GMT</pubDate><content:encoded><![CDATA[<p>Imagine you don’t want to expose all your Raspberry Pies fully on the Internet (probably you shouldn’t ever do that in fact) but still want to be able to reach them from outside of your home.</p>
<p>Imagine you want to access your NAS to schedule a download of a huge file you don’t want to download from office etc.</p>
<p>I do things like this on a daily basis…</p>
<p>Here I want to present 2 successful and simple ways I use to access network nodes remotely, and my thoughts on when one approach is better than the other.</p>
<h4 id="heading-level-1-use-noip-since-millions-already-do-that">Level 1: use noip since millions already do that</h4>
<p>I have a couple of RPi 1A, 1B and 3 boards (and a couple of Radxa Rock Pros and a Synology, but that’s a different story altogether). Among other things, I host my web site on one of them (<a target="_blank" href="https://www.milanaleksic.net">https://www.milanaleksic.net</a>) and I use noip (<a target="_blank" href="http://noip.com">http://www.noip.com</a>) to do it. The <em>normal way</em> is to expose a single node fully, or to set up your router to pass through ports to selected servers.</p>
<p>In my particular case, and since microcomputers like the RPi and the above-mentioned RRPs are relatively cheap, I use one server as a <em>partial</em> DMZ, which basically means only some ports are published. This allows me to do risky stuff on other nodes while I’m at home without endangering the rest.</p>
<p>Setting up something like this is not that difficult: you just search for a suitable client from the dynamic DNS provider you use, or use some 3rd-party app which supports your provider.</p>
<blockquote>
<p>I use, for example, <strong>inadyn</strong> (<a target="_blank" href="https://github.com/troglobit/inadyn">https://github.com/troglobit/inadyn</a>) since I found it to be easier to configure via Chef, which I use to set up all my home nodes and laptops.</p>
</blockquote>
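<p>For reference, a sketch of an inadyn configuration using the newer 2.x syntax (credentials and hostname are placeholders; the older 1.x series used a different format, so check your version’s docs):</p>
<pre><code class="lang-bash">sudo tee /etc/inadyn.conf &lt;&lt;'EOF'
period = 300

provider no-ip.com {
    username = myuser
    password = mypass
    hostname = myhost.ddns.net
}
EOF
</code></pre>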
<p>After the client starts updating the DNS with your host’s IP, you can quickly start accessing your home server’s services exposed on your router - if you are allowed to do that, of course, and only if you’ve set up your router correctly.</p>
<blockquote>
<p><strong>Caveat:</strong> check if your Internet provider allows you to access your home server remotely on certain ports. I’m in Belgium, and Telenet here doesn’t allow ports 80/443, but Belgacom does</p>
</blockquote>
<p>I used this method as a first choice since it was… logical and simple. Everyone is doing it, it can’t be wrong, right?</p>
<p>Dynamic DNS is perfect if you have something that constantly needs to be available on the Internet (like a personal web site, for example).</p>
<p>The only problem I see is that this approach works only for an Internet-facing node. That means tough luck if you are connected via WiFi in a hotel, or behind a corporate firewall, for example. I’m quite sure the provider/hotel will not give you a public IP, so putting the IP of your provider’s endpoint into your dynamic DNS record is… pointless.</p>
<p>Maybe time for Level 2?</p>
<h4 id="heading-level-2-ngrok-as-an-uber-tool-for-the-geeks">Level 2: ngrok as an über tool for the geeks</h4>
<p>I really like using ngrok (<a target="_blank" href="https://ngrok.com/">https://ngrok.com/</a>). In case you don’t know what it’s about: it’s like TeamViewer/LogMeIn, but for geeks: it lets you stop worrying about not having a stable public IP. It’s like a hand that takes your host’s port and publishes it on the Internet.</p>
<p>And that’s in fact all you need, right? An exposed port means you can expose SSH through it and then do everything else you need!</p>
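<p>In case you haven’t seen it in action: once the ngrok client is installed and your auth token is registered (via <code>ngrok authtoken</code>), publishing SSH is a single command, and ngrok prints the public endpoint it allocated (something like <code>0.tcp.ngrok.io:1xxxx</code>):</p>
<pre><code class="lang-bash">ngrok tcp 22
</code></pre>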
<blockquote>
<p>The following tips expect some level of knowledge of Linux</p>
</blockquote>
<p>Do you want to open a VPN tunnel through that port? Here is what you need to do:</p>
<pre><code class="lang-bash"><span class="hljs-meta">#!/bin/bash</span>

<span class="hljs-keyword">if</span> [ <span class="hljs-string">"<span class="hljs-variable">$NGROK_PORT</span>"</span> == <span class="hljs-string">""</span> ]; <span class="hljs-keyword">then</span>
NGROK_PORT=$(go run ngrok_port.go -email=<span class="hljs-variable">$NGROK_USERNAME</span> -password=<span class="hljs-variable">$NGROK_PASSWORD</span>)
<span class="hljs-keyword">fi</span>

sudo sshuttle --auto-hosts --dns --exclude <span class="hljs-variable">$SUBNET_HOME</span> \
-r <span class="hljs-variable">$USERNAME</span>@0.tcp.ngrok.io:<span class="hljs-variable">$NGROK_PORT</span>
-e <span class="hljs-string">'ssh -i /home/$USERNAME/.ssh/id_rsa -o \
UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no'</span> \
<span class="hljs-variable">$SUBNET_REMOTE</span>
</code></pre>
<p>This uses <em>sshuttle</em>, an application that should be available in your Linux distribution. It is probably <strong>not</strong> the fastest or most optimal option; you should use OpenVPN for that (plenty of blogs explain how), it just takes more time to set up and explain.</p>
<p><em>ngrok_port.go</em> is a script I made in Go (<a target="_blank" href="https://golang.org">https://golang.org</a>) which takes your ngrok username and password as parameters, logs you in to ngrok, and scrapes the tunnel endpoint exposed on their server.</p>
<blockquote>
<p>The script is as simple as it gets, which means that it doesn’t cover all possible cases: different server or different layout of the page and so on.</p>
</blockquote>
<p><a target="_blank" href="https://gist.github.com/milanaleksic/162c99697a08bb36fe514f66399913ec"><strong>Ngrok free mode always has only one single tunnel allowed. If you use TCP tunnel, this script…</strong><br />*Ngrok free mode always has only one single tunnel allowed. If you use TCP tunnel, this script extracts the value (since…*gist.github.com</a></p>
<p>So, these 2 scripts combined should let you quickly set up a remote VPN connection, after which you can do whatever you’d like on the remote network: connect via VNC, SSH to another host, access the sites available only in that remote network, etc. And all of this without support from system admins.</p>
<p>When you are done working with the remote host, you can simply SSH in and shut down the ngrok server, so you don’t need to worry that someone might try to access your computer:</p>
<pre><code class="lang-bash">ssh &lt;remote host&gt; 'killall ngrok'
</code></pre>
<blockquote>
<p><strong>Caveat</strong>: nothing perfect in this world is free: ngrok’s free plan allows only one port to be exposed - that’s why the sshuttle trick is so valuable: you get a lot through a single port</p>
</blockquote>
<h4 id="heading-is-there-level-3">Is there Level 3?</h4>
<p>Of course there is.</p>
<p>Although I haven’t found any problem not covered by the two solutions above, that doesn’t mean there is no room for improvement.</p>
<p>If no one can help you start an ngrok server remotely, you can of course use TeamViewer or LogMeIn for “quick remote access”, but that kind of defeats the entire approach, since you end up dragging in a full desktop solution just to start a server on a remote system.</p>
<p>I was thinking about a <em>pull daemon</em> which would check somewhere every 15 minutes whether ngrok needs to be activated on the remote host. That way you keep the ngrok server active only when you really need it. Where to pull from is tricky: it needs to be a safe HTTPS place, either an S3 file as the simplest solution or an online service; I haven’t decided which way I want to go. Probably left for another experiment when I get time🤓.</p>
<h3 id="heading-conclusion">Conclusion</h3>
<p>Noip or a similar dynamic DNS provider is a good choice when the host is publicly reachable but you have no idea what its IP is.</p>
<p>You should try ngrok for the other common case: quick access to a remote host behind a firewall or NAT.</p>
]]></content:encoded></item></channel></rss>