<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Milan's blog]]></title><description><![CDATA[Milan's blog]]></description><link>https://blog.aleksic.dev</link><generator>RSS for Node</generator><lastBuildDate>Tue, 14 Apr 2026 21:20:15 GMT</lastBuildDate><atom:link href="https://blog.aleksic.dev/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Short Note: faster loading of credentials from Bitwarden using rbw]]></title><description><![CDATA[I have been playing around with the https://ergaster.org/posts/2025/07/28-direnv-bitwarden-integration/. It is a necessary read before you read this note since it explains the problem in quite nice details and builds the proposed solution step by ste...]]></description><link>https://blog.aleksic.dev/short-note-faster-loading-of-credentials-from-bitwarden-using-rbw</link><guid isPermaLink="true">https://blog.aleksic.dev/short-note-faster-loading-of-credentials-from-bitwarden-using-rbw</guid><category><![CDATA[direnv]]></category><category><![CDATA[Bitwarden]]></category><category><![CDATA[cli]]></category><dc:creator><![CDATA[Milan Aleksić]]></dc:creator><pubDate>Thu, 21 Aug 2025 07:19:53 GMT</pubDate><content:encoded><![CDATA[<p>I have been playing around with the <a target="_blank" href="https://ergaster.org/posts/2025/07/28-direnv-bitwarden-integration/">https://ergaster.org/posts/2025/07/28-direnv-bitwarden-integration/</a>. It is a necessary read before you read this note since it explains the problem in quite nice details and builds the proposed solution step by step.</p>
<p>It works just as advertised, but it is a bit slow for me - when I have a list of credentials to load in a project, it takes quite a bit of time. For example, in my homelab I have around 8 secret variables that I want to export each time.</p>
<p>So, I found a new solution using a very similar approach - instead of using <code>bw</code> (which takes ~3 seconds per exported credential), the solution uses <code>rbw</code> (<a target="_blank" href="https://github.com/doy/rbw">https://github.com/doy/rbw</a>), which takes just a second longer to load the entire project - but if you use IDs to fetch secrets, it is instantaneous from that point on.</p>
<p>The approach proposed below migrates the source of truth about the list of credentials from the envrc file into Bitwarden / Vaultwarden (I use the latter). Now I can create a folder and put secrets in it on my server, and the envrc will just pull them all in, without me having to list them one by one in the file. This allows me to more easily “scale” project setups by changing things in just one place.</p>
<p>An additional benefit I noticed with <code>rbw</code> is that it is present in nix packages, so I can use my Jetify devbox flow out of the box (<code>bw</code> is there as well, but it just doesn’t compile or is broken most of the time on my Apple Silicon MBP). I guess Rust code is easier to deploy across platforms?</p>
<h2 id="heading-solution">Solution</h2>
<p><strong>EDIT (01/09/2025):</strong> I have adapted the code to reflect longer-term usage of the tool. Apparently, if the vault is locked, the rbw-agent will be spawned (that’s OK), but (FD?) leaks will remain and direnv will never finish loading. The solution is simple: once finished, just terminate the newly created agent (if one exists), which will lock the vault, remove the leak and allow direnv to proceed with loading the <code>.envrc</code> file.</p>
<p>My <code>.envrc</code> file:</p>
<pre><code class="lang-plaintext">rbw_export_folder personal homelab
</code></pre>
<p>The function (defined in <code>~/.config/direnv/lib/bw_to_env.sh</code>) looks like this:</p>
<pre><code class="lang-bash"><span class="hljs-meta">#!/usr/bin/env bash</span>

<span class="hljs-function"><span class="hljs-title">rbw_export_folder</span></span>() {
  <span class="hljs-keyword">if</span> [[ <span class="hljs-string">"<span class="hljs-variable">$#</span>"</span> -lt 2 ]]; <span class="hljs-keyword">then</span>
    <span class="hljs-built_in">echo</span> <span class="hljs-string">"You must specify profile for rbw, and a folder as two arguments"</span> &gt;&amp;2
    <span class="hljs-built_in">return</span>
  <span class="hljs-keyword">fi</span>

  <span class="hljs-built_in">local</span> profile=<span class="hljs-variable">$1</span>
  <span class="hljs-built_in">local</span> folder=<span class="hljs-variable">$2</span>
  <span class="hljs-built_in">echo</span> <span class="hljs-string">"🔍 Exporting secrets from profile=<span class="hljs-variable">$profile</span>, folder: <span class="hljs-variable">$folder</span>"</span>

  <span class="hljs-built_in">local</span> existing_agents=$(pgrep -f <span class="hljs-string">"rbw-agent"</span> 2&gt;/dev/null || <span class="hljs-literal">true</span>)
  <span class="hljs-keyword">while</span> <span class="hljs-built_in">read</span> -r folder name id; <span class="hljs-keyword">do</span>
    <span class="hljs-built_in">export</span> <span class="hljs-string">"<span class="hljs-variable">$name</span>=<span class="hljs-subst">$(RBW_PROFILE=$profile rbw get <span class="hljs-string">"<span class="hljs-variable">$id</span>"</span>)</span>"</span>
    <span class="hljs-built_in">echo</span> <span class="hljs-string">"✅️ Exported <span class="hljs-variable">$name</span>"</span>
  <span class="hljs-keyword">done</span> &lt; &lt;(RBW_PROFILE=<span class="hljs-variable">$profile</span> rbw list --fields folder --fields name --fields id 2&gt;/dev/null | grep <span class="hljs-string">"^<span class="hljs-variable">${folder}</span>"</span>)

  <span class="hljs-comment"># Kill only the NEW rbw-agent for this profile</span>
  <span class="hljs-built_in">local</span> current_agents=$(pgrep -f <span class="hljs-string">"rbw-agent"</span> 2&gt;/dev/null || <span class="hljs-literal">true</span>)
  <span class="hljs-keyword">for</span> pid <span class="hljs-keyword">in</span> <span class="hljs-variable">$current_agents</span>; <span class="hljs-keyword">do</span>
    <span class="hljs-keyword">if</span> [[ ! <span class="hljs-string">"<span class="hljs-variable">$existing_agents</span>"</span> =~ <span class="hljs-variable">$pid</span> ]]; <span class="hljs-keyword">then</span>
      <span class="hljs-built_in">kill</span> <span class="hljs-string">"<span class="hljs-variable">$pid</span>"</span> 2&gt;/dev/null || <span class="hljs-literal">true</span>
      <span class="hljs-built_in">echo</span> <span class="hljs-string">"🔒 Locked the vault again (rbw-agent <span class="hljs-variable">$pid</span> stopped)"</span>
    <span class="hljs-keyword">fi</span>
  <span class="hljs-keyword">done</span>
}
</code></pre>
<p>When I enter my folder the output is like this:</p>
<pre><code class="lang-bash">🔍 Exporting secrets from profile=personal, folder: homelab
✅️ Exported ANSIBLE_VAULT_PASSWORD
✅️ Exported RESTIC_PASSWORD
✅️ Exported RESTIC_REPOSITORY
✅️ Exported SOL_PASSWORD
✅️ Exported SOL_USERNAME
✅️ Exported VAULT_ADDR
✅️ Exported VAULT_TOKEN
🔒 Locked the vault again (rbw-agent 92468 stopped)
milan@mbp ~/SourceCode/personal/homelab →
</code></pre>
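<p>One practical note: the function above assumes the vault can be unlocked when direnv fires. If you prefer to unlock (and sync) up front instead of being prompted mid-load, something like this should work - <code>personal</code> being the profile name from the example above:</p>
<pre><code class="lang-bash"># unlock the vault for the given profile before entering the project directory
RBW_PROFILE=personal rbw unlock
# optionally refresh the local cache from the server
RBW_PROFILE=personal rbw sync
</code></pre>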
]]></content:encoded></item><item><title><![CDATA[Nomad is dead, long live Kubernetes]]></title><description><![CDATA[The time has come: my homelab, for which I have been writing how-tos and occasional status updates on this blog had / has been serving me very well, but k8s ecosystem has reached maturity levels and broad acceptance and my Nomad cluster raises more q...]]></description><link>https://blog.aleksic.dev/nomad-is-dead-long-live-kubernetes</link><guid isPermaLink="true">https://blog.aleksic.dev/nomad-is-dead-long-live-kubernetes</guid><category><![CDATA[Kubernetes]]></category><category><![CDATA[nomad]]></category><category><![CDATA[ArgoCD]]></category><dc:creator><![CDATA[Milan Aleksić]]></dc:creator><pubDate>Sun, 20 Apr 2025 11:44:33 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/d2uHXWTkGn4/upload/cd1c4fa7b9378356206a100a308e180a.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The time has come: my homelab, for which I have been writing how-tos and occasional status updates on this blog, has been serving me very well - but the k8s ecosystem has reached maturity and broad acceptance, and my Nomad cluster raises questions and eyebrows more often than I'd like to admit. I think the time has come to face the music…</p>
<p>Here is my Nomad cluster, spread around Oracle Cloud and my basement.</p>
<blockquote>
<p>Actually, “basement” is not really the case as of today, since we are still converting it into a fitness+movie room, so my remote-working office (in the attic) is the actual “basement” in this story, temporarily</p>
</blockquote>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1745137785080/e9f16111-1795-481c-8346-ff33c49dd7a6.png" alt class="image--center mx-auto" /></p>
<p>Nodes I use:</p>
<ul>
<li><p><code>cloud4</code> is a big arm64 that Oracle gives for free</p>
</li>
<li><p><code>cloud5</code> is a small amd64 that Oracle also gives for free</p>
</li>
<li><p><code>ixion</code> and <code>pluto</code> are RPi 4s</p>
</li>
<li><p><code>oberon</code> is an RPi 5</p>
</li>
<li><p><code>io</code> is a VM on my Proxmox (explained later).</p>
</li>
</ul>
<p>What are the steps I made so far in the migration process?</p>
<h2 id="heading-introduce-an-empty-k8s-cluster">Introduce an empty k8s cluster</h2>
<p>This has been done. I got myself a <a target="_blank" href="https://www.minisforum.com/products/minisforum-um870-slim?variant=49690738131250">Ryzen 7 mini PC</a> last year, configured Proxmox on it and segmented its 64GB/16 vCPUs into:</p>
<ol>
<li><p>one more Linux VM for Nomad, since I was short on resources already (that’s the <code>io</code> from the screenshot)</p>
</li>
<li><p>one “<em>control</em>” k3s node (using Proxmox support for LXC);</p>
</li>
<li><p>one “<em>worker</em>” k3s node where work will be scheduled (LXC).</p>
</li>
</ol>
<h2 id="heading-basic-infra-setup">Basic infra setup</h2>
<p>Here are some of the steps I took so far, after configuring the k3s nodes on my Proxmox using a <a target="_blank" href="https://garrettmills.dev/blog/2022/04/18/Rancher-K3s-Kubernetes-on-Proxmox-Container/">tutorial I found</a>. Not in a strictly realistic order, but close enough to reality that it makes sense…</p>
<p>I installed the latest <strong>Helm</strong> as of today. It's a pretty important part of the modern k8s experience and it just makes sense to have it from the start.</p>
<p>I then installed <code>cert-manager</code>, using Helm as well. Very cool, but it cost me a couple of hairs until I made it work. It made me <em>rethink the entire damn migration</em>, since so many sources talked about different aspects/approaches/ingresses. But I persevered and it… just works now. Certificates are provided via Cloudflare, which was there already, so I can easily access the server.</p>
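<p>For reference, the install itself is the easy part - a minimal sketch of the Helm commands I mean (the Cloudflare-backed issuer configuration is where the hair-pulling happens, and it is specific to your cluster):</p>
<pre><code class="lang-bash">helm repo add jetstack https://charts.jetstack.io
helm repo update
# install cert-manager together with its CRDs into its own namespace
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager --create-namespace \
  --set installCRDs=true
</code></pre>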
<p>Then I <a target="_blank" href="https://argo-cd.readthedocs.io/en/stable/operator-manual/core/#installing">installed an ArgoCD</a> with <a target="_blank" href="https://argo-cd.readthedocs.io/en/stable/operator-manual/ingress/">nginx ingress</a> since my understanding is that it basically became super-popular for GitOps. I think there's also Flux, but we're using ArgoCD on my work, so I thought “no, Milan, you will not <em>again</em> choose another non-mainstream approach just for the sake of a principal”.</p>
<blockquote>
<p>In hindsight, I should’ve used Helm for setting up ArgoCD. I just used their default install scripts. I will migrate later, I guess</p>
</blockquote>
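<p>For the record, those default install scripts boil down to roughly this (the standard manifests from the ArgoCD documentation):</p>
<pre><code class="lang-bash">kubectl create namespace argocd
kubectl apply -n argocd \
  -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
</code></pre>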
<p>Gitea remains my personal self-hosted Git VCS, also for this new ArgoCD use case - so that's where Argo will find its CRDs. It's still deployed in my homelab, just using Ansible, as one of those core things that for some specific reason I couldn't configure as a Nomad job (I forgot why - it’s been like that for years in my homelab, which translates into centuries for normal people).</p>
<p>I then added the latest <strong>HashiCorp Vault</strong>, also using Helm. I am still not happy about this, since I probably simplified the Helm values too much and at this time it still can’t provide secrets for other pods - but that’s just because I needed to learn Vault for a project at work (I needed a service up and running with the API exposed on an ingress, and that's it).</p>
<blockquote>
<p>Interesting tidbit: I used Cursor for the first time in my life for Vault Helm setup 🎉.</p>
</blockquote>
<p>I added <a target="_blank" href="https://tailscale.com/kb/1437/kubernetes-operator-api-server-proxy">tailscale operator</a> so that I can access the cluster out of home as well. This is a cool thing since I can access it even from my Androind smartphone using e.g. <a target="_blank" href="https://play.google.com/store/apps/details?id=io.kubenav.kubenav&amp;hl=en-US&amp;pli=1">kubenav</a> app. This is where I was saying to myself actually</p>
<p>I added <a target="_blank" href="https://grafana.com/docs/alloy/latest/set-up/install/kubernetes/">Grafana Alloy</a> so that it can push all the logs into my existing Grafana Loki (self-hosted in my main Nomad cluster). I used <strong>Promtail</strong> already in my (deprecating) cluster for system stuff and <strong>Vector</strong> for nomad logs, but apparently <strong>Alloy</strong> is the new future-safe thing in Grafana stack so I went with it.</p>
<p>Finally, I thought about what to do with my Grafana and my InfluxDB. I thought for a long time about these… the decision I made is: migrate Grafana into k8s gradually, just as any other app, but don’t carry InfluxDB over. I already have <strong>Prometheus</strong>, so I will go with that for my k8s metrics monitoring. I just exposed it on the internal ingress and registered it in the existing Grafana as a data source.</p>
<h2 id="heading-setup-poc-with-a-small-service">Setup PoC with a small service</h2>
<p>I wasn't sure what to migrate first. Something simple? At some point you realize nothing is simple: even stateless services appear in my ingress <code>Caddyfile</code> via Consul Templates and get registered into Cloudflare DNS for Tailscale IPs and into the local <code>dnsmasq</code> for LAN IP overrides.</p>
<p>I decided to go with <code>n8n</code> service since it is just complex enough to holistically check many things:</p>
<ul>
<li><p>logs must appear in my Grafana Loki;</p>
</li>
<li><p>certificate needs to be issued;</p>
</li>
<li><p>external DNS and Caddy should forward webhooks into n8n;</p>
</li>
<li><p>internal DNS should expose the Web UI;</p>
</li>
<li><p>deployment should work automatically using ArgoCD;</p>
</li>
<li><p><a target="_blank" href="https://github.com/n8n-io/n8n/issues/863#issuecomment-699556998">bugs like this one</a> should be fixed by renaming the service from <code>n8n</code>;</p>
</li>
<li><p>NFS on my ancient Synology should be used for the persistent volume;</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1745149289704/28abd3a9-1374-4f41-9e80-1767c7f50ec3.png" alt class="image--center mx-auto" /></p>
<blockquote>
<p>I still have a small but important optimization pending: instead of using NFS as a backup destination (to which I push the locally mounted files once per day), I should “just trust the cluster” and let it always use NFS directly.</p>
</blockquote>
<h2 id="heading-future-plan">Future plan</h2>
<p>The master plan is to <em>drain</em> the <code>ixion</code> node of all Nomad jobs and introduce it into the cluster as a new k3s node. Basically, let it disappear from Consul and from Nomad node listings.</p>
<p>To do that, I will have to migrate complex work into another node (there’s still space for that extra work, luckily) and migrate simple work into k8s just as I did with <code>n8n</code> service. Finally, I will then be able to introduce <code>ixion</code> as a new worker k3s node into the Kubernetes cluster.</p>
<p>Then, one by one, all the rest. I think this will be done gradually during 2025 when I find time, but the idea is to go into 2026 without consul/nomad. That’s the plan at least…</p>
]]></content:encoded></item><item><title><![CDATA[Simple CPU usage tracking in Linux using SQLite and Python]]></title><description><![CDATA[I figured that over night one of my PCs is spending time on “something”. System logs don’t show anything, app logs neither. So, I spent like 30min and made this tiny script which might be useful for you as well. I connect to my host using SSH, and I ...]]></description><link>https://blog.aleksic.dev/simple-cpu-usage-tracking-in-linux-using-sqlite-and-python</link><guid isPermaLink="true">https://blog.aleksic.dev/simple-cpu-usage-tracking-in-linux-using-sqlite-and-python</guid><category><![CDATA[Linux]]></category><dc:creator><![CDATA[Milan Aleksić]]></dc:creator><pubDate>Sat, 30 Nov 2024 07:19:20 GMT</pubDate><content:encoded><![CDATA[<p>I figured that overnight one of my PCs was spending time on “something”. System logs don’t show anything, app logs neither. So, I spent about 30 minutes and made this tiny script, which might be useful for you as well. I connect to my host using SSH, and I let this run overnight in a tmux session. The day after, I can analyze all the stored data in the SQLite database and find the problematic app.</p>
<p><strong>Requirements</strong>:</p>
<ul>
<li><p>Python 3</p>
</li>
<li><p><code>screen</code> / <code>tmux</code> to keep the session going even when you are not logged in</p>
</li>
<li><p>(optional) SQLite3 system package so that you can read the recorded data on the system</p>
</li>
</ul>
<p><strong>Source code</strong> (<code>activity-tracker.py</code>)</p>
<pre><code class="lang-python"><span class="hljs-comment">#!/usr/bin/env python3</span>
<span class="hljs-keyword">import</span> sys
<span class="hljs-keyword">import</span> time

<span class="hljs-keyword">import</span> datetime
<span class="hljs-keyword">import</span> logging
<span class="hljs-keyword">import</span> os
<span class="hljs-keyword">import</span> sqlite3
<span class="hljs-keyword">import</span> subprocess


<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">ActivityTracker</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self</span>):</span>
        script_directory = os.path.dirname(os.path.abspath(sys.argv[<span class="hljs-number">0</span>]))
        db = os.path.join(script_directory, <span class="hljs-string">'activity-tracker.db'</span>)
        conn = sqlite3.connect(db)
        cursor = conn.cursor()
        cursor.execute(<span class="hljs-string">'''CREATE TABLE IF NOT EXISTS activity_log
                     (date text, app text, pid int, user text, usage real)'''</span>)
        conn.commit()
        self._db = conn

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">sync</span>(<span class="hljs-params">self</span>):</span>
        <span class="hljs-keyword">while</span> <span class="hljs-literal">True</span>:
            command = [<span class="hljs-string">"bash"</span>, <span class="hljs-string">"-c"</span>, <span class="hljs-string">"ps -eo pcpu,pid,user,args | grep -v '[p]s -eo' | grep -v '[a]ctivity-tracker.py' | tail -n +2 | sort -k1 -r -n | head -10"</span>]
            logging.info(<span class="hljs-string">"Running command %s"</span> % <span class="hljs-string">" "</span>.join(command))

            result = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=<span class="hljs-literal">True</span>, check=<span class="hljs-literal">True</span>)
            <span class="hljs-keyword">for</span> line <span class="hljs-keyword">in</span> result.stdout.split(<span class="hljs-string">"\n"</span>):
                components = list(filter(<span class="hljs-keyword">lambda</span> x: x != <span class="hljs-string">""</span>, line.split(<span class="hljs-string">" "</span>)))
                <span class="hljs-keyword">if</span> len(components) == <span class="hljs-number">0</span>:
                    <span class="hljs-keyword">continue</span>
                <span class="hljs-keyword">if</span> len(components) &lt; <span class="hljs-number">4</span>:
                    logging.warning(<span class="hljs-string">"Line has wrong format: %s"</span> % components)
                    <span class="hljs-keyword">continue</span>
                usage = float(components[<span class="hljs-number">0</span>])
                pid = int(components[<span class="hljs-number">1</span>])
                user = components[<span class="hljs-number">2</span>]
                app = <span class="hljs-string">" "</span>.join(components[<span class="hljs-number">3</span>:])
                moment = datetime.datetime.now().isoformat()

                <span class="hljs-comment"># store in db</span>
                cursor = self._db.cursor()
                sql = <span class="hljs-string">"INSERT INTO activity_log (date, app, pid, user, usage) VALUES (?, ?, ?, ?, ?)"</span>
                cursor.execute(sql, (moment, app, pid, user, usage))
                self._db.commit()
            <span class="hljs-keyword">try</span>:
                time.sleep(<span class="hljs-number">10</span>)
            <span class="hljs-keyword">except</span> KeyboardInterrupt:
                logging.info(<span class="hljs-string">"Exiting"</span>)
                self._db.close()
                <span class="hljs-keyword">break</span>


<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">run</span>():</span>
    logging.getLogger().setLevel(logging.INFO)
    ActivityTracker().sync()


<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">'__main__'</span>:
    run()
</code></pre>
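<p>I let it run overnight inside tmux so it survives the SSH session ending - for example:</p>
<pre><code class="lang-bash"># start a detached tmux session running the tracker
tmux new-session -d -s tracker 'python3 activity-tracker.py'
</code></pre>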
<p>Finally, you can track the biggest users using simple <code>watch</code> command:</p>
<pre><code class="lang-plaintext">watch -n 30 "sqlite3 activity-tracker.db -table \
  'select * from activity_log order by usage desc limit 50'"

Every 30.0s: sqlite3 activity-tracker.db -table   'select * from activity_log order by usage desc limit...

+----------------------------+------------------------------------------------------------------------+-------+--------+-------+
|            date            |                             app                                        |  pid  |  user  | usage |
+----------------------------+------------------------------------------------------------------------+-------+--------+-------+
| 2024-11-30T08:02:10.361848 | /opt/nomad/nomad agent -config=/opt/nomad/nomad.conf                   | 86884 | root   | 24.2  |
| 2024-11-30T08:01:59.840202 | /opt/consul/consul agent -config-file=/opt/consul/consul.json -rejoin  | 86428 | consul | 6.6   |
| 2024-11-30T08:02:10.383667 | /opt/consul/consul agent -config-file=/opt/consul/consul.json -rejoin  | 86428 | consul | 3.6   |
...
+----------------------------+------------------------------------------------------------------------+-------+--------+-------+
</code></pre>
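<p>Once enough data has accumulated, an aggregate query is often more telling than the raw rows - for example, total and peak CPU per app over the night (columns as defined by the script above):</p>
<pre><code class="lang-plaintext">sqlite3 activity-tracker.db -table \
  "select app, count(*) as samples, round(sum(usage), 1) as total_usage, max(usage) as peak
   from activity_log group by app order by total_usage desc limit 15"
</code></pre>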
]]></content:encoded></item><item><title><![CDATA[Short Note: when Spring Boot and Privoxy don't like each other]]></title><description><![CDATA[I figured out after full day of work a very peculiar behavior in Spring Boot and wanted to write this one down since it was a pretty annoying one.
Let’s say you are:

using Spring Boot and need to make HTTP client connections (using RestTemplate or R...]]></description><link>https://blog.aleksic.dev/short-note-when-spring-boot-and-privoxy-dont-like-each-other</link><guid isPermaLink="true">https://blog.aleksic.dev/short-note-when-spring-boot-and-privoxy-dont-like-each-other</guid><category><![CDATA[Java]]></category><category><![CDATA[Springboot]]></category><category><![CDATA[Spring]]></category><dc:creator><![CDATA[Milan Aleksić]]></dc:creator><pubDate>Thu, 14 Nov 2024 17:25:03 GMT</pubDate><content:encoded><![CDATA[<p>After a full day of work I figured out a very peculiar behavior in Spring Boot, and wanted to write this one down since it was a pretty annoying one.</p>
<p>Let’s say you are:</p>
<ul>
<li><p>using Spring Boot and need to make HTTP client connections (using <code>RestTemplate</code> or <code>RestClient</code>, etc) to some server</p>
</li>
<li><p>you innocently want to use an out-of-the-box marshaller like Jackson, for example (because why not)</p>
</li>
<li><p>you must utilize HTTP proxy between your app and the target server</p>
<ul>
<li>proxy causing issues is Privoxy, but… hey who knows if some others are (not) impacted</li>
</ul>
</li>
<li><p>that HTTP proxy must (for some very important enterprisey reason) inspect HTTPS connections</p>
</li>
</ul>
<p>You might notice that all clients fail in this scenario with an early EOF: Jetty, Apache HTTP Client, native JDK client… all of them.</p>
<pre><code class="lang-java"><span class="hljs-comment">// run with </span>
<span class="hljs-comment">// -Dhttps.proxyHost=my.very.nice.proxy.local</span>
<span class="hljs-comment">// -Dhttps.proxyPort=my.innocent.proxy</span>

ResponseEntity&lt;LoginResponse&gt; response = restClient.post()
  .uri(<span class="hljs-string">"https://myserver.com"</span>)
  .body(<span class="hljs-keyword">new</span> LoginRequest(<span class="hljs-string">"username"</span>, <span class="hljs-string">"password"</span>))
  .retrieve()
  .toEntity(LoginResponse.class);

<span class="hljs-comment">// Caused by: org.springframework.web.client.ResourceAccessException:</span>
<span class="hljs-comment">// I/O error on POST request for "https://myserver.com":</span>
<span class="hljs-comment">// HttpConnectionOverHTTP@6dc73294::SslEndPoint@7c52e37e[{...}]</span>
<span class="hljs-comment">// Caused by: java.io.EOFException: HttpConnectionOverHTTP@6dc73294::SslEndPoint@7c52e37e[{...}]</span>
</code></pre>
<p>After some time I figured out it was the marshaller that was flushing the connection. This is normally never a problem, mind you - but if HTTPS inspection is being used, the request is sent, and this triggers Privoxy to flip: it doesn’t even try to send the response back and just… EOFs you as well :)</p>
<p>The fix is straightforward: avoid the <code>flush()</code> done by the marshaller and just send the body transformed into a string yourself:</p>
<pre><code class="lang-java"><span class="hljs-comment">// run with </span>
<span class="hljs-comment">// -Dhttps.proxyHost=my.very.nice.proxy.local</span>
<span class="hljs-comment">// -Dhttps.proxyPort=my.innocent.proxy</span>

ResponseEntity&lt;LoginResponse&gt; response = restClient.post()
  .uri(<span class="hljs-string">"https://myserver.com"</span>)
  .body(objectMapper.writeValueAsString(<span class="hljs-keyword">new</span> LoginRequest(<span class="hljs-string">"username"</span>, <span class="hljs-string">"password"</span>)))
  .retrieve()
  .toEntity(LoginResponse.class);
</code></pre>
<h2 id="heading-last-words">Last words</h2>
<p>Is this now a bug in Privoxy or in Spring? I think it’s in Spring, since cURL doesn’t make the assumption that <code>flush</code> is safe. All the clients, used directly, also don’t have this issue - the proxy terminates the connection correctly after sending the response. If you just turn off the marshaller in Spring, the behavior is correct.</p>
]]></content:encoded></item><item><title><![CDATA[Short Note: auto-tag commits on main branch]]></title><description><![CDATA[In case you use some tools (like goreleaser) that depend on running in the case of tagged commits only, but you wish at the same time not to bother with manually tagging your commits but just to push to master… I found a nice & easy way to auto-tag c...]]></description><link>https://blog.aleksic.dev/short-note-auto-tag-commits-on-main-branch</link><guid isPermaLink="true">https://blog.aleksic.dev/short-note-auto-tag-commits-on-main-branch</guid><category><![CDATA[Git]]></category><category><![CDATA[short]]></category><dc:creator><![CDATA[Milan Aleksić]]></dc:creator><pubDate>Sat, 21 Sep 2024 11:23:29 GMT</pubDate><content:encoded><![CDATA[<p>In case you use some tools (like <code>goreleaser</code>) that only run on tagged commits, but at the same time you don’t wish to bother with manually tagging your commits and just want to push to master… I found a nice &amp; easy way to auto-tag commits locally with a “post-commit” Git hook.</p>
<p>TL;DR: If you wish to use this approach, here is the post commit hook:</p>
<pre><code class="lang-bash"><span class="hljs-meta">#!/bin/sh</span>

COMMIT=$(git rev-parse --short HEAD)
DATE=$(git <span class="hljs-built_in">log</span> -1 --format=%<span class="hljs-built_in">cd</span> --date=format:<span class="hljs-string">"%Y%m%d"</span>)
git tag -a <span class="hljs-string">"<span class="hljs-variable">$DATE</span>-<span class="hljs-variable">$COMMIT</span>"</span> -m <span class="hljs-string">"Tagging <span class="hljs-variable">$DATE</span>-<span class="hljs-variable">$COMMIT</span>"</span>
<span class="hljs-built_in">echo</span> <span class="hljs-string">"Code commited and tagged as <span class="hljs-variable">$DATE</span>-<span class="hljs-variable">$COMMIT</span>"</span>

<span class="hljs-built_in">echo</span> <span class="hljs-string">"Removing deprecated un-pushed tags..."</span>
git show-ref --tags | \
  grep -v -F <span class="hljs-string">"<span class="hljs-subst">$(git ls-remote --tags origin | grep -v '\^{}' | cut -f 2)</span>"</span> | \
  grep -v <span class="hljs-string">"<span class="hljs-subst">$(git rev-parse --short HEAD)</span>"</span> | \
  awk -F<span class="hljs-string">'/'</span> <span class="hljs-string">'{print $3}'</span> | \
  xargs -I{} bash -c <span class="hljs-string">"git tag --delete {}"</span>
</code></pre>
<h2 id="heading-installation-of-the-hook">Installation of the hook</h2>
<p>To install it, you need to save this file as <code>.git/hooks/post-commit</code> file and make it executable with <code>chmod +x .git/hooks/post-commit</code>.</p>
<p>What does it actually do:</p>
<h2 id="heading-step-1-create-tag-on-each-commit">Step 1 - create tag on each commit</h2>
<p>The following part will generate a reasonable tag value, like <code>20240921-4318a7c</code>, by combining the current date with the short hash of the last commit (the one that has just been created, and for which the <code>post-commit</code> hook was triggered).</p>
<pre><code class="lang-bash">COMMIT=$(git rev-parse --short HEAD)
DATE=$(git <span class="hljs-built_in">log</span> -1 --format=%<span class="hljs-built_in">cd</span> --date=format:<span class="hljs-string">"%Y%m%d"</span>)
git tag -a <span class="hljs-string">"<span class="hljs-variable">$DATE</span>-<span class="hljs-variable">$COMMIT</span>"</span> -m <span class="hljs-string">"Tagging <span class="hljs-variable">$DATE</span>-<span class="hljs-variable">$COMMIT</span>"</span>
<span class="hljs-built_in">echo</span> <span class="hljs-string">"Code commited and tagged as <span class="hljs-variable">$DATE</span>-<span class="hljs-variable">$COMMIT</span>"</span>
</code></pre>
<h2 id="heading-step-2-remove-deprecated-local-tags">Step 2 - remove “deprecated” local tags</h2>
<p>In case you created 3 commits locally, you most probably want to tag only the last one, right? That’s why we need to list all the tags that are present only locally and remove all the other tags previously created locally by this post-commit hook.</p>
<pre><code class="lang-bash"><span class="hljs-built_in">echo</span> <span class="hljs-string">"Removing deprecated un-pushed tags..."</span>
git show-ref --tags | \
  grep -v -F <span class="hljs-string">"<span class="hljs-subst">$(git ls-remote --tags origin | grep -v '\^{}' | cut -f 2)</span>"</span> | \
  grep -v <span class="hljs-string">"<span class="hljs-subst">$(git rev-parse --short HEAD)</span>"</span> | \
  awk -F<span class="hljs-string">'/'</span> <span class="hljs-string">'{print $3}'</span> | \
  xargs -I{} bash -c <span class="hljs-string">"git tag --delete {}"</span>
</code></pre>
<p>What exactly does this part do:</p>
<ol>
<li><p><code>git show-ref --tags</code> - list all the known tags</p>
</li>
<li><p><code>grep -v -F "$(git ls-remote --tags origin | grep -v '\^{}' | cut -f 2)"</code> - find the tags that are known on the <code>origin</code> remote (you might want to replace <code>origin</code> with the name of your remote in case you don’t use the default one), and create a negative fixed-string filter for those on the incoming stream</p>
</li>
<li><p><code>grep -v "$(git rev-parse --short HEAD)"</code> - identify the current tag (which we want to keep since we’ve just made it) and create a negative filter to exclude it from the incoming stream</p>
</li>
<li><p><code>awk -F'/' '{print $3}'</code> - keep only the last part of each tag ref by removing all the text before the last <code>/</code> character</p>
</li>
<li><p><code>xargs -I{} bash -c "git tag --delete {}"</code> - run the tag deletion of the tags that remain</p>
</li>
</ol>
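<p>One thing the hook does not do is push: the tag only reaches the remote (and triggers tools like <code>goreleaser</code>) once you push it, for example with:</p>
<pre><code class="lang-bash"># push the branch together with the annotated tags pointing at pushed commits
git push --follow-tags
</code></pre>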
]]></content:encoded></item><item><title><![CDATA[Short Note: use DuckDB as (pipe) jq replacement]]></title><description><![CDATA[The usage of the jq can be substituted sometimes by DuckDB with a bit cleaner syntax.
For example, to get properties Id and State.Status from a list of active containers in docker one might do this:
➜ docker inspect f42 | jq '.[] | "\(.Id) \(.State.S...]]></description><link>https://blog.aleksic.dev/short-note-use-duckdb-as-pipe-jq-replacement</link><guid isPermaLink="true">https://blog.aleksic.dev/short-note-use-duckdb-as-pipe-jq-replacement</guid><category><![CDATA[Bash]]></category><category><![CDATA[jq]]></category><dc:creator><![CDATA[Milan Aleksić]]></dc:creator><pubDate>Fri, 17 May 2024 08:03:20 GMT</pubDate><content:encoded><![CDATA[<p>The usage of <code>jq</code> can sometimes be substituted by DuckDB, with a bit cleaner syntax.</p>
<p>For example, to get properties <code>Id</code> and <code>State.Status</code> from a list of active containers in docker one might do this:</p>
<pre><code class="lang-bash">➜ docker inspect f42 | jq <span class="hljs-string">'.[] | "\(.Id) \(.State.Status)"'</span> -r
f42d97a8670846a737a972329cf44e5b092954ef7e30681f73d70963ad971e43 running
</code></pre>
<p>but it might be easier to write a modern SQL JSON:</p>
<pre><code class="lang-bash">➜ docker inspect f42 | mjq <span class="hljs-string">"select Id, State-&gt;&gt;'Status' from JQ"</span>
┌──────────────────────────────────────────────────────────────────┬──────────────────────┐
│                                Id                                │ (State -&gt;&gt; <span class="hljs-string">'Status'</span>) │
│                             varchar                              │       varchar        │
├──────────────────────────────────────────────────────────────────┼──────────────────────┤
│ f42d97a8670846a737a972329cf44e5b092954ef7e30681f73d70963ad971e43 │ running              │
└──────────────────────────────────────────────────────────────────┴──────────────────────┘
</code></pre>
<p>How does <code>mjq</code> work? It's this function basically:</p>
<pre><code class="lang-bash"><span class="hljs-function"><span class="hljs-title">mjq</span></span>() {
  <span class="hljs-built_in">local</span> TFILE=<span class="hljs-string">"/tmp/mjq-<span class="hljs-subst">$((RANDOM % 100)</span>).json"</span>
  cat &gt;  <span class="hljs-variable">${TFILE}</span>
  <span class="hljs-built_in">local</span> SQL=<span class="hljs-string">"<span class="hljs-variable">${1//JQ/read_json('${TFILE}</span>')}"</span>
  duckdb -c <span class="hljs-string">"<span class="hljs-variable">${SQL}</span>"</span>
}
</code></pre>
<p>I used <code>mjq</code> because that’s the prefix I use for my tools. You can pass any SQL you want as the argument, and the <code>JQ</code> string will be replaced with a <code>read_json</code> call over the file holding the piped-in input.</p>
<p>Further ideas:</p>
<ul>
<li>the temp file is kept so that further debugging can be done without re-running the upstream bash command (if you don’t need that, see the variant sketched below). YMMV</li>
</ul>
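<p>If you don’t need the file afterwards, a variant using <code>mktemp</code> avoids the (unlikely, but possible) <code>$RANDOM</code> collisions and cleans up after itself - a sketch:</p>
<pre><code class="lang-bash">mjq() {
  local TFILE
  TFILE="$(mktemp)"  # unique temp file instead of /tmp/mjq-$RANDOM.json
  cat &gt; "${TFILE}"   # capture the piped-in JSON
  duckdb -c "${1//JQ/read_json('${TFILE}')}"
  rm -f "${TFILE}"   # clean up instead of keeping the file for debugging
}
</code></pre>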
<hr />
<p>More information about usage of DuckDB and the motivation for this script: <a target="_blank" href="https://www.pgrs.net/2024/03/21/duckdb-as-the-new-jq/">https://www.pgrs.net/2024/03/21/duckdb-as-the-new-jq/</a></p>
<p>More information about DuckDB JSON extension: <a target="_blank" href="https://duckdb.org/docs/extensions/json.html">https://duckdb.org/docs/extensions/json.html</a></p>
]]></content:encoded></item><item><title><![CDATA[Short Note: docker-compose and Tailscale connectivity issues]]></title><description><![CDATA[TL;DRThe article discusses connectivity issues between Docker Compose and services exposed via Tailscale after the OS package updates. After troubleshooting, the issue was identified as Docker Compose not running container in bridged mode. The soluti...]]></description><link>https://blog.aleksic.dev/short-note-docker-compose-and-tailscale-connectivity-issues</link><guid isPermaLink="true">https://blog.aleksic.dev/short-note-docker-compose-and-tailscale-connectivity-issues</guid><category><![CDATA[Docker]]></category><category><![CDATA[networking]]></category><category><![CDATA[tailscale]]></category><dc:creator><![CDATA[Milan Aleksić]]></dc:creator><pubDate>Fri, 10 May 2024 04:00:35 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/4h0HqC3K4-c/upload/00060303cf0680da9c3f1a5dd70ddc3a.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote>
<p>TL;DR<br />The article discusses connectivity issues between Docker Compose and services exposed via Tailscale after OS package updates. After troubleshooting, the issue was identified as Docker Compose not running the container network in bridged mode. The solution involved modifying the docker-compose.yml file and using the docker compose command.</p>
</blockquote>
<p>I had a very old Caddy service setup using <code>docker-compose</code> that had been "just working" for years, and after the weekly GH Actions Ansible-driven update of all apps (including <code>docker</code> / <code>docker-compose</code> / <code>tailscale</code>) I noticed all sorts of alerts popping up, since Caddy couldn't access services exposed via Tailscale on other nodes.</p>
<p>Apparently a couple of things changed:</p>
<ol>
<li><p>Tailscale was updated</p>
</li>
<li><p>Docker was updated</p>
</li>
<li><p>Docker Compose was updated</p>
</li>
</ol>
<p>Now, the first reaction was to restart Docker on all nodes; that helped fix other services (probably some kind of breaking change), but no amount of restarting helped recover Caddy.</p>
<p>It was a very simple docker compose setup, just this:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">version:</span> <span class="hljs-string">'3'</span>

<span class="hljs-attr">services:</span>
  <span class="hljs-attr">caddy:</span>
    <span class="hljs-attr">image:</span> <span class="hljs-string">milanaleksic/caddy-cloudflare:2.7.6</span>
    <span class="hljs-attr">ports:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-number">80</span><span class="hljs-string">:80</span>
      <span class="hljs-bullet">-</span> <span class="hljs-number">443</span><span class="hljs-string">:443</span>
    <span class="hljs-attr">volumes:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">./data/caddy-config:/config</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">./data/caddy-data:/data</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">./config:/etc/caddy</span>
</code></pre>
<p>And Caddy was reporting a bunch of these (one for each attempt to access URLs which were proxied somewhere internal to the Tailscale network):</p>
<pre><code class="lang-plaintext">... dial tcp 100.85.131.92:22487: i/o timeout ...
</code></pre>
<p>What made this issue extremely interesting is that I had to try various things until I figured out what was going on:</p>
<p><strong>From this node</strong> a normal <code>curl 100.85.131.92:22487</code> just worked.</p>
<p>When I start <strong>a simple docker container</strong> it also just worked.</p>
<p>Even when I manually started the docker image from above (<code>milanaleksic/caddy-cloudflare:2.7.6</code>), <code>curl</code> also just worked!</p>
<p>I thought I was going crazy, but then I had to get out the big guns and run a diff analysis of the outputs of the <code>docker inspect container1</code> and <code>docker inspect container2</code> commands (where the 2 containers were the one that I started manually vs the one compose started). And the problem exposed itself: docker-compose didn't run the network in <em>bridged mode</em>. That was the difference between the manually started container and the one started by <code>docker-compose</code>.</p>
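<p>For the curious, that diff analysis boils down to something like this (the container names here are illustrative):</p>
<pre><code class="lang-bash"># diff the full inspect outputs of the two containers
diff &lt;(docker inspect caddy-manual) &lt;(docker inspect caddy-compose)
# or query the relevant field directly
docker inspect -f '{{.HostConfig.NetworkMode}}' caddy-manual caddy-compose
</code></pre>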
<h3 id="heading-solution">Solution</h3>
<p>What I ended up doing was changing the <code>docker-compose.yml</code> to:</p>
<pre><code class="lang-diff"><span class="hljs-deletion">- version: '3'</span>
<span class="hljs-deletion">- </span>
services:
  caddy:
    image: milanaleksic/caddy-cloudflare:2.7.6
<span class="hljs-addition">+     networks:</span>
<span class="hljs-addition">+       - caddy</span>
    ports:
      - 80:80
      - 443:443
    volumes:
      - ./data/caddy-config:/config
      - ./data/caddy-data:/data
      - ./config:/etc/caddy
<span class="hljs-addition">+ </span>
<span class="hljs-addition">+ networks:</span>
<span class="hljs-addition">+   caddy:</span>
<span class="hljs-addition">+     driver: bridge</span>
</code></pre>
<p>To make this new <code>docker-compose.yml</code> file work I actually had to stop using <code>docker-compose</code> to run it, since it had been deprecated for a while now, and just ran the <code>docker compose</code> command instead (the former was still written in Python, while the feature was migrated into the Go CLI <code>docker</code> command).</p>
<p>💣, it just works now!</p>
]]></content:encoded></item><item><title><![CDATA[Short Note: control diffuser via Stream Deck]]></title><description><![CDATA[What and why?
Well, it took me forever to figure it out, but apparently during the winter months I have problems with my nose not because I became super-sensitive as I age, but simply because humidity at my home office is not as good as it is was in ...]]></description><link>https://blog.aleksic.dev/short-note-control-diffuser-via-stream-deck</link><guid isPermaLink="true">https://blog.aleksic.dev/short-note-control-diffuser-via-stream-deck</guid><category><![CDATA[Homelab]]></category><category><![CDATA[stream deck]]></category><category><![CDATA[Home Assistant]]></category><category><![CDATA[iot]]></category><dc:creator><![CDATA[Milan Aleksić]]></dc:creator><pubDate>Sat, 10 Feb 2024 23:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/Q1p7bh3SHj8/upload/3d9a8c7fff858a27a637480951825b43.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-what-and-why">What and why?</h2>
<p>Well, it took me forever to figure it out, but apparently during the winter months I have problems with my nose not because I became super-sensitive as I age, but simply because the humidity at my home office is not as good as it was in the "real" office.</p>
<p>I, therefore, bought a simple smart diffuser by a company "Gologi" which looks cool I guess and fits well into my office. So, basically, how do I turn it on/off? No fancy scheduling required, mind you, just a small on/off switch on my stream deck (I don't want to use their smartphone app).</p>
<h2 id="heading-tldr">TL;DR</h2>
<p>You need to:</p>
<ol>
<li><p>rewire this IoT device into Tuya cloud via Tuya smart app to get the remote access capability</p>
</li>
<li><p>connect Home Assistant with Tuya smart app (not via the Tuya cloud API, that's enterprise-level pricing)</p>
</li>
<li><p>add Home Assistant control buttons to Stream Deck</p>
</li>
<li><p>profit!</p>
</li>
</ol>
<h2 id="heading-step-by-step">Step by step</h2>
<h3 id="heading-tuya-smart-app">Tuya smart app</h3>
<p>Gologi has a <a target="_blank" href="https://play.google.com/store/apps/details?id=com.app.gologi&amp;pli=1">smart app already that can control the humidifer</a>, but if you use that one you can't actually do anything in regards to the cloud-driven control.</p>
<p>You need to install another smartphone app: <a target="_blank" href="https://play.google.com/store/apps/details?id=com.tuya.smart">Tuya Smart App</a>. Then, you need to go through the "add the device" flow to be able to get control over it from the Tuya app.</p>
<p>Interestingly, even after I uninstalled the Gologi app I got the same interface on my smartphone to handle the Gologi humidifier from within Tuya 💣 . This brings me to the conclusion that Gologi is just one of the brands that built their IoT integration on the Tuya cloud, and that's why it was so easy to add it into the Tuya app.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1707730137418/0f9de54f-a731-44a6-86b0-5c773e7b4f5b.jpeg" alt class="image--center mx-auto" /></p>
<p>Now, do you see the "Stop diffuser" and "Start diffuser" buttons at the top? Those are there because I have added Tuya "Scenes" that trigger the On/Off switch of the diffuser. These will become important momentarily, since they allow the actions to be triggered from Home Assistant <em>even though HA doesn't support this type of Tuya device</em>, so you'd better add them:</p>
<ol>
<li><p>Open Tuya App</p>
</li>
<li><p>Go to "Scene"</p>
</li>
<li><p>Add a "Tap-to-run" action</p>
</li>
<li><p>Choose "Control Single Device" -&gt; Smart diffuser -&gt; "Switch" -&gt; "ON"</p>
</li>
<li><p>Do the same thing for "OFF" switch</p>
</li>
</ol>
<h3 id="heading-home-assistant-integration">Home Assistant integration</h3>
<p>You don't necessarily have to use HA, but I have found it is so ubiquitous these days that you are probably missing a lot if you don't have it in your home network <em>and still want to play around with IoT devices.</em> YMMV if you are using some other hub.</p>
<p>Within Home Assistant you just need to add Tuya integration. That's it.</p>
<blockquote>
<p>Important: If your HA is older than 2024.0 please upgrade, since earlier versions demanded a more complex setup using Tuya Cloud and developer accounts; newer versions integrate on a higher level without any need to go into the cloud - it just uses the Tuya Smart app QR code scanner to integrate, easy-peasy!</p>
</blockquote>
<p>The Scene section, after adding the Tuya integration, shows "2 entities" - those are the 2 scenes we've added in the Tuya Smart app!</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1707730604969/7c1d9800-a9b3-40cb-b417-e62a955e288f.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-stream-deck-setup">Stream Deck setup</h3>
<p>Stream Deck is amazing: even though it was meant for streamers (obviously), it's an extremely hack-friendly device and can execute anything you can think of. In this case I want to have a panel setup like the one shown on the screenshot below, with one ON and one OFF button.</p>
<p>Some important parts:</p>
<ol>
<li><p>you need to install official HA plugin for the stream deck</p>
</li>
<li><p>you need to provide a "long-lived access token" from HA and expose (if you haven't already) the home assistant websocket URL</p>
</li>
<li><p>both Keypad Appearance and the Keypad Action need to be selected and configured, otherwise button will do nothing</p>
</li>
</ol>
<blockquote>
<p>Note: in my case the HA is behind Tailscale, but since Stream Deck talks to the server software within your connected computer, and then that software talks to the HA, you do not need to expose HA to Stream Deck via a public API, you can just use internal IP</p>
</blockquote>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1707730845601/0a2d6ebf-9d5c-441c-b8f8-7d5f08416735.png" alt class="image--center mx-auto" /></p>
<p>And the "Short press" Keypad action is configured like this:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1707730823061/c4dc5588-7e97-4623-89d6-1e18ddd554cd.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-conclusion">Conclusion</h3>
<p>This was kind of a multi-day exercise to figure out what works and what doesn't. There are, like with any integration, many moving pieces, each expecting no breaking API changes: Tuya Cloud (a cloud service... I hope it remains stable), the smartphone app, the HA plugin, HA itself... but that's just a normal day in IT, I guess.</p>
<p>It's amazing how a little thing like a Stream Deck button helps remove context-switching tasks from your daily life. I'm very happy with my Stream Deck, I am continuously thinking "what else can I use it for", and I would definitely recommend it for your homelab / office setup since it is a very extensible and super useful tool.</p>
]]></content:encoded></item><item><title><![CDATA[When Nomad misses a (heart)beat]]></title><description><![CDATA[In my homelab I have a hybrid setup (nodes both in the cloud and in the basement), and I use Tailscale to bridge the physical gap in the network.
What I have noticed, though (actually, for a while already, just didn't bother to investigate) is the fo...]]></description><link>https://blog.aleksic.dev/when-nomad-misses-a-heartbeat</link><guid isPermaLink="true">https://blog.aleksic.dev/when-nomad-misses-a-heartbeat</guid><category><![CDATA[nomad]]></category><category><![CDATA[distributed system]]></category><dc:creator><![CDATA[Milan Aleksić]]></dc:creator><pubDate>Sun, 26 Mar 2023 16:58:53 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/Y9mWkERHYCU/upload/f47161627d899fa7d47ec9ed24d0c098.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In my homelab I have a hybrid setup (nodes both in the cloud and in the basement), and I use <em>Tailscale</em> to bridge the physical gap in the network.</p>
<p>What I have noticed, though (actually, for a while already, just didn't bother to investigate) is the following mystery:</p>
<ol>
<li><p>One node receives a lot of work to do (think: request for multi-platform Docker build via Gitea Actions)</p>
</li>
<li><p>Docker containers got restarted on that node en masse</p>
</li>
<li><p>Nomad restarts all the jobs and everything "just works" again</p>
</li>
</ol>
<p>Now, because of point 3) I never really had an incentive to find and fix the problem since Nomad just stabilizes the system rather quickly (1 min for example). This problem was occurring and reoccurring for <em>months</em>, but I didn't care much (pets vs cattle and all that).</p>
<p>What eventually turned me around is the fact that this problem occurred during my introduction of a new CI/CD platform (I am replacing Drone CI with Gitea Actions), and debugging long builds that <em>also fail because the Docker containers running those builds die</em> is not the best use of my free time.</p>
<p>Now, my assumption was that the node would just work with the basic Linux (Debian) setup without any system tinkering (Ansible sets up the ssh server, my user account with a bunch of dotfiles, etc, but that's all just normal customization everyone does). That was a standard, but a lousy assumption.</p>
<p>Down the rabbit hole, we go...</p>
<h2 id="heading-is-it-oom">Is it OOM?</h2>
<p>I've noticed in my <code>dmesg</code> output [1] that some Docker processes were taken down by the OOM killer, so I immediately added swap to the system.</p>
<p>It's not that hard to add a swap file to a Debian system, for example, Digital Ocean has nice and very readable articles that handle basic administration tasks for standard Linux packages, so they have <a target="_blank" href="https://www.digitalocean.com/community/tutorials/how-to-add-swap-space-on-ubuntu-20-04">an article for the swap as well</a>.</p>
<blockquote>
<p>Now, it is said that in the cloud age one shouldn't really depend on swap and the machine workload should just be stable enough to work out of RAM (because of various reasons), but we are talking about a small 16GB laptop [2] so it makes sense <em>for me</em> to still resort to the swap for those peak moments.</p>
</blockquote>
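<p>The gist of that swap setup, for the record (the 4G size is arbitrary - pick what fits your machine):</p>
<pre><code class="lang-bash">sudo fallocate -l 4G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
# make the swap file survive reboots
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
</code></pre>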
<p>This helped a bit and made Docker work much more stably. Containers still disappeared, though, so it definitely wasn't the root cause - it was just a reason for the failures to happen (even) more often.</p>
<h2 id="heading-is-it-the-docker-service">Is it the Docker service?</h2>
<p>Now, the next suspect was Docker Daemon itself.</p>
<p>I've noticed this strange message appearing over and over again in the logs:</p>
<pre><code class="lang-plaintext">Your kernel does not support memory swappiness capabilities or the cgroup is not mounted. Memory swappiness discarded
</code></pre>
<p>But this is a standard Debian Linux, why wouldn't that <code>cgroup</code> be enabled/mounted? Weird. But, diving into the Net, we find that Docker indeed says this is a normal thing - they even have <a target="_blank" href="https://docs.docker.com/engine/install/troubleshoot/#kernel-cgroup-swap-limit-capabilities">a part of the docs just for this specific message</a>.</p>
<p>Needless to say, their suggestion unfortunately didn't work for the current stable Debian (11). I had to research further and found this <a target="_blank" href="https://github.com/canonical/microk8s/issues/1691#issuecomment-977543458">great thread for microk8s</a> that exposed <a target="_blank" href="https://github.com/canonical/microk8s/issues/1691#issuecomment-977543458">a change in Debian</a>; later in that thread came the way to work around it until cgroups v2 are fully supported, and finally my hero, who had <a target="_blank" href="https://github.com/canonical/microk8s/issues/1691#issuecomment-1279774327">the same exact problem</a> - the solution presented itself and finally got rid of the message in the logs:</p>
<pre><code class="lang-ini"><span class="hljs-comment"># set in /etc/default/grub</span>
<span class="hljs-attr">GRUB_CMDLINE_LINUX</span>=<span class="hljs-string">"cgroup_enable=memory cgroup_memory=1 systemd.unified_cgroup_hierarchy=0"</span>
</code></pre>
<p>Then just do an <code>update-grub</code> and restart the laptop [3]. Of course, this removed the predominant error message, but (as you probably guessed) the containers continued to die under large pressure.</p>
<h2 id="heading-is-it-nomad">Is it Nomad?</h2>
<p>But then I figured out something (at that point, completely obvious): the containers that were restarted were <em>all</em> <em>Nomad jobs</em>. No other container running on that Docker daemon was ever restarted. 🤦</p>
<p>I refocused now on the Nomad setup: what could have gone wrong there?</p>
<p>Nomad has very complex machinery behind the job scheduling simplicity and they have great documentation.</p>
<p>What I had missed so far is the fact that, although no containers failed on their own, they were effectively killed by the scheduler if the Nomad agent on that node doesn't manage to communicate with the server via its heartbeat mechanism in time.</p>
<p>More research and another great GitHub issue thread, and here we go: the Nomad team lead simply says that <a target="_blank" href="https://github.com/hashicorp/nomad/issues/3289#issuecomment-332884234">there is a way to work around</a> this problem:</p>
<blockquote>
<blockquote>
<p>Just curious is there any way to increase heartbeat manually?</p>
</blockquote>
<p>There are a few heartbeat related settings on the server: <a target="_blank" href="https://www.nomadproject.io/docs/agent/configuration/server.html#heartbeat_grace">https://www.nomadproject.io/docs/agent/configuration/server.html#heartbeat_grace</a></p>
</blockquote>
<p>So, the solution for my network setup was to increase the heartbeat grace to something longer than the default (the default value of <code>10s</code> for <code>heartbeat_grace</code> in the <code>server</code> block of the server Nomad configuration was replaced with an <em>extremely</em> large <code>120s</code>, but I'm done with tiny hammers at this point).</p>
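<p>Concretely, that amounts to something along these lines in the Nomad server configuration (only <code>heartbeat_grace</code> is the point here; the rest is illustrative):</p>
<pre><code># server block of the Nomad server configuration
server {
  enabled         = true
  heartbeat_grace = "120s"
}
</code></pre>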
<p>Increasing the grace setting is probably the most straightforward way to give clients more time to recover in cases when the CPU is under very heavy load. In my case, I also assume that the heartbeat going over the Tailscale VPN, and the server being a rather old and weak Intel chip, <em>don't help</em>.</p>
<p>Many Gitea Actions later, with very heavy CPU &amp; memory workloads, Nomad still hasn't decided to shut any node down.</p>
<p>So far, so good: no more restarts noticed. Let's hope it stays stable! I definitely don't want to see this problem occurring ever again.</p>
<hr />
<ol>
<li><p>Or was it <code>journalctl -xn</code>? Not sure - this was a long time ago at this point. But there were definitely OOM strings in the logs, and I did have that problem</p>
</li>
<li><p>I know, I should replace it with an Intel NUC. I am just waiting for the damn thing to fail... it has been working well enough for 6 years already, and I don't like replacing stuff that just works</p>
</li>
<li><p>Or, in my case, use my Telegram bot that runs <a target="_blank" href="https://github.com/milanaleksic/laptop-booter/">laptop-booter</a> to go through Intel AMT for a power cycle and then Dropbear SSH port to run <code>cryptroot-unlock</code> for my full-disk-encryption setup (but you know, that's just me)</p>
</li>
</ol>
<hr />
]]></content:encoded></item><item><title><![CDATA[Short Note: Sync Cloudflare DNS targeting Caddy within Nomad]]></title><description><![CDATA[What
Let's say, for the sake of this short note, that you have:

Cloudflare DNS which you use to expose your service to the world (or to your internal tailscale/VPN/home network)
Caddy is your reverse proxy of choice
Nomad is your deployment system o...]]></description><link>https://blog.aleksic.dev/short-note-sync-cloudflare-dns-targeting-caddy-within-nomad</link><guid isPermaLink="true">https://blog.aleksic.dev/short-note-sync-cloudflare-dns-targeting-caddy-within-nomad</guid><category><![CDATA[nomad]]></category><category><![CDATA[Python]]></category><category><![CDATA[cloudflare]]></category><category><![CDATA[Caddy]]></category><dc:creator><![CDATA[Milan Aleksić]]></dc:creator><pubDate>Thu, 23 Jun 2022 06:38:05 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/unsplash/Oz_J_FXKvIs/upload/v1655969152366/3TalVqC0j.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-what">What</h2>
<p>Let's say, for the sake of this short note, that you have:</p>
<ul>
<li><strong>Cloudflare DNS</strong>, which you use to expose your services to the world (or to your internal tailscale/VPN/home network)</li>
<li><strong>Caddy</strong> as your reverse proxy of choice</li>
<li><strong>Nomad</strong> as your deployment system of choice</li>
</ul>
<p>Well, here is how I synchronize DNS records on Cloudflare based on service discovery in Nomad / Consul, using some shortcuts and a very small Python script...</p>
<p>The result of this process is that if I have a service named "chronograf", my DNS record <code>chronograf.mycooldomain.com</code> gets updated.</p>
<h2 id="heading-how">How</h2>
<p>First, the Nomad job definition:</p>
<pre><code>job <span class="hljs-string">"internal-proxy"</span> {
  datacenters = [<span class="hljs-string">"DC1"</span>]
  <span class="hljs-built_in">type</span>        = <span class="hljs-string">"service"</span>

  constraint {
    attribute = <span class="hljs-string">"<span class="hljs-variable">${attr.unique.hostname}</span>"</span>
    value     = <span class="hljs-string">"pluto"</span>
  }

  group <span class="hljs-string">"main"</span> {
    ephemeral_disk {
      migrate = <span class="hljs-literal">true</span>
      size    = 150
      sticky  = <span class="hljs-literal">true</span>
    }

    task <span class="hljs-string">"caddy"</span> {
        <span class="hljs-comment"># define the Caddy task. not relevant for this short note, </span>
        <span class="hljs-comment">#perhaps a nice topic for another one</span>
    }

    task <span class="hljs-string">"syncer"</span> {
      driver = <span class="hljs-string">"docker"</span>

      config {
        image   = <span class="hljs-string">"python:3.10.4-slim-bullseye"</span>
        volumes = [
          <span class="hljs-string">"local/:/etc/dns-sync"</span>
        ]
        <span class="hljs-built_in">command</span> = <span class="hljs-string">"python3"</span>
        args    = [<span class="hljs-string">"/etc/dns-sync/syncer.py"</span>]
      }

      env {
        ZONE_ID_FILE      = <span class="hljs-string">"/etc/dns-sync/cf_zone_id"</span>
        CF_API_TOKEN_FILE = <span class="hljs-string">"/etc/dns-sync/cf_api_key"</span>
        DNS_MAPPING_FILE  = <span class="hljs-string">"/etc/dns-sync/records.txt"</span>
      }

      template {
        data        = &lt;&lt;EOF
[[ fileContents <span class="hljs-string">"syncer.py"</span> ]]
EOF
        destination = <span class="hljs-string">"local/syncer.py"</span>
      }

      template {
        data          = <span class="hljs-string">"{{ key \"cloudFlare/zoneId\" }}"</span>
        destination   = <span class="hljs-string">"local/cf_zone_id"</span>
        change_mode   = <span class="hljs-string">"signal"</span>
        change_signal = <span class="hljs-string">"SIGUSR1"</span>
      }

      template {
        data          = <span class="hljs-string">"{{ key \"cloudFlare/cfApi\" }}"</span>
        destination   = <span class="hljs-string">"local/cf_api_key"</span>
        change_mode   = <span class="hljs-string">"signal"</span>
        change_signal = <span class="hljs-string">"SIGUSR1"</span>
      }

      template {
        data          = &lt;&lt;EOF
{{range <span class="hljs-variable">$tag</span>, <span class="hljs-variable">$services</span> := services | byTag}}{{ <span class="hljs-keyword">if</span> eq <span class="hljs-variable">$tag</span> <span class="hljs-string">"expose-internal"</span> }}{{range <span class="hljs-variable">$services</span>}}{{ .Name }}.{{ key <span class="hljs-string">"cloudFlare/domain"</span> }}|{{ with node <span class="hljs-string">"pluto"</span> }}{{ .Node.Address }}{{ end }}
{{end}}{{end}}{{end}}
EOF
        destination   = <span class="hljs-string">"local/records.txt"</span>
        change_mode   = <span class="hljs-string">"signal"</span>
        change_signal = <span class="hljs-string">"SIGUSR1"</span>
      }

      resources {
        cpu    = 100
        memory = 100
      }
    }
  }
}
</code></pre><p>Some assumptions I made in the job above:</p>
<ul>
<li>I am using Consul as the key/value store, and I rely on Consul templating to discover the services I am interested in</li>
<li>I only want to expose <a target="_blank" href="https://www.nomadproject.io/docs/job-specification/service#tags">services that are tagged</a> with <code>expose-internal</code></li>
<li>operationally, I "know" that this will be deployed only on a node named <code>pluto</code></li>
<li>my <em>personal Cloudflare API token</em> and the <em>zone ID</em> I plan on syncing are stored in the Consul K/V store</li>
<li>the domain under which I expose services is in the Consul K/V store as well</li>
<li>I construct files which Consul updates automatically when/if the K/V settings change OR a new service mapping becomes available (see the rendered example below)</li>
</ul>
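<p>To make that last point concrete: after Consul renders the template, <code>records.txt</code> is just a list of <code>fqdn|target</code> pairs, one per tagged service (the hostnames and address below are illustrative):</p>
<pre><code>chronograf.mycooldomain.com|192.168.1.20
grafana.mycooldomain.com|192.168.1.20
</code></pre>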
<p>Now, the script that does the Nomad -&gt; Cloudflare syncing, in pure Python 3 (standard library only):</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> argparse
<span class="hljs-keyword">import</span> json
<span class="hljs-keyword">import</span> logging
<span class="hljs-keyword">import</span> os
<span class="hljs-keyword">import</span> signal
<span class="hljs-keyword">import</span> sys
<span class="hljs-keyword">import</span> urllib.parse
<span class="hljs-keyword">import</span> urllib.request
<span class="hljs-keyword">from</span> typing <span class="hljs-keyword">import</span> Dict, List, Optional


<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">read_secret_multi</span>(<span class="hljs-params">secret_name_file: str</span>) -&gt; List[str]:</span>
    <span class="hljs-keyword">with</span> open(read_env_or_fail(secret_name_file), <span class="hljs-string">'r'</span>) <span class="hljs-keyword">as</span> env_file_file:
        <span class="hljs-keyword">return</span> [x.strip() <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> env_file_file.readlines()]


<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">read_secret</span>(<span class="hljs-params">secret_name_file: str</span>) -&gt; str:</span>
    <span class="hljs-keyword">with</span> open(read_env_or_fail(secret_name_file), <span class="hljs-string">'r'</span>) <span class="hljs-keyword">as</span> env_file_file:
        <span class="hljs-keyword">return</span> env_file_file.readline()


<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">read_env_or_fail</span>(<span class="hljs-params">secret_name_file</span>):</span>
    env_file = os.getenv(secret_name_file, <span class="hljs-literal">None</span>)
    <span class="hljs-keyword">if</span> env_file <span class="hljs-keyword">is</span> <span class="hljs-literal">None</span>:
        logging.error(<span class="hljs-string">f"Environment variable <span class="hljs-subst">{secret_name_file}</span> was not present, giving up"</span>)
        sys.exit(<span class="hljs-number">1</span>)
    <span class="hljs-keyword">return</span> env_file


<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Syncer</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self, zone_id: str, cf_api_token: str, dns_records: List[str]</span>):</span>
        self.zone_id = zone_id
        self.cf_api_token = cf_api_token
        self.mappings = {x.split(<span class="hljs-string">'|'</span>)[<span class="hljs-number">0</span>]: x.split(<span class="hljs-string">'|'</span>)[<span class="hljs-number">1</span>] <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> dns_records <span class="hljs-keyword">if</span> x}

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">cf_api</span>(<span class="hljs-params">self, path: str, method: Optional[str] = <span class="hljs-string">'GET'</span>, data: Optional[Dict] = None</span>):</span>
        req = urllib.request.Request(
            url=<span class="hljs-string">f'https://api.cloudflare.com/client/v4/<span class="hljs-subst">{path}</span>'</span>,
            headers={
                <span class="hljs-string">"Authorization"</span>: <span class="hljs-string">f"Bearer <span class="hljs-subst">{self.cf_api_token}</span>"</span>,
                <span class="hljs-string">"Content-Type"</span>: <span class="hljs-string">"application/json"</span>,
            },
            data=json.dumps(data).encode(<span class="hljs-string">'utf-8'</span>) <span class="hljs-keyword">if</span> data <span class="hljs-keyword">else</span> <span class="hljs-literal">None</span>,
            method=method,
        )
        <span class="hljs-keyword">with</span> urllib.request.urlopen(req) <span class="hljs-keyword">as</span> f:
            <span class="hljs-keyword">return</span> json.loads(f.read().decode(<span class="hljs-string">'utf-8'</span>))

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">sync</span>(<span class="hljs-params">self</span>):</span>
        logging.info(<span class="hljs-string">"Syncing records..."</span>)
        records = self.cf_api(<span class="hljs-string">f'zones/<span class="hljs-subst">{self.zone_id}</span>/dns_records'</span>)[<span class="hljs-string">'result'</span>]
        cf_records = {rec[<span class="hljs-string">'name'</span>]: rec <span class="hljs-keyword">for</span> rec <span class="hljs-keyword">in</span> records}
        <span class="hljs-keyword">for</span> our_mapping, target <span class="hljs-keyword">in</span> self.mappings.items():
            cf_mapping: Dict = cf_records.get(our_mapping, <span class="hljs-literal">None</span>)
            <span class="hljs-keyword">if</span> cf_mapping:
                <span class="hljs-keyword">if</span> cf_mapping[<span class="hljs-string">'content'</span>] == target <span class="hljs-keyword">and</span> cf_mapping[<span class="hljs-string">'type'</span>] == <span class="hljs-string">'A'</span>:
                    logging.info(<span class="hljs-string">f"Identical mapping on CF for: <span class="hljs-subst">{our_mapping}</span>, ignoring"</span>)
                <span class="hljs-keyword">else</span>:
                    logging.info(<span class="hljs-string">f"Updating existing mapping on CF for: <span class="hljs-subst">{our_mapping}</span> -&gt; <span class="hljs-subst">{target}</span>"</span>)
                    self.cf_api(path=<span class="hljs-string">f"zones/<span class="hljs-subst">{self.zone_id}</span>/dns_records/<span class="hljs-subst">{cf_mapping[<span class="hljs-string">'id'</span>]}</span>"</span>, method=<span class="hljs-string">'PATCH'</span>,
                                data=self.make_cf_record_dto(our_mapping, target))
            <span class="hljs-keyword">else</span>:
                logging.info(<span class="hljs-string">f"Adding new mapping on CF for: <span class="hljs-subst">{our_mapping}</span> -&gt; <span class="hljs-subst">{target}</span>"</span>)
                self.cf_api(path=<span class="hljs-string">f'zones/<span class="hljs-subst">{self.zone_id}</span>/dns_records'</span>, method=<span class="hljs-string">'POST'</span>,
                            data=self.make_cf_record_dto(our_mapping, target))

<span class="hljs-meta">    @staticmethod</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">make_cf_record_dto</span>(<span class="hljs-params">our_mapping, target</span>):</span>
        <span class="hljs-keyword">return</span> {
            <span class="hljs-string">"type"</span>: <span class="hljs-string">"A"</span>,
            <span class="hljs-string">"name"</span>: our_mapping,
            <span class="hljs-string">"content"</span>: target,
            <span class="hljs-string">"proxied"</span>: <span class="hljs-literal">False</span>
        }


<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">run</span>():</span>
    Syncer(
        zone_id=read_secret(<span class="hljs-string">"ZONE_ID_FILE"</span>),
        cf_api_token=read_secret(<span class="hljs-string">"CF_API_TOKEN_FILE"</span>),
        dns_records=read_secret_multi(<span class="hljs-string">"DNS_MAPPING_FILE"</span>),
    ).sync()


<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">'__main__'</span>:
    parser = argparse.ArgumentParser(prog=<span class="hljs-string">'syncer.py'</span>, description=<span class="hljs-string">"Automation script for Cloudflare record syncing"</span>)
    parser.add_argument(<span class="hljs-string">'--debug'</span>, default=<span class="hljs-literal">False</span>, required=<span class="hljs-literal">False</span>, action=<span class="hljs-string">'store_true'</span>, dest=<span class="hljs-string">"debug"</span>,
                        help=<span class="hljs-string">'debug flag'</span>)
    args = parser.parse_args()
    <span class="hljs-keyword">if</span> args.debug:
        logging.getLogger().setLevel(logging.DEBUG)
    <span class="hljs-keyword">else</span>:
        logging.getLogger().setLevel(logging.INFO)

    <span class="hljs-comment"># initial run</span>
    run()
    <span class="hljs-comment"># subsequent run</span>
    signal.signal(signal.SIGUSR1, <span class="hljs-keyword">lambda</span> sig, frame: run())
    <span class="hljs-comment"># exit gracefully</span>
    signal.signal(signal.SIGINT, <span class="hljs-keyword">lambda</span> sig, frame: sys.exit(<span class="hljs-number">0</span>))
    signal.signal(signal.SIGTERM, <span class="hljs-keyword">lambda</span> sig, frame: sys.exit(<span class="hljs-number">0</span>))
    <span class="hljs-keyword">while</span> <span class="hljs-literal">True</span>:
        logging.info(<span class="hljs-string">'Waiting for signals...'</span>)
        signal.pause()
</code></pre>
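<p>If you want to try the script outside of Nomad, a quick smoke test could look like this (the paths are illustrative; each file contains what the corresponding template above would render):</p>
<pre><code class="lang-bash">export ZONE_ID_FILE=/tmp/cf_zone_id \
       CF_API_TOKEN_FILE=/tmp/cf_api_key \
       DNS_MAPPING_FILE=/tmp/records.txt
python3 syncer.py --debug
# from another shell: force a re-sync, just like Nomad's change_signal does
kill -USR1 "$(pgrep -f syncer.py)"
</code></pre>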
]]></content:encoded></item><item><title><![CDATA[Using Ansible & Nomad for a homelab (part 2)]]></title><description><![CDATA[This is a continuation of my previous article "Using Ansible & Nomad for a homelab (part 1)" which you'd probably want to read first to follow up where I left off there.
Nomad
Nomad is a well-known workload orchestrator. I have decided to automate my...]]></description><link>https://blog.aleksic.dev/using-ansible-and-nomad-for-a-homelab-part-2</link><guid isPermaLink="true">https://blog.aleksic.dev/using-ansible-and-nomad-for-a-homelab-part-2</guid><category><![CDATA[automation]]></category><category><![CDATA[smart home]]></category><dc:creator><![CDATA[Milan Aleksić]]></dc:creator><pubDate>Thu, 17 Mar 2022 09:35:10 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/unsplash/lVZjvw-u9V8/upload/v1647420429831/71AtmIhsZ.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This is a continuation of my previous article "<a target="_blank" href="https://blog.aleksic.dev/using-ansible-and-nomad-for-a-homelab-part-1">Using Ansible &amp; Nomad for a homelab (part 1)</a>" which you'd probably want to read first to follow up where I left off there.</p>
<h1 id="heading-nomad">Nomad</h1>
<p><a target="_blank" href="https://www.nomadproject.io/">Nomad</a> is a well-known workload orchestrator. I have decided to automate my homelab cluster using it. I will through this blog post try to walk you through some discoveries I made on the way during the previous couple of months.</p>
<p>Features that drove me to Nomad:</p>
<ul>
<li>Conciseness,</li>
<li>An evolvable setup (constraints and static ports are there for simple setups, for example),</li>
<li>I already had knowledge of Kubernetes and wanted to try something else,</li>
<li>I had experience with Terraform and Consul, so I was confident Nomad would be a good choice to at least try out.</li>
</ul>
<p>I wanted to share with you how I configured a couple of different services, just so you can get a feeling for the freedom that Nomad gives. </p>
<p>For experienced DevOps people some of the choices will be <em>painful</em> because of the shortcuts taken - but that is exactly the point: what I will try to show is that Nomad is a perfect match for an ad-hoc homelab and that it allows you to <em>evolve</em> it into a more and more serious setup as your knowledge of the principles of workload orchestration in Nomad grows.</p>
<p>I chose to explain the principles by showing some code examples from my own homelab, going from the most trivial examples towards the more complicated ones and expanding the coverage of possibilities as we go. So, I tried to make a story out of it instead of just showing you the end result :) But, if you want to visualize the end result - here it is, the topology of the deployed homelab as of 16/03/2022:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1647421528226/zx3b1Q9WP.png" alt="The current state of the topology of my hybrid Nomad cluster as of 16/03/2022" /></p>
<p>So, without any further ado let's start...</p>
<h2 id="heading-example-a-cron-bash-script">Example: a cron bash script</h2>
<p>It can't get any simpler than this: you have some bash script that you want executed from time to time. In my case, I have some background processes I need to kill (a workaround, until I fix the Go app that creates those zombie processes... somewhere in this decade, I guess).</p>
<p>Nomad has a concept of a <code>sysbatch</code> job type that is basically a glorified cron executor. I combine it with the <code>raw_exec</code> driver, which is meant to run native OS code as directly as possible (no chroot, fast startup, barely any isolation - making it the least recommended of all the drivers).</p>
<p>When moving things from config management tools (like Chef/Ansible) into Nomad, that's probably a very nice thing to have. Later on, you'd probably want to lower the number of jobs defined this way and move towards a fully managed approach (which we will touch on shortly). Still, <em>the freedom</em> of just defining it like this and then, later on, evolving it into the proper approach is <em>the reason</em> why I chose Nomad over k8s. </p>
<pre><code>job <span class="hljs-string">"batler_cleanup_periodic"</span> {
  datacenters <span class="hljs-operator">=</span> [<span class="hljs-string">"KOEK1"</span>]
  <span class="hljs-keyword">type</span>        <span class="hljs-operator">=</span> <span class="hljs-string">"sysbatch"</span>

  periodic {
    cron             <span class="hljs-operator">=</span> <span class="hljs-string">"25 21 * * * *"</span>
    prohibit_overlap <span class="hljs-operator">=</span> <span class="hljs-literal">true</span>
    time_zone        <span class="hljs-operator">=</span> <span class="hljs-string">"UTC"</span>
  }

  constraint {
    attribute <span class="hljs-operator">=</span> <span class="hljs-string">"${attr.unique.hostname}"</span>
    value     <span class="hljs-operator">=</span> <span class="hljs-string">"pluto"</span>
  }

  group <span class="hljs-string">"main"</span> {
    task <span class="hljs-string">"script"</span> {
      driver <span class="hljs-operator">=</span> <span class="hljs-string">"raw_exec"</span>

      config {
        command  <span class="hljs-operator">=</span> <span class="hljs-string">"/usr/bin/bash"</span>
        args     <span class="hljs-operator">=</span> [<span class="hljs-string">"-c"</span>, <span class="hljs-string">"pkill -u batler_remoteexec || true"</span>]
      }

      resources {
        cpu    <span class="hljs-operator">=</span> <span class="hljs-number">500</span>
        <span class="hljs-keyword">memory</span> <span class="hljs-operator">=</span> <span class="hljs-number">128</span>
      }
    }
  }
}
</code></pre><p>You might have also noticed the <code>constraint</code> stanza, where I pin the script to the node where it is currently defined in Chef. This way I was able to evolve <em>node by node</em> from the old solution to the new one. The migration process was therefore:</p>
<ol>
<li>move all the stuff that can't be put into Nomad into Ansible (including the Nomad itself);</li>
<li>for all other things that <em>can be moved</em> to Nomad, move them into the simplest possible abstraction in Nomad (lowest possible hanging fruit being <code>raw_exec</code> task type);</li>
<li>remove chef-client, Ruby, git repo for chef-zero execution etc etc;</li>
<li>move to another node, repeat 1-3;</li>
<li>look how to improve migrated jobs into "better" / more professional implementations, learning as you go.</li>
</ol>
<h2 id="heading-example-excalidraw">Example: excalidraw</h2>
<p><a target="_blank" href="https://github.com/excalidraw/excalidraw/">Excalidraw</a> is a very nice drawing tool I like using from time to time. It's open source and even has a <a target="_blank" href="https://excalidraw.com/">free online version of it</a>. I thought it was awesome and just decided on deploying it myself in the homelab! It's just a simple completely stateless service with a docker container, it can't be easier than that.</p>
<p>Here it is:</p>
<pre><code><span class="hljs-attribute">job</span> <span class="hljs-string">"excalidraw"</span> {

  <span class="hljs-comment"># ... superfluous things, already presented previously, commented out</span>

  <span class="hljs-attribute">group</span> <span class="hljs-string">"main"</span> {
    <span class="hljs-section">network</span> {
      <span class="hljs-attribute">port</span> <span class="hljs-string">"http"</span> {
        <span class="hljs-attribute">static</span>       = <span class="hljs-number">2734</span>
        to           = <span class="hljs-number">80</span>
        host_network = <span class="hljs-string">"tailscale"</span>
      }
    }

    task <span class="hljs-string">"excalidraw"</span> {
      <span class="hljs-attribute">driver</span> = <span class="hljs-string">"docker"</span>

      config {
        <span class="hljs-comment"># SHA of what was latest on 04/11/2022</span>
        <span class="hljs-attribute">image</span>   = <span class="hljs-string">"excalidraw/excalidraw:sha-4bfc5bb"</span>
        ports = [
          <span class="hljs-string">"http"</span>
        ]
      }

      service {
        <span class="hljs-attribute">name</span> = <span class="hljs-string">"excalidraw"</span>
        port = <span class="hljs-string">"http"</span>

        check {
          <span class="hljs-attribute">type</span>     = <span class="hljs-string">"tcp"</span>
          port     = <span class="hljs-string">"http"</span>
          interval = <span class="hljs-string">"10s"</span>
          timeout  = <span class="hljs-string">"2s"</span>
        }
      }
    }
  }
}
</code></pre><p>What you can see here is the next step: how to deploy an online service in Nomad. Since it was previously deployed as a docker-compose service managed by systemd, it was only natural to use the <code>docker</code> driver and put it on the same machine.</p>
<p>You will notice that the <code>network</code> stanza hardcodes the port occupied by the service on the Tailscale network, defined inside the Nomad configuration on that node as:</p>
<pre><code>client {
  host_network "tailscale" {
    <span class="hljs-type">cidr</span> = "100.65.51.119/32"
    reserved_ports = "22"
  }
  // other non-relevant <span class="hljs-keyword">configuration</span> <span class="hljs-keyword">options</span>
}
</code></pre><p>This makes sure that the port will be exposed only on the Tailscale network. Obviously, the Tailscale agent is installed and fully operational at this point, but I used Ansible to set that part up <em>before</em> I even tried running any Nomad job on the machine.</p>
<p>The service is registered in Consul via the <code>service</code> stanza and is nicely exposed in the Consul listing:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1647423139496/T8DIzoO7-.png" alt="CleanShot 2022-03-16 at 10.31.32@2x.png" /></p>
<p>Consul and Nomad work together to make sure that the service is always online via Consul checks, and if anything goes bad (like an OOM kill in Docker because of over-provisioning, which may or may not have happened), the service will just be restarted as if nothing happened. This example does only a raw TCP connection test, but you should go the extra mile and add a full HTTP verification path that the application exposes (if possible, only locally) and which, behind the curtains, makes sure the application is running in a stable fashion.</p>
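<p>For illustration, such an HTTP check could look roughly like this, assuming the application exposes something like a <code>/health</code> endpoint (the path is application-specific and hypothetical here):</p>
<pre><code>check {
  type     = "http"
  port     = "http"
  path     = "/health"
  interval = "10s"
  timeout  = "2s"
}
</code></pre>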
<h2 id="heading-example-resilio">Example: Resilio</h2>
<p>The next example is my Resilio Sync job installation. Now, this one has its share of new concepts I had to understand and apply:</p>
<pre><code><span class="hljs-attribute">job</span> <span class="hljs-string">"resilio"</span> {

  <span class="hljs-comment"># ... superfluous things, already presented previously, commented out</span>

  <span class="hljs-attribute">group</span> <span class="hljs-string">"main"</span> {
    <span class="hljs-section">ephemeral_disk</span> {
      <span class="hljs-attribute">migrate</span> = <span class="hljs-literal">true</span>
      size    = <span class="hljs-number">150</span>
      sticky  = <span class="hljs-literal">true</span>
    }

    volume <span class="hljs-string">"btsync"</span> {
      <span class="hljs-attribute">type</span>      = <span class="hljs-string">"host"</span>
      read_only = <span class="hljs-literal">false</span>
      source    = <span class="hljs-string">"btsync"</span>
    }

    task <span class="hljs-string">"download"</span> {
      <span class="hljs-attribute">driver</span> = <span class="hljs-string">"raw_exec"</span>

      lifecycle {
        <span class="hljs-attribute">hook</span> = <span class="hljs-string">"prestart"</span>
        sidecar = <span class="hljs-literal">false</span>
      }

      artifact {
        <span class="hljs-attribute">source</span> = <span class="hljs-string">"https://download-cdn.resilio.com/[[ consulKey "</span>resilio/version<span class="hljs-string">" ]]/Debian/resilio-sync_[[ consulKey "</span>resilio/version<span class="hljs-string">" ]]-1_arm64.deb"</span>
      }

      config {
        <span class="hljs-attribute">command</span>  = <span class="hljs-string">"/usr/bin/bash"</span>
        args     = [<span class="hljs-string">"-c"</span>, <span class="hljs-string">"7z x -y local/resilio-sync_[[ consulKey "</span>resilio/version<span class="hljs-string">" ]]-1_arm64.deb &amp;&amp; tar xvf data.tar ./usr/bin/rslsync &amp;&amp; rm data.tar &amp;&amp; mv usr/bin/rslsync ../alloc/data/"</span>]
      }
    }

    task <span class="hljs-string">"main"</span> {
      <span class="hljs-attribute">driver</span> = <span class="hljs-string">"raw_exec"</span>

      config {
        <span class="hljs-attribute">command</span>  = <span class="hljs-string">"../alloc/data/rslsync"</span>
        args     = [<span class="hljs-string">"--nodaemon"</span>, <span class="hljs-string">"--config"</span>, <span class="hljs-string">"local/config.json"</span>]
      }

      template {
        <span class="hljs-attribute">data</span> = &lt;&lt;EOF
[[ fileContents <span class="hljs-string">"config.json.tpl"</span> ]]
        EOF
        destination = <span class="hljs-string">"local/config.json"</span>
      }
    }
  }
}
</code></pre><p>The job uses the <code>ephemeral_disk</code> stanza to try (best-effort, so don't count on it 100% of the time) to maintain filesystem state across job re-deployments. I have indeed learned not to depend on it, but to use the NFS share from my NAS for anything that actually needs to persist.</p>
<blockquote>
<p>Little digression, albeit an important one, now: I hope you have a NAS? 
I mean, what kind of homelab cluster do you think you have if you haven't bought (or, in case you are the adventurous type, self-built a Raspberry Pi-based system, soldered to your wall, or whatever else floats your boat) a <a target="_blank" href="https://en.wikipedia.org/wiki/Network-attached_storage">network-attached storage</a> device, aka NAS? 
I bought my ancient <em>Synology DS413j</em> many years ago and just extend it / buy new disks from time to time. Of course, it's ancient, which means I have a chroot Debian Wheezy (that's <a target="_blank" href="https://en.wikipedia.org/wiki/Debian_version_history#Debian_7_(Wheezy)">version 7</a>, folks; 2016 brought its last update), but it still does its job well enough that it doesn't deserve any kind of upgrade. Probably <em>my best buy ever</em> because it enables so many use cases for the homelab...</p>
</blockquote>
<p>Further, you can see how I leverage the <code>volume</code> stanza here to mount a host directory (backed by an Ansible-driven NFS share), defined like this inside the Nomad config file:</p>
<pre><code><span class="hljs-section">client</span> {
  <span class="hljs-attribute">host_volume</span> <span class="hljs-string">"btsync"</span> {
    <span class="hljs-attribute">path</span>      = <span class="hljs-string">"/mnt/btsync"</span>
    read_only = <span class="hljs-literal">false</span>
  }
}
</code></pre><p>This basically allows the Nomad job to access files stored remotely on a shared NAS drive. </p>
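<p>One detail worth noting: the group-level <code>volume</code> stanza only <em>declares</em> the volume. With <code>raw_exec</code> the process can reach the host path directly anyway, but once a job like this moves to the <code>docker</code> driver, the task needs an explicit <code>volume_mount</code>, roughly like this sketch (the destination path is illustrative):</p>
<pre><code>task "main" {
  driver = "docker"

  volume_mount {
    volume      = "btsync"
    destination = "/sync"  # path inside the container, pick your own
    read_only   = false
  }

  # ...
}
</code></pre>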
<blockquote>
<p>Off-topic, but perhaps relevant to complete the picture: if you'd like to know how this mount is set up in the Ansible part of my setup, it's basically a dumbed-down variant of <a target="_blank" href="https://github.com/openstack/ansible-role-systemd_mount/tree/00a542c7fccf56cb6314f04b949c62eede2f6f17">an open-source Ansible role from the OpenStack project</a>.</p>
</blockquote>
<p>Now, since this job <em>doesn't use Docker</em> (yet), I chose to utilize an <em>init container</em> pattern in the form of a Nomad <code>prestart</code> task that fetches the binary for this system's architecture from the official web site and stores it in <code>alloc/data</code> (backed by the <a target="_blank" href="https://www.nomadproject.io/docs/internals/filesystem">ephemeral disk</a>). Then, the main task starts the service and keeps it online.</p>
<blockquote>
<p>An astute reader might ask why I don't check whether an identical file was already downloaded, but I think I will just move to Docker later anyway.</p>
</blockquote>
<p>Finally, you might have noticed weird constructs in the job file: <code>[[ consulKey ... ]]</code>. What are <em>those</em> now? Well, I am using the <a target="_blank" href="https://github.com/hashicorp/levant"><code>levant</code></a> tool, which is a layer around the <code>nomad</code> CLI. I have found this tool very useful since it, at least:</p>
<ul>
<li>allows for a CI/CD setup (you externalize the parts you want pushed from CI/CD, like the image version of a Docker image),</li>
<li>allows templating parts of the job definition that would otherwise have to be hardcoded or embedded (like file contents - in case of files that span hundreds of lines, you really want them out of the job file).</li>
</ul>
<h3 id="heading-config-file-for-resilio-sync">Config file for Resilio sync</h3>
<p>Just to complete the picture, here is the attached <code>config.json.tpl</code> file, showing how I configured one of my Resilio Sync target directories:</p>
<pre><code>{
    <span class="hljs-attr">"device_name":</span> <span class="hljs-string">"<span class="hljs-template-variable">{{ env "attr.unique.hostname" }}</span>"</span>,
    <span class="hljs-attr">"use_gui":</span> <span class="hljs-literal">false</span>,
    <span class="hljs-attr">"log_size":</span> <span class="hljs-number">30</span>,
    <span class="hljs-attr">"listening_port":</span> {{ <span class="hljs-string">env</span> <span class="hljs-string">"NOMAD_PORT_http"</span> }},
    <span class="hljs-attr">"shared_folders":</span> [
      {
        <span class="hljs-attr">"dir":</span> <span class="hljs-string">"<span class="hljs-template-variable">{{ key "resilio/syncs/haumea/dir" }}</span>"</span>,
        <span class="hljs-attr">"overwrite_changes":</span> <span class="hljs-literal">false</span>,
        <span class="hljs-attr">"search_lan":</span> <span class="hljs-literal">true</span>,
        <span class="hljs-attr">"secret":</span> <span class="hljs-string">"<span class="hljs-template-variable">{{ key "resilio/syncs/haumea/secret" }}</span>"</span>,
        <span class="hljs-attr">"use_dht":</span> <span class="hljs-literal">false</span>,
        <span class="hljs-attr">"use_relay_server":</span> <span class="hljs-literal">true</span>,
        <span class="hljs-attr">"use_sync_trash":</span> <span class="hljs-literal">true</span>,
        <span class="hljs-attr">"use_tracker":</span> <span class="hljs-literal">true</span>
      }
    ],
    <span class="hljs-attr">"storage_path":</span> <span class="hljs-string">"../alloc/data"</span>
}
</code></pre><blockquote>
<p>I could've chosen the path of just embedding the template in the Nomad job, but my preference is to always lower the noise in the Nomad job if I can help it; therefore, <code>levant</code> provides a nice <a target="_blank" href="https://github.com/hashicorp/levant/blob/main/docs/templates.md#filecontents"><code>fileContents</code> template function</a>. </p>
</blockquote>
<p>There are no special surprises in that file, but you can see how I combined both:</p>
<ul>
<li>environmental settings (dynamically changed by Nomad when (re)deploying) using <code>env</code> template variables; and </li>
<li>Consul-driven keys, like the location and secret Resilio needs to handshake through its system with all the other clients (my phone, laptop, etc.) about the state of my files. </li>
</ul>
<blockquote>
<p>By the way, using Consul keys lets me easily trigger a redeployment of this Nomad job <em>just by changing the key in the Consul KV store</em>. Although, in this case, it makes little sense, since both the directory and the secret are there for "externalization from the template" reasons, not for security purposes.</p>
</blockquote>
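<p>Triggering such a redeployment from the CLI is a one-liner; for example, pointing the sync at a different directory (the value here is made up):</p>
<pre><code class="lang-bash">consul kv put resilio/syncs/haumea/dir /mnt/btsync/haumea
</code></pre>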
<h2 id="heading-example-thoughttrain">Example: ThoughtTrain</h2>
<p>So far, the Nomad job files were based on well-known off-the-shelf software packages. ThoughtTrain is my own OSS Go app <a target="_blank" href="https://github.com/milanaleksic/thoughttrain">hosted on Github</a>, totally unknown to the broader audience; in short, it's my own variant of "Read it Later".</p>
<p>Deployment is done using <code>levant</code> from <em>within the code repository of that project</em>. This is thus different from the previously mentioned jobs, which are part of the <em>homelab repo</em>. But if you are maintaining the source code of a project, it simply makes sense to keep the infra part inside that same repository. </p>
<p>This time I will not share the code of the Nomad job, since it's <a target="_blank" href="https://github.com/milanaleksic/thoughttrain/blob/f19e99b9413031053d01c1733a73a73918a465d5/.github/workflows/thoughttrain.nomad">open source &amp; inside that project</a>; I just wanted to share the mechanics of the deployment, built on the previous concepts.</p>
<p>The CI platform is a Github Actions <a target="_blank" href="https://github.com/milanaleksic/thoughttrain/blob/f19e99b9413031053d01c1733a73a73918a465d5/.github/workflows/build-workflow.yml">workflow</a>; the build is driven by <code>make</code>, so it might not be the easiest to follow, but effectively it all boils down to a simple concept: when I push a tag to GH, this command is executed.</p>
<pre><code>export VERSION<span class="hljs-operator">=</span><span class="hljs-operator">&lt;</span>TAG<span class="hljs-operator">&gt;</span> <span class="hljs-operator">&amp;</span><span class="hljs-operator">&amp;</span> ./levant deploy \
    <span class="hljs-operator">-</span>log<span class="hljs-operator">-</span>level<span class="hljs-operator">=</span>WARN \
    <span class="hljs-operator">-</span>consul<span class="hljs-operator">-</span><span class="hljs-keyword">address</span><span class="hljs-operator">=</span><span class="hljs-operator">&lt;</span>consul location<span class="hljs-operator">&gt;</span> 
    .github/workflows<span class="hljs-operator">/</span>thoughttrain.nomad
</code></pre><p>I use log level WARN so that the secrets read from Consul are not put in the output.</p>
<p>I put the "consul location" as a secret into my Consul server.</p>
<p>Of course, this can only work because GH Actions temporarily joins my homelab network as a node, using <a target="_blank" href="https://tailscale.com/kb/1111/ephemeral-nodes/">ephemeral tailscale nodes</a> in combination with the <a target="_blank" href="https://github.com/tailscale/github-action">Tailscale GH Action</a>.</p>
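<p>At the time of writing, the workflow step that joins the tailnet is tiny; a sketch of it, assuming an ephemeral, pre-authorized auth key is stored in the repository secrets (the secret name is my own):</p>
<pre><code class="lang-yaml">- name: Connect to the tailnet
  uses: tailscale/github-action@v1
  with:
    # ephemeral + pre-authorized key generated in the Tailscale admin console
    authkey: ${{ secrets.TAILSCALE_AUTHKEY }}
</code></pre>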
<h2 id="heading-example-internal-proxy">Example: internal-proxy</h2>
<p>Finally, the crown jewel of a homelab is a single centralized place where all your services get exposed under domain names. I have chosen <a target="_blank" href="https://caddyserver.com/">Caddy</a> since it's so easy to set up and has a very rich ecosystem supporting various flows: </p>
<ul>
<li>I prefer using Let's Encrypt (even for the internal homelab services) and Caddy just supports it out of the box!</li>
<li>Cloudflare is also supported, albeit with the need to build a custom ARM Docker image (details shortly)</li>
<li>all my markdown notes are just exposed as a readable web site.</li>
</ul>
<blockquote>
<p>I am actually so happy with what Caddy does for me, after 7-8 years of using nginx, that I even plan on writing a mini blog post in the future going through my favorite Caddy features</p>
</blockquote>
<p>Here is the Caddyfile template I use in the <code>internal-proxy</code> Nomad job, shortened a bit to remove redundancy and non-critical parts:</p>
<pre><code><span class="hljs-operator">*</span>.milanaleksic.net {
  encode gzip
  tls milanaleksic@gmail.com {
    dns cloudflare {{ key <span class="hljs-string">"cloudFlare/cfApiMilanaleksicNet"</span> }}
  }

  @chronograf host chronograf.milanaleksic.net
  reverse_proxy @chronograf {{range service <span class="hljs-string">"chronograf"</span>}} {{.Address}}:{{.Port}} {{end}}
}
</code></pre><p>Why do I need the Cloudflare integration? Well, Let's Encrypt uses a DNS verification process (the DNS-01 challenge) to verify that I am the domain owner, so a short handshake between the Let's Encrypt servers and my own DNS records is needed - and Caddy <em>does it all by itself</em>. This fact alone removed the need for the cron job, the Python script, <em>and</em> the more complex certificate-renewal setup I had to maintain for nginx back in the Chef days.</p>
<p>Additionally, please observe how I refer to the location of the <a target="_blank" href="https://www.influxdata.com/time-series-platform/chronograf/">Chronograf</a> service: I do not reserve the port statically in <code>chronograf.nomad</code>:</p>
<pre><code>job <span class="hljs-string">"chronograf"</span> {
  <span class="hljs-comment">// ...</span>
  group <span class="hljs-string">"main"</span> {
      port <span class="hljs-string">"http"</span> {
        to           <span class="hljs-operator">=</span> <span class="hljs-number">8888</span>
        host_network <span class="hljs-operator">=</span> <span class="hljs-string">"tailscale"</span>
      }
    <span class="hljs-comment">// ...</span>
    task <span class="hljs-string">"chronograf"</span> {
      <span class="hljs-comment">// ...</span>
      service {
        name <span class="hljs-operator">=</span> <span class="hljs-string">"chronograf"</span>
        port <span class="hljs-operator">=</span> <span class="hljs-string">"http"</span>

        check {
          <span class="hljs-keyword">type</span>     <span class="hljs-operator">=</span> <span class="hljs-string">"tcp"</span>
          port     <span class="hljs-operator">=</span> <span class="hljs-string">"http"</span>
          interval <span class="hljs-operator">=</span> <span class="hljs-string">"10s"</span>
          timeout  <span class="hljs-operator">=</span> <span class="hljs-string">"2s"</span>
        }
      }
    }
  }
}
</code></pre><p>and I just let Consul and Nomad negotiate which port should be taken. Each time <code>chronograf</code> gets (re)deployed, the port is chosen anew and the reverse proxy gets restarted. Easy-peasy, nothing for me to do there, just like in k8s.</p>
<p>Here is the job specification for <code>internal-proxy</code>:</p>
<pre><code>job <span class="hljs-string">"internal-proxy"</span> {

  <span class="hljs-comment"># ... superfluous things, already presented previously, commented out</span>

  group <span class="hljs-string">"main"</span> {
    task <span class="hljs-string">"caddy"</span> {
      driver <span class="hljs-operator">=</span> <span class="hljs-string">"docker"</span>

      config {
        image   <span class="hljs-operator">=</span> <span class="hljs-string">"milanaleksic/caddy-cloudflare:2.4.6"</span>
        volumes <span class="hljs-operator">=</span> [
          <span class="hljs-string">"../alloc/data/caddy-config:/config"</span>,
          <span class="hljs-string">"../alloc/data/caddy-data:/data"</span>,
          <span class="hljs-string">"local/Caddyfile:/etc/caddy/Caddyfile"</span>
        ]
        ports   <span class="hljs-operator">=</span> [<span class="hljs-string">"http"</span>, <span class="hljs-string">"https"</span>]
      }

      env {
        ACME_AGREE <span class="hljs-operator">=</span> <span class="hljs-string">"true"</span>
      }

      template {
        data <span class="hljs-operator">=</span> <span class="hljs-operator">&lt;</span><span class="hljs-operator">&lt;</span>EOF
[[ fileContents <span class="hljs-string">"Caddyfile.tpl"</span> ]]
        EOF
        destination <span class="hljs-operator">=</span> <span class="hljs-string">"local/Caddyfile"</span>
      }

      service {
        name <span class="hljs-operator">=</span> <span class="hljs-string">"internal-proxy-http"</span>
        port <span class="hljs-operator">=</span> <span class="hljs-string">"http"</span>

        check {
          <span class="hljs-keyword">type</span>         <span class="hljs-operator">=</span> <span class="hljs-string">"tcp"</span>
          port         <span class="hljs-operator">=</span> <span class="hljs-string">"80"</span>
          interval     <span class="hljs-operator">=</span> <span class="hljs-string">"10s"</span>
          timeout      <span class="hljs-operator">=</span> <span class="hljs-string">"2s"</span>
          address_mode <span class="hljs-operator">=</span> <span class="hljs-string">"driver"</span>
        }
      }

      service {
        name <span class="hljs-operator">=</span> <span class="hljs-string">"internal-proxy-https"</span>
        port <span class="hljs-operator">=</span> <span class="hljs-string">"https"</span>

        check {
          <span class="hljs-keyword">type</span>         <span class="hljs-operator">=</span> <span class="hljs-string">"tcp"</span>
          port         <span class="hljs-operator">=</span> <span class="hljs-string">"443"</span>
          interval     <span class="hljs-operator">=</span> <span class="hljs-string">"10s"</span>
          timeout      <span class="hljs-operator">=</span> <span class="hljs-string">"2s"</span>
          address_mode <span class="hljs-operator">=</span> <span class="hljs-string">"driver"</span>
        }
      }
    }
  }
}
</code></pre><p>Now, this job spec is interesting for a couple of reasons.</p>
<h3 id="heading-building-arm-docker-images">Building ARM Docker images</h3>
<p>This is a custom (public) Docker image built for the <code>aarch64</code> architecture on my MacBook Pro.</p>
<blockquote>
<p>by the way, you get to use the <code>aarch64</code> architecture not only on modern cloud ARM servers, but also if you install a 64-bit ARM OS on, for example, a Raspberry Pi 4.</p>
</blockquote>
<p>To build and push this image I utilized the Docker Desktop for Mac "buildx" feature. My build script for this image is basically this:</p>
<pre><code><span class="hljs-meta">#!/usr/bin/env bash</span>

<span class="hljs-comment"># latest on 10/02/2022</span>
<span class="hljs-built_in">export</span> CADDY_VERSION=2.4.6
<span class="hljs-comment"># This depends on Docker for Desktop (Mac), because that one supports multi-arch output</span>
<span class="hljs-comment"># Create that builder with "docker buildx create --name multiarch"</span>
<span class="hljs-comment"># Alternative: use DOCKER_HOST to point to a remote arm / x64 node</span>
docker buildx use multiarch
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  --build-arg CADDY_VERSION=<span class="hljs-variable">${CADDY_VERSION}</span> \
  -t milanaleksic/caddy-cloudflare:<span class="hljs-variable">${CADDY_VERSION}</span> \
  --push .
</code></pre><p>and my <code>Dockerfile</code> is:</p>
<pre><code>ARG CADDY_VERSION<span class="hljs-operator">=</span><span class="hljs-number">0</span><span class="hljs-number">.0</span><span class="hljs-number">.0</span>

FROM caddy:builder AS builder

RUN caddy<span class="hljs-operator">-</span>builder \
    github.com/caddy<span class="hljs-operator">-</span>dns<span class="hljs-operator">/</span>cloudflare

FROM caddy:${CADDY_VERSION}

COPY <span class="hljs-operator">-</span><span class="hljs-operator">-</span><span class="hljs-keyword">from</span><span class="hljs-operator">=</span>builder <span class="hljs-operator">/</span>usr<span class="hljs-operator">/</span>bin<span class="hljs-operator">/</span>caddy <span class="hljs-operator">/</span>usr<span class="hljs-operator">/</span>bin<span class="hljs-operator">/</span>caddy
</code></pre><p>I was previously very skeptical about using Docker containers on a Raspberry Pi. But these new 4s don't mind: things just work, even backed by a cheap SD card. Very nice. I plan on replacing all my binary and script jobs with ARM / x64 Docker images in the future. </p>
<p>But, as I have said many times above - the nice thing with Nomad is that <em>it allows you to start with the low-hanging fruit and slowly build a more professional setup</em>.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>We have reached the end of what I thought was reasonable enough to follow. I actually cut quite a few points along the road, but hopefully this blog post was enough to get you interested. I gained a brand new interest in my homelab after doing this migration. I no longer dread logging into the nodes just to discover that my Chef scripts got broken again after an OS upgrade.</p>
<p>All in all, I think this migration was a very good exercise in complexity, with a lot of things learned on the way. I would definitely recommend it to DevOps enthusiasts as an alternative to a fully-managed k8s cluster.</p>
<p>Actually, even fully acknowledging the likes of <a target="_blank" href="https://homeautomation.wiki/post/k3s-the-hard-way/">k3s</a> allowing a single-binary/SQLite setup, I still prefer the Nomad path just because of the simplicity and clarity of the job specification format compared to YAMLs. But that's just a personal preference; there is nothing stopping you from doing this entire exercise using <a target="_blank" href="https://saltproject.io/">SaltStack</a>+k8s if you prefer it. Just don't forget that there is perhaps a simpler approach that has all the benefits.</p>
<h3 id="heading-bad-sides-of-nomad">Bad sides of Nomad</h3>
<p>There are plenty. Nothing comes without bad sides; there is no perfection.</p>
<p>I, for one, really disliked the fact that I <strong>can't expose a job's task port statically</strong> on IP <code>0.0.0.0</code> (which would expose the port on all interfaces). It's impossible - try it (and if you find a way, please reach out!). It is a must for cloud deployments where you can't know the public IP of your node in advance. Nomad just defaults to the first network interface it encounters (or to specific ones, if you set it up like that). There are <a target="_blank" href="https://github.com/hashicorp/nomad/issues/646">some</a> <a target="_blank" href="https://github.com/hashicorp/nomad/issues/12106">issues</a> and <a target="_blank" href="https://discuss.hashicorp.com/t/how-to-transfer-ports-to-0-0-0-0/33340">discussions</a> but no definitive answer yet. You'd have to use a Consul ingress gateway, which I didn't have time to explore (yet). Currently, I just start another out-of-Nomad Caddy reverse proxy using Ansible that then pokes into the services deployed inside Nomad (sad 🐼).</p>
<p>Personally, I would also prefer that <code>levant</code> gets merged into the <code>nomad</code> binary. There just seems to be no need to externalize that functionality, much like <code>kustomize</code> got merged into <code>kubectl</code> as the subcommand <code>kubectl kustomize</code>. I know the HashiCorp folks know what they're doing, and I am aware of the concept of separation of concerns, but as an end-user <em>I'd just prefer not to need 2 binaries</em>.</p>
<h3 id="heading-further-work-ideas">Further work ideas</h3>
<p>Where will I go further from these examples? Well, I see at least these avenues of improvement:</p>
<ol>
<li><p>utilize Vault instead of Consul for secrets<ul>
<li>this is more for learning purposes than for security reasons, since we are still talking about just a homelab</li>
</ul>
</li>
<li><p>expose metrics from Nomad jobs in Grafana<ul>
<li>still not sure what the best way is here; I did notice at least one attempt using Prometheus, either directly or via a <a target="_blank" href="https://github.com/burdandrei/nomad-monitoring">Telegraf intermediary in front of InfluxDB</a>, which I already have in the cluster...</li>
</ul>
</li>
<li><p>expose Nomad jobs' logs in Loki<ul>
<li>currently I only push Ansible-managed apps and system logs into Loki</li>
</ul>
</li>
<li><p>try out all the interesting services now that workload management is in place: Redis, PlantUML, Grist, PiHole, a private Docker registry...<ul>
<li>this is now way less work than before since, while building all the above examples (and some more I haven't mentioned), I gathered enough know-how to add new things more quickly</li>
</ul>
</li>
<li>try out the Consul Connect service mesh</li>
</ol>
]]></content:encoded></item><item><title><![CDATA[Using Ansible & Nomad for a homelab (part 1)]]></title><description><![CDATA[Part 1 of 2
This article will be split into 2: 

one about the motivation for the migration away from Chef and how I did it using Ansible;
another one about Nomad, hopefully soon.

What?
I have a small cluster of computers which I have been maintaini...]]></description><link>https://blog.aleksic.dev/using-ansible-and-nomad-for-a-homelab-part-1</link><guid isPermaLink="true">https://blog.aleksic.dev/using-ansible-and-nomad-for-a-homelab-part-1</guid><category><![CDATA[ansible]]></category><category><![CDATA[Devops]]></category><dc:creator><![CDATA[Milan Aleksić]]></dc:creator><pubDate>Sat, 26 Feb 2022 09:59:27 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/unsplash/M5tzZtFCOfs/upload/v1645869756571/9koPM9U-e.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-part-1-of-2">Part 1 of 2</h2>
<p>This article will be split into 2: </p>
<ol>
<li>one about the motivation for the migration away from Chef and how I did it using Ansible;</li>
<li>another one about Nomad, hopefully soon.</li>
</ol>
<h2 id="heading-what">What?</h2>
<p>I have a small cluster of computers which I have been maintaining over the last 10 years. I've learned an amazing amount about Linux operations thanks to those first Raspberry Pis, but most of all, the biggest learning asset I acquired was an understanding of the <em>pain</em> of making stuff work <em>manually</em>.</p>
<p>Back then I decided to use Chef as my config management tool (I remember I considered Puppet as well). Don't get me wrong, Ruby is still alive and well (I wouldn't say it's exactly thriving, but there are surely millions of Ruby programmers) - but my engineering career path took me more in the direction of Java as the main language (with short but interesting excursions into Go and Rust) and Python for <em>everything else</em>.</p>
<p>So, the writing had been on the wall for a long time:  </p>
<ul>
<li>my Chef cookbooks had to be vendored and manually maintained, </li>
<li>weird Ruby concoctions had to be used more and more often because I hated maintaining the chef-repo,</li>
<li>it got harder and harder to match the target Chef version with the available system Ruby version (rvm helped there with an unnatural life extension), </li>
<li>I simply didn't have time to maintain my personal deployment framework BADUC.</li>
</ul>
<blockquote>
<p>It stood for "Bastion/Drone/Usher/Consul", where distributed Consul locks were used with node agents and an ingress token-driven service triggered deployments. BTW, don't do this. Don't waste time building something like that, unless you have an abundant amount of free time. It was a major mistake on my side, albeit a nice learning experience. I deleted it all with a laugh on my face - I had spent many hours maintaining it, knowing it would be thrown away at some moment anyway. Just use off-the-shelf platforms like Nomad or k8s.</p>
</blockquote>
<p>I waited for a long time, but I finally did it: the last remnants of the Chef code and setup have been removed and the cluster has moved to an Ansible foundation. It took me around 2 months (a couple of hours per week, as much as family life can provide, I guess).</p>
<p>The cluster is hybrid in every sense (unusual for companies but quite a normal thing for homelabs):</p>
<ul>
<li>part is in the cloud and part on my premises (basement and office);</li>
<li>there are ARM servers (arm5, aarch64, arm7) and there are the "normal" x64 machines;</li>
<li>some software is set up using Ansible, but I mostly try to target Nomad for new services;</li>
<li>some services are deployed from Github Actions and some from within the network using self-hosted Drone. </li>
</ul>
<h2 id="heading-the-path">The path</h2>
<p>Here is my attempt at explaining what I figured out to be the path:</p>
<ul>
<li>figure out a reasonable Linux distribution</li>
<li>define foundational aspects and deploy them using Ansible</li>
<li>everything else should work as Nomad job(s).</li>
</ul>
<h2 id="heading-linux-distribution">Linux distribution</h2>
<p>There is not a lot of thinking needed here - if you wish to mix ARM servers with x64... you just have to go with Debian (or its derivatives, like Ubuntu). </p>
<p>I know you can set up Ansible to work with any distribution, but for a homelab... it just makes no sense: just expect that <code>apt</code>, <code>systemd</code>, and friends are all there.</p>
<p>This also drastically relaxed the Ansible role expectations: my roles are mostly trivial because of this decision.</p>
<h2 id="heading-foundations">Foundations</h2>
<p>I considered the following things the innermost ring (or the <em>castle foundations</em>, if you like metaphors) of the setup:</p>
<ul>
<li>ability to access the SSH port via the default user provided by the distribution using Ansible</li>
<li>run Ansible roles</li>
<li><strong>seal</strong> the system (no public SSH ever again).</li>
</ul>
<p>Your cloud provider or your ISP might provide you with direct public IP access, which is nice. But it makes <em>no sense for a home lab</em> anymore in 2022, with Cloudflare Tunnel, Twingate, Tailscale, etc.</p>
<p>My cloud nodes are currently in the Oracle cloud because of their generous "always free" offering. But, that's just me being a cheap ass.</p>
<h3 id="heading-standard-roles">Standard roles</h3>
<p>How do you prepare a node for the seal? What steps are needed to make it happen? For me, the minimum list of things that must be present on any node in the home lab cluster is:</p>
<ul>
<li>add a new system user for Ansible;</li>
<li>check out dotfiles from internal Git (this step <em>might fail</em>, so it's important to allow for that in case the Git server itself is part of the cluster);</li>
<li>add some "default" packages (from <code>ncdu</code> to <code>tcpdump</code>; everyone has their favorites here, I guess);</li>
<li>change some system files, like the <code>motd</code> message (just to mark the territory and point out which node you are connected to after SSH login) or the <code>sshd</code> config (to block password login), plus all the other system configuration;</li>
<li>start the <strong>Tailscale</strong> agent and join my tailnet using the auto-join feature;</li>
<li>add <strong>promtail</strong> to push logs (Loki will be used as the centralized logging service);</li>
<li>use <strong>collectd</strong> for metric publishing (InfluxDB will be used for the metrics);</li>
<li>start <strong>Consul</strong> agent;</li>
<li>start <strong>Nomad</strong> agent.</li>
</ul>
<p>The actions listed above are implemented as Ansible roles that must run on every server; only once they are applied can a server be declared a "homelab cluster member node".</p>
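<p>To give a flavor of the "join and seal" step, here is a minimal sketch in shell, not my actual role: the auth key variable, the service names, and the firewall tool are placeholders for whatever your roles install.</p>
<pre><code class="lang-bash"># join the tailnet non-interactively using a pre-generated auth key
sudo tailscale up --authkey "${TS_AUTHKEY}"

# make sure the foundational agents survive reboots
sudo systemctl enable --now promtail collectd consul nomad

# "seal": once the node is reachable over the tailnet, close public SSH
sudo ufw deny 22/tcp
</code></pre>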
<h3 id="heading-monitoring">Monitoring</h3>
<p>You might ask yourself why I would consider monitoring foundational.</p>
<p>Well, for me as a backend engineer, observability is a paramount concern when building systems. You don't <em>guess</em> what the system is doing, you observe it.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1645870045384/ucEidfnkl.png" alt="CleanShot 2022-02-26 at 11.07.08@2x.png" /></p>
<p>Logging seems like the first thing to look at, for sure, but I find metrics far more interesting for a home lab. You get to see usage and problems first-hand: for example, situations where a node is over-committed and some swap is needed, or when network partitioning shows itself, and so on.</p>
<p>I was focused on lightweight processes and figured out that the smallest footprint comes from Go-driven apps, so I prefer the Grafana stack (Tempo, Loki, Grafana itself), with InfluxDB for metric storage using a 30-day retention period. I'm quite happy with it and would suggest this stack to everyone.</p>
<p>These Go services are deployed using Nomad, but the agents themselves are, for me, part of the foundational layer, since they should be available and running <em>even when the collector services are down</em>. Of course, if you want to throw money at the problem, you can avoid collectd/promtail and the collector services by using a managed offering, or just go for Datadog - that's an amazing monitoring suite.</p>
<p><strong>Collectd</strong> is something I chose simply because it can be installed on <em>any machine</em>, even my old armv5 Debian box. And InfluxDB supports it out of the box, without any need for additional converters or transformers. Easy on those Raspberry Pi or low-cost cloud node CPUs.</p>
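<p>For the curious, the collectd-to-InfluxDB wiring is tiny. A sketch, assuming InfluxDB's collectd listener is enabled on its default port 25826 (the hostname here is a placeholder):</p>
<pre><code class="lang-bash"># ship metrics over collectd's network plugin straight to InfluxDB
sudo tee -a /etc/collectd/collectd.conf &lt;&lt;'EOF'
LoadPlugin network
&lt;Plugin network&gt;
  Server "influxdb.internal.example" "25826"
&lt;/Plugin&gt;
EOF
sudo systemctl restart collectd
</code></pre>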
<h3 id="heading-tailscale">Tailscale</h3>
<p>The magic here is Tailscale - after the nodes join the tailnet, all of them can talk to each other, regardless of their physical location, NAT, or whatever else sits in front of them.</p>
<p>Nothing comes into the nodes unless it's via the Tailscale network.</p>
<p>If nodes talk to each other, it's via Tailscale.</p>
<p>If there are ingress (publicly open) ports, they are quite limited and exist only on specially marked and isolated nodes. SSH and other administration ports are never left open to the public.</p>
<p>Thank you Tailscale for such an amazing product!</p>
<blockquote>
<p>From the <a target="_blank" href="https://news.ycombinator.com/item?id=30490116">HackerNews geeks</a> I've learned about "Headscale", an OSS implementation of the Tailscale server (the only closed-source piece of the Tailscale setup). I will surely check it out at some point, but for now... Tailscale's free offering is more than generous enough for me. For now 😀</p>
</blockquote>
<h2 id="heading-consulnomad">Consul/Nomad</h2>
<p>I guess you might ask: why not K8s, right? Well, I know K8s well enough, having used it at one of my previous employers, and I'm OK with it. But I wanted something simple for my home lab, and Nomad was the clear winner because of its simplicity - a no-brainer. More about how I use it in the next post.</p>
<p>Introducing Nomad requires a Consul cluster up and running, with agents deployed across the servers.</p>
<p>I have known Consul for a long time and I consider it a nice building block for distributed computing:</p>
<ul>
<li>consistent quorum-driven Key/Value storage;</li>
<li>service discovery;</li>
<li>Connect service mesh (not using it yet).</li>
</ul>
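<p>A quick taste of the first two features from a node's shell (assuming a local Consul agent; the key and the service name are placeholders):</p>
<pre><code class="lang-bash"># quorum-backed key/value storage
consul kv put homelab/backup/retention-days 30
consul kv get homelab/backup/retention-days

# service discovery via Consul's DNS interface (the agent listens on port 8600)
dig @127.0.0.1 -p 8600 gitea.service.consul +short
</code></pre>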
<h2 id="heading-non-foundational-ansible-services">Non-foundational Ansible services</h2>
<p>Some services simply could not be migrated efficiently to Nomad jobs, so they stayed in the "foundational", Ansible-managed setup.</p>
<p>I could have forced them into Nomad, but I decided to cut my losses here - instead of targeting purity, I decided to have something running and come back later to rethink the approach, if the services themselves and/or Nomad evolve enough to remove the problems that blocked me from declaring these workloads as Nomad jobs.</p>
<h3 id="heading-gitea">Gitea</h3>
<p>This is a popular small-scale self-hosted Git server.</p>
<p>There were too many things that ended up being weird when I tried to put it into a Nomad job:</p>
<ul>
<li>the creators block the idea of the process running as the <code>root</code> user (they even left messages in the source code for the smart asses), which is <em>such a pain</em> if you have to use the <code>exec</code> / <code>raw_exec</code> Nomad drivers;</li>
<li>git hooks auto-created by Gitea hardcode the physical location of the binary (there is an admin action to update the location, but that was just a bit ridiculous - the drop that made the glass overflow - so I gave up).</li>
</ul>
<h3 id="heading-postgresql">PostgreSQL</h3>
<p>I prefer running the system-packaged version for the platform, although in theory I should be able to just run a Docker container. PostgreSQL is a database suitable for a wide range of use cases, from Raspberry Pis to huge RAM-rich sharded cloud clusters.</p>
<h3 id="heading-public-ingress-reverse-proxy">Public ingress reverse proxy</h3>
<p>This will be replaced by an official Nomad/Consul Envoy gateway at some point, but for now I had to timebox the effort and keep the cluster on a non-Connect setup.</p>
<p>I chose the Caddy proxy over Nginx (with which I have more operational experience) because it handles HTTPS certificate maintenance without any scripting on my part.</p>
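<p>This is where Caddy shines: the whole reverse proxy, certificates included, fits in a few lines. A minimal sketch assuming Caddy v2 (the domain and upstream port are placeholders):</p>
<pre><code class="lang-bash">sudo tee /etc/caddy/Caddyfile &lt;&lt;'EOF'
# Caddy obtains and renews the certificate by itself
git.example.dev {
    reverse_proxy 127.0.0.1:3000
}
EOF
sudo systemctl reload caddy
</code></pre>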
<h2 id="heading-to-be-continued">To be continued...</h2>
<p>In the next installment: some Nomad remarks and personal experiences, including both the good and the bad parts!</p>
]]></content:encoded></item><item><title><![CDATA[Remote laptop power management using Intel AMT and Dropbear (initramfs) SSH]]></title><description><![CDATA[I like having my own lightweight “datacenter” at home using ARM servers. It was (and still is) highly effective way of learning how to manage multiple servers using Linux: how to setup networking, security, use config management (I use Chef but today...]]></description><link>https://blog.aleksic.dev/remote-laptop-power-management-using-intel-amt-and-dropbear-initramfs-ssh-9b727fa6d58e</link><guid isPermaLink="true">https://blog.aleksic.dev/remote-laptop-power-management-using-intel-amt-and-dropbear-initramfs-ssh-9b727fa6d58e</guid><category><![CDATA[Homelab]]></category><dc:creator><![CDATA[Milan Aleksić]]></dc:creator><pubDate>Thu, 16 Aug 2018 07:11:48 GMT</pubDate><content:encoded><![CDATA[<p>I like having my own lightweight “datacenter” at home using ARM servers. It was (and still is) highly effective way of learning how to manage multiple servers using Linux: how to setup networking, security, use config management (I use Chef but today I would’ve chosen Ansible I guess), remote access patterns, installing and maintaining multiple applications, etc. But, when I got my Dell laptop from my previous job as a departure artifact I got into a small predicament: how to <em>safely, securely</em> and <em>efficiently</em> utilize it in my setup which until then only had cheap small ARM servers, not a 1K€ laptop which can be used for many more things (like running Jupyter notebooks, DroneCI docker containers etc).</p>
<blockquote>
<p>To be clear, in the context of this blog and my requirements, <strong>safe</strong> means fully disk encrypted laptop (if a thief takes it, data can’t be extracted), <strong>efficient</strong> means turning it off and on remotely, <strong>securely</strong> means that only I can do it, using my Yubikey, no matter where I am in the world.</p>
</blockquote>
<p>For quite some time I did it the simple way: the laptop was always ON (with the battery out, as some people on the Internetz suggested for fire safety reasons), but the feeling of doubt was omnipresent: it wastes electricity 99% of the time. I wanted the damn thing on only when I wanted it! In the cloud age things should run only when needed - surely a small laptop in my basement should be trivial!</p>
<p>I did notice that people were able to do a very cool thing: use something called initramfs/Dropbear to access a fully encrypted laptop over SSH. I also figured out that Intel AMT can turn a modern PC on and off remotely. So I googled, wrote some Go code, and decided to write this follow-up post connecting everything I figured out, so that the next person wanting the same thing can do it more simply.</p>
<h3 id="heading-manual-setup-stage">Manual setup stage</h3>
<p>First we need to go over what my setup consists of, so you can judge how applicable it is for you.</p>
<h4 id="heading-fde">FDE</h4>
<p>My laptop has <strong>full disk encryption</strong>, set up during the installation of Debian Stretch (9); derivatives like Ubuntu should also support FDE. AFAIK you can’t turn it on after installation, at least not the default <em>ecryptfs</em>. Set it up as you please, but at least one decryption method should be passphrase-driven, so that you can send the passphrase over SSH as a parameter to the decryption command. I will not explain how to do it, please google it. One hit I got, for example: <a target="_blank" href="https://xo.tc/setting-up-full-disk-encryption-on-debian-9-stretch.html">https://xo.tc/setting-up-full-disk-encryption-on-debian-9-stretch.html</a></p>
<h4 id="heading-intel-amt">Intel AMT</h4>
<p>I have <strong>Intel AMT</strong>, which I had to activate and set an “admin” password for. Depending on your PC vendor, AMT can come activated with an empty password, pre-set with a fixed one, etc. You should set it up and allow power management commands. This effectively means that ports 16992 and/or 16993 are active on an IP that you configure manually. On Linux, you can verify from another machine, using <strong>amttool</strong>, that AMT is correctly set up: you should be able to issue <em>powerup</em> and <em>powerdown</em> commands, and you can do it from any Linux machine. For example: <a target="_blank" href="http://jefflane.org/v2/technology/setting-up-intel-amt-to-act-as-a-remote-kvm-in-linux/">article 1</a> and <a target="_blank" href="https://linux.die.net/man/7/amt-howto">article 2</a></p>
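<p>To make that verification concrete, here is roughly what the amttool check looks like (the IP is a placeholder; amttool reads the password from the <code>AMT_PASSWORD</code> environment variable):</p>
<pre><code class="lang-bash">export AMT_PASSWORD='the-amt-admin-password'

amttool 192.168.1.42 info       # sanity check: query the machine’s power state
amttool 192.168.1.42 powerup    # turn the laptop on
amttool 192.168.1.42 powerdown  # ...and off again
</code></pre>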
<h4 id="heading-dropbear-ssh">Dropbear SSH</h4>
<p><strong>Dropbear</strong> is another very interesting piece of software: a minimalistic collection of basic Linux utilities <em>related to SSH</em>, all of which have minimal dependencies and (important!) can run in the initramfs, which is active after the initial Linux boot sequence completes and before the disk is decrypted.</p>
<blockquote>
<p>Dropbear allows an SSH connection to be established before your Linux distro is fully up! Don’t forget that no <strong>real</strong> drive mount points are available in Dropbear.</p>
</blockquote>
<p>On recent Debian/Ubuntu releases it is not that complicated to set it all up. There are many outdated (and thus needlessly complex) Dropbear initramfs SSH setup blogs to be found on the Internet; a correct and simple one is, for example: <a target="_blank" href="https://hamy.io/post/0005/remote-unlocking-of-luks-encrypted-root-in-ubuntu-debian/">https://hamy.io/post/0005/remote-unlocking-of-luks-encrypted-root-in-ubuntu-debian/</a></p>
<p>Effectively you need to (a minimal sketch follows the list):</p>
<ul>
<li>Install dropbear package for your distro</li>
<li>Add your public key (your home is not accessible, so another path on the system is used for dropbear authorized keys)</li>
<li>(Optional, but recommended) Change the port for SSH</li>
</ul>
<blockquote>
<p>You might need this to know which SSH service you are talking to: the real one (part of your Linux distro) or the Dropbear one running in the initramfs; both can’t be running at the same time</p>
</blockquote>
<ul>
<li>Update/rebuild initramfs</li>
</ul>
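<p>On a Stretch-era Debian, those steps boil down to something like this (a sketch; the key and port are placeholders, and config paths differ slightly between releases):</p>
<pre><code class="lang-bash">sudo apt install dropbear-initramfs

# your home isn’t mounted in the initramfs, so dropbear keeps its own authorized_keys
echo 'ssh-ed25519 AAAA... you@laptop' | \
  sudo tee -a /etc/dropbear-initramfs/authorized_keys

# optional but recommended: run the initramfs SSH on a non-default port
echo 'DROPBEAR_OPTIONS="-p 2222"' | sudo tee -a /etc/dropbear-initramfs/config

sudo update-initramfs -u
</code></pre>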
<h4 id="heading-bastion-host">Bastion host</h4>
<p>I don’t let my servers appear on the Internet, except one: the so-called “dmz” or “bastion” node. This node is used primarily for things like SSH tunneling and nginx proxying to other servers. I recommend having one, since it allows an easier setup (you can also use it to set up a VPN, either with OpenVPN or, more simply, with ngrok, as I explained <a target="_blank" href="https://medium.com/@milanaleksic/ngrok-vs-dynamic-dns-for-remote-linux-home-server-access-1486299502f2?source=linkShare-a5a10abd1eca-1534402393">in another article</a>).</p>
<h4 id="heading-ssh-agent">SSH agent</h4>
<p>I stopped using private keys in the form of files. I bought myself a Yubikey on the last Amazon Prime Day and I’m loving it. I went through the <a target="_blank" href="https://github.com/drduh/YubiKey-Guide">magnificent manual</a> to set up GPG and SSH using the Yubikey, and since then I’ve replaced my previous flow with a Yubikey-driven GPG agent (which can and does work correctly in SSH agent mode as well).</p>
<p>I hope you are using some sort of SSH agent, right? Even if you use private key files, you don’t type in your password every time you need the private key, right? And you would of course never dream of keeping an unencrypted private key lying around? <em>Right</em>?</p>
<h4 id="heading-sanity-check">Sanity check</h4>
<p>Now is the time to verify all is set up correctly (a combined sketch follows the list):</p>
<ul>
<li>Make sure you can connect to AMT via the amttool package from another host;</li>
<li>Make sure your passphrase is correct and that you can type it in and unlock the laptop after boot completes;</li>
<li>Make sure that, after booting and before you unlock the drive, you can connect to the Dropbear SSH port and run the decryption command <code>cryptroot-unlock</code>;</li>
<li>Now rinse &amp; repeat with an SSH tunnel through your bastion host.</li>
</ul>
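<p>Put together, the manual flow looks roughly like this (host names, port and the bastion user are placeholders; on Debian 9 and newer the initramfs ships a <code>cryptroot-unlock</code> helper):</p>
<pre><code class="lang-bash">export AMT_PASSWORD='the-amt-admin-password'
amttool laptop.lan powerup

# wait for the initramfs SSH to come up, then unlock through the bastion
ssh -J me@bastion.example.com -p 2222 root@laptop.lan cryptroot-unlock
# type the LUKS passphrase at the prompt; the real system boots afterwards
</code></pre>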
<h3 id="heading-automation">Automation</h3>
<p>Finally, you might think that all of this is fine and nice, but it takes a lot of steps to turn the machine on: you need to start it, connect remotely to the Dropbear SSH, issue the unlock procedure, etc. That’s correct - it’s tedious. That’s why I wrote <a target="_blank" href="https://github.com/milanaleksic/laptop-booter/blob/master/README.md">https://github.com/milanaleksic/laptop-booter/blob/master/README.md</a> to do all the steps above. I recommend setting up aliases to boot specific servers.</p>
<p>Many things can be improved here! You might for example wish for:</p>
<ul>
<li>support for only part of my flow: for example, avoiding the bastion SSH tunnel, or using a local file for the SSH private key, etc.;</li>
<li>env variables or config files instead of CLI args;</li>
<li>omit some warning messages or introduce logging levels.</li>
</ul>
<p>I can only say: PRs are welcome, and so are requirements in the form of GitHub issues (although, depending on priority, it might take some time for me to do something about them).</p>
]]></content:encoded></item><item><title><![CDATA[InformIT ebook deal of the day → pushbullet]]></title><description><![CDATA[In case you are like me and like reading, www.informit.com has a rather large library of ebooks, some of which they offer in daily deals.
These “daily deals” are pretty good thing because you can end up getting a classic like Fowler’s “Refactoring” o...]]></description><link>https://blog.aleksic.dev/informit-ebook-deal-of-the-day-pushbullet-5aaf7849e954</link><guid isPermaLink="true">https://blog.aleksic.dev/informit-ebook-deal-of-the-day-pushbullet-5aaf7849e954</guid><category><![CDATA[Bash]]></category><dc:creator><![CDATA[Milan Aleksić]]></dc:creator><pubDate>Sun, 11 Sep 2016 09:10:13 GMT</pubDate><content:encoded><![CDATA[<p>In case you are like me and like reading, <a target="_blank" href="http://www.informit.com">www.informit.com</a> has a rather large library of ebooks, some of which they offer in <em>daily deals</em>.</p>
<p>These “daily deals” are pretty good thing because you can end up getting a classic like Fowler’s <a target="_blank" href="https://www.amazon.com/Refactoring-Improving-Design-Existing-Code/dp/0201485672">“Refactoring”</a> or GoF’s <a target="_blank" href="https://www.amazon.com/Design-Patterns-Elements-Reusable-Object-Oriented/dp/B000SEIBB8/ref=sr_1_1?s=books&amp;ie=UTF8&amp;qid=1473582994&amp;sr=1-1&amp;keywords=gang+of+four+design+patterns">“Design Patterns”</a> with a 50% discount.</p>
<p>But the publisher wants you to visit their page every day. They don’t offer a newsletter, for example (and their Twitter is overloaded with corporate information, so it’s hard to spot only the tweets related to the deals of the day).</p>
<p>So, how can we automate getting the information? I’ve made a one-liner bash command that sends <a target="_blank" href="https://www.pushbullet.com">Pushbullet</a> notifications via cron. Let’s go step by step…</p>
<h3 id="heading-matching-the-link">Matching the link</h3>
<p>If you open the HTML source code of the page you can see that the link to the book is the first link that starts with:</p>
<pre><code class="lang-ini">&lt;a <span class="hljs-attr">href</span>=<span class="hljs-string">"/store</span>
</code></pre>
<p>So, the simplest way is to download the page and just extract the first link that matches this expectation. Of course, it will not work forever, but you can always come back and report it here; I might need to update this pattern…</p>
<p>The simplest way I found to extract the link is:</p>
<pre><code class="lang-bash">curl www.informit.com/deals/ 2&gt;/dev/null | \
  sed -n <span class="hljs-string">'s/.*href="".*/http:\/\/www.informit.com\1/p'</span> | \
  head -1
</code></pre>
<p>The script above will:</p>
<ol>
<li><p>download the page, hiding the progress bar</p>
</li>
<li><p>replace each line containing a store link with just the link itself (prefixed with the site’s domain)</p>
</li>
<li><p>show only the first match</p>
</li>
</ol>
<h3 id="heading-sending-the-link">Sending the link</h3>
<p>How do we now deliver the link? You could send an email, but that’s so 1990s! Let’s find a more modern approach!</p>
<p>I have the Pushbullet app on my phone and, interestingly enough, the developers keep it quite cheap (until you start sending <em>a lot of</em> notifications), so let’s use that!</p>
<pre><code class="lang-bash">| xargs -I {} curl \
  --header <span class="hljs-string">"Access-Token: <span class="hljs-variable">$PUSHBULLET_TOKEN</span>"</span> \
  --header <span class="hljs-string">'Content-Type: application/json'</span> \
  --data-binary <span class="hljs-string">'{"body":"Deal of the day is: {}","title":"InformIT deal of the day","type":"note"}'</span> \
  --request POST https://api.pushbullet.com/v2/pushes
</code></pre>
<p>The command is almost a literal copy&amp;paste from the official API documentation - you just need a Bash environment variable <code>PUSHBULLET_TOKEN</code> carrying the private <em>access token</em> provided on the page <a target="_blank" href="https://www.pushbullet.com/#settings">https://www.pushbullet.com/#settings</a>, and that’s it.</p>
<h4 id="heading-putting-it-in-a-script">Putting it in a script</h4>
<p>Everything done until now is just preparation for the last step: we need this check to run daily without us executing the command by hand (otherwise it would’ve been easier to just open the browser, right?).</p>
<p>Let’s save the command we built in the previous 2 steps into a file “<em>/home/informit_to_pushbullet.sh</em>” (with the executable bit turned on, of course).</p>
<pre><code class="lang-bash"><span class="hljs-meta">#!/usr/bin/env bash</span>
PUSHBULLET_TOKEN=<span class="hljs-string">""</span>

<span class="hljs-comment"># command we've built above&gt;</span>
</code></pre>
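<p>For completeness, this is what the assembled script looks like once the two snippets above are stitched together:</p>
<pre><code class="lang-bash">#!/usr/bin/env bash
PUSHBULLET_TOKEN=""

curl www.informit.com/deals/ 2&gt;/dev/null | \
  sed -n 's/.*href="\(\/store[^"]*\)".*/http:\/\/www.informit.com\1/p' | \
  head -1 | \
  xargs -I {} curl \
    --header "Access-Token: $PUSHBULLET_TOKEN" \
    --header 'Content-Type: application/json' \
    --data-binary '{"body":"Deal of the day is: {}","title":"InformIT deal of the day","type":"note"}' \
    --request POST https://api.pushbullet.com/v2/pushes
</code></pre>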
<p>I don’t really do it like this on my machines, though: I tend to extract all the important environment variables into a separate file I keep in Git, but in this article I’m keeping it simple.</p>
<h4 id="heading-using-cron-to-get-daily-notification">Using cron to get daily notification</h4>
<p>I will use a standard <em>crontab</em> for this. Depending on your setup you can do it differently, with <em>supervisord</em> or <em>systemd</em> to name a few options, but I still think crontab is the simplest and most generic way.</p>
<pre><code class="lang-bash">crontab -e
</code></pre>
<p>And, finally, add this line in the editor presented to you and it’s done!</p>
<pre><code class="lang-bash">5 10 * * * /home/informit_to_pushbullet.sh 2&gt;&amp;1 &gt; /tmp/pushbullet.log
</code></pre>
<p>Since I live in Brussels, I chose 10:05 AM for the notification, around 1 hour after the deal is published. You, of course, need to adapt the time to your time zone.</p>
<blockquote>
<p>Of course, while experimenting, you might want to replace “5 10” with “*/1 *” so the notification is sent every minute, until you’re sure everything works.</p>
</blockquote>
]]></content:encoded></item><item><title><![CDATA[Using AWS Lambda to verify site uptime]]></title><description><![CDATA[Recently I had to start looking into the AWS Lambda as it might become part of a portfolio of cloud services we shall start depending on.
As you might already know currently Lambdas can be written only using:

node.js (which I passionately hate);

Ja...]]></description><link>https://blog.aleksic.dev/using-aws-lambda-to-verify-uptime-5a459fc3bef6</link><guid isPermaLink="true">https://blog.aleksic.dev/using-aws-lambda-to-verify-uptime-5a459fc3bef6</guid><category><![CDATA[AWS]]></category><category><![CDATA[lambda]]></category><dc:creator><![CDATA[Milan Aleksić]]></dc:creator><pubDate>Sun, 10 Jul 2016 08:52:40 GMT</pubDate><content:encoded><![CDATA[<p>Recently I had to start looking into the AWS Lambda as it <em>might</em> become part of a portfolio of cloud services we shall start depending on.</p>
<p>As you might already know currently Lambdas can be written only using:</p>
<ul>
<li><p><strong>node.js</strong> (which I passionately hate);</p>
</li>
<li><p><strong>Java</strong> (boring: I’d need to package a jar);</p>
</li>
<li><p><strong>Python</strong> (which I chose not to learn);</p>
</li>
<li><p>no <em>native</em> Go support as of now (although at some point Amazon may react to <a target="_blank" href="https://forums.aws.amazon.com/message.jspa?messageID=677221">this forum post</a> requesting exactly that). Yes, I can “package my Go binary”, but that seems a rather non-native way of doing it.</p>
</li>
</ul>
<p>I wanted the simplest, quickest <a target="_blank" href="https://en.wikipedia.org/wiki/Proof_of_concept">POC</a>… so, node it is.</p>
<p>Having defined the hammer, the search for a perfect nail began :). I chose a very simple use case: <em>testing if my website is still up.</em></p>
<h4 id="heading-lambda-code">Lambda code</h4>
<p>This is what I came up with after a couple of hours.</p>
<p>This code is also copy/paste friendly - I didn’t use any fancy extra library like <a target="_blank" href="https://github.com/caolan/async">async</a>, <a target="_blank" href="https://www.npmjs.com/package/future">Future</a>, or whatever is considered cool by the hipster node developers, sorry about that (but these kinds of choices are exactly why I hate node that much).</p>
<p>What it does is:</p>
<ol>
<li><p>goes through the list of sites, verifies response is as expected</p>
</li>
<li><p>expects all sites to be https (this might not be suitable for your case - you might want to use <strong>require(‘http’)</strong> in place of <strong>https</strong> below)</p>
</li>
<li><p>sends an email if at least one failure is detected</p>
</li>
</ol>
<h4 id="heading-trigger">Trigger</h4>
<p>As a trigger, I chose “<em>CloudWatch Events — Schedule</em>”, which is a cron-like way of triggering Lambdas in AWS. You just set the rate to, for example, 15 minutes (1 minute is the minimum).</p>
<p>Maybe the coolest thing about it is that you get logging automatically, since all <em>CloudWatch</em> rule executions are logged. Cool, right?</p>
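<p>If you prefer the CLI to the console, the same schedule can be wired up in three commands. A sketch (the rule name, region, and account ID are placeholders):</p>
<pre><code class="lang-bash"># create the cron-like rule
aws events put-rule --name site-uptime-check \
  --schedule-expression 'rate(15 minutes)'

# allow CloudWatch Events to invoke the function
aws lambda add-permission --function-name site-uptime-check \
  --statement-id cw-schedule --action lambda:InvokeFunction \
  --principal events.amazonaws.com

# point the rule at the Lambda
aws events put-targets --rule site-uptime-check \
  --targets 'Id=1,Arn=arn:aws:lambda:us-east-1:123456789012:function:site-uptime-check'
</code></pre>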
<h4 id="heading-role">Role</h4>
<p>You might also need to add a suitable role: sending emails is of course not enabled by default, so I ended up writing a quick policy extension like this:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"Statement"</span>: [
    {
      <span class="hljs-attr">"Action"</span>: [
        <span class="hljs-string">"logs:CreateLogGroup"</span>,
        <span class="hljs-string">"logs:CreateLogStream"</span>,
        <span class="hljs-string">"logs:PutLogEvents"</span>
      ],
      <span class="hljs-attr">"Effect"</span>: <span class="hljs-string">"Allow"</span>,
      <span class="hljs-attr">"Resource"</span>: <span class="hljs-string">"arn:aws:logs:*:*:*"</span>
    },
    {
      <span class="hljs-attr">"Action"</span>: [
        <span class="hljs-string">"ses:SendEmail"</span>
      ],
      <span class="hljs-attr">"Effect"</span>: <span class="hljs-string">"Allow"</span>,
      <span class="hljs-attr">"Resource"</span>: <span class="hljs-string">"arn:aws:ses:us-east-1::identity/alarmemail@gmail.com"</span>
    }
  ],
  <span class="hljs-attr">"Version"</span>: <span class="hljs-string">"2012-10-17"</span>
}
</code></pre>
<h4 id="heading-billing">Billing</h4>
<p>It seems the price for Lambdas (well, at least for now) is so low that this kind of script basically ends up being executed for free.</p>
<p>Of course, it goes without saying that you should limit the <strong>execution time</strong> to something reasonable just to be sure, and not trigger it too often - but that’s it: I stay far, far below $1 per month, and that’s good!</p>
<h4 id="heading-profit-and-have-fun">Profit and have fun</h4>
<p>I hope Lambda stays as cheap as it is, since this is a very simple and inexpensive way of doing simple stuff like checking a web site!</p>
<p>I still think it sucks for complex use cases, but… that’s a completely different rant there!</p>
]]></content:encoded></item><item><title><![CDATA[Ngrok vs dynamic DNS for remote Linux home server access]]></title><description><![CDATA[Imagine you don’t want to expose all your Raspberry Pies fully on the Internet (probably you shouldn’t ever do that in fact) but still want to be able to reach them from outside of your home.
Imagine you want to access your NAS to schedule a download...]]></description><link>https://blog.aleksic.dev/ngrok-vs-dynamic-dns-for-remote-linux-home-server-access-1486299502f2</link><guid isPermaLink="true">https://blog.aleksic.dev/ngrok-vs-dynamic-dns-for-remote-linux-home-server-access-1486299502f2</guid><category><![CDATA[Homelab]]></category><dc:creator><![CDATA[Milan Aleksić]]></dc:creator><pubDate>Sat, 04 Jun 2016 08:11:04 GMT</pubDate><content:encoded><![CDATA[<p>Imagine you don’t want to expose all your Raspberry Pies fully on the Internet (probably you shouldn’t ever do that in fact) but still want to be able to reach them from outside of your home.</p>
<p>Imagine you want to access your NAS to schedule a download of a huge file you don’t want to download from office etc.</p>
<p>I do things like this on a daily basis…</p>
<p>Here I want to present 2 successful and simple ways I use to access network nodes remotely, and my thoughts on when one approach is better than the other.</p>
<h4 id="heading-level-1-use-noip-since-millions-already-do-that">Level 1: use noip since millions already do that</h4>
<p>I have a couple of RPi 1A, 1B and 3 boards (and a couple of Radxa Rock Pros and a Synology, but that’s a different story altogether). Among other things, I host my web site on one of them (<a target="_blank" href="https://www.milanaleksic.net">https://www.milanaleksic.net</a>) and I use noip (<a target="_blank" href="http://noip.com">http://www.noip.com</a>) to do it. The <em>normal way</em> is to expose a single node fully, or to set up your router to pass through ports to selected servers.</p>
<p>In my particular case, and since microcomputers like the RPi and the above-mentioned RRPs are relatively cheap, I use one server as a <em>partial</em> DMZ, which basically means only some ports are published. This allows me to do risky stuff on other nodes while I’m at home without endangering the rest.</p>
<p>Setting up something like this is not that difficult: you just search for a suitable client from the dynamic DNS provider you use, or use some 3rd-party app which supports your provider.</p>
<blockquote>
<p>I use, for example, <strong>inadyn</strong> (<a target="_blank" href="https://github.com/troglobit/inadyn">https://github.com/troglobit/inadyn</a>) since I found it to be easier to configure via Chef, which I use to set up all my home nodes and laptops.</p>
</blockquote>
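<p>For reference, a sketch of an inadyn configuration using the newer 2.x syntax (credentials and hostname are placeholders; the older 1.x series used a different format, so check your version’s docs):</p>
<pre><code class="lang-bash">sudo tee /etc/inadyn.conf &lt;&lt;'EOF'
period = 300

provider no-ip.com {
    username = myuser
    password = mypass
    hostname = myhost.ddns.net
}
EOF
</code></pre>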
<p>After the client starts updating the DNS with your host’s IP, you can quickly start accessing your home server’s services exposed on your router - if you are allowed to do that, of course, and only if you’ve set up your router correctly.</p>
<blockquote>
<p><strong>Caveat:</strong> check if your Internet provider allows you to access your home server remotely on certain ports. I’m in Belgium, and Telenet here doesn’t allow ports 80/443, but Belgacom does</p>
</blockquote>
<p>I used this method as a first choice since it was… logical and simple. Everyone is doing it, it can’t be wrong, right?</p>
<p>Dynamic DNS is perfect if you have something that constantly needs to be available on the Internet (like a personal web site, for example).</p>
<p>The only problem I see is that this approach works only for an Internet-facing node. That means tough luck if you are connected via WiFi in a hotel, or behind a corporate firewall, for example. I’m quite sure the provider/hotel will not give you a public IP, so putting the IP of your provider’s endpoint into your dynamic DNS record is… pointless.</p>
<p>Maybe time for Level 2?</p>
<h4 id="heading-level-2-ngrok-as-an-uber-tool-for-the-geeks">Level 2: ngrok as an über tool for the geeks</h4>
<p>I really like using ngrok (<a target="_blank" href="https://ngrok.com/">https://ngrok.com/</a>). In case you don’t know what it’s about: it’s like TeamViewer/LogMeIn, but for geeks: it lets you stop worrying about not having a stable public IP. It’s like a hand that takes your host’s port and publishes it on the Internet.</p>
<p>And that’s in fact all you need, right? An exposed port means you can expose SSH through it and then do everything else you need!</p>
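<p>In case you haven’t seen it in action: once the ngrok client is installed and your auth token is registered (via <code>ngrok authtoken</code>), publishing SSH is a single command, and ngrok prints the public endpoint it allocated (something like <code>0.tcp.ngrok.io:1xxxx</code>):</p>
<pre><code class="lang-bash">ngrok tcp 22
</code></pre>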
<blockquote>
<p>The following tips expect some level of knowledge of Linux</p>
</blockquote>
<p>Do you want to open a VPN tunnel through that port? Here is what you need to do:</p>
<pre><code class="lang-bash"><span class="hljs-meta">#!/bin/bash</span>

<span class="hljs-keyword">if</span> [ <span class="hljs-string">"<span class="hljs-variable">$NGROK_PORT</span>"</span> == <span class="hljs-string">""</span> ]; <span class="hljs-keyword">then</span>
NGROK_PORT=$(go run ngrok_port.go -email=<span class="hljs-variable">$NGROK_USERNAME</span> -password=<span class="hljs-variable">$NGROK_PASSWORD</span>)
<span class="hljs-keyword">fi</span>

sudo sshuttle --auto-hosts --dns --exclude <span class="hljs-variable">$SUBNET_HOME</span> \
-r <span class="hljs-variable">$USERNAME</span>@0.tcp.ngrok.io:<span class="hljs-variable">$NGROK_PORT</span>
-e <span class="hljs-string">'ssh -i /home/$USERNAME/.ssh/id_rsa -o \
UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no'</span> \
<span class="hljs-variable">$SUBNET_REMOTE</span>
</code></pre>
<p>This uses <em>sshuttle</em>, an application that should be available in your Linux distribution. It is probably <strong>not</strong> the fastest or most optimal option; you should use OpenVPN for that (plenty of blogs explain how), it just takes more time to set up and explain.</p>
<p><em>ngrok_port.go</em> is a script I made in Go (<a target="_blank" href="https://golang.org">https://golang.org</a>) which takes your ngrok username and password as parameters, logs you in to ngrok, and scrapes the tunnel endpoint exposed on their server.</p>
<blockquote>
<p>The script is as simple as it gets, which means that it doesn’t cover all possible cases: different server or different layout of the page and so on.</p>
</blockquote>
<p><a target="_blank" href="https://gist.github.com/milanaleksic/162c99697a08bb36fe514f66399913ec"><strong>Ngrok free mode always has only one single tunnel allowed. If you use TCP tunnel, this script…</strong><br />*Ngrok free mode always has only one single tunnel allowed. If you use TCP tunnel, this script extracts the value (since…*gist.github.com</a></p>
<p>So, these 2 scripts combined should let you quickly set up a remote VPN connection, after which you can do whatever you’d like on the remote network: connect via VNC, SSH to another host, access the sites available only in that remote network, etc. And all of this without support from system admins.</p>
<p>When you are done working with the remote host, you can simply SSH in and shut down the ngrok server, so you don’t need to worry that someone might try to access your computer:</p>
<pre><code class="lang-bash">ssh &lt;remote host&gt; 'killall ngrok'
</code></pre>
<blockquote>
<p><strong>Caveat</strong>: nothing perfect in this world is free: ngrok’s free plan allows only one port to be exposed - that’s why the sshuttle trick is so valuable: you get a lot through a single port</p>
</blockquote>
<h4 id="heading-is-there-level-3">Is there Level 3?</h4>
<p>Of course there is.</p>
<p>Although I haven’t found any problem not covered by the two solutions above, that doesn’t mean there is no room for improvement.</p>
<p>If no one can help you start an ngrok server remotely, you can of course use TeamViewer or LogMeIn for “quick remote access”, but that kind of defeats the entire approach, since you end up dragging in a full desktop solution just to start a server on a remote system.</p>
<p>I was thinking about a <em>pull daemon</em> which would check somewhere every 15 minutes whether ngrok needs to be activated on the remote host. That way you keep the ngrok server active only when you really need it. Where to pull from is tricky: it needs to be a safe HTTPS place, either an S3 file as the simplest solution or an online service; I haven’t decided which way I want to go. Probably left for another experiment when I get time🤓.</p>
<h3 id="heading-conclusion">Conclusion</h3>
<p>Noip or a similar dynamic DNS provider is a good choice when the host is publicly reachable but you have no idea what its IP is.</p>
<p>You should try ngrok for the other common case: quick access to a remote host behind a firewall or NAT.</p>
]]></content:encoded></item></channel></rss>