Nvidia GPU on bare metal NixOS Kubernetes cluster explained
Since publishing the second MAZE (Massive Argumented Zonal Environments) article, I have realized that the framework is maturing, but I need a way to run it at a larger scale. In the past, I built a bare-metal Kubernetes cluster on three mini PCs interconnected with USB4. I also have a retired workstation PC with an NVIDIA GeForce RTX 2080 Ti. I wondered: why not add this PC to the three-node Kubernetes cluster and configure Kubernetes to make CUDA work? Once I have one GPU node configured, extending the computing capacity with more GPUs will be easier.

Screenshot of btop of my four-node Kubernetes cluster, with three mini PCs connected via USB4 and the newly added retired workstation sporting an Nvidia GeForce RTX 2080 Ti.
With that in mind, I spent a week on this side quest. It turned out to be way harder than I expected. I learned a lot from going down this rabbit hole, and I ran into other holes along the way. Getting the Nvidia device plugin up and running in Kubernetes is hard, and running on top of NixOS makes it even more challenging. Regardless, I finally managed to get it to work! Seeing it run for the first time was a very joyful moment.

Screenshot of the Kubernetes dashboard shows the logs of a MAZE pod running an experiment on a CUDA device
An article like this could be helpful to someone, as more software engineers are getting into machine learning. Running everything in the cloud is costly. There are also privacy concerns for personal use, so an Nvidia GPU on a local bare-metal Kubernetes cluster is still a very tempting option. Here, I would like to share my experience setting up a bare-metal Kubernetes cluster with the Nvidia GPU CDI plugin enabled in NixOS.
...
Read the full article
MAZE - My AI models are finally evolving!
This is the second article in the MAZE machine learning series. To learn more about what MAZE (Massive Argumented Zonal Environments) is, you can read the other articles here:
More than a week has passed since I published my previous MAZE article. I spent most of my spare time outside of work on this project, and I have already made significant progress. I can’t believe what I’ve done on my own in the past week. I am very proud to have achieved this much in such a short time. Today, I want to share the latest update on the MAZE project.
MAZE Web app
First, as promised in the previous article, I wanted to build a UI that makes research much easier and to publish the data so that anybody interested in machine learning can view those neural networks. People can learn a new trick or two, or even start new research based on the latest effective patterns found through MAZE. And yes, I delivered! Here are some screenshots from the web app:

Screenshot of MAZE web app experiment page

Screenshot of MAZE web app agent page
I published the web app here, and you can visit it yourself to find out:
As you can see, the domain name has article01 as a prefix because I will move at light speed in improving it by making lots of changes, and I don’t want to maintain backward compatibility. So, this website will be a snapshot of my initial test rounds.
During the development of MAZE, I realized it’s tough to learn how a model works by reading its gene code, so I made it possible to view each model’s PyTorch module DAG (directed acyclic graph) directly. Here are some really interesting examples, like this complex one with many branches:

DAG diagram of a neuron network with linear and maxpool at the beginning then connect to many branches at the bottom with a few modules before the output node
I will show you more interesting examples in the following sections.
...
Read the full article
MAZE - How I would build AGI
Update: This is the first article in the MAZE machine learning series. You can read other articles here:
I can’t believe I am publishing this. I’ve been thinking about artificial general intelligence and how to build it for a very long time. Call me crazy if you want, but I have an idea for building it differently from the mainstream approach. Instead of hammering away with backpropagation on cleverly handcrafted networks, I want to build a system that can produce arbitrary neural networks by evolving and mutating genes in a series of controlled environments. Of course, every cool approach deserves an awesome acronym. That’s why I named it MAZE (Massive Argumented Zonal Environments) 😎:

MAZE stands for Massive Argumented Zonal Environments
I spent last week building a prototype that can generate random neural networks based on a gene sequence. Shockingly, I haven’t even implemented offspring generation and gene mutation yet, but some of the randomly generated networks I saw during development have already performed better than the ones I crafted manually while learning.
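To make the idea of decoding a gene sequence into a network concrete, here is a minimal sketch in plain Python. It is not MAZE's actual encoding: the opcode names, the `(opcode, argument)` gene format, the `Layer` record, and the input and output sizes are all assumptions made up for illustration. It only shows how a random sequence can be turned into a chain of layer specs whose shapes line up.

```python
import random
from dataclasses import dataclass

# Hypothetical gene alphabet, for illustration only (not MAZE's real encoding).
OPCODES = ("linear", "relu", "tanh")


@dataclass(frozen=True)
class Layer:
    kind: str
    in_features: int
    out_features: int


def random_genes(length, rng):
    """Draw a random gene sequence of (opcode, argument) pairs."""
    return [(rng.choice(OPCODES), rng.randint(1, 64)) for _ in range(length)]


def decode(genes, input_size, output_size):
    """Turn a gene sequence into a chain of layer specs.

    A 'linear' gene changes the feature width to its argument; activation
    genes keep the width and ignore their argument. A final linear layer
    maps to output_size so every gene sequence yields a usable network.
    """
    layers = []
    width = input_size
    for opcode, arg in genes:
        if opcode == "linear":
            layers.append(Layer("linear", width, arg))
            width = arg
        else:
            layers.append(Layer(opcode, width, width))
    layers.append(Layer("linear", width, output_size))
    return layers


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    # 784 inputs and 10 outputs are assumed sizes, chosen only as an example.
    for layer in decode(random_genes(5, rng), input_size=784, output_size=10):
        print(layer)
```

Each `Layer` spec could then be mapped to a real module in a framework such as PyTorch. A sequential chain like this cannot express the branching networks a DAG allows, so treat it as the simplest possible starting point.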
...
Read the full article
ESP32 Tesla dashcam remote USB project in Rust failed. Here's what I've learned
As engineers, we all celebrate successful projects. But what about those that failed? Should we sweep them under the rug and pretend they never happened? There’s nothing wrong with a failed project. It is nothing to be ashamed of; it is just a normal part of the path to success. As long as we learn from our mistakes, it’s a positive result. Today, I would like to share the story of my recent failed pet project: an ESP32-based Tesla dashcam video remote access system written in Rust.
The needs
One great thing about being a Tesla driver is that the dashcam system keeps recording, so if anything happens, you have evidence to prove it. However, the video clips are stored on a USB drive. When you get home and want to pull out the clips, you need to unplug the USB drive, plug it into your computer, and copy the files you want. I have always wished for a USB thumb drive that connects to my home Wi-Fi so that I could grab video files over the network without the manual process. A meme comes to mind that explains why you need a system like that.

A crying woman claimed: You hit my car and run 3 years ago. A guy replied: No, I didn't, here's the evidence
...
Read the full article