CakeLens v5, the AI-gen video detection model, is now open-sourced!

Today, I’m excited to announce that the CakeLens v5 AI-generated video detection model is now open-source! Why open-source, you might ask? Well, the CakeLens project serves multiple purposes for me. (For more background, see my earlier article: I built an AI-gen detection model for videos, here’s what I learned.) The most critical one was to teach myself how to build a machine learning-powered product from end to end, including data collection, labeling, model design, training, and inference. It has already achieved that goal.

Beyond personal learning, I hoped CakeLens could uncover some business value. However, the reality is that most users don’t care much whether the funny videos they see online are AI-generated or not. In an enterprise context, though, detecting AI-generated content is more critical. For example, ID verification services often require users to hold an ID and take a photo or short video. What happens if scammers generate realistic fake photos or videos in the future? What if job candidates alter their face or voice? These issues are already happening.

Example of face-swapped job interview video

Screenshot from a LinkedIn post (source) showing a job interview conducted via video call, where the candidate's face has been swapped using AI-generated imagery. This highlights real-world cases of AI-generated content being used for identity fraud in professional settings.

I believe this represents a new market category yet to be fully addressed. CakeLens is my initial step into exploring this space.

While the CakeLens v5 model for detecting AI-generated videos works reasonably well given its limited dataset (77% precision, 74% recall at a 50% threshold), its accuracy is still too low for enterprise use cases. I searched for similar open-source models but found none. I guess the cool kids are mostly focused on large language models (LLMs) right now. Although this isn’t a flashy moonshot project, I believe open-sourcing CakeLens v5 offers educational value to others. So, here we are!

...

Read the full article

Two AMD 7900XTX GPUs in a Tinygrad-Based Training Workstation with Peer-to-Peer PCIe Communication

I’ve been diving into machine learning projects lately, and I enjoy it a lot. However, one thing bothers me: I lack the computing power to test many interesting ideas. In my previous article for CakeLens, I designed and trained a model to detect AI-generated videos. But due to limited local computing power, I had to rent H100/A100 GPUs from Modal to experiment with different approaches. And it’s not cheap:

Screenshot of Modal's billing dashboard showing the total cost as $2.1K

My PC has an RTX 4090, so I could run training locally. However, memory constraints make it painful to train larger models. Even when a model fits on the GPU, the intensive computation consumes all GPU resources, rendering my PC unusable. To solve this, I need more local computing power to run machine learning experiments without breaking the bank.

My first idea was to buy another RTX 4090 or perhaps an RTX 5090. I checked prices online and was shocked. I bought my current 4090 for around $2,000 USD, but now they’re selling for $3,600 on Amazon. That’s insane! 🤯

Screenshot of Amazon product page, featuring Asus ROG Strix RTX 4090 GPU selling at $3,599.95

Curious about the cost of an H100 for my home office, I checked its price:

Screenshot of Amazon product page, featuring Nvidia Tesla H100 GPU selling at $25,249.95

Heh, you know what? I’m not planning to sell the two KDNYs I’ve accumulated just yet 😅.

I don’t have the budget for more Nvidia GPUs right now, but I still want a local setup to experiment at a lower cost. One day, while browsing X, I found a post by Tinygrad showcasing their gradient functions defined in just 40 lines of code.

An X post by @__tinygrad__ showcasing gradients for backprop defined in just 40 lines

I tried it, and it was impressive—no dependencies, just an instant install with uv.

Screenshot of installing tinygrad with uv shows it only takes 9ms

After researching further, I really liked Tinygrad’s concept. It’s like the RISC (Reduced Instruction Set Computer) of machine learning, while PyTorch feels more like CISC (Complex Instruction Set Computer). I appreciate its clean, minimalist design, and it seems to support AMD GPUs well.
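To give a sense of why reverse-mode gradients can fit in so few lines, here is a toy scalar autograd engine in that same minimalist spirit. This is not tinygrad’s actual code, just a sketch of the idea: each operation records how to push gradients back to its inputs, and `backward()` replays those rules in reverse topological order.

```python
class Value:
    """A scalar that tracks how it was computed, for reverse-mode autodiff."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward_fn = None  # propagates self.grad to parents

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad   # d(a+b)/da = 1
            other.grad += out.grad  # d(a+b)/db = 1
        out._backward_fn = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad  # d(a*b)/da = b
            other.grad += self.data * out.grad  # d(a*b)/db = a
        out._backward_fn = _backward
        return out

    def backward(self):
        # Topological sort so each node's grad is complete before it is used.
        topo, seen = [], set()
        def build(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    build(p)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            if v._backward_fn:
                v._backward_fn()

x = Value(3.0)
y = x * x + x  # dy/dx = 2x + 1 = 7 at x = 3
y.backward()
print(x.grad)  # → 7.0
```

Real frameworks do the same thing over tensors and fused kernels, but the core bookkeeping is no more complicated than this.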

This made me wonder: why does everyone say Nvidia GPUs are the go-to for machine learning? They claim Nvidia’s strength lies in its software. Hmm, is that true? 🤔 Or, as some might say, is it a skill issue? 🤣

I’m not sure, but I wanted to find out. I’m curious about Tinygrad’s pre-built AMD training workstation. It’s tempting, but it’s outside the budget I can allocate, and it’s too bulky for my home office.

Screenshot of Tinybox, a pre-built machine learning workstation by the tiny corp featuring AMD and Nvidia GPU options. The AMD option is selling at $15,000 USD

Looking at the GPUs they use, the AMD 7900XTX seemed mature. Best of all, the price was reasonable: just $1,100.

Screenshot of Amazon product page, featuring XFX AMD Radeon RX 7900XTX GPU selling at $1,099.54

I had a retired PC, so I quickly purchased two 7900XTX GPUs:

Two boxes of XFX AMD Radeon RX 7900XTX GPUs

I did my best with cable management:

Two XFX AMD Radeon RX 7900XTX GPUs in an open PC case with PCI power cables connected to them

It was time-consuming, but I tried my best 😅

...

Read the full article

I built an AI-gen video detection model and browser extension in a month

Have you ever wondered while browsing the internet whether the video or image you’re viewing is real or AI-generated? People say, “Seeing is believing,” but that’s less true in the AI era. Nowadays, generating photorealistic videos with audio is easier and cheaper than ever.

AI-generated video featuring a kangaroo denied boarding on a flight as an emotional support animal, posted by @gdb on X

If you’ve followed trends on X, you may have noticed many users liked and reposted a video of a kangaroo being denied boarding on an airplane as an emotional support animal, ticket in paw. As adorable as it was, the video was AI-generated. The ticket’s text is gibberish. I call it “AI fonts.” I’m no linguistics expert, but the verbal exchange also felt off.

I’ve faced the same issue. While browsing X, I’ve retweeted content, only to later realize it was AI-generated, which was embarrassing. I wished for an easy-to-use tool to distinguish AI-generated content from real content. I tested several online tools claiming to detect AI-generated content, but none worked as expected. So, I spent the past month training a model and building a browser extension focused on detecting AI-generated videos on X. I named it CakeLens, inspired by the viral “Is it a cake?” videos. Instead of identifying cakes, it detects AI-generated content. I chose the name because I wanted it to be as easy as “a piece of cake” to use.

CakeLens is now available on the Chrome Web Store. You need to sign up for an account at CakeLens.ai to use it. Once set up, a button appears in the upper-right corner of videos on X when you hover over them. Click it to submit the video for AI-generated content detection.

Screenshot of X.com showing the CakeLens button on the upper-right corner when hovering on a video

View the detection results on the submissions page of your CakeLens account.

Screenshot of the submission page of CakeLens

The latest version of my model achieves 77% precision and 74% recall on the validation dataset at a 50% decision threshold.
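For readers less familiar with these metrics: the model outputs a score per video, everything above the threshold is flagged as AI-generated, and precision/recall follow from the resulting confusion counts. A minimal sketch (the scores and labels below are made-up toy data, not CakeLens’ validation set):

```python
def precision_recall(scores, labels, threshold=0.5):
    """Compute precision and recall for scores binarized at a threshold.

    labels: True means the sample really is AI-generated.
    """
    preds = [s >= threshold for s in scores]
    tp = sum(p and l for p, l in zip(preds, labels))          # correctly flagged
    fp = sum(p and not l for p, l in zip(preds, labels))      # real video flagged
    fn = sum((not p) and l for p, l in zip(preds, labels))    # AI video missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy data for illustration only.
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.7]
labels = [True, True, False, True, False, True]
print(precision_recall(scores, labels))  # → (0.75, 0.75)
```

Raising the threshold generally trades recall for precision, which is why a PR curve (like the TensorBoard one below) is more informative than a single operating point.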

Screenshot of TensorBoard PR curve for CakeLens' latest model

I’ve learned a lot from this project. Today, I’m sharing what I’ve learned from building this pet project!

...

Read the full article

Nvidia GPU on bare metal NixOS Kubernetes cluster explained

Since publishing the second MAZE (Massive Argumented Zonal Environments) article, I’ve realized that while the framework is maturing, I need a way to run it at scale. In the past, I built a bare-metal Kubernetes cluster on three mini PC nodes interconnected with USB4. I also have a retired workstation PC with an Nvidia GeForce RTX 2080 Ti, so I wondered: why not add this PC to the three-node Kubernetes cluster and configure Kubernetes to make CUDA work? Once one node is configured, extending the cluster’s computing capacity with more GPUs will be easier.

Screenshot of btop of my four-node Kubernetes cluster, with three mini PCs connected via USB4 and the newly added retired workstation sporting an Nvidia GeForce RTX 2080 Ti.

With that in mind, I spent a week on this side quest. It turned out to be way harder than I expected. I learned a lot digging into this rabbit hole, and I ran into other holes along the way. Getting the Nvidia device plugin up and running in Kubernetes is hard, and running it on top of NixOS makes it even more challenging. Regardless, I finally got it to work! Seeing it run for the first time was a very joyful moment.

Screenshot of the Kubernetes dashboard showing the logs of a MAZE pod running an experiment on a CUDA device

An article like this could be helpful, as more software engineers are getting into machine learning. Running everything in the cloud is costly, and there are privacy concerns for personal use, so an Nvidia GPU on a local bare-metal Kubernetes cluster is still a very tempting option. Here, I’d like to share my experience setting up a bare-metal Kubernetes cluster with the Nvidia GPU CDI plugin enabled on NixOS.
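Once the device plugin is working, the payoff is that a pod can request a GPU like any other resource. A minimal sketch of such a pod spec (the pod name and container image are placeholders, not from my cluster config):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-smoke-test   # placeholder name
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04  # example CUDA base image
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1  # satisfied by the Nvidia device plugin
```

If `kubectl logs` on this pod prints the `nvidia-smi` table, the whole chain (driver, container runtime, device plugin) is wired up correctly.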

...

Read the full article