WildSight AI is an edge-powered camera system designed to detect wildlife, track animal movement, and alert rangers in real time when potentially dangerous human-animal encounters occur. The system is powered by the AMD Kria KR260 board, running AI models locally to ensure fast, reliable operation in remote and off-grid environments.
This project builds upon my earlier EagleEye AI smart tracking camera, which used face recognition and a Pelco-D pan-tilt unit. In the future, I plan to replace that unit with a lightweight, servo-controlled gimbal and wildlife-specific AI, including species classification and conflict detection logic.
🐘 Real-World Problem: Human-Wildlife Conflict
In many protected areas and national parks, poaching, crop damage, and accidental encounters with wildlife threaten both animals and local communities.
Rangers often lack real-time situational awareness, especially in remote areas without internet access. Traditional IP camera systems or cloud-based AI solutions are often too slow, too power-hungry, or unreliable in these conditions.
WildSight AI provides a fast, low-power alternative: a self-contained system that can track and classify animals, detect the presence of nearby humans, and send local or remote alerts before danger escalates.
🔧 How It Works
- A USB or IP camera mounted on a 2-axis servo gimbal scans the area, controlled by ROS2 nodes.
- The system detects and tracks movement using MegaDetector (for animals/humans) and optionally SpeciesNet (to identify species).
- If both a protected animal and a human are detected in the same frame, an alert is triggered (a minimal sketch of this logic follows the list):
  - Local signal: LED lamp
  - Optional SMS/email alert to rangers
- A live RTSP video stream or a DisplayPort video output is available for real-time monitoring.
- All processing runs on-device with the Kria KR260, no internet/cloud needed.
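As a rough illustration only (not the actual ROS2 node code), the conflict rule can be expressed in a few lines of Python; the class names, threshold, and alert hook below are placeholder assumptions:
# Hypothetical sketch of the conflict rule: trigger an alert when at least one
# animal and one person are detected in the same frame with enough confidence.
CONF_THRESHOLD = 0.4  # assumed detection confidence threshold

def check_conflict(detections, trigger_alert):
    """detections: list of (label, confidence) tuples from MegaDetector."""
    labels = {label for label, conf in detections if conf >= CONF_THRESHOLD}
    if "animal" in labels and "person" in labels:
        trigger_alert("Human-Wildlife Conflict Detected")  # e.g. switch on the LED lamp

# Example:
# check_conflict([("animal", 0.82), ("person", 0.65)], print)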
MegaDetector is an open-source object detection model originally developed by Microsoft AI for Earth to help automate the processing of camera trap imagery. Instead of classifying every possible species, MegaDetector focuses on a general detection task, identifying whether an image contains an animal, a person, or a vehicle. This design makes it broadly applicable across projects and ecosystems worldwide, regardless of the species present. By filtering large datasets into “images with wildlife” versus “empty images,” MegaDetector greatly reduces the manual workload for researchers, who can then focus on species-level identification using smaller, specialized classifiers.
🌱 Why It Matters
Wild-Sight-AI is built for the field — designed to run independently, in harsh environments, without needing cloud infrastructure. It empowers conservation teams to protect wildlife before conflict occurs, using fast, reliable, and fully open technology (see also Wildlabs.net).
Introduction
Wild-Sight-AI is my attempt to port Microsoft’s MegaDetector v5 — a YOLOv5-based wildlife detection model — onto the Kria KR260 adaptive SOM.
The motivation is simple: run a modern wildlife detector directly on the edge — no cloud, no offloading — and use the detections to control a PTZ camera in real time. This project builds on my earlier Eagle-Eye-AI setup, but this time with a much larger and more complex model.
The result:
After a long series of engineering battles, I managed to run MegaDetector on the Kria DPU using Vitis AI 3.5 + VVAS 3.0 and show the first bounding boxes live in the output video stream, using footage from my IPTV set-top box as the video source.
For this proof of concept, I used only 12 images for calibration — just enough to demonstrate the full pipeline of compiling a PyTorch model into an XMODEL and running it on the DPU of the Kria board.
With such a limited dataset, the model’s accuracy naturally degrades during quantization, which is clearly visible in the placement of the bounding boxes.
After correcting the activation function — replacing the previously unsuitable LeakyReLU with the more appropriate Hardswish() — the MegaDetector model immediately produced far better results, even with only 12 calibration images.
Building on this, I reduced the input image size to 448x256 to boost performance and then recalibrated the model using 1,000 randomly selected images from the freely available WCS dataset.
With this improved calibration set, the detector achieved much more stable accuracy at an acceptable frame rate. Bounding boxes are now positioned correctly and track animals reliably, as illustrated in the example below.
This project wasn’t just about plugging in a model. I quickly discovered that almost nothing was compatible out of the box. Here’s a taste of the obstacles I had to overcome:
- Model mismatch: MegaDetector is YOLOv5, but the official Kria runtime image only supports YOLOv3.
- Outdated runtime: The SD card image is locked to Vitis AI Runtime 2.5 + VVAS 2.0, while YOLOv5 requires Vitis AI 3.5.
- Kernel driver lock: The zocl driver in the official image blocks upgrading.
- Custom runtime container: The official kria-runtime docker uses Vitis AI 2.5 — not usable.
- Dependencies hell:
- Compiled Protobuf 3.21.3 from source.
- Built OpenCV 4.6 from source to extract missing libraries.
- Rebuilt the entire VVAS framework 3.0 from source.
- FPGA image rebuild: Vivado 2023.1 required, with clock fixes for the newer kernel.
- Quantization headaches: YOLOv5 contains unsupported ops (SiLU activations, reshapes, grids). I had to patch forward passes, handle anchors manually, and hack the quantizer script.
- Accuracy evaluation: The quantizer doesn’t understand YOLO raw tensor heads, so I wrote a custom evaluation wrapper.
Every step was a rabbit hole. But eventually, all the pieces clicked together.
Building the Environment
To make everything reproducible, I created custom Dockerfiles and a new Ubuntu 22.04 root filesystem for Kria.
- Kernel & zocl: Built a Vitis-compatible version of the zocl driver (2.15).
- Docker runtime: Built my own kria-runtime image, based on Vitis AI 3.5 and VVAS 3.0.
- VVAS build: Compiled from source on the Kria board.
- Extra dependencies: Protobuf 3.21.3 and OpenCV 4.6 built from scratch.
- Build times: ~2 hours directly on the board, but much faster using QEMU on a host PC (see my other Hackster project, Automate Kria Ubuntu SD Card Image with Docker, and the updated scripts for generating the Wild-Sight-AI SD card image on GitHub).
All Dockerfiles, FPGA image, and application code are on GitHub:
MegaDetector → Kria DPU
The model pipeline looks like this:
- Start with the PyTorch MegaDetector v5 checkpoint (.pt).
- Quantize with a small calibration dataset using pytorch_nndct.
- Export to .xmodel.
- Compile with vai_c_xir to generate a DPU-ready model.
Getting quantization to work was the hardest part.
- I had to rewrite the quantization script, since MegaDetector outputs raw detection heads.
- Anchors were extracted directly from the PyTorch checkpoint.
- I re-implemented YOLO decoding in C++ (anchors × stride, sigmoid activations, grid offsets).
Inference doesn’t stop at the model. The whole application runs as a ROS2 node.
- Input: Camera stream enters a GStreamer pipeline.
- Inference: The vvas_xinfer plugin runs the compiled MegaDetector model on the DPU.
- Pad probe: I intercept raw tensor outputs in a probe, decode them into bounding boxes, and build a GstInferencePrediction tree (a rough Python sketch of this probe/publish pattern follows the list).
- Publishing: Bounding boxes are published via ROS2 topics to the camera rotator controller nodes.
- Overlay: VVAS DrawResults overlays bboxes on the outgoing video stream.
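The actual probe and decoding live in the project’s C++ code (handle_tensorbuf.cpp). As a rough, hypothetical Python sketch of the same pattern, a GStreamer buffer pad probe can hand results to a ROS2 publisher like this; the pipeline string, element names, topic name, and the decode step are placeholders, not the project’s real configuration:
# Hypothetical sketch of the probe/publish pattern used by the C++ node.
# The real pipeline uses vvas_xinfer; element names and decoding are placeholders.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst, GLib
import rclpy
from rclpy.node import Node
from std_msgs.msg import String

class DetectionPublisher(Node):
    def __init__(self):
        super().__init__("wild_sight_detections")
        self.pub = self.create_publisher(String, "detections", 10)

def on_buffer(pad, info, node):
    # In the real application the raw DPU tensors are decoded here into
    # bounding boxes and attached as a GstInferencePrediction tree.
    msg = String()
    msg.data = "frame processed"  # placeholder for serialized boxes
    node.pub.publish(msg)
    return Gst.PadProbeReturn.OK

def main():
    Gst.init(None)
    rclpy.init()
    node = DetectionPublisher()
    # Placeholder pipeline; the project uses an RTSP source and vvas_xinfer.
    pipeline = Gst.parse_launch("videotestsrc name=src ! fakesink")
    src_pad = pipeline.get_by_name("src").get_static_pad("src")
    src_pad.add_probe(Gst.PadProbeType.BUFFER, on_buffer, node)
    pipeline.set_state(Gst.State.PLAYING)
    GLib.MainLoop().run()

if __name__ == "__main__":
    main()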
In Wild-Sight-AI, MegaDetector serves as the first-stage detector running directly on the Kria board’s DPU. Its role is to scan incoming video streams in real time and highlight regions containing animals, people, or vehicles. These bounding boxes can then be used either directly for monitoring human–wildlife interactions or passed on to an optional second-stage classifier for species-level recognition. By leveraging MegaDetector’s proven robustness across diverse datasets, the system can generalize to many wildlife environments without requiring a massive custom dataset from the start.
Results
Finally, I saw bounding boxes placed correctly on the screen, and an alert message appears when a human-wildlife conflict occurs! 🎉
- Performance: FPS = 9.03909 on the KR260.
- Visuals: Bounding boxes (in different colors) follow moving people or animals.
- Accuracy: 1,000 calibration images taken from WCS camera traps were used for this test, and the results are quite astonishing.
- Alert message: Appears when both a wild animal and a human are present in the same frame.
This section is designed for quickly testing the software, building the Docker image, installing, and launching it with the appropriate FPGA firmware from our GitHub repository. Simply follow the steps below.
SDCard Image
- Download the provided SD card image, based on Ubuntu 22.04 with Docker preinstalled, to your host machine from our repo.
NOTE: The official SD card image for the Kria board is currently not suitable for this project because it is based on Vitis AI Runtime 2.5. We need 3.5 and a specific zocl kernel module (2.15), so use the provided one:
wget https://github.com/s59mz/kria-build-system/releases/download/wild-sight-ai-1.0/k-5.15-zocl-2.15-kria.wic.zip
- Burn the image onto an SD card (32 GB recommended) with Balena Etcher or a similar program.
- Extend the rootfs partition to the maximum available size with the gparted program.
- Boot the Kria KR260 board with this image.
- Use username: kria and password: kria for login.
Load Docker images
- Download the provided tar.7z Docker images onto the Kria board from here.
wget https://github.com/s59mz/wild-sight-ai/releases/download/1.0/kria-image_3.5.tar.7z
wget https://github.com/s59mz/wild-sight-ai/releases/download/1.0/wild-sight-ai_1.0.tar.7z
- Unzip both files
sudo apt update
sudo apt install p7zip-full
7z e kria-image_3.5.tar.7z
7z e wild-sight-ai_1.0.tar.7z
- Load both images into Docker on the Kria board. This takes some time.
docker load -i kria-image_3.5.tar
docker load -i wild-sight-ai_1.0.tar
- Check if both images were installed successfully:
docker images
Install Application
Clone the wild-sight-ai repository from GitHub:
git clone https://github.com/s59mz/wild-sight-ai
cd wild-sight-ai
Optional: rebuild docker images
This step is needed only if you want to rebuild the previously downloaded Docker images from scratch. Warning: the build process takes about 2 hours on the Kria board!
./build.sh
Install AI model
The provided megadetector.xmodel file is too big (~150 MB) to be included in the source code, so it should be downloaded separately and installed in the project directory:
cd model/megadetector
wget https://github.com/s59mz/wild-sight-ai/releases/download/1.0/megadetector.xmodel
Install FPGA firmware
cd ~/wild-sight-ai
# Install Firmware Binaries
cp fpga-firmware/firmware-kr260-wild-sight.deb /tmp
sudo apt install /tmp/firmware-kr260-wild-sight.deb
Build ROS2 Nodes
This step needs to be run only once, before launching the application.
# Launch the wild-sight-ai docker image
kria@localhost:~/wild-sight-ai$ ./run.sh
# Build the ROS2 nodes inside a docker container
root@xlnx-docker:~/ros2_ws# colcon build
Launch the Application
To start the application, these steps are needed every time you power up or restart the Kria board.
- Connect an IP camera that supports RTSP and set the output resolution to 1920x1080. Ensure both the camera and the Kria board are on the same local network, using a standard 1 Gbps Ethernet switch.
- Before launching the application, connect a high-resolution monitor to the DisplayPort of the Kria board. If your monitor has an HDMI input, use an Active DisplayPort to HDMI adapter (passive adapters will not work).
- Execute the following commands on the running Kria board:
# Load the FPGA firmware
sudo xmutil unloadapp
sudo xmutil loadapp kr260-wild-sight
sudo xmutil desktop_disable
# go to the Wild-Sight-AI app git repository (in case you rebooted the board)
cd ~/wild-sight-ai
# Launch the wild-sight-ai docker image
./run.sh
# Launch the app with your camera URL
./run_app.sh rtsp://192.168.1.11:554/stream1
# To Exit press Ctrl-C
Expected Output
You should see the camera’s captured images on the monitor connected to the board.
When an animal or a person is detected, blue or yellow boxes appear around them and track their movement, while the camera on the rotator follows the detected animal.
At the end of development, I recorded a short demonstration video to showcase Wild-Sight-AI in action on the Kria board. The system runs the quantized MegaDetector model in real time, draws bounding boxes around detected animals and humans, and raises a “Human-Wildlife Conflict Detected” alert when both an animal and a person appear in the same frame.
🎥 Watch the demo here:
In the video, the Kria board is connected to a rotating camera and an external display. First, the demo shows bounding boxes around animals in randomly chosen broadcast wildlife videos, including reptiles and even fast-moving predators. Later, the setup switches to the project’s own camera, which successfully tracks a plush toy as a simulated “animal” target. The warning light is activated when both an “animal” and a “human” are present, illustrating the conflict detection functionality.
Future Work
This is only the first milestone. The next steps are:
- Use a proper calibration dataset with wildlife images for better quantization.
- Retrain / fine-tune MegaDetector specifically for the target hardware.
- Upgrade pan/tilt hardware with faster servos and a custom adapter PCB.
- Add SpeciesNet classification to the pipeline for species identification.
- Field testing in real wildlife monitoring scenarios.
🔩 PCB Prototype (via NextPCB)
A simple but essential custom PCB will connect:
- Kria’s GPIO pins (via Pi header) to servo PWM lines
- A 5V step-down converter from 12V input
- Servo and camera connectors
This board replaces the earlier bulky Pelco-D motor controller, allowing faster tracking and better portability.
Conclusion
This project demonstrates the first working MegaDetector v5 running on the Kria DPU with ROS2 + GStreamer integration.
Even though performance (~9.0 FPS) and accuracy still need tuning, the hardest part has been solved:
- Migrated the Kria platform from Vitis AI 2.5 → 3.5.
- Built a new runtime environment from scratch.
- Rebuilt FPGA bitstream for compatibility.
- Patched YOLOv5 to quantize and compile successfully.
- Integrated everything into a ROS2 pipeline with live bounding box overlays.
Wild-Sight-AI proves that complex PyTorch models like MegaDetector can run on Kria SOMs with enough persistence. This is a key step toward autonomous, low-power wildlife monitoring at the edge.
* * *
Appendix: How to Compile MegaDetector (PyTorch → XMODEL)
Below is a clean, reproducible path to convert the MegaDetector v5 (PyTorch) model into a Vitis AI XMODEL you can deploy on the KR260 DPU. These steps match the toolchain used in this project and the quantization script yolo5s_quant.py provided in the yolov5 directory of the project repository.
Prerequisites
- A machine with Docker installed (x86_64 is fine).
- Vitis AI 3.5 Docker image (the official AMD image xilinx/vitis-ai-pytorch-cpu:latest).
- The MegaDetector v5 PyTorch checkpoint (md_v5a.0.1.pt, downloaded from here).
- A small directory of calibration images representative of your target scenes (the more varied the better; 100–500 images is a good start). Images can be obtained from WCS Camera Traps. Labels/annotations are not needed for calibration—just images.
- The KR260 DPU arch file (arch.json) that matches your bitstream (DPUCZDX8G B3136 for the KR260 Smartcam is common).
$ cat arch.json
{
"fingerprint":"0x101000016010406"
}
Prepare working directory
# get Vitis-AI 3.5
git clone https://github.com/Xilinx/Vitis-AI
cd Vitis-AI
# create workspace directories
mkdir -p workspace/{models,calibration_images/val/animals}
cd workspace
# Put your MegaDetector weights here
cp /path/to/md_v5a.0.1.pt models/
# Add some JPG/PNG calibration images here
cp -r /path/to/calib_images/* calibration_images/val/animals
Get YOLOv5, apply patches, and drop in the quantization script.
# Get YOLOv5
git clone https://github.com/ultralytics/yolov5.git
cd yolov5
# Checkout a tag for MegaDetector
git checkout c23a441c9df7ca9b1f275e8c8719c949269160d1
# apply YOLOv5 patches
cd models
cp path/to/wild-sight-ai/yolov5/patches/yolo.py .
cp path/to/wild-sight-ai/yolov5/patches/experimental.py .
# Put the provided quantization script from project repo to working directory
cd ../..
cp path/to/wild-sight-ai/yolo5s_quant.py .
The yolo5s_quant.py script (its core quantizer calls are sketched below):
- Loads md_v5a.0.1.pt via YOLOv5’s DetectMultiBackend.
- Runs the Vitis AI quantizer with a calibration pass.
- Runs a quick test/export pass to emit Model_int.xmodel.
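For orientation, the core of such a script boils down to the pytorch_nndct quantizer API. The following is a hedged, minimal sketch, not the full yolo5s_quant.py: model loading, the dummy input shape, and the calibration loop are simplified placeholder assumptions.
# Minimal sketch of the Vitis AI PyTorch quantization flow (assumptions:
# a loaded float model `model`, input size 256x448, and a calibration DataLoader
# that yields image batches).
import torch
from pytorch_nndct.apis import torch_quantizer

def quantize(model, calib_loader, quant_mode="calib", output_dir="quantize_result"):
    dummy = torch.randn(1, 3, 256, 448)  # NCHW input used to trace the graph
    quantizer = torch_quantizer(quant_mode, model, (dummy,), output_dir=output_dir)
    quant_model = quantizer.quant_model

    # Forward passes over calibration images to collect quantization statistics
    # (in "test" mode a single forward pass is enough before export).
    with torch.no_grad():
        for images in calib_loader:
            quant_model(images)

    if quant_mode == "calib":
        quantizer.export_quant_config()               # writes the quantization config
    else:
        quantizer.export_xmodel(deploy_check=False)   # emits Model_int.xmodel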
Place the script next to your models/ and calibration_images/ folders:
./workspace
├─ yolov5/
├─ yolo5s_quant.py
├─ models/
│ └─ md_v5a.0.1.pt
└─ calibration_images/val/animals
└─ *.jpg|*.png
Patches for the yolov5 repo
We need to change two important things, otherwise we cannot compile to XMODEL (a small illustrative sketch follows this list):
- In Detect.forward(), return the raw tensors directly, without using operations that are not supported on the DPU, such as split.
- Replace all SiLU activation functions with Hardswish(inplace=True), because the float SiLU is not supported by DPU IP v4.0.
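The actual patches are the yolo.py and experimental.py files shipped in the project repo. As a rough illustration of the second change only, SiLU modules could be swapped for Hardswish with a hypothetical helper like this (the shipped patch edits the model files directly instead):
# Hypothetical helper showing the activation swap; not the shipped patch.
import torch.nn as nn

def replace_silu_with_hardswish(model):
    for name, module in model.named_children():
        if isinstance(module, nn.SiLU):
            setattr(model, name, nn.Hardswish(inplace=True))
        else:
            replace_silu_with_hardswish(module)  # recurse into submodules
    return model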
Start the Vitis AI 3.5 Docker
cd path/to/Vitis-AI
./docker_run.sh xilinx/vitis-ai-pytorch-cpu:latest
# inside the running docker, install a missing tool needed by yolov5
pip install seaborn
# go to your working directory
cd workspace
Run quantization (Calib → Test/Export)
Calibration pass (uses your calibration images to collect statistics):
python3 yolo5s_quant.py \
--quant_mode calib \
--data_dir ./calibration_images \
--model_dir ./models/md_v5a.0.1.pt \
--subset_len 12 \
--batch_size 1
Test & export pass (emits the deployable XMODEL):
python3 yolo5s_quant.py \
--quant_mode test \
--data_dir ./calibration_images \
--model_dir ./models/md_v5a.0.1.pt \
--subset_len 1 \
--batch_size 1 \
--deploy
NOTE: The subset_len and batch_size must be set to 1 here so the script emits the deployable XMODEL.
If successful, you’ll see artifacts under quantize_result/, including:
Model_int.xmodel ← quantized graph (pre-compile)
Compile to target DPU (vai_c_xir)
Use your KR260-matching arch.json:
vai_c_xir \
-x quantize_result/Model_int.xmodel \
-a arch.json \
-o dpu_model \
-n megadetector
This produces:
dpu_model/
├─ megadetector.xmodel ← compiled for your DPU
├─ meta.json ← runner metadata
└─ md5sum.txt
NOTE: If you see a huge number of CPU-assigned ops (transposes, reshapes) or hundreds of tiny subgraphs, that’s usually a sign something prevented good fusion. The working configuration in this project uses Hardswish instead of SiLU to avoid CPU-only ops on DPUCZDX8G with VAI 3.5. Stick with the provided pipeline unless you have a reason (and the time) to retune the graph.
**************************************************
* VITIS_AI Compilation - Xilinx Inc.
**************************************************
[UNILOG][INFO] Compile mode: dpu
[UNILOG][INFO] Debug mode: null
[UNILOG][INFO] Target architecture: DPUCZDX8G_ISA1_B3136_0101000016010406
[UNILOG][INFO] Graph name: Model, with op num: 1222
[UNILOG][INFO] Begin to compile...
[UNILOG][INFO] Total device subgraph number 6, DPU subgraph number 1
[UNILOG][INFO] Compile done.
[UNILOG][INFO] The meta json is saved to "/workspace/Torch/dpu_model/meta.json"
[UNILOG][INFO] The compiled xmodel is saved to "/workspace/Torch/dpu_model/megadetector.xmodel"
[UNILOG][INFO] The compiled xmodel's md5sum is fb1e8be78645af8252e32f72ea0ceab0, and has been saved to "/workspace/Torch/dpu_model/md5sum.txt"
vitis-ai-user@
Copy to the target & sanity check
On the KR260 (host or over SSH), place the model where your app expects it. For this wild-sight-ai project:
scp dpu_model/megadetector.xmodel kr260:/home/kria/wild-sight-ai/model/megadetector/
Quick benchmark
On the Kria board, inside the running container, with the FPGA firmware already loaded:
xdputil benchmark /opt/xilinx/kr260-wild-sight/share/vitis_ai_library/models/megadetector/megadetector.xmodel 1
This produces 9.04 FPS (for 448×256 with no further optimizations):
kria@localhost:~/wild-sight-ai$ sudo xmutil unloadapp
[sudo] password for kria:
remove from slot 0 returns: 0 (Ok)
kria@localhost:~/wild-sight-ai$ sudo xmutil loadapp kr260-wild-sight
kr260-wild-sight: loaded to slot 0
kria@localhost:~/wild-sight-ai$ ./run.sh
======================================================================
__ ___ _ _ _____ _ _ _ _____
\ \ / (_) | | | / ____(_) | | | | /\ |_ _|
\ \ /\ / / _| | __| | | (___ _ __ _| |__ | |_ / \ | |
\ \/ \/ / | | |/ _. | \___ \| |/ _. | ._ \| __| / /\ \ | |
\ /\ / | | | (_| | ____) | | (_| | | | | |_ / ____ \ _| |_
\/ \/ |_|_|\__,_| |_____/|_|\__, |_| |_|\__| /_/ \_\_____|
__/ |
|___/
======================================================================
Build Date: 2025/09/12 21:28
*** Welcome to Wildi-Sight-AI! Type "./run_app.sh" to start.
root@xlnx-docker:/opt/xilinx/kr260-wild-sight/share/vitis_ai_library/models/megadetector# xdputil benchmark megadetector.xmodel 1
WARNING: Logging before InitGoogleLogging() is written to STDERR
I20250926 23:06:27.466972 37 test_dpu_runner_mt.cpp:477] shuffle results for batch...
I20250926 23:06:27.469184 37 performance_test.hpp:73] 0% ...
I20250926 23:06:33.469415 37 performance_test.hpp:76] 10% ...
I20250926 23:06:39.469604 37 performance_test.hpp:76] 20% ...
I20250926 23:06:45.469782 37 performance_test.hpp:76] 30% ...
I20250926 23:06:51.469956 37 performance_test.hpp:76] 40% ...
I20250926 23:06:57.470139 37 performance_test.hpp:76] 50% ...
I20250926 23:07:03.470347 37 performance_test.hpp:76] 60% ...
I20250926 23:07:09.470520 37 performance_test.hpp:76] 70% ...
I20250926 23:07:15.470737 37 performance_test.hpp:76] 80% ...
I20250926 23:07:21.470952 37 performance_test.hpp:76] 90% ...
I20250926 23:07:27.471176 37 performance_test.hpp:76] 100% ...
I20250926 23:07:27.471251 37 performance_test.hpp:79] stop and waiting for all threads terminated....
I20250926 23:07:27.535609 37 performance_test.hpp:85] thread-0 processes 543 frames
I20250926 23:07:27.535678 37 performance_test.hpp:93] it takes 64405 us for shutdown
I20250926 23:07:27.535703 37 performance_test.hpp:94] FPS= 9.03998 number_of_frames= 543 time= 60.0665 seconds.
I20250926 23:07:27.535754 37 performance_test.hpp:96] BYEBYE
Test PASS.
root@xlnx-docker:~/ros2_ws#
Some Troubleshooting Tips (learned while making this project):
- Too many CPU ops in the compile log: Typically caused by unsupported activations and layout dances. Using Hardswish(inplace=True) instead of SiLU or LeakyReLU got us a clean, fast graph.
- Garbage boxes: Ensure you’re applying the official YOLOv5 decode math (sigmoid + 2x trick for xy/wh + correct grid/stride + per-class/objectness merge) before NMS. The quant script’s validation path can help catch decode bugs.
- Calibration quality: Use diverse, representative images. If boxes are consistently off, increase calibration set (e.g., 300–500 frames) and re-run calib → test.
To quantize the MegaDetector model into an .xmodel, a representative dataset of calibration images is required. For this project, I built a helper toolkit that downloads random images from the WCS dataset, optionally crops out animals using the provided bounding boxes, and selects random subsets for calibration. The prepared calibration_images/ directory is then used directly by the quantization script.
👉 Full step-by-step instructions and scripts are available in the project dataset GitHub repository.
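As a rough idea of the subset-selection step only (the real toolkit lives in the dataset repo; the paths and counts below are placeholder assumptions):
# Hypothetical sketch: pick a random subset of downloaded WCS images for calibration.
import random
import shutil
from pathlib import Path

def make_calibration_subset(src_dir, dst_dir, count=1000, seed=42):
    images = sorted(Path(src_dir).glob("*.jpg"))
    random.seed(seed)                      # reproducible subset
    subset = random.sample(images, min(count, len(images)))
    out = Path(dst_dir)
    out.mkdir(parents=True, exist_ok=True)
    for img in subset:
        shutil.copy(img, out / img.name)

# Example:
# make_calibration_subset("wcs_downloads", "calibration_images/val/animals", count=1000)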
Appendix: YOLOv5-P6 Output Decoding on CPU
In the original YOLOv5 implementation, the Detect.forward() layer already performs the grid/anchor decoding and applies sigmoid activations before returning final predictions. However, many of these operations (view, permute, dynamic grid generation, etc.) are not supported by the DPU compiler.
To make the model deployable on the Kria board, we stripped Detect.forward() down to only the convolution layers, leaving the raw tensor outputs (anchors × 8 values per grid cell) on the DPU. This means the model no longer produces bounding boxes directly — instead, we must implement the entire YOLOv5 decoding pipeline on the CPU.
The following section explains how this decoder works and how the raw DPU outputs are transformed back into bounding boxes, scores, and classes.
YOLOv5-P6 models (like MegaDetector v5a) use four detection “heads,” corresponding to strides 8, 16, 32, and 64. Each head produces an H×W×(3×(5+num_classes)) tensor, where:
- 3 = number of anchors per cell
- 5 = (tx, ty, tw, th, to) → box center offsets, box size, and objectness logit
- num_classes = raw logits for each class
For MegaDetector (humans, animals, vehicles), num_classes = 3, so each anchor has 8 values, giving C = 24 channels per cell.
Anchor and stride definitions
Each head is decoded with a fixed stride and corresponding anchor set:
- P3 (stride 8): grid ≈ W/8 × H/8
- P4 (stride 16): grid ≈ W/16 × H/16
- P5 (stride 32): grid ≈ W/32 × H/32
- P6 (stride 64): grid ≈ W/64 × H/64
Anchors are defined in pixels of the input resolution, not grid units.
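As a quick sanity check on the shapes, assuming the 448×256 input used in this project:
# Expected raw output shape per detection head for a 448x256 (WxH) input,
# with 3 anchors and 3 classes (8 values per anchor -> 24 channels per cell).
W, H = 448, 256
num_anchors, num_classes = 3, 3
channels = num_anchors * (5 + num_classes)   # 24

for stride in (8, 16, 32, 64):
    gw, gh = W // stride, H // stride
    print(f"stride {stride}: grid {gw}x{gh}, tensor {gh}x{gw}x{channels}")
# stride 8: grid 56x32, tensor 32x56x24
# stride 16: grid 28x16, tensor 16x28x24
# stride 32: grid 14x8, tensor 8x14x24
# stride 64: grid 7x4, tensor 4x7x24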
Decoding procedure
For each grid cell (x, y) and anchor a, the 8 values are decoded:
tx, ty, tw, th, to, class_logits…
Steps:
1. Objectness + class score
obj = sigmoid(to)
cls = sigmoid(max(class_logits))
prob = obj * cls
Predictions below a confidence threshold are skipped.
2. Center coordinates
cx = (sigmoid(tx) * 2 - 0.5 + x) * stride
cy = (sigmoid(ty) * 2 - 0.5 + y) * stride
3. Box dimensions
bw = (sigmoid(tw) * 2)^2 * anchor_w
bh = (sigmoid(th) * 2)^2 * anchor_h
4. Box corners
x1 = cx - bw/2
y1 = cy - bh/2
x2 = cx + bw/2
y2 = cy + bh/2
5. NMS (Non-Max Suppression)
After all boxes are decoded, NMS removes duplicates using IoU thresholds.
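Putting steps 1–4 together, a minimal NumPy sketch of decoding a single head could look like the following. It assumes an NHWC float tensor of shape (H, W, 3×(5+num_classes)), with the per-anchor values contiguous along the channel axis, and leaves NMS to a library or to the existing C++ code; it is an illustration of the math above, not the project's handle_tensorbuf.cpp implementation.
# Minimal NumPy sketch of decoding one YOLOv5 head from a raw NHWC tensor.
# Assumptions: `out` has shape (H, W, 3*(5+nc)), anchors are in input-resolution
# pixels, and NMS is applied afterwards (not shown here).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_head(out, anchors, stride, num_classes=3, conf_thresh=0.25):
    h, w, _ = out.shape
    out = out.reshape(h, w, len(anchors), 5 + num_classes)
    boxes = []
    for y in range(h):
        for x in range(w):
            for a, (aw, ah) in enumerate(anchors):
                tx, ty, tw, th, to = out[y, x, a, :5]
                cls_logits = out[y, x, a, 5:]
                obj = sigmoid(to)
                cls = sigmoid(cls_logits.max())
                prob = obj * cls
                if prob < conf_thresh:
                    continue  # step 1: skip low-confidence predictions
                cx = (sigmoid(tx) * 2 - 0.5 + x) * stride        # step 2
                cy = (sigmoid(ty) * 2 - 0.5 + y) * stride
                bw = (sigmoid(tw) * 2) ** 2 * aw                 # step 3
                bh = (sigmoid(th) * 2) ** 2 * ah
                boxes.append((cx - bw / 2, cy - bh / 2,          # step 4
                              cx + bw / 2, cy + bh / 2,
                              prob, int(cls_logits.argmax())))
    return boxes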
Special considerations on Kria
- Vitis AI delivers NHWC float tensors, so decoding loops treat channel index as the innermost stride.
- The number of heads used depends on the input size. At smaller resolutions (e.g., 448×256), only the first two heads (stride 8 & 16) are meaningful.
- Confidence and NMS thresholds must be tuned; higher thresholds reduce false positives but may miss smaller objects.
- If the input is letterboxed to match the model’s stride requirements, un-letterboxing (or VVAS MetaAffineFixer) is needed to map predictions back to the original video resolution (a small coordinate-mapping sketch follows this list).
- The NHWC decoder is implemented in the handle_tensorbuf.cpp file.
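As a rough illustration of the un-letterboxing math, assuming symmetric padding and uniform scaling (the project’s pipeline may instead rely on VVAS MetaAffineFixer for this):
# Hypothetical helper mapping a box from letterboxed model coordinates back to
# the original frame. Assumes symmetric padding and uniform scaling.
def unletterbox(box, model_wh, frame_wh):
    mw, mh = model_wh          # e.g. (448, 256)
    fw, fh = frame_wh          # e.g. (1920, 1080)
    scale = min(mw / fw, mh / fh)
    pad_x = (mw - fw * scale) / 2
    pad_y = (mh - fh * scale) / 2
    x1, y1, x2, y2 = box
    return ((x1 - pad_x) / scale, (y1 - pad_y) / scale,
            (x2 - pad_x) / scale, (y2 - pad_y) / scale)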
To make Wild-Sight-AI reproducible and portable, the project uses two Docker images:
Kria Runtime Image (kria-image:3.5):
- Based on Ubuntu 22.04 with Vitis AI Runtime 3.5 and VVAS 3.0.
- Includes required dependencies: XRT 2.15, OpenCV 4.6, Protobuf 3.21.3, GStreamer plugins, and VVAS libraries.
- Provides the execution environment for AI workloads on the KR260 DPU.
Wild-Sight-AI Application Image (wild-sight-ai:1.0):
- Based on the Kria Runtime image.
- Adds ROS 2 Humble, the Wild-Sight-AI application code, and all necessary build tools.
- This is the only image you need to run the final application.
Build & Run
A helper script, build.sh, is provided in the repository root. It automatically:
- Builds the runtime image (if not already present).
- Builds the application image on top.
- Prints a success message once ready.
To run the application, simply use:
./build.sh
./run.sh
The run.sh script launches the ROS 2 node with GStreamer pipelines, publishes detections, and overlays bounding boxes on the output video stream.
More Details
For a detailed breakdown of the Dockerfiles, build process, dependencies, and troubleshooting tips, see the full Dockerfiles README.
Appendix: Building a Custom Wild-Sight-AI SD Card Image
For this project I needed a custom Ubuntu-based SD card image for the Kria KR260 board. The default Xilinx images are a good starting point, but they don’t always include the right kernel modules, the correct ZOCL driver version for Vitis AI, or convenient extras like Docker. To make the Wild-Sight-AI pipeline reproducible and portable, I automated the entire image creation process using my Kria Build System:
👉 GitHub repo: s59mz/kria-build-system
👉 Wild-Sight-AI release: wild-sight-ai-1.0
This repo provides scripts that run inside a Docker build environment, fetch the official Ubuntu rootfs and kernel packages, patch the image with the right modules, and finally configure the system for Wild-Sight-AI deployment.
Why a custom image?
- The stock KR260 Ubuntu image comes with ZOCL v2.13, but Vitis AI 3.5 requires ZOCL v2.15.
- We also need Docker preinstalled for deployment flexibility.
- It’s handy to preconfigure a user account, SSH access, networking, and timezone so the image is plug-and-play.
Scripts overview
The main repo (kria-build-system) drives the process, but the Wild-Sight-AI release adds three important scripts:
1. install-modules.sh
This script ensures the SD card image has the correct kernel modules installed. It:
- Downloads the right kernel modules package (5.15.0-1053-xilinx-zynqmp).
- Extracts and installs them into /lib/modules.
- Replaces the stock ZOCL v2.13 module with ZOCL v2.15, either from a prebuilt ZIP or by building from XRT sources.
- Runs depmod to update module dependencies.
# remove the old zocl module v2.13
mv $ROOTFS_DIR/lib/modules/$KERNEL_VER/kernel/drivers/gpu/drm/zocl/zocl.ko \
$ROOTFS_DIR/lib/modules/$KERNEL_VER/kernel/drivers/gpu/drm/zocl/zocl.ko.distro
# replace it with a new precompiled zocl v2.15
unzip /root/modules/zocl-2.15.zip -d $ROOTFS_DIR/lib/modules/$KERNEL_VER/kernel/drivers/gpu/drm/zocl/
2. install-docker.sh
Adds Docker support directly into the root filesystem:
- Installs Docker CE, CLI, containerd, and plugins (docker-compose, buildx).
- Configures the official Docker apt repository inside the chroot.
- Ensures Docker is ready to use on first boot.
apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
3. post-config.sh
Performs the final system setup:
- Enables root SSH login.
- Creates a kria user (password: kria) with sudo and docker group membership.
- Configures networking (eth0 via DHCP).
- Sets the timezone (Ljubljana in my case).
- Enables the systemd-networkd and systemd-resolved services for reliable networking.
# Create kria user and add to docker group
chroot /mnt/rootfs useradd -m -s /bin/bash kria
echo "kria:kria" | chroot /mnt/rootfs chpasswd
chroot /mnt/rootfs usermod -a -G docker kria
How to use it
1. Clone the build system repo:
git clone https://github.com/s59mz/kria-build-system.git
cd kria-build-system
2. Check out the Wild-Sight-AI release branch:
git checkout tags/wild-sight-ai-1.0
3. Build the SD card image using Docker:
./build.sh
4. The final .wic file will appear in the output/ directory. Flash it to an SD card using dd or Balena Etcher, resize the ext4 partition to the maximum space available on the SD card, insert it into the KR260, and power up.
There’s a whole project series available on Hackster.io about how to build custom Kria SD card images.
Results
With this custom image:
- The right ZOCL module (v2.15) is in place for Vitis AI 3.5.
- Docker works out of the box.
- The kria user can SSH in and run containers without extra setup.
- The Wild-Sight-AI pipeline runs immediately without manual system tweaks.
This way, anyone can clone the repo, build the image, and get an identical runtime environment for Wild-Sight-AI on their own Kria board. 🚀