WildSight AI is an edge-powered camera system designed to detect wildlife, track animal movement, and alert rangers in real time when potentially dangerous human-animal encounters occur. The system is powered by the AMD Kria KR260 board, running AI models locally to ensure fast, reliable operation in remote and off-grid environments.
This project builds upon my earlier EagleEye AI smart tracking camera, which used face recognition and a Pelco-D pan-tilt unit. In the future, I plan to replace that unit with a lightweight, servo-controlled gimbal and wildlife-specific AI, including species classification and conflict detection logic.
🐘 Real-World Problem: Human-Wildlife Conflict
In many protected areas and national parks, poaching, crop damage, and accidental encounters with wildlife threaten both animals and local communities.
Rangers often lack real-time situational awareness, especially in remote areas without internet access. Traditional IP camera systems or cloud-based AI solutions are often too slow, too power-hungry, or unreliable in these conditions.
WildSight AI provides a fast, low-power alternative: a self-contained system that can track and classify animals, detect the presence of nearby humans, and send local or remote alerts before danger escalates.
🔧 How It Works
- A USB or IP camera mounted on a 2-axis servo gimbal scans the area, controlled by ROS2 nodes.
- The system detects and tracks movement using MegaDetector (for animals/humans) and optionally SpeciesNet (to identify species).
- If both a protected animal and a human are detected in the same frame, an alert is triggered (a minimal sketch of this logic follows the list):
  - Local signal: LED lamp
  - Optional SMS/email alert to rangers
- A live RTSP video stream or a DisplayPort video output is available for real-time monitoring.
- All processing runs on-device with the Kria KR260, no internet/cloud needed.
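As a rough illustration only (not the actual ROS2 node code), the conflict rule can be expressed in a few lines of Python; the class names, threshold, and alert hook below are placeholder assumptions:
# Hypothetical sketch of the conflict rule: trigger an alert when at least one
# animal and one person are detected in the same frame with enough confidence.
CONF_THRESHOLD = 0.4  # assumed detection confidence threshold

def check_conflict(detections, trigger_alert):
    """detections: list of (label, confidence) tuples from MegaDetector."""
    labels = {label for label, conf in detections if conf >= CONF_THRESHOLD}
    if "animal" in labels and "person" in labels:
        trigger_alert("Human-Wildlife Conflict Detected")  # e.g. switch on the LED lamp

# Example:
# check_conflict([("animal", 0.82), ("person", 0.65)], print)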
MegaDetector is an open-source object detection model originally developed by Microsoft AI for Earth to help automate the processing of camera trap imagery. Instead of classifying every possible species, MegaDetector focuses on a general detection task, identifying whether an image contains an animal, a person, or a vehicle. This design makes it broadly applicable across projects and ecosystems worldwide, regardless of the species present. By filtering large datasets into “images with wildlife” versus “empty images,” MegaDetector greatly reduces the manual workload for researchers, who can then focus on species-level identification using smaller, specialized classifiers.
🌱 Why It Matters
Wild-Sight-AI is built for the field — designed to run independently, in harsh environments, without needing cloud infrastructure. It empowers conservation teams to protect wildlife before conflict occurs, using fast, reliable, and fully open technology (see also Wildlabs.net).
Introduction
Wild-Sight-AI is my attempt to port Microsoft’s MegaDetector v5 — a YOLOv5-based wildlife detection model — onto the Kria KR260 adaptive SOM.
The motivation is simple: run a modern wildlife detector directly on the edge — no cloud, no offloading — and use the detections to control a PTZ camera in real time. This project builds on my earlier Eagle-Eye-AI setup, but this time with a much larger and more complex model.
The result:
After a long series of engineering battles, I managed to run MegaDetector on the Kria DPU using Vitis AI 3.5 + VVAS 3.0 and show the first bounding boxes live in the output video stream, using footage from my IPTV set-top box as the video source.
For this proof of concept, I used only 12 images for calibration — just enough to demonstrate the full pipeline of compiling a PyTorch model into an XMODEL and running it on the DPU of the Kria board.
With such a limited dataset, the model’s accuracy naturally degrades during quantization, which is clearly visible in the placement of the bounding boxes.
After correcting the activation function — replacing the previously unsuitable LeakyReLU with the more appropriate Hardswish() — the MegaDetector model immediately produced far better results, even with only 12 calibration images.
Building on this, I reduced the input image size to 448x256 to boost performance and then recalibrated the model using 1,000 randomly selected images from the freely available WCS dataset.
With this improved calibration set, the detector achieved much more stable accuracy at an acceptable frame rate. Bounding boxes are now positioned correctly and track animals reliably, as illustrated in the example below.
This project wasn’t just about plugging in a model. I quickly discovered that almost nothing was compatible out of the box. Here’s a taste of the obstacles I had to overcome:
- Model mismatch: MegaDetector is YOLOv5, but the official Kria runtime image only supports YOLOv3.
- Outdated runtime: The SD card image is locked to Vitis AI Runtime 2.5 + VVAS 2.0, while YOLOv5 requires Vitis AI 3.5.
- Kernel driver lock: The zocl driver in the official image blocks upgrading.
- Custom runtime container: The official kria-runtime docker uses Vitis AI 2.5 — not usable.
- Dependencies hell:
- Compiled Protobuf 3.21.3 from source.
- Built OpenCV 4.6 from source to extract missing libraries.
- Rebuilt the entire VVAS framework 3.0 from source.
- FPGA image rebuild: Vivado 2023.1 required, with clock fixes for the newer kernel.
- Quantization headaches: YOLOv5 contains unsupported ops (SiLU activations, reshapes, grids). I had to patch forward passes, handle anchors manually, and hack the quantizer script.
- Accuracy evaluation: The quantizer doesn’t understand YOLO raw tensor heads, so I wrote a custom evaluation wrapper.
Every step was a rabbit hole. But eventually, all the pieces clicked together.
Building the Environment
To make everything reproducible, I created custom Dockerfiles and a new Ubuntu 22.04 root filesystem for Kria.
- Kernel & zocl: Built a Vitis-compatible version of the zocl driver (2.15).
- Docker runtime: Built my own kria-runtime image, based on Vitis AI 3.5 and VVAS 3.0.
- VVAS build: Compiled from source on the Kria board.
- Extra dependencies: Protobuf 3.21.3 and OpenCV 4.6 built from scratch.
- Build times: ~2 hours directly on the board, but much faster using QEMU on a host PC (see my other Hackster project, Automate Kria Ubuntu SD Card Image with Docker, and the updated scripts for generating the Wild-Sight-AI SD card image on GitHub).
All Dockerfiles, FPGA image, and application code are on GitHub:
MegaDetector → Kria DPU
The model pipeline looks like this:
- Start with the PyTorch MegaDetector v5 checkpoint (.pt).
- Quantize with a small calibration dataset using pytorch_nndct.
- Export to .xmodel.
- Compile with vai_c_xir to generate a DPU-ready model.
Getting quantization to work was the hardest part.
- I had to rewrite the quantization script, since MegaDetector outputs raw detection heads.
- Anchors were extracted directly from the PyTorch checkpoint.
- I re-implemented YOLO decoding in C++ (anchors × stride, sigmoid activations, grid offsets).
Inference doesn’t stop at the model. The whole application runs as a ROS2 node.
- Input: Camera stream enters a GStreamer pipeline.
- Inference: The vvas_xinfer plugin runs the compiled MegaDetector model on the DPU.
- Pad probe: I intercept raw tensor outputs in a probe, decode them into bounding boxes, and build a GstInferencePrediction tree (a rough Python sketch of this probe/publish pattern follows the list).
- Publishing: Bounding boxes are published via ROS2 topics to the camera rotator controller nodes.
- Overlay: VVAS DrawResults overlays bboxes on the outgoing video stream.
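The actual probe and decoding live in the project’s C++ code (handle_tensorbuf.cpp). As a rough, hypothetical Python sketch of the same pattern, a GStreamer buffer pad probe can hand results to a ROS2 publisher like this; the pipeline string, element names, topic name, and the decode step are placeholders, not the project’s real configuration:
# Hypothetical sketch of the probe/publish pattern used by the C++ node.
# The real pipeline uses vvas_xinfer; element names and decoding are placeholders.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst, GLib
import rclpy
from rclpy.node import Node
from std_msgs.msg import String

class DetectionPublisher(Node):
    def __init__(self):
        super().__init__("wild_sight_detections")
        self.pub = self.create_publisher(String, "detections", 10)

def on_buffer(pad, info, node):
    # In the real application the raw DPU tensors are decoded here into
    # bounding boxes and attached as a GstInferencePrediction tree.
    msg = String()
    msg.data = "frame processed"  # placeholder for serialized boxes
    node.pub.publish(msg)
    return Gst.PadProbeReturn.OK

def main():
    Gst.init(None)
    rclpy.init()
    node = DetectionPublisher()
    # Placeholder pipeline; the project uses an RTSP source and vvas_xinfer.
    pipeline = Gst.parse_launch("videotestsrc name=src ! fakesink")
    src_pad = pipeline.get_by_name("src").get_static_pad("src")
    src_pad.add_probe(Gst.PadProbeType.BUFFER, on_buffer, node)
    pipeline.set_state(Gst.State.PLAYING)
    GLib.MainLoop().run()

if __name__ == "__main__":
    main()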
In Wild-Sight-AI, MegaDetector serves as the first-stage detector running directly on the Kria board’s DPU. Its role is to scan incoming video streams in real time and highlight regions containing animals, people, or vehicles. These bounding boxes can then be used either directly for monitoring human–wildlife interactions or passed on to an optional second-stage classifier for species-level recognition. By leveraging MegaDetector’s proven robustness across diverse datasets, the system can generalize to many wildlife environments without requiring a massive custom dataset from the start.
Results
Finally, I saw bounding boxes placed correctly on the screen, and an alert message appears when a human-wildlife conflict occurs! 🎉
- Performance: FPS = 9.03909 on the KR260.
- Visuals: Bounding boxes (in different colors) follow moving people or animals.
- Accuracy: 1,000 calibration images taken from WCS camera traps were used for this test, and the results are quite astonishing.
- Alert message: Appears when both a wild animal and a human are present in the same frame.
This section is designed for quickly testing the software, building the Docker image, installing, and launching it with the appropriate FPGA firmware from our GitHub repository. Simply follow the steps below.
SDCard Image
- Download the provided SD card image, based on Ubuntu 22.04 with Docker preinstalled, to your host machine from our repo.
NOTE: The official SD card image for the Kria board is currently not suitable for this project because it is based on Vitis AI Runtime 2.5. We need 3.5 and a specific zocl kernel module (2.15), so use the provided one:
wget https://github.com/s59mz/kria-build-system/releases/download/wild-sight-ai-1.0/k-5.15-zocl-2.15-kria.wic.zip
- Burn the image onto an SD card (32 GB recommended) with Balena Etcher or a similar program.
- Extend the rootfs partition to the maximum available size with the gparted program.
- Boot the Kria KR260 board with this image.
- Use username: kria and password: kria for login.
Load Docker images
- Download the provided tar.7z Docker images onto the Kria board from here.
wget https://github.com/s59mz/wild-sight-ai/releases/download/1.0/kria-image_3.5.tar.7z
wget https://github.com/s59mz/wild-sight-ai/releases/download/1.0/wild-sight-ai_1.0.tar.7z
- Unzip both files
sudo apt update
sudo apt install p7zip-full
7z e kria-image_3.5.tar.7z
7z e wild-sight-ai_1.0.tar.7z
- Load both images into Docker on the Kria board. This takes some time.
docker load -i kria-image_3.5.tar
docker load -i wild-sight-ai_1.0.tar
- Check if both images were installed successfully:
docker images
Install Application
Clone the wild-sight-ai repository from GitHub:
git clone https://github.com/s59mz/wild-sight-ai
cd wild-sight-ai
Optional: rebuild docker images
This step is needed only if you want to rebuild the previously downloaded Docker images from scratch. Warning: the build process takes about 2 hours on the Kria board!
./build.sh
Install AI model
The provided megadetector.xmodel file is too big (~150 MB) to be included in the source code, so it should be downloaded separately and installed in the project directory:
cd model/megadetector
wget https://github.com/s59mz/wild-sight-ai/releases/download/1.0/megadetector.xmodel
Install FPGA firmware
cd ~/wild-sight-ai
# Install Firmware Binaries
cp fpga-firmware/firmware-kr260-wild-sight.deb /tmp
sudo apt install /tmp/firmware-kr260-wild-sight.deb
Build ROS2 Nodes
This step needs to be run only once, before launching the application.
# Launch the wild-sight-ai docker image
kria@localhost:~/wild-sight-ai$ ./run.sh
# Build the ROS2 nodes inside a docker container
root@xlnx-docker:~/ros2_ws# colcon build
Launch the Application
To start the application, these steps are needed every time you power up or restart the Kria board.
- Connect an IP camera that supports RTSP and set the output resolution to 1920x1080. Ensure both the camera and the Kria board are on the same local network, using a standard 1 Gbps Ethernet switch.
- Before launching the application, connect a high-resolution monitor to the DisplayPort of the Kria board. If your monitor has an HDMI input, use an Active DisplayPort to HDMI adapter (passive adapters will not work).
- Execute the following commands on the running Kria board:
# Load the FPGA firmware
sudo xmutil unloadapp
sudo xmutil loadapp kr260-wild-sight
sudo xmutil desktop_disable
# go to the Wild-Sight-AI app git repository (in case you rebooted the board)
cd ~/wild-sight-ai
# Launch the wild-sight-ai docker image
./run.sh
# Launch the app with your camera URL
./run_app.sh rtsp://192.168.1.11:554/stream1
# To Exit press Ctrl-C
Expected Output
You should see the camera’s captured images on the monitor connected to the board.
When an animal or a person is detected, blue or yellow boxes appear around them and track their movement, while the camera on the rotator follows the detected animal.
At the end of development, I recorded a short demonstration video to showcase Wild-Sight-AI in action on the Kria board. The system runs the quantized MegaDetector model in real time, draws bounding boxes around detected animals and humans, and raises a “Human-Wildlife Conflict Detected” alert when both an animal and a person appear in the same frame.
🎥 Watch the demo here:
In the video, the Kria board is connected to a rotating camera and an external display. First, the demo shows bounding boxes around animals in randomly chosen broadcast wildlife videos, including reptiles and even fast-moving predators. Later, the setup switches to the project’s own camera, which successfully tracks a plush toy as a simulated “animal” target. The warning light is activated when both an “animal” and a “human” are present, illustrating the conflict detection functionality.
Future Work
This is only the first milestone. The next steps are:
- Use a proper calibration dataset with wildlife images for better quantization.
- Retrain / fine-tune MegaDetector specifically for the target hardware.
- Upgrade pan/tilt hardware with faster servos and a custom adapter PCB.
- Add SpeciesNet classification to the pipeline for species identification.
- Field testing in real wildlife monitoring scenarios.
🔩 PCB Prototype (via NextPCB)
A simple but essential custom PCB will connect:
- Kria’s GPIO pins (via Pi header) to servo PWM lines
- A 5V step-down converter from 12V input
- Servo and camera connectors
This board replaces the earlier bulky Pelco-D motor controller, allowing faster tracking and better portability.
Conclusion
This project demonstrates the first working MegaDetector v5 running on the Kria DPU with ROS2 + GStreamer integration.
Even though performance (~9.0 FPS) and accuracy still need tuning, the hardest part has been solved:
- Migrated the Kria platform from Vitis AI 2.5 → 3.5.
- Built a new runtime environment from scratch.
- Rebuilt FPGA bitstream for compatibility.
- Patched YOLOv5 to quantize and compile successfully.
- Integrated everything into a ROS2 pipeline with live bounding box overlays.
Wild-Sight-AI proves that complex PyTorch models like MegaDetector can run on Kria SOMs with enough persistence. This is a key step toward autonomous, low-power wildlife monitoring at the edge.
* * *
Appendix: How to Compile MegaDetector (PyTorch → XMODEL)
Below is a clean, reproducible path to convert the MegaDetector v5 (PyTorch) model into a Vitis AI XMODEL you can deploy on the KR260 DPU. These steps match the toolchain used in this project and the quantization script yolo5s_quant.py provided in the yolov5 directory of the project repository.
Prerequisites
- A machine with Docker installed (x86_64 is fine).
- Vitis AI 3.5 Docker image (the official AMD image xilinx/vitis-ai-pytorch-cpu:latest).
- The MegaDetector v5 PyTorch checkpoint (md_v5a.0.1.pt, downloaded from here).
- A small directory of calibration images representative of your target scenes (the more varied the better; 100–500 images is a good start). Images can be obtained from WCS Camera Traps. Labels/annotations are not needed for calibration—just images.
- The KR260 DPU arch file (arch.json) that matches your bitstream (DPUCZDX8G B3136 for the KR260 Smartcam is common).
$ cat arch.json
{
"fingerprint":"0x101000016010406"
}
Prepare working directory
# get Vitis-AI 3.5
git clone https://github.com/Xilinx/Vitis-AI
cd Vitis-AI
# create workspace directories
mkdir -p workspace/{models,calibration_images/val/animals}
cd workspace
# Put your MegaDetector weights here
cp /path/to/md_v5a.0.1.pt models/
# Add some JPG/PNG calibration images here
cp -r /path/to/calib_images/* calibration_images/val/animals
Get YOLOv5, apply patches, and drop in the quantization script.
# Get YOLOv5
git clone https://github.com/ultralytics/yolov5.git
cd yolov5
# Checkout a tag for MegaDetector
git checkout c23a441c9df7ca9b1f275e8c8719c949269160d1
# apply YOLOv5 patches
cd models
cp path/to/wild-sight-ai/yolov5/patches/yolo.py .
cp path/to/wild-sight-ai/yolov5/patches/experimental.py .
# Put the provided quantization script from project repo to working directory
cd ../..
cp path/to/wild-sight-ai/yolo5s_quant.py .
The yolo5s_quant.py script (its core quantizer calls are sketched below):
- Loads md_v5a.0.1.pt via YOLOv5’s DetectMultiBackend.
- Runs the Vitis AI quantizer with a calibration pass.
- Runs a quick test/export pass to emit Model_int.xmodel.
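For orientation, the core of such a script boils down to the pytorch_nndct quantizer API. The following is a hedged, minimal sketch, not the full yolo5s_quant.py: model loading, the dummy input shape, and the calibration loop are simplified placeholder assumptions.
# Minimal sketch of the Vitis AI PyTorch quantization flow (assumptions:
# a loaded float model `model`, input size 256x448, and a calibration DataLoader
# that yields image batches).
import torch
from pytorch_nndct.apis import torch_quantizer

def quantize(model, calib_loader, quant_mode="calib", output_dir="quantize_result"):
    dummy = torch.randn(1, 3, 256, 448)  # NCHW input used to trace the graph
    quantizer = torch_quantizer(quant_mode, model, (dummy,), output_dir=output_dir)
    quant_model = quantizer.quant_model

    # Forward passes over calibration images to collect quantization statistics
    # (in "test" mode a single forward pass is enough before export).
    with torch.no_grad():
        for images in calib_loader:
            quant_model(images)

    if quant_mode == "calib":
        quantizer.export_quant_config()               # writes the quantization config
    else:
        quantizer.export_xmodel(deploy_check=False)   # emits Model_int.xmodel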
Place the script next to your models/ and calibration_images/ folders:
./workspace
├─ yolov5/
├─ yolo5s_quant.py
├─ models/
│ └─ md_v5a.0.1.pt
└─ calibration_images/val/animals
└─ *.jpg|*.png
Patches for the yolov5 repo
We need to change two important things, otherwise we cannot compile to XMODEL (a small illustrative sketch follows this list):
- In Detect.forward(), return the raw tensors directly, without using operations that are not supported on the DPU, such as split.
- Replace all SiLU activation functions with Hardswish(inplace=True), because the float SiLU is not supported by DPU IP v4.0.
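The actual patches are the yolo.py and experimental.py files shipped in the project repo. As a rough illustration of the second change only, SiLU modules could be swapped for Hardswish with a hypothetical helper like this (the shipped patch edits the model files directly instead):
# Hypothetical helper showing the activation swap; not the shipped patch.
import torch.nn as nn

def replace_silu_with_hardswish(model):
    for name, module in model.named_children():
        if isinstance(module, nn.SiLU):
            setattr(model, name, nn.Hardswish(inplace=True))
        else:
            replace_silu_with_hardswish(module)  # recurse into submodules
    return model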
Start the Vitis AI 3.5 Docker
cd path/to/Vitis-AI
./docker_run.sh xilinx/vitis-ai-pytorch-cpu:latest
# inside the running docker, install a missing tool needed by yolov5
pip install seaborn
# go to your working directory
cd workspace
Run quantization (Calib → Test/Export)
Calibration pass (uses your calibration images to collect statistics):
python3 yolo5s_quant.py \
--quant_mode calib \
--data_dir ./calibration_images \
--model_dir ./models/md_v5a.0.1.pt \
--subset_len 12 \
--batch_size 1
Test & export pass (emits the deployable XMODEL):
python3 yolo5s_quant.py \
--quant_mode test \
--data_dir ./calibration_images \
--model_dir ./models/md_v5a.0.1.pt \
--subset_len 1 \
--batch_size 1 \
--deploy
NOTE: The subset_len and batch_size must be set to 1 here so the script emits the deployable XMODEL.
If successful, you’ll see artifacts under quantize_result/, including:
Model_int.xmodel ← quantized graph (pre-compile)
Compile to target DPU (vai_c_xir)
Use your KR260-matching arch.json:
vai_c_xir \
-x quantize_result/Model_int.xmodel \
-a arch.json \
-o dpu_model \
-n megadetector
This produces:
dpu_model/
├─ megadetector.xmodel ← compiled for your DPU
├─ meta.json ← runner metadata
└─ md5sum.txt
NOTE: If you see a huge number of CPU-assigned ops (transposes, reshapes) or hundreds of tiny subgraphs, that’s usually a sign something prevented good fusion. The working configuration in this project uses Hardswish instead of SiLU to avoid CPU-only ops on DPUCZDX8G with VAI 3.5. Stick with the provided pipeline unless you have a reason (and the time) to retune the graph.
**************************************************
* VITIS_AI Compilation - Xilinx Inc.
**************************************************
[UNILOG][INFO] Compile mode: dpu
[UNILOG][INFO] Debug mode: null
[UNILOG][INFO] Target architecture: DPUCZDX8G_ISA1_B3136_0101000016010406
[UNILOG][INFO] Graph name: Model, with op num: 1222
[UNILOG][INFO] Begin to compile...
[UNILOG][INFO] Total device subgraph number 6, DPU subgraph number 1
[UNILOG][INFO] Compile done.
[UNILOG][INFO] The meta json is saved to "/workspace/Torch/dpu_model/meta.json"
[UNILOG][INFO] The compiled xmodel is saved to "/workspace/Torch/dpu_model/megadetector.xmodel"
[UNILOG][INFO] The compiled xmodel's md5sum is fb1e8be78645af8252e32f72ea0ceab0, and has been saved to "/workspace/Torch/dpu_model/md5sum.txt"
vitis-ai-user@
Copy to the target & sanity check
On the KR260 (host or over SSH), place the model where your app expects it. For this wild-sight-ai project:
scp dpu_model/megadetector.xmodel kr260:/home/kria/wild-sight-ai/model/megadetector/
Quick benchmark
On the Kria board, inside the running container, with the FPGA firmware already loaded:
xdputil benchmark /opt/xilinx/kr260-wild-sight/share/vitis_ai_library/models/megadetector/megadetector.xmodel 1
This produces 9.04 FPS (for 448×256 with no further optimizations):
kria@localhost:~/wild-sight-ai$ sudo xmutil unloadapp
[sudo] password for kria:
remove from slot 0 returns: 0 (Ok)
kria@localhost:~/wild-sight-ai$ sudo xmutil loadapp kr260-wild-sight
kr260-wild-sight: loaded to slot 0
kria@localhost:~/wild-sight-ai$ ./run.sh
======================================================================
__ ___ _ _ _____ _ _ _ _____
\ \ / (_) | | | / ____(_) | | | | /\ |_ _|
\ \ /\ / / _| | __| | | (___ _ __ _| |__ | |_ / \ | |
\ \/ \/ / | | |/ _. | \___ \| |/ _. | ._ \| __| / /\ \ | |
\ /\ / | | | (_| | ____) | | (_| | | | | |_ / ____ \ _| |_
\/ \/ |_|_|\__,_| |_____/|_|\__, |_| |_|\__| /_/ \_\_____|
__/ |
|___/
======================================================================
Build Date: 2025/09/12 21:28
*** Welcome to Wildi-Sight-AI! Type "./run_app.sh" to start.
root@xlnx-docker:/opt/xilinx/kr260-wild-sight/share/vitis_ai_library/models/megadetector# xdputil benchmark megadetector.xmodel 1
WARNING: Logging before InitGoogleLogging() is written to STDERR
I20250926 23:06:27.466972 37 test_dpu_runner_mt.cpp:477] shuffle results for batch...
I20250926 23:06:27.469184 37 performance_test.hpp:73] 0% ...
I20250926 23:06:33.469415 37 performance_test.hpp:76] 10% ...
I20250926 23:06:39.469604 37 performance_test.hpp:76] 20% ...
I20250926 23:06:45.469782 37 performance_test.hpp:76] 30% ...
I20250926 23:06:51.469956 37 performance_test.hpp:76] 40% ...
I20250926 23:06:57.470139 37 performance_test.hpp:76] 50% ...
I20250926 23:07:03.470347 37 performance_test.hpp:76] 60% ...
I20250926 23:07:09.470520 37 performance_test.hpp:76] 70% ...
I20250926 23:07:15.470737 37 performance_test.hpp:76] 80% ...
I20250926 23:07:21.470952 37 performance_test.hpp:76] 90% ...
I20250926 23:07:27.471176 37 performance_test.hpp:76] 100% ...
I20250926 23:07:27.471251 37 performance_test.hpp:79] stop and waiting for all threads terminated....
I20250926 23:07:27.535609 37 performance_test.hpp:85] thread-0 processes 543 frames
I20250926 23:07:27.535678 37 performance_test.hpp:93] it takes 64405 us for shutdown
I20250926 23:07:27.535703 37 performance_test.hpp:94] FPS= 9.03998 number_of_frames= 543 time= 60.0665 seconds.
I20250926 23:07:27.535754 37 performance_test.hpp:96] BYEBYE
Test PASS.
root@xlnx-docker:~/ros2_ws#
Some Troubleshooting Tips (learned while making this project):
- Too many CPU ops in the compile log: Typically caused by unsupported activations and layout dances. Using Hardswish(inplace=True) instead of SiLU or LeakyReLU got us a clean, fast graph.
- Garbage boxes: Ensure you’re applying the official YOLOv5 decode math (sigmoid + 2x trick for xy/wh + correct grid/stride + per-class/objectness merge) before NMS. The quant script’s validation path can help catch decode bugs.
- Calibration quality: Use diverse, representative images. If boxes are consistently off, increase calibration set (e.g., 300–500 frames) and re-run calib → test.
To quantize the MegaDetector model into an .xmodel, a representative dataset of calibration images is required. For this project, I built a helper toolkit that downloads random images from the WCS dataset, optionally crops out animals using the provided bounding boxes, and selects random subsets for calibration. The prepared calibration_images/ directory is then used directly by the quantization script.
👉 Full step-by-step instructions and scripts are available in the project dataset GitHub repository.
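As a rough idea of the subset-selection step only (the real toolkit lives in the dataset repo; the paths and counts below are placeholder assumptions):
# Hypothetical sketch: pick a random subset of downloaded WCS images for calibration.
import random
import shutil
from pathlib import Path

def make_calibration_subset(src_dir, dst_dir, count=1000, seed=42):
    images = sorted(Path(src_dir).glob("*.jpg"))
    random.seed(seed)                      # reproducible subset
    subset = random.sample(images, min(count, len(images)))
    out = Path(dst_dir)
    out.mkdir(parents=True, exist_ok=True)
    for img in subset:
        shutil.copy(img, out / img.name)

# Example:
# make_calibration_subset("wcs_downloads", "calibration_images/val/animals", count=1000)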
Appendix: YOLOv5-P6 Output Decoding on CPU
In the original YOLOv5 implementation, the Detect.forward() layer already performs the grid/anchor decoding and applies sigmoid activations before returning final predictions. However, many of these operations (view, permute, dynamic grid generation, etc.) are not supported by the DPU compiler.
To make the model deployable on the Kria board, we stripped Detect.forward() down to only the convolution layers, leaving the raw tensor outputs (anchors × 8 values per grid cell) on the DPU. This means the model no longer produces bounding boxes directly — instead, we must implement the entire YOLOv5 decoding pipeline on the CPU.
The following section explains how this decoder works and how the raw DPU outputs are transformed back into bounding boxes, scores, and classes.
YOLOv5-P6 models (like MegaDetector v5a) use four detection “heads,” corresponding to strides 8, 16, 32, and 64. Each head produces an H×W×(3×(5+num_classes)) tensor, where:
- 3 = number of anchors per cell
- 5 = (tx, ty, tw, th, to) → box center offsets, box size, and objectness logit
- num_classes = raw logits for each class
For MegaDetector (humans, animals, vehicles), num_classes = 3, so each anchor has 8 values, giving C = 24 channels per cell.
Anchor and stride definitions
Each head is decoded with a fixed stride and corresponding anchor set:
- P3 (stride 8): grid ≈ W/8 × H/8
- P4 (stride 16): grid ≈ W/16 × H/16
- P5 (stride 32): grid ≈ W/32 × H/32
- P6 (stride 64): grid ≈ W/64 × H/64
Anchors are defined in pixels of the input resolution, not grid units.
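As a quick sanity check on the shapes, assuming the 448×256 input used in this project:
# Expected raw output shape per detection head for a 448x256 (WxH) input,
# with 3 anchors and 3 classes (8 values per anchor -> 24 channels per cell).
W, H = 448, 256
num_anchors, num_classes = 3, 3
channels = num_anchors * (5 + num_classes)   # 24

for stride in (8, 16, 32, 64):
    gw, gh = W // stride, H // stride
    print(f"stride {stride}: grid {gw}x{gh}, tensor {gh}x{gw}x{channels}")
# stride 8: grid 56x32, tensor 32x56x24
# stride 16: grid 28x16, tensor 16x28x24
# stride 32: grid 14x8, tensor 8x14x24
# stride 64: grid 7x4, tensor 4x7x24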
Decoding procedure
For each grid cell (x, y) and anchor a, the 8 values are decoded:
tx, ty, tw, th, to, class_logits…
Steps:
1. Objectness + class score
obj = sigmoid(to)
cls = sigmoid(max(class_logits))
prob = obj * cls
Predictions below a confidence threshold are skipped.
2. Center coordinates
cx = (sigmoid(tx) * 2 - 0.5 + x) * stride
cy = (sigmoid(ty) * 2 - 0.5 + y) * stride
3. Box dimensions
bw = (sigmoid(tw) * 2)^2 * anchor_w
bh = (sigmoid(th) * 2)^2 * anchor_h
4. Box corners
x1 = cx - bw/2
y1 = cy - bh/2
x2 = cx + bw/2
y2 = cy + bh/2
5. NMS (Non-Max Suppression)
After all boxes are decoded, NMS removes duplicates using IoU thresholds.
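Putting steps 1–4 together, a minimal NumPy sketch of decoding a single head could look like the following. It assumes an NHWC float tensor of shape (H, W, 3×(5+num_classes)), with the per-anchor values contiguous along the channel axis, and leaves NMS to a library or to the existing C++ code; it is an illustration of the math above, not the project's handle_tensorbuf.cpp implementation.
# Minimal NumPy sketch of decoding one YOLOv5 head from a raw NHWC tensor.
# Assumptions: `out` has shape (H, W, 3*(5+nc)), anchors are in input-resolution
# pixels, and NMS is applied afterwards (not shown here).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_head(out, anchors, stride, num_classes=3, conf_thresh=0.25):
    h, w, _ = out.shape
    out = out.reshape(h, w, len(anchors), 5 + num_classes)
    boxes = []
    for y in range(h):
        for x in range(w):
            for a, (aw, ah) in enumerate(anchors):
                tx, ty, tw, th, to = out[y, x, a, :5]
                cls_logits = out[y, x, a, 5:]
                obj = sigmoid(to)
                cls = sigmoid(cls_logits.max())
                prob = obj * cls
                if prob < conf_thresh:
                    continue  # step 1: skip low-confidence predictions
                cx = (sigmoid(tx) * 2 - 0.5 + x) * stride        # step 2
                cy = (sigmoid(ty) * 2 - 0.5 + y) * stride
                bw = (sigmoid(tw) * 2) ** 2 * aw                 # step 3
                bh = (sigmoid(th) * 2) ** 2 * ah
                boxes.append((cx - bw / 2, cy - bh / 2,          # step 4
                              cx + bw / 2, cy + bh / 2,
                              prob, int(cls_logits.argmax())))
    return boxes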
Special considerations on Kria
- Vitis AI delivers NHWC float tensors, so decoding loops treat channel index as the innermost stride.
- The number of heads used depends on the input size. At smaller resolutions (e.g., 448×256), only the first two heads (stride 8 & 16) are meaningful.
- Confidence and NMS thresholds must be tuned; higher thresholds reduce false positives but may miss smaller objects.
- If the input is letterboxed to match the model’s stride requirements, un-letterboxing (or VVAS MetaAffineFixer) is needed to map predictions back to the original video resolution (a small coordinate-mapping sketch follows this list).
- The NHWC decoder is implemented in the handle_tensorbuf.cpp file.
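As a rough illustration of the un-letterboxing math, assuming symmetric padding and uniform scaling (the project’s pipeline may instead rely on VVAS MetaAffineFixer for this):
# Hypothetical helper mapping a box from letterboxed model coordinates back to
# the original frame. Assumes symmetric padding and uniform scaling.
def unletterbox(box, model_wh, frame_wh):
    mw, mh = model_wh          # e.g. (448, 256)
    fw, fh = frame_wh          # e.g. (1920, 1080)
    scale = min(mw / fw, mh / fh)
    pad_x = (mw - fw * scale) / 2
    pad_y = (mh - fh * scale) / 2
    x1, y1, x2, y2 = box
    return ((x1 - pad_x) / scale, (y1 - pad_y) / scale,
            (x2 - pad_x) / scale, (y2 - pad_y) / scale)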
To make Wild-Sight-AI reproducible and portable, the project uses two Docker images:
Kria Runtime Image (kria-image:3.5):
- Based on Ubuntu 22.04 with Vitis AI Runtime 3.5 and VVAS 3.0.
- Includes required dependencies: XRT 2.15, OpenCV 4.6, Protobuf 3.21.3, GStreamer plugins, and VVAS libraries.
- Provides the execution environment for AI workloads on the KR260 DPU.
Wild-Sight-AI Application Image (wild-sight-ai:1.0):
- Based on the Kria Runtime image.
- Adds ROS 2 Humble, the Wild-Sight-AI application code, and all necessary build tools.
- This is the only image you need to run the final application.
Build & Run
A helper script, build.sh, is provided in the repository root. It automatically:
- Builds the runtime image (if not already present).
- Builds the application image on top.
- Prints a success message once ready.
To run the application, simply use:
./build.sh
./run.sh
The run.sh script launches the ROS 2 node with GStreamer pipelines, publishes detections, and overlays bounding boxes on the output video stream.
More Details
For a detailed breakdown of the Dockerfiles, build process, dependencies, and troubleshooting tips, see the full Dockerfiles README.
Appendix: Building a Custom Wild-Sight-AI SD Card Image
For this project I needed a custom Ubuntu-based SD card image for the Kria KR260 board. The default Xilinx images are a good starting point, but they don’t always include the right kernel modules, the correct ZOCL driver version for Vitis AI, or convenient extras like Docker. To make the Wild-Sight-AI pipeline reproducible and portable, I automated the entire image creation process using my Kria Build System:
👉 GitHub repo: s59mz/kria-build-system
👉 Wild-Sight-AI release: wild-sight-ai-1.0
This repo provides scripts that run inside a Docker build environment, fetch the official Ubuntu rootfs and kernel packages, patch the image with the right modules, and finally configure the system for Wild-Sight-AI deployment.
Why a custom image?
- The stock KR260 Ubuntu image comes with ZOCL v2.13, but Vitis AI 3.5 requires ZOCL v2.15.
- We also need Docker preinstalled for deployment flexibility.
- It’s handy to preconfigure a user account, SSH access, networking, and timezone so the image is plug-and-play.
Scripts overview
The main repo (kria-build-system) drives the process, but the Wild-Sight-AI release adds three important scripts:
1. install-modules.sh
This script ensures the SD card image has the correct kernel modules installed. It:
- Downloads the right kernel modules package (5.15.0-1053-xilinx-zynqmp).
- Extracts and installs them into /lib/modules.
- Replaces the stock ZOCL v2.13 module with ZOCL v2.15, either from a prebuilt ZIP or by building from XRT sources.
- Runs depmod to update module dependencies.
# remove the old zocl module v2.13
mv $ROOTFS_DIR/lib/modules/$KERNEL_VER/kernel/drivers/gpu/drm/zocl/zocl.ko \
$ROOTFS_DIR/lib/modules/$KERNEL_VER/kernel/drivers/gpu/drm/zocl/zocl.ko.distro
# replace it with a new precompiled zocl v2.15
unzip /root/modules/zocl-2.15.zip -d $ROOTFS_DIR/lib/modules/$KERNEL_VER/kernel/drivers/gpu/drm/zocl/
2. install-docker.sh
Adds Docker support directly into the root filesystem:
- Installs Docker CE, CLI, containerd, and plugins (docker-compose, buildx).
- Configures the official Docker apt repository inside the chroot.
- Ensures Docker is ready to use on first boot.
apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
3. post-config.sh
Performs the final system setup:
- Enables root SSH login.
- Creates a kria user (password: kria) with sudo and docker group membership.
- Configures networking (eth0 via DHCP).
- Sets the timezone (Ljubljana in my case).
- Enables the systemd-networkd and systemd-resolved services for reliable networking.
# Create kria user and add to docker group
chroot /mnt/rootfs useradd -m -s /bin/bash kria
echo "kria:kria" | chroot /mnt/rootfs chpasswd
chroot /mnt/rootfs usermod -a -G docker kria
How to use it
1. Clone the build system repo:
git clone https://github.com/s59mz/kria-build-system.git
cd kria-build-system
2. Check out the Wild-Sight-AI release branch:
git checkout tags/wild-sight-ai-1.0
3. Build the SD card image using Docker:
./build.sh
4. The final .wic file will appear in the output/ directory. Flash it to an SD card using dd or Balena Etcher, resize the ext4 partition to the maximum space available on the SD card, insert it into the KR260, and power up.
There’s a whole project series available on Hackster.io about how to build custom Kria SD card images.
Results
With this custom image:
- The right ZOCL module (v2.15) is in place for Vitis AI 3.5.
- Docker works out of the box.
- The kria user can SSH in and run containers without extra setup.
- The Wild-Sight-AI pipeline runs immediately without manual system tweaks.
This way, anyone can clone the repo, build the image, and get an identical runtime environment for Wild-Sight-AI on their own Kria board. 🚀