This tutorial explains the VART ADAS Detection example from https://github.com/Xilinx/Vitis-AI/tree/master/examples/vai_runtime/adas_detection.
This Vitis AI Runtime ADAS detection application ships with a C++ inference program that uses a multi-threading approach to achieve higher performance. The application has been part of Vitis AI since its very early versions (it also existed back in DeePhi's DNNDK, if you remember DNNDK), and it can be built and run on any MPSoC with the Vitis AI runtime installed. It can be easily tested on MPSoC evaluation boards such as the KV260, ZCU102, Ultra96-V2, and ZCU104, as well as many other EVKs or custom boards.
Thanks to its C++ implementation and optimized coding approach, this application performs well on any MPSoC board. In addition, the YOLOv3 network model used with the inference program is a pruned one, 'yolov3_adas_pruned_0_9.xmodel', so it delivers strong performance even on boards with a smaller DPU hardware architecture.
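To give a feel for the multi-threading approach, here is a minimal C++ sketch of how such an inference flow can be organized with VART: each worker thread owns its own DPU runner created from the compiled xmodel, so several frames can be in flight at once. The thread count is illustrative, and tensor-buffer preparation and YOLO post-processing are elided (the repository's common code handles those details); treat this as a sketch of the pattern, not the application's exact code.

// Sketch: one VART DPU runner per worker thread (pattern only).
#include <memory>
#include <string>
#include <thread>
#include <vector>

#include <vart/runner.hpp>
#include <xir/graph/graph.hpp>

// Locate the DPU subgraph inside the compiled xmodel.
static const xir::Subgraph* get_dpu_subgraph(const xir::Graph* graph) {
  for (auto* s : graph->get_root_subgraph()->children_topological_sort()) {
    if (s->has_attr("device") && s->get_attr<std::string>("device") == "DPU") {
      return s;
    }
  }
  return nullptr;
}

int main() {
  auto graph = xir::Graph::deserialize("yolov3_adas_pruned_0_9.xmodel");
  const auto* dpu = get_dpu_subgraph(graph.get());

  const int num_threads = 4;  // illustrative; tune for the board's DPU
  std::vector<std::thread> workers;
  for (int i = 0; i < num_threads; ++i) {
    workers.emplace_back([dpu] {
      // Each thread owns its runner so inferences overlap on the DPU.
      auto runner = vart::Runner::create_runner(dpu, "run");
      // Per frame: fill the input tensor buffers, then
      //   auto job = runner->execute_async(inputs, outputs);
      //   runner->wait(job.first, -1);
      // and run YOLO-v3 post-processing on the outputs.
    });
  }
  for (auto& t : workers) t.join();
  return 0;
}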
1. Introduction
In the realm of automotive innovation, Advanced Driver Assistance Systems (ADAS) stand as a beacon of safety and convenience. These systems amalgamate state-of-the-art sensors, cameras, and AI algorithms to furnish vehicles with heightened perceptiveness and responsive capabilities. By continually scanning the surroundings, analyzing road conditions, and alerting drivers to potential dangers, ADAS systems empower safer and more confident driving experiences. They encompass an array of functionalities including collision warning, lane departure detection, adaptive cruise control, blind spot monitoring, and parking assistance, collectively reshaping the landscape of road safety.
In alignment with the paradigm of ADAS advancement, this project sets out to implement a pivotal component: object detection. Specifically tailored for real-time applications, the project leverages the YOLO-v3 (You Only Look Once) object detection algorithm through the Vitis AI 3.0 framework from Xilinx. This strategic integration enables the swift identification of diverse objects such as cars, pedestrians, and cyclists from dynamic video streams or image sequences. By harnessing the computational efficiency and precision of YOLO-v3 within the robust infrastructure of Vitis AI, the project not only bolsters ADAS capabilities but also signifies a stride towards a safer and more intelligent automotive future.
2. Algorithm Overview
1. YOLO-v3 Object Detection Algorithm:
- YOLO (You Only Look Once) is a state-of-the-art deep learning algorithm renowned for its efficiency and accuracy in real-time object detection tasks.
- YOLO-v3 is the third iteration of the YOLO algorithm, introducing several architectural improvements over its predecessors.
- Key features of YOLO-v3 include a feature pyramid network, which enables the detection of objects at multiple scales, and a prediction mechanism that utilizes a single neural network to predict bounding boxes and class probabilities directly from full images in a single evaluation.
- YOLO-v3 divides the input image into a grid and predicts bounding boxes and their associated class probabilities for each grid cell, resulting in a highly efficient and parallelizable approach to object detection (see the box-decoding sketch after this overview).
2. Model Architecture:
- The YOLO-v3 model architecture consists of a backbone convolutional neural network (CNN) followed by detection layers.
- The backbone CNN, typically based on architectures like Darknet or ResNet, extracts features from the input image.
- Detection layers are responsible for predicting bounding boxes, confidence scores, and class probabilities for objects detected in the input image.
- YOLO-v3 employs a multi-scale detection strategy, enabling it to detect objects of varying sizes and aspect ratios within the same image.
3. Training and Inference:
- During training, the YOLO-v3 model is trained on annotated datasets using techniques like gradient descent and backpropagation to optimize its parameters for object detection.
- During inference, the trained model is deployed to detect objects in real time on video streams or on static images.
- The model takes an input image and processes it through the backbone CNN to extract features.
- These features are then passed through detection layers to predict bounding boxes, confidence scores, and class probabilities for detected objects.
- Post-processing techniques such as non-maximum suppression (NMS) are applied to filter out redundant bounding boxes and refine the final set of detected objects.
4. Integration with Vitis AI:
- The YOLO-v3 model is integrated into the Vitis AI 3.0 framework from Xilinx for deployment on Xilinx hardware platforms.
- Vitis AI provides tools and libraries for optimizing and deploying deep learning models on Xilinx FPGAs and SoCs, enabling accelerated inference for real-time applications.
- By leveraging Vitis AI, the YOLO-v3 model can exploit the parallel processing capabilities of Xilinx hardware, resulting in efficient and high-performance object detection for ADAS applications.
Overall, the YOLO-v3 algorithm, combined with the Vitis AI framework, offers a powerful solution for real-time object detection in ADAS systems, providing enhanced safety and situational awareness on the road.
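To make the per-cell prediction concrete, below is a short sketch of the standard YOLO-v3 box decoding for one anchor at one grid cell, following the formulation in the YOLO-v3 paper (the box centre is sigmoid(tx) + cx scaled by the stride, and the box size grows from the anchor prior). The function and variable names are illustrative, not the example application's actual code.

// Sketch: standard YOLO-v3 box decoding for one anchor at one grid
// cell (names are illustrative, not the example's actual code).
#include <cmath>

struct Box { float x, y, w, h, score; };

static float sigmoid(float v) { return 1.0f / (1.0f + std::exp(-v)); }

// (tx, ty, tw, th, to): raw network outputs for this anchor;
// (cx, cy): grid-cell indices; (pw, ph): anchor prior in pixels;
// stride: ratio of input size to grid size (e.g., 32, 16, or 8).
Box decode_box(float tx, float ty, float tw, float th, float to,
               int cx, int cy, float pw, float ph, float stride) {
  Box b;
  b.x = (sigmoid(tx) + cx) * stride;  // box centre in input pixels
  b.y = (sigmoid(ty) + cy) * stride;
  b.w = pw * std::exp(tw);            // size scales the anchor prior
  b.h = ph * std::exp(th);
  b.score = sigmoid(to);              // objectness confidence
  return b;
}

Per-class scores are obtained the same way, by applying the sigmoid to each class logit and combining it with the objectness score.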
2.1. YOLO-v3 Object Detection Algorithm
The YOLO-v3 algorithm is a real-time object detection system that divides the input image into a grid of cells. For each cell, the algorithm predicts bounding boxes and confidence scores for those boxes. The confidence scores represent the probability that an object exists within the bounding box and the accuracy of the predicted bounding box. Additionally, the algorithm predicts the probability of each class (e.g., car, person, cyclist) for each bounding box.
Steps:
1. Divide the Input Image:
- The algorithm divides the input image into an S × S grid of cells (a worked example of the grid math follows these steps).
2. Prediction for Each Cell:
For each cell in the grid, the algorithm predicts:
- B bounding boxes: These bounding boxes represent the potential locations of objects within the cell.
- Confidence scores: These scores indicate the likelihood that an object exists within each bounding box, along with the accuracy of the prediction.
- Class probabilities: These probabilities represent the likelihood of each class (e.g., car, person, cyclist) being present within each bounding box.
3. Non-Maximum Suppression (NMS):
- To filter out redundant bounding boxes and ensure only the most confident detections are retained, the algorithm applies non-maximum suppression (NMS). This process removes overlapping bounding boxes and selects the ones with the highest confidence scores (a concrete NMS sketch follows the pseudocode below).
4. Output Generation:
- The final output consists of the filtered bounding boxes, along with the associated class predictions and confidence scores.
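As a worked example of the grid math, consider the stock YOLO-v3 configuration from the paper (illustrative, not specific to the pruned ADAS model): a 416 × 416 input is predicted at strides 32, 16, and 8, giving 13 × 13, 26 × 26, and 52 × 52 grids with three anchors each.

// Worked example: raw prediction count for a 416x416 input in stock
// YOLO-v3 (strides 32, 16, 8; three anchor boxes per scale).
#include <iostream>

int main() {
  const int input_size = 416;
  const int strides[] = {32, 16, 8};
  const int anchors_per_scale = 3;
  int total_boxes = 0;
  for (int stride : strides) {
    int grid = input_size / stride;   // 13, 26, 52
    total_boxes += grid * grid * anchors_per_scale;
  }
  std::cout << total_boxes << "\n";   // prints 10647 boxes before NMS
  return 0;
}

NMS then reduces these thousands of candidate boxes to a handful of confident detections.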
2.2. Pseudocode
function YOLO(image):
    grid_cells = divide_into_grid(image, S)
    bounding_boxes = []
    for cell in grid_cells:
        for anchor in anchors:    # B anchors per cell
            box = predict_bounding_box(cell, anchor)
            confidence = predict_confidence(cell, anchor)
            class_probabilities = predict_class_probabilities(cell, anchor)
            bounding_boxes.append((box, confidence, class_probabilities))
    filtered_boxes = non_max_suppression(bounding_boxes)
    return filtered_boxes
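The non_max_suppression step above can be made concrete. Here is a minimal, single-class C++ sketch using greedy IoU-based suppression; the box format, struct names, and the 0.45 IoU threshold are illustrative assumptions rather than the example application's exact code.

// Sketch: greedy single-class non-maximum suppression.
#include <algorithm>
#include <vector>

struct Det { float x, y, w, h, score; };  // centre-format box

// Intersection-over-union of two centre-format boxes.
static float iou(const Det& a, const Det& b) {
  float ax1 = a.x - a.w / 2, ay1 = a.y - a.h / 2;
  float ax2 = a.x + a.w / 2, ay2 = a.y + a.h / 2;
  float bx1 = b.x - b.w / 2, by1 = b.y - b.h / 2;
  float bx2 = b.x + b.w / 2, by2 = b.y + b.h / 2;
  float iw = std::max(0.0f, std::min(ax2, bx2) - std::max(ax1, bx1));
  float ih = std::max(0.0f, std::min(ay2, by2) - std::max(ay1, by1));
  float inter = iw * ih;
  float uni = a.w * a.h + b.w * b.h - inter;
  return uni > 0.0f ? inter / uni : 0.0f;
}

// Keep the highest-scoring boxes, dropping any box that overlaps an
// already-kept box by more than iou_thresh.
std::vector<Det> non_max_suppression(std::vector<Det> dets,
                                     float iou_thresh = 0.45f) {
  std::sort(dets.begin(), dets.end(),
            [](const Det& a, const Det& b) { return a.score > b.score; });
  std::vector<Det> kept;
  for (const auto& d : dets) {
    bool suppressed = false;
    for (const auto& k : kept) {
      if (iou(d, k) > iou_thresh) { suppressed = true; break; }
    }
    if (!suppressed) kept.push_back(d);
  }
  return kept;
}

In multi-class YOLO post-processing, this suppression is typically run per class, so that, say, a car box does not suppress an overlapping pedestrian box.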
2.3. Flowchart
+------------------------+
|      Input Image       |
+------------------------+
            |
            v
+------------------------+
|   Divide into S x S    |
|       Grid Cells       |
+------------------------+
            |
            v
+------------------------+
|    Predict Bounding    |
|  Boxes, Confidences,   |
|    and Class Probs     |
+------------------------+
            |
            v
+------------------------+
|  Non-Max Suppression   |
+------------------------+
            |
            v
+------------------------+
|    Output Bounding     |
|    Boxes and Labels    |
+------------------------+
This pseudocode and flowchart illustrate the key steps of the YOLO-v3 algorithm: dividing the input image into grid cells; predicting bounding boxes, confidences, and class probabilities for each cell; aggregating the predictions; applying non-maximum suppression; and finally outputting the filtered bounding boxes.
3. Implementation Details
For the "Implementation Details" section and its explanation, please go through the ReadMe document on GitHub, which is easier to walk through.
Thanks for going through this Hackster.io tutorial and its extended GitHub ReadMe document.
Kudos to Jinu Nyachhyon for creating this detailed document!