When visiting nature, especially protected nature parks abundant in wildlife, birds are much more easily heard than seen. As a result, the average visitor gets little use out of written descriptions and images.
A bird's most prominent feature is its sound, so identification based on audio is much more practical here. If visitors can hear and correctly attribute the birds' songs and other signature sounds, they can have a more informed and immersive experience of their visit.
Machine learning for bird classification
We used the Edge Impulse platform to train a model on audio samples of 7 different bird species from Goričko Nature Park. We chose the species most significant to the local cultural landscape, many of which are also endangered.
We used xeno-canto's extensive sound library to find clear and distinctive audio clips for each of the 7 species. We also uploaded noisy audio clips that did not contain any bird sounds to account for instances where the bird sound is either not present or the user's environment is too noisy to identify it.
While the model's accuracy did not quite reach the percentages we had hoped for, it still proved effective at recognizing the birds' distinctive sounds under reasonably good conditions.
Since other modes of birdwatching and bird classification often tie the user to specific hardware or physical objects, we wanted to make our app as widely accessible and usable as possible. It is built with React, Tailwind, and shadcn/ui components, so it adapts to any screen or device without issues.
To make our app as lightweight as possible and enable real-time machine learning classification directly in the browser, we deployed our Edge Impulse model using their WebAssembly (WASM) runtime. WebAssembly allows running highly efficient, compiled code within modern browsers, which makes it possible to execute models on sensor data without relying on a server or backend.
This approach provides several benefits:
- Privacy: All sensor data processing and inference are performed locally in the user’s browser, so sensitive data is never sent to external servers.
- Low Latency: Predictions are computed instantly on-device, resulting in a seamless, responsive user experience.
- Cross-Platform: WebAssembly runs in all major browsers, making our solution easily accessible across different operating systems and devices.
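Once the WASM module returns its per-class confidences, the app still has to decide which species (if any) to show the user. The sketch below illustrates that post-processing step; the `label`/`value` result shape and the 0.6 confidence cutoff are assumptions for illustration, not values taken from our deployment.

```typescript
// Shape of one prediction as produced by an audio classifier.
// (Assumed field names; adjust to the actual classifier output.)
interface Prediction {
  label: string;
  value: number; // confidence in [0, 1]
}

// Pick the most confident bird species, falling back to "noise"
// when no species prediction clears the (hypothetical) threshold.
function topSpecies(results: Prediction[], threshold = 0.6): string {
  let best: Prediction = { label: "noise", value: 0 };
  for (const r of results) {
    if (r.label !== "noise" && r.value > best.value) best = r;
  }
  return best.value >= threshold ? best.label : "noise";
}
```

Thresholding like this is what lets the noisy "no bird" clips in the training set pay off: uncertain predictions collapse to the noise class instead of mislabeling a species.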
Our app is therefore frontend-only, with all logic running on the client side. It is available online and only requires access to a microphone, which most modern mobile devices have. The user interface is also designed with mobile phones in mind and remains pleasant and efficient even on smaller screens. We also implemented both day and night modes for accessibility.
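One practical detail of working with the browser microphone is that `getUserMedia`/`AudioContext` deliver Float32 samples at the device's capture rate (often 44.1 or 48 kHz), while compact audio models typically expect a lower, fixed rate such as 16 kHz. A minimal resampling sketch, assuming a 16 kHz model input (the target rate here is an assumption, not a figure from our deployment):

```typescript
// Downsample Float32 PCM from the browser capture rate to the rate
// the model expects, using linear interpolation between samples.
// A production app might apply a low-pass filter first to reduce
// aliasing; this sketch stays dependency-free.
function resample(
  input: Float32Array,
  fromRate: number,
  toRate: number,
): Float32Array {
  if (fromRate === toRate) return input;
  const ratio = fromRate / toRate;
  const outLength = Math.floor(input.length / ratio);
  const out = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const pos = i * ratio; // fractional position in the input
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, input.length - 1);
    const frac = pos - i0;
    out[i] = input[i0] * (1 - frac) + input[i1] * frac;
  }
  return out;
}
```

Keeping this conversion on-device is part of what makes the frontend-only design work: raw audio never needs to leave the browser.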
After an identification, or while browsing the gallery, the user can read additional information about a bird, view its picture, and listen to an example of its typical sound, either as another reference or to compare it with what they heard.
While we only focused on Goričko Nature Park for our prototype, we believe this concept would expand nicely to other parks and areas.
The model can easily be trained on a much larger dataset, and its current exported size takes up only 6.7MB of disk space. Since Edge Impulse is optimised for edge devices with little storage available, it is highly unlikely that even a model covering 100 or more species would take up more than 20MB, keeping it lightweight and quick.