In an office or a university auditorium, it is distracting when a presenter keeps walking over to the dais to switch slides one after another.
In our university presentations, it is considered obligatory to designate a person to manage the display and switch slides. I therefore decided to solve the problem with "Keyword Spotting".
Whether in the workplace or at university, the presenter has to tell someone to start the presentation, move forward, or go to the next slide. Voice control makes this process more intuitive: the presenter can simply say “Start the presentation” to begin. Then, when the presentation ends, the presenter can say “Stop the video” to turn attention away from the display and onto the discussion.
Idea
How Does It Work?
By collecting data and training a model with a neural network (NN) classifier built in Edge Impulse Studio, I can classify keywords spoken by different people. The device I used has a small memory footprint and low computational cost, along with high precision.
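As a rough illustration of how classifier output could be turned into slide commands, here is a minimal Python sketch. It is not the project's actual code: the label names (`start`, `next`, `stop`), the action names, and the 0.8 confidence threshold are all assumptions made for the example. It only assumes that the classifier returns one score per label.

```python
from typing import Dict, Optional

# Hypothetical mapping from keyword labels to presentation actions.
# Both the label names and the action names are assumptions.
ACTIONS: Dict[str, str] = {
    "start": "START_PRESENTATION",
    "next": "NEXT_SLIDE",
    "stop": "STOP_PRESENTATION",
}


def pick_action(scores: Dict[str, float], threshold: float = 0.8) -> Optional[str]:
    """Return the action for the most probable keyword.

    Returns None when there are no scores, when the best score is below
    the threshold, or when the best label (e.g. background noise) has no
    action assigned to it.
    """
    if not scores:
        return None
    # Pick the label with the highest score.
    label, score = max(scores.items(), key=lambda kv: kv[1])
    if score < threshold:
        return None
    return ACTIONS.get(label)


if __name__ == "__main__":
    # Example scores as they might come from one classification window.
    print(pick_action({"start": 0.91, "next": 0.04, "stop": 0.02, "noise": 0.03}))
```

Ignoring low-confidence results keeps slides from changing on ordinary speech that only loosely resembles a keyword; the threshold would need tuning against real recordings.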
Data Collection
I spent days and nights collecting data from many people, including family, friends, and people at universities. To make the model more robust, I also recorded some commands from Siri saying these keywords.
Collected data:
Since the results did not satisfy me, I added more data recorded with my smartphone and microcontroller.
After that, I applied Edge Impulse's EON Tuner, which produced a test accuracy of about 91%, far better than the model achieved without it. Edge Impulse Studio made it simple and convenient to find the optimal model for the dataset, and it provided the tools for every step from data collection to model deployment and inference. You can examine how the model and dataset were trained in the public Edge Impulse project: https://studio.edgeimpulse.com/public/128033/latest
Video of the Project