Published April 15, 2024 © CC BY

Trying Thai voice commands on Arduino Nano RP2040 board

Arduino Speech Recognition Engine is quite interesting with its offline, MCU-oriented, and support more than 40 languages.

BeginnerProtip1 hour160

Trying Thai voice commands on Arduino Nano RP2040 board

Things used in this project

Hardware components

Arduino Nano RP2040 Connect

Software apps and online services

Arduino Speech Recognition Engine

PlatformIO IDE

DSpotterSDK Maker

Story

In December 2023, I participated in a hands-on workshop titled "Renesas Voice UI Solutions" organized in Thailand by the Thai Embedded Systems Association (TESA) and AVNET. As a result, I received a Renesas Voice-RA6E1 board. Since this period is a long Songkran holiday, I have decided to experiment with the Voice-RA6E1 board. However, the Renesas development process using e2 studio is a bit complex, and there's a risk of deleting the license key programmed into flash memory if you don't use the sample code. While researching further, I came across news about four genuine Arduino boards that support Cyberon's Speech Recognition Engine. I happen to have an Arduino Nano RP2040 board, so I'm considering giving it a try.

Arduino boards supporting Speech Recognition Engine.

1st step: example code

To experiment with sample code using Cyberon's Speech Recognition Engine, the first step is to retrieve the serial number from the board. Here's how to retrieve the serial number for the Arduino Nano RP2040 board:

Install the Cyberon_DSpotterSDK_Maker_RP2040 library.
Open the GetSerialNumber.ino sample code.
Build and upload the code to the board.
Check the serial number printed from the Serial monitor.

This Cyberon_DSpotterSDK_Maker_RP2040 library provides the necessary functions for interacting with Cyberon's Speech Recognition Engine on the Arduino Nano RP2040 board. You can install the library using the Arduino Library Manager or by manually downloading and adding the library files to your project. For the other three Arduino boards, the included library must be chosen according to the board type: Cyberon_DSpotterSDK_Maker_33BLE (Nano 33 BLE Sense), Cyberon_DSpotterSDK_Maker_PortentaH7 (Portenta H7), and Cyberon_DSpotterSDK_Maker_NiclaVision (Nicla Vision).

Serial number reported via Serial.

Then, I visited Cyberon's DSpotterSDK Maker website and registered for a trial license. Upon successful registration of the retrieved serial number, I received a license text to be entered into the CybLicense.h file. This license text is locked to the specific board used for registration. With the trial license text in hand, I decided to try out the sample code VR_LEDControl, which includes a pre-trained English language model. The trigger word is "Hey Arduino, " and the voice commands to control the three-color LED are "LED red", "LED green", "LED blue", and "LED off". The sample code utilizes the LED on pin 13 of the board to indicate the status (waiting for a trigger or a command).

Serial monitor showed detected commands that affected color LEDs.

The speech recognition models are provided in binary format and stored in two files: Model_L0.h (smaller file, lower accuracy) and Model_L1.h (larger file, higher accuracy). Due to its ARM Cortex-M0+ processor, the Arduino Nano RP2040 board is limited to using the Model_L0 to minimize processing load. The other three boards, however, can choose between Model_L0 and Model_L1. A list of commands and model codes is provided in the Info.txt file located in the /data subfolder.

The sample code appears straightforward. The callback function receives event information, command codes, and parameters obtained from detecting trigger words or commands.

void VRCallback(int nFlag, int nID, int nScore, int nSG, int nEnergy) {
  if (nFlag==DSpotterSDKHL::InitSuccess) {
    // code to handle SDK init success event
  } else if (nFlag==DSpotterSDKHL::GetResult) {
    switch(nID) {
      // code to handle voice commands
    }
  } else if (nFlag==DSpotterSDKHL::ChangeStage) {
    switch(nID) {
      // code to handle state changes: trigger <-> command
    }
  } else if (nFlag==DSpotterSDKHL::GetError) {
    // code to handle SDK error
  } else if (nFlag == DSpotterSDKHL::LostRecordFrame) {
    // code to handle voice streaming issues
  }
}

To assess the processing performance, I profiled the main code section, g_oDSpotterSDKHL.DoVR(). The maximum time observed, representing the processing time for audio data in the buffer, was 24 milliseconds. Counting the number of processing cycles within a second, I found approximately 3-4 cycles resulting in less than 100 milliseconds. This indicates a utilization ratio of no more than 10%, suggesting ample time for handling other tasks. Considering the RP2040's ARM Cortex-M0+ processor running at 133 MHz, the performance is quite impressive. Switching to a high-performance processor like Cortex-M4 or M7 would likely enable even more demanding signal processing capabilities.

Serial monitor showing profiled time sequence

2nd step: switch to Thai

Compared to speech recognition through Google Assistant or other cloud-based solutions, theCyberon engine offers several advantages for developing home appliance products:

Offline operation: Eliminates the complexity of network connectivity.
Microcontroller-oriented: Enables processing on microcontrollers, making it suitable for resource-constrained devices.
Multilingual support: Supports up to 40 languages/dialects, catering to a global audience.

Language is a crucial aspect of developing products for smart home scenarios. So, I would like to try this feature in my native tongue: Thai language.

To facilitate Thai commands, a new model must be created from user-selected phrases. The sample code provides the following steps for creating a custom language model:

Register email and board: Enter email and board information on the model settings website and accept the license agreement.
Select Project Language: Choose the language to be used for the project.
Define Trigger and Command Phrases: Provide the phrases that will serve as triggers and commands for voice interactions.
Generate Model: Upon confirmation, the generated model file will be sent to the registered email address.

The generated language model can then be integrated into the product's firmware to enable voice recognition and command execution in Thai.

Flow of the custom model creation.

Replacing the CybLicense.h and Model_L0.h files in the original sample code with the files received via email allows the VR_LEDControl code to function immediately. However, the newly created model introduces a 20-second delay between each trigger reception, as evident from the messages reported through the Serial monitor. This is one of the limitations of any custom model created by the trial license.

Delay between each trigger as the effect of trial license for custom commands.

Based on the experiment with sequential voice commands, it appears that the speech recognition of Thai commands is not yet fully accurate. This could be due to the use of the lower-accuracy L0 model. The command "ไฟเขียวติด" (substitution of "LED green" command) was not detected at all, while other commands were recognized correctly. For the trigger phrase "สวัสดีครับ" (substitution of "Hey Arduino" trigger), it had to be spoken slowly to be detected. However, when switching to using voice narrated by Google Translate for speech output, the recognition accuracy was higher than with my voice.

3rd step: add online feature

The Arduino Nano RP2040 Connect board includes a u-blox NINA-WT102 module (essentially an ESP32) for connecting to WiFi and Bluetooth wireless networks. Adding online features is relatively straightforward, I chose to experiment with the MQTT protocol due to its minimal overhead. Since I always use Platform.io as the development tool, I started by adding the WiFiNINA library (replacing the standard WiFi library), PubSubClient, and ArduinoJSON. The code modifications were implemented in two ways:

Uplink function: When a voice command is recognized, the status is reported as JSON to the MQTT broker.
Downlink function: JSON commands for controlling the LED can be used via MQTT.

When I added the MQTT protocol code to the sample VR_LEDControl code, I encountered a barrage of "lost recording frame" messages, preventing voice command detection. Debugging revealed that the common practice of periodically checking the connection to the MQTT broker was causing delays that resulted in missing parts of the audio data. Consequently, I opted to move the WiFi/MQTT connection code to the setup() function, which immediately resolved the issue.

Detected voice command being reported via MQTT.

Final idea:

The Speech Recognition Engine developed in collaboration between Arduino and Cyberon is impressive in terms of its simple code and convenient online tools. However, the 20-second delay in detecting trigger phrases feels like a subtle push to purchase the $9 license to unlock this limitation. Additionally, this purchase is locked to the board, not the user. Therefore, anyone planning to use it for a project should be aware of this condition, as it could lead to budget overruns if the number of devices needs to be expanded. For businesses, choosing hardware that has a pre-existing agreement with Cyberon, such as Renesas, seems like a much better option. So, my current plan is to continue my development effort with the Renesas Voice RA6E1 board.

Renesas Voice-RA6E1 board.

References:

Instantly understand 40+ languages, with Speech Recognition Engine.

main.cpp

#include <Arduino.h>
#include <SPI.h>
#include <WiFiNINA.h>
#include <PubSubClient.h>
#include <ArduinoJson.h>
#include <DSpotterSDK_MakerHL.h>
#include <LED_Control.h>

// WiFi and MQTT settings
// constants
#define WIFI_SSID         "WIFI_SSID"
#define WIFI_PASSWD       "WIFI_PASSWD"
#define MQTT_BROKER       "MQTT_BROKER"
#define MQTT_PORT         1883
#define MQTT_STATUS_TOPIC "STATIC_TOPIC"
#define MQTT_CMD_TOPIC    "CMD_TOPIC"

// persistent variables
WiFiClient wifiClient;
PubSubClient mqttClient(wifiClient);
JsonDocument jsonDoc;
int lastEvent = 0;

// private functions
void MQTTReconnect(void);
void MQTTCallback(char* topic, byte* payload, unsigned int length);

// The DSpotter License Data.
#include "CybLicense.h"
#define DSPOTTER_LICENSE g_lpdwLicense

// The DSpotter Keyword Model Data.
#if defined(TARGET_ARDUINO_NANO33BLE) || defined(TARGET_PORTENTA_H7) || defined(TARGET_NICLA_VISION)
// For ARDUINO_NANO33BLE and PORTENTA_H7
#include "Model_L1.h"             // The packed level one model file.
// For NANO_RP2040_CONNECT
#elif defined(TARGET_NANO_RP2040_CONNECT)
#include "Model_L0.h"             // The packed level zero model file.
#endif
#define DSPOTTER_MODEL g_lpdwModel

// Define for led return value, please refer to the info.txt
#define COMMAND_LED_GREEN  10000
#define COMMAND_LED_RED    10001
#define COMMAND_LED_BLUE   10002
#define COMMAND_LED_OFF    10003

// The VR engine object. Only can exist one, otherwise not worked.
static DSpotterSDKHL g_oDSpotterSDKHL;

void VRCallback(int nFlag, int nID, int nScore, int nSG, int nEnergy);

// initialize hardware and software components
void setup() {
  // Init LED control
  LED_Init_All();

  // Init Serial output for show debug info
  Serial.begin(9600);
  while(!Serial);
  DSpotterSDKHL::ShowDebugInfo(true);

  // Init VR engine & Audio
  if (g_oDSpotterSDKHL.Init(DSPOTTER_LICENSE, sizeof(DSPOTTER_LICENSE), DSPOTTER_MODEL, VRCallback) != DSpotterSDKHL::Success)
    return;

  // Check NINA module
  if (WiFi.status() == WL_NO_MODULE) {
    Serial.println("Communication with WiFi module failed!");
    // don't continue
    while (true);
  }

  // Check firmware version
  String fv = WiFi.firmwareVersion();
  if (fv < WIFI_FIRMWARE_LATEST_VERSION) {
    Serial.print("Firmware version: ");
    Serial.println(fv);
    Serial.println("Please upgrade the firmware");
  }

  // Scan available WiFi networks
  int numSsid = WiFi.scanNetworks();
  for (int net_id = 0; net_id < numSsid; net_id++) {
    Serial.print(net_id);
    Serial.print(") ");
    Serial.print(WiFi.SSID(net_id));
    Serial.print("\tSignal: ");
    Serial.print(WiFi.RSSI(net_id));
    Serial.println(" dBm");
  }

  // connect WiFi
  do {
    Serial.print("Attempting to connect to WPA SSID: ");
    Serial.println(WIFI_SSID);
    int status = WiFi.begin(WIFI_SSID, WIFI_PASSWD);
    Serial.print("WiFi status: ");
    Serial.println(status);
    delay(5000);
  } while (WiFi.status() != WL_CONNECTED);

  // connect MQTT
  mqttClient.setServer(MQTT_BROKER, MQTT_PORT);
  mqttClient.setCallback(MQTTCallback);
  mqttClient.connect("RP2040-Nano-Voice");
  mqttClient.subscribe(MQTT_CMD_TOPIC);
}

void loop() {
  static uint32_t prevMs = 0;
  // Reconnect to MQTT
  //MQTTReconnect(); // skip this function, because it causes a lot of lost recording frames
  // Do VR
  g_oDSpotterSDKHL.DoVR();

  // report status change via MQTT
  if (lastEvent != 0) {
    char buf[256];
    jsonDoc.clear();
    switch(lastEvent) {
      case COMMAND_LED_GREEN:
        jsonDoc["event"] = "LED_GREEN";
        break;
      case COMMAND_LED_RED:
        jsonDoc["event"] = "LED_RED";
        break;
      case COMMAND_LED_BLUE:
        jsonDoc["event"] = "LED_BLUE";
        break;
      case COMMAND_LED_OFF:
        jsonDoc["event"] = "LED_OFF";
        break;
      default:
        jsonDoc["event"] = "UNKNOWN";
        break;
    }
    serializeJson(jsonDoc, buf);
    mqttClient.publish(MQTT_STATUS_TOPIC, buf);
    lastEvent = 0;
  }

  // MQTT loop
  if ((millis() - prevMs) > 1000) {
    prevMs = millis();
    mqttClient.loop();
  }
}

// Callback function for VR engine
void VRCallback(int nFlag, int nID, int nScore, int nSG, int nEnergy) {
  if (nFlag==DSpotterSDKHL::InitSuccess) {
      //ToDo
  }
  else if (nFlag==DSpotterSDKHL::GetResult) {
      /*
      When getting an recognition result,
      the following index and scores are also return to the VRCallback function:
          nID        The result command id
          nScore     nScore is used to evaluate how good or bad the result is.
                     The higher the score, the more similar the voice and the result command are.
          nSG        nSG is the gap between the voice and non-command (Silence/Garbage) models.
                     The higher the score, the less similar the voice and non-command (Silence/Garbage) models are.
          nEnergy    nEnergy is the voice energy level.
                     The higher the score, the louder the voice.
      */
      switch(nID) {
        case COMMAND_LED_GREEN:
          LED_RGB_Green();
          break;
        case COMMAND_LED_RED:
          LED_RGB_Red();
          break;
        case COMMAND_LED_BLUE:
          LED_RGB_Blue();
          break;
        case COMMAND_LED_OFF:
          LED_RGB_Off();
          break;
        default:
          break;
      }
      lastEvent = nID;
  } else if (nFlag==DSpotterSDKHL::ChangeStage) {
      switch(nID) {
          case DSpotterSDKHL::TriggerStage:
            LED_RGB_Off();
            LED_BUILTIN_Off();
            break;
          case DSpotterSDKHL::CommandStage:
            LED_BUILTIN_On();
            break;
          default:
            break;
      }
  } else if (nFlag==DSpotterSDKHL::GetError) {
      if (nID == DSpotterSDKHL::LicenseFailed)
      {
          //Serial.print("DSpotter license failed! The serial number of your device is ");
          //Serial.println(DSpotterSDKHL::GetSerialNumber());
      }
      g_oDSpotterSDKHL.Release();
      while(1);//hang loop
  } else if (nFlag == DSpotterSDKHL::LostRecordFrame) {
      //ToDo
  }
}

// Reconnect to MQTT
void MQTTReconnect(void) {
  while (WiFi.status() != WL_CONNECTED) {
    Serial.print("Attempting to connect to WPA SSID: ");
    Serial.println(WIFI_SSID);
    int status = WiFi.begin(WIFI_SSID, WIFI_PASSWD);
    Serial.print("WiFi status: ");
    Serial.println(status);
    delay(5000);
  }
  while (!mqttClient.connected()) {
    Serial.print("Attempting MQTT connection...");
    if (mqttClient.connect("RP2040-Nano-Voice")) {
      Serial.println("connected");
      mqttClient.subscribe(MQTT_CMD_TOPIC);
    } else {
      Serial.print("failed, rc=");
      Serial.print(mqttClient.state());
      Serial.println(" try again in 5 seconds");
      delay(5000);
    }
  }
}

// Callback function for MQTT
void MQTTCallback(char* topic, byte* payload, unsigned int length) {
  char buf[256];
  memset(buf, 0, sizeof(buf));
  memcpy(buf, payload, length);
  Serial.print("Got message: ");
  Serial.println(buf);
  deserializeJson(jsonDoc, buf);
  if (jsonDoc.containsKey("command")) {
    String command = jsonDoc["command"];
    if (command == "LED_GREEN") {
      LED_RGB_Green();
    } else if (command == "LED_RED") {
      LED_RGB_Red();
    } else if (command == "LED_BLUE") {
      LED_RGB_Blue();
    } else if (command == "LED_OFF") {
      LED_RGB_Off();
    }
  }
}

Credits

Supachai Vorapojpisut

4 projects • 2 followers

Background in Electrical Engineering with past experience in firmware development for several processor platforms.

Trying Thai voice commands on Arduino Nano RP2040 board

Things used in this project

Hardware components

Software apps and online services

Story

1st step: example code

2nd step: switch to Thai

3rd step: add online feature

Final idea:

References:

Code

main.cpp

Credits

Supachai Vorapojpisut

Comments

Embed the widget on your own site

Trying Thai voice commands on Arduino Nano RP2040 board

Trying Thai voice commands on Arduino Nano RP2040 board

Things used in this project

Hardware components

Software apps and online services

Story

1st step: example code

2nd step: switch to Thai

3rd step: add online feature

Final idea:

References:

Code

main.cpp

Credits

Supachai Vorapojpisut

Comments

Related channels and tags