Shubh Jaiswal
Published

NOVA - An AI assistant for optical store

Nova is an interactive AI kiosk that lets you virtually try on glasses to find the perfect pair and buy them instantly!

Advanced · Full instructions provided · 3 hours · 295 views
NOVA - An AI assistant for optical store

Things used in this project

Story

Read more

Schematics

epaper_driver_board- Schematic

CIRCUIT DIAGRAM

Code

NOVA - AI Smart Glasses Assistant

Python
NOTE:
Before running, replace the openai.api_key value with your own from OpenAI, update device paths (/dev/ttyACM0 for serial, /dev/video0 for camera), adjust the Chromium launch command for your OS, and install dependencies.
import os
import cv2
import time
import wave
import serial
import asyncio
import openai
import sounddevice as sd
import numpy as np
import edge_tts
import subprocess
from faster_whisper import WhisperModel
import mediapipe as mp

# ---------------- CONFIG ----------------
# NOTE(review): SERIAL_PORT/BAUD_RATE match the XIAO button sketch's USB
# serial link, but no code in this file opens the port — confirm whether
# serial handling was trimmed from this listing.
SERIAL_PORT = "/dev/ttyACM0"
BAUD_RATE = 115200
AUDIO_FILE = "input.wav"                              # 5-second voice capture written by stop_audio_recording()
IMAGE_FILE = "/home/shubh/image/captured.jpg"         # camera snapshot consumed by detect_face_shape()
MP3_OUTPUT_PATH = "/home/shubh/Music/response.mp3"    # edge-tts output played via ffplay
BLANK_HTML = "/home/shubh/blank.html"                 # scratch page for loading/typewriter screens
MAIN_WEBSITE = "https://dhruvpandit46.github.io/Glasses/"  # default try-on site

# Hardcoded API key (replace with yours)
openai.api_key = "sk-REPLACE_WITH_YOUR_KEY"

# Keyword -> virtual-try-on URL. Keys double as the vocabulary that
# extract_sku_keywords() scans the assistant's reply for.
KEYWORD_URLS = {
    "man": "https://dhruvpandit46.github.io/Glasses/?sku=rayban_justin_noir_rougeMirroir",
    "woman": "https://dhruvpandit46.github.io/Glasses/?sku=rayban_clubround_noir_cuivre_flash",
    "rectangle": "https://dhruvpandit46.github.io/Glasses/?sku=rayban_chris_noir_gun_bleu_mirroir",
    "square": "https://dhruvpandit46.github.io/Glasses/?sku=rayban_justin_noir_rougeMirroir",
    "round": "https://dhruvpandit46.github.io/Glasses/?sku=rayban_round_gun_vert",
    "aviator": "https://dhruvpandit46.github.io/Glasses/?sku=rayban_aviator_or_marron",
    "metal": "https://dhruvpandit46.github.io/Glasses/?sku=rayban_erika_marronArgent_marronVioletDegrade",
    "plastic": "https://dhruvpandit46.github.io/Glasses/?sku=rayban_justin_noir_bleuMirroir"
}

# Whisper model (lightweight) — "tiny" int8 keeps transcription feasible on a Pi.
whisper_model = WhisperModel("tiny", compute_type="int8")

# Globals shared across the recording callbacks and the main loop.
face_shape_memory = None      # last detected face shape, fed into the system prompt
last_url = MAIN_WEBSITE       # URL currently shown in the kiosk browser
recording_buffer = []         # raw int16 chunks appended by audio_callback()
recording_stream = None       # active sounddevice InputStream, if any

# ---------------- AUDIO RECORDING ----------------
def audio_callback(indata, frames, time_info, status):
    """Stream callback run by sounddevice for every captured audio chunk.

    Reports any non-zero stream status (overruns etc.) and appends a copy
    of the incoming frames to the module-level recording buffer.
    """
    if status:
        print(status)
    chunk = indata.copy()
    recording_buffer.append(chunk)

def start_audio_recording(sr=16000):
    """Open a mono 16-bit input stream at *sr* Hz and begin buffering audio.

    Clears any previously buffered chunks first. On failure the global
    stream handle is left as None so stop_audio_recording() is a no-op.
    """
    global recording_buffer, recording_stream
    recording_buffer = []
    try:
        stream = sd.InputStream(
            samplerate=sr, channels=1, dtype='int16', callback=audio_callback
        )
        recording_stream = stream
        stream.start()
        print("Recording started...")
    except Exception as exc:
        print("Failed to start audio recording:", exc)
        recording_stream = None

def stop_audio_recording(file, sr=16000):
    """Stop the active input stream (if any) and write buffered audio to *file*.

    Always produces a valid WAV (mono, 16-bit, *sr* Hz) — an empty one when
    nothing was captured — so downstream transcription opens a well-formed
    file rather than crashing on a missing one.
    """
    global recording_stream
    if recording_stream:
        try:
            recording_stream.stop()
            recording_stream.close()
        except Exception as e:
            print("Error stopping stream:", e)
        finally:
            # Bug fix: drop the handle so a repeated stop (or a stop after a
            # failed start) never touches an already-closed stream.
            recording_stream = None
    if not recording_buffer:
        print("No audio recorded.")
        with wave.open(file, 'wb') as wf:
            wf.setnchannels(1)
            wf.setsampwidth(2)   # 2 bytes == int16, matching the stream dtype
            wf.setframerate(sr)
            wf.writeframes(b'')
        return
    audio = np.concatenate(recording_buffer, axis=0)
    try:
        with wave.open(file, 'wb') as wf:
            wf.setnchannels(1)
            wf.setsampwidth(2)
            wf.setframerate(sr)
            wf.writeframes(audio.tobytes())
        print("Audio saved")
    except Exception as e:
        print("Failed to write audio file:", e)

# ---------------- CAMERA ----------------
def capture_image():
    """Grab one 640x480 frame from /dev/video0 and save it to IMAGE_FILE.

    Retries up to three times if the camera fails to open or deliver a
    frame. Returns True once an image is written, False otherwise.
    """
    for attempt in range(1, 4):
        cap = cv2.VideoCapture("/dev/video0")
        cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
        cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)
        time.sleep(0.5)  # brief warm-up before the first read
        if not cap.isOpened():
            cap.release()
        else:
            ok, frame = cap.read()
            cap.release()
            if ok:
                try:
                    os.makedirs(os.path.dirname(IMAGE_FILE), exist_ok=True)
                    cv2.imwrite(IMAGE_FILE, frame)
                    print(f"Image saved at {IMAGE_FILE}")
                    return True
                except Exception as exc:
                    print("Failed to save image:", exc)
                    return False
        print(f"Retrying camera... (attempt {attempt}/3)")
        time.sleep(1)
    print("Error: Camera not accessible")
    return False

# ---------------- FACE SHAPE DETECTION ----------------
def detect_face_shape(path):
    """Classify the face in the image at *path* into a rough shape label.

    Measures jaw, cheekbone and forehead widths plus face length from
    MediaPipe FaceMesh landmarks, then applies simple ratio thresholds.
    Returns one of 'oblong', 'heart', 'diamond', 'round', 'square',
    'oval' — or None if the image is unreadable or no face is found.
    """
    image = cv2.imread(path)
    if image is None:
        print("detect_face_shape: image not found or unreadable.")
        return None
    with mp.solutions.face_mesh.FaceMesh(static_image_mode=True, max_num_faces=1) as mesh:
        result = mesh.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
        if not result.multi_face_landmarks:
            print("No face landmarks detected.")
            return None
        landmarks = result.multi_face_landmarks[0].landmark
        height, width = image.shape[:2]

        def point(idx):
            # Landmark coordinates are normalized; scale to pixel space.
            return np.array([landmarks[idx].x * width, landmarks[idx].y * height])

        try:
            jaw = np.linalg.norm(point(234) - point(454))
            cheek = np.linalg.norm(point(93) - point(323))
            forehead = np.linalg.norm(point(127) - point(356))
            face_len = np.linalg.norm(point(152) - point(10))
        except Exception as exc:
            print("Landmark math failed:", exc)
            return None
        denom = max(jaw, 1e-6)  # guard against a degenerate zero-width jaw
        len_ratio = face_len / denom
        cheek_ratio = cheek / denom
        forehead_ratio = forehead / denom
        if len_ratio > 1.5 and cheek_ratio < 0.95:
            return "oblong"
        if forehead_ratio > 1.05 and cheek_ratio > 1.05:
            return "heart"
        if cheek_ratio > 1.1 and forehead_ratio < 0.95:
            return "diamond"
        # NOTE(review): the absolute-pixel thresholds below (40, 20) depend
        # on the fixed 640x480 capture size — confirm if resolution changes.
        if abs(jaw - face_len) < 40:
            return "round"
        if abs(jaw - cheek) < 20:
            return "square"
        return "oval"

# ---------------- SPEECH -> TEXT ----------------
def speech_to_text(file):
    """Transcribe *file* with faster-whisper.

    Returns (transcript, language_code); falls back to ("", "en") when
    transcription fails. The segment join stays inside the try block
    because faster-whisper decodes lazily while iterating segments.
    """
    try:
        segments, info = whisper_model.transcribe(file)
        transcript = " ".join(segment.text for segment in segments)
        return transcript.strip(), (info.language or "en")
    except Exception as exc:
        print("Whisper transcription failed:", exc)
        return "", "en"

# ---------------- CHATGPT CALL ----------------
def chatgpt_response(prompt):
    """Ask GPT-4o for Nova's reply to the user's *prompt*.

    The system prompt pins Nova's stylist persona, the URL currently on
    screen, and (when known) the detected face shape. Returns the model's
    reply text, or a canned apology if the API call fails.
    """
    global last_url, face_shape_memory
    system_msg = (
        "You are Nova, a stylist assistant. "
        f"Current preview URL user is seeing: {last_url}. "
        "If user asks about appearance or if glasses suit them, respond accordingly. "
        "If they say 'show me' glasses, reply with a short sentence like 'Here are glasses for man', "
        "followed by exactly one keyword: man, woman, round, oval, square, rectangle, aviator, diamond, heart, metal, plastic."
    )
    if face_shape_memory:
        system_msg += f" Face shape: {face_shape_memory}."
    conversation = [
        {"role": "system", "content": system_msg},
        {"role": "user", "content": prompt},
    ]
    try:
        response = openai.chat.completions.create(model="gpt-4o", messages=conversation)
        return response.choices[0].message.content.strip()
    except Exception as exc:
        print("OpenAI API error:", exc)
        return "Sorry, I'm having trouble connecting to my brain. Please try again later."

# ---------------- TTS ----------------
async def speak(text, lang):
    """Synthesize *text* with edge-tts and play it through ffplay.

    Picks a Hindi voice for 'hi', otherwise the English (India) voice.
    Playback blocks until ffplay exits; failures are logged, not raised.
    """
    voices = {"en": "en-IN-NeerjaNeural", "hi": "hi-IN-SwaraNeural"}
    voice = voices.get(lang, "en-IN-NeerjaNeural")
    try:
        communicator = edge_tts.Communicate(text, voice)
        await communicator.save(MP3_OUTPUT_PATH)
        os.system(f"ffplay -nodisp -autoexit '{MP3_OUTPUT_PATH}' > /dev/null 2>&1")
    except Exception as exc:
        print("TTS failed:", exc)

# ---------------- URL KEYWORD HANDLING ----------------
def extract_sku_keywords(text):
    """Return every KEYWORD_URLS key found in *text*, case-insensitively.

    Preserves KEYWORD_URLS declaration order, so callers that take the
    first match get a deterministic result.
    """
    lowered = text.lower()
    matches = []
    for keyword in KEYWORD_URLS:
        if keyword in lowered:
            matches.append(keyword)
    return matches

def open_dynamic_url(keywords):
    """Open the try-on URL for the first matched keyword in kiosk Chromium.

    *keywords* is the list produced by extract_sku_keywords(); only the
    first entry is used. Does nothing when no keyword matched or when the
    matched URL is already on screen. Updates the module-level last_url.
    """
    global last_url
    if not keywords:
        # Bug fix: the old message claimed the last URL was being re-opened,
        # but no browser command is issued on this path — the page is kept.
        print("No keyword found. Keeping current URL.")
        return
    keyword = keywords[0]
    new_url = KEYWORD_URLS.get(keyword)
    if new_url and new_url != last_url:
        last_url = new_url
        print("Opening new target URL:", last_url)
        # Kill any running kiosk instance before relaunching on the new URL.
        os.system("pkill -f 'chromium-browser' || true")
        time.sleep(1)
        os.system(f"chromium-browser --kiosk --incognito --disk-cache-dir=/dev/null --disable-logging --noerrdialogs --app='{last_url}' > /dev/null 2>&1 &")
    else:
        print("Keyword already shown or unknown.")

# ---------------- UI PLACEHOLDERS ----------------
def show_blank_loading_page(message):
    """Show *message* with a CSS spinner full-screen in kiosk Chromium.

    Writes a throwaway HTML page to BLANK_HTML, kills any running kiosk
    browser, and relaunches Chromium pointed at the local file.
    """
    page = f"""
<html><body style='background:black;color:white;font-size:3em;display:flex;flex-direction:column;align-items:center;justify-content:center;height:100vh;text-align:center;'>
<div>{message}</div>
<div style='border:6px solid white;border-top:6px solid transparent;border-radius:50%;width:40px;height:40px;margin-top:20px;animation:spin 1s linear infinite;'></div>
<style>@keyframes spin{{0%{{transform:rotate(0)}}100%{{transform:rotate(360deg)}}}}</style>
</body></html>
"""
    with open(BLANK_HTML, "w") as page_file:
        page_file.write(page)
    # NOTE(review): *message* is interpolated into HTML unescaped — fine for
    # the internal status strings used here; confirm before passing user text.
    os.system("pkill -f 'chromium-browser' || true")
    os.system(f"chromium-browser --kiosk --incognito --disk-cache-dir=/dev/null --disable-logging --noerrdialogs 'file://{BLANK_HTML}' > /dev/null 2>&1 &")

def display_response_text(reply):
    """Render *reply* full-screen in Chromium with a typewriter effect.

    Writes a page to BLANK_HTML whose inline JS types the text out one
    character every 80 ms, then relaunches the kiosk browser on it.
    """
    # Bug fix: *reply* is embedded inside a JS template literal, so a
    # backtick, backslash, or ${...} in the model's reply would break the
    # page script. Escape those sequences before interpolation.
    safe_reply = reply.replace("\\", "\\\\").replace("`", "\\`").replace("${", "\\${")
    with open(BLANK_HTML, "w") as f:
        f.write(f"""
<html><body style='background:black;color:white;font-size:3em;display:flex;align-items:center;justify-content:center;height:100vh;text-align:center;'>
<div id='response'></div>
<script>
const text = `{safe_reply}`;
let i = 0;
function typeEffect() {{
    if (i < text.length) {{
        document.getElementById("response").innerHTML += text.charAt(i);
        i++;
        setTimeout(typeEffect, 80);
    }}
}}
typeEffect();
</script>
</body></html>
""")
    os.system("pkill -f 'chromium-browser' || true")
    time.sleep(1)
    os.system(f"chromium-browser --kiosk --incognito --disk-cache-dir=/dev/null --disable-logging --noerrdialogs 'file://{BLANK_HTML}' > /dev/null 2>&1 &")

# ---------------- MAIN LOOP ----------------
def main():
    """Kiosk event loop: record speech, transcribe, reply, update the display.

    Each cycle records five seconds of audio, transcribes it, optionally
    captures a photo for face-shape detection (when the user mentions
    'photo'/'picture'), asks GPT for a reply, shows and speaks it, then
    opens any try-on URL matching a keyword in the reply.
    """
    global face_shape_memory
    print("Starting system...")
    open_dynamic_url(["man"])  # start with a default URL
    while True:
        start_audio_recording()
        time.sleep(5)  # record for 5 seconds
        stop_audio_recording(AUDIO_FILE)
        text, lang = speech_to_text(AUDIO_FILE)
        if not text:
            continue
        print(f"User said: {text}")
        lowered = text.lower()
        if "photo" in lowered or "picture" in lowered:
            if capture_image():
                shape = detect_face_shape(IMAGE_FILE)
                if shape:
                    face_shape_memory = shape
                    print(f"Detected face shape: {shape}")
                else:
                    face_shape_memory = None
        reply = chatgpt_response(text)
        print(f"Assistant: {reply}")
        display_response_text(reply)
        asyncio.run(speak(reply, lang))
        open_dynamic_url(extract_sku_keywords(reply))

if __name__ == "__main__":
    main()

XIAO C3 - MQTT E-Paper Order Display(Arduino ide)

C/C++
Update WLAN_SSID/WLAN_PASS with your Wi-Fi details and set AIO_USERNAME/AIO_KEY from your Adafruit IO account. Make sure the feed name matches your actual Adafruit IO feed, and install the required Arduino libraries by following the Seeed Studio e-paper display guide for Arduino before uploading.
#include <WiFi.h>
#include <Adafruit_MQTT.h>
#include <Adafruit_MQTT_Client.h>
#include <ArduinoJson.h>
#include "TFT_eSPI.h"

// Comment this out to run headless (serial logging only, no e-paper draws).
#define EPAPER_ENABLE

#ifdef EPAPER_ENABLE
// NOTE(review): the EPaper type is presumably supplied by the Seeed e-paper
// variant of TFT_eSPI included above — confirm against the Seeed guide.
EPaper epaper;
#endif

// WiFi Credentials
#define WLAN_SSID       ""
#define WLAN_PASS       ""

// MQTT (Adafruit IO)
#define AIO_SERVER      "io.adafruit.com"
#define AIO_SERVERPORT  1883
#define AIO_USERNAME    ""
#define AIO_KEY         ""

WiFiClient client;
Adafruit_MQTT_Client mqtt(&client, AIO_SERVER, AIO_SERVERPORT, AIO_USERNAME, AIO_KEY);
// Feed carrying order JSON. NOTE(review): the topic contains a space
// ("QR CODE") — verify this matches the feed key shown in Adafruit IO.
Adafruit_MQTT_Subscribe orderFeed = Adafruit_MQTT_Subscribe(&mqtt, AIO_USERNAME "/feeds/QR CODE");

// Block until the station joins the configured access point,
// printing a progress dot every half second, then log the local IP.
void connectWiFi() {
  Serial.print("Connecting to WiFi...");
  WiFi.begin(WLAN_SSID, WLAN_PASS);
  while (WiFi.status() != WL_CONNECTED) {
    delay(500);
    Serial.print(".");
  }
  Serial.println(" Connected!");
  Serial.println(WiFi.localIP());
}

void connectMQTT() {
  Serial.print("Connecting to MQTT...");
  while (!mqtt.connected()) {
    if (mqtt.connect()) {
      Serial.println(" Connected!");
    } else {
      Serial.print(" Failed: ");
      Serial.println(mqtt.connectErrorString(mqtt.connect()));
      delay(1000);
    }
  }
}

// Log a received order to serial and, when EPAPER_ENABLE is set, render
// it on the e-paper panel: black header bar, a boxed token ID, an info
// box (name/description/price), and the timestamp as a footer.
// All parameters are display-ready strings taken from the order JSON.
void showOrder(String orderID, String name, String desc, String price, String time) {
  Serial.println("=== Order Received ===");
  Serial.println("Token ID  : " + orderID);
  Serial.println("Name      : " + name);
  Serial.println("Desc      : " + desc);
  Serial.println("Price     : " + price);
  Serial.println("Time      : " + time);
  Serial.println("======================");

#ifdef EPAPER_ENABLE
  epaper.fillScreen(TFT_WHITE);  // clear the previous order

  // Header
  epaper.fillRect(0, 0, epaper.width(), 60, TFT_BLACK);
  epaper.setTextColor(TFT_WHITE);
  epaper.setTextSize(3);
  epaper.setCursor(20, 15);
  epaper.print("GLASSES ORDER");

  // Token Box
  epaper.setTextColor(TFT_BLACK);
  epaper.drawRect(20, 80, epaper.width() - 40, 60, TFT_BLACK);
  epaper.setTextSize(2);
  epaper.setCursor(30, 100);
  epaper.print("Token ID: " + orderID);

  // Info Box
  epaper.drawRect(20, 160, epaper.width() - 40, 100, TFT_BLACK);
  epaper.setCursor(30, 180);
  epaper.print("Name : " + name);
  epaper.setCursor(30, 200);
  epaper.print("Desc : " + desc);
  epaper.setCursor(30, 220);
  epaper.print("Price: " + price);

  // Footer
  epaper.setCursor(30, 240);
  epaper.print("Time : " + time);

  // Push the framebuffer to the panel (e-paper only refreshes on update).
  epaper.update();
#endif
}

// One-time init: serial console, e-paper panel, Wi-Fi, and the MQTT
// subscription/connection.
void setup() {
  Serial.begin(115200);
  delay(1000);  // give the USB serial console time to attach

#ifdef EPAPER_ENABLE
  epaper.begin();
#endif

  connectWiFi();
  // Register the feed before connecting — Adafruit_MQTT expects
  // subscriptions to be in place when the session is established
  // (NOTE: per library convention; confirm against the library docs).
  mqtt.subscribe(&orderFeed);
  connectMQTT();
}

void reconnectMQTT() {
    int8_t ret;
    
    Serial.print("Connecting to MQTT... ");
    
    uint8_t retries = 3;
    while ((ret = mqtt.connect()) != 0) {
        Serial.println(mqtt.connectErrorString(ret));
        Serial.println("Retrying MQTT connection...");
        mqtt.disconnect();
        delay(5000);  // Wait 5 seconds
        retries--;
        if (retries == 0) {
            Serial.println("MQTT connection failed, will try again later");
            return;
        }
    }
    
    Serial.println("MQTT connected!");
    
    // Re-subscribe to ensure we receive updates
    
}

// Main loop: keep the MQTT session alive, then drain incoming feed
// messages. Each message is expected to be JSON of the form
// {"cmd":"print","orderID":...,"name":...,"description":...,"price":...,
//  "ts":{"display":...}}; valid "print" commands are rendered via showOrder().
void loop() {
    if (!mqtt.connected()) {
        reconnectMQTT();
    }

  Adafruit_MQTT_Subscribe *subscription;
  // readSubscription blocks up to 5 s waiting for a packet; this also
  // services the connection while we wait.
  while ((subscription = mqtt.readSubscription(5000))) {
    if (subscription == &orderFeed) {
      Serial.println("MQTT Data Received");

      StaticJsonDocument<512> doc;
      DeserializationError err = deserializeJson(doc, (char *)orderFeed.lastread);

      if (err) {
        Serial.print("JSON Error: ");
        Serial.println(err.c_str());
        // Bail out of loop() entirely on malformed JSON; any further
        // queued messages are handled on the next loop() invocation.
        return;
      }

      if (doc["cmd"] == "print") {
        // The `| "??"` operator substitutes a placeholder for any
        // missing JSON field so the display never shows garbage.
        String orderID  = doc["orderID"] | "??";
        String name     = doc["name"] | "??";
        String desc     = doc["description"] | "??";
        String price    = doc["price"] | "??";
        String time     = doc["ts"]["display"] | "??";

        showOrder(orderID, name, desc, price, time);
      } else {
        Serial.println("Non-print cmd ignored");
      }
    }
  }
  // NOTE(review): no mqtt.ping() is sent here — the readSubscription
  // timeout appears to keep the session alive, but confirm the broker
  // keepalive interval is not exceeded on long idle periods.
}

XIAO NRF SENSE(arduino ide)

C/C++
Plug your Seeed Studio XIAO directly into the Raspberry Pi via USB. It will appear as /dev/ttyACM0 on the Pi. This code runs on the XIAO and sends simple serial commands — "CAPTURE" when you press the button and "RELEASE" when you let go. The button is wired to the defined pin (D1), and the onboard LED lights up while the button is pressed. Make sure your Pi is listening to /dev/ttyACM0 to receive these commands and act accordingly.
#define BUTTON_PIN D1      // GPIO1 (D1 on XIAO); wired to the capture button
#define LED_PIN LED_BUILTIN  // Built-in LED pin (active-low on XIAO: LOW = on, per loop() below)

// Previous debounced button state, used by loop() for edge detection so
// each press/release emits exactly one serial command.
bool buttonPreviouslyPressed = false;

void setup() {
  pinMode(BUTTON_PIN, INPUT_PULLUP);   // Internal pull-up resistor
  pinMode(LED_PIN, OUTPUT);            // LED output
  digitalWrite(LED_PIN, LOW);          // Start with LED off
  Serial.begin(115200);
}

// Edge-detect the button and notify the Pi over USB serial:
// "CAPTURE" on press (LED on), "RELEASE" on release (LED off).
void loop() {
  bool pressedNow = (digitalRead(BUTTON_PIN) == LOW);  // pull-up: LOW = pressed

  if (pressedNow != buttonPreviouslyPressed) {
    if (pressedNow) {
      Serial.println("CAPTURE");     // Pi: snap a photo and start recording
      digitalWrite(LED_PIN, LOW);    // active-low LED on
    } else {
      Serial.println("RELEASE");     // Pi: stop recording and query ChatGPT
      digitalWrite(LED_PIN, HIGH);   // LED off
    }
    buttonPreviouslyPressed = pressedNow;
  }

  delay(10);  // Simple debounce
}

Credits

Shubh Jaiswal
2 projects • 18 followers
IoT & Embedded Systems Developer | Automation | DIY Enthusiast | Passionate about Smart Solutions | Intern @ Techiesms

Comments