In this tutorial, you’ll learn how to build an AI Retrieval-Augmented Generation (RAG) agent on the NVIDIA Jetson AGX Orin Dev Kit using n8n, a powerful workflow automation platform. AI agents use large language models (LLMs) such as GPT to decide dynamically which tool to use, based on input and context.
Unlike traditional chatbots that rely solely on pre-trained knowledge, RAG agents retrieve and use relevant external information at query time. This project implements a fully local RAG assistant on the NVIDIA Jetson AGX Orin 64GB development kit. The system leverages:
- Qwen/Qwen3-4B large language model served via SGLang for low-latency, high-throughput inference.
- Snowflake-Arctic-Embed2 (via Ollama) for efficient and accurate text embeddings.
- Qdrant as the local vector database to store and retrieve semantic embeddings.
- n8n as the workflow automation engine to orchestrate the RAG pipeline, including document processing, query handling, and response generation.
- Telegram as the real-time messenger, enabling users to interact with the RAG agent through a familiar chat environment.
- WireGuard VPN for secure remote access without exposing services to the public internet.
Here's a brief overview of the scenario I would like to create:
My ISP does not provide a public IP address for my LAN. Third-party tools such as Cloudflare Tunnel and ngrok can expose local services, but they are not entirely safe for private data.
With WireGuard integrated, you can securely connect to your NVIDIA Jetson device from anywhere. Queries sent via Telegram are routed through the Telegram API to the local RAG pipeline, processed by the Qwen3 model, and returned in real time, making this setup ideal for edge AI, privacy-sensitive workloads, and low-connectivity environments. The n8n web interface is also accessible remotely, allowing you to troubleshoot any issues from anywhere.
Run n8n Using Docker

n8n is a powerful, open-source workflow automation tool that will act as the brain of your AI agent, connecting Telegram, LLMs, and vector databases.
docker run -it --rm \
-p 5678:5678 \
-v ~/.n8n:/home/node/.n8n \
-e N8N_BASIC_AUTH_ACTIVE=true \
-e N8N_BASIC_AUTH_USER=admin \
-e N8N_BASIC_AUTH_PASSWORD=your_password \
n8nio/n8n
Open your browser and go to:
http://<jetson-ip>:5678
Log in with the credentials above.
💡 Tip: Change your_password to a secure one for production use.
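If you prefer a persistent service over an interactive container, the same settings can be expressed as a Docker Compose file. This is a minimal sketch based on the command above; the service name and restart policy are assumptions, not part of the original setup:

```yaml
services:
  n8n:
    image: n8nio/n8n
    restart: unless-stopped
    ports:
      - "5678:5678"
    environment:
      - N8N_BASIC_AUTH_ACTIVE=true
      - N8N_BASIC_AUTH_USER=admin
      - N8N_BASIC_AUTH_PASSWORD=your_password
    volumes:
      - ~/.n8n:/home/node/.n8n
```

Start it with `docker compose up -d` and the container survives reboots.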
Start the SGLang Server (for LLM Inference)

We'll use SGLang to serve a local LLM (e.g., Qwen3-4B) capable of tool calling.
python3 -m sglang.launch_server \
--model-path Qwen/Qwen3-4B \
--host 0.0.0.0 \
--port 30000 \
--mem-fraction-static 0.5 \
--tool-call-parser qwen25
✅ This server will handle dynamic tool selection and reasoning.
Function tool calling is a capability that enables LLMs to go beyond simple text generation by interacting with external tools and real-world applications.
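To make this concrete, here is a minimal Python sketch of the request body a client would send to the SGLang endpoint started above. SGLang exposes an OpenAI-compatible `/v1/chat/completions` API; the `get_current_weather` tool here is purely illustrative and not part of this tutorial's workflow.

```python
import json

# Illustrative tool definition in the OpenAI-compatible "tools" schema
# that SGLang's tool-call parser understands. The function name and
# parameters are hypothetical examples, not from this tutorial.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

# Body you would POST to http://<jetson-ip>:30000/v1/chat/completions.
# With tool_choice="auto", the model decides whether to answer directly
# or emit a tool call for the client to execute.
request_body = {
    "model": "Qwen/Qwen3-4B",
    "messages": [{"role": "user", "content": "What's the weather in Munich?"}],
    "tools": [weather_tool],
    "tool_choice": "auto",
}

print(json.dumps(request_body, indent=2))
```

n8n's AI Agent node builds requests of exactly this shape under the hood when it connects to the SGLang credential.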
Run Ollama for Embeddings

Ollama powers the embedding model used by the RAG system to convert text into vectors.
Pull the embedding model:
ollama pull snowflake-arctic-embed2
Set Ollama host and start server:
export OLLAMA_HOST=0.0.0.0
ollama serve
This model (snowflake-arctic-embed2) will encode user queries and documents for similarity search in Qdrant.
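"Similarity search" here means comparing embedding vectors, typically by cosine similarity. The sketch below shows the math on toy 3-dimensional vectors standing in for the much larger vectors snowflake-arctic-embed2 actually produces; the numbers are made up for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors: a query, a semantically related document, and an unrelated one.
query_vec = [0.1, 0.9, 0.2]
doc_vec   = [0.2, 0.8, 0.1]
unrelated = [0.9, 0.0, 0.0]

print(cosine_similarity(query_vec, doc_vec))    # high similarity
print(cosine_similarity(query_vec, unrelated))  # low similarity
```

Qdrant performs this comparison at scale, returning the stored documents whose vectors are closest to the query vector.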
Run the Qdrant Vector Database

Qdrant stores and retrieves vector embeddings efficiently.
Pull the Qdrant Docker image:
docker pull qdrant/qdrant
Then, run the service:
docker run -p 6333:6333 -p 6334:6334 \
-v "$(pwd)/qdrant_storage:/qdrant/storage:z" \
qdrant/qdrant
The dashboard is now available at:
http://<jetson-ip>:6333/dashboard
You can now store and query embeddings from n8n workflows.
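For orientation, these are the shapes of the two REST requests the workflow will effectively make against Qdrant: one to upsert embedded documents, one to search them. The 3-dimensional vectors and payload text are toy stand-ins; in the real pipeline the vectors come from snowflake-arctic-embed2.

```python
import json

# PUT http://<jetson-ip>:6333/collections/test/points
# Stores an embedded document together with its original text as payload.
upsert_body = {
    "points": [
        {
            "id": 1,
            "vector": [0.1, 0.9, 0.2],  # toy stand-in for a real embedding
            "payload": {"text": "Jetson AI Lab research summary"},
        }
    ]
}

# POST http://<jetson-ip>:6333/collections/test/points/search
# Finds the stored points closest to the query embedding.
search_body = {
    "vector": [0.1, 0.8, 0.3],  # embedding of the user's query
    "limit": 5,                 # top-k results to return
    "with_payload": True,       # include the stored text alongside scores
}

print(json.dumps(search_body))
```

n8n's Qdrant Vector Store node issues equivalent requests for you, so you never have to hand-craft these bodies.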
Set Up WireGuard VPN

To securely access your NVIDIA Jetson from outside your local network, bring up the WireGuard tunnel:
sudo wg-quick up wg0
This setup assumes you’ve already configured wg0.conf in /etc/wireguard/. In this example, I’m using the wg-easy project.
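For reference, a minimal client-side wg0.conf looks roughly like this. All keys, addresses, and the endpoint are placeholders you would take from your own wg-easy setup:

```ini
[Interface]
PrivateKey = <client-private-key>
Address = 10.8.0.2/24
DNS = 10.8.0.1

[Peer]
PublicKey = <server-public-key>
PresharedKey = <preshared-key>
AllowedIPs = 10.8.0.0/24
Endpoint = <your-home-router>:51820
PersistentKeepalive = 25
```

With `AllowedIPs` restricted to the VPN subnet, only traffic destined for the Jetson and its services goes through the tunnel.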
This approach provides secure remote access to SGLang, Qdrant, and other services running on the NVIDIA Jetson dev kit, without exposing them to the public internet or relying on third-party tools like Cloudflare Tunnel or ngrok.
Create a Telegram Bot via BotFather

Use Telegram’s BotFather to create a chatbot interface for your AI agent.
Open Telegram and search for @BotFather
Start a chat and send:
/newbot
- Follow the prompts to:
  - Choose a name (e.g., MyJetsonAIBot)
  - Choose a username (must end with bot, e.g., MyJetsonAIBot)
- After creation, BotFather will send an HTTP API token. Save this token securely!
The workflow connects all components:
- Data Ingestion: From websites/PDFs > HTML extraction > text splitting
- Vector Storage: Documents processed into embeddings > stored in Qdrant
- Query Handling: Telegram messages trigger the RAG pipeline
- AI Response: LLM generates answers using retrieved context > replies via Telegram
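The text-splitting step above can be sketched in a few lines. This is a simplified greedy splitter, not the exact Recursive Character Text Splitter node n8n uses, but it illustrates the chunkSize/chunkOverlap behavior configured in the workflow (chunkSize = 2000):

```python
def split_text(text, chunk_size=2000, chunk_overlap=200):
    """Greedy fixed-size splitter with overlap — a simplified stand-in
    for n8n's Recursive Character Text Splitter node."""
    if chunk_size <= chunk_overlap:
        raise ValueError("chunk_size must exceed chunk_overlap")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Step forward by chunk_size minus the overlap so consecutive
        # chunks share trailing/leading context.
        start += chunk_size - chunk_overlap
    return chunks
```

Overlap matters for RAG: without it, a sentence cut at a chunk boundary may never be retrieved as a coherent unit.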
This workflow uses the NVIDIA Jetson AI Lab website as the RAG input, specifically the page https://www.jetson-ai-lab.com/research.html
The JSON below defines the structure of the workflow:
{
"name": "RAG example",
"nodes": [
{
"parameters": {},
"id": "aef807ca-2036-4316-b776-a86bd5ffdf35",
"name": "When clicking ‘Test workflow’",
"type": "n8n-nodes-base.manualTrigger",
"position": [
-2640,
200
],
"typeVersion": 1
},
{
"parameters": {
"jsonMode": "expressionData",
"jsonData": "={{ $('Extract from HTML').item.json.text }}",
"options": {}
},
"id": "e33dad49-6bcc-457f-ba9b-e3ad42b56a56",
"name": "Default Data Loader",
"type": "@n8n/n8n-nodes-langchain.documentDefaultDataLoader",
"position": [
-1700,
400
],
"typeVersion": 1
},
{
"parameters": {
"chunkSize": 2000,
"chunkOverlap": {},
"options": {}
},
"id": "026a9342-83c0-4b0c-8e80-b8a835ed100a",
"name": "Recursive Character Text Splitter",
"type": "@n8n/n8n-nodes-langchain.textSplitterRecursiveCharacterTextSplitter",
"position": [
-1720,
620
],
"typeVersion": 1
},
{
"parameters": {
"content": "## 1.1 Extract from HTML\n\n",
"height": 335,
"width": 640,
"color": 4
},
"id": "c2ffc4de-8e78-4de7-8a21-f260a4005b20",
"name": "Sticky Note",
"type": "n8n-nodes-base.stickyNote",
"position": [
-2680,
80
],
"typeVersion": 1
},
{
"parameters": {
"content": "## 2. Create Vector Store using selfhosted Qdrant\n",
"height": 691,
"width": 615,
"color": 3
},
"id": "239ae893-4cc2-46f1-b897-7cfabb232000",
"name": "Sticky Note1",
"type": "n8n-nodes-base.stickyNote",
"position": [
-2020,
80
],
"typeVersion": 1
},
{
"parameters": {
"content": "## 3.1 Use the chat trigger\n\n\n",
"height": 304,
"width": 547
},
"id": "ec796a8c-4ce1-4c91-8a8e-2575b26bbdc6",
"name": "Sticky Note2",
"type": "n8n-nodes-base.stickyNote",
"position": [
-1360,
80
],
"typeVersion": 1
},
{
"parameters": {
"content": "## 4. Local AI Agent\n\n\n",
"height": 669,
"width": 627,
"color": 5
},
"id": "91405c7c-b6ba-4ecd-a15f-3d176a6adcb1",
"name": "Sticky Note5",
"type": "n8n-nodes-base.stickyNote",
"position": [
-760,
80
],
"typeVersion": 1
},
{
"parameters": {
"operation": "extractHtmlContent",
"extractionValues": {
"values": [
{
"key": "text",
"cssSelector": "body"
}
]
},
"options": {}
},
"type": "n8n-nodes-base.html",
"typeVersion": 1.2,
"position": [
-2240,
200
],
"id": "6e23e029-9188-4bfc-9eb6-fcb32d085d7a",
"name": "Extract from HTML"
},
{
"parameters": {
"mode": "insert",
"qdrantCollection": {
"__rl": true,
"value": "test",
"mode": "id"
},
"options": {}
},
"id": "d4e54663-8453-4766-8b67-429c04a2878f",
"name": "Qdrant Vector Store",
"type": "@n8n/n8n-nodes-langchain.vectorStoreQdrant",
"position": [
-1860,
200
],
"typeVersion": 1,
"credentials": {
"qdrantApi": {
"id": "IOKo0hbR1y8P3Qmq",
"name": "QdrantApi account"
}
}
},
{
"parameters": {
"mode": "retrieve-as-tool",
"toolDescription": "Retrieve information about NVIDIA Jetson AI Lab and people.\n",
"qdrantCollection": {
"__rl": true,
"value": "test",
"mode": "list",
"cachedResultName": "test"
},
"topK": 20,
"includeDocumentMetadata": false,
"options": {}
},
"type": "@n8n/n8n-nodes-langchain.vectorStoreQdrant",
"typeVersion": 1.3,
"position": [
-420,
400
],
"id": "dc6a9b66-2afb-4abf-a737-777f1926b77c",
"name": "Qdrant Vector Store1",
"credentials": {
"qdrantApi": {
"id": "IOKo0hbR1y8P3Qmq",
"name": "QdrantApi account"
}
}
},
{
"parameters": {
"content": "# Local AI RAG Assistant with n8n on NVIDIA Jetson AGX Orin 64GB Dev Kit\n \n## Deploy Qwen/Qwen3-4B Using Sglang and Snowflake-Arctic-Embed2 with Ollama for Embeddings and Qdrant as the Vector Store",
"height": 915,
"width": 3200,
"color": 7
},
"id": "c36d57f3-9cec-4684-88c5-565e56f1b02d",
"name": "Sticky Note3",
"type": "n8n-nodes-base.stickyNote",
"position": [
-2720,
-100
],
"typeVersion": 1
},
{
"parameters": {
"sessionIdType": "customKey",
"sessionKey": "=chat_with_{{ $('Listen for incoming events').first().json.message.chat.id }}"
},
"type": "@n8n/n8n-nodes-langchain.memoryBufferWindow",
"typeVersion": 1.3,
"position": [
-580,
420
],
"id": "31b28a96-3ad2-4257-ab57-bceb85607e17",
"name": "Simple Memory"
},
{
"parameters": {
"options": {}
},
"type": "@n8n/n8n-nodes-langchain.chatTrigger",
"typeVersion": 1.1,
"position": [
-1180,
200
],
"id": "5d3257c6-94ea-4c09-a887-1fc4d2a2f4aa",
"name": "When chat message received",
"webhookId": "153df2fe-1039-4c5f-8c97-be5e16edcfe4",
"disabled": true
},
{
"parameters": {
"promptType": "define",
"text": "={{ $json.message.text }}",
"options": {
"systemMessage": "=/no_think You are an assistant working for a Nvidia Jetson AI Lab Research Group. Provide information about people using tool. \n\nYour primary goal is to provide precise, contextually relevant, and concise answers based on the tools and resources available.\n\n### TOOL\nUse the \"Retrieve information about NVIDIA Jetson AI Lab and people\" tool to:\n- perform semantic similarity searches and retrieve information about Nvidia Jetson AI Lab Research Group\n relevant to the user's query.\n- access detailed information about Nvidia Jetson AI Lab Research Group when additional context or specifics are required.\n\n### Key Instructions\n1. **Response Guidelines**:\n - Clearly explain how the retrieved information addresses the user's query, if applicable.\n - If no relevant information is found, respond with: \"I cannot find the answer in the available resources.\"\n\n2. **Focus and Relevance**:\n - Ensure all responses are directly aligned with the user's question.\n - Avoid including extraneous details or relying solely on internal knowledge.\n"
}
},
"id": "c6d22128-8508-4a35-bbec-ab271c7fc707",
"name": "AI Agent",
"type": "@n8n/n8n-nodes-langchain.agent",
"position": [
-640,
200
],
"typeVersion": 1.6
},
{
"parameters": {
"model": "snowflake-arctic-embed2:latest"
},
"type": "@n8n/n8n-nodes-langchain.embeddingsOllama",
"typeVersion": 1,
"position": [
-1920,
420
],
"id": "95974449-6510-4219-a12b-3d5351ca1cfd",
"name": "Embeddings snowflake-arctic-embed2",
"credentials": {
"ollamaApi": {
"id": "WuO7GZoCJoeRCgMl",
"name": "Ollama account"
}
}
},
{
"parameters": {
"model": "snowflake-arctic-embed2:latest"
},
"type": "@n8n/n8n-nodes-langchain.embeddingsOllama",
"typeVersion": 1,
"position": [
-480,
600
],
"id": "fd92a103-15b0-4537-b01d-a8f9318c207e",
"name": "Embeddings snowflake-arctic-embed",
"credentials": {
"ollamaApi": {
"id": "WuO7GZoCJoeRCgMl",
"name": "Ollama account"
}
}
},
{
"parameters": {
"model": {
"__rl": true,
"value": "Qwen/Qwen3-4B",
"mode": "list",
"cachedResultName": "Qwen/Qwen3-4B"
},
"options": {}
},
"type": "@n8n/n8n-nodes-langchain.lmChatOpenAi",
"typeVersion": 1.2,
"position": [
-720,
420
],
"id": "dac4409d-c43d-4893-b717-449466bd5e33",
"name": "SGLang",
"credentials": {
"openAiApi": {
"id": "wIBNk1p41L1TWzWu",
"name": "Local LLM"
}
}
},
{
"parameters": {
"url": "https://www.jetson-ai-lab.com/research.html",
"options": {}
},
"id": "5c01e26e-24f4-4551-bdd8-06e5cd19b088",
"name": "Get data from website",
"type": "n8n-nodes-base.httpRequest",
"position": [
-2440,
200
],
"typeVersion": 4.2
},
{
"parameters": {
"updates": [
"message"
],
"additionalFields": {}
},
"id": "081d1e46-8136-4a15-b3a9-979f6b18acb1",
"name": "Listen for incoming events",
"type": "n8n-nodes-base.telegramTrigger",
"position": [
-1180,
540
],
"webhookId": "322dce18-f93e-4f86-b9b1-3305519b7834",
"typeVersion": 1,
"credentials": {
"telegramApi": {
"id": "iuLi4kjMG1YACFTy",
"name": "Telegram account"
}
}
},
{
"parameters": {
"content": "## 3.2 Use the telegram bot\n\n\n\n",
"height": 344,
"width": 547
},
"id": "6d2ad98d-a651-47db-9c50-44236139936c",
"name": "Sticky Note4",
"type": "n8n-nodes-base.stickyNote",
"position": [
-1360,
420
],
"typeVersion": 1
},
{
"parameters": {
"chatId": "={{ $('Listen for incoming events').first().json.message.from.id }}",
        "text": "={{ $('AI Agent').item.json.output.replace(/&/g, \"&amp;\").replace(/>/g, \"&gt;\").replace(/</g, \"&lt;\").replace(/\"/g, \"&quot;\") }}",
"additionalFields": {
"appendAttribution": false,
"parse_mode": "HTML"
}
},
"id": "c89c55ba-5d7a-4090-a0d4-b1a7db95bef5",
"name": "Correct errors",
"type": "n8n-nodes-base.telegram",
"position": [
180,
360
],
"typeVersion": 1.1,
"webhookId": "cd30e054-0370-4aef-b7bf-483c1e73c449",
"credentials": {
"telegramApi": {
"id": "iuLi4kjMG1YACFTy",
"name": "Telegram account"
}
}
},
{
"parameters": {
"chatId": "={{ $('Listen for incoming events').first().json.message.from.id }}",
"text": "={{ $json.output }}",
"additionalFields": {
"appendAttribution": false,
"parse_mode": "HTML"
}
},
"id": "06b704a1-ce7b-437d-950a-0e5ba4253ac1",
"name": "Telegram1",
"type": "n8n-nodes-base.telegram",
"position": [
-20,
200
],
"typeVersion": 1.1,
"webhookId": "df3e62fe-25fb-481f-a4ad-255a167818bb",
"credentials": {
"telegramApi": {
"id": "iuLi4kjMG1YACFTy",
"name": "Telegram account"
}
},
"onError": "continueErrorOutput"
},
{
"parameters": {
"content": "## 5. Telegram\n\n\n",
"height": 669,
"width": 487,
"color": 6
},
"id": "2228d980-da5a-4c8f-b44b-30d049153e7e",
"name": "Sticky Note6",
"type": "n8n-nodes-base.stickyNote",
"position": [
-80,
80
],
"typeVersion": 1
},
{
"parameters": {
"content": "## 1.2 Download PDF",
"height": 335,
"width": 640,
"color": 4
},
"id": "46e1bdca-0874-481d-ae51-1fe9272fb268",
"name": "Sticky Note7",
"type": "n8n-nodes-base.stickyNote",
"position": [
-2680,
440
],
"typeVersion": 1
},
{
"parameters": {
"formTitle": "Add your file here",
"formFields": {
"values": [
{
"fieldLabel": "File",
"fieldType": "file",
"acceptFileTypes": ".pdf",
"requiredField": true
}
]
},
"options": {}
},
"id": "53581c98-d112-469d-91aa-ef358761167f",
"name": "On form submission",
"type": "n8n-nodes-base.formTrigger",
"position": [
-2540,
580
],
"webhookId": "4e1e20d4-f759-42c8-8439-87b93f43aa7c",
"typeVersion": 2.2
}
],
"pinData": {},
"connections": {
"When clicking ‘Test workflow’": {
"main": [
[
{
"node": "Get data from website",
"type": "main",
"index": 0
}
]
]
},
"Default Data Loader": {
"ai_document": [
[
{
"node": "Qdrant Vector Store",
"type": "ai_document",
"index": 0
}
]
]
},
"Recursive Character Text Splitter": {
"ai_textSplitter": [
[
{
"node": "Default Data Loader",
"type": "ai_textSplitter",
"index": 0
}
]
]
},
"Extract from HTML": {
"main": [
[
{
"node": "Qdrant Vector Store",
"type": "main",
"index": 0
}
]
]
},
"Qdrant Vector Store1": {
"ai_tool": [
[
{
"node": "AI Agent",
"type": "ai_tool",
"index": 0
}
]
]
},
"Simple Memory": {
"ai_memory": [
[
{
"node": "AI Agent",
"type": "ai_memory",
"index": 0
}
]
]
},
"When chat message received": {
"main": [
[]
]
},
"Embeddings snowflake-arctic-embed2": {
"ai_embedding": [
[
{
"node": "Qdrant Vector Store",
"type": "ai_embedding",
"index": 0
}
]
]
},
"Embeddings snowflake-arctic-embed": {
"ai_embedding": [
[
{
"node": "Qdrant Vector Store1",
"type": "ai_embedding",
"index": 0
}
]
]
},
"SGLang": {
"ai_languageModel": [
[
{
"node": "AI Agent",
"type": "ai_languageModel",
"index": 0
}
]
]
},
"Qdrant Vector Store": {
"main": [
[]
]
},
"Get data from website": {
"main": [
[
{
"node": "Extract from HTML",
"type": "main",
"index": 0
}
]
]
},
"Telegram1": {
"main": [
[],
[
{
"node": "Correct errors",
"type": "main",
"index": 0
}
]
]
},
"Listen for incoming events": {
"main": [
[
{
"node": "AI Agent",
"type": "main",
"index": 0
}
]
]
},
"AI Agent": {
"main": [
[
{
"node": "Telegram1",
"type": "main",
"index": 0
}
]
]
},
"On form submission": {
"main": [
[
{
"node": "Qdrant Vector Store",
"type": "main",
"index": 0
}
]
]
}
},
"active": false,
"settings": {
"executionOrder": "v1"
},
"versionId": "a7cd7783-ffb3-421e-b0cd-a24b082136fb",
"meta": {
"templateCredsSetupCompleted": true,
"instanceId": "adc1857ea07e92ce7e9be3a75c7bb0fe7997ac27b02fe8d5b142a598aee663eb"
},
"id": "UPXF0jR4WYDNUYNS",
"tags": []
}
Telegram serves as the real-time messenger, enabling users to interact with the RAG agent through a familiar chat environment. Here is an example output:
This setup transforms your NVIDIA Jetson AGX Orin into a fully offline, secure AI assistant capable of understanding context, retrieving relevant data, and interacting naturally through Telegram.