Introduction
Imagine asking your watch a question and getting an immediate answer without needing internet connectivity – that's the power of running a small language model (SLM) on the device itself. In this three-part series, we will explore building a simple chatbot that runs entirely on an Apple Watch Ultra 2, using a small open-source language model.
By running the AI model locally on the watch (instead of a cloud server), we achieve on-device intelligence with compelling benefits:
- Privacy: Your data stays on your wrist – no server processing required
- Offline Access: Works anywhere, even without cellular or WiFi
- Reduced Latency: No network round-trip means faster responses
- Independence: No subscription services or API costs

What is Edge Computing?
Edge computing is a paradigm where computation happens locally on the "edge" device rather than in the cloud. Instead of sending your data to a remote server for processing, the device itself handles the workload.
For AI applications, this means running machine learning models directly on consumer hardware. Apple has been quietly building toward this future – their latest watches process Siri voice requests on-device when possible, thanks to the S9 chip's Neural Engine.
The Apple Watch Ultra 2: A Tiny Powerhouse
The Apple Watch Ultra 2's hardware makes on-device AI feasible:
| Component | Specification | AI Relevance |
|---|---|---|
| Processor | S9 SiP, 64-bit dual-core CPU | Handles general compute tasks |
| Neural Engine | 4-core dedicated ML accelerator | Speeds up model inference by 10-15x |
| RAM | ~1GB | Must fit model + app + OS |
| Storage | 64GB | Plenty of room for model files |
The Neural Engine is the secret weapon here. Apple designed it specifically for machine learning tasks, and it can perform tensor operations far more efficiently than the CPU alone. When we deploy our chatbot, we'll target this hardware accelerator using CoreML.
The Challenge: Miniaturizing AI
Here's the catch – Large Language Models (LLMs) are typically resource-intensive. Frontier models have hundreds of billions of parameters and require enterprise-grade hardware. Even a comparatively small model like Llama 2-7B needs roughly 14GB of memory in half precision (7 billion parameters × 2 bytes each).
On a tiny wearable with ~1GB of RAM, we need to be clever:
- Use a small, optimized model (under 100M parameters)
- Leverage efficient code and Apple's ML frameworks
- Accept trade-offs in capability for portability
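The memory constraint above is easy to sanity-check with back-of-envelope arithmetic: weight size is just parameter count times bytes per parameter. A minimal sketch (the `weight_memory_mb` helper is just for illustration; real memory use also includes activations, the app, and the OS):

```python
def weight_memory_mb(params: int, bytes_per_param: float) -> float:
    """Approximate size of a model's weights in megabytes."""
    return params * bytes_per_param / 1e6

# Llama 2-7B in half precision (2 bytes/parameter): ~14 GB -- far too big.
print(f"Llama 2-7B @ fp16: {weight_memory_mb(7_000_000_000, 2) / 1000:.1f} GB")

# An 82M-parameter model such as DistilGPT-2 in fp16: ~164 MB.
print(f"DistilGPT-2 @ fp16: {weight_memory_mb(82_000_000, 2):.0f} MB")

# 8-bit quantization halves that again: ~82 MB, comfortably under 1 GB of RAM.
print(f"DistilGPT-2 @ int8: {weight_memory_mb(82_000_000, 1):.0f} MB")
```

This is why the "under 100M parameters" rule of thumb matters: it keeps the weights to a fraction of the watch's ~1GB of RAM, leaving headroom for everything else.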
As for performance, treat this project as a proof of concept – I am not going in expecting impressive tokens-per-second response times.
The good news? Enthusiasts have already demonstrated that a distilled GPT-2 model can run locally on the Apple Watch Ultra 2 using CoreML. That proves feasibility: with careful optimization, you really can have a miniature chatbot on your wrist. Previous attempts at running GPT-2 on an Ultra 2, however, ran into trouble with token counts, so we will dig into every layer of the model to understand exactly how tokens flow through our SLM of choice.
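That token flow is autoregressive: the model predicts one token at a time, feeding everything generated so far back in as context. Here is a toy sketch of that loop; the `next_token` function is a stand-in that replays a canned sequence (a real SLM would score its whole vocabulary each step), and the `EOS` ID and two-token prompt are assumptions for the demo:

```python
EOS = 0  # hypothetical end-of-sequence token ID

def next_token(context: list[int]) -> int:
    """Stand-in for a model forward pass: returns one next-token ID."""
    canned = [5, 7, 3, EOS]
    produced = len(context) - 2  # this demo assumes a 2-token prompt
    return canned[produced] if produced < len(canned) else EOS

def generate(prompt: list[int], max_new_tokens: int = 16) -> list[int]:
    """Greedy autoregressive generation: append tokens until EOS or a cap."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tok = next_token(tokens)
        if tok == EOS:
            break
        tokens.append(tok)
    return tokens

print(generate([11, 12]))  # -> [11, 12, 5, 7, 3]
```

Note that each new token requires a full forward pass over the growing context – one reason long responses are costly on constrained hardware.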
Defining Our Chatbot Function
Before diving into SLM model selection and implementation, let's outline the basic structure of our chatbot. At its core, the chatbot takes user input and generates a text response. Here's the conceptual flow in Swift pseudocode:
func chatbotReply(to userMessage: String) -> String {
    // 1. Tokenize: convert the input string into model-readable token IDs
    let inputTokens: [Int] = tokenize(userMessage)
    // 2. Inference: run the language model to generate response tokens
    let responseTokens: [Int] = model.generateResponse(to: inputTokens)
    // 3. Detokenize: convert the output tokens back into human-readable text
    let responseText: String = detokenize(responseTokens)
    return responseText
}

// Example usage
let userQuestion = "Hello, how are you?"
let botAnswer = chatbotReply(to: userQuestion)
// Output: "Hello! I'm doing well. How can I help you today?"

This three-step pipeline is universal to all AI SLMs:
- Tokenization: Breaking text into numerical tokens the model understands
- Inference: Running the neural network forward pass
- Detokenization: Converting the output numbers back to words
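To make steps 1 and 3 concrete, here is a toy word-level tokenizer that round-trips text through token IDs. This is a deliberately simplified illustration – the vocabulary and splitting rules are invented for the demo; GPT-2-family models actually use byte-pair encoding (BPE) with a ~50k subword vocabulary, which handles unseen words by falling back to smaller pieces:

```python
# Toy vocabulary: maps each known word/punctuation mark to an integer ID.
VOCAB = {"hello": 0, ",": 1, "how": 2, "are": 3, "you": 4, "?": 5}
INV_VOCAB = {i: w for w, i in VOCAB.items()}

def tokenize(text: str) -> list[int]:
    """Crude word-level tokenization: lowercase, split off punctuation."""
    words = text.lower().replace(",", " ,").replace("?", " ?").split()
    return [VOCAB[w] for w in words]

def detokenize(ids: list[int]) -> str:
    """Map token IDs back to words (spacing is lossy in this toy version)."""
    return " ".join(INV_VOCAB[i] for i in ids)

ids = tokenize("Hello, how are you?")
print(ids)              # [0, 1, 2, 3, 4, 5]
print(detokenize(ids))  # "hello , how are you ?"
```

Even this toy version shows why tokenization matters on the watch: the model never sees characters, only these integer IDs, so the tokenizer's vocabulary must ship with the app and match the one the model was trained with.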
Input Methods on the Watch
You might wonder how users will input text on such a small screen. Several options exist:
- Voice Dictation: Speak your question, let the watch transcribe it
- Scribble: Draw letters on the screen
- Paired iPhone: Type on your phone, send to watch
- Pre-set Prompts: Quick-reply style buttons for common queries
For our implementation, we'll support all of these through the standard watchOS text input APIs. We will start with pre-set prompts for quick and easy testing.
Series Roadmap
Here's what we'll cover in this three-part series:
Part 1 (This Article): Introduction to edge computing and the project concept
Part 2: Choosing the right small language model – we'll compare candidates like GPT-2 Small, DistilGPT-2, and others, explaining why model selection is critical for success on constrained hardware.
Part 3: Step-by-step implementation – converting the model to CoreML, building the Swift app, and deploying to the Apple Watch Ultra 2.
Development Environment
For this project, we're using:
- Development Machine: Mac Studio M3 Ultra (512GB unified memory)
- IDE: VSCode with Cline extension for AI-assisted coding
- Apple Tools: Xcode 15+ with watchOS 10 SDK
- Version Control: The code will be made available through a GitHub repository
- Target Device: Apple Watch Ultra 2
The Mac Studio's generous memory makes model conversion, fine-tuning, and testing smooth, even when working with multiple model variants.
What's Next
In Part 2, we'll dive deep into the world of small language models. We'll explore why DistilGPT-2 (82 million parameters) emerges as our top choice, and walk through the model conversion pipeline that transforms a PyTorch model into a watchOS-ready CoreML package.
Stay tuned – we're about to put AI on your wrist!
This is Part 1 of a three-part series on building an on-device AI chatbot for Apple Watch Ultra 2. Follow Tomorrow's Innovations for Parts 2 and 3.
Published on Tomorrow's Innovations | tomorrowsinnovations.co