Introduction
Large language models are evolving rapidly, and Apple's MLX framework gives Mac users a powerful way to run them natively on Apple Silicon. With its optimized GPU support and unified memory design, MLX can unlock performance that feels closer to running on dedicated accelerators.
In this series, I'll walk through how I converted the Hugging Face model rednote-hilab/dots.ocr into MLX format on my Mac Studio M3 Ultra with 512 GB unified memory. dots.ocr is a vision-language model (VLM). In this first part, we'll focus on converting the Qwen2 language backbone, getting the text side running natively in MLX. In Part 2, I'll extend this into full OCR by adding the vision tower.
Why MLX?
Apple designed MLX to run AI models with tight integration to Apple Silicon's architecture. That means:
- GPU acceleration without custom CUDA installs.
- Unified memory shared by the CPU and GPU, so tensors don't have to be copied between devices.
- Lightweight Python APIs that feel familiar if you've worked with PyTorch or NumPy.
For Mac developers, this translates into less friction and more speed when experimenting with state-of-the-art models.
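If you've used NumPy or PyTorch, the MLX Python API feels immediately familiar. Here's a tiny illustrative snippet (it assumes MLX is installed via pip install mlx and is not part of the conversion workflow itself):

```python
# A small taste of the MLX array API: NumPy-like syntax, lazy evaluation.
import mlx.core as mx

a = mx.array([1.0, 2.0, 3.0])   # arrays live in unified memory
b = mx.ones([3])
c = (a + b) * 2                 # builds a lazy computation graph
mx.eval(c)                      # forces evaluation (on the GPU by default)
print(c)                        # array([4, 6, 8], dtype=float32)
```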
Setting Up The Environment
First, set up a Python environment that can handle the MLX conversion:
```bash
brew install python@3.12
/opt/homebrew/bin/python3.12 -m venv ~/venvs/mlx-dots-py312
source ~/venvs/mlx-dots-py312/bin/activate
python -m pip install --upgrade pip
python -m pip install torch safetensors transformers==4.51.0
```
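A quick import check confirms the environment is ready before moving on (optional; not part of the original steps):

```python
# Optional: confirm the key packages are importable in the new environment.
import torch, transformers, safetensors

print("torch", torch.__version__)
print("transformers", transformers.__version__)   # should be 4.51.0
print("safetensors", safetensors.__version__)
```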
Downloading the Model
I started by pulling down the Hugging Face model repository locally:
huggingface-cli download rednote-hilab/dots.ocr --local-dir ~/models/DotsOCR --local-dir-use-symlinks True
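If you prefer to script the download in Python, huggingface_hub's snapshot_download does the same job as the CLI call above. A minimal sketch, targeting the same local path:

```python
# Download the model repo into ~/models/DotsOCR using huggingface_hub.
from pathlib import Path
from huggingface_hub import snapshot_download

local_dir = Path("~/models/DotsOCR").expanduser()
snapshot_download(repo_id="rednote-hilab/dots.ocr", local_dir=str(local_dir))
print("Downloaded to", local_dir)
```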
Then I made a copy dedicated to the text-only conversion:
rsync -a ~/models/DotsOCR/ ~/models/DotsOCR_textonly/
Preparing the Config
MLX recognizes models like LLaMA, Mistral, and Qwen, but it doesn't know about the custom dots_ocr type. Since the backbone is Qwen2, I patched the config so MLX would treat it as such:
```bash
python - <<'PY'
import json, pathlib

p = pathlib.Path('~/models/DotsOCR_textonly/config.json').expanduser()
cfg = json.loads(p.read_text())

# Re-label the model as a plain Qwen2 causal LM
cfg['model_type'] = 'qwen2'
cfg['architectures'] = ['Qwen2ForCausalLM']

# Drop the vision-specific fields
for k in ['vision_config', 'image_token_id', 'video_token_id']:
    cfg.pop(k, None)

p.write_text(json.dumps(cfg, indent=2, ensure_ascii=False))
print("Patched", p)
PY
```
This small edit made the model compatible with MLX's conversion tool.
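As a quick check (my addition, not part of the original flow), transformers should now parse the patched config as a plain Qwen2 model:

```python
# Confirm the patched config is recognized as Qwen2.
from pathlib import Path
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained(str(Path("~/models/DotsOCR_textonly").expanduser()))
print(cfg.model_type)     # expect: qwen2
print(cfg.architectures)  # expect: ['Qwen2ForCausalLM']
```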
Stripping Vision Weights
The original shards contained both language and vision weights. Since we're only targeting the text backbone here, I filtered out all vision-related tensors and merged the remaining Qwen2 weights into a single shard.
First we create a backup folder and move the original safetensors shards into it:
```bash
mkdir -p ~/models/DotsOCR_textonly/original_safetensors
mv ~/models/DotsOCR_textonly/model-*.safetensors ~/models/DotsOCR_textonly/original_safetensors/ 2>/dev/null || true
```
Now strip the vision tensors from the originals. This step reads the shards from original_safetensors/ and writes stripped copies into text_only_weights/:
```bash
python - <<'PY'
from safetensors import safe_open
from safetensors.torch import save_file
from pathlib import Path

root = Path('~/models/DotsOCR_textonly').expanduser()
orig = root / 'original_safetensors'
dst = root / 'text_only_weights'
dst.mkdir(parents=True, exist_ok=True)

def keep(k: str) -> bool:
    # Keep a tensor unless its name matches one of the vision-related patterns.
    k = k.lower()
    drop = (
        'vision_tower', 'vision.', '.vision', 'visual.', 'image_proj',
        'mm_projector', 'pixel', 'patch_embed', 'vision_proj',
        'visiongrid', 'visionnorm'
    )
    return not any(d in k for d in drop)

any_written = False
for shard in sorted(orig.glob('model-*.safetensors')):
    with safe_open(shard, framework="pt") as f:
        keys = [k for k in f.keys() if keep(k)]
        tensors = {k: f.get_tensor(k) for k in keys}
        out = (dst / shard.name).as_posix()
        if tensors:
            save_file(tensors, out)
            print(f"[STRIP] {shard.name}: kept {len(tensors)} tensors → {out}")
            any_written = True
        else:
            print(f"[STRIP] {shard.name}: kept 0 tensors (vision-only shard)")

print("Done. Output dir:", dst if any_written else "No text tensors written.")
PY
```
We see that shard 1 kept 339 tensors and the vision-only shard kept 0 tensors.
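If you want to eyeball the result before merging, a short script can confirm that nothing vision-like survived the filter (a quick check using the same naming patterns as the strip script):

```python
# Spot-check the stripped shards: count tensors and flag anything vision-like.
from pathlib import Path
from safetensors import safe_open

dst = Path("~/models/DotsOCR_textonly/text_only_weights").expanduser()
suspicious = ("vision", "visual", "image_proj", "mm_projector", "pixel", "patch_embed")

for shard in sorted(dst.glob("model-*.safetensors")):
    with safe_open(shard, framework="pt") as f:
        keys = list(f.keys())
    leftovers = [k for k in keys if any(s in k.lower() for s in suspicious)]
    print(f"{shard.name}: {len(keys)} tensors, {len(leftovers)} vision-like keys")
```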
Now we merge all of the kept tensors into a single shard and write a new index:
```bash
python - <<'PY'
from safetensors import safe_open
from safetensors.torch import save_file
from pathlib import Path
import json, torch

root = Path('~/models/DotsOCR_textonly').expanduser()
src_dir = root / 'text_only_weights'
out_name = 'model-00001-of-00001.safetensors'
out_path = root / out_name

# 1) Collect all kept tensors from text_only_weights/
all_tensors = {}
for sf in sorted(src_dir.glob('model-*.safetensors')):
    with safe_open(sf, framework="pt") as f:
        for k in f.keys():
            if k in all_tensors:
                raise RuntimeError(f"Duplicate tensor key across shards: {k}")
            all_tensors[k] = f.get_tensor(k)

if not all_tensors:
    raise SystemExit("No text tensors found to merge. Did the strip step produce anything?")

# 2) Save a single merged shard
save_file(all_tensors, out_path.as_posix())
print(f"[MERGE] Wrote merged shard: {out_path} with {len(all_tensors)} tensors")

# 3) Compute total_size for index metadata
total_size = 0
for t in all_tensors.values():
    total_size += t.element_size() * t.numel()

index = {
    "metadata": {"total_size": total_size},
    "weight_map": {k: out_name for k in all_tensors.keys()},
    "format": "safetensors"
}

# 4) Write fresh index referencing exactly one shard
idx_path = root / "model.safetensors.index.json"
idx_path.write_text(json.dumps(index, indent=2))
print(f"[INDEX] Wrote {idx_path.name} (total_size={total_size})")

# 5) Clean any old root shards (we'll keep only the merged one)
for old in root.glob('model-*.safetensors'):
    if old.name != out_name:
        old.unlink()
        print("[CLEAN] removed old shard", old.name)

print("[DONE] Single-shard layout ready.")
PY
```
Now ~/models/DotsOCR_textonly/ should have:
- model-00001-of-00001.safetensors: our merged text-only shard
- An updated model.safetensors.index.json pointing to the new shard
Let's do a quick check:
ls -lh ~/models/DotsOCR_textonly
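Beyond the directory listing, you can also verify that the new index and the merged shard agree with each other (a sketch that only reads metadata and keys):

```python
# Cross-check model.safetensors.index.json against the merged shard's contents.
import json
from pathlib import Path
from safetensors import safe_open

root = Path("~/models/DotsOCR_textonly").expanduser()
index = json.loads((root / "model.safetensors.index.json").read_text())
mapped = set(index["weight_map"].keys())

with safe_open(root / "model-00001-of-00001.safetensors", framework="pt") as f:
    stored = set(f.keys())

print("keys in index:", len(mapped))
print("keys in shard:", len(stored))
print("match:", mapped == stored)
```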
Conversion to MLX
With the patched config and text-only weights, conversion is straightforward:
First, make sure we have the latest mlx-lm:
python -m pip install --upgrade mlx-lm
Now let's run the MLX conversion:
python -m mlx_lm convert --hf-path ~/models/DotsOCR_textonly --mlx-path ~/mlx-checkpoints/dotsocr-text -q
The -q flag enables quantization, which reduces memory usage while preserving performance. With 512 GB of unified memory on my M3 Ultra, it isn't strictly necessary, but it keeps the converted checkpoint small.
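If you'd rather drive the conversion from Python instead of the CLI, recent mlx-lm releases also expose a convert helper. A minimal sketch, assuming that API:

```python
# Run the same conversion via the mlx-lm Python API (equivalent to the CLI call above).
from pathlib import Path
from mlx_lm import convert

convert(
    hf_path=str(Path("~/models/DotsOCR_textonly").expanduser()),
    mlx_path=str(Path("~/mlx-checkpoints/dotsocr-text").expanduser()),
    quantize=True,  # same effect as passing -q on the command line
)
```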
Sanity Check
To verify everything worked, I generated text with the converted model:
```bash
python -m mlx_lm generate \
  --model ~/mlx-checkpoints/dotsocr-text \
  --prompt "You are a helpful assistant.\nUser: Say hello in one short sentence.\nAssistant:" \
  --max-tokens 64 \
  --temp 0.7 \
  --top-p 0.9
```
The model responds with a coherent continuation, confirming that the Qwen2 text backbone of dots.ocr is alive and running in MLX.
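You can also load the converted checkpoint and generate directly from Python using mlx-lm's load and generate helpers; a short sketch using the same checkpoint path:

```python
# Load the converted MLX checkpoint and generate a short completion.
from pathlib import Path
from mlx_lm import load, generate

model, tokenizer = load(str(Path("~/mlx-checkpoints/dotsocr-text").expanduser()))
prompt = "You are a helpful assistant.\nUser: Say hello in one short sentence.\nAssistant:"
text = generate(model, tokenizer, prompt=prompt, max_tokens=64, verbose=True)
print(text)
```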
What We Have
At this point, we've successfully:
- Converted dots.ocr into a text-only MLX model.
- Verified it runs natively and efficiently on Apple Silicon.
What's missing? The vision tower. Without it, the model won't yet handle images for OCR. That's exactly what Part 2 will address.
Coming Up in Part 2
In the next post, we'll bring the vision tower into MLX by:
- Porting DotsVisionTransformer into MLX.
- Converting the vision weights from the original shards.
- Connecting vision embeddings into the Qwen2 text stack.
- Running end-to-end OCR on real images.
With the text backbone already working, the next step is to unlock full multimodal OCR performance on Apple Silicon.
Stay tuned for Part 2.
✍️ Written and tested on a Mac Studio M3 Ultra with 512 GB unified memory.