We present an MR-based multi-language learning support system for Apple Vision Pro that overlays translations, context explanations, and question prompts onto real-world materials. Display and interaction are handled on visionOS, while a macOS companion captures an external camera stream and hosts local LLMs for layout correction, translation, and explanation, with a cloud fallback available on request. OCR is performed with Apple's Vision framework; LLMs handle text post-processing and multilingual generation. The UI links commentary windows to their source regions via gaze-based cross-highlighting, aiming to preserve users' spatial context and reduce manual window management. We also describe the inter-device communication protocol and our privacy-first data handling. Finally, we outline the latency bottlenecks and design trade-offs observed during iterative prototyping, and list concrete directions for gesture and window behaviors to be evaluated in future work.
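
As a minimal sketch of the OCR step on the macOS companion, the following shows text recognition with Apple's Vision framework, which the system names as its OCR engine. The function name `recognizeText`, the language list, and the recognition settings are illustrative assumptions, not the system's actual configuration.

```swift
import Vision
import CoreGraphics

// Minimal sketch: run Vision text recognition on one captured camera frame.
// Language list and recognition level here are assumptions for illustration.
func recognizeText(in image: CGImage) throws -> [(text: String, box: CGRect)] {
    var results: [(text: String, box: CGRect)] = []

    let request = VNRecognizeTextRequest { request, _ in
        guard let observations = request.results as? [VNRecognizedTextObservation] else { return }
        for observation in observations {
            // Keep the top candidate per region; boundingBox is normalized (0-1),
            // which is what the visionOS side would need for cross-highlighting.
            if let candidate = observation.topCandidates(1).first {
                results.append((text: candidate.string, box: observation.boundingBox))
            }
        }
    }
    request.recognitionLevel = .accurate               // favor accuracy over speed
    request.recognitionLanguages = ["en-US", "ja-JP"]  // assumed target languages
    request.usesLanguageCorrection = true

    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])  // synchronous; the completion handler runs before this returns
    return results
}
```

In a pipeline like the one described, the recognized strings and their normalized bounding boxes would then be handed to the local LLM stage for layout correction before translation and explanation.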