We present an MR-based multi-language learning support system for Apple Vision Pro that overlays translations, context explanations, and question prompts onto real-world materials. Display and interaction are handled on visionOS, while a macOS companion captures an external camera stream and hosts local LLMs for layout correction, translation, and explanation, with a cloud fallback available on request. OCR is performed with Apple Vision; LLMs handle text post-processing and multilingual generation. The UI links commentary windows to their source regions via gaze-based cross-highlighting, aiming to preserve users’ spatial context and reduce manual window management. We also describe the inter-device communication protocol and privacy-first data handling. Finally, we outline latency bottlenecks and design trade-offs observed during iterative prototyping, and list concrete gesture and window behaviors to be evaluated in future work.
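To make the OCR step concrete, the sketch below shows one plausible way the macOS companion could drive Apple Vision's text recognizer on a captured camera frame. This is a minimal illustration, not the system's actual code; the recognition level and the language list are assumptions not stated in the abstract.

```swift
import CoreGraphics
import Vision

// Minimal sketch: run Apple Vision text recognition on one camera frame.
// Returns recognized strings with normalized bounding boxes
// (origin bottom-left, coordinates in 0...1).
func recognizeText(in frame: CGImage) throws -> [(text: String, box: CGRect)] {
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate              // assumption: favor accuracy over latency
    request.usesLanguageCorrection = true
    request.recognitionLanguages = ["en-US", "ja-JP"] // assumption: example language pair

    let handler = VNImageRequestHandler(cgImage: frame, options: [:])
    try handler.perform([request])

    // Keep only the top candidate per observation, paired with its region.
    return (request.results ?? []).compactMap { observation in
        guard let best = observation.topCandidates(1).first else { return nil }
        return (best.string, observation.boundingBox)
    }
}
```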
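The abstract mentions an inter-device communication protocol but does not specify its schema. As a purely hypothetical illustration, the exchanged messages could be modeled as Codable values serialized over the local network; every type, case, and field name below is an assumption.

```swift
import CoreGraphics
import Foundation

// Hypothetical wire format for the visionOS <-> macOS link; all names are illustrative.
struct RecognizedRegion: Codable, Identifiable {
    let id: UUID
    let sourceText: String     // OCR output after LLM layout correction
    let translation: String    // target-language translation
    let explanation: String?   // optional context explanation
    let boundingBox: CGRect    // normalized coordinates within the camera frame
}

// Messages the macOS companion and the headset might exchange; Codable
// synthesis yields JSON encoding for free over a local network transport.
enum CompanionMessage: Codable {
    case frameProcessed(regions: [RecognizedRegion])          // Mac -> headset
    case explainRequest(regionID: UUID, question: String)     // headset -> Mac
    case status(latencyMillis: Int, usedCloudFallback: Bool)  // Mac -> headset
}
```

Under these assumptions, keeping bounding boxes in normalized frame coordinates would let the visionOS side implement gaze-based cross-highlighting as a simple hit test against the region list.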