Technical Exploration of JD.Vision on Apple Vision Pro: Spatial Computing, 3D Product Display, Custom Gestures, and Shader Optimization
This article details JD.Vision's development for Apple Vision Pro, covering the device's spatial‑computing capabilities, 3D product visualization, custom hand‑gesture recognition using AI models, shader customization, and performance optimizations for immersive AR shopping experiences.
01. Differences Between Vision Pro and Previous Headsets
Apple Vision Pro, released in China on June 28, 2024, introduces true spatial computing: ultra‑low‑latency video see‑through (VST) with roughly 12 ms photon‑to‑photon delay, high‑precision eye tracking combined with hand gestures for hand‑eye control, and a paradigm that digitizes the real environment for immersive interaction.
02. Technical Exploration of JD.Vision on Vision Pro
JD.Vision leverages Vision Pro's spatial computing to let users drag 1:1‑scale 3D models of appliances and gadgets into their real rooms for realistic layout previews. Development challenges included adapting to visionOS's infinite‑canvas UI, working around the scarcity of native 3D examples, and extending the platform with custom gestures, collision handling, and component systems.
The app uses visionOS's three content containers—Windows, Volumes, and Spaces. Windows host the home UI, while RealityView loads 3D models for dynamic display. Coordinate transformations between image, camera, and world spaces are managed via SwiftUI's CoordinateSpaceProtocol and RealityKit's RealityCoordinateSpace.
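As a minimal sketch of the container setup described above, the snippet below shows a SwiftUI view that loads a product model inside a RealityView; the view name `ProductVolume` and the asset name `"Refrigerator"` are illustrative, not JD.Vision's actual identifiers:

```swift
import SwiftUI
import RealityKit

// A volumetric window hosting a single 3D product model.
struct ProductVolume: View {
    var body: some View {
        RealityView { content in
            // Load a USDZ asset bundled with the app; skip silently
            // if the asset is missing (placeholder handling omitted).
            if let product = try? await Entity(named: "Refrigerator") {
                product.position = [0, 0, 0]
                content.add(product)
            }
        }
    }
}

// Registered in the App declaration alongside the 2D home window:
// WindowGroup(id: "productVolume") { ProductVolume() }
//     .windowStyle(.volumetric)
```

A Volume like this sits between a flat Window and a Full Space: it renders true 3D content but stays bounded, so the home UI can remain visible next to it.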
Virtual‑Real Fusion Applications
Vision Pro’s cameras, LiDAR, and M2/R1 chips enable high‑precision environment mapping, allowing virtual products to be placed, rotated, and scaled on real surfaces. Collision detection uses simple shapes (boxes, spheres, capsules) with physics properties to simulate realistic interactions between virtual items and the physical world.
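A sketch of the simple‑shape approach described above, using RealityKit's collision and physics components; the box dimensions, mass, and friction values are illustrative tuning numbers, not values from JD.Vision:

```swift
import RealityKit

// Give a loaded product entity a cheap box collision shape plus a
// physics body, so it can rest on and react to real surfaces that
// carry static collision meshes from scene reconstruction.
func makeInteractive(_ product: ModelEntity) {
    // Approximate the appliance with a box — far cheaper to test
    // against than the full render mesh.
    let shape = ShapeResource.generateBox(size: [0.6, 1.8, 0.7])
    product.collision = CollisionComponent(shapes: [shape])

    // A dynamic body lets gravity settle the model onto detected
    // surfaces; friction/restitution shape how it slides and bounces.
    product.physicsBody = PhysicsBodyComponent(
        massProperties: .default,
        material: .generate(friction: 0.8, restitution: 0.1),
        mode: .dynamic
    )
}
```

The same collision shapes also make the entity targetable by gaze‑and‑pinch gestures, so one component serves both physics and interaction.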
Custom Gesture Recognition
Beyond Apple's built‑in tap, pinch, zoom, and rotate gestures, JD.Vision adds AI‑driven custom gestures. Hand tracking provides 25 key points per hand; these are fed into rule‑based systems, DNNs, and LSTM networks to recognize dynamic gestures for precise 3D object manipulation.
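The rule‑based layer can be as simple as thresholding distances between tracked joints. Below is a sketch of a pinch detector built on visionOS hand tracking; the joint names come from ARKit's HandSkeleton, while the 1.5 cm threshold is an illustrative tuning value:

```swift
import ARKit
import simd

// Returns true when the thumb and index fingertips of a tracked
// hand are close enough to count as a pinch.
func isPinching(_ anchor: HandAnchor) -> Bool {
    guard let skeleton = anchor.handSkeleton else { return false }
    let thumb = skeleton.joint(.thumbTip)
    let index = skeleton.joint(.indexFingerTip)
    guard thumb.isTracked, index.isTracked else { return false }

    // Joint transforms are expressed relative to the hand anchor;
    // the translation column gives each fingertip's position.
    let t = thumb.anchorFromJointTransform.columns.3
    let i = index.anchorFromJointTransform.columns.3
    let distance = simd_distance(SIMD3(t.x, t.y, t.z),
                                 SIMD3(i.x, i.y, i.z))
    return distance < 0.015  // illustrative 1.5 cm threshold
}
```

For dynamic gestures, sequences of such per‑frame joint positions become the input windows that the DNN and LSTM models classify.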
Custom Shaders
Powered by the M2 and R1 chips, the team creates custom shaders via Reality Composer Pro's Shader Graph to render special material effects, collision‑triggered mesh changes, and UI overlays that remain undistorted across varying model sizes.
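At runtime such materials are loaded and driven through RealityKit's ShaderGraphMaterial API. The sketch below assumes a material authored in Reality Composer Pro; the material path, file name, and the `highlight` parameter are illustrative:

```swift
import RealityKit

// Load a Shader Graph material and drive one of its parameters,
// e.g. to flash the mesh when a collision event fires.
func applyHighlightMaterial(to entity: ModelEntity) async {
    do {
        var material = try await ShaderGraphMaterial(
            named: "/Root/HighlightMaterial",   // node path in the scene
            from: "Materials.usda"              // authored in RCP
        )
        // Parameters declared in the graph are settable by name.
        try material.setParameter(name: "highlight", value: .float(1.0))
        entity.model?.materials = [material]
    } catch {
        print("Shader graph material unavailable: \(error)")
    }
}
```

Keeping effect parameters in the graph rather than hard‑coded in Swift lets designers iterate on the look in Reality Composer Pro without code changes.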
Spatial Computing Optimizations
To handle the extra dimension of data, the app dynamically adjusts model polygon counts, compresses assets with Reality Composer Pro, and employs pre‑loading, lazy loading, and caching strategies to maintain smooth frame rates and low latency in an "unbounded" 3D scene.
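The pre‑loading and caching strategies mentioned above can be sketched as a small actor that decodes each asset once and hands out clones per placement; the SKU‑keyed naming is an assumption for illustration:

```swift
import RealityKit

// A simple preload-and-cache layer for product assets, assuming
// USDZ files keyed by product name.
actor ProductAssetCache {
    private var cache: [String: Entity] = [:]

    // Warm the cache off the critical path (e.g. while the user
    // browses the 2D home window).
    func preload(_ names: [String]) async {
        for name in names where cache[name] == nil {
            cache[name] = try? await Entity(named: name)
        }
    }

    // Hand out clones so each placement gets its own instance
    // while the expensive decode happens only once.
    func instance(of name: String) async -> Entity? {
        if cache[name] == nil {
            cache[name] = try? await Entity(named: name)
        }
        return cache[name]?.clone(recursive: true)
    }
}
```

Using an actor serializes cache access, which matters because loads are kicked off concurrently from UI code.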
03. Future Exploration Directions
Future work will expand 3D content, introduce depth‑video resources, and develop features such as 3D scene search, intelligent recommendation, and virtual try‑on, further enhancing the immersive shopping experience as Vision Pro matures.
JD Retail Technology
Official platform of JD Retail Technology, delivering insightful R&D news and a deep look into the lives and work of technologists.