Best-in-class open omni-modal reasoning model delivers the highest efficiency and accuracy to power agentic workflows such as computer use, document intelligence and audio-video reasoning. AI agent systems today juggle separate models for vision, speech and language — losing time and context as they pass data from one model to the other. Unveiled today, NVIDIA Nemotron […]
Teaching AI to read a map
We propose a system for synthetic data generation to train AI systems to visually follow any route on any map, finally teaching language models to navigate our world. Look at a map of a shopping mall or a theme park. Within seconds, your brain processes the visual information, identifies your location, and traces the optimal […]