Vision and Language Navigation in Continuous Environments

VLNCE trajectory gif

Vision and Language Navigation in Continuous Environments (VLN-CE) is an instruction-guided navigation task with crowdsourced instructions, realistic environments, and unconstrained agent navigation. The agent is given first-person (egocentric) vision and a human-generated instruction, such as "Go down the hall and turn left at the wooden desk. Continue until you reach the kitchen and then stop by the kettle". Using this input alone, the agent must take low-level control actions (e.g., MOVE-FORWARD 0.25 m, TURN-LEFT 15 degrees) to navigate to the goal. VLN-CE lifts the nav-graph assumptions of the original VLN task and aims to bring simulated agents closer to reality.
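To make the action space concrete, here is a toy dead-reckoning sketch (not the official VLN-CE or Habitat code) of how the discrete actions above update an agent's planar pose; the step size (0.25 m) and turn angle (15 degrees) come from the task description, while the function and pose representation are illustrative assumptions:

```python
import math

FORWARD_STEP_M = 0.25   # meters per MOVE-FORWARD (from the task description)
TURN_DEG = 15.0         # degrees per TURN-LEFT / TURN-RIGHT

def step(pose, action):
    """Apply one discrete VLN-CE-style action to an (x, y, heading_deg) pose.

    This is a simplified kinematic model: real Habitat agents also handle
    collisions and sliding along obstacles.
    """
    x, y, heading = pose
    if action == "MOVE-FORWARD":
        x += FORWARD_STEP_M * math.cos(math.radians(heading))
        y += FORWARD_STEP_M * math.sin(math.radians(heading))
    elif action == "TURN-LEFT":
        heading = (heading + TURN_DEG) % 360.0
    elif action == "TURN-RIGHT":
        heading = (heading - TURN_DEG) % 360.0
    # STOP leaves the pose unchanged and ends the episode.
    return (x, y, heading)

# Six forward-then-turn-left steps trace a quarter arc (90 degrees total).
pose = (0.0, 0.0, 0.0)
for action in ["MOVE-FORWARD", "TURN-LEFT"] * 6:
    pose = step(pose, action)
print(pose)
```

The fine granularity of these actions (compared to the original VLN task's teleportation between panoramic viewpoints) is what makes the continuous setting harder: reaching a goal a few meters away can require dozens of control decisions.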

VLNCE setting

The VLN-CE codebase and baseline models are publicly available (see the [Code] link below).

Jul 2020 — Paper accepted to ECCV 2020!
Dec 2019 — VLN-CE v1 dataset is now available in the official Habitat API!

Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments

Jacob Krantz, Erik Wijmans, Arjun Majumdar, Dhruv Batra, Stefan Lee
ECCV 2020 [Bibtex] [PDF] [Code]


Jacob Krantz
Oregon State University
Erik Wijmans
Georgia Tech & Facebook AI Research
Arjun Majumdar
Georgia Tech
Dhruv Batra
Georgia Tech & Facebook AI Research
Stefan Lee
Oregon State University


We thank Anand Koshy for his implementation of the dynamic time warping metric. The Georgia Tech effort was supported in part by NSF, AFRL, DARPA, ONR YIPs, ARO PECASE, Amazon. The Oregon State effort was supported in part by DARPA. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the U.S. Government, or any sponsor.
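The dynamic time warping metric acknowledged above underlies nDTW-style path evaluation, which scores how closely an agent's trajectory follows the reference path. A minimal sketch of classic DTW between two 2-D trajectories, assuming Euclidean point-to-point cost (function names are ours, not the acknowledged implementation):

```python
import math

def dtw(path_a, path_b):
    """Dynamic time warping cost between two point sequences.

    Each path is a list of (x, y) points; the cost is the summed
    Euclidean distance along the optimal monotonic alignment.
    """
    n, m = len(path_a), len(path_b)
    INF = float("inf")
    # dp[i][j] = DTW cost of aligning path_a[:i] with path_b[:j]
    dp = [[INF] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(path_a[i - 1], path_b[j - 1])
            dp[i][j] = d + min(dp[i - 1][j],      # skip a point in path_b
                               dp[i][j - 1],      # skip a point in path_a
                               dp[i - 1][j - 1])  # advance both
    return dp[n][m]

# A path identical to the reference aligns with zero cost.
ref = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
print(dtw(ref, ref))  # 0.0
```

Normalized variants (as in nDTW) divide this cost through an exponential so that a perfect path scores 1 and the score decays as the agent deviates from the reference.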
