Vision and Language Navigation in Continuous Environments

VLNCE trajectory gif

Vision and Language Navigation in Continuous Environments (VLN-CE) is an instruction-guided navigation task with crowdsourced instructions, realistic environments, and unconstrained agent navigation. The agent is given first-person (egocentric) vision and a human-generated instruction, such as "Go down the hall and turn left at the wooden desk. Continue until you reach the kitchen and then stop by the kettle". Using this input alone, the agent must take low-level control actions (e.g., MOVE-FORWARD 0.25 m, TURN-LEFT 15 degrees) to navigate to the goal. VLN-CE lifts the nav-graph assumptions of the original VLN task and aims to bring simulated agents closer to reality.
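To make the action space concrete, here is a toy dead-reckoning sketch (not the official VLN-CE or Habitat code) of how the discrete actions above update an agent's planar pose; the step size (0.25 m) and turn angle (15 degrees) come from the task description, while the function and pose representation are illustrative assumptions:

```python
import math

FORWARD_STEP_M = 0.25   # meters per MOVE-FORWARD (from the task description)
TURN_DEG = 15.0         # degrees per TURN-LEFT / TURN-RIGHT

def step(pose, action):
    """Apply one discrete VLN-CE-style action to an (x, y, heading_deg) pose.

    This is a simplified kinematic model: real Habitat agents also handle
    collisions and sliding along obstacles.
    """
    x, y, heading = pose
    if action == "MOVE-FORWARD":
        x += FORWARD_STEP_M * math.cos(math.radians(heading))
        y += FORWARD_STEP_M * math.sin(math.radians(heading))
    elif action == "TURN-LEFT":
        heading = (heading + TURN_DEG) % 360.0
    elif action == "TURN-RIGHT":
        heading = (heading - TURN_DEG) % 360.0
    # STOP leaves the pose unchanged and ends the episode.
    return (x, y, heading)

# Six forward-then-turn-left steps trace a quarter arc (90 degrees total).
pose = (0.0, 0.0, 0.0)
for action in ["MOVE-FORWARD", "TURN-LEFT"] * 6:
    pose = step(pose, action)
print(pose)
```

The fine granularity of these actions (compared to the original VLN task's teleportation between panoramic viewpoints) is what makes the continuous setting harder: reaching a goal a few meters away can require dozens of control decisions.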

VLNCE setting

The VLN-CE codebase and baseline models are publicly available (see the [Code] link below).

Jul 2020 — Paper accepted to ECCV 2020!
Dec 2019 — VLN-CE v1 dataset is now available in the official Habitat API!

Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments

Jacob Krantz, Erik Wijmans, Arjun Majumdar, Dhruv Batra, Stefan Lee
ECCV 2020 [Bibtex] [PDF] [Code]


Jacob Krantz
Oregon State University
Erik Wijmans
Georgia Tech & Facebook AI Research
Arjun Majumdar
Georgia Tech
Dhruv Batra
Georgia Tech & Facebook AI Research
Stefan Lee
Oregon State University


We thank Anand Koshy for his implementation of the dynamic time warping metric. The Georgia Tech effort was supported in part by NSF, AFRL, DARPA, ONR YIPs, ARO PECASE, Amazon. The Oregon State effort was supported in part by DARPA. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the U.S. Government, or any sponsor.
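The dynamic time warping metric acknowledged above underlies nDTW-style path evaluation, which scores how closely an agent's trajectory follows the reference path. A minimal sketch of classic DTW between two 2-D trajectories, assuming Euclidean point-to-point cost (function names are ours, not the acknowledged implementation):

```python
import math

def dtw(path_a, path_b):
    """Dynamic time warping cost between two point sequences.

    Each path is a list of (x, y) points; the cost is the summed
    Euclidean distance along the optimal monotonic alignment.
    """
    n, m = len(path_a), len(path_b)
    INF = float("inf")
    # dp[i][j] = DTW cost of aligning path_a[:i] with path_b[:j]
    dp = [[INF] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(path_a[i - 1], path_b[j - 1])
            dp[i][j] = d + min(dp[i - 1][j],      # skip a point in path_b
                               dp[i][j - 1],      # skip a point in path_a
                               dp[i - 1][j - 1])  # advance both
    return dp[n][m]

# A path identical to the reference aligns with zero cost.
ref = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
print(dtw(ref, ref))  # 0.0
```

Normalized variants (as in nDTW) divide this cost through an exponential so that a perfect path scores 1 and the score decays as the agent deviates from the reference.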
