Abstract
Learning rewards from expert videos offers an affordable and effective way to specify the intended behaviors for reinforcement learning tasks. In this work, we propose Diffusion Reward, a novel framework that learns rewards from expert videos via conditional video diffusion models for solving complex visual RL problems. Our key insight is that the diffusion model exhibits lower generative diversity when conditioned on expert trajectories than on non-expert ones. Diffusion Reward is accordingly formalized as the negative conditional entropy, which encourages productive exploration of expert-like behaviors. We show the efficacy of our method on 10 robotic manipulation tasks from MetaWorld and Adroit with visual input and sparse reward. Moreover, Diffusion Reward can solve unseen tasks successfully and effectively, largely surpassing baseline methods.
Method Overview
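At a high level, a conditional video diffusion model is trained on expert videos; during RL, each observed frame is scored by how confidently the model predicts it given recent history, i.e., by the negative conditional entropy. Below is a minimal sketch of this reward under one assumption: the diffusion model exposes an `elbo(frame, condition)` method returning a per-sample variational lower bound on the conditional log-likelihood, as standard diffusion models admit. The class and method names are hypothetical placeholders, not the authors' released API.

```python
import torch


class DiffusionRewardSketch:
    """Hypothetical sketch of the negative-conditional-entropy reward."""

    def __init__(self, diffusion_model, n_samples: int = 4):
        # `diffusion_model` is assumed (placeholder) to expose
        # elbo(frame, condition): a per-sample variational lower bound
        # on log p(frame | condition).
        self.model = diffusion_model
        self.n_samples = n_samples

    @torch.no_grad()
    def reward(self, frame: torch.Tensor, history: torch.Tensor) -> torch.Tensor:
        # Conditional entropy: H(x_t | x_{<t}) = -E[log p(x_t | x_{<t})].
        # The reward is its negative, so a Monte-Carlo estimate of the
        # conditional log-likelihood bound serves as the reward: frames
        # that the expert-trained model predicts with low generative
        # diversity (i.e., confidently) score high.
        bounds = torch.stack(
            [self.model.elbo(frame, condition=history) for _ in range(self.n_samples)]
        )
        return bounds.mean(dim=0)  # average over diffusion noise samples
```

During RL training, this score would be queried once per environment step and used alongside the tasks' sparse rewards; the sketch isolates the diffusion term for clarity.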
Diffusion Reward Visualization (Real)
We evaluate Diffusion Reward on a real Franka arm equipped with an Allegro hand, tasked with picking up a bowl from the table. Videos are recorded with a RealSense D435i camera.
Diffusion Reward Visualization (Sim)
We show the reward curves that Diffusion Reward assigns to expert and random trajectories on each task.
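For illustration, here is a hypothetical sketch of how such curves could be produced with the reward sketch above; `model`, `expert_video`, and `random_video` are placeholders for a trained reward model and two preloaded `(T, C, H, W)` frame tensors, not artifacts from the released code.

```python
import matplotlib.pyplot as plt


def reward_curve(reward_fn, video, context_len=2):
    # Score each frame given the preceding `context_len` frames
    # as the diffusion model's condition.
    return [
        reward_fn(video[t], video[t - context_len:t]).item()
        for t in range(context_len, len(video))
    ]


# model, expert_video, random_video are assumed to be defined (placeholders).
plt.plot(reward_curve(model.reward, expert_video), label="expert")
plt.plot(reward_curve(model.reward, random_video), label="random")
plt.xlabel("timestep")
plt.ylabel("Diffusion Reward")
plt.legend()
plt.show()
```

An expert trajectory should trace a high reward curve while a random one stays low, mirroring the plots shown here.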
Main Results
We report learning curves for our method and the baselines on 7 gripper manipulation tasks from MetaWorld and 3 dexterous manipulation tasks from Adroit, all with image observations. Our method performs strongly on all tasks and significantly outperforms the baselines on the complex Door and Hammer tasks.
Zero-shot Generalization on Unseen Tasks
Diffusion Reward generalizes directly to unseen tasks and produces reasonable rewards, largely surpassing the baseline methods.
Citation
If you find this project helpful, please cite us: