Harry H. Zhang

I am a graduate student in the SPARK Lab of MIT LIDS. I am extremely fortunate to be advised by Prof. Luca Carlone.

Prior to MIT, I was an MS-Research student in the CMU Robotics Institute studying Artificial Intelligence and Robotics, advised by Prof. David Held. I also worked at Amazon as an Applied Scientist II.

Prior to CMU, I earned my B.S. (2017-2021) with Honors from UC Berkeley with a major in EECS and a minor in Mechanical Engineering. During my time at Berkeley, I did research under Prof. Ken Goldberg and Dr. Jeffrey Ichnowski in AUTOLab. I maintain and curate a popular deep reinforcement learning tutorial on my GitHub.

Outside of school, I do quantitative finance.

Email  /  CV  /  LinkedIn  /  Google Scholar  /  GitHub  /  Twitter

News and Updates

In reverse chronological order:

  • Jan. 2024: Multi-model fitting paper accepted to ICRA, see you in Yokohama!
  • Oct. 2023: DiffCLIP accepted to WACV, see you in Hawaii!
  • Aug. 2023: FlowBot++ accepted to CoRL, see you in ATL!
  • Sep. 2022: TAX-Pose accepted to CoRL, see you in New Zealand!
  • May 2022: I am joining Amazon this summer as an applied research scientist, working on 3D learning problems.
  • Apr. 2022: FlowBot3D accepted to RSS, see you in NYC!
  • May 2021: Won the Warren Y. Dere Award from UC Berkeley EECS.
  • Mar. 2021: Dynamic cable manipulation paper accepted to ICRA 2021.
  • Nov. 2020: Dynamic cable manipulation paper featured at Bay Area Robotics Symposium (BARS) hosted by Stanford.
  • Jun. 2020: Dex-Net AR got featured on VentureBeat.
  • Jun. 2020: Dex-Net AR got featured on Sohu.

Research Interests

My current research focuses on trustworthy autonomy. Specifically, I wish to make autonomous systems robust, certifiable, and trustworthy.

Peer-Reviewed Publications
Multi-Model 3D Registration: Finding Multiple Moving Objects in Cluttered Point Clouds
David Jin, Sushrut Karmalkar, Harry Zhang, Luca Carlone
Accepted to IEEE International Conference on Robotics and Automation (ICRA), 2024.
Arxiv | Code | Video

We investigate a variation of the 3D registration problem, named multi-model 3D registration. In the multi-model registration problem, we are given two point clouds picturing a set of objects at different poses (and possibly including points belonging to the background) and we want to simultaneously reconstruct how all objects moved between the two point clouds.

DiffCLIP: Leveraging Stable Diffusion for Language Grounded 3D Classification
Sitian Shen, Zilin Zhu, Linqian Fan, Harry Zhang, Xinxiao Wu
Accepted to IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2024.
Arxiv | Code | Video

We propose DiffCLIP, a new pre-training framework that incorporates stable diffusion with ControlNet to minimize the domain gap in the visual branch. Additionally, a style-prompt generation module is introduced for few-shot tasks in the textual branch.

FlowBot++: Learning Generalized Articulated Objects Manipulation via Articulation Projection
Harry Zhang, Benjamin Eisner, David Held
Accepted to Conference on Robot Learning (CoRL), 2023.
Arxiv | Code | Video | Open Review

We explore yet another novel method to perceive and manipulate 3D articulated objects that generalizes to enable the robot to articulate unseen classes of objects.

TAX-Pose: Task-Specific Cross-Pose Estimation for Robot Manipulation
Brian Okorn*, Chu Er Pan*, Harry Zhang*, Benjamin Eisner*, David Held
Accepted to Conference on Robot Learning (CoRL), 2022 (* indicates equal contribution)
Arxiv | Code | Video | Open Review

We conjecture that the task-specific pose relationship between relevant parts of interacting objects is a generalizable notion of a manipulation task that can transfer to new objects. We call this task-specific pose relationship "cross-pose". We propose a vision-based system that learns to estimate the cross-pose between two objects for a given manipulation task.

FlowBot3D: Learning 3D Articulation Flow to Manipulate Articulated Objects
Benjamin Eisner*, Harry Zhang*, David Held
Accepted to Robotics: Science and Systems (RSS), 2022 (* indicates equal contribution) - Long talk, Best Paper Award Finalist (Selection Rate 1.5%).
Arxiv | Code | Video | Berkeley CPAR Talk | MIT Technology Review China | Synced Review Sohu | CMU Research Highlights

We explore a novel method to perceive and manipulate 3D articulated objects that generalizes to enable the robot to articulate unseen classes of objects.

AVPLUG: Approach Vector Planning for Unicontact Grasping amid Clutter
Yahav Avigal*, Vishal Satish*, Harry Zhang, Huang Huang, Michael Danielczuk, Jeffrey Ichnowski, Ken Goldberg
Accepted to Conference on Automation Science and Engineering (CASE), 2021.
Arxiv | Code | Video

We present AVPLUG (Approach Vector PLanning for Unicontact Grasping), an algorithm for efficiently finding the approach vector using an efficient octree occupancy model and Minkowski sum computation to maximize information gain.

Robots of the Lost Arc: Self-Supervised Learning to Dynamically Manipulate Fixed-Endpoint Cables
Harry Zhang, Jeffrey Ichnowski, Daniel Seita, Jonathan Wang, Huang Huang, Ken Goldberg
Accepted to International Conference on Robotics and Automation (ICRA), 2021
Arxiv | Code | Bay Area Robotics Symposium Coverage | ICRA 2022 Deformable Object Manipulation Workshop

We propose a self-supervised learning framework that enables a UR5 robot to perform three dynamic cable manipulation tasks: vaulting, knocking, and weaving. The framework finds a 3D apex point for the robot arm, which, together with a task-specific trajectory function, defines an arcing motion that dynamically manipulates the cable to perform tasks with varying obstacle and target locations.

Dex-Net AR: Distributed Deep Grasp Planning Using a Commodity Cellphone and Augmented Reality App
Harry Zhang, Jeffrey Ichnowski, Yahav Avigal, Joseph Gonzalez, Ion Stoica, Ken Goldberg
Accepted to International Conference on Robotics and Automation (ICRA), 2020
Arxiv | Code | Video | VentureBeat Coverage | Sohu Coverage (in Mandarin)

We present a distributed pipeline, Dex-Net AR, that allows point clouds to be uploaded to a server in our lab, cleaned, and evaluated by Dex-Net grasp planner to generate a grasp axis that is returned and displayed as an overlay on the object.

Orienting Novel Objects using Self-Supervised Rotation Estimation
Shivin Devgon, Jeffrey Ichnowski, Ashwin Balakrishna, Harry Zhang, Ken Goldberg
Accepted to Conference on Automation Science and Engineering (CASE), 2020.
Arxiv | Code | Video

We present an algorithm to orient novel objects given a depth image of the object in its current and desired orientation.

Preprints
Self-Supervised Learning of Dynamic Planar Manipulation of Free-End Cables
Jonathan Wang*, Huang Huang*, Vincent Lim, Harry Zhang, Jeffrey Ichnowski, Daniel Seita, Yunliang Chen, Ken Goldberg
Preprint, in submission to International Conference on Robotics and Automation (ICRA), 2022.
Arxiv | Code | Video

We present an algorithm to train a robot to control free-end cables in a self-supervised fashion.

Safe Deep Model-Based Reinforcement Learning with Lyapunov Functions
Bobby Yan*, Harry Zhang*, Huang Huang*
Preprint, 2022.
Arxiv | Code | Video

We introduce and explore a novel method for adding safety constraints to model-based RL during training and policy learning.

Teaching

10-725: Graduate Convex Optimization

16-385: Computer Vision

CS 189: Introduction to Machine Learning

EE 127: Introduction to Convex Optimization

CS 188: Introduction to Artificial Intelligence

CS 170: Algorithms

ME C231A: Model Predictive Control


Website template from Jon Barron