|
News and Updates
In reverse chronological order:
- Jan. 2026: H2OFlow accepted to ICLR, see you in Rio!
- May 2025: CUPS accepted to ICML, see you in Vancouver!
- Jan. 2025: Conformalized HPE paper accepted to ICLR, see you in Singapore!
- Aug. 2023: FlowBot++ accepted to CoRL, see you in ATL!
- Sep. 2022: TAX-Pose accepted to CoRL, see you in New Zealand!
- Apr. 2022: FlowBot3D accepted to RSS, see you in NYC!
- May 2021: Won the Warren Y. Dere Award from UC Berkeley EECS.
- Jun. 2020: Dex-Net AR got featured on VentureBeat.
- Jun. 2020: Dex-Net AR got featured on Sohu.
|
|
Research Interests
My current research focuses on trustworthy AI and autonomous systems. Specifically, I design algorithms for machines to learn representations for more robust real-world generalization and better certifiability. My research revolves around the theme of learning-based perception systems and robotic systems.
|
|
Peer-Reviewed Publications
|
|
H2OFlow: Grounding Human-Object Affordances with 3D Generative Models and Dense Diffused Flows
Harry Zhang,
Luca Carlone
Accepted to International Conference on Learning Representations (ICLR), 2026.
Arxiv
We introduce H2OFlow, a novel framework that comprehensively learns 3D
human-object interaction (HOI) affordances (encompassing contact, orientation,
and spatial occupancy) using only synthetic data generated from 3D generative
models. H2OFlow employs a dense 3D flow-based representation, learned through
a dense diffusion process operating on point clouds. This learned flow enables
the discovery of rich 3D affordances without the need for human annotations.
|
|
Max Entropy Moment Kalman Filter for Polynomial Systems with Arbitrary Noise
Sangli Teng,
Harry Zhang,
David Jin,
Ashkan Jasour,
Ram Vasudevan,
Maani Ghaffari,
Luca Carlone
Accepted to Conference on Neural Information Processing Systems (NeurIPS), 2025.
Arxiv
We model the noise in the process and observation model of nonlinear non-Gaussian systems
as Max-Entropy Distributions (MED). We propagate the moments through the process model
and recover the distribution as MED, thus avoiding symbolic integration, which is
generally intractable. All the steps in MEM-KF, including the extraction of a point
estimate, can be solved via convex optimization. We showcase the MEM-KF in
challenging robotics tasks, such as localization with unknown data association.
|
|
CUPS: Improving Human Pose-Shape Estimators with Conformalized Deep Uncertainty
Harry Zhang,
Luca Carlone
Accepted to International Conference on Machine Learning (ICML), 2025.
Arxiv |
Video
We introduce CUPS, a novel method for learning sequence-to-sequence 3D human
shapes and poses from RGB videos with uncertainty quantification. To improve
on top of prior work, we develop a method to score multiple hypotheses
proposed during training, effectively integrating uncertainty into the
learning process. This process results in a deep uncertainty function that is
trained end-to-end with the 3D pose estimator. Post-training, the learned deep
uncertainty model is used as the conformity score. Since the data in human
pose-shape learning is not fully exchangeable, we also provide two practical
bounds for the coverage gap in conformal prediction, developing theoretical
backing for the uncertainty bound of our model.
|
|
CHAMP: Conformalized 3D Human Multi-Hypothesis Pose Estimators
Harry Zhang,
Luca Carlone
Accepted to International Conference on Learning Representations (ICLR), 2025.
Arxiv |
Code |
Video
We introduce CHAMP, a novel method for learning sequence-to-sequence, multi-hypothesis 3D human poses from 2D keypoints by leveraging a conditional distribution with a diffusion model. To predict a single output 3D pose sequence, we generate and aggregate multiple 3D pose hypotheses. For better aggregation results, we develop a method to score these hypotheses during training, effectively integrating conformal prediction into the learning process. This process results in a differentiable conformal predictor that is trained end-to-end with the 3D pose estimator. Post-training, the learned scoring model is used as the conformity score, and the 3D pose estimator is combined with a conformal predictor to select the most accurate hypotheses for downstream aggregation.
|
|
CRISP: Object Pose and Shape Estimation with Test-Time Adaptation
Jingnan Shi,
Rajat Talak,
Harry Zhang,
David Jin,
Luca Carlone
Accepted to Conference on Computer Vision and Pattern Recognition (CVPR), 2025. Spotlight.
Arxiv |
Code |
Video
We consider the problem of estimating object pose and shape
from an RGB-D image. Our first contribution is to introduce
CRISP, a category-agnostic object pose and shape estimation pipeline. The pipeline implements an encoder-decoder
model for shape estimation. It uses FiLM-conditioning for
implicit shape reconstruction and a DPT-based network for
estimating pose-normalized points for pose estimation. As
a second contribution, we propose an optimization-based
pose and shape corrector that can correct estimation errors
caused by a domain gap.
|
|
Multi-Model 3D Registration: Finding Multiple Moving Objects in Cluttered Point Clouds
David Jin,
Sushrut Karmalkar,
Harry Zhang,
Luca Carlone
Accepted to IEEE International Conference on Robotics and Automation (ICRA), 2024.
Arxiv |
Code |
Video
We investigate a variation of the 3D registration
problem, named multi-model 3D registration. In the multi-model
registration problem, we are given two point clouds picturing a
set of objects at different poses (and possibly including points
belonging to the background) and we want to simultaneously
reconstruct how all objects moved between the two point clouds.
|
|
FlowBot++: Learning Generalized Articulated Objects Manipulation via Articulation Projection
Harry Zhang,
Benjamin Eisner,
David Held
Accepted to Conference on Robot Learning (CoRL), 2023.
Arxiv |
Code |
Video |
Open Review
We explore yet another novel method to perceive and manipulate 3D articulated objects that generalizes to enable the robot to articulate unseen classes of objects.
|
|
TAX-Pose: Task-Specific Cross-Pose Estimation for Robot Manipulation
Brian Okorn*,
Chu Er Pan*,
Harry Zhang*,
Benjamin Eisner*,
David Held
Accepted to Conference on Robot Learning (CoRL), 2022 (* indicates equal contribution)
Arxiv |
Code |
Video |
Open Review
We conjecture that the task-specific pose relationship between relevant parts of interacting objects is a generalizable notion of a manipulation task that can transfer to new objects. We call this task-specific pose relationship "cross-pose". We propose a vision-based system that learns to estimate the cross-pose between two objects for a given manipulation task.
|
|
FlowBot3D: Learning 3D Articulation Flow to Manipulate Articulated Objects
Benjamin Eisner*,
Harry Zhang*,
David Held
Accepted to Robotics: Science and Systems (RSS), 2022 (* indicates equal contribution) - Long talk, Best Paper Award Finalist (Selection Rate 1.5%).
Arxiv |
Code |
Video |
Berkeley CPAR Talk |
MIT Technology Review China |
Synced Review Sohu |
CMU Research Highlights
We explore a novel method to perceive and manipulate 3D articulated objects that generalizes to enable the robot to articulate unseen classes of objects.
|
|
AVPLUG: Approach Vector Planning for Unicontact Grasping amid Clutter
Yahav Avigal*,
Vishal Satish*,
Harry Zhang,
Huang Huang,
Michael Danielczuk,
Jeffrey Ichnowski,
Ken Goldberg
Accepted to Conference on Automation Science and Engineering (CASE), 2021.
Arxiv |
Code |
Video
We present AVPLUG (Approach Vector PLanning for Unicontact Grasping), an algorithm for efficiently finding the approach vector using an efficient oct-tree occupancy model and Minkowski sum computation to maximize information gain.
|
|
Robots of the Lost Arc: Self-Supervised Learning to Dynamically Manipulate Fixed-Endpoint Cables
Harry Zhang,
Jeffrey Ichnowski,
Daniel Seita,
Jonathan Wang,
Huang Huang,
Ken Goldberg
Accepted to International Conference on Robotics and Automation (ICRA), 2021
Arxiv |
Code |
Bay Area Robotics Symposium Coverage |
ICRA 2022 Deformable Object Manipulation Workshop
We propose a self-supervised learning framework that enables a UR5 robot to perform three dynamic cable manipulation tasks. The framework finds a 3D apex point for the robot arm, which, together with a task-specific trajectory function, defines an arcing motion that dynamically manipulates the cable to perform tasks with varying obstacle and target locations.
|
|
Dex-Net AR: Distributed Deep Grasp Planning Using a Commodity Cellphone and Augmented Reality App
Harry Zhang,
Jeffrey Ichnowski,
Yahav Avigal,
Joseph Gonzalez,
Ion Stoica,
Ken Goldberg
Accepted to International Conference on Robotics and Automation (ICRA), 2020
Arxiv |
Code |
Video |
VentureBeat Coverage |
Sohu Coverage (in Mandarin)
We present a distributed pipeline, Dex-Net AR, that allows point clouds to be uploaded to a server in our lab, cleaned, and evaluated by the Dex-Net grasp planner to generate a grasp axis that is returned and displayed as an overlay on the object.
|
|
Orienting Novel Objects using Self-Supervised Rotation Estimation
Shivin Devgon,
Jeffrey Ichnowski,
Ashwin Balakrishna,
Harry Zhang,
Ken Goldberg
Accepted to Conference on Automation Science and Engineering (CASE), 2020.
Arxiv |
Code |
Video
We present an algorithm to orient novel objects given a depth image of the object in its current and desired orientation.
|
|
Self-Supervised Learning of Dynamic Planar Manipulation of Free-End Cables
Jonathan Wang*,
Huang Huang*,
Vincent Lim,
Harry Zhang,
Jeffrey Ichnowski,
Daniel Seita,
Yunliang Chen,
Ken Goldberg
Preprint, in submission to International Conference on Robotics and Automation (ICRA), 2022.
Arxiv |
Code |
Video
We present an algorithm to train a robot to control free-end cables in a self-supervised fashion.
|
|
Safe Deep Model-Based Reinforcement Learning with Lyapunov Functions
Bobby Yan*,
Harry Zhang*,
Huang Huang*
Preprint, 2022.
Arxiv |
Code |
Video
We introduce and explore a novel method for adding safety constraints for model-based RL during training and policy learning.
|
|
10-725: Graduate Convex Optimization
16-385: Computer Vision
|
|
CS 189: Introduction to Machine Learning
EE 127: Introduction to Convex Optimization
CS 188: Introduction to Artificial Intelligence
CS 170: Algorithms
ME C231A: Model Predictive Control
|
|