Distributional Depth-Based Estimation of Object Articulation Models

Ajinkya Jain∗ (UT Austin), Stephen Giguere† (UT Austin), Rudolf Lioutikov† (Karlsruhe Institute of Technology), Scott Niekum (UT Austin)
Abstract: We propose a method that efficiently learns distributions over articulation model parameters directly from depth images without the need to know articulation model categories a priori. By contrast, existing methods that learn articulation models from raw observations typically only predict point estimates of the model parameters, which are insufficient to guarantee the safe manipulation of articulated objects. Our core contributions include a novel representation for distributions over rigid body transformations and articulation model parameters based on screw theory, von Mises-Fisher distributions, and Stiefel manifolds. Combining these concepts allows for an efficient, mathematically sound representation that implicitly satisfies the constraints that rigid body transformations and articulations must adhere to. Leveraging this representation, we introduce a novel deep-learning-based approach, DUST-net, that performs category-independent articulation model estimation while also providing model uncertainties. We evaluate our approach on several benchmarking datasets and real-world objects and compare its performance with two current state-of-the-art methods. Our results demonstrate that DUST-net can successfully learn distributions over articulation models for novel objects across articulation model categories, which generate point estimates with better accuracy than state-of-the-art methods and effectively capture the uncertainty over predicted model parameters due to noisy inputs. [webpage]
Keywords: Articulated Objects, Model Learning, Uncertainty Estimation
1 Introduction
Articulated objects, such as drawers, staplers, refrigerators, and dishwashers, are ubiquitous in hu-
man environments. These objects consist of multiple rigid bodies connected via mechanical joints
such as hinge joints or slider joints. Robots in human environments will need to interact with these
objects often while assisting humans in performing day-to-day tasks. To interact safely with such
objects, a robot must reason about their articulation properties while manipulating them. An ideal
method for learning such properties might estimate these parameters directly from raw observations,
such as RGB-D images, while requiring limited or no a priori information about the task. The ability
to additionally provide a confidence over the estimated properties would allow such a method to be
leveraged in the development of safe motion policies for articulated objects [1].
The majority of existing methods to learn articulation models for objects from visual data either
need fiducial markers to track motion between object parts [2–5] or require textured objects [6–10].
Recent deep-learning based methods address this by predicting articulation properties for objects
from raw observations, such as depth images [11–14] or PointCloud data [15, 16]. However, the
majority of these methods [11, 12, 15, 16] require knowledge of the articulation model category for
the object (e.g., whether it has a revolute or prismatic joint) which may not be available in many
realistic settings. Alleviating this requirement, Jain et al. [14] introduced ScrewNet, which uses a
unified representation based on screw transformations to represent different articulation types and
performs category-independent articulation model estimation directly from raw depth images. How-
ever, ScrewNet [14] and related methods [11–13, 15, 16] only predict point estimates for an object’s
articulation model parameters. Nonetheless, reasoning about the uncertainty in the estimated param-
∗ † Equal contribution, presented alphabetically
arXiv:2108.05875v2 [cs.RO] 25 Oct 2021
Figure 1: DUST-net uses a sequence of images I_{1:n} to compute the parameters, Φ, of the conditional distribution over the joint parameters S and configurations {θ, d}_{1:n−1}. This distribution allows for inference and reasoning, such as uncertainty and confidence, over both the parameters and the configurations. Using a von Mises-Fisher distribution on a Stiefel manifold allows for an efficient reparameterization that inherently obeys multiple constraints that define rigid body transformations.
eters can provide significant advantages for ensuring success in robot manipulation tasks, and allows
for further advancements such as robust planning [1], active learning using human queries [17], and
the learning of behavior policies that provide safety assurances [18]. Motivated by these advantages,
we propose a method for learning articulation models, which estimates the uncertainty over model
parameters using a novel distribution over the set of screw transformations based on the matrix von
Mises-Fisher distribution over Stiefel manifolds [19]. We introduce DUST-net, Deep Uncertainty
estimation on Screw Transforms-network, a novel deep learning-based method that, in addition to
providing point estimates of the object’s articulation model parameters, leverages raw depth images
to provide uncertainty estimates that can be used to guide the robot's behavior without requiring
knowledge of the object's articulation model category a priori.
DUST-net garners numerous benefits over existing methods. First, DUST-net estimates articulation
properties for objects with uncertainty estimates, unlike most current methods [11–16]. These un-
certainty estimates, apart from helping robots to manipulate objects safely [1], could allow a robot
to take information-gathering actions when it is not confident and enhance its chances of success
in completing the task. Second, similar to ScrewNet [14], DUST-net can estimate model parame-
ters without the need to know the articulation model category a priori, by leveraging the unified
representation for different articulation model types. Third, this unified representation helps DUST-
net to be more computationally and data-efficient than other state-of-the-art methods [11, 12], as
it uses a single network to estimate model parameters for all common articulation models, unlike
other methods that require a separate network for each articulation model category [11, 12, 15, 16].
Empirically, DUST-net outperforms other methods even when trained with only half as much training
data. Fourth, the distributional learning setting yields more robustness to outliers and
noise. Fifth, DUST-net is able to reliably estimate distributions over articulation model parameters
for objects in the robot’s camera frame. By contrast, ScrewNet [14], the most closely related ap-
proach to ours, can only predict point estimates for articulation model parameters in the object’s
local frame.
We evaluate DUST-net through experiments on two benchmarking datasets: a simulated articulated
objects dataset [11] and the PartNet-Mobility dataset [20–22], as well as three real-world objects: a
microwave, a drawer, and a toaster oven. We compare DUST-net with two state-of-the-art methods,
namely ScrewNet [14] and an MDN-based method proposed by Abbatematteo et al. [11], as well
as two baseline methods. The experiments demonstrate that the samples drawn from the distribu-
tions learned by DUST-net result in significantly better estimates for articulation model parameters
in comparison to the point estimates predicted by other methods. Additionally, the experiments
show that DUST-net can successfully and accurately capture the uncertainty over articulation model
parameters resulting from noisy inputs.
2 Related Work
Articulation model estimation from visual observations: A widely used approach for estimating
articulation models is based on the probabilistic framework proposed by Sturm et al. [2]. It uses
the time-series observations of 6D poses of different parts of an articulated object to learn the re-
lationship between them [2, 5, 6, 10]. More recently, Abbatematteo et al. [11] and Li et al. [12]
proposed methods to learn articulation properties for objects from raw depth images given articu-
lation model category. In a related body of work on object parts mobility estimation, Wang et al.
[15] and Yan et al. [16] proposed approaches to segment different parts of the object in an input
point cloud and estimate their mobility relationships, given a known articulation model category.
Alleviating the requirement of having a known articulation model category, Jain et al. [14] recently
proposed ScrewNet that performs category-independent articulation model estimation from depth
images. However, these methods only predict point estimates for the articulation model parameters,
while DUST-net predicts a distribution over their values.
Rigid Body Pose Estimation: Our contributions are related to existing work on estimating distribu-
tions describing the orientation of rigid bodies. Gilitschenski et al. [23], Arun Srivatsan et al. [24],
Srivatsan et al. [25] and Rosen et al. [26] propose strategies that can be used to estimate the rigid
body transformation of an object using a combination of Bingham and Gaussian distributions, and
the von Mises-Fisher distribution, respectively. The mathematical model used by our approach is
inspired by these works, but 1) extends them to also represent uncertainty over the configuration of
articulated object components about screw axes, and 2) integrates them into a deep learning model
that is capable of learning these configurations from raw depth images. In addition, while these
approaches use distributions over orientations and rigid body transformations to produce estimates,
DUST-net directly outputs a distribution that can be used to facilitate further applications such as
uncertainty-aware behavior planning.
Interactive perception (IP): Katz and Brock [3] introduced IP as a method to leverage a robot’s
interaction with objects to generate a rich perceptual signal for articulation model estimation for
planar objects, and extended it to learn 3D articulation models for objects [4]. Martín-Martín et al.
[8] used hierarchical recursive Bayesian filters to make estimation more robust and developed online
methods for articulation model estimation from RGB images [7–9]. A comprehensive survey on
IP methods in robotics was presented by Bohg et al. [27]. While IP presents a powerful tool for
estimating articulation properties for objects, a wide majority of existing IP methods require textured
objects, unlike DUST-net, which learns these properties using depth images.
Further approaches: Articulation motion models can be viewed as geometric constraints imposed
on multiple rigid bodies. Such constraints can be learned from human demonstrations by leveraging
different sensing modalities [13, 28–31]. Recently, Daniele et al. [30] proposed a multimodal learn-
ing framework that incorporates both vision and natural language information for articulation model
estimation. However, these approaches predict point estimates for the articulation model parameters,
unlike DUST-net, which predicts a distribution over the articulation model parameters.
3 Problem Formulation
Given a sequence of n depth images I_{1:n} of motion between two parts of an articulated object, we estimate the parameters of a probability distribution p(φ | I_{1:n}) representing uncertainty over the parameters φ of the articulation model M governing the motion between the two parts. Following Jain et al. [14], we define the model parameters φ as the parameters of the screw axis of motion, S = (l, m), where both l and m are elements of R³. This unified parameterization can be used in articulation models with at most one degree of freedom (DoF), namely rigid, revolute, prismatic, and helical [14]. Additionally, we estimate the parameters of a distribution p(q_{1:n−1} | I_{1:n}) representing uncertainty over the configurations q_{1:n−1} identifying the rigid body transformations between the two parts in the given sequence of images I_{1:n} under model M with parameters φ. Configurations q_i, i ∈ {1 ... n−1}, correspond to a set of tuples, q_i = (θ_i, d_i), defining a rotation around and a displacement along the screw axis S³. We assume that the relative motion between the two object parts is determined by a single articulation model.

³Please refer to the supplementary material for further details
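The screw parameterization above corresponds directly to a rigid body transform: a rotation θ about, plus a displacement d along, the line with unit direction l and moment m. As a concrete illustration (a minimal numpy sketch under standard screw-theory conventions, not the authors' code), the transform can be assembled with Rodrigues' formula:

```python
import numpy as np

def screw_to_transform(l, m, theta, d):
    """4x4 rigid transform: rotate by theta about, and translate by d
    along, the screw axis with unit direction l and moment m = p x l."""
    l, m = np.asarray(l, float), np.asarray(m, float)
    K = np.array([[0.0, -l[2], l[1]],
                  [l[2], 0.0, -l[0]],
                  [-l[1], l[0], 0.0]])          # skew(l)
    R = np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)
    p = np.cross(l, m)                          # axis point closest to the origin
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = (np.eye(3) - R) @ p + d * l      # x -> R(x - p) + p + d*l
    return T

# Rotating the origin by pi about the vertical line through (1, 0, 0)
# carries it to (2, 0, 0); here m = p x l for p = (1, 0, 0), l = (0, 0, 1).
T = screw_to_transform([0, 0, 1], [0, -1, 0], np.pi, 0.0)
assert np.allclose(T @ np.array([0.0, 0.0, 0.0, 1.0]), [2.0, 0.0, 0.0, 1.0])
```

A zero rotation and zero displacement recover the identity, and a pure displacement d along l gives a translation of d·l, matching the rigid and prismatic special cases named above.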
4 Approach
Given a sequence of depth images I_{1:n} of motion between two parts of an articulated object, DUST-net estimates parameters of the joint probability distribution p(φ, q_{1:n−1} | I_{1:n}) representing uncertainty over the articulation model parameters φ governing the motion between the two parts and the observed configurations q_{1:n−1}. When deciding how to learn this distribution, two goals arise. First, while some parameters, such as the translation of an object part along a screw axis, are defined on Euclidean space, the set of valid screw axes exhibits constraints that prevent standard distributions defined on R⁶ from being applied without complicating the learning process. For example, a standard representation for distributions over screw axes can be the product of a Bingham distribution over the line's orientation and a multivariate normal distribution over its position in space [32]. However, this representation produces non-unique estimation targets. A rotation of θ about the screw axis with orientation l results in the same transformation as a rotation of −θ about the screw axis with orientation −l. Similarly, a displacement d along l results in the same transformation as a displacement −d along −l. This leads to ambiguities in the targets of the estimation problem and can hinder the performance of the trained estimator. By selecting a representation that accounts for these symmetries, these non-unique estimation targets are removed. Second, once a suitable parameterization is chosen, we seek a parametric form for the joint distribution which can be learned by a deep network.
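The sign ambiguity described above is easy to verify numerically. In this illustrative sketch, Rodrigues' formula shows that (l, θ) and (−l, −θ) produce the identical rotation, so a regressor trained on raw axis-and-angle targets faces two equally valid labels for the same observation:

```python
import numpy as np

def rodrigues(l, theta):
    """Rotation matrix for angle theta about unit axis l."""
    K = np.array([[0.0, -l[2], l[1]],
                  [l[2], 0.0, -l[0]],
                  [-l[1], l[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

l = np.array([0.0, 0.6, 0.8])      # arbitrary unit axis
theta = 0.7
# Negating both the axis and the angle leaves the rotation unchanged,
# so (l, theta) and (-l, -theta) are indistinguishable estimation targets.
assert np.allclose(rodrigues(l, theta), rodrigues(-l, -theta))
```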
First, we consider the problem of parameterizing the set of screw axes. As noted earlier, we define the model parameter φ as the parameters of the screw axis of motion S = (l, m). However, this parameterization requires that l has unit norm, and that l and m are orthogonal. To eliminate these constraints, we rewrite the moment vector of a screw axis as m = ‖m‖ m̂, where ‖m‖ and m̂ represent its magnitude and a unit vector along it, respectively, and the Plücker coordinates for the screw axis as S = (l, m̂, ‖m‖). The Plücker coordinates can then be seen as an unconstrained point in the space S := V_{2,3} × R⁺, where (l, m̂) ∈ V_{2,3}, with V_{2,3} denoting the Stiefel manifold of 2-frames in R³, and ‖m‖ ∈ R⁺, with R⁺ denoting the set of positive real numbers. The Stiefel manifold V_{k,m} is the space whose points are sets of k orthonormal vectors in R^m, called k-frames in R^m (k ≤ m)¹ [19]. Consequently, because of the one-to-one mapping from elements of V_{2,3} × R⁺ to screw axes, the non-unique estimation targets described above are eliminated. Based on this parametrization of screw axes, we define the set of valid configuration parameters as follows. We restrict the range of values for the rotation about the screw axis to θ ∈ [0, 2π) and restrict the displacement along the axis to d ∈ R⁺. Note that these constraints do not reduce the representational power of the screw transform (l, m, θ, d) to denote a general rigid body transform, but merely ensure a unique representation.
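Concretely, the mapping from a screw axis S = (l, m) to the unconstrained point (l, m̂, ‖m‖) ∈ V_{2,3} × R⁺ can be sketched as follows (an illustrative numpy snippet under the assumption ‖m‖ > 0, i.e., the axis does not pass through the origin):

```python
import numpy as np

def to_stiefel_params(l, m):
    """Decompose a screw axis S = (l, m) into (l, m_hat, ||m||), where
    (l, m_hat) is a 2-frame on the Stiefel manifold V_{2,3} and ||m||
    is a positive scalar (assumes the axis misses the origin)."""
    l, m = np.asarray(l, float), np.asarray(m, float)
    norm_m = np.linalg.norm(m)
    return l, m / norm_m, norm_m

# Vertical axis through p = (0, 2, 0): direction l = z, moment m = p x l.
l, m_hat, norm_m = to_stiefel_params(
    [0.0, 0.0, 1.0], np.cross([0.0, 2.0, 0.0], [0.0, 0.0, 1.0]))
# (l, m_hat) is orthonormal, hence a valid point on V_{2,3}.
assert np.isclose(np.linalg.norm(m_hat), 1.0) and np.isclose(l @ m_hat, 0.0)
assert np.isclose(norm_m, 2.0)
```

The orthonormality of (l, m̂) holds by construction whenever m = p × l, which is why the decomposed coordinates land on the Stiefel manifold without any explicit constraint handling.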
Having described the parameterization of the set of screw axes and configurations, we now consider the task of defining a joint probability distribution over their values. We propose to represent the distribution over predicted screw axis parameters, p(S | I_{1:n}) with S ∈ S, as a product of a matrix von Mises-Fisher distribution F(· | 3, F) defined on the Stiefel manifold V_{2,3}¹ and a truncated normal distribution N⁺(· | μ, σ) with truncation interval [0, +∞) over R⁺. Formally,

p(S | I_{1:n}) = p(l, m̂, ‖m‖ | I_{1:n}, F, μ_m, σ²_m) = F(l, m̂ | 3, F) N⁺(‖m‖ | μ_m, σ²_m),   (1)

where F is a 3×2 matrix representing the parameters of the matrix von Mises-Fisher distribution over V_{2,3}, and μ_m and σ_m denote the mean and standard deviation of the truncated normal distribution.
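To make Eq. (1) concrete, the sketch below (illustrative, not the authors' implementation) evaluates its two factors: the matrix von Mises-Fisher factor only up to its normalizing constant, since the true normalizer involves a hypergeometric function of matrix argument, and the truncated normal factor N⁺ in closed form via the standard-normal CDF:

```python
import math
import numpy as np

def matrix_vmf_unnorm(X, F):
    """Matrix von Mises-Fisher density on V_{2,3}, up to its normalizer:
    p(X | F) ∝ exp(tr(F^T X)); X is 3x2 with orthonormal columns (l, m_hat)."""
    return math.exp(np.trace(F.T @ X))

def truncnorm_pdf(x, mu, sigma):
    """Density of N+(x | mu, sigma^2): a normal truncated to [0, +inf)."""
    if x < 0:
        return 0.0
    z = (x - mu) / sigma
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    Z = 0.5 * (1.0 + math.erf(mu / (sigma * math.sqrt(2.0))))  # P(N >= 0)
    return phi / (sigma * Z)

# Eq. (1), up to the vMF normalizer:
# p(S | I) ∝ vMF(l, m_hat | F) * N+(||m|| | mu_m, sigma_m^2)
X = np.array([[0.0, 1.0], [0.0, 0.0], [1.0, 0.0]])   # columns l, m_hat
F = 5.0 * X                                          # concentration aligned with X
density = matrix_vmf_unnorm(X, F) * truncnorm_pdf(2.0, mu=2.0, sigma=0.5)
assert density > 0.0
```

Larger entries of F concentrate the vMF factor around the 2-frame it is aligned with, which is how the network's output F expresses confidence in the predicted screw axis.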
Given the sequence of n images, we also wish to estimate the posterior over configurations q_{1:n−1} = {θ_{1:n−1}, d_{1:n−1}} corresponding to the rotations about and displacements along the screw axis S. We define the joint posterior representing the uncertainty over the screw axis S and the configurations {θ_{1:n−1}, d_{1:n−1}} about it as a product of the aforementioned distribution and a set of distributions defined over the configuration parameters,

p(S, θ_{1:n−1}, d_{1:n−1} | I_{1:n}, Φ) = p(S; F, μ_m, σ²_m) Ψ(θ_{1:n−1}; ψ) Υ(d_{1:n−1}; υ),   (2)

where Φ = {F, μ_m, σ²_m, ψ, υ} is the set of parameters for the distribution, and Ψ and Υ represent the set of distributions having parameters ψ and υ over the configurations θ_{1:n−1} and d_{1:n−1}, respectively. For the sake of brevity, we present further details on modeling assumptions in the supplementary material (see Appendix B). In this work, we consider Ψ and Υ to be products of truncated normal distributions such that Ψ = ∏_{i=1}^{n−1} N⁺(θ_i | M^i_θ, σ²_θ) and Υ = ∏_{i=1}^{n−1} N⁺(d_i | M^i_d, σ²_d) with

¹Please refer to the supplementary material for further details