Distributional Depth-Based Estimation of Object Articulation Models

Ajinkya Jain∗ (UT Austin), Stephen Giguere† (UT Austin), Rudolf Lioutikov† (Karlsruhe Institute of Technology), Scott Niekum (UT Austin)
Abstract: We propose a method that efficiently learns distributions over articulation model parameters directly from depth images without the need to know articulation model categories a priori. By contrast, existing methods that learn articulation models from raw observations typically only predict point estimates of the model parameters, which are insufficient to guarantee the safe manipulation of articulated objects. Our core contributions include a novel representation for distributions over rigid body transformations and articulation model parameters based on screw theory, von Mises-Fisher distributions, and Stiefel manifolds. Combining these concepts allows for an efficient, mathematically sound representation that implicitly satisfies the constraints that rigid body transformations and articulations must adhere to. Leveraging this representation, we introduce a novel deep-learning-based approach, DUST-net, that performs category-independent articulation model estimation while also providing model uncertainties. We evaluate our approach on several benchmarking datasets and real-world objects and compare its performance with two current state-of-the-art methods. Our results demonstrate that DUST-net can successfully learn distributions over articulation models for novel objects across articulation model categories, which generate point estimates with better accuracy than state-of-the-art methods and effectively capture the uncertainty over predicted model parameters due to noisy inputs. [webpage]
Keywords: Articulated Objects, Model Learning, Uncertainty Estimation
1 Introduction
Articulated objects, such as drawers, staplers, refrigerators, and dishwashers, are ubiquitous in hu-
man environments. These objects consist of multiple rigid bodies connected via mechanical joints
such as hinge joints or slider joints. Robots in human environments will need to interact with these
objects often while assisting humans in performing day-to-day tasks. To interact safely with such
objects, a robot must reason about their articulation properties while manipulating them. An ideal
method for learning such properties might estimate these parameters directly from raw observations,
such as RGB-D images, while requiring limited or no a priori information about the task. The ability
to additionally provide a confidence over the estimated properties would allow such a method to be
leveraged in the development of safe motion policies for articulated objects [1].
The majority of existing methods to learn articulation models for objects from visual data either
need fiducial markers to track motion between object parts [2–5] or require textured objects [6–10].
Recent deep-learning based methods address this by predicting articulation properties for objects
from raw observations, such as depth images [11–14] or PointCloud data [15, 16]. However, the
majority of these methods [11, 12, 15, 16] require knowledge of the articulation model category for
the object (e.g., whether it has a revolute or prismatic joint) which may not be available in many
realistic settings. Alleviating this requirement, Jain et al. [14] introduced ScrewNet, which uses a
unified representation based on screw transformations to represent different articulation types and
performs category-independent articulation model estimation directly from raw depth images. How-
ever, ScrewNet [14] and related methods [11–13, 15, 16] only predict point estimates for an object’s
articulation model parameters. Nonetheless, reasoning about the uncertainty in the estimated param-
∗ † Equal contribution, presented alphabetically
arXiv:2108.05875v2 [cs.RO] 25 Oct 2021
Figure 1: DUST-net uses a sequence of images I_{1:n} to compute the parameters, Φ, of the conditional distribution over the joint parameters S and configurations {θ, d}_{1:n−1}. This distribution allows for inference and reasoning, such as uncertainty and confidence, over both the parameters and the configurations. Using a von Mises-Fisher distribution on a Stiefel manifold allows for an efficient reparameterization that inherently obeys multiple constraints that define rigid body transformations.
eters can provide significant advantages for ensuring success in robot manipulation tasks, and allows
for further advancements such as robust planning [1], active learning using human queries [17], and
the learning of behavior policies that provide safety assurances [18]. Motivated by these advantages,
we propose a method for learning articulation models, which estimates the uncertainty over model
parameters using a novel distribution over the set of screw transformations based on the matrix von
Mises-Fisher distribution over Stiefel manifolds [19]. We introduce DUST-net, Deep Uncertainty
estimation on Screw Transforms-network, a novel deep learning-based method that, in addition to
providing point estimates of the object’s articulation model parameters, leverages raw depth images
to provide uncertainty estimates that can be used to guide the robot's behavior without requiring
knowledge of the object's articulation model category a priori.
DUST-net garners numerous benefits over existing methods. First, DUST-net estimates articulation
properties for objects with uncertainty estimates, unlike most current methods [11–16]. These un-
certainty estimates, apart from helping robots to manipulate objects safely [1], could allow a robot
to take information-gathering actions when it is not confident and enhance its chances of success
in completing the task. Second, similar to ScrewNet [14], DUST-net can estimate model parame-
ters without the need to know the articulation model category a priori, by leveraging the unified
representation for different articulation model types. Third, this unified representation helps DUST-
net to be more computationally and data-efficient than other state-of-the-art methods [11, 12], as
it uses a single network to estimate model parameters for all common articulation models, unlike
other methods that require a separate network for each articulation model category [11, 12, 15, 16].
Empirically, DUST-net outperforms other methods even when trained with only half as much training
data. Fourth, the distributional learning setting yields more robustness to outliers and
noise. Fifth, DUST-net is able to reliably estimate distributions over articulation model parameters
for objects in the robot’s camera frame. By contrast, ScrewNet [14], the most closely related ap-
proach to ours, can only predict point estimates for articulation model parameters in the object’s
local frame.
We evaluate DUST-net through experiments on two benchmarking datasets: a simulated articulated
objects dataset [11] and the PartNet-Mobility dataset [20–22], as well as three real-world objects: a
microwave, a drawer, and a toaster oven. We compare DUST-net with two state-of-the-art methods,
namely ScrewNet [14] and an MDN-based method proposed by Abbatematteo et al. [11], as well
as two baseline methods. The experiments demonstrate that the samples drawn from the distribu-
tions learned by DUST-net result in significantly better estimates for articulation model parameters
in comparison to the point estimates predicted by other methods. Additionally, the experiments
show that DUST-net can successfully and accurately capture the uncertainty over articulation model
parameters resulting from noisy inputs.
2 Related Work
Articulation model estimation from visual observations: A widely used approach for estimating
articulation models is based on the probabilistic framework proposed by Sturm et al. [2]. It uses
the time-series observations of 6D poses of different parts of an articulated object to learn the re-
lationship between them [2, 5, 6, 10]. More recently, Abbatematteo et al. [11] and Li et al. [12]
proposed methods to learn articulation properties for objects from raw depth images given articu-
lation model category. In a related body of work on object parts mobility estimation, Wang et al.
[15] and Yan et al. [16] proposed approaches to segment different parts of the object in an input
point cloud and estimate their mobility relationships, given a known articulation model category.
Alleviating the requirement of having a known articulation model category, Jain et al. [14] recently
proposed ScrewNet that performs category-independent articulation model estimation from depth
images. However, these methods only predict point estimates for the articulation model parameters,
while DUST-net predicts a distribution over their values.
Rigid Body Pose Estimation: Our contributions are related to existing work on estimating distribu-
tions describing the orientation of rigid bodies. Gilitschenski et al. [23], Arun Srivatsan et al. [24],
Srivatsan et al. [25] and Rosen et al. [26] propose strategies that can be used to estimate the rigid
body transformation of an object using a combination of Bingham and Gaussian distributions, and
the von Mises-Fisher distribution, respectively. The mathematical model used by our approach is
inspired by these works, but 1) extends them to also represent uncertainty over the configuration of
articulated object components about screw axes, and 2) integrates them into a deep learning model
that is capable of learning these configurations from raw depth images. In addition, while these
approaches use distributions over orientations and rigid body transformations to produce estimates,
DUST-net directly outputs a distribution that can be used to facilitate further applications such as
uncertainty-aware behavior planning.
Interactive perception (IP): Katz and Brock [3] introduced IP as a method to leverage a robot’s
interaction with objects to generate a rich perceptual signal for articulation model estimation for
planar objects, and extended it to learn 3D articulation models for objects [4]. Martín-Martín et al.
[8] used hierarchical recursive Bayesian filters to make estimation more robust and developed online
methods for articulation model estimation from RGB images [7–9]. A comprehensive survey on
IP methods in robotics was presented by Bohg et al. [27]. While IP presents a powerful tool for
estimating articulation properties for objects, a wide majority of existing IP methods require textured
objects, unlike DUST-net, which learns these properties using depth images.
Further approaches: Articulation motion models can be viewed as geometric constraints imposed
on multiple rigid bodies. Such constraints can be learned from human demonstrations by leveraging
different sensing modalities [13, 28–31]. Recently, Daniele et al. [30] proposed a multimodal learn-
ing framework that incorporates both vision and natural language information for articulation model
estimation. However, these approaches predict point estimates for the articulation model parameters,
unlike DUST-net, which predicts a distribution over the articulation model parameters.
3 Problem Formulation
Given a sequence of n depth images I_{1:n} of motion between two parts of an articulated object, we estimate the parameters of a probability distribution p(φ | I_{1:n}) representing uncertainty over the parameters φ of the articulation model M governing the motion between the two parts. Following Jain et al. [14], we define the model parameters φ as the parameters of the screw axis of motion, S = (l, m), where both l and m are elements of R³. This unified parameterization can be used in articulation models with at most one degree of freedom (DoF), namely rigid, revolute, prismatic, and helical [14]. Additionally, we estimate the parameters of a distribution p(q_{1:n−1} | I_{1:n}) representing uncertainty over the configurations q_{1:n−1} identifying the rigid body transformations between the two parts in the given sequence of images I_{1:n} under model M with parameters φ. Configurations q_i, i ∈ {1 ... n−1}, correspond to a set of tuples, q_i = (θ_i, d_i), defining a rotation around and a displacement along the screw axis S³. We assume that the relative motion between the two object parts is determined by a single articulation model.

³Please refer to the supplementary material for further details
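The screw parameterization above corresponds directly to a rigid body transform: a rotation θ about, plus a displacement d along, the line with unit direction l and moment m. As a concrete illustration (a minimal numpy sketch under standard screw-theory conventions, not the authors' code), the transform can be assembled with Rodrigues' formula:

```python
import numpy as np

def screw_to_transform(l, m, theta, d):
    """4x4 rigid transform: rotate by theta about, and translate by d
    along, the screw axis with unit direction l and moment m = p x l."""
    l, m = np.asarray(l, float), np.asarray(m, float)
    K = np.array([[0.0, -l[2], l[1]],
                  [l[2], 0.0, -l[0]],
                  [-l[1], l[0], 0.0]])          # skew(l)
    R = np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)
    p = np.cross(l, m)                          # axis point closest to the origin
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = (np.eye(3) - R) @ p + d * l      # x -> R(x - p) + p + d*l
    return T

# Rotating the origin by pi about the vertical line through (1, 0, 0)
# carries it to (2, 0, 0); here m = p x l for p = (1, 0, 0), l = (0, 0, 1).
T = screw_to_transform([0, 0, 1], [0, -1, 0], np.pi, 0.0)
assert np.allclose(T @ np.array([0.0, 0.0, 0.0, 1.0]), [2.0, 0.0, 0.0, 1.0])
```

A zero rotation and zero displacement recover the identity, and a pure displacement d along l gives a translation of d·l, matching the rigid and prismatic special cases named above.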
4 Approach
Given a sequence of depth images I_{1:n} of motion between two parts of an articulated object, DUST-net estimates parameters of the joint probability distribution p(φ, q_{1:n−1} | I_{1:n}) representing uncertainty over the articulation model parameters φ governing the motion between the two parts and the observed configurations q_{1:n−1}. When deciding how to learn this distribution, two goals arise. First, while some parameters, such as the translation of an object part along a screw axis, are defined on Euclidean space, the set of valid screw axes exhibits constraints that prevent standard distributions defined on R⁶ from being applied without complicating the learning process. For example, a standard representation for distributions over screw axes can be the product of a Bingham distribution over the line's orientation and a multivariate normal distribution over its position in space [32]. However, this representation produces non-unique estimation targets. A rotation of θ about the screw axis with orientation l results in the same transformation as a rotation of −θ about the screw axis with orientation −l. Similarly, a displacement d along l results in the same transformation as a displacement −d along −l. This leads to ambiguities in the targets of the estimation problem and can hinder the performance of the trained estimator. By selecting a representation that accounts for these symmetries, these non-unique estimation targets are removed. Second, once a suitable parameterization is chosen, we seek a parametric form for the joint distribution which can be learned by a deep network.
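The sign ambiguity described above is easy to verify numerically. In this illustrative sketch, Rodrigues' formula shows that (l, θ) and (−l, −θ) produce the identical rotation, so a regressor trained on raw axis-and-angle targets faces two equally valid labels for the same observation:

```python
import numpy as np

def rodrigues(l, theta):
    """Rotation matrix for angle theta about unit axis l."""
    K = np.array([[0.0, -l[2], l[1]],
                  [l[2], 0.0, -l[0]],
                  [-l[1], l[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

l = np.array([0.0, 0.6, 0.8])      # arbitrary unit axis
theta = 0.7
# Negating both the axis and the angle leaves the rotation unchanged,
# so (l, theta) and (-l, -theta) are indistinguishable estimation targets.
assert np.allclose(rodrigues(l, theta), rodrigues(-l, -theta))
```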
First, we consider the problem of parameterizing the set of screw axes. As noted earlier, we define the model parameter φ as the parameters of the screw axis of motion S = (l, m). However, this parameterization requires that l has unit norm, and that l and m are orthogonal. To eliminate these constraints, we rewrite the moment vector of a screw axis as m = ‖m‖ m̂, where ‖m‖ and m̂ represent its magnitude and a unit vector along it, respectively, and the Plücker coordinates for the screw axis as S = (l, m̂, ‖m‖). The Plücker coordinates can then be seen as an unconstrained point in the space S := V_{2,3} × R⁺, where (l, m̂) ∈ V_{2,3}, with V_{2,3} denoting the Stiefel manifold of 2-frames in R³, and ‖m‖ ∈ R⁺, with R⁺ denoting the set of positive real numbers. The Stiefel manifold V_{k,m} is the space whose points are sets of k orthonormal vectors in R^m, called k-frames in R^m (k ≤ m)¹ [19]. Consequently, because of the one-to-one mapping from elements of V_{2,3} × R⁺ to screw axes, the non-unique estimation targets described above are eliminated. Based on this parametrization of screw axes, we define the set of valid configuration parameters as follows. We restrict the range of values for the rotation about the screw axis to θ ∈ [0, 2π) and restrict the displacement along the axis to d ∈ R⁺. Note that these constraints do not reduce the representational power of the screw transform (l, m, θ, d) to denote a general rigid body transform, but merely ensure a unique representation.
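Concretely, the mapping from a screw axis S = (l, m) to the unconstrained point (l, m̂, ‖m‖) ∈ V_{2,3} × R⁺ can be sketched as follows (an illustrative numpy snippet under the assumption ‖m‖ > 0, i.e., the axis does not pass through the origin):

```python
import numpy as np

def to_stiefel_params(l, m):
    """Decompose a screw axis S = (l, m) into (l, m_hat, ||m||), where
    (l, m_hat) is a 2-frame on the Stiefel manifold V_{2,3} and ||m||
    is a positive scalar (assumes the axis misses the origin)."""
    l, m = np.asarray(l, float), np.asarray(m, float)
    norm_m = np.linalg.norm(m)
    return l, m / norm_m, norm_m

# Vertical axis through p = (0, 2, 0): direction l = z, moment m = p x l.
l, m_hat, norm_m = to_stiefel_params(
    [0.0, 0.0, 1.0], np.cross([0.0, 2.0, 0.0], [0.0, 0.0, 1.0]))
# (l, m_hat) is orthonormal, hence a valid point on V_{2,3}.
assert np.isclose(np.linalg.norm(m_hat), 1.0) and np.isclose(l @ m_hat, 0.0)
assert np.isclose(norm_m, 2.0)
```

The orthonormality of (l, m̂) holds by construction whenever m = p × l, which is why the decomposed coordinates land on the Stiefel manifold without any explicit constraint handling.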
Having described the parameterization of the set of screw axes and configurations, we now consider the task of defining a joint probability distribution over their values. We propose to represent the distribution over predicted screw axis parameters, p(S | I_{1:n}) with S ∈ S, as a product of a matrix von Mises-Fisher distribution F(· | 3, F) defined on the Stiefel manifold V_{2,3}¹ and a truncated normal distribution N⁺(· | μ, σ) with truncation interval [0, +∞) over R⁺. Formally,

p(S | I_{1:n}) = p(l, m̂, ‖m‖ | I_{1:n}, F, μ_m, σ²_m) = F(l, m̂ | 3, F) N⁺(‖m‖ | μ_m, σ²_m),   (1)

where F is a 3×2 matrix representing the parameters of the matrix von Mises-Fisher distribution over V_{2,3}, and μ_m and σ_m denote the mean and standard deviation of the truncated normal distribution.
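To make Eq. (1) concrete, the sketch below (illustrative, not the authors' implementation) evaluates its two factors: the matrix von Mises-Fisher factor only up to its normalizing constant, since the true normalizer involves a hypergeometric function of matrix argument, and the truncated normal factor N⁺ in closed form via the standard-normal CDF:

```python
import math
import numpy as np

def matrix_vmf_unnorm(X, F):
    """Matrix von Mises-Fisher density on V_{2,3}, up to its normalizer:
    p(X | F) ∝ exp(tr(F^T X)); X is 3x2 with orthonormal columns (l, m_hat)."""
    return math.exp(np.trace(F.T @ X))

def truncnorm_pdf(x, mu, sigma):
    """Density of N+(x | mu, sigma^2): a normal truncated to [0, +inf)."""
    if x < 0:
        return 0.0
    z = (x - mu) / sigma
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    Z = 0.5 * (1.0 + math.erf(mu / (sigma * math.sqrt(2.0))))  # P(N >= 0)
    return phi / (sigma * Z)

# Eq. (1), up to the vMF normalizer:
# p(S | I) ∝ vMF(l, m_hat | F) * N+(||m|| | mu_m, sigma_m^2)
X = np.array([[0.0, 1.0], [0.0, 0.0], [1.0, 0.0]])   # columns l, m_hat
F = 5.0 * X                                          # concentration aligned with X
density = matrix_vmf_unnorm(X, F) * truncnorm_pdf(2.0, mu=2.0, sigma=0.5)
assert density > 0.0
```

Larger entries of F concentrate the vMF factor around the 2-frame it is aligned with, which is how the network's output F expresses confidence in the predicted screw axis.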
Given the sequence of n images, we also wish to estimate the posterior over configurations q_{1:n−1} = {θ_{1:n−1}, d_{1:n−1}} corresponding to the rotations about and displacements along the screw axis S. We define the joint posterior representing the uncertainty over the screw axis S and the configurations {θ_{1:n−1}, d_{1:n−1}} about it as a product of the aforementioned distribution and a set of distributions defined over the configuration parameters,

p(S, θ_{1:n−1}, d_{1:n−1} | I_{1:n}, Φ) = p(S; F, μ_m, σ²_m) Ψ(θ_{1:n−1}; ψ) Υ(d_{1:n−1}; υ),   (2)

where Φ = {F, μ_m, σ²_m, ψ, υ} is the set of parameters for the distribution, and Ψ and Υ represent the set of distributions having parameters ψ and υ over the configurations θ_{1:n−1} and d_{1:n−1}, respectively. For the sake of brevity, we present further details on modeling assumptions in the supplementary material (see Appendix B). In this work, we consider Ψ and Υ to be products of truncated normal distributions such that Ψ = ∏_{i=1}^{n−1} N⁺(θ_i | M^i_θ, σ²_θ) and Υ = ∏_{i=1}^{n−1} N⁺(d_i | M^i_d, σ²_d) with

¹Please refer to the supplementary material for further details