Fig. 1. The procedure of training the visual teleoperation network.
visual teleoperation network is also directly influenced by the quantity and quality of the training dataset. A dataset named UTD-MHAD, which consists of RGB images, depth images and human skeleton positions, is provided in [19]. The dataset in [20] includes 3.6 million accurate 3D human poses for training realistic human sensing systems and evaluating human pose estimation algorithms. However, a human–robot posture-consistent dataset which contains human–robot paired data needs to be established to train the multi-stage visual teleoperation network. Therefore, it is meaningful to develop a human–robot posture-consistent mapping for the establishment of a training database.
A visual teleoperation framework based on deep neural networks is proposed in this paper for human–robot posture-consistent teleoperation. A dataset with human–robot posture-consistent data is generated by a novel mapping method which maps human body data to the corresponding robot arm joint angle data. A multi-stage visual teleoperation network is trained on the human–robot posture-consistent dataset and then used to teleoperate a robot arm. An illustrative experiment is conducted to verify the developed visual teleoperation scheme. The main contributions of this paper are as follows:
i A visual teleoperation framework is proposed, which features a deep neural network structure and a posture mapping method.
ii A human–robot posture-consistent dataset is established by a data generator, which is able to calculate the corresponding robot arm joint angle data from the human body data.
iii A multi-stage network structure is proposed to increase flexibility in training and using the visual teleoperation network.
In this paper, the structure of a visual teleoperation network is proposed and introduced in Section 2. In Section 3, a novel human–robot posture-consistent mapping method is designed. Finally, experiments are described in Section 4 to test the visual teleoperation network. The procedure of training the visual teleoperation network is shown in Fig. 1.
Notation: Throughout this paper, $\mathbb{R}^n$ is the n-dimensional Euclidean space. $\|\cdot\|_2$ represents the 2-norm of vectors.
2. Visual teleoperation network
Training a deep neural network to solve a robot arm joint angle regression problem from human body data is challenging, because the regression problem is a highly nonlinear mapping, which causes difficulties in the learning procedure. To overcome these difficulties, a multi-stage visual teleoperation network, trained with an overall loss, is proposed to generate robot arm joint angle data from human body data.
2.1. Network structure
The proposed multi-stage visual teleoperation network consists of three stages: a human arm keypoint position estimation stage, a robot arm posture estimation stage and a robot arm joint angle generation stage. The structure of the multi-stage visual teleoperation network is shown in Fig. 2.
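Conceptually, the three stages compose into a single network whose intermediate outputs are exposed so that each stage can be supervised separately. The following is a minimal PyTorch sketch of this composition; the class and sub-module names are illustrative assumptions, not the authors' implementation.

```python
import torch.nn as nn

class VisualTeleopNet(nn.Module):
    """Chains the three stages and exposes intermediate outputs."""
    def __init__(self, keypoint_net, posture_net, joint_net):
        super().__init__()
        self.keypoint_net = keypoint_net  # depth image -> arm keypoints
        self.posture_net = posture_net    # keypoints -> directional angles
        self.joint_net = joint_net        # directional angles -> joint angles

    def forward(self, depth_image):
        keypoints = self.keypoint_net(depth_image)
        posture = self.posture_net(keypoints)
        joints = self.joint_net(posture)
        # Intermediate outputs are returned so each stage can be
        # supervised by its own loss term (Eqs. (1)-(3) below).
        return keypoints, posture, joints
```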
Skeleton point estimation stage: To estimate human arm keypoint positions from a human depth image, a pixel-to-pixel part and a pixel-to-point part are proposed in designing the human arm keypoint position estimation stage. Three kinds of building blocks are used in the pixel-to-pixel part. The first is a residual block which consists of convolution layers, batch normalization layers and an activation function. The second is a downsampling block which is identical to a max pooling layer. The last is an upsampling block which contains deconvolution layers, batch normalization layers and an activation function. The kernel size of the residual blocks is 3 × 3 and that of the downsampling and upsampling layers is 2 × 2 with stride 2. Furthermore, max pooling layers, fully connected layers, batch normalization and an activation function are adopted in the pixel-to-point part.
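A minimal sketch of the three pixel-to-pixel building blocks under the stated kernel sizes; the channel counts and the choice of ReLU as the activation are illustrative assumptions.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two 3x3 convolutions with batch normalization and a skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(x + self.body(x))

# Downsampling block: identical to a 2x2 max pooling layer with stride 2.
downsample = nn.MaxPool2d(kernel_size=2, stride=2)

def upsample_block(in_ch, out_ch):
    # Upsampling block: 2x2 stride-2 deconvolution + BN + activation.
    return nn.Sequential(
        nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )
```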
Robot arm posture estimation: Robot arm directional angles are estimated from human arm keypoint positions in the robot arm posture estimation stage. Max pooling layers, fully connected layers, batch normalization and an activation function are adopted in this stage.
Robot arm joint angle generation: In this stage, robot arm joint angles are generated from robot arm directional angles by fully connected layers, batch normalization layers and an activation function. Both regression stages follow the same fully connected pattern, sketched below.
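A minimal sketch of the fully connected pattern shared by the posture estimation and joint angle generation stages; the hidden width and the 8-keypoint / 7-joint input and output sizes are assumptions for illustration.

```python
import torch.nn as nn

def mlp_head(in_dim, hidden_dim, out_dim):
    # Fully connected layers with batch normalization and an activation.
    return nn.Sequential(
        nn.Linear(in_dim, hidden_dim),
        nn.BatchNorm1d(hidden_dim),
        nn.ReLU(inplace=True),
        nn.Linear(hidden_dim, out_dim),
    )

# Stage 2: N = 8 arm keypoints (3N values) -> 3N directional angles.
posture_net = mlp_head(in_dim=3 * 8, hidden_dim=256, out_dim=3 * 8)
# Stage 3: directional angles -> 7 robot arm joint angles.
joint_net = mlp_head(in_dim=3 * 8, hidden_dim=256, out_dim=7)
```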
2.2. Loss function
An overall loss function for training the multi-stage visual teleoperation network consists of a human arm keypoint position estimation loss, a robot arm posture estimation loss, a robot arm joint angle generation loss and a physical constraint loss; a sketch of their combination follows.
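The following sketch combines the four terms as a weighted sum. The weights and the form of the physical constraint term (a joint-limit penalty) are assumptions, not the paper's definition; the three stage losses are defined in Eqs. (1)-(3) below.

```python
import torch

def overall_loss(kp, kp_gt, posture, posture_gt, joints, joints_gt,
                 joint_limits, weights=(1.0, 1.0, 1.0, 1.0)):
    l_se = torch.norm(kp - kp_gt, dim=-1).sum()             # Eq. (1)
    l_rp = torch.norm(posture - posture_gt, dim=-1).sum()   # Eq. (2)
    l_rg = torch.norm(joints - joints_gt)                   # Eq. (3)
    # Hypothetical physical constraint: penalize joint angles that
    # fall outside their (lower, upper) limits.
    lo, hi = joint_limits
    l_pc = (torch.relu(lo - joints) + torch.relu(joints - hi)).sum()
    w1, w2, w3, w4 = weights
    return w1 * l_se + w2 * l_rp + w3 * l_rg + w4 * l_pc
```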
Skeleton point estimation loss: A mean squared error (MSE) function is adopted as the human arm keypoint position loss $L_{se}$ as follows,
$$L_{se} = \sum_{n=1}^{N} \| X_n - \tilde{X}_n \|_2 \tag{1}$$
where $X_n = (x_n, y_n, z_n) \in \mathbb{R}^3$ and $\tilde{X}_n = (\tilde{x}_n, \tilde{y}_n, \tilde{z}_n) \in \mathbb{R}^3$ denote the groundtruth coordinate and the estimated coordinate of the nth keypoint of the human arm, respectively. N denotes the number of keypoints selected on the human arm.
Robot arm posture loss: An MSE function is given for the robot arm posture loss $L_{rp}$ as follows,
$$L_{rp} = \sum_{n=1}^{N} \| R_n - \tilde{R}_n \|_2 \tag{2}$$
where $R_n = (r_{xn}, r_{yn}, r_{zn}) \in \mathbb{R}^3$ and $\tilde{R}_n = (\tilde{r}_{xn}, \tilde{r}_{yn}, \tilde{r}_{zn}) \in \mathbb{R}^3$ are the groundtruth and estimated robot arm directional angles of the nth robot keypoint, respectively. Note that N denotes the number of the robot keypoints.
Robot arm joint angle generation loss: The robot arm joint angle generation loss $L_{rg}$ for the robot angle generation stage, supervised by an MSE function, is shown as follows,
$$L_{rg} = \| \Theta - \tilde{\Theta} \|_2 \tag{3}$$
where $\Theta = (\theta_1, \ldots, \theta_n) \in \mathbb{R}^n$ and $\tilde{\Theta} = (\tilde{\theta}_1, \ldots, \tilde{\theta}_n) \in \mathbb{R}^n$ are the groundtruth robot arm joint angles and estimated robot arm joint angles, respectively.