limitations of AI and HI, we propose CrowdLearn, a crowd-AI hybrid system that leverages HI to troubleshoot, tune, and eventually improve the performance of AI-based DDA applications.
To acquire HI, we leverage a crowdsourcing platform (i.e., Amazon Mechanical Turk, or MTurk) that provides access to a massive pool of freelance workers at low cost. However, two critical pitfalls arise when leveraging a crowdsourcing platform: 1) the freelance workers may not be able to provide responses as accurate as those of domain experts due to their lack of experience/expertise; 2) the delay of the crowd workers can potentially be too high to be acceptable for DDA applications.
These two pitfalls are further exacerbated by the black-box challenges of both the AI and the crowdsourcing platform, which are not well addressed by the existing literature on human-AI systems [9], [10]. We elaborate on these challenges below.
Black-box AI Challenge: the first challenge in combining
HI and AI lies in the black-box nature of AI algorithms. In
particular, the lack of interpretability of the results from AI
algorithms makes it extremely hard to diagnose failure scenarios such as performance deficiency: why does the AI model fail? Is it due to a lack of training data or to the model itself?
Such questions make it hard for the crowd to effectively
improve the black-box AI model. The interpretability issue
was initially identified in [10], [11], where accountable AI solutions were proposed that leverage humans as annotators to troubleshoot and correct the outputs of AI algorithms. However, these solutions simply use humans to verify the results of AI and ignore the fact that human annotators can be both
slow and expensive. There also exist some human-AI systems
that use crowdsourcing platforms to obtain labels or features
to retrain the model [12], [13]. However, these systems do not address the case where the AI algorithm itself is problematic, i.e., where its performance will not increase no matter how many training samples are added. Given the
black-box nature of AI, the research question we address here
is: how do we accurately identify the failure scenarios of AI
that can be effectively addressed by the crowd?
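To make this question concrete, one common (though partial) diagnostic is to examine how validation accuracy changes as the training set grows: if accuracy is still improving, the failure is likely data-limited and crowd-provided labels can help; if it has plateaued well below the target, the model itself is the bottleneck and additional labels alone will not fix it. The Python sketch below illustrates this heuristic with scikit-learn; the classifier and threshold values are illustrative assumptions, not part of CrowdLearn.

    # Heuristic diagnosis of an AI failure scenario: data-limited vs. model-limited.
    # A minimal sketch using scikit-learn; the estimator and thresholds are
    # illustrative assumptions, not the CrowdLearn implementation.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import learning_curve

    def diagnose_failure(X, y, target_acc=0.85, plateau_tol=0.01):
        sizes, _, val_scores = learning_curve(
            RandomForestClassifier(n_estimators=100, random_state=0),
            X, y,
            train_sizes=np.linspace(0.2, 1.0, 5),
            cv=5, scoring="accuracy",
        )
        mean_val = val_scores.mean(axis=1)      # mean validation accuracy per size
        gain = mean_val[-1] - mean_val[-2]      # slope near the full training size

        if mean_val[-1] >= target_acc:
            return "no_failure"
        if gain > plateau_tol:
            return "data_limited"   # more (crowd-provided) labels are likely to help
        return "model_limited"      # more labels alone will not improve performance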
Black-box Crowdsourcing Platform Challenge: the second challenge lies in the black-box nature of the crowdsourcing platform, which is characterized by two unique
features. First, the requester (the DDA application that queries
the platform) often cannot directly select and manage the
workers in the crowdsourcing platform. In fact, the requester
can only submit tasks and define the incentives for each
task. This lack of control makes incentive design for the crowdsourcing platform very difficult since we cannot cherry-pick highly reliable and responsive workers to complete the tasks. For this reason, current incentive design solutions that assume full control over the crowd workers cannot be applied to our problem [14]–[18]. Second, the time and
quality of the responses from the crowd workers are highly
dynamic and unpredictable, and their relationships to incentives are not trivial to model. Existing solutions often assume that higher incentives lead to shorter response times and higher response quality [13], [19]. However, in our experiments we found that the quality of the responses from the crowd workers varies widely and does not simply depend on the level of incentive provided (e.g., the quality can be high even when the incentive is low). Similarly, we observe that the response delay of the crowd is not simply proportional to the incentive level. Given these unique features, the research question we tackle here is: how do we effectively incentivize the crowd to provide reliable and timely responses to improve AI performance?
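Concretely, a requester interacts with MTurk only through task postings: it can describe the task and set a reward, but has no API for selecting which workers will respond. The sketch below (Python with the boto3 MTurk client) illustrates this interface; the sandbox endpoint, reward value, and question XML file are illustrative assumptions.

    # Posting a DDA labeling task to MTurk: the requester controls the task
    # description and the reward, but not which workers accept the HIT.
    # A minimal sketch with boto3; endpoint, reward, and question file are
    # illustrative assumptions.
    import boto3

    mturk = boto3.client(
        "mturk",
        region_name="us-east-1",
        endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",
    )

    hit = mturk.create_hit(
        Title="Assess building damage in a disaster image",
        Description="Label the damage severity shown in the image.",
        Reward="0.10",                    # the incentive (in USD), the main control knob
        MaxAssignments=3,                 # how many workers answer the same task
        AssignmentDurationInSeconds=300,
        LifetimeInSeconds=3600,
        Question=open("damage_question.xml").read(),  # placeholder QuestionForm XML
    )
    print("Posted HIT:", hit["HIT"]["HITId"])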
In this work, we design the CrowdLearn framework, which leverages human feedback from the crowdsourcing platform to troubleshoot, calibrate, and boost AI performance in DDA applications. In particular, CrowdLearn addresses the black-box challenges of AI and the crowdsourcing platform by developing four new schemes: 1) a query set selection (QSS) scheme that finds the best strategy to query the crowdsourcing platform for feedback; 2) a new incentive policy design (IPD) scheme that incentivizes the crowd to provide timely and accurate responses to the queries; 3) a crowd quality control (CQC)
scheme that refines the responses from the crowd and provides
trustworthy feedback to the AI algorithms; 4) a machine
intelligence calibration (MIC) scheme that incorporates the
feedback from the crowd to improve the AI algorithms by
alleviating various failure scenarios of AI. The four compo-
nents are integrated into a holistic closed-loop system that
allows the AI and crowd to effectively interact with each
other and eventually achieve boosted performance for the
DDA application. The CrowdLearn framework was evaluated
using Amazon Mechanical Turk (MTurk) and a real-world
DDA application. We compared CrowdLearn with state-of-the-art baselines from both AI-only algorithms and human-AI frameworks. The results show that our scheme achieves significant performance gains in terms of classification accuracy in disaster damage assessment with reasonably low response time and cost.
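For intuition, the interaction of the four schemes can be summarized by the simplified Python sketch below, where each scheme is treated as a black-box callable; this is an illustrative abstraction rather than the actual implementation.

    # Simplified closed-loop interaction between the AI model and the crowd.
    # Each of the four CrowdLearn schemes is passed in as a callable; this is
    # an illustrative abstraction, not the actual implementation.

    def crowdlearn_loop(model, unlabeled_pool, budget,
                        qss, ipd, post_to_crowd, cqc, mic, rounds=5):
        """Run `rounds` iterations of the crowd-AI feedback loop."""
        for _ in range(rounds):
            query_set = qss(model, unlabeled_pool)      # 1) query set selection
            incentives = ipd(query_set, budget)         # 2) incentive policy design,
                                                        #    assumed to return {task: reward}
            raw = post_to_crowd(query_set, incentives)  # submit tasks to MTurk
            feedback = cqc(raw)                         # 3) crowd quality control
            model = mic(model, feedback)                # 4) machine intelligence calibration
            budget -= sum(incentives.values())          # deduct incentives spent this round
        return model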
II. RELATED WORK

A. Human-AI Systems
Humans have traditionally been an integral part of artificial
intelligence systems as a means of generating labeled training
data [3], [11], [20]. Such a paradigm has been proven to be
effective in supervised learning tasks such as image classifica-
tion [21], speech recognition [22], autonomous driving [23],
social media mining [24], and virtual reality [25]. However, it
also suffers from two key limitations. First, some applications
(e.g., damage assessment) may require a large amount of
training data to achieve reasonable performance, which could
be impractical due to the labor cost [5], [9]. Second, the
AI models are often black-box systems that are difficult to diagnose in the event of failures or unsatisfactory performance. To address these limitations, a few human-AI hybrid
frameworks have been developed in recent years. For example,
Holzinger et al. proposed the notion of interactive machine learning (“iML”), where humans directly interact
with AI by identifying useful features that could be incor-
porated into the AI algorithms [26]. Branson et al. invented
a human-in-the-loop visual recognition system to accurately
classify the objects in the picture based on the descriptions