AI Enabling Technologies: A Survey
Vijay Gadepally, Justin Goodwin, Jeremy Kepner, Albert Reuther, Hayley Reynolds,
Siddharth Samsi, Jonathan Su, David Martinez
MIT Lincoln Laboratory
244 Wood Street
Lexington, MA, 02421
ABSTRACT
Artificial intelligence (AI) has the opportunity to revolutionize the way the United States Department
of Defense (DoD) and Intelligence Community (IC) address the challenges of evolving threats, data deluge,
and rapid courses of action. Developing an end-to-end AI system involves parallel development of different
pieces that must work together in order to provide capabilities that can be used by decision makers, warfighters,
and analysts. These pieces include data collection, data conditioning, algorithms, computing, robust AI, and
human–machine teaming. Although much of today's popular press coverage centers on advances in algorithms
and computing, most modern AI systems leverage advances across numerous different fields. Further, while
certain components may not be as visible to end-users as others, our experience has shown that each of these
interrelated components plays a major role in the success or failure of an AI system. This article is meant to
highlight many of these technologies that are involved in an end-to-end AI system. The goal of this article is
to provide readers with an overview of terminology, technical details, and recent highlights from academia,
industry, and government. Where possible, we indicate relevant resources that can be used for further reading
and understanding.
DISTRIBUTION STATEMENT A. Approved for public release. Distribution is unlimited.
This material is based upon work supported by the United States Air Force under Air Force Contract No. FA8702-
15-D-0001. Any opinions, findings, conclusions or recommendations expressed in this material are those of the
author(s) and do not necessarily reflect the views of the United States Air Force.
© 2019 Massachusetts Institute of Technology.
Delivered to the U.S. Government with Unlimited Rights, as defined in DFARS Part 252.227-7013 or 7014 (Feb
2014). Notwithstanding any copyright notice, U.S. Government rights in this work are defined by DFARS 252.227-
7013 or DFARS 252.227-7014 as detailed above. Use of this work other than as specifically authorized by the U.S.
Government may violate any copyrights that exist in this work.
1 INTRODUCTION
Figure 1.1. Canonical AI architecture consists of sensors, data conditioning, algorithms, modern computing, robust
AI, human–machine teaming, and users (missions). Each step is critical in developing end-to-end AI applications and
systems.
AI has the opportunity to revolutionize the way the DoD and IC address the challenges of evolving
threats, data deluge, and rapid courses of action. AI solutions involve a number of different pieces that must
work together in order to provide capabilities that can be used by decision makers, warfighters, and analysts.
Consider the canonical architecture of an AI system in Figure 1.1. This figure outlines many of the important
components needed when developing an end-to-end AI solution. While much of the popular press coverage
centers on advances in algorithms and computing, most modern AI systems leverage advances across numerous
different fields. Further, while certain components may not be as visible to end users as others, our experience
has shown that each of these interrelated components plays a major role in the success or failure of an AI system.
On the left side of Figure 1.1, we have data coming in from a variety of structured and unstructured
sources. Together, these sources often provide different views of the same entities and/or phenomenology.
Data from sensors are typically categorized as structured because the raw digital data are accompanied by
metadata. In contrast, we use "unstructured data" to refer to data with no predefined structure or accompanying
metadata.
These raw data are often fed into a data conditioning step in which they are fused, aggregated, structured,
accumulated, and converted to information. The main objective of this subcomponent is to transform data
into information. An example of information is a newly labeled sensor image that can be used to classify
whether an object of interest (such as a particular vehicle) is present in the image. Typical functions performed
in this subcomponent include standardizing data formats to comply with a data ontology, labeling data, and
highlighting missing or incomplete data and errors or biases in the data.
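As a toy illustration of these conditioning functions, the sketch below maps raw records with inconsistent field names onto a small canonical schema, attaches labels, and flags gaps. The ontology, field names, and records are hypothetical, invented for illustration; they are not drawn from any dataset discussed here.

```python
# Hypothetical sketch of a data conditioning step: raw records arrive with
# inconsistent field names; we standardize them against a small ontology,
# attach labels, and flag missing or incomplete entries.

# Hypothetical ontology: canonical field name -> accepted aliases.
ONTOLOGY = {
    "timestamp": ("timestamp", "time", "ts"),
    "sensor_id": ("sensor_id", "sensor"),
    "image": ("image", "frame", "img"),
}

def condition(raw_records, labels=None):
    """Standardize field names, attach labels, and report missing fields."""
    labels = labels or {}
    conditioned, issues = [], []
    for i, record in enumerate(raw_records):
        out = {}
        for canonical, aliases in ONTOLOGY.items():
            # Take the first alias that appears in the raw record, if any.
            value = next((record[a] for a in aliases if a in record), None)
            if value is None:
                issues.append((i, "missing " + canonical))
            out[canonical] = value
        out["label"] = labels.get(i)  # data labeling (None if unlabeled)
        conditioned.append(out)
    return conditioned, issues
```

For example, conditioning one record keyed by `ts`/`sensor`/`img` and a second record that lacks any image field would standardize both and flag the second as missing its image.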
The information generated by the data conditioning step feeds into a host of supervised and
unsupervised algorithms such as neural networks. These algorithms are used to extract patterns, predict new
events, fill in missing data, or look for similarities across datasets. These algorithms essentially convert the
input information to actionable knowledge. In our definition, we use the term “knowledge” to describe
information that has been converted into a higher-level representation that is ready for human consumption.
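As a toy illustration of a supervised algorithm turning information into knowledge, the sketch below uses nothing more elaborate than a one-nearest-neighbor classifier over labeled feature vectors. The feature values and class names are fabricated for illustration and are not the methods or data of this paper.

```python
import math

# Fabricated "information": feature vectors with human-supplied labels.
training_data = [
    ((0.0, 0.0), "background"),
    ((0.1, 0.2), "background"),
    ((5.0, 5.1), "vehicle"),
    ((4.9, 5.0), "vehicle"),
]

def predict(features):
    """1-nearest-neighbor: return the label of the closest training point."""
    closest = min(training_data, key=lambda item: math.dist(item[0], features))
    return closest[1]  # the "knowledge": a predicted class for the new input
```

A new observation near the "vehicle" cluster is assigned that class; the same extract-a-pattern-and-apply-it structure underlies far richer models such as neural networks.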
With the knowledge extracted in the algorithms phase, it is important to include the human being in the
decision-making process. This is done in the human–machine teaming phase. Although there are a few
applications that may be amenable to autonomous decision making (e.g., email spam filtering), recent AI
advances of relevance to the DoD have largely been in fields where a human is either in- or on-the-loop. The
phase of human–machine teaming is critical in connecting the data and algorithms to the end user and in
providing the mission users with useful and relevant insight. Human–machine teaming is the phase in which
knowledge can be turned into actionable intelligence or insight by effectively utilizing human and machine
resources as appropriate.
Underpinning all of these phases is the bedrock of modern computing systems made up of a number of
heterogeneous computing elements. For example, sensor processing may occur on low-power embedded
computers, whereas algorithms may be computed in very large data centers. With the end of Moore's law [1],
we have seen a Cambrian explosion of computing technologies and architectures. Understanding the relative
benefits of these technologies is of particular importance when applying AI to domains under significant
constraints such as size, weight, and power.

Figure 1.2. Example categories and video screen shots from the Moments in Time Dataset (e.g., Pulling,
Applauding, Asking, Drawing, Adult+Male+Speaking, Skating).
Another foundational technology underpinning AI development is robust or trusted AI. In this area,
researchers are looking at ways to explain AI outcomes (for example, why a system is recommending a
particular course of action); metrics to measure the effectiveness of an AI algorithm (going beyond the
traditional accuracy and precision metrics for complex applications or decisions); verification and validation
(ensuring that results are provably correct under adversarial conditions); security (dealing with malicious or
counter-AI technology); and policy decisions that govern the safe, responsible, and ethical use of AI
technology. Although traditional academic and commercial players are looking at these issues, some nonprofit
initiatives such as OpenAI and the Allen Institute are taking a leading role in this area.
In the following sections, we highlight some of the salient technical concepts, research challenges, and
opportunities for each of these core components of an AI system. In order to elucidate these components, we
also use a running example based on research applying high-performance computing (HPC) to video
classification. We would also like to note that each of the components of the AI architecture is a vast academic
area with a rich history and numerous published results. In order to provide readers with an overall view
of all the components within this section, we concentrate on high-level concepts and include vignettes of
select research highlights and application examples.
1.1 VIDEO CLASSIFICATION EXAMPLE OVERVIEW
Over the course of this section, in order to provide concrete examples of the components of the AI
architecture being discussed, we use a running example based on our research applying high-performance
computing to video classification. Specifically, we concentrate on the Moments in Time Dataset [2], recently
developed at the Massachusetts Institute of Technology (MIT) Computer Science and Artificial Intelligence
Laboratory (CSAIL). This dataset consists of one million videos, each given a label corresponding to an
action being performed in the video. Each video is approximately three seconds in length
and is labeled according to what a human observer believes is happening in the video. For example, a video
of a dog barking is classified as “barking” and a video of people clapping would be labeled as “clapping.”
Figure 1.2 shows a few screenshots of videos from the dataset and their associated labels. Of course, there are
many cases where a particular label may be ambiguous. For example, videos with the action label "cramming"
could show a person studying before an exam or someone stuffing something into a box. At present, each
video in the Moments in Time Dataset is labeled with one of approximately 380 possible labels. Some of the
video clips also contain audio, but audio is not present in all videos.
The Moments in Time Dataset is an example of a well-curated dataset that can be used to conduct
research on video classification. To this end, the creators of the dataset held a competition in 2018 to
encourage dataset usage and to share results that highlight the state of the field. Information about this
competition can be found at https://moments.csail.mit.edu/challenge2018/.
As a metric for assessing the quality of a particular algorithm, the competition called for reporting a
top-k accuracy score. This metric is defined as follows: an algorithm labels each video with k candidate
labels, ranked by decreasing predicted probability, and a video counts as correctly identified if the correct
label appears among those top k labels. For example, a video may be classified (in decreasing probability) as
(barking, yelling, running, …). If the correct label (as judged by a human observer) is "yelling," the top-five
accuracy for this video would be 1, while the top-one accuracy would be 0. As of June 2018, competition
winners had top-one accuracies of approximately 0.3 and top-five accuracies of approximately 0.6 [3–5].
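The top-k metric described above is straightforward to compute; a minimal sketch follows (the function name and argument layout are our own, not from the competition):

```python
def top_k_accuracy(ranked_predictions, true_labels, k):
    """Fraction of videos whose correct label appears in the top k predictions.

    ranked_predictions: one list of labels per video, sorted by decreasing
    predicted probability; true_labels: the correct label for each video.
    """
    hits = sum(1 for predictions, truth in zip(ranked_predictions, true_labels)
               if truth in predictions[:k])
    return hits / len(true_labels)
```

With the example above, a video ranked (barking, yelling, running, …) whose correct label is "yelling" scores 1 under top-five accuracy and 0 under top-one accuracy.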