


CIRA GUIDE TO CUSTOM LOSS FUNCTIONS FOR NEURAL NETWORKS IN ENVIRONMENTAL SCIENCES - VERSION 1

Imme Ebert-Uphoff ∗, CIRA, ECE, iebert@colostate.edu
Ryan Lagerquist, CIRA, NOAA-GSL, ralager@colostate.edu
Kyle Hilburn, CIRA, Kyle.Hilburn@colostate.edu
Yoonjin Lee, CIRA, Yoonjin.Lee@colostate.edu
Katherine Haynes, CIRA, Katherine.Haynes@colostate.edu
Jason Stock, Jason.Stock@rams.colostate.edu
Christina Kumler, CIRES, NOAA-GSL, christina.e.kumler@noaa.gov
Jebb Q. Stewart, NOAA-GSL, jebb.q.stewart@noaa.gov

June 21, 2021

ABSTRACT

Neural networks are increasingly used in environmental science applications. Neural network models are trained by minimizing a loss function, and it is crucial to choose the loss function very carefully for environmental science applications, as it determines what exactly is being optimized. Standard loss functions do not cover all the needs of the environmental sciences, which makes it important for scientists to be able to develop their own custom loss functions so that they can implement many of the classic performance measures already developed in environmental science, including measures developed for spatial model verification. However, there are very few resources available that cover the basics of custom loss function development comprehensively, and to the best of our knowledge none that focus on the needs of environmental scientists. This document seeks to fill this gap by providing a guide on how to write custom loss functions targeted toward environmental science applications. Topics include the basics of writing custom loss functions, common pitfalls, functions to use in loss functions, examples such as the fractions skill score as a loss function, how to incorporate physical constraints, discrete and soft discretization, and concepts such as focal, robust, and adaptive loss. While examples are currently provided in this guide for Python with Keras and the TensorFlow backend, the basic concepts also apply to other environments, such as Python with PyTorch. Similarly, while the sample loss functions provided here are from meteorology, these are just examples of how to create custom loss functions. Other fields in the environmental sciences have very similar needs for custom loss functions, e.g., for evaluating spatial forecasts effectively, and the concepts discussed here can be applied there as well. All code samples are provided in a GitHub repository.

∗ CIRA: Cooperative Institute for Research in the Atmosphere, Colorado State University, Fort Collins, CO. ECE: Electrical and Computer Engineering, Colorado State University, Fort Collins, CO. NOAA-GSL: National Oceanic and Atmospheric Administration (NOAA), Global Systems Laboratory (GSL), Boulder, CO. CS: Computer Science, Colorado State University, Fort Collins, CO. CIRES: Cooperative Institute for Research in Environmental Sciences, University of Colorado Boulder, Boulder, CO.

arXiv:2106.09757v1 [cs.LG] 17 Jun 2021

CIRA GUIDE TO CUSTOM LOSS FUNCTIONS

1 Why would environmental scientists care about custom loss functions?

The use of neural networks in environmental science applications is growing at a rapid pace. In order to train a

neural network one has to choose a cost function, called a loss function in the context of neural networks, which

represents the error of the neural network. The neural network is then trained, i.e. its parameters are chosen through an

iterative process, such that the loss function, and thus the error, is minimized. It is crucial to choose the loss function

very carefully for environmental applications, as it determines what exactly the neural network is optimizing. Many

pre-deﬁned loss functions exist. The most popular examples for regression (predicting a continuous value such as wind

speed) include mean absolute error (MAE), mean squared error (MSE), and root mean squared error (RMSE). The most

popular example for classiﬁcation is cross-entropy. There are many other predeﬁned loss functions, and their number

keeps growing, but they do not cover everything environmental scientists care about, since they were developed for other

applications. In fact, environmental scientists have a long tradition of developing meaningful performance measures for

forecasting tasks, such as for single-category forecasts (accuracy, frequency bias, probability of detection, success ratio,

etc.); for multi-category forecasts (Heidke score, Gerrity score, etc.); for continuous forecasts (correlation, reliability

diagram, etc.); for probabilistic forecasts (reliability diagram, Brier score, etc.); for spatial forecasts (neighborhood

methods, such as fractions skill score, and scale decomposition, such as wavelet decomposition), and many others. For an

extremely comprehensive overview of these and other performance measures, see the guide of the WWRP/WGNE Joint

Working Group on Forecast Veriﬁcation Research at https://www.cawcr.gov.au/projects/verification/.
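As a small illustration of the single-category measures named above, the following NumPy sketch computes accuracy, frequency bias, probability of detection (POD), and success ratio from the four entries of a 2x2 contingency table. The function name and the example data are hypothetical and chosen only for illustration; the formulas are the standard contingency-table definitions.

```python
import numpy as np

def contingency_scores(forecast, observed):
    """Standard 2x2 contingency-table scores for binary (yes/no) forecasts."""
    forecast = np.asarray(forecast, dtype=bool)
    observed = np.asarray(observed, dtype=bool)

    hits = np.sum(forecast & observed)
    misses = np.sum(~forecast & observed)
    false_alarms = np.sum(forecast & ~observed)
    correct_negatives = np.sum(~forecast & ~observed)

    return {
        "accuracy": (hits + correct_negatives) / forecast.size,
        "frequency_bias": (hits + false_alarms) / (hits + misses),
        "pod": hits / (hits + misses),
        "success_ratio": hits / (hits + false_alarms),
    }

# Hypothetical example: 10 forecasts, 4 forecast events, 4 observed events.
forecast = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
observed = [1, 1, 0, 1, 0, 0, 0, 0, 0, 1]
scores = contingency_scores(forecast, observed)
```

Note that the sketch does not guard against empty categories (for example, no observed events), which would lead to a division by zero.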

However, it is not obvious which ones of those performance measures can easily be used in a neural network and how

to do it, for the following reasons:

• There are various limitations on what can be implemented in a neural network loss function. Functions must be differentiable and execute extremely quickly, which makes it tricky to implement custom loss functions.

• The loss functions required by environmental scientists are unlike any loss functions typically used in computer science, and the community has not yet developed comprehensive resources, such as a large collection of customized loss functions.
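To make the differentiability requirement from the first point concrete, the sketch below compares a hard event count (obtained by thresholding) with a smooth surrogate. The probabilities, the 0.5 threshold, and the sigmoid steepness of 10 are arbitrary illustrative choices of ours, not values from this guide.

```python
import tensorflow as tf

probabilities = tf.Variable([0.2, 0.4, 0.6, 0.9])

with tf.GradientTape(persistent=True) as tape:
    # Hard count of predicted events: thresholding is a step function,
    # so no useful gradient can flow back to the probabilities.
    hard_count = tf.reduce_sum(tf.cast(probabilities > 0.5, tf.float32))
    # Soft count: a steep sigmoid approximates the step but stays differentiable.
    soft_count = tf.reduce_sum(tf.sigmoid(10.0 * (probabilities - 0.5)))

hard_gradient = tape.gradient(hard_count, probabilities)  # no gradient available
soft_gradient = tape.gradient(soft_count, probabilities)  # one value per probability
del tape
```

A loss built on the hard count would give the optimizer nothing to work with, whereas the soft count provides a gradient for every sample, largest for probabilities near the threshold.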

The above reasons make the topic of loss functions a signiﬁcant hurdle for practitioners striving to implement meaningful

loss functions for their applications. We seek to close this gap here by providing comprehensive instructions, including

many examples and discussion of common pitfalls, on how to code custom loss functions. While the sample loss

functions provided here are from meteorology, these are just examples of how to create custom loss functions. Other

ﬁelds in the environmental sciences have very similar needs for custom loss functions, e.g., for evaluating spatial

forecasts effectively, and the concepts discussed here can be applied there as well. Making it possible, and even easy, to

use a variety of meaningful loss functions will go a long way to more effectively tune neural networks to focus on the

types of performance criteria that are truly important in environmental science applications, thus helping the science

community to make the most of neural networks for their applications. Scientists in other areas have invested in similar efforts, e.g., researchers in the medical imaging community have developed a collection of loss functions specifically for medical image segmentation [ ]. Loss function development for different purposes also remains a very active topic in computer science. See [2, 3, 4, 5] for just a small sample of recent research.

1.1 A case in point: using measures from spatial model veriﬁcation for neural networks

Here we brieﬂy discuss one area with particularly high potential for the development and use of custom loss functions,

namely neural networks for spatial forecasts in the environmental sciences. Meteorologists and other environmental

scientists have developed an extensive set of evaluation measures for spatial model veriﬁcation, and ideally those

evaluation measures should be used directly for neural network training for forecast models. Why train a neural network

on anything other than the criteria we seek to optimize? However, many networks are still trained using pixel-based

measures, such as MAE, MSE, or RMSE, mainly because those are easily available as loss functions.

Gilleland et al. [ ] conducted a large-scale comparison of model verification methods, focusing on methods that compare a spatial forecast (image) to an observation (also an image). In [ ] they distinguish four primary classes of methods for model verification. (A later classification by the same group cites five [ ], but we prefer the separation into the original four categories described in [6, 7].)

• Neighborhood: Methods that apply some kind of neighborhood averaging to both the forecast and the observation before applying a pixel-based comparison of the resulting smoothed images.
Example: Fractions skill score.

• Scale separation: Methods that separate the signals in forecast and observation images into different spatial scales (orthogonal decomposition).
Example: Applying Fourier or wavelet transformation, then spatial filtering, to both images before pixel-wise comparison.

• Feature-based: Methods that seek to identify features, such as connected regions, in both forecast and observation, then match those features between forecast and observation and compare their properties.
Example: Applying the Method for Object-Based Diagnostic Evaluation (MODE) framework developed by [9], which utilizes fuzzy logic.

• Field deformation: Methods that morph one of the fields so that its locations match the other field, by producing a field of distortion vectors.
Example: Using the technique of optical flow to calculate the distortion field.
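To illustrate the neighborhood class, here is a minimal NumPy sketch of the fractions skill score (FSS) as a verification measure, using the commonly cited definition FSS = 1 - MSE / MSE_ref computed on neighborhood event fractions. It is not the Keras loss implementation discussed later in this guide: the function names, the zero padding at the borders, and the toy fields are our own assumptions, and the hard thresholding used here is not differentiable.

```python
import numpy as np

def neighborhood_fraction(binary_field, window):
    """Fraction of event pixels in a window x window box around each pixel.

    Assumes an odd window size; pixels outside the domain count as non-events.
    """
    half = window // 2
    padded = np.pad(binary_field.astype(float), half, mode="constant")
    rows, cols = binary_field.shape
    total = np.zeros((rows, cols))
    for di in range(window):
        for dj in range(window):
            total += padded[di:di + rows, dj:dj + cols]
    return total / window**2

def fractions_skill_score(forecast, observation, threshold, window):
    """FSS = 1 - MSE / MSE_ref, computed on neighborhood fractions."""
    pf = neighborhood_fraction(forecast >= threshold, window)
    po = neighborhood_fraction(observation >= threshold, window)
    mse = np.mean((pf - po) ** 2)
    mse_ref = np.mean(pf ** 2) + np.mean(po ** 2)
    return 1.0 - mse / mse_ref if mse_ref > 0 else float("nan")

# Toy fields: a 2x2 observed event and three hypothetical forecasts.
obs = np.zeros((10, 10)); obs[2:4, 2:4] = 1.0
fcst_near = np.zeros((10, 10)); fcst_near[3:5, 3:5] = 1.0  # displaced by one pixel
fcst_far = np.zeros((10, 10)); fcst_far[7:9, 7:9] = 1.0    # displaced by five pixels

fss_perfect = fractions_skill_score(obs, obs, threshold=0.5, window=3)
fss_near_w1 = fractions_skill_score(fcst_near, obs, threshold=0.5, window=1)
fss_near_w3 = fractions_skill_score(fcst_near, obs, threshold=0.5, window=3)
fss_far_w3 = fractions_skill_score(fcst_far, obs, threshold=0.5, window=3)
```

With a window of one pixel the score reduces to a pixel-wise comparison; enlarging the window rewards the slightly displaced forecast, which is exactly the behavior a pixel-based measure such as MSE cannot provide.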

Many approaches fall within more than one category above [6, 7]. This list indicates the large variety of sophisticated evaluation methods developed by meteorologists over decades. In contrast, neural networks developed for meteorological forecasting are lagging far behind this development. In fact, most of them still use simplistic, pixel-based loss functions to describe the optimal behavior to be achieved by the neural network. Thus there is huge potential to develop more meaningful loss functions that truly optimize the measures that scientists care about.

In particular, one may wonder which of the four spatial model verification categories listed above can be implemented in loss functions. Both neighborhood-based and scale-separation-based methods are feasible for neural network implementation; in fact, we have already implemented some of each. Field deformation methods may become feasible, given that some optical flow algorithms are being implemented as neural networks and that loss functions can call other neural networks without a problem. We believe the feasibility of that approach depends on how fast, reliable, and accurate those implementations can be in meteorological applications. Current feature-based methods are likely the hardest to implement in loss functions, because extracting features introduces extreme discontinuities. Thus we expect at most some very simple approximations of feature-based methods to potentially become feasible for loss function implementation.

1.2 Organization of this document

The remainder of this document is organized as follows.

Section 2 provides information on this guide, its intended audience, and the coding environment used for the examples.

Section 3 covers introductory material that can be easily found through online resources and in the literature. In contrast, the material covered from Section 4 onward goes beyond what is easily accessible through other sources.

Section 4 looks into common implementation pitfalls.

Section 5 discusses how to feed additional information into the loss functions, such as parameters or supplementary information about each sample.

Section 6 dives into which kinds of functions can be used in loss functions, including unusual but powerful examples, such as how to use existing neural network layer functionality in loss functions. It also discusses how to properly include conditions in loss functions.

Section 7 uses all of these concepts to generate several loss functions for environmental science applications, including loss functions for semantic segmentation (IOU, Dice, Tversky coefficients) and an implementation of the fractions skill score.

Section 8 discusses important loss concepts that have not yet been deeply explored, but that we believe deserve more attention, including incorporating physical constraints, focal loss, robust loss, adaptive loss, the discriminator in generative adversarial networks, and perceptual loss.

Section 9 provides insights and practical examples from practitioners in environmental science.

Section 10 provides some concluding comments.

2 About this guide and its intended audience

This guide was developed by scientists who work at, or collaborate closely with, the Cooperative Institute for Research

in the Atmosphere (CIRA) at Colorado State University. It is simply a collection of the lessons we learned trying to

implement meaningful custom loss function for our own applications in the environmental sciences. When writing this


document we asked ourselves the following question: “What knowledge do we have now that we wish we had when we

started developing custom loss functions?” This document is thus not written from a computer science perspective, but

from a practitioner’s perspective. Namely, it is written by scientists working in the environmental sciences for scientists

working in the environmental sciences.

Disclaimer: The contents herein represent our understanding of loss functions to the best of our abilities. However, there are bound to be some (hopefully minor) mistakes somewhere. We thus offer the contents, including the code snippets, “as is,” without warranty, and disclaim liability for damages resulting from their use.

2.1 How you can contribute to this effort

There are several ways in which you, the reader, can help and contribute:

• If you find any errors (there are bound to be some) or have any other suggestions for improvements, please let us know. We intend to fix any bugs in future versions of this document to be posted here and would mention you for bug fixes, etc. Please email the first author with any comments you may have.

• Do you have loss functions to contribute? We would be happy to add them to the GitHub repository (with acknowledgements) and, if of interest, to this guide.

• Are you a PyTorch user and willing to translate some of the examples over to PyTorch? We would be happy to include those in the GitHub repository, with proper acknowledgement of course.

Please cite us: If you find this guide useful for your work, please cite it. Citing it will help us get recognized for our effort and help us stay motivated to post updated versions from which the entire community may benefit. With your help, hopefully, we can move the entire community forward on this important topic.

2.2 Coding Environment

All discussions and examples here are based on Python and Keras with the TensorFlow 2 backend. TensorFlow/Keras and PyTorch appear to be the most common frameworks used in the meteorological community at the time this manuscript is being written, and it should not be very difficult to transfer the examples provided here in TensorFlow over to PyTorch, as the basic concepts are the same. Namely, what is possible to code in TensorFlow/Keras should be possible to code in PyTorch in a similar way. On the other hand, many of the pitfalls described here might be unique to TensorFlow, so are less likely to transfer.

When using Keras we highly encourage the use of the functional application programming interface (API) to define neural networks (see https://www.tensorflow.org/guide/keras/functional). In contrast to the sequential (old style) API in Keras, the functional API provides much more flexibility, including for the definition of custom loss functions. This document assumes that the functional API is used, although many examples will also work with the sequential API.
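For readers who have not used the functional API before, the following is a minimal sketch of a model defined that way. The layer sizes and names are hypothetical and chosen only for illustration; the skip connection is included because it is the kind of structure a purely sequential stack of layers cannot express.

```python
import tensorflow as tf

# Hypothetical sizes and names, chosen only for illustration.
inputs = tf.keras.Input(shape=(8,), name="predictors")
hidden = tf.keras.layers.Dense(16, activation="relu")(inputs)
# Skip connection: the raw inputs are passed around the hidden layer.
merged = tf.keras.layers.Concatenate()([inputs, hidden])
outputs = tf.keras.layers.Dense(1, name="prediction")(merged)

model = tf.keras.Model(inputs=inputs, outputs=outputs)
```

Because every intermediate tensor (`inputs`, `hidden`, `merged`) is available as a named Python object, it can be reused elsewhere in the model definition, which is part of the flexibility referred to above.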

TensorFlow functions usually appear as tf.function_name in the code. It is always assumed that tf has been defined accordingly beforehand. The equivalent holds for Keras functions, which appear as K.function_name. This can be achieved as follows:

import tensorflow as tf

import tensorflow.keras.backend as K

Coding Styles: Different examples provided here come from different contributors, who may have different coding styles. There are always many ways to do the same thing in Python/TensorFlow/Keras. We did not try to make the styles uniform, because style is a question of taste and we think it is helpful to see different styles to choose from.

Color blocks: Throughout this document code samples are indicated by a gray color block in the background. Pitfalls and warnings are indicated by a cyan color block in the background.

GitHub repository: All code samples included here are available in a GitHub repository at https://github.com/CIRA-ML/custom_loss_functions.

2.3 Commonly used acronyms and expressions

AI: Artiﬁcial intelligence

ML: Machine learning


NN: Neural network

CNN: Convolutional neural network

NN parameters: The model parameters of a neural network that are learned during training, namely the weights and biases of all layers.

GOES: Geostationary Operational Environmental Satellite, see [10] for more information.

2.4 Other resources

The TensorFlow and Keras online documentation provides key information on custom loss functions, and pointers to many relevant parts of that documentation are provided throughout this document. In terms of books, we recommend Géron’s machine learning book [ ], Chapter 12 of which discusses custom elements for TensorFlow models, including custom metrics, custom loss functions, custom layers, custom activation functions, and much more. Finally, web forums remain a crucial source of information for finding solutions to specific problems that others have already encountered.

3 Introductory material

This section provides some introductory material, such as the general purpose of loss functions, metrics, and lambda

layers, and introductory material speciﬁc to custom loss functions, such as ﬁrst simple examples and how to save and

load a model that contains a custom loss function. The type of material covered in this section can easily be found in

many online resources and in literature, and is included here for convenience. Many readers can probably skip this

section.

3.1 Custom elements: metrics, loss functions, and lambda layers

Neural network environments, such as TensorFlow and PyTorch, provide many ready-to-use elements, such as standard

loss functions, standard metrics, and standard neural network layers (e.g., convolutional and pooling layers). While for

many applications these standard elements are sufﬁcient, we often ﬁnd that for environmental science applications it is

useful to go beyond these ﬁxed elements. Using custom elements allows us to do so. Before diving into these custom

elements, here are a few words about the role of these custom elements for neural network development.

Loss functions: The loss function describes what exactly we want the neural network to minimize. Choosing a loss function is thus critical to truly optimize what we care about the most. Significant time and effort should be dedicated to defining what is most important to the problem and to choosing a corresponding loss function. For example, to compare an observed and an estimated satellite image, is a pixelwise comparison using MSE truly the best option? In some applications that might be sufficient, but in many cases it is sub-optimal. Furthermore, because loss functions are optimized through gradient descent during training, they must be differentiable. They must also execute extremely fast to allow for fast training of the neural network.
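As a first, deliberately simple illustration, the sketch below writes mean squared error by hand as a custom loss and passes it to a Keras model. The function name `custom_mse` and the tiny model are hypothetical; the point is only that a loss is an ordinary function of `y_true` and `y_pred` built from differentiable TensorFlow operations.

```python
import tensorflow as tf

def custom_mse(y_true, y_pred):
    """Mean squared error written by hand as a custom loss."""
    return tf.reduce_mean(tf.square(y_true - y_pred))

# Hypothetical toy model, defined with the functional API.
inputs = tf.keras.Input(shape=(4,))
outputs = tf.keras.layers.Dense(1)(inputs)
model = tf.keras.Model(inputs, outputs)

# The custom function is passed in place of a predefined loss name.
model.compile(optimizer="adam", loss=custom_mse)

# The loss can also be called directly, which is handy for checking it.
y_true = tf.constant([[1.0], [2.0]])
y_pred = tf.constant([[1.5], [2.5]])
loss_value = float(custom_mse(y_true, y_pred))  # both errors are 0.5, so MSE is 0.25
```

Calling a new loss function directly on a few hand-made tensors, as in the last lines, is a cheap way to confirm that it returns the value one expects before training with it.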

Metrics: While we can only define a single loss function, we can use as many metrics as we please while training the neural network. The metrics are not optimized, but they allow us to track other criteria during and after the training process to spot problems or weaknesses of the trained network early on. For example, for a binary classification problem one might choose cross-entropy as the loss function, then simultaneously track forecast verification metrics, such as hits, misses, false alarms, and correct negatives, as auxiliary metrics during training. These metrics allow us to track whether and when the neural network learns to perform well for rare events. (We will see later that we can also optimize quantities such as hits directly in the loss function if we care to do so.)
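The binary classification example above can be sketched as follows. The four metric functions, the 0.5 threshold, and the toy model are our own illustrative assumptions. Because metrics are not optimized, hard thresholding is acceptable here even though it would not be in a loss function.

```python
import tensorflow as tf

def _binarize(y_pred, threshold=0.5):
    """Turn predicted probabilities into yes/no forecasts (0.5 is an assumption)."""
    return tf.cast(y_pred >= threshold, tf.float32)

def hits(y_true, y_pred):
    return tf.reduce_sum(tf.cast(y_true, tf.float32) * _binarize(y_pred))

def misses(y_true, y_pred):
    return tf.reduce_sum(tf.cast(y_true, tf.float32) * (1.0 - _binarize(y_pred)))

def false_alarms(y_true, y_pred):
    return tf.reduce_sum((1.0 - tf.cast(y_true, tf.float32)) * _binarize(y_pred))

def correct_negatives(y_true, y_pred):
    return tf.reduce_sum((1.0 - tf.cast(y_true, tf.float32)) * (1.0 - _binarize(y_pred)))

# Hypothetical toy classifier: cross-entropy is optimized, the counts are only tracked.
inputs = tf.keras.Input(shape=(4,))
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(inputs)
model = tf.keras.Model(inputs, outputs)
model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=[hits, misses, false_alarms, correct_negatives],
)

# Direct check on four samples: one of each outcome.
y_true = tf.constant([[1.0], [1.0], [0.0], [0.0]])
y_pred = tf.constant([[0.9], [0.2], [0.7], [0.1]])
counts = [float(f(y_true, y_pred)) for f in (hits, misses, false_alarms, correct_negatives)]
```

One caveat: these functions return counts per batch, and Keras reports function-based metrics as running averages over batches, so the values shown during training are average per-batch counts rather than totals over the epoch.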

Another common scenario is that we might choose a loss function that is a sum of several different criteria, e.g., one criterion focusing on general performance of the network across all samples and another focusing on performance specifically for rare events. Whenever we use a sum in the loss function, we want to know how much each part of the sum contributes to the loss function, so in this case we recommend assigning a metric to each part of the loss function; in the example above that would be one for general performance and one for rare events. Furthermore, whenever L1 or L2 regularization is used in TensorFlow layers, the L1 or L2 penalty value is automatically added to the value of the loss function itself, and the sum of those values is returned as a combined loss value. To disentangle these penalties from the rest of the loss, we recommend defining a metric that is identical to the loss function, so that it is easy to determine how much of the reported loss is due to the actual loss function (value returned by the metric) and how much is due to the regularization penalty (combined loss value minus the metric value).
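Both recommendations can be sketched together. In the example below, the rare-event threshold, the weight of the rare-event term, the L2 factor, and the toy data are all hypothetical choices of ours; the loss is the sum of a general term and a weighted rare-event term, and the same functions are reused as metrics.

```python
import numpy as np
import tensorflow as tf

RARE_EVENT_THRESHOLD = 0.8  # hypothetical: targets above this count as rare events
RARE_EVENT_WEIGHT = 5.0     # hypothetical weight of the rare-event term

def general_term(y_true, y_pred):
    """Mean squared error over all samples."""
    return tf.reduce_mean(tf.square(y_true - y_pred))

def rare_event_term(y_true, y_pred):
    """Mean squared error over rare-event samples only."""
    is_rare = tf.cast(y_true >= RARE_EVENT_THRESHOLD, y_pred.dtype)
    squared_error = tf.square(y_true - y_pred)
    return tf.reduce_sum(is_rare * squared_error) / (tf.reduce_sum(is_rare) + 1e-6)

def combined_loss(y_true, y_pred):
    return general_term(y_true, y_pred) + RARE_EVENT_WEIGHT * rare_event_term(y_true, y_pred)

inputs = tf.keras.Input(shape=(3,))
outputs = tf.keras.layers.Dense(
    1, kernel_regularizer=tf.keras.regularizers.l2(0.01)
)(inputs)
model = tf.keras.Model(inputs, outputs)

# The loss itself is repeated as a metric, followed by one metric per term.
model.compile(
    optimizer="adam",
    loss=combined_loss,
    metrics=[combined_loss, general_term, rare_event_term],
)

rng = np.random.default_rng(0)
x = rng.random((32, 3)).astype("float32")
y = rng.random((32, 1)).astype("float32")

# A single batch keeps the averaging of loss and metrics identical.
total, data_loss, general, rare = model.evaluate(x, y, batch_size=32, verbose=0)
regularization_penalty = total - data_loss
```

Here `total` is the combined value reported as the loss, `data_loss` is the value of the loss function alone, and their difference is the L2 penalty. Likewise, `general` and `RARE_EVENT_WEIGHT * rare` show how the two terms share the data loss.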

In summary, we highly recommend making ample use of the ability to track any number of metrics simultaneously to

learn as much as possible about the performance of the network from many different viewpoints, i.e., tracking a broad
