Computers and Electronics in Agriculture 222 (2024) 108988
Available online 14 May 2024
0168-1699/© 2024 Elsevier B.V. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
Contents lists available at ScienceDirect
Computers and Electronics in Agriculture
journal homepage: www.elsevier.com/locate/compag
Original papers
AgroCounters—A repository for counting objects in images in the
agricultural domain by using deep-learning algorithms: Framework and
evaluation
Guy Farjon∗, Yael Edan
Department of Industrial Engineering and Management, Ben-Gurion University of the Negev, Beer Sheva, Israel
ARTICLE INFO
Keywords:
Deep learning
Counting framework
Computer vision
Visual counting
Agriculture
Precision agriculture
Guidelines
ABSTRACT
AgroCounters is an open-source repository for counting objects in images in the agricultural domain by
utilizing deep-learning algorithms. In this paper, we present the framework of AgroCounters, which integrates
state-of-the-art deep learning models, including regression-based counting, detection-based counting, and
density-estimation-based counting, to accurately count various agricultural objects, such as fruits, vegetables,
and livestock, in single images. The framework utilizes transfer learning techniques to optimize model
performance on the limited labeled data available in the agricultural domain. We provide an open-source
implementation of AgroCounters, which includes a multitude of algorithms for counting applications and a
toolbox that includes metrics, training data tools, visualizations, and a simple installation guide for several
open-source implementations of counting methods. We evaluated the performance of AgroCounters on multiple
agricultural datasets acquired from RGB sensors, including plant leaves, melons, wheat grains, cherry tomatoes,
grapes, apple flowers, bananas (fruit and leaves), pears, and chickens. We compared the results of the various
implemented methods over these datasets and showcased the most suitable solution for each. YOLOv5, the
most recent of the compared object detectors, provided the best results on all the examined datasets, and
there was no clear ’winner’ between Faster-RCNN and RetinaNet. Based on the analyzed datasets, when higher
accuracy is required, the direct regression network (DRN) should be used; for small datasets, multiple scale
regression (MSR) gives superior results. Based on the developments, we proposed guidelines for developing
deep-learning-based counting solutions for agricultural applications, focusing on solutions and best practices
for the agricultural domain. Overall, AgroCounters presents a promising solution for automated counting in
the agricultural domain, offering significant potential for reducing manual labor, improving crop management,
and increasing productivity.
1. Introduction
Object counting is essential in a variety of agricultural tasks for
monitoring plants and livestock. Traditionally, counting was performed
manually, with all its inherent drawbacks: manual counting is
costly, time-consuming, laborious, subjective, and error-prone. Recent
developments in computer vision and machine learning tools have
led to the development of automatic counting algorithms that have
been incorporated into many plant and livestock monitoring and cul-
tivation applications. Plant applications include chemical thinning of
apple trees by monitoring the number of flowers during the blooming
season (Farjon et al., 2020), monitoring plant health by estimating the
number of leaves (Setyawan et al., 2020), estimating yield potential
by counting the number of fruits (Bargoti and Underwood, 2017),
detecting diseases and pests (Durmuş et al., 2017; Liu and Wang, 2021),
phenotyping plant stress (Singh et al., 2018), and more. Livestock applications
include efficient livestock monitoring (Xu et al., 2020), disease
detection (Lee et al., 2017), individual animal identification (Bezen
et al., 2020), monitoring animal behavior (Chen et al., 2020), and more.
∗ Corresponding author. E-mail addresses: guyfar@post.bgu.ac.il (G. Farjon), yael@bgu.ac.il (Y. Edan).
Automatic counting can be achieved by using classic computer vision
tools in a generic feature-extraction pipeline, followed by a machine
learning algorithm (such as support vector machines (SVM), random
forests, or Gaussian mixture models (GMM)) to perform the counting
task. Pipelines such as these have been demonstrated in many academic
papers (Gutiérrez et al., 2019; Albuquerque et al., 2019; Alharbi et al.,
2018; Bao et al., 2023; Kim et al., 2018; Syazwani et al., 2022; Zhang
et al., 2020). The main limitation of these approaches is that they
cannot be generalized, mainly due to significant variability issues in
the agricultural domain (Farjon et al., 2023). During the past decade,
https://doi.org/10.1016/j.compag.2024.108988
Received 7 July 2023; Received in revised form 20 April 2024; Accepted 25 April 2024
deep learning has become dominant in many, if not all, computer
vision applications (Krizhevsky et al., 2017; Kingma and Ba, 2014;
He et al., 2016; Redmon et al., 2016; Ren et al., 2015), and in agri-
culture, deep learning has also advanced significantly (Kamilaris and
Prenafeta-Boldú, 2018).
The paper is organized as follows: in Section 2, we review the
different counting approaches, and in Section 3, we present the implemented
counting methods, datasets, and evaluation metrics. This is followed in Section 4
by a detailed description of the framework structure and the provided
tools. Section 5 is devoted to experimental results and discussion,
and general guidelines are presented in Section 6. Finally, concluding
remarks are presented in Section 7.
2. Related work
Counting objects in agriculture using deep learning-based computer
vision can be divided into three main approaches: direct regression,
detection-based counting, and density estimation, as follows.
1. Direct regression—Direct regression counters receive an input
image and directly output the number of objects in it. The
number of objects is the only guidance that the network re-
ceives. Direct regressors based on convolutional neural networks
(CNNs) are a natural choice when only image-level counts are
available. Since most CNN architectures (e.g., ResNet-50, VGG-
16, EfficientNet, etc.) are designed for classification tasks, the
top classification layer is replaced by a regression head, which
can contain fully connected layers, batch normalization, and
other components. This is a straightforward approach, which is
relatively easy to annotate and is fast for training and inference.
For example, in Bhattarai and Karkee (2022) the authors used
CountNet, which learned to count apple flowers and fruits based
only on image-level count annotations. In Dobrescu et al. (2017),
the authors used this approach to count leaves in potted plants.
The advantages of this method are thus its simplicity and the
requirement for less annotation compared to other methods.
However, a major disadvantage of direct regression is its inabil-
ity to explicitly identify objects in the image. It is best suited to
cases with a few objects, as having too many objects can confuse
the network and result in poor performance.
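The head replacement described above can be sketched in PyTorch. The snippet below is an illustrative toy model, not the network used in the paper: the tiny backbone is a stand-in for the pre-trained ResNet/VGG backbones the text mentions, and the layer sizes are arbitrary.

```python
import torch
import torch.nn as nn

class DirectRegressionCounter(nn.Module):
    """Minimal direct-regression counter: a small CNN backbone whose
    classification head is replaced by a regression head that outputs
    a single non-negative count estimate per image."""

    def __init__(self):
        super().__init__()
        # Stand-in backbone; in practice a pre-trained ResNet/VGG would be used.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global average pooling -> (B, 32, 1, 1)
        )
        # Regression head replacing the top classification layer.
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32, 64), nn.ReLU(),
            nn.Linear(64, 1),
            nn.ReLU(),  # counts are non-negative
        )

    def forward(self, x):
        return self.head(self.backbone(x)).squeeze(-1)  # (B,) count estimates

model = DirectRegressionCounter()
counts = model(torch.randn(4, 3, 128, 128))  # batch of 4 RGB images
print(counts.shape)  # torch.Size([4])
```

Training then reduces to an ordinary regression loss (e.g., MSE) between the predicted and annotated image-level counts.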
2. Detection-based counting—Counting using object detection is
performed by locating the objects in the image and counting
the number of occurrences. In object detection, the object is
detected using an axis-aligned bounding box. This is a natural
approach, since the final count estimate is the summation of
the found objects. Deep CNNs also excel in detecting objects,
namely, locating and classifying multiple occurrences of objects
in an image. Hence, by using state-of-the-art object detectors,
we can provide a count estimation based on their output. The
first step to successfully train an object detector is to collect and
annotate images surrounding each object instance with a bound-
ing box. However, annotating a complete dataset could result
in a considerable burden, since each object must be carefully
annotated, including occluded and small objects. Each object in
the image is annotated using five numbers; the first indicates the
object’s class and the remaining four are used for its location
in the image. Although the annotation procedure is much more
complex than providing a single count for each image, addi-
tional supervision yields good results. For example, the authors
in Hong et al. (2021) compared multiple detection-based meth-
ods to detect and count pests and insects caught in a pheromone
trap. In another example, Mosley et al. (2020) used YOLOv3 to
detect and count sorghum heads using aerial images captured
by an Unmanned Aerial Vehicle (UAV). This method enables
extracting the object for other tasks (if needed). For example,
the object location can be used for further processing of disease
detection, ripeness, or other tasks. One of the key benefits of
using this method is the availability of numerous off-the-shelf
implementations. However, this approach requires more anno-
tations, in terms of both examples and effort, compared to other
techniques. In addition, high accuracy is necessary for correct
counting, which is not always possible without a regression com-
ponent. In addition, the method may not be appropriate when
dealing with dense scenes, such as those in many agricultural
applications, as objects frequently occlude one another. Finally,
most detectors find objects using axis-aligned rectangles, leading
to poor localization for many target objects.
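The counting step itself (thresholding detector confidences, suppressing duplicate boxes with non-maximum suppression, and counting the survivors) can be sketched as follows. This is a generic illustration with made-up boxes and scores, not the paper's pipeline; real detectors ship their own NMS.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes [x1, y1, x2, y2]."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def count_detections(boxes, scores, score_thr=0.5, iou_thr=0.5):
    """Count objects from raw detector output: keep boxes above the
    confidence threshold, suppress duplicates with greedy NMS, and
    return the number of surviving boxes as the count estimate."""
    order = np.argsort(scores)[::-1]  # process highest-scoring boxes first
    kept = []
    for i in order:
        if scores[i] < score_thr:
            continue
        if all(iou(boxes[i], boxes[j]) < iou_thr for j in kept):
            kept.append(i)
    return len(kept)

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], float)
scores = np.array([0.9, 0.8, 0.7])
print(count_detections(boxes, scores))  # 2: the first two boxes overlap heavily
```

The score threshold directly trades precision for recall in the resulting count, which is why detector accuracy matters so much for counting.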
3. Density estimation—Counting objects using a density estimation
component is considered a regression task, in which the network
endeavors to estimate the number of objects by predicting a
heat-map representation of the image and then regressing the
number of objects from it. During training, a set $\{(I_j, P_j)\}_{j=1}^{N}$ is fed
into the network, where $I_j$ denotes an image and $P_j$ represents the
collection of manually annotated points indicating the centers of the
target objects in $I_j$. Each point $p_k \in P_j$, $k = 1, \dots, |P_j|$, contains
two parameters that denote a pixel in the image. For example,
CentroidNet (Dijkstra et al., 2019) was developed as a density
estimation method. The authors based their network on U-Net
and tested the results using potato plant images (captured by a
UAV) and microscopic images of cells. Tian et al. used a deep
learning-based density-estimation method to count pigs (Tian
et al., 2019). Density estimation-based counting involves a rela-
tively low annotation burden because the annotator is required
only to place a single dot at the object’s center. However, it
is necessary to place the center-dot annotation on the object’s
center of mass, which is not always easy to estimate, particularly
when objects are not round or when they are partially occluded.
However, in such cases, where objects go undetected because they
overlap, density estimation-based methods provide a good workaround.
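The ground-truth density maps used to train such networks are typically built by placing a normalized Gaussian at each annotated center point, so that summing the map recovers the count. A minimal NumPy sketch follows; the kernel width sigma is an arbitrary choice here, not a value from the paper.

```python
import numpy as np

def density_map(shape, points, sigma=4.0):
    """Build a ground-truth density map from center-point annotations:
    each point contributes a Gaussian kernel normalized to sum to 1,
    so the sum (integral) of the map equals the object count."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    dmap = np.zeros((h, w))
    for (px, py) in points:  # (x, y) pixel coordinates of object centers
        g = np.exp(-((xs - px) ** 2 + (ys - py) ** 2) / (2 * sigma ** 2))
        dmap += g / g.sum()  # normalize so each object adds exactly 1
    return dmap

points = [(20, 30), (60, 40), (90, 90)]
dmap = density_map((128, 128), points)
print(round(dmap.sum()))  # 3 -- the count is recovered by summing the map
```

At inference time, the network predicts such a map directly, and the count estimate is simply its sum.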
Each approach thus has its merits and limitations, which differ in
several aspects:
1. Annotation effort—Annotation refers to the labeling of the data
used to train the model. This can be a time-consuming and
resource-intensive process. Some approaches require extensive
annotation, including manually labeling every object in each
image or providing detailed outlines of object boundaries. Such
approaches are often more accurate but require larger amounts
of labeled data to perform satisfactorily. Other approaches re-
quire minimal or no annotation, relying instead on unsupervised
learning or pre-training on large datasets. While these methods
require less annotation effort, they may be less accurate and
less applicable to the counting task. The trade-off between the
annotation effort and the resulting model accuracy is an impor-
tant consideration in the development of a deep learning-based
computer vision counting system.
2. Model complexity—This is evaluated in terms of the number
of hidden layers and parameters. A more complex model can
learn more intricate patterns, but may also be more prone to
overfitting and have higher computational requirements. The
complexity can range from relatively simple to extremely in-
tricate. For example, a simple direct regression model is less
complex than a detection-based counting model. The tradeoff
between model complexity and performance is an important
consideration in deep learning. Increasing the model’s complex-
ity can lead to improved performance on training data, but it
can also make the model more prone to overfitting. On the
other hand, reducing the model's complexity results in faster
models but may degrade the model's accuracy.
3. Dataset size—The sample size is another important aspect to
consider in deep learning counting in images. A large and di-
verse dataset is necessary to train a deep-learning model ef-
fectively. This is because deep learning models typically have
many parameters and require a significant amount of data to
learn and generalize well. However, generating this data can
be challenging, as manually annotating large datasets can be
time consuming and costly. Furthermore, in agriculture it is dif-
ficult and sometimes infeasible to acquire large datasets. Some
researchers have explored transfer learning to overcome this
issue, in which pre-trained models are fine-tuned for a specific
task with a smaller dataset. This approach can significantly
reduce the required annotated data and improve the model’s
performance.
4. Additional processing of the target object—Detection and
segmentation-based counters can be useful for counting the num-
ber of objects in an image. Such counters also offer the ability
to further process the image data beyond simple counting. For
example, object detection can be used to identify the boundaries
of each object, making it possible to isolate and analyze specific
regions of interest within the image. Segmentation takes this
a step further by separating the objects from the background,
which can provide additional information on the spatial distribu-
tion of the objects within the image. By leveraging the additional
information provided by object detection and segmentation, it
may be possible to gain deeper insights into the characteristics of
the objects being counted, such as their size, shape, and position
relative to other objects in the scene. Density estimation is a
technique that estimates the density of objects in an image by
creating an object density map. It is well suited for counting
large quantities of objects, such as crowds in stadiums and cars
in large parking lots, and even for counting sheep (Xu et al.,
2022). However, it may not be the best method for counting
small quantities of objects, such as leaves. Nonetheless, certain
ideas from the density estimation approach can be implemented
to further process the target objects. Specifically, the concept
of a ‘‘heat map’’ is utilized in the proposed framework. This
demonstrates the utility of incorporating aspects of various
techniques to create a comprehensive and effective solution for
a specific task.
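The transfer-learning strategy mentioned under dataset size (point 3 above) is usually implemented by freezing a pre-trained backbone and training only a new task head on the small agricultural dataset. A minimal PyTorch sketch, with a toy backbone standing in for a real pre-trained network:

```python
import torch
import torch.nn as nn

# Stand-in "pre-trained" backbone; in practice, e.g., an ImageNet ResNet.
backbone = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
# Freeze backbone weights so the small dataset only trains the new head.
for p in backbone.parameters():
    p.requires_grad = False

head = nn.Linear(8, 1)  # new counting head, trained from scratch
model = nn.Sequential(backbone, head)

# Only trainable parameters are handed to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-3)
print(len(trainable))  # 2: the head's weight and bias
```

A common refinement is to later unfreeze the top backbone layers and fine-tune them at a lower learning rate once the head has converged.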
During the past decade, automatically counting objects in agricul-
ture has received increasing attention. For example, in Farjon et al.
(2023), the authors showed that a Scopus search using the words
‘‘counting’’ and ‘‘agriculture’’ yielded 112 papers for 2021 alone. A review
of object detection methods for counting was presented in Huang et al.
(2023). Other recent reviews of deep learning methods for counting
include, for example, oil palm tree counting (Kipli et al., 2023), counting
in UAV images (Lu et al., 2023), and counting in banana orchards (Wu et al.,
2023).
Although there are various methods for counting objects in images,
as noted above, researchers usually do not thoroughly compare alterna-
tives; instead, they tend to use a single counting method for a specific
application and to conduct the evaluation on a single database. More-
over, many researchers use their own developed codes, mainly based
on different public implementations of state-of-the-art algorithms. Each
researcher thus develops his/her own code, many parts of which have
probably already been implemented in previous research.
Several problems arise from this methodological approach: (1) com-
paring different methods and improving upon existing methods is
difficult (if not impossible); (2) collaborating to solve a major issue is
out of reach; (3) developing the code takes longer; and (4) the same
methods may produce different results (when developed more than
once by different researchers).
Without the comparative evaluation of multiple algorithms and a
thorough analysis of the results for many different datasets, it is not
certain that a model will generalize well to new data beyond the
training set. Thus, this paper aims to provide a framework for de-
veloping counting algorithms in agriculture. We present a repository
which we designate AgroCounters, with one (or more) algorithms from
each of the above approaches. For direct regression (see 1 above), we
implemented the straightforward method presented by Dobrescu et al.
(2017) and the multiple scale regression method from Farjon et al.
(2021). For detection-based counting (see 2 above), the Faster-RCNN
+ feature pyramid network (FPN), which is similar to Mask-RCNN
without the segmentation branch (He et al., 2017), RetinaNet (Lin
et al., 2017b) and YOLOv5 (Jocher et al., 2020) were implemented.
For density estimation-based counting (see 3 above), the detection and
regression network from Farjon et al. (2021) was implemented. All code
is open access with a detailed manual. Additionally, we demonstrated
the implementation of the repository for various agriculture test cases,
along with links to databases that can be used for future benchmark-
ing. Here, we present, in detail, the results of running the different
algorithms on the different datasets, acquired from RGB sensors.
3. Method
3.1. AgroCounters - an overview
The repository is based on several counting methods (detailed in
Section 3.2). The code implementation (detailed in Section 4) is demonstrated
with 10 different datasets (presented in Section 3.4), representing
a wide range of imaging systems (RGB and RGB-D cameras with
different resolutions), acquisition systems (manual, robotic, UAV),
and conditions (indoors, in the field, controlled and non-controlled
illumination conditions). The datasets include objects of different shapes
(long, e.g., melons, cucumbers, bananas; round, e.g., apples, cherry
tomatoes; irregular, e.g., chickens), sizes (melons vs. wheat spikes
vs. cherry tomatoes), and different numbers and types of images (different
seasons, different growing conditions). Commonly used metrics
are detailed in Section 3.5.
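Counting work conventionally reports the mean absolute error (MAE) and root mean squared error (RMSE) between predicted and true counts. The sketch below illustrates these; the "Agreement@1" measure is an illustrative addition, and the repository's exact metric set may differ.

```python
import numpy as np

def counting_metrics(y_true, y_pred):
    """Standard counting metrics: mean absolute error (MAE),
    root mean squared error (RMSE), and the fraction of images
    whose count is predicted within one object of the truth."""
    y_true = np.asarray(y_true, float)
    y_pred = np.asarray(y_pred, float)
    err = y_pred - y_true
    return {
        "MAE": np.abs(err).mean(),
        "RMSE": np.sqrt((err ** 2).mean()),
        "Agreement@1": (np.abs(err) <= 1).mean(),  # within +/- 1 object
    }

m = counting_metrics([10, 12, 7, 5], [11, 12, 5, 5])
print(m["MAE"])  # 0.75
```

MAE is easy to interpret (average miscount per image), while RMSE penalizes large outliers more heavily, which matters for yield estimation.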
3.2. Counting methods
The following counting methods were implemented in the repos-
itory: direct counting, object detection-based counting, and a variant
of the density estimation method, termed detection and regression
network (DRN).
3.2.1. Direct regression
Two direct regression approaches were implemented. In the first,
we used different architectures with a regression head instead of the
top classification layer, as demonstrated by Dobrescu et al. (2017).
The architecture is shown in Fig. 1. The second approach, termed
multiple scale regression (MSR) (Farjon et al., 2021), aims to improve
the standard direct approach by taking into consideration counting
estimations in multiple scales. The MSR architecture is based on Reti-
naNet (Lin et al., 2017b) and is designed to find objects in different
scales. The MSR produces five counting estimations, one for each
level in the feature pyramid network (Lin et al., 2017a), followed
by a fusion component to yield a single estimation. The complete
architecture is shown in Fig. 2. In this work, we implemented two
variants of the MSR architecture. The first variant, referred to in this
paper as MSR$_1$, predicts the final count estimation using only the $P_3$
pyramid layer; its loss function was the $\ell_2$ loss. The second variant,
referred to as MSR$_2$, uses all five pyramid-layer representations and
fuses the five estimates by maximum-likelihood estimation (MLE); its
loss function was the uncertainty loss taken from Kendall and Gal (2017).
The annotations for these methods were stored in a .csv file, where each
row holds the image name and the number of objects in the image.
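The MLE Gaussian fusion used by the second MSR variant can be illustrated with inverse-variance weighting, the maximum-likelihood combination of independent Gaussian estimates. This is a standard form for illustration; the exact fusion in Farjon et al. (2021) may differ in detail.

```python
import numpy as np

def mle_gaussian_fusion(means, variances):
    """Fuse independent Gaussian count estimates (one per pyramid level)
    by maximum likelihood, i.e., inverse-variance weighting. Levels that
    are more certain (lower variance) dominate the fused estimate."""
    means = np.asarray(means, float)
    variances = np.asarray(variances, float)
    w = 1.0 / variances
    fused_mean = (w * means).sum() / w.sum()
    fused_var = 1.0 / w.sum()  # fused estimate is more certain than any input
    return fused_mean, fused_var

# Five per-level count estimates with predicted uncertainties;
# the high-variance fourth level barely affects the result.
mu, var = mle_gaussian_fusion([10.0, 11.0, 9.5, 14.0, 10.5],
                              [0.5, 0.8, 0.6, 5.0, 0.7])
print(round(mu, 2))
```

This is why per-level uncertainty prediction (the Kendall and Gal loss) is useful: it supplies the variances that the fusion needs.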
Fig. 1. Direct regression. This architecture is the simplest counting implementation. It is based on a CNN backbone with a regression head instead of the original classification
head.
Fig. 2. Multiple scale regression (MSR) model. This architecture is based on the RetinaNet implementation. In this work, we use two variants: MSR$_1$ and MSR$_2$. The first uses
only the $P_3$ pyramid layer to predict the final count estimation; it has fewer parameters than MSR$_2$ and therefore runs faster. MSR$_2$ uses each of the five pyramid layers and
attaches a counting sub-model to each level, where each sub-model outputs a counting estimation. Since there are five counting estimates, we used the MLE Gaussian fusion
method from Farjon et al. (2021) to output the final count estimation.
3.2.2. Detection based counting
As mentioned above, we used the current state-of-the-art object
detectors, Faster-RCNN (Ren et al., 2015) with FPN (Lin et al., 2017a),
RetinaNet (Lin et al., 2017b), and YOLOv5 (Jocher et al., 2020). These
methods were developed to produce the best results on benchmark
datasets (e.g., MS-COCO) and have an open-source code repository for
researchers and practitioners. Since these methods constitute different
approaches to solving the same task, we set out to explore them to
enable us to choose the one giving the best results. This work explores
the above mentioned methods and presents their results for datasets
collected from the agricultural domain.
The objective of the object detector is to locate all the objects and
classify them into the correct class. Usually, in agriculture problems,
there is only a single target class for a single application. The objects in
agriculture are sometimes highly occluded, with significantly different
lighting conditions and a minimal object-to-image area ratio. Thus,
even when using state-of-the-art detectors in agriculture, success is not
trivial. However, object detectors remain the most common counting
component for object counting in agriculture. A description of the
object detector used in the current study follows:
1. Faster-RCNN (Fig. 3)—This architecture is a two-stage network.
The first stage of the detector is to find regions of interest (ROIs),
namely, regions that are likely to contain objects. Then, these
ROIs are further processed to answer two questions: Do they
contain an object, and to which class does it belong (𝐶 classes
classification)? A region proposal network (RPN) performs the
first stage (ROI extraction). Since objects vary in size and scale,
different shapes of ROIs are extracted using 𝑘 anchor boxes,
which are represented by differently shaped rectangles. The
RPN minimizes two loss functions, namely, binary classification
loss (object vs. not object) and a regression loss for correcting
the anchor boxes. The extracted ROIs then move to the final
processing layers, which refine the anchor boxes further and
classify each ROI into one of 𝐶 classes. The implementation is taken
from Wu et al. (2019).
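The 𝑘 anchor boxes mentioned above can be illustrated by generating, for a single feature-map location, boxes at several scales and aspect ratios. The specific scales and ratios below are illustrative defaults, not the paper's configuration.

```python
import numpy as np

def make_anchors(center, scales, ratios):
    """Generate k = len(scales) * len(ratios) axis-aligned anchor boxes
    around one feature-map location, as [x1, y1, x2, y2]. Ratios give
    differently shaped rectangles; scales give different sizes. Width
    and height are chosen so the anchor area stays ~ scale^2."""
    cx, cy = center
    anchors = []
    for s in scales:
        for r in ratios:
            w = s * np.sqrt(r)  # wider for large aspect ratios
            h = s / np.sqrt(r)  # taller for small aspect ratios
            anchors.append([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])
    return np.array(anchors)

anchors = make_anchors((64, 64), scales=[32, 64, 128], ratios=[0.5, 1.0, 2.0])
print(anchors.shape)  # (9, 4): the classic k = 9 anchors per location
```

The RPN's regression branch then predicts offsets that deform each anchor toward the nearest ground-truth box.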
2. RetinaNet—This architecture is a single-stage detector, which
also introduced the focal loss function for dense object detec-
tion (Lin et al., 2017b). The implementation is taken from Wu
et al. (2019). The architecture is shown in Fig. 5. RetinaNet
consists of three main components:
Fig. 3. Faster-RCNN model. This algorithm is a two-step object detector. The first step
is to generate ROIs that are likely to contain objects. The second step is to classify the
ROI into the target class.
• Base network - This is a pre-trained network used as a
feature extractor. Images are fed into the base network and
then moved to the FPN (Lin et al., 2017a).
• FPN—The FPN is an architecture design for creating a
feature pyramid of the semantic features from the base
network. Using it, we obtain semantic features from high
layers of the network but in different resolutions. In this
way, the model can find objects of different scales. On top
of each of the five scales of the FPN, there are classification
and regression sub-models.
• Classification and regression sub-models—These sub-models
share weights across the five scales of the FPN to reduce
the computational burden. The classification sub-model is
designed to classify object proposals from the FPN into 𝐶
classes, and the regression sub-model predicts offset to the
bounding box dimensions.
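The focal loss introduced with RetinaNet down-weights easy examples so that the many easy background anchors do not dominate training. A NumPy sketch of the binary case follows, with the alpha and gamma defaults from Lin et al. (2017b):

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t).
    The (1 - p_t)^gamma factor shrinks the loss of well-classified
    examples, focusing training on hard anchors."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    p_t = np.where(y == 1, p, 1 - p)          # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1 - alpha)
    return -(alpha_t * (1 - p_t) ** gamma * np.log(p_t))

p = np.array([0.9, 0.1, 0.6])  # predicted foreground probabilities
y = np.array([1, 0, 1])        # ground-truth labels
# The easy positive (p = 0.9) is down-weighted by (1 - 0.9)^2 = 0.01
# relative to plain cross-entropy; with gamma = 0 and alpha = 0.5 the
# expression reduces to half the binary cross-entropy.
print(focal_loss(p, y).round(4))
```

In RetinaNet this loss replaces cross-entropy in the classification sub-model; the regression sub-model keeps a standard smooth-L1 box loss.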
3. YOLOv5—The main idea in You Only Look Once (YOLO) (Red-
mon et al., 2016) is that it runs the image on a CNN model