MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
Andrew G. Howard Menglong Zhu Bo Chen Dmitry Kalenichenko
Weijun Wang Tobias Weyand Marco Andreetto Hartwig Adam
Google Inc.
{howarda,menglong,bochen,dkalenichenko,weijunw,weyand,anm,hadam}@google.com
Abstract
We present a class of efficient models called MobileNets
for mobile and embedded vision applications. MobileNets
are based on a streamlined architecture that uses depthwise
separable convolutions to build lightweight deep neural
networks. We introduce two simple global hyper-parameters
that efficiently trade off between latency and accuracy.
These hyper-parameters allow the model builder to choose
the right-sized model for their application based on the
constraints of the problem. We present extensive experiments
on resource and accuracy tradeoffs and show strong
performance compared to other popular models on ImageNet
classification. We then demonstrate the effectiveness of
MobileNets across a wide range of applications and use
cases including object detection, fine-grained classification,
face attributes and large-scale geo-localization.
1. Introduction
Convolutional neural networks have become ubiquitous
in computer vision ever since AlexNet [19] popularized
deep convolutional neural networks by winning the ImageNet
Challenge: ILSVRC 2012 [24]. The general trend
has been to make deeper and more complicated networks
in order to achieve higher accuracy [27, 31, 29, 8]. However,
these accuracy gains do not necessarily make networks
more efficient with respect to size and speed. In many
real-world applications such as robotics, self-driving cars
and augmented reality, recognition tasks need to be carried
out in a timely fashion on a computationally limited platform.
This paper describes an efficient network architecture
and a set of two hyper-parameters that can be used to build
very small, low-latency models which can be easily matched
to the design requirements of mobile and embedded vision
applications. Section 2 reviews prior work on building small
models. Section 3 describes the MobileNet architecture and
the two hyper-parameters, the width multiplier and the
resolution multiplier, which define smaller and more efficient
MobileNets. Section 4 describes experiments on ImageNet as
well as a variety of different applications and use cases.
Section 5 closes with a summary and conclusion.
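As a rough illustration of how the two hyper-parameters act
(this is not the paper's reference code; scaled_channels and
scaled_resolution are hypothetical helper names), the width
multiplier thins every layer's channel count uniformly and the
resolution multiplier shrinks the input image:

    # Minimal Python sketch of the two global hyper-parameters.
    # Helper names are illustrative, not from the paper.

    def scaled_channels(base_channels, alpha):
        """Width multiplier alpha in (0, 1] thins each layer uniformly."""
        return max(1, int(base_channels * alpha))

    def scaled_resolution(base_resolution, rho):
        """Resolution multiplier rho in (0, 1] shrinks the input image."""
        return int(base_resolution * rho)

    # Example: a "0.5 MobileNet-160"-style configuration.
    alpha, rho = 0.5, 160 / 224
    print(scaled_channels(32, alpha))     # first layer: 32 -> 16 channels
    print(scaled_resolution(224, rho))    # input: 224x224 -> 160x160

Because a convolutional layer's cost grows with the product of
its input and output channel counts and with the spatial
resolution, compute scales roughly quadratically in both
multipliers.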
2. Prior Work
There has been rising interest in building small and efficient
neural networks in the recent literature, e.g. [16, 34,
12, 36, 22]. These approaches can generally be categorized
as either compressing pretrained networks or training small
networks directly. This paper proposes a class of network
architectures that allows a model developer to choose a
small network that specifically matches the resource
restrictions (latency, size) of their application. MobileNets
primarily focus on optimizing for latency but also yield
small networks. Many papers on small networks focus only on
size and do not consider speed.
MobileNets are built primarily from depthwise separable
convolutions, initially introduced in [26] and subsequently
used in Inception models [13] to reduce the computation in
the first few layers. Flattened networks [16] build a network
out of fully factorized convolutions and showed the potential
of extremely factorized networks. Independently of this
paper, Factorized Networks [34] introduces a similar
factorized convolution as well as the use of topological
connections. Subsequently, the Xception network [3]
demonstrated how to scale up depthwise separable filters to
outperform Inception V3 networks. Another small network is
SqueezeNet [12], which uses a bottleneck approach to design
a very small network. Other reduced-computation networks
include structured transform networks [28] and deep fried
convnets [37].
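To make the factorization concrete: a depthwise separable
convolution replaces a standard convolution with a per-channel
spatial (depthwise) filter followed by a 1x1 (pointwise)
convolution that mixes channels. The following is a minimal
PyTorch sketch (the paper does not prescribe a framework, and
its actual block also inserts batchnorm and ReLU after each of
the two convolutions):

    import torch
    import torch.nn as nn

    class DepthwiseSeparableConv(nn.Module):
        """Per-channel 3x3 convolution followed by a 1x1 pointwise
        convolution; a generic sketch of the factorization."""

        def __init__(self, in_channels, out_channels, stride=1):
            super().__init__()
            # groups=in_channels applies one 3x3 filter per input channel.
            self.depthwise = nn.Conv2d(in_channels, in_channels,
                                       kernel_size=3, stride=stride,
                                       padding=1, groups=in_channels,
                                       bias=False)
            # The 1x1 convolution linearly combines the channels.
            self.pointwise = nn.Conv2d(in_channels, out_channels,
                                       kernel_size=1, bias=False)

        def forward(self, x):
            return self.pointwise(self.depthwise(x))

    x = torch.randn(1, 32, 112, 112)       # one 32-channel feature map
    y = DepthwiseSeparableConv(32, 64)(x)  # -> shape (1, 64, 112, 112)

For a 3x3 kernel this factorization uses roughly 8 to 9 times
fewer multiply-adds than a standard convolution with the same
input and output shapes.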
A different approach for obtaining small networks is
shrinking, factorizing or compressing pretrained networks.
Compression based on product quantization [36], hashing