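Undergraduate Graduation Project (Thesis): Translation of Foreign-Language Material

(Class of 2019)

Thesis title: Design and Implementation of an Android-Based Traditional Chinese Medicine Tongue Diagnosis Health Intervention System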

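Guidelines for the Translation of Foreign-Language Material

I. Requirements for the translated text

1. The translation must be no shorter than 3,000 Chinese characters.

2. The translation follows the formatting rules for the thesis body (headings, typeface, font size, figures and tables, source information, etc.).

3. Information about the foreign-language source is listed at the end of the text, in the position of the reference list of the thesis body, under the heading "Foreign-Language Source Information", and includes:

1) the author(s) of the source;

2) the title of the book or paper;

3) the origin of the source:

□ publisher or journal name, publication date or issue number, and the page numbers of the translated part

□ web address

II. Foreign-language source material (electronic text or digitized images):

1. The source must be no shorter than 10,000 printed characters (excluding figures and tables).

2. If the source is on paper, digitize it (as images) and paste it into the source section after the translation; when binding, attach a photocopy of the paper original after the translation.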

Advisor's comments:

Advisor's signature:        Date (year/month/day):

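I. Translation of the foreign-language material:

Deep Neural Networks for Object Detection

Abstract

Deep Neural Networks (DNNs) have recently shown outstanding performance on image classification tasks. In this paper we go one step further and address the problem of object detection using DNNs, which requires not only classifying objects but also precisely localizing objects of various classes. We present a simple yet powerful formulation of object detection as a regression problem to object bounding box masks. We define a multi-scale inference procedure that produces high-resolution object detections at low cost with only a few applications of the network. State-of-the-art performance of the approach is shown on Pascal VOC.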

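1 - Introduction

As we move towards more complete image understanding, more precise and detailed object recognition becomes increasingly important. In this context, one cares not only about classifying images but also about precisely estimating the class and location of the objects contained in them, a problem known as object detection.

The main advances in object detection were achieved thanks to improvements in object representations and machine learning models. A prominent example of a state-of-the-art detection system is the Deformable Part-based Model (DPM). It builds on carefully designed representations and kinematically inspired part decompositions of objects, expressed as a graphical model. Discriminative learning of graphical models makes it possible to build high-precision part-based models for a variety of object classes.

Manually engineered representations combined with shallow, discriminatively trained models have also been among the best-performing paradigms for the related problem of object classification. In the last few years, however, Deep Neural Networks (DNNs) have emerged as a powerful machine learning model.

DNNs differ from traditional classification approaches in major ways. First, they are deep architectures, which can learn more complex models than shallow ones. This expressivity, together with robust training algorithms, allows them to learn powerful object representations without hand-designed features. This has been demonstrated empirically on the challenging ImageNet classification task across thousands of classes.

In this paper, we exploit the power of DNNs for the problem of object detection, where we not only classify but also try to precisely localize objects. The problem we address here is challenging, since we want to detect a potentially large number of object instances of varying sizes in the same image using a limited amount of computing resources.

We present a formulation that can predict the bounding boxes of multiple objects in a given image. More precisely, we formulate a DNN-based regression that outputs a binary mask of the object bounding box (and of portions of the box as well), as shown in Fig. 1. Additionally, we employ a simple bounding box inference to extract detections from the masks. To increase localization precision, we apply the DNN mask generation in a multi-scale fashion on the full image as well as on a small number of large image crops, followed by a refinement step (see Fig. 2). In this way, we achieve state-of-the-art bounding box localization with only a few dozen DNN regressions.

Figure 1: A schematic view of object detection as DNN-based regression.

Figure 2: After regressing to object masks across several scales and large image boxes, we perform object box extraction. The obtained boxes are refined by repeating the same procedure on the sub-images cropped via the current object boxes. For brevity, we display only the full object mask; however, we use all five object masks.

In this paper, we demonstrate that DNN-based regression is capable of learning features that are not only good for classification but also capture strong geometric information. We use the general architecture introduced for classification and replace its last layer with a regression layer. The somewhat surprising but powerful insight is that networks which to some extent encode translation invariance can capture object locations as well.

Second, we introduce a multi-scale box inference followed by a refinement step to produce precise detections. In this way, a DNN that predicts a low-resolution mask, limited by the output layer size, can be brought to pixel-wise precision at low cost: the network is applied only a few dozen times per input image.

In addition, the presented method is quite simple. There is no need to hand-design a model that captures parts and their relations explicitly. This simplicity has the advantage of easy applicability to a wide range of classes, and it also yields better detection performance across a wider range of objects, rigid ones as well as deformable ones. This is presented together with state-of-the-art detection results on the Pascal VOC challenge in Sec. 7.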

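2 - Related Work

One of the most heavily studied paradigms for object detection is the deformable part-based model, which is also its most prominent example. This method combines a set of discriminatively trained parts in a star model called a pictorial structure. It can be considered a 2-layer model, with the parts as the first layer and the star model as the second. Contrary to DNNs, whose layers are generic, this work exploits domain knowledge: the parts are based on manually designed Histogram of Gradients (HOG) descriptors, and the structure of the parts is kinematically motivated.

Deep architectures for object detection and parsing have been motivated by part-based models and are traditionally called compositional models, where the object is expressed as a layered composition of image primitives. A notable example is the And/Or graph, where an object is modeled by a tree with And-nodes representing different parts and Or-nodes representing different modes of the same part. Similarly to DNNs, the And/Or graph consists of multiple layers, where lower layers represent small generic image primitives, while higher layers represent object parts. Such compositional models are easier to interpret than DNNs. On the other hand, they require inference, while the DNN models considered in this paper are purely feed-forward, with no latent variables to be inferred.

Further examples of compositional models for detection are based on segments as primitives, focus on shape, or use Gabor filters or larger HOG filters. These approaches are traditionally challenged by the difficulty of training and rely on specially designed learning procedures. Moreover, at inference time they combine bottom-up and top-down processes.

Neural networks (NNs) can be considered compositional models whose nodes are more generic and less interpretable than those of the models above. Applications of NNs to vision problems are decades old, with Convolutional NNs being the most prominent example. Only recently did these models prove highly successful on large-scale image classification tasks, in the form of DNNs. Their application to detection, however, has been limited. Scene parsing, as a more detailed form of detection, has been attempted using multi-layer Convolutional NNs. Medical image analysis has also been addressed using DNNs. Both approaches, however, use the NNs as local classifiers, either at each pixel or over superpixels. Our approach instead uses the full image as input and performs localization through regression. As such, it is a more efficient application of NNs.

The approach closest to ours has a similar high-level objective but uses a much smaller network with different features and a different loss function, and has no mechanism for distinguishing between multiple instances of the same class.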

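3 - DNN-based Detection

The core of our approach is a DNN-based regression towards an object mask, as shown in Fig. 1. Based on this regression model, we can generate masks for the full object as well as for portions of the object. A single DNN regression can give us the masks of multiple objects in an image. To further increase the precision of the localization, we apply the DNN localizer to a small set of large sub-windows. The full flow is presented in Fig. 2 and explained below.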

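4 - Detection as DNN Regression

Our network is based on the convolutional DNN defined by [14]. It consists of 7 layers in total: the first 5 are convolutional and the last 2 are fully connected. Each layer uses a rectified linear unit as its nonlinear transformation. Three of the convolutional layers additionally have max pooling. For further details, we refer the reader to [14].

We adapt the above generic architecture for localization. Instead of using a softmax classifier as the last layer, we use a regression layer that generates an object binary mask $DNN(x;\Theta) \in \mathbb{R}^N$, where $\Theta$ are the parameters of the network and $N$ is the total number of pixels. Since the output of the network has a fixed dimension, we predict a mask of a fixed size $N = d \times d$. After being resized to the image size, the resulting binary mask represents one or several objects: it should have value 1 at a pixel if that pixel lies within the bounding box of an object of a given class, and 0 otherwise.

The network is trained by minimizing the $L_2$ error for predicting a ground truth mask $m \in [0,1]^N$ for an image $x$:

$$\min_{\Theta} \sum_{(x,m) \in D} \left\| \left(\mathrm{Diag}(m) + \lambda I\right)^{1/2} \left(DNN(x;\Theta) - m\right) \right\|_2^2$$

where the sum ranges over a training set $D$ of images containing bounding-boxed objects, which are represented as binary masks.

Since our base network is highly non-convex and optimality cannot be guaranteed, it is sometimes necessary to regularize the loss function by using varying weights for each output depending on the ground truth mask. The intuition is that most objects are small relative to the image size, so the network can easily be trapped by the trivial solution of assigning a zero value to every output. To avoid this undesirable behavior, it is helpful to increase the weight of the outputs corresponding to non-zero values in the ground truth mask by a parameter $\lambda \in \mathbb{R}^+$: if $\lambda$ is chosen small, the errors on outputs with ground truth value 0 are penalized less than those with ground truth value 1, which encourages the network to predict non-zero values even when the signal is weak (the object is small).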
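The effect of $\lambda$ can be checked on a toy example. The following is a minimal sketch, assuming the per-image loss expands to $\sum_k (m_k + \lambda)(DNN_k - m_k)^2$, which is what the weighting $(\mathrm{Diag}(m) + \lambda I)^{1/2}$ gives inside a squared $L_2$ norm. The function name `weighted_l2_loss` and the four-output mask are illustrative and do not come from the paper.

```python
def weighted_l2_loss(pred, mask, lam):
    """Per-image loss: sum over outputs k of (m_k + lam) * (pred_k - m_k)^2.

    This is the squared L2 norm of (Diag(m) + lam*I)^(1/2) (pred - m),
    written out element by element.
    """
    return sum((m + lam) * (p - m) ** 2 for p, m in zip(pred, mask))


# Toy ground truth with four outputs, one of which lies inside an object box.
mask = [0.0, 0.0, 0.0, 1.0]

all_zero = [0.0, 0.0, 0.0, 0.0]     # trivial solution: the object is missed
false_alarm = [1.0, 0.0, 0.0, 1.0]  # object found, plus one false positive

lam = 0.1  # hypothetical small value, chosen only for illustration
loss_miss = weighted_l2_loss(all_zero, mask, lam)
loss_false_alarm = weighted_l2_loss(false_alarm, mask, lam)
```

With a small $\lambda$, missing the single object output costs $1 + \lambda$, while one false positive costs only $\lambda$, so the all-zero prediction is no longer the cheap way out.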

In our implementation, we use networks with a receptive field of $255 \times 255$ whose outputs predict a mask of size $d \times d$ with $d = 24$.

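5 - Precise Object Localization via DNN-generated Masks

Although the approach described above can generate high-quality masks, several additional challenges remain. First, a single object mask cannot reliably separate different objects. Second, due to the limit on the output size, the masks we generate are much smaller than the original image. For example, for an image of size $400 \times 400$ and $d = 24$, each output corresponds to a cell of size $16 \times 16$ in the original image, which is not enough to localize an object precisely, especially a small one. Finally, since we use the full image as input, small objects affect only a few input neurons and thus become hard to recognize. In the following, we explain how we address these issues.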
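The resolution limit in the example above is simple arithmetic. As a small sketch (the helper name `cell_size` is hypothetical), dividing the image side by the mask side gives the side of the image region behind each output:

```python
def cell_size(image_side, d):
    """Side length, in pixels, of the image region covered by one mask output."""
    return image_side / d


# Values from the example in the text: a 400 x 400 image and a 24 x 24 mask.
side = cell_size(400, 24)  # about 16.7 pixels, i.e. roughly a 16 x 16 cell
```

An object only a few pixels wide therefore occupies a fraction of a single output cell, which is why the coarse mask alone cannot localize it precisely.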

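5.1 - Multiple Masks for Localization

To deal with multiple touching objects, we generate several masks, each representing either the full object or a part of it. Since our end goal is to produce a bounding box, we use one network to predict the object box mask and four additional networks to predict four parts of the box (bottom, top, left and right), all denoted by $m^h$, $h \in \{\text{full}, \text{bottom}, \text{top}, \text{left}, \text{right}\}$. These five predictions are over-complete, but they help reduce uncertainty and deal with mistakes in some of the masks. Further, if two objects of the same type are placed next to each other, then at least two of the five produced masks will not have the objects merged, which makes it possible to tell them apart. This enables the detection of multiple objects.

At training time, we need to convert the object box into these five masks. Since the masks can be much smaller than the original image, we need to downsize the ground truth mask to the size of the network output. Denote by $T(i,j)$ the rectangle in the image for which the presence of an object is predicted by output $(i,j)$ of the network. This rectangle has its upper-left corner at $\left(\frac{d_1}{d}(i-1), \frac{d_2}{d}(j-1)\right)$ and has size $\frac{d_1}{d} \times \frac{d_1}{d}$, where $d$ is the size of the output mask and $d_1, d_2$ are the height and width of the image. During training, we assign as the value $m(i,j)$ to be predicted the portion of $T(i,j)$ that is covered by the box $bb(h)$: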

$$m^h(i,j;bb) = \frac{\mathrm{area}\left(bb(h) \cap T(i,j)\right)}{\mathrm{area}\left(T(i,j)\right)}$$
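The target value for each output is thus the fraction of its cell $T(i,j)$ covered by the box $bb(h)$. The sketch below computes this on a $d \times d$ grid under several assumptions that are not taken from the paper: boxes are `(top, left, bottom, right)` tuples in pixels, the four part boxes are the halves of the full box, and a cell is $\frac{d_1}{d}$ high and $\frac{d_2}{d}$ wide (which matches the text for square images). All function names are illustrative.

```python
def overlap(a0, a1, b0, b1):
    """Length of the intersection of the intervals [a0, a1] and [b0, b1]."""
    return max(0.0, min(a1, b1) - max(a0, b0))


def part_box(bb, part):
    """Return the sub-box bb(h); the halves are an assumption of this sketch."""
    top, left, bottom, right = bb
    mid_y, mid_x = (top + bottom) / 2, (left + right) / 2
    boxes = {
        "full": (top, left, bottom, right),
        "top": (top, left, mid_y, right),
        "bottom": (mid_y, left, bottom, right),
        "left": (top, left, bottom, mid_x),
        "right": (top, mid_x, bottom, right),
    }
    return boxes[part]


def target_mask(bb, part, d1, d2, d):
    """d x d list of m^h(i, j; bb) = area(bb(h) ∩ T(i, j)) / area(T(i, j))."""
    top, left, bottom, right = part_box(bb, part)
    cell_h, cell_w = d1 / d, d2 / d  # cell height and width in pixels
    mask = []
    for i in range(1, d + 1):
        y0 = cell_h * (i - 1)  # upper edge of row i
        row = []
        for j in range(1, d + 1):
            x0 = cell_w * (j - 1)  # left edge of column j
            inter = (overlap(y0, y0 + cell_h, top, bottom)
                     * overlap(x0, x0 + cell_w, left, right))
            row.append(inter / (cell_h * cell_w))
        mask.append(row)
    return mask


# Toy case: a 100 x 100 image, a 4 x 4 mask, and one box.
bb = (25.0, 25.0, 75.0, 62.5)
full = target_mask(bb, "full", 100, 100, 4)
top_half = target_mask(bb, "top", 100, 100, 4)
```

In this toy case the cell in row 2, column 2 is fully covered and gets the value 1.0, while the cell in row 2, column 3 is half covered and gets 0.5, so the targets are fractional rather than strictly binary at the box boundary.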


Deep Neural Networks for Object Detection

Christian Szegedy Alexander Toshev Dumitru Erhan Google, Inc.

{szegedy, toshev, dumitru}@google.com

Abstract

Deep Neural Networks (DNNs) have recently shown outstanding performance on image classification tasks [14]. In this paper we go one step further and address the problem of object detection using DNNs, that is not only classifying but also precisely localizing objects of various classes. We present a simple and yet powerful formulation of object detection as a regression problem to object bounding box masks. We define a multi-scale inference procedure which is able to produce high-resolution object detections at a low cost by a few network applications. State-of-the-art performance of the approach is shown on Pascal VOC.

Introduction

As we move towards more complete image understanding, having more precise and detailed object recognition becomes crucial. In this context, one cares not only about classifying images, but also about precisely estimating the class and location of objects contained within the images, a problem known as object detection.

The main advances in object detection were achieved thanks to improvements in object representations and machine learning models. A prominent example of a state-of-the-art detection system is the Deformable Part-based Model (DPM) [9]. It builds on carefully designed representations and kinematically inspired part decompositions of objects, expressed as a graphical model. Using discriminative learning of graphical models allows for building high-precision part-based models for a variety of object classes.

Manually engineered representations in conjunction with shallow discriminatively trained models have been among the best performing paradigms for the related problem of object classification as well [17]. In the last years, however, Deep Neural Networks (DNNs) [12] have emerged as a powerful machine learning model.

DNNs exhibit major differences from traditional approaches for classification. First, they are deep architectures which have the capacity to learn more complex models than shallow ones [2]. This expressivity and robust training algorithms allow for learning powerful object representations without the need to hand design features. This has been empirically demonstrated on the challenging ImageNet classification task [5] across thousands of classes [14, 15].

In this paper, we exploit the power of DNNs for the problem of object detection, where we not only classify but also try to precisely localize objects. The problem we address here is challenging, since we want to detect a potentially large number of object instances with varying sizes in the same image using a limited amount of computing resources.

We present a formulation which is capable of predicting the bounding boxes of multiple objects in a given image. More precisely, we formulate a DNN-based regression which outputs a binary mask of the object bounding box (and portions of the box as well), as shown in Fig. 1. Additionally, we employ a simple bounding box inference to extract detections from the masks. To increase localization precision, we apply the DNN mask generation in a multi-scale fashion on the full image as well as on a small number of large image crops, followed by a refinement step (see Fig. 2).

In this way, through only a few dozen DNN regressions we can achieve state-of-the-art bounding box localization.

In this paper, we demonstrate that DNN-based regression is capable of learning features which are not only good for classification, but also capture strong geometric information. We use the general architecture introduced for classification by [14] and replace the last layer with a regression layer. The somewhat surprising but powerful insight is that networks which to some extent encode translation invariance can capture object locations as well.

Second, we introduce a multi-scale box inference followed by a refinement step to produce precise detections. In this way, we are able to apply a DNN which predicts a low-resolution mask, limited by the output layer size, to pixel-wise precision at a low cost: the network is applied only a few dozen times per input image.

In addition, the presented method is quite simple. There is no need to hand design a model which captures parts and their relations explicitly. This simplicity has the advantage of easy applicability to a wide range of classes, and it also shows better detection performance across a wider range of objects, rigid ones as well as deformable ones. This is presented together with state-of-the-art detection results on the Pascal VOC challenge [7] in Sec. 7.

Related Work

One of the most heavily studied paradigms for object detection is the deformable part-based model, with [9] being the most prominent example. This method combines a set of discriminatively trained parts in a star model called pictorial structure. It can be considered as a 2-layer model – parts being the first layer and the star model being the second layer. Contrary to DNNs, whose layers are generic, the work by [9] exploits domain knowledge – the parts are based on manually designed Histogram of Gradients (HOG) descriptors [4] and the structure of the parts is kinematically motivated.

Deep architectures for object detection and parsing have been motivated by part-based models and traditionally are called compositional models, where the object is expressed as layered composition of image primitives. A notable example is the And/Or graph [20], where an object is modeled by a tree with And-nodes representing different parts and Or-nodes representing different modes of the same part. Similarly to DNNs, the And/Or graph consists of multiple layers, where lower layers represent small generic image primitives, while high
