NSD-SSD:ANovelReal-TimeShipDetectorBasedon

Research ArticleNSD-SSD A Novel Real-Time Ship Detector Based onConvolutional Neural Network in Surveillance Video

Jiuwu Sun 1 Zhijing Xu 1 and Shanshan Liang 2

1College of Information Engineering Shanghai Maritime University Shanghai 201306 China2School of Physics and Electronics Shandong Normal University Jinan 250358 China

Correspondence should be addressed to Zhijing Xu zjxushmtueducn

Received 27 May 2021 Revised 13 August 2021 Accepted 18 August 2021 Published 14 September 2021

Academic Editor Francisco Gomez-Donoso

Copyright copy 2021 Jiuwu Sun et al 0is is an open access article distributed under the Creative Commons Attribution Licensewhich permits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

With the rapid development of the marine industry intelligent ship detection plays a very important role in the marine trafficsafety and the port management Current detection methods mainly focus on synthetic aperture radar (SAR) images which is ofgreat significance to the field of ship detection However these methods sometimes cannot meet the real-time requirement Tosolve the problems a novel ship detection network based on SSD (Single Shot Detector) named NSD-SSD is proposed in thispaper Nowadays the surveillance system is widely used in the indoor and outdoor environment and its combination with deeplearning greatly promotes the development of intelligent object detection and recognition 0e NSD-SSD uses visual imagescaptured by surveillance cameras to achieve real-time detection and further improves detection performance First dilatedconvolution and multiscale feature fusion are combined to improve the small objectsrsquo performance and detection accuracySecond an improved prediction module is introduced to enhance deeper feature extraction ability of the model and the meanAverage Precision (mAP) and recall are significant improved Finally the prior boxes are reconstructed by using the K-meansclustering algorithm the Intersection-over-Union (IoU) is higher and the visual effect is better0e experimental results based onship images show that the mAP and recall can reach 893 and 936 respectively which outperforms the representative model(Faster R-CNN SSD and YOLOv3) Moreover our modelrsquos FPS is 45 which can meet real-time detection acquirement wellHence the proposed method has the better overall performance and achieves higher detection efficiency and better robustness

1 Introduction

With the rapid development of the shipping industry thereare more frequent human activities on the ocean in recentyears 0erefore robust ship detection is strongly needed tomeet the demand Currently ship detection is used in porttransportation management sea area monitoring over illegalactivities and ship abnormal behavior detection for navi-gation safety Modern radar target tracking equipment andship automatic identification systems are mainly based onpositioning and thus ship detection needs substantialimprovements In response to these problems many re-searchers have used traditional machine learning methods toexplore this field in search of better results For examplethey used features of ships combined with classifiers [1 2]Although these methods achieve good results they require

manual extraction of features and a classifier with goodperformance which needs further validation in terms ofefficiency and accuracy Fortunately the development ofdeep learning has enabled object detection to be widely usedin many scenarios such as surveillance security and au-tonomous driving In 2019 Jiao et al [3] provided a com-prehensive analysis of the current state and future trends ofdeep learning-based object detection Convolutional NeuralNetworks (CNN) can effectively learn the correspondingfeatures from massive samples which avoids the compli-cated feature extraction process and achieves higher accu-racy In 1998 Lecun et al [4] proposed LeNet-5 andachieved success in the recognition of handwritten char-acters Since then the performance of CNNs has beenimproved with the appearance of deeper and more complexCNNs such as AlexNet [5] VGGNet [6] GoogLeNet [7]

HindawiComputational Intelligence and NeuroscienceVolume 2021 Article ID 7018035 16 pageshttpsdoiorg10115520217018035

ResNet [8] and DenseNet [9] In 2020 Abdollahi et al [10]used a generative adversarial network (GAN) architecture toextract building footprints from high-resolution aerial im-ages However the algorithms of regular CNNs combinedwith feature pyramid networks (FPN) have become a newfocus in the field of object detection 0e object detectionalgorithms currently mainly include two technical routestwo-stage detection and one-stage detection 0e two-stagedetection is divided into two steps First obtain the regionproposals and then these region proposals are classified andregressed to get the final detection results Two-stage de-tectors mainly include R-CNN [11] SPP-Net [12] FastR-CNN [13] Faster R-CNN [14] andMask R-CNN [15] Forone-stage detection it treats the object detection problem asa regression problem A unified CNN completes the objectclassification and location which is an end-to-end targetdetection solution One-stage detectors mainly includeOverFeat [16] SSD [17] and YOLO [18ndash21] Many scholarsproposed improved YOLOv3 and SSD for object detectionand obtained outstanding detection performance [22 23]0e two-stage detection algorithm such as Faster R-CNN hashigh accuracy but its region proposal network (RPN) istime-consuming and therefore reduces the detection effi-ciency On the contrary although the YOLO series has agreat advantage in terms of detection speed they cannotachieve high accuracy

0e SSD is used as a one-stage detector and introduces amultiscale feature layer for object detection which has fasterdetection speed but accuracy needs to be improved In thispaper the SSD is applied to ship detection and severalimprovements are used to improve the overall performanceof the network

(1) To address the problem of poor performance of smalltarget detection we apply a dilated convolution on the low-level feature layer to expand the receptive field so that thelow-level feature layer can also contain more feature in-formation At the same time we perform multiscale fusionon the original feature layers after up-sampling so that thenetwork can make full use of the contextual information (2)We introduce a residual structure in the prediction moduleof the network to enable the network to extract deeperdimensional feature information for better classification andregression (3) We use the K-means clustering algorithm toreconstruct the prior bounding box so as to obtain a moresuitable scale and aspect ratio which can improve both thevisual effect and the efficiency of ship detection Finally wepropose a new SSD-based network called NSD-SSD whichis significantly better than the original SSD Compared withSSD and other detection networks the proposed networkprovides a good trade-off between real-time detection andaccuracy

0e rest of this paper is organized as follows In Section2 we introduce the related work of the ship detection InSection 3 we give detailed program of our proposed ap-proach Section 4 outlines the experimental results andcomparisons against other state-of-the-art methods Finallyconclusions are made in Section 5

2 Related Work

0is paper categorizes the previous work of ship objectdetection to traditional methods and deep learning methods

0e traditional detection methods include two types (1)Ship-radiated noise-based methods Kang et al [24] pro-posed a multiple classifier fusion algorithm based on many-person decision theory to identify ship radiated noise withaccuracy rate of over 96 Zhao et al [25] proposed adecision tree support vector machine (SVM) classificationmethod based on the ship-radiated noise multidimensionfeature vector for the measured radiated noise of three kindsof ship targets Luo and Wang [26] used the time-frequencyrange characteristics of ship noise to distinguish shiprsquos sternshiprsquos mid-aft and shiprsquos middle part to complete the po-sitioning and identification of ship targets Peng et al [27]proposed a ship-radiated noise model based on the wingerrsquoshigher-order spectrum for feature extraction (2) Shipstructure and shape characteristics-based methods Zhuet al [28] proposed a novel hierarchical method of shipdetection from spaceborne optical image based on shape andtexture features and this method can effectively distinguishships from nonships on the optical image dataset Liu et al[29] used segmentation and shape analysis to detect inshoreships and proved their method was effective and robustunder various situations Shi et al [30] proposed an ap-proach involving a predetection stage and an accurate de-tection stage to detect ships in a coarse-to-fine manner inhigh-resolution optical images Wang et al [31] proposed adetection method based on DoG (Difference of Gaussian)preprocessing and shape features to detect ship targets inremote sensing images

Most of the traditional methods use manually extractedfeatures which will lead to low efficiency and high timeconsumption At the same time even if a classifier with goodperformance is used to classify these features the accuracycannot meet the actual demand 0erefore the recognitionrate of these methods in complex environmental back-ground and multivessel classification is not ideal

0e deep learning detection methods with the boomdevelopment of deep learning many ship object detectionmethods based on deep CNN have been proposed Zou et al[32] proposed an improved SSD algorithm based onMobilenetV2 [33] and finally achieved better detection re-sults in three types of ship images Zhao et al [34] proposed anew network architecture based on the Faster R-CNN byusing squeeze and excitation for ship detection in SARimages Shao et al [35] proposed a saliency-aware CNNframework and coastline segmentation method to improvethe accuracy and robustness of ship detection under com-plex seashore surveillance conditions Nie et al [36] pro-posed an improved Mask R-CNN model which canaccurately detect and segment ships from remote sensingimages at the pixel level Guo et al [37] proposed a novelSSD network structure to improve the semantic informationby deconvoluting high-level features into a low-level featureand then fusing it with original low-level features and the

2 Computational Intelligence and Neuroscience

model performed well on both the PASCAL VOC andrailway datasets Huang et al [38] proposed a new networkby referring to the feature extraction layer of YOLOv2 andfeature pyramid network of YOLOv3 and the new networkmodel can detect seven types of ships Zhao et al [39]proposed the Attention Receptive Pyramid Network(ARPN) which detected multiscale ships in SAR images Liet al [40] proposed a new method combining the SaliencyEstimation Algorithms (SEAs) and the Deep CNN (DCNN)object detection to ensure the extraction of large-scale shipsIn 2021 Zhao et al [41] proposed a feature pyramid en-hancement strategy (FPES) and a cascade detection mech-anism to improve SSD and the improved model can beapplied to vehicle detection quickly and efficiently

In short although the existing ship target detectionmethods have made major breakthroughs they still havecertain limitations Firstly the low-level feature map con-tains less semantic information but can accurately presentthe location of the target In contrast high-level featuremapscontain rich semantic information but cannot accuratelydisplay the location of objects In addition the previousmethods cannot extract the features of small objects well Inthis paper we use a multiscale feature fusion algorithmwhich considers the ability of the entire network to combinethe context information and improve small target detectionperformance In addition we have also improved the pre-diction module and the settings of prior boxes Finally wetest the improved model on the ship dataset

3 Materials and Methods

31 Single-Shot Multibox Detector Figure 1 shows the SSDnetwork structure diagram with a backbone network VGG-16 VGG-16 has stable network structure and good featureextraction capabilities 0e SSD network converts FC6 andFC7 in VGG-16 into convolutional layers removes allDropout layers and FC8 layers and adds four additionalconvolutional layers Conv6 Conv7 Conv8 and Conv90efeature pyramid structure is to detect objects of differentsizes In the process of detection a large number of priorboxes are usually generated and these prior boxes havemultiple predefines scales and ratios Finally it is required toapply a Nonmaximum Suppression (NMS) process to obtainthe final test results 0e biggest advantage of the SSDnetwork is that classification and regression are carried outat the same time which improves the detection speedcompared with other models such as Faster R-CNN

32 Our Proposed Network 0e overall architecture of theNovel Ship Detection SSD (NSD-SSD) is shown in Figure 2From the figure the architecture mainly is formed by threeparts a dilated convolution layer a multiscale feature fusionlayer and a prediction layer In addition the prior boxes arereconstructed within this network Ship images are sent tothe NSD-SSD network for a series of operations and finallythe specific location and type of ship can be obtained

To understand the features extracted by the networkmore clearly a visualization of the feature maps is given in

Figure 3 In the figure from left to right the input image thefeature maps extracted by SSD and the feature mapsextracted after feature layer fusion are shown From thefigure we can see that the feature maps extracted by theoriginal SSD network lack rich semantic information Forexample the main characteristics of the low-level featurelayer Conv4_3 are small perceptual field and too poor abilityto extract target features However after the dilated con-volution and features fusion the feature information of thetarget is greatly enriched Similarly all other scale layers alsoextract a large amount of meaningful contextual informationafter feature fusion which greatly improves the accuracy ofobject detection

321 Dilated Convolution Layer Traditional SSD networkmainly uses low-level feature layer Conv4_3 to detect smallobjects However due to insufficient feature extraction in theConv4_3 layer the detection effect of small objects is notideal To address this issue we use dilated convolution tomap high-dimensional features to low-dimensional input Inthis paper we choose the lower-level feature layer Conv3_1for dilated convolution and merge it with Conv4_3 forfeature fusion In this way the range of the receptive field canbe enlarged without loss of image detail information andobtains more global information

Dilated convolution is to inject dilation on map of thestandard convolution to increase the receptive field 0edilated convolution has another hyperparameter called thedilation rate which refers to the number of intervals of theconvolution kernel Assuming that the original convolutionkernel is f and the dilation rate is α the new convolutionkernel size n after dilated convolution is

n α times(f minus 1) + 1 (1)

0e receptive field size r after dilated convolution is

r 2(α2)+2minus 11113960 1113961 times 2(α2)+2

minus 11113960 1113961 (2)

Suppose that there is a dilated convolution with f 3and α 1 which is equivalent to a standard convolution Itsreceptive field is 3 times 3 When f 3 and α 2 according toequations (1) and (2) its new convolution kernel is 5 times 5 andthe receptive field size is expanded to 7 times 7 without losingdetailed information

In this paper we choose the Conv3_1 layer for dilatedconvolution 0e original kernel is 3 times 3 stride is 2 pad is2 and dilation rate is 2 From equation (1) the newconvolution kernel is 5 times 5 0e original feature map ofConv3_1 layer is 75 times 75 times 256 After performing dilatedconvolution it obtains a feature map size which is38 times 38 times 512 From equation (2) the receptive filed is7 times 7 0e Conv3_1 layer undergoes feature map fusionwith the Conv4_3 layer after dilated convolution 0ere aretwo main ways of feature map fusion additive fusion andcascade fusion Because the cascade fusion has a smallamount of calculation and high accuracy in this paper wechoose cascade fusion method Figure 4 shows the processof feature map fusion

Computational Intelligence and Neuroscience 3

To better explain how the dilated convolution improvesthe performance of the network with the addition of featuremaps Figure 5 shows the feature maps of the image beforeand after the dilated convolution

In the figure (a) is the original image (b) are the featuremaps of Conv4_3 in the SSD network and (c) are the featuremaps with dilated convolution and feature fusion 0eoriginal features of the Conv4_3 activation area and

perceptual field are small and cannot detect the ship targetsat the corresponding scales well 0e original features of theConv4_3 activation area and perceptual field are small andcannot detect the ship targets at the corresponding scaleswell On the contrary the dilated convolution and featurefusion are able to more richly extract the texture and detailfeatures on the low-level feature maps and the contours andshapes are more clearly distinguished

VGG-16rough Conv5_3layer

Conv3_1

75

75

Conv4_3 Conv8_2

Conv9_2Conv10_2

Conv11_2

38

38

FC7

19

1910

10 55

33 1 1

P5 P4P6 P3 P2 P1

Input image300times300

DilatedConv

Cascadefusion

Dilated convolution layer Multi-scale featurefusion layer

Prediction layer

Improved prediction module

Non-maximumsuppression

Upsampling

Feature fusion

1times1Conv

Figure 2 0e architecture of the proposed novel ship detection SSD (NSD-SSD)

Extra Feature Layers


300

300

3

Image

38

38

512

Conv4_3 10

10

512

Conv8_2

5

5256

Conv9_2 3

3256

Conv10_2Conv11_2

Conv6(FC6)

Conv7(FC7)

19 19

19191024 1024 1

256

Detections8732

per Class

Non-M

aximum

Suppression

Figure 1 SSD network structure diagram


Conv3_1

75

75256

Dilatedconvolution

Conv4_3 38

38

38

38

512

38

38512

1024

α=2

Figure 4 Fusion process of the Conv3_1 layer with the Conv4_3 layer after dilated convolution

Input imageConv4_3

Conv8_2

Conv9_2

Conv10_2

Conv11_2

FC7

Feature maps extracted from SSD

P1 P2 P3 P4 P5 P1Feature maps extracted after feature layers fusion

Figure 3 Feature maps of ship images extracted by the network

(a) (b) (c)

Figure 5 Comparison of the feature maps in the original SSD and after the dilated convolution


322 Multiscale Feature Fusion Layer 0e original SSDnetwork uses the Feature Pyramid Network (FPN) to detectdifferent feature layers so that it can adapt to different objectsizes Although this detection method provides the possi-bility of multiscale object detection it does not consider thecombination of shallow features and deep features In thisstudy on the basis of the original SSD network we introducea multiscale feature fusion mechanism 0is method cansynthesize shallow high-resolution features and deep se-mantic features to make joint decisions 0e green dottedbox in Figure 2 shows the specific fusion connections ofdifferent feature layers 0e left half of the figure is theoriginal SSD network feature layer and the right half is thefused feature layer 0e specific implementation process ofthis feature fusion method will be described in detail belowFirst perform 1 times 1 convolution of Conv11_2 to obtain P6then perform up-sampling of P6 and finally perform 1 times 1convolution of Conv10_2 with the feature layer obtained byup-sampling P6 to obtain P5 0e purpose of up-samplinghere is to obtain the feature map of the size required forfusion After the same fusion process the fused feature layersare successively P4 P3 P2 and P1 In this way the com-bination of shallow features and deep features is consideredcomprehensively and it is possible to improve the detectionaccuracy P1 is formed by fusion of dilated convolutionallayer and P2 up-sampling 0e parameters of the predictionlayer are shown in Table 1

323 Improved Prediction Module 0e SSD network uses aset of convolution filters at each effective feature layer toobtain prediction results For each effective feature layerwith a size of h times w with dchannels use a 3 times 3 convolutionoperation on each route to obtain the score of each categoryand the change of each prior bounding box

MS-CNN [42] points out that improving the subnetworkof each task can improve the accuracy DSSD [43] followsthis principle and proposes an improved prediction moduleand experimental results show that this method can improvedetection accuracy 0erefore we transplant the idea ofDSSD into our network model to better improve the de-tection performance0e prediction layer corresponds to thered box in Figure 2 0at is on the basis of SSD the originalstructure is changed to a residual module 0e residualprediction block allows the use of 1 times 1 convolution topredict the score of each category and the changes of priorboxes 0e structure of the original predictor and the im-proved predictor are shown in Figure 6 In this way deeperdimensional features can be extracted for classification andregression

324 Reconstruction of Regional Prior Box 0e perfor-mance of deep learning object detection algorithms largelydepends on the quality of feature learning driven by trainingdata In the SSD object detection task the training data is theregional prior box 0e SSD network has selected a total of 6effective feature layers as the prediction layer the sizes ofwhich are (38 38) (19 19) (10 10) (5 5) (3 3) and (1 1)but the number of a prior bounding boxes set on each feature

map is different 0e prior bounding box has two hyper-parameters scale and aspect ratio 0e scale of a priorbounding box in each prediction layer is

Sk Smin +Smax minus Smin

m minus 1(k minus 1) k isin [1 m] (3)

Among them m refers to the number of feature maps(m 6 in the SSD algorithm) sk represents the ratio of theprior box size of the kth feature map to the picture Sminrepresents the minimum value of the ratio and the value is02 and Smax indicates the maximum value of the ratio andthe value is 09 0e aspect ratio of the prior bounding box isgenerally set to ar 1 2 3 12 13 0e width and heightof the prior bounding box are as follows

wak Sk

ar

radic

hak

Skar

radic

⎧⎪⎪⎪⎪⎨

⎪⎪⎪⎪⎩

(4)

By default each feature map will have a prior boundingbox with ar 1 and a scale of Sk In addition the priorbounding box with a scale of Sk

prime SkSk+1

1113968will be added In

this way each feature map has two square prior boundingboxes with an aspect ratio of 1 but different sizes 0emaximum side length of the square prior bounding box isSkprime

SkSk+1

1113968 and the minimum side length is Sk Table 2

lists the min-size and max-size of the prior bounding boxesused in this paper

As shown in Figure 7 4 prior bounding boxes aregenerated two squares (red dashed line) and two rectangles(blue dashed line) At this time the aspect ratio ar 1 2 Among them Sk lowast 300 is the side length of the small squareand

SkSk+1

1113968lowast 300 is the side length of the large square 300

is the size of the input image in the SSD algorithm0e widthand height of the corresponding two rectangles are

ar

radic lowast Sk lowast 3001ar

radic lowast Sk lowast 300

1ar

1113971

lowast Sk lowast 3001

1ar

1113968 lowast Sk lowast 300

⎧⎪⎪⎪⎪⎪⎪⎪⎨

⎪⎪⎪⎪⎪⎪⎪⎩

(5)

When 6 prior bounding boxes are to be generated theaspect ratio ar 1 2 3 0e center point of each prior boxis (i + 05|fk| j + 05|fk|) i and j isin [0 |fk|] and fk is thesize length of the feature map In this paperfk 38 19 10 5 3 1 Table 3 shows the detailed param-eters of the prior bounding boxes of the SSD algorithm

In the SSD algorithm the scale and aspect ratio of theprior boxes in the network cannot be obtained throughlearning but manually set Since each feature map in thenetwork uses different prior bounding boxes in scale andshape the debugging process is very dependent on expe-rience In this paper we use the K-means algorithm topredict the scale and proportion of the prior bounding boxto improve the detection efficiency of the network 0estandard K-means clustering algorithm uses Euclideandistance to measure distance But if Euclidean distance is


used here the larger boxes will produce more errors than thesmall boxes 0erefore we use other distance measurementmethods and the specific equation is as follows

d(box centroid) 1 minus IOU(box centroid)

1 minus IOU xj yj wj hj1113872 1113873 xj yj Wi Hi1113872 11138731113960 1113961(6)

IoU is the intersection ratio between the regional priorbounding boxes and the ground truth boxes and we expect a

larger IoU 0e purpose of clustering is that the priorbounding boxes and the adjacent ground truth have a largeIoU value Equation (6) just ensures that the smaller thedistance the larger the IoU value

In this paper we will traverse different types of labeledboxes in the dataset and cluster different types of boxesSome specific parameters in equation (6) are as follows(xj yj wj hj) j isin 1 2 k is the coordinates of thelabel boxes (xj yj) is the center point of the box (wj hj) isthe width and height of the boxes and N is the number of alllabel boxes Given k cluster center points (Wi Hi)i isin 1 2 k where Wi andHi are the width and height ofthe prior bounding box Calculate the distance between eachlabel box and each cluster center and the center of each labelbox coincides with the cluster center during calculation Inthis way the label box is assigned to the nearest clustercenter After all the label boxes are allocated the clustercenters are recalculated for each cluster 0e equation is asfollows

Table 1 Parameters of the prediction layer

Prediction layer Kernel size Padding Kernel numbers Strides Feature mapP1 3times 3 1 1024 1 38times 38P2 3times 3 1 1024 1 19times19P3 3times 3 1 512 1 10times10P4 3times 3 1 256 1 5times 5P5 3times 3 1 256 1 3times 3P6 3times 3 1 256 1 1times 1

3times3Conv

3times3Conv

Feature map

class

box

(a)

3times3Conv

3times3Conv

Feature map

1024

1

1

1

1

1

1

256 256

1

1256

Eltw Sum

class

box

(b)

Figure 6 0e prediction process of the feature layer (a) 0e original SSD predictor obtain the score of each category and the change of theprior box after two convolution routes (b) 0e improved predictor add the residual structure on the basis of (a) to obtain the predictionresult

Table 2 Size of prior bounding boxes for different feature layers

Feature layer Min-size Max-sizeConv4_3 30 60FC7 60 111Conv8_2 111 162Conv9_2 162 213Conv10_2 213 264Conv11_2 264 315


Wiprime

1Ni

1113944 wi

Hiprime

1Ni

1113944 hi

(7)

where Ni is the number of label boxes in the ith cluster thatis find the average of all label boxes in the cluster Repeat theabove steps until the cluster center changes very little

In this paper we set the number of cluster center k 01 2 3 4 5 6 7 8 9 10 to conduct experiments and use theaverage IoU tomeasure the results of the experiment so as tocomplete the reconstruction of the prior box It can be seenfrom Figure 8 that when kle 6 the average IoU increasesgreatly and when kgt 6 it basically tends to be flat Bycombining the calculation amount of the entire algorithmfor comprehensive consideration we choose k 6 At thistime the aspect ratio of the prior bounding box is predictedto be [035 089 118 169 189 286] Table 4 shows thespecific parameters of the prior bounding box setting in theNSD-SSD algorithm0rough themethod of prior boundingbox reconstruction the error of the algorithm is reducedwith improved accuracy and efficiency

33 Loss Function When training the detection network weneed to measure the error between the candidate boxes and

the truth value boxes and minimize this error At this timefor each candidate box the offset of the center point of thecandidate box relative to the center of the truth box and theconfidence of the candidate box needs to be calculated In thetraining phase there are generally two samples calledpositive samples and negative samples Here we consider thematching value of the candidate box and the truth box to be

Figure 7 Schematic diagram of the prior bounding box At this time the aspect ratio ar 1 2 and there are 4 prior bounding boxes

Table 3 0e specific parameters of prior bounding boxes in the SSD algorithm

Feature map Size Numbers ar

Conv4_3 38times 38 4 1 2FC7 19times19 6 1 2 3Conv8_2 10times10 6 1 2 3Conv9_2 5times 5 6 1 2 3Conv10_2 3times 3 4 1 2Conv11_2 1times 1 4 1 2

0 1 2 3 4 5 6 7 8 9 10 11k

055

060

065

070

075

080

085

aver

age I

oU

Figure 8 0e clustering map of the prior bounding box


greater than the threshold as positive samples denoted byd1 and other candidate boxes that do not satisfy minimummatching value are considered negative samples denoted byd2 In order to ensure the balance of the sample the ratio ofpositive and negative samples is required to be at most 3 1

0e loss function of the NSD-SSD algorithm is basicallysimilar to that of the SSD In this study the total lossfunction includes the classification loss and the localizationloss

L(x c l g) 1N

Lcls(x c) + αLloc(x l g)( 1113857 (8)

where N is the number of the positive samples If N 0 weset the loss to 0 c is confidence l is the predicted box and g

is the ground truth box α is the balance coefficient betweenclassification loss and localization loss and its value usuallyis 1

0e localization loss is smooth L1 loss xpij is an indicator

and xpij 0 1 When x

pij 1 it means that the ith can-

didate box matches jth ground truth box of ship category p

Lloc(x l g) 1113944N

iisind1

1113944

misin cxxywh

xkijsmoothL1 l

mi minus 1113954g

mj1113872 1113873 (9)

where

smoothL1(x) 05x

2|x|lt 1

|x| minus 05 otherwise

⎧⎨

⎩ (10)

0e classification loss is the Softmax loss When clas-sifying the confidence level belonging to the ship category p

is expressed by cp and the confidence level belonging to thebackground is expressed as c0

Lcls(x c) minus 1113944N

i isin d1

xp

ijlog 1113954cp

i1113872 1113873 minus 1113944iisind2

log 1113954c0i1113872 1113873 (11)

where 1113954cpi exp(c

pi )1113936pexp(c

pi ) In the first half of equation

(11) the predicted frame i and the real frame j match withrespect to the ship category p 0e higher the predictedprobability of p the smaller the loss In the second half of theequation there is no ship in the predicted box 0at is thehigher the predicted probability of the background thesmaller the loss In this study we use Stochastic GradientDescent to optimize the loss function to find the optimalsolution 0e final loss function curve of NSD-SSD is shownin Figure 9 Note that due to the result of deep learning inthis model the loss function in the early stage will fluctuatebut it will eventually become stable

4 Experimental Results

To prove the effectiveness of our proposed method wedesigned experiments and quantitatively evaluated theproposed method on the public ship dataset Subjective andobjective results will be presented in this section and theresults will also be analyzed

41 Dataset In this paper we use a public dataset calledSeaShips [44] for ship detection 0is dataset consists of 6common ship categories and 7000 images in total includingore carrier bulk cargo carrier general cargo ship containership fishing boat and passenger ship All of the images arevideo clips taken by surveillance camera covering all pos-sible imaging changes with different proportions hull partsbackground and occlusion All images are marked with shipcategory labels and bounding boxes 0e example images ofeach ship category are shown in Figure 10 In order to bettertrain and evaluate the model we divided the dataset into atraining set a validation set and a testing set 0e 3500images were randomly selected as the training set 1750images as the validation set and the rest as the testing set Inparticular the validation set was useful to avoid overfittingfor better model selection

42 Test Settings All the models in our experiment are runon a 64-bit Ubuntu operating system using a 29GHz IntelCore-i5 with 156GB of RAM and NVIDIA GTX 1080Ti

Table 4 0e specific parameters of prior bounding boxes in the NSD-SSD algorithm


Conv4_3 38times 38 6 1 2 3FC7 19times19 6 1 2 3Conv8_2 10times10 6 1 2 3Conv9_2 5times 5 6 1 2 3Conv10_2 3times 3 6 1 2 3Conv11_2 1times 1 6 1 2 3

trai

nlo

ss

Epoch

5

10

15

20

25

01000 2000 3000 4000 5000 60000 7000

train loss vs epochs

Figure 9 0e loss function curve of the NSD-SSD


GPU with 11GB of video RAM 0e deep learning frame-work that we use is Pytorch which runs on GPU

Our proposed network structure is modified from SSDand the NSD-SSD and SSD use the same hyperparametersfor training 0e batch size we used is 32 and the num_-workers is 4 0e initial learning rate is set to 0001 Mo-mentum is 09 and weight decay is 00002

43 Evaluation Index Since this article studies the task ofship object detection several mature indicators are neededto evaluate the detection model 0ese indicators will bedescribed in detail below

(1) Intersection-over-Union (IoU) IoU is a standard formeasuring the accuracy of detecting the position ofcorresponding objects in a specific dataset In otherwords this standard is used to measure the corre-lation between real and predicted 0e higher thecorrelation the greater the value 0e equation is asfollows

IoU Gt capDr

Gt cupDr

(12)

In equation (12) Gt is the ground-truth boundingbox Dr is the predicted bounding box Gt capDr is theintersection of Gt and Dr and Gt cupDr is the union ofGt and Dr 0e range of IoU is 0-1 in this paper weset the threshold to 05 Once the IoU calculationresult is greater than 05 it is marked as a positivesample otherwise it is also a negative sample

(2) Average precision After the IoU threshold is giventhere will be two indicators called precision andrecall 0e precision refers to the number of groundtruth ships in all predictions 0e recall refers to thenumber of ground truth ships predicted in all groundtruth ships So precision and recall are as follows

precision TP

TP + FP

recall TP

TP + FN

(13)

According to precision and recall a precision-recallcurve can be drawn referred to as the PR curve AP isthe area enclosed by this curve and the specificequation is as follows

(a) (b)

(c) (d)

(e) (f )

Figure 10 0e ship category in our used dataset (a) Bulk cargo carrier (b) Container ship (c) Fishing boat (d) General cargo ship (e) Orecarrier (f ) Passenger ship


AP 11139461

0P(R)dR (14)

(3) mean Average Precision (mAP) mAP shows theaverage values of APi of each class i

mAP 1113936

ni1 APi

n (15)

Here n represents the number of classes of ships thatneed to be detected

(4) Frames Per Second (FPS) the FPS is used to judge thedetection speed of different models and the largerthe FPS value the faster the speed

44 Results and Analysis Our model is based on the SSDnetwork with the backbone of VGG16 To test the detectionperformance of NSD-SSD the comparative experiments areimplemented using several popular baseline methods FasterR-CNN SSD series and YOLO series 0e backbone net-works of these models are pretrained on ImageNet Toachieve a fair comparison we train and test the four modelsusing the same dataset At the same time to ensure theconsistency of training we set the hyperparameters and thenumber of training echoes of these three baseline models tobe the same as the NSD-SSD

According to our detection method (NSD-SSD) the APperformance of the six categories of ship is shown in Fig-ure 11 0e IoU threshold is set to 05 in the experiment

We record the accuracy of the four models based on theevaluation indicators as shown in Table 5 0e detectionperformance of Faster R-CNN is significantly better thanYOLO series and SSD series On average Faster R-CNNrsquosmAP is 225 and 178 higher than SSD series and 125and 82 higher than YOLO series respectively Althoughour proposed model (NSD-SSD) has a little gap with FasterR-CNN in mAP our approach significantly improves theperformance of SSD Moreover it performs better thanFaster R-CNN on general cargo ship

Our proposed method is based on SSD (VGG16) 0edetection effect of the original SSD network is indeed notgood and the accuracy is extremely average But comparedwith SSD the mAP of each category of ship in our model hasa good improvement and the NSD-SSDrsquos mAP is 202higher than original SSD Among six categories of ships thecontainer ships have the best detection results Because theymainly transport containers and these cargoes have verydistinct shape characteristics that are different from otherships 0e ore carriers also achieve excellent detection re-sults Because they usually transport ore they have thespecial features like container ships In addition since thegeneral cargo ships are very large in the images their resultsare also extremely good On the contrary the performance offishing boats is the worst among these six categories of ships0e main season is that fishing boats are too small occu-pying a few pixels in the 1920 times 1080 image Detectors aregenerally not good at detecting small objects After layers ofconvolution the feature information of small objects willbecome blurred and even the SSD model is worse

We perform structural improvements on the basis ofSSD and add detailed detection techniques which makes itpossible for us to better detect small targets and improve theoverall accuracy For fishing boats we have increased from604 to 824 which already exceeds the mAP of theYOLOv3 model As shown in examples in Figure 12 ourproposed method greatly improves the detection effect offishing boats against SSD For the passenger ships ourmethod has increased by nearly 10 For the general cargoships our method makes their performance become betterand has a significant improvement over Faster R-CNN

In terms of detection speed FPS of 24 is called thestandard for real-time detection in object detection As canbe seen from Table 6 the detection speed of YOLOv3 ismuch better than other detectionmodel and the FPS reaches79 Unfortunately its detection effect is not good 0e de-tection speed of SSD series can be ranked second and theFPS can reach 75 and 680 respectively but detectionperformance is worse Since the Faster R-CNN is a two-stagedetector the detection process is more complicated whichresults in its FPS of only 7 and cannot meet real-time de-tection Our proposed model adds many parameters andcalculations on the basis of SSD thereby reducing the speed0e FPS given by our method is 45 which not only guar-antees the real-time detection requirements but also im-proves the detection accuracy In addition we also give theparameters of IoU and recall for different models and ourmethod is better than other methods

In Figure 13 we show the detection examples of ourmodel against Faster R-CNN and YOLOv3 and our pro-posedmethod has a better visual effect Specifically when thetwo ships are very close together the bounding box ofYOLOv3 is much larger or smaller than ship but ourmethod can mark a more accurate box Furthermore theFaster R-CNN sometimes detects the background as a shipbut our proposed method can avoid the false detection

We compare the proposed method with [35] and theypropose a detection method that combines YOLOv2 andsaliency detection and achieve good results 0e comparisonresults are shown in Table 7 From the table our method isslightly better than the comparisonmethod onmAP Amongthe six categories of ships container ships and fishing boatscan achieve better results Specifically these two categoriesof shipsrsquo AP is 77 and 41 higher than the comparisonmethod respectively For passenger ships our method is38 lower than Shaorsquos method because the color charac-teristic of passenger ships is very salient the performance oftheir proposed saliency detection is particularly good andthe accuracy is higher In addition the IoU of our method ishigher and the detection visual effect is better but the FPS ofShaorsquos model is 4 higher than the FPS of our model

To verify the effectiveness of our proposed variousmodules we conduct the ablation experiment for comparisonand the original SSD is the baseline network Moreover ourproposed three modules are considered as setting items andthe experimental results are shown in Table 8

As can be seen from the table that the detection accuracyof SSD is 107 higher than that of the backbone networkVGG16 indicating that SSD is a better detection network


When introducing the feature fusion in SSD the mAP hasincreased from 691 to 832 Because our algorithmconsiders the combination of shallow features and deepfeatures and makes full use of contextual information Whenadding the remaining two parts of modules the mAP hasincreased by 610e above results prove that our proposedmethod can effectively improve the accuracy of shipdetection

Furthermore we validate our proposed method underpractical extreme conditions as shown in Figure 14 andunder different weather conditions such as sunny rainyand night On the contrary the ships in the images areincomplete However our method still achieves excellentdetection performance and the marked bounding boxesand the classifications are reasonable and accuraterespectively

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision

class 8633 = bulkcargocarrier AP

10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision

class 8235 = fishingboat AP

10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision

class 9079 = orecarrier AP

10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision

class 9804 = containership AP

10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision

class 9372 = generalcargoship AP

10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision

class 8482 = passengership AP

10

Figure 11 Precision-recall curves of our proposed method (NSD-SSD) on six categories of ships


Table 5 Detection accuracy of different detection models

Model mAP Bulk cargo ship Container ship Fishing boat General cargo ship Ore carrier Passenger shipFaster R-CNN 0916 0893 0986 0908 0927 0914 0868SSD (VGG16) 0691 0661 0801 0604 0703 0620 0755SSD (Mobilev2) 0738 0703 0876 0635 0742 0686 0783YOLOv3 0791 0681 0959 0690 0893 0734 0786YOLOv4 0834 0849 0929 0732 0851 0778 0862NSD-SSD 0893 0863 0980 0824 0937 0908 0848

(a) (b)

(c) (d)

(e) (f )

Figure 12 Some fishing boatsrsquo detection results (andashc) 0e original SSD (dndashf) Our proposed method

Table 6 0e detection results of other indicators for different detectors

Model IoU Recall FPSFaster R-CNN 0603 0865 7YOLOv3 0616 0834 79SSD (VGG16) 0781 0700 75SSD (Mobilev2) 0745 0787 68YOLOv4 0716 0854 56Ours 0808 0936 45


(a) (b)

(c) (d)

Figure 13 Ship detection results (a) 0e Faster R-CNN (b) YOLOv3 (c d) Our proposed model

Table 7 Detection results of different detection models

Model IoU mAP Bulk cargo ship Container ship Fishing boat General cargo ship Ore carrier Passenger ship FPSOurs 08082 0893 0863 0980 0824 0937 0908 0848 45Shaorsquos 07453 0874 0876 0903 0783 0917 0881 0886 49

Table 8 0e results of the ablation experiment

VGG16 SSD Feature fusion Improved predicted module Prior boxes reconstruction mAP 0584

0691 0832 0893

(a) (b) (c)

(d) (e) (f )

(g) (h) (i)

Figure 14 Continued


5 Conclusion

In this paper based on real-time ship detection task as ourbasic goal as well as the characterization of the ship dataset anovel shipsrsquo detector in visual images captured by themonitoring sensor named NSD-SSD is proposed 0eNSD-SSD is mainly based on multiscale feature fusion(MFF) predicted module (PM) and reconstruction of priorboxes (RPB) Regarding the problem of small objects de-tection the dilated convolution is used to expand the re-ceptive field of low-level feature layers and the network canfully use the contextual information by the MFF For theproblem of setting prior boxes manually we propose RPB byusing the K-means clustering algorithm to improve thedetection efficiency In addition the PM is introduced toextract deeper features We train our model on the shipdataset and compare it with other conventional methods0e experimental results prove that our proposed method isable to acquire higher accuracy and recall and it can meetthe requirement of real-time detection Moreover the NSD-SSD can also guarantee high-quality detection performancein the relatively extreme environment We also noticed thatthe method could be improved for ship detection in complexbackgrounds We will address this issue in our future work

Data Availability

SeaShip httpwwwlmarswhueducnprof_webshaozhenfengdatasetsSeaShips(7000)zip (accessed on 2November 2020)

Conflicts of Interest

0e authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

0is research was funded by the National Key Research andDevelopment Project of China (Grant no2019YFB1600605)

References

[1] S Dugad V Puliyadi H Palod N Johnson S Rajput andS Johnny ldquoShip intrusion detection security system usingimage processing amp SVMrdquo in Proceedings of the 2017 Inter-national Conference on Nascent Technologies in Engineering(ICNTE) pp 1ndash7 Navi Mumbai India 2017

[2] X Cao S Gao L Chen and Y Wang ldquoShip recognitionmethod combined with image segmentation and deeplearning feature extraction in video surveillancerdquoMultimediaTools and Applications vol 79 no 13-14 pp 9177ndash9192

[3] L Jiao F Zhang F Liu et al ldquoA survey of deep learning-basedobject detectionrdquo IEEE Access vol 7 pp 128837ndash1288682019

[4] Y Lecun L Bottou Y Bengio and P Haffner ldquoGradient-based learning applied to document recognitionrdquo Proceedingsof the IEEE vol 86 no 11 pp 2278ndash2324 1998

[5] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo inProceedings of the Advances in Neural Information ProcessingSystems 30 (NIPS 2017) pp 1097ndash1105 Lake Tahoe NV USADecember 2012

[6] K Simonyan and A Zisserman ldquoVery deep convolu-tional networks for large-scale image recognitionrdquo 2015httpsarxivorgabs14091556

[7] C Szegedy W Wei Liu Y Sermanet et al ldquoGoing deeperwith convolutionsrdquo in Proceedings of the 2015 IEEE Confer-ence on Computer Vision and Pattern Recognition (CVPR)pp 1ndash9 Boston MA USA 2015

[8] K He X Zhang S Ren and J Sun ldquoDeep residual learningfor image recognitionrdquo in Proceedings of the 2016 IEEEConference on Computer Vision and Pattern Recognition(CVPR) pp 770ndash778 Las Vegas NV USA 2016

[9] G Huang Z Liu and L Weinberger ldquoDensely connectedconvolutional networksrdquo in Proceedings of the 2017 IEEEConference on Computer Vision and Pattern Recognition(CVPR) pp 2261ndash2269 Honolulu HI USA 2017

[10] A Abdollahi B Pradhan S Gite and A Alamri ldquoBuildingfootprint extraction from high resolution aerial images usinggenerative adversarial network (GAN) architecturerdquo IEEEAccess vol 8 pp 209517ndash209527 2020

[11] R Girshick J Donahue T Darrell and J Malik ldquoRich featurehierarchies for accurate object detection and semantic seg-mentationrdquo in Proceedings of the 2014 IEEE Conference onComputer Vision and Pattern Recognition pp 580ndash587Columbus OH USA 2014

[12] K He X Zhang S Ren and J Sun ldquoSpatial pyramid poolingin deep convolutional networks for visual recognitionrdquo IEEETransactions on Pattern Analysis and Machine Intelligencevol 37 no 9 pp 1904ndash1916 2015

[13] R Girshick ldquoFast R-CNNrdquo in Proceedings of the InternationalConference on Computer Vision (ICCV) Santiago Chile 2015

[14] S Ren K He R Girshick and J Sun ldquoFaster R-CNN towardsreal-time object detection with region proposal networksrdquoIEEE Transactions on Pattern Analysis and Machine Intelli-gence vol 39 no 6 pp 1137ndash1149 2017

[15] K He G Gkioxari P Dollar and R Girshick ldquoMaskR-CNNrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence vol 42 no 2 pp 386ndash397 2020

(j) (k) (l)

Figure 14 Visualization of ship detection with the proposed method under different conditions (andashc) Sunny (dndashf) Rainy (gndashi) Night (jndashl)Incomplete ship


[16] P Sermanet D Eigen and X Zhang ldquoOverFeat integratedrecognition localization and detection using convolutionalnetworksrdquo 2013 httpsarxivorgabs13126229

[17] W Liu D Anguelov D Erhan et al ldquoSSD single shotmultibox detectorrdquo in Proceedings of the European Conferenceon Computer Vision pp 21ndash37 Springer Amsterdam 0eNetherlands October 2016

[18] J Redmon S Divvala and R Girshick ldquoYou only look onceunified real-time object detectionrdquo in Proceedings of the 2016IEEE Conference on Computer Vision and Pattern Recognition(CVPR) pp 779ndash788 Las Vegas NV USA June 2016

[19] J Redmon and A Farhadi ldquoYOLO9000 better fasterstrongerrdquo in Proceedings of the 2017 IEEE Conference onComputer Vision and Pattern Recognition (CVPR)pp 6517ndash6525 Honolulu HI USA 2017

[20] J Redmon and A Farhadi ldquoYOLOv3 an incremental im-provementrdquo 2018 httpsarxivorgabs180402767

[21] A Bochkovskiy C Wang and H Liao ldquoYOLOv4 optimalspeed and accuracy of object detectionrdquo 2020 httpsarxivorgabs200410934

[22] T Yulin S Jin G Bian and Y Zhang ldquoShipwreck targetrecognition in side-scan sonar images by improved YOLOv3model based on transfer learningrdquo IEEE Access vol 8pp 173450ndash173460 2020

[23] Z Chen K Wu Y Li M Wang and W Li ldquoSSD-MSN animproved multi-scale object detection network based onSSDrdquo IEEE Access vol 7 pp 80622ndash80632 2019

[24] C Kang and X Zhang ldquoA fusion algorithm of multipleclassifiers for recognition of ships radiated_noises based onmany-person decision-makings theoryrdquo in Proceedings of theFourth International Conference on Fuzzy Systems andKnowledge Discovery (FSKD 2007) pp 71ndash74 Haikou China2007

[25] C Zhao L Zhengguo S Xiaohong B Jun and L ZhengguoldquoA decision tree SVM classification method based on theconstruction of ship-radiated noise multidimension featurevectorrdquo in Proceedings of the 2011 IEEE International Con-ference on Signal Processing Communications and Computing(ICSPCC) pp 1ndash6 Xirsquoan China 2011

[26] J Luo and Y Wang ldquoRecognition and location technique ofvolume target based on frequency domain characteristics ofship radiated noiserdquo in Proceedings of the 2011 IEEE Inter-national Conference on Signal Processing Communicationsand Computing (ICSPCC) pp 1ndash5 Xirsquoan China 2011

[27] C Peng L Yang X Jiang and Y Song ldquoDesign of a shipradiated noise model and its application to feature extractionbased on wingerrsquos higher-order spectrumrdquo in Proceedings ofthe 2019 IEEE 4th Advanced Information Technology Elec-tronic and Automation Control Conference (IAEAC)pp 582ndash587 Chengdu China 2019

[28] C Zhu H Zhou R Wang and J Guo ldquoA novel hierarchicalmethod of ship detection from spaceborne optical imagebased on shape and texture featuresrdquo IEEE Transactions onGeoscience and Remote Sensing vol 48 no 9 pp 3446ndash34562010

[29] G Ge Liu Y Yasen Zhang and X Xian Sun ldquoA new methodon inshore ship detection in high-resolution satellite imagesusing shape and context informationrdquo IEEE Geoscience andRemote Sensing Letters vol 11 no 3 pp 617ndash621 2014

[30] Z Zhenwei Shi X Xinran Yu and Z Bo Li ldquoShip detection inhigh-resolution optical imagery based on anomaly detectorand local shape featurerdquo IEEE Transactions on Geoscience andRemote Sensing vol 52 no 8 pp 4511ndash4523 2014

[31] W Wenxiu F Yutian and D Feng ldquoRemote sensing shipdetection technology based on DoG preprocessing and shapefeaturesrdquo in Proceedings of the 2017 3rd IEEE InternationalConference on Computer and Communications (ICCC)pp 1702ndash1706 Chengdu China 2017

[32] Y Zou L Zhao S Qin M Pan and Z Li ldquoShip targetdetection and identification based on SSD_MobilenetV2rdquo inProceedings of the 2020 IEEE 5th Information Technology andMechatronics Engineering Conference (ITOEC) pp 1676ndash1680 Chongqing China 2020

[33] M Sandler A Howard M Zhu A Zhmoginov andL-C Chen ldquoMobileNetV2 inverted residuals and linearbottlenecksrdquo in Proceedings of the 2018 IEEECVF Conferenceon Computer Vision and Pattern Recognition pp 4510ndash4520Salt Lake City UT USA 2018

[34] Z Lin K Ji X Leng and G Kuang ldquoSqueeze and excitationrank faster R-CNN for ship detection in SAR imagesrdquo IEEEGeoscience and Remote Sensing Letters vol 16 no 5pp 751ndash755 2019

[35] Z Shao L Wang Z Wang W Du and W Wu ldquoSaliency-aware convolution neural network for ship detection insurveillance videordquo IEEE Transactions on Circuits and Systemsfor Video Technology vol 30 no 3 pp 781ndash794 2020

[36] X Nie M Duan H Ding B Hu and E K Wong ldquoAttentionMask R-CNN for ship detection and segmentation fromremote sensing imagesrdquo IEEE Access vol 8 pp 9325ndash93342020

[37] B Guo J Shi L Zhu and Z Yu ldquoHigh-speed railwayclearance intrusion detection with improved SSD networkrdquoApplied Sciences vol 9 no 15 p 2981 2019

[38] Z Huang B Sui J Wen and G Jiang ldquoAn intelligent shipimagevideo detection and classification method with im-proved regressive deep convolutional neural networkrdquoComplexity vol 2020 Article ID 1520872 11 pages 2020

[39] Y Zhao L Zhao B Xiong and G Kuang ldquoAttention re-ceptive pyramid network for ship detection in SAR imagesrdquoIEEE Journal of Selected Topics in Applied Earth Observationsand Remote Sensing vol 13 pp 2738ndash2756 2020

[40] Z Li Y You and F Liu ldquoAnalysis on saliency estimationmethods in high-resolution optical remote sensing imageryfor multi-scale ship detectionrdquo IEEE Access vol 8pp 194485ndash194496 2020

[41] M Zhao Y Zhong and D Sun ldquoAccurate and efficientvehicle detection framework based on SSD algorithmrdquo IETImage Process pp 1ndash11 2021

[42] Z Cai Q Fan R S Feris and N Vasconcelos ldquoA unifiedmulti-scale deep convolutional neural network for fast objectdetectionrdquo in Proceedings of the 14th European Conference onComputer Vision ECCV 2016 pp 354ndash370 Amsterdam 0eNetherlands October 2016

[43] C Fu W Liu and A Ranga ldquoDSSD deconvolutional singleshot detectorrdquo 2017 httpsarxivorgabs170106659

[44] Z Shao W Wu Z Wang W Du and C Li ldquoSeaShips alarge-scale precisely annotated dataset for ship detectionrdquoIEEE Transactions on Multimedia vol 20 no 10 pp 2593ndash2604 2018


ResNet [8] and DenseNet [9] In 2020 Abdollahi et al [10]used a generative adversarial network (GAN) architecture toextract building footprints from high-resolution aerial im-ages However the algorithms of regular CNNs combinedwith feature pyramid networks (FPN) have become a newfocus in the field of object detection 0e object detectionalgorithms currently mainly include two technical routestwo-stage detection and one-stage detection 0e two-stagedetection is divided into two steps First obtain the regionproposals and then these region proposals are classified andregressed to get the final detection results Two-stage de-tectors mainly include R-CNN [11] SPP-Net [12] FastR-CNN [13] Faster R-CNN [14] andMask R-CNN [15] Forone-stage detection it treats the object detection problem asa regression problem A unified CNN completes the objectclassification and location which is an end-to-end targetdetection solution One-stage detectors mainly includeOverFeat [16] SSD [17] and YOLO [18ndash21] Many scholarsproposed improved YOLOv3 and SSD for object detectionand obtained outstanding detection performance [22 23]0e two-stage detection algorithm such as Faster R-CNN hashigh accuracy but its region proposal network (RPN) istime-consuming and therefore reduces the detection effi-ciency On the contrary although the YOLO series has agreat advantage in terms of detection speed they cannotachieve high accuracy

0e SSD is used as a one-stage detector and introduces amultiscale feature layer for object detection which has fasterdetection speed but accuracy needs to be improved In thispaper the SSD is applied to ship detection and severalimprovements are used to improve the overall performanceof the network

(1) To address the problem of poor performance of smalltarget detection we apply a dilated convolution on the low-level feature layer to expand the receptive field so that thelow-level feature layer can also contain more feature in-formation At the same time we perform multiscale fusionon the original feature layers after up-sampling so that thenetwork can make full use of the contextual information (2)We introduce a residual structure in the prediction moduleof the network to enable the network to extract deeperdimensional feature information for better classification andregression (3) We use the K-means clustering algorithm toreconstruct the prior bounding box so as to obtain a moresuitable scale and aspect ratio which can improve both thevisual effect and the efficiency of ship detection Finally wepropose a new SSD-based network called NSD-SSD whichis significantly better than the original SSD Compared withSSD and other detection networks the proposed networkprovides a good trade-off between real-time detection andaccuracy

0e rest of this paper is organized as follows In Section2 we introduce the related work of the ship detection InSection 3 we give detailed program of our proposed ap-proach Section 4 outlines the experimental results andcomparisons against other state-of-the-art methods Finallyconclusions are made in Section 5

2 Related Work

0is paper categorizes the previous work of ship objectdetection to traditional methods and deep learning methods

0e traditional detection methods include two types (1)Ship-radiated noise-based methods Kang et al [24] pro-posed a multiple classifier fusion algorithm based on many-person decision theory to identify ship radiated noise withaccuracy rate of over 96 Zhao et al [25] proposed adecision tree support vector machine (SVM) classificationmethod based on the ship-radiated noise multidimensionfeature vector for the measured radiated noise of three kindsof ship targets Luo and Wang [26] used the time-frequencyrange characteristics of ship noise to distinguish shiprsquos sternshiprsquos mid-aft and shiprsquos middle part to complete the po-sitioning and identification of ship targets Peng et al [27]proposed a ship-radiated noise model based on the wingerrsquoshigher-order spectrum for feature extraction (2) Shipstructure and shape characteristics-based methods Zhuet al [28] proposed a novel hierarchical method of shipdetection from spaceborne optical image based on shape andtexture features and this method can effectively distinguishships from nonships on the optical image dataset Liu et al[29] used segmentation and shape analysis to detect inshoreships and proved their method was effective and robustunder various situations Shi et al [30] proposed an ap-proach involving a predetection stage and an accurate de-tection stage to detect ships in a coarse-to-fine manner inhigh-resolution optical images Wang et al [31] proposed adetection method based on DoG (Difference of Gaussian)preprocessing and shape features to detect ship targets inremote sensing images

Most of the traditional methods use manually extractedfeatures which will lead to low efficiency and high timeconsumption At the same time even if a classifier with goodperformance is used to classify these features the accuracycannot meet the actual demand 0erefore the recognitionrate of these methods in complex environmental back-ground and multivessel classification is not ideal

0e deep learning detection methods with the boomdevelopment of deep learning many ship object detectionmethods based on deep CNN have been proposed Zou et al[32] proposed an improved SSD algorithm based onMobilenetV2 [33] and finally achieved better detection re-sults in three types of ship images Zhao et al [34] proposed anew network architecture based on the Faster R-CNN byusing squeeze and excitation for ship detection in SARimages Shao et al [35] proposed a saliency-aware CNNframework and coastline segmentation method to improvethe accuracy and robustness of ship detection under com-plex seashore surveillance conditions Nie et al [36] pro-posed an improved Mask R-CNN model which canaccurately detect and segment ships from remote sensingimages at the pixel level Guo et al [37] proposed a novelSSD network structure to improve the semantic informationby deconvoluting high-level features into a low-level featureand then fusing it with original low-level features and the













r 2(α2)+2minus 11113960 1113961 times 2(α2)+2

minus 11113960 1113961 (2)








Conv3_1

75

75

Conv4_3 Conv8_2

Conv9_2Conv10_2

Conv11_2

38

38

FC7

19

1910

10 55

33 1 1

P5 P4P6 P3 P2 P1


DilatedConv

Cascadefusion


Prediction layer



Upsampling

Feature fusion

1times1Conv




300

300

3

Image

38

38

512

Conv4_3 10

10

512

Conv8_2

5

5256

Conv9_2 3

3256

Conv10_2Conv11_2

Conv6(FC6)

Conv7(FC7)

19 19

19191024 1024 1

256

Detections8732

per Class

Non-M

aximum

Suppression



Conv3_1

75

75256

Dilatedconvolution

Conv4_3 38

38

38

38

512

38

38512

1024

α=2


Input imageConv4_3

Conv8_2

Conv9_2

Conv10_2

Conv11_2

FC7




(a) (b) (c)











wak Sk

ar

radic

hak

Skar

radic

⎧⎪⎪⎪⎪⎨

⎪⎪⎪⎪⎩

(4)


prime SkSk+1



SkSk+1




SkSk+1



ar



1ar

1113971


1ar


⎧⎪⎪⎪⎪⎪⎪⎪⎨

⎪⎪⎪⎪⎪⎪⎪⎩

(5)












3times3Conv

3times3Conv

Feature map

class

box

(a)

3times3Conv

3times3Conv

Feature map

1024

1

1

1

1

1

1

256 256

1

1256

Eltw Sum

class

box

(b)





Wiprime

1Ni

1113944 wi

Hiprime

1Ni

1113944 hi

(7)









0 1 2 3 4 5 6 7 8 9 10 11k

055

060

065

070

075

080

085

aver

age I

oU





L(x c l g) 1N

Lcls(x c) + αLloc(x l g)( 1113857 (8)




and xpij 0 1 When x




iisind1

1113944

misin cxxywh

xkijsmoothL1 l

mi minus 1113954g

mj1113872 1113873 (9)

where

smoothL1(x) 05x

2|x|lt 1


⎧⎨

⎩ (10)




i isin d1

xp

ijlog 1113954cp

i1113872 1113873 minus 1113944iisind2

log 1113954c0i1113872 1113873 (11)


pi )1113936pexp(c










trai

nlo

ss

Epoch

5

10

15

20

25

01000 2000 3000 4000 5000 60000 7000








IoU Gt capDr

Gt cupDr

(12)



precision TP

TP + FP

recall TP

TP + FN

(13)


(a) (b)

(c) (d)

(e) (f )



AP 11139461

0P(R)dR (14)


mAP 1113936

ni1 APi

n (15)
















0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10





(a) (b)

(c) (d)

(e) (f )





(a) (b)

(c) (d)






0691 0832 0893

(a) (b) (c)

(d) (e) (f )

(g) (h) (i)

Figure 14 Continued


5 Conclusion


Data Availability




Acknowledgments


References
















(j) (k) (l)












































r 2(α2)+2minus 11113960 1113961 times 2(α2)+2

minus 11113960 1113961 (2)








Conv3_1

75

75

Conv4_3 Conv8_2

Conv9_2Conv10_2

Conv11_2

38

38

FC7

19

1910

10 55

33 1 1

P5 P4P6 P3 P2 P1


DilatedConv

Cascadefusion


Prediction layer



Upsampling

Feature fusion

1times1Conv




300

300

3

Image

38

38

512

Conv4_3 10

10

512

Conv8_2

5

5256

Conv9_2 3

3256

Conv10_2Conv11_2

Conv6(FC6)

Conv7(FC7)

19 19

19191024 1024 1

256

Detections8732

per Class

Non-M

aximum

Suppression



Conv3_1

75

75256

Dilatedconvolution

Conv4_3 38

38

38

38

512

38

38512

1024

α=2


Input imageConv4_3

Conv8_2

Conv9_2

Conv10_2

Conv11_2

FC7




(a) (b) (c)











wak Sk

ar

radic

hak

Skar

radic

⎧⎪⎪⎪⎪⎨

⎪⎪⎪⎪⎩

(4)


prime SkSk+1



SkSk+1




SkSk+1



ar



1ar

1113971


1ar


⎧⎪⎪⎪⎪⎪⎪⎪⎨

⎪⎪⎪⎪⎪⎪⎪⎩

(5)












3times3Conv

3times3Conv

Feature map

class

box

(a)

3times3Conv

3times3Conv

Feature map

1024

1

1

1

1

1

1

256 256

1

1256

Eltw Sum

class

box

(b)





Wiprime

1Ni

1113944 wi

Hiprime

1Ni

1113944 hi

(7)









0 1 2 3 4 5 6 7 8 9 10 11k

055

060

065

070

075

080

085

aver

age I

oU





L(x c l g) 1N

Lcls(x c) + αLloc(x l g)( 1113857 (8)




and xpij 0 1 When x




iisind1

1113944

misin cxxywh

xkijsmoothL1 l

mi minus 1113954g

mj1113872 1113873 (9)

where

smoothL1(x) 05x

2|x|lt 1


⎧⎨

⎩ (10)




i isin d1

xp

ijlog 1113954cp

i1113872 1113873 minus 1113944iisind2

log 1113954c0i1113872 1113873 (11)


pi )1113936pexp(c










trai

nlo

ss

Epoch

5

10

15

20

25

01000 2000 3000 4000 5000 60000 7000








IoU Gt capDr

Gt cupDr

(12)



precision TP

TP + FP

recall TP

TP + FN

(13)


(a) (b)

(c) (d)

(e) (f )



AP 11139461

0P(R)dR (14)


mAP 1113936

ni1 APi

n (15)
















0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10





(a) (b)

(c) (d)

(e) (f )





(a) (b)

(c) (d)






0691 0832 0893

(a) (b) (c)

(d) (e) (f )

(g) (h) (i)

Figure 14 Continued


5 Conclusion


Data Availability




Acknowledgments


References
















(j) (k) (l)





































Conv3_1

75

75

Conv4_3 Conv8_2

Conv9_2Conv10_2

Conv11_2

38

38

FC7

19

1910

10 55

33 1 1

P5 P4P6 P3 P2 P1


DilatedConv

Cascadefusion


Prediction layer



Upsampling

Feature fusion

1times1Conv




300

300

3

Image

38

38

512

Conv4_3 10

10

512

Conv8_2

5

5256

Conv9_2 3

3256

Conv10_2Conv11_2

Conv6(FC6)

Conv7(FC7)

19 19

19191024 1024 1

256

Detections8732

per Class

Non-M

aximum

Suppression



Conv3_1

75

75256

Dilatedconvolution

Conv4_3 38

38

38

38

512

38

38512

1024

α=2


Input imageConv4_3

Conv8_2

Conv9_2

Conv10_2

Conv11_2

FC7




(a) (b) (c)











wak Sk

ar

radic

hak

Skar

radic

⎧⎪⎪⎪⎪⎨

⎪⎪⎪⎪⎩

(4)


prime SkSk+1



SkSk+1




SkSk+1



ar



1ar

1113971


1ar


⎧⎪⎪⎪⎪⎪⎪⎪⎨

⎪⎪⎪⎪⎪⎪⎪⎩

(5)












3times3Conv

3times3Conv

Feature map

class

box

(a)

3times3Conv

3times3Conv

Feature map

1024

1

1

1

1

1

1

256 256

1

1256

Eltw Sum

class

box

(b)





Wiprime

1Ni

1113944 wi

Hiprime

1Ni

1113944 hi

(7)









0 1 2 3 4 5 6 7 8 9 10 11k

055

060

065

070

075

080

085

aver

age I

oU





L(x c l g) 1N

Lcls(x c) + αLloc(x l g)( 1113857 (8)




and xpij 0 1 When x




iisind1

1113944

misin cxxywh

xkijsmoothL1 l

mi minus 1113954g

mj1113872 1113873 (9)

where

smoothL1(x) 05x

2|x|lt 1


⎧⎨

⎩ (10)




i isin d1

xp

ijlog 1113954cp

i1113872 1113873 minus 1113944iisind2

log 1113954c0i1113872 1113873 (11)


pi )1113936pexp(c










trai

nlo

ss

Epoch

5

10

15

20

25

01000 2000 3000 4000 5000 60000 7000








IoU Gt capDr

Gt cupDr

(12)



precision TP

TP + FP

recall TP

TP + FN

(13)


(a) (b)

(c) (d)

(e) (f )



AP 11139461

0P(R)dR (14)


mAP 1113936

ni1 APi

n (15)
















0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10





(a) (b)

(c) (d)

(e) (f )





(a) (b)

(c) (d)






0691 0832 0893

(a) (b) (c)

(d) (e) (f )

(g) (h) (i)

Figure 14 Continued


5 Conclusion


Data Availability




Acknowledgments


References
















(j) (k) (l)

































Conv3_1

75

75256

Dilatedconvolution

Conv4_3 38

38

38

38

512

38

38512

1024

α=2


Input imageConv4_3

Conv8_2

Conv9_2

Conv10_2

Conv11_2

FC7




(a) (b) (c)











wak Sk

ar

radic

hak

Skar

radic

⎧⎪⎪⎪⎪⎨

⎪⎪⎪⎪⎩

(4)


prime SkSk+1



SkSk+1




SkSk+1



ar



1ar

1113971


1ar


⎧⎪⎪⎪⎪⎪⎪⎪⎨

⎪⎪⎪⎪⎪⎪⎪⎩

(5)












3times3Conv

3times3Conv

Feature map

class

box

(a)

3times3Conv

3times3Conv

Feature map

1024

1

1

1

1

1

1

256 256

1

1256

Eltw Sum

class

box

(b)





Wiprime

1Ni

1113944 wi

Hiprime

1Ni

1113944 hi

(7)









0 1 2 3 4 5 6 7 8 9 10 11k

055

060

065

070

075

080

085

aver

age I

oU





L(x c l g) 1N

Lcls(x c) + αLloc(x l g)( 1113857 (8)




and xpij 0 1 When x




iisind1

1113944

misin cxxywh

xkijsmoothL1 l

mi minus 1113954g

mj1113872 1113873 (9)

where

smoothL1(x) 05x

2|x|lt 1


⎧⎨

⎩ (10)




i isin d1

xp

ijlog 1113954cp

i1113872 1113873 minus 1113944iisind2

log 1113954c0i1113872 1113873 (11)


pi )1113936pexp(c










trai

nlo

ss

Epoch

5

10

15

20

25

01000 2000 3000 4000 5000 60000 7000








IoU Gt capDr

Gt cupDr

(12)



precision TP

TP + FP

recall TP

TP + FN

(13)


(a) (b)

(c) (d)

(e) (f )



AP 11139461

0P(R)dR (14)


mAP 1113936

ni1 APi

n (15)
















0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10





(a) (b)

(c) (d)

(e) (f )





(a) (b)

(c) (d)






0691 0832 0893

(a) (b) (c)

(d) (e) (f )

(g) (h) (i)

Figure 14 Continued


5 Conclusion


Data Availability




Acknowledgments


References
















(j) (k) (l)









































wak Sk

ar

radic

hak

Skar

radic

⎧⎪⎪⎪⎪⎨

⎪⎪⎪⎪⎩

(4)


prime SkSk+1



SkSk+1




SkSk+1



ar



1ar

1113971


1ar


⎧⎪⎪⎪⎪⎪⎪⎪⎨

⎪⎪⎪⎪⎪⎪⎪⎩

(5)












3times3Conv

3times3Conv

Feature map

class

box

(a)

3times3Conv

3times3Conv

Feature map

1024

1

1

1

1

1

1

256 256

1

1256

Eltw Sum

class

box

(b)





Wiprime

1Ni

1113944 wi

Hiprime

1Ni

1113944 hi

(7)









0 1 2 3 4 5 6 7 8 9 10 11k

055

060

065

070

075

080

085

aver

age I

oU





L(x c l g) 1N

Lcls(x c) + αLloc(x l g)( 1113857 (8)




and xpij 0 1 When x




iisind1

1113944

misin cxxywh

xkijsmoothL1 l

mi minus 1113954g

mj1113872 1113873 (9)

where

smoothL1(x) 05x

2|x|lt 1


⎧⎨

⎩ (10)




i isin d1

xp

ijlog 1113954cp

i1113872 1113873 minus 1113944iisind2

log 1113954c0i1113872 1113873 (11)


pi )1113936pexp(c










trai

nlo

ss

Epoch

5

10

15

20

25

01000 2000 3000 4000 5000 60000 7000








IoU Gt capDr

Gt cupDr

(12)



precision TP

TP + FP

recall TP

TP + FN

(13)


(a) (b)

(c) (d)

(e) (f )



AP 11139461

0P(R)dR (14)


mAP 1113936

ni1 APi

n (15)
















0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10





(a) (b)

(c) (d)

(e) (f )





(a) (b)

(c) (d)






0691 0832 0893

(a) (b) (c)

(d) (e) (f )

(g) (h) (i)

Figure 14 Continued


5 Conclusion


Data Availability




Acknowledgments


References
















(j) (k) (l)









































3times3Conv

3times3Conv

Feature map

class

box

(a)

3times3Conv

3times3Conv

Feature map

1024

1

1

1

1

1

1

256 256

1

1256

Eltw Sum

class

box

(b)





Wiprime

1Ni

1113944 wi

Hiprime

1Ni

1113944 hi

(7)









0 1 2 3 4 5 6 7 8 9 10 11k

055

060

065

070

075

080

085

aver

age I

oU





L(x c l g) 1N

Lcls(x c) + αLloc(x l g)( 1113857 (8)




and xpij 0 1 When x




iisind1

1113944

misin cxxywh

xkijsmoothL1 l

mi minus 1113954g

mj1113872 1113873 (9)

where

smoothL1(x) 05x

2|x|lt 1


⎧⎨

⎩ (10)




i isin d1

xp

ijlog 1113954cp

i1113872 1113873 minus 1113944iisind2

log 1113954c0i1113872 1113873 (11)


pi )1113936pexp(c










trai

nlo

ss

Epoch

5

10

15

20

25

01000 2000 3000 4000 5000 60000 7000








IoU Gt capDr

Gt cupDr

(12)



precision TP

TP + FP

recall TP

TP + FN

(13)


(a) (b)

(c) (d)

(e) (f )



AP 11139461

0P(R)dR (14)


mAP 1113936

ni1 APi

n (15)
















0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10





(a) (b)

(c) (d)

(e) (f )





(a) (b)

(c) (d)






0691 0832 0893

(a) (b) (c)

(d) (e) (f )

(g) (h) (i)

Figure 14 Continued


5 Conclusion


Data Availability




Acknowledgments


References
















(j) (k) (l)

































Wiprime

1Ni

1113944 wi

Hiprime

1Ni

1113944 hi

(7)









0 1 2 3 4 5 6 7 8 9 10 11k

055

060

065

070

075

080

085

aver

age I

oU





L(x c l g) 1N

Lcls(x c) + αLloc(x l g)( 1113857 (8)




and xpij 0 1 When x




iisind1

1113944

misin cxxywh

xkijsmoothL1 l

mi minus 1113954g

mj1113872 1113873 (9)

where

smoothL1(x) 05x

2|x|lt 1


⎧⎨

⎩ (10)




i isin d1

xp

ijlog 1113954cp

i1113872 1113873 minus 1113944iisind2

log 1113954c0i1113872 1113873 (11)


pi )1113936pexp(c










trai

nlo

ss

Epoch

5

10

15

20

25

01000 2000 3000 4000 5000 60000 7000








IoU Gt capDr

Gt cupDr

(12)



precision TP

TP + FP

recall TP

TP + FN

(13)


(a) (b)

(c) (d)

(e) (f )



AP 11139461

0P(R)dR (14)


mAP 1113936

ni1 APi

n (15)
















0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10





(a) (b)

(c) (d)

(e) (f )





(a) (b)

(c) (d)






0691 0832 0893

(a) (b) (c)

(d) (e) (f )

(g) (h) (i)

Figure 14 Continued


5 Conclusion


Data Availability




Acknowledgments


References
















(j) (k) (l)



































L(x c l g) 1N

Lcls(x c) + αLloc(x l g)( 1113857 (8)




and xpij 0 1 When x




iisind1

1113944

misin cxxywh

xkijsmoothL1 l

mi minus 1113954g

mj1113872 1113873 (9)

where

smoothL1(x) 05x

2|x|lt 1


⎧⎨

⎩ (10)




i isin d1

xp

ijlog 1113954cp

i1113872 1113873 minus 1113944iisind2

log 1113954c0i1113872 1113873 (11)


pi )1113936pexp(c










trai

nlo

ss

Epoch

5

10

15

20

25

01000 2000 3000 4000 5000 60000 7000








IoU Gt capDr

Gt cupDr

(12)



precision TP

TP + FP

recall TP

TP + FN

(13)


(a) (b)

(c) (d)

(e) (f )



AP 11139461

0P(R)dR (14)


mAP 1113936

ni1 APi

n (15)
















0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10





(a) (b)

(c) (d)

(e) (f )





(a) (b)

(c) (d)






0691 0832 0893

(a) (b) (c)

(d) (e) (f )

(g) (h) (i)

Figure 14 Continued


5 Conclusion


Data Availability




Acknowledgments


References
















(j) (k) (l)





































IoU Gt capDr

Gt cupDr

(12)



precision TP

TP + FP

recall TP

TP + FN

(13)


(a) (b)

(c) (d)

(e) (f )



AP 11139461

0P(R)dR (14)


mAP 1113936

ni1 APi

n (15)
















0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10





(a) (b)

(c) (d)

(e) (f )





(a) (b)

(c) (d)






0691 0832 0893

(a) (b) (c)

(d) (e) (f )

(g) (h) (i)

Figure 14 Continued


5 Conclusion


Data Availability




Acknowledgments


References
















(j) (k) (l)

































AP 11139461

0P(R)dR (14)


mAP 1113936

ni1 APi

n (15)
















0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10





(a) (b)

(c) (d)

(e) (f )





(a) (b)

(c) (d)






0691 0832 0893

(a) (b) (c)

(d) (e) (f )

(g) (h) (i)

Figure 14 Continued


5 Conclusion


Data Availability




Acknowledgments


References
















(j) (k) (l)



































0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10

0000

02

02

04

04

06

06

08

08

10Recall

Prec

ision


10





(a) (b)

(c) (d)

(e) (f )





(a) (b)

(c) (d)






0691 0832 0893

(a) (b) (c)

(d) (e) (f )

(g) (h) (i)

Figure 14 Continued


5 Conclusion


Data Availability




Acknowledgments


References
















(j) (k) (l)



































(a) (b)

(c) (d)

(e) (f )





(a) (b)

(c) (d)






0691 0832 0893

(a) (b) (c)

(d) (e) (f )

(g) (h) (i)

Figure 14 Continued


5 Conclusion


Data Availability




Acknowledgments


References
















(j) (k) (l)

































(a) (b)

(c) (d)






0691 0832 0893

(a) (b) (c)

(d) (e) (f )

(g) (h) (i)

Figure 14 Continued


5 Conclusion


Data Availability




Acknowledgments


References
















(j) (k) (l)

































5 Conclusion


Data Availability




Acknowledgments


References
















(j) (k) (l)































































Documents

NSD-SSD:ANovelReal-TimeShipDetectorBasedon