

Cross-Sensor Adversarial Domain Adaptation of Landsat-8 and Proba-V images for Cloud Detection

Gonzalo Mateo-García, Valero Laparra, Dan López-Puigdollers, Luis Gómez-Chova, Senior Member, IEEE

Abstract—The number of Earth observation satellites carrying optical sensors with similar characteristics is constantly growing. Despite their similarities and the potential synergies among them, derived satellite products are often developed for each sensor independently. Differences in retrieved radiances lead to significant drops in accuracy, which hampers knowledge and information sharing across sensors. This is particularly harmful for machine learning algorithms, since gathering new ground truth data to train models for each sensor is costly and requires experienced manpower. In this work, we propose a domain adaptation transformation to reduce the statistical differences between images of two satellite sensors in order to boost the performance of transfer learning models. The proposed methodology is based on the Cycle Consistent Generative Adversarial Domain Adaptation (CyCADA) framework that trains the transformation model in an unpaired manner. In particular, Landsat-8 and Proba-V satellites, which present different but compatible spatio-spectral characteristics, are used to illustrate the method. The obtained transformation significantly reduces differences between the image datasets while preserving the spatial and spectral information of adapted images, which is hence useful for any general purpose cross-sensor application. In addition, the training of the proposed adversarial domain adaptation model can be modified to improve the performance in a specific remote sensing application, such as cloud detection, by including a dedicated term in the cost function. Results show that, when the proposed transformation is applied, cloud detection models trained in Landsat-8 data increase cloud detection accuracy in Proba-V.

I. Introduction

Over the last decade, the number of both private and public satellite missions for Earth observation has exploded. According to the UCS satellite database [1], there are currently around 500 orbiting satellites carrying passive optical sensors (multispectral or hyperspectral). Whereas each sensor is to some extent unique, in many cases there are only small differences between them, such as slightly different spectral responses, different ground sampling distances, or the different inherent noise of the instruments. Nevertheless, derived products from those images are currently tailored to each particular sensor, since models designed for one sensor often transfer poorly

Manuscript received June 2020.

GMG, VL, DLP and LGC are with the Image Processing Laboratory (IPL), University of Valencia, Spain. Web: http://isp.uv.es. E-mail: {gonzalo.mateo-garcia,valero.laparra,dan.lopez,luis.gomez-chova}@uv.es.

This work has been partially supported by the Spanish Ministry of Economy and Competitiveness (MINECO, TEC2016-77741-R, ERDF) and the European Space Agency (ESA IDEAS+ research grant, CCN008).

[Figure panels: Landsat-8 upscaled (left) vs. Proba-V image (right); acquisition times 17:29/18:33, 19:18/20:51 and 20:42/20:09 UTC.]

Fig. 1. Close-in-time acquisitions of Landsat-8 and Proba-V satellites. The Landsat-8 image is transformed and upscaled to resemble the optical characteristics of the Proba-V sensor; however, differences in radiometry and texture between images still remain. First row: Missouri river in North America (2016-04-20). Second and third row: North West Pacific coast, North America (2016-05-20 and 2016-04-20).

to a different one due to those differences [2]. In order to transfer products across sensors, we need to ensure that the underlying data distribution does not change from one sensor to the other. In machine learning this is a long-standing problem called data shift [3]: differences between the training and testing dataset distributions yield significant drops in performance. In order to address this problem, the field of domain adaptation (DA) proposes to build a transformation between the different distribution domains such that, when images are transformed from one source domain to another target domain, the distribution shift is reduced.

In this work we focus on the problem where the training (source) distribution corresponds to images and ground truth from one satellite, whereas the testing (target) distribution corresponds to images from another sensor. Notice that this is a very broad scenario that is frequently found in Remote Sensing (RS). Examples of products built in this manner include cloud masks [4], [5], but also land

arXiv:2006.05923v1 [eess.IV] 10 Jun 2020


use and land cover classification [6], vegetation index retrieval [7], or crop yield estimation [8]. The goal of domain adaptation is thus to find a transformation that allows models working on a given satellite (source domain) to work accurately on another one (target domain).

As a representative case study, in this work we focus on the Landsat-8 [9] and Proba-V [10] satellites. Transfer learning across these two sensors is particularly interesting since Landsat is a pioneering RS satellite program with a strong and well-established community and, hence, a good number of manually annotated datasets with cloud masks are publicly available, which could be very valuable to develop cloud detection models for Proba-V. Nevertheless, in order to build a model for Proba-V using Landsat-8 training data, differences in the imaging instruments on board Landsat-8 and Proba-V must be taken into account. On the one hand, the Operational Land Imager (OLI) instrument, on board Landsat-8, measures radiance in 9 bands in the visible and infrared part of the electromagnetic spectrum at 30m resolution. On the other hand, the Proba-V instrument retrieves 4 bands in the blue, red, near-infrared and short-wave infrared at 333m resolution (full swath). Compared to Landsat-8, Proba-V has a much wider swath, which yields a more frequent revisit time of around two days, whereas the Landsat-8 revisit time ranges between 7 and 14 days. Figure 1 shows three calibrated top-of-atmosphere (TOA) reflectance Landsat-8 and Proba-V images from the same location acquired with a difference of 30 to 90 minutes. We can see that, despite showing the same bands for Landsat-8 and Proba-V, and upscaling the Landsat-8 image to the Proba-V resolution (cf. section III-A), differences between the images still remain [11]. In particular, Proba-V images are more blueish, due to saturation effects in the blue channel, and are noisier [12]. This higher noise and lower spatial resolution can be appreciated, for example, in the bottom row of Fig. 1, which shows a mountainous area in British Columbia where details in the Landsat-8 image are sharper than in the Proba-V one.
Hence, products built using data in the Landsat-8 domain, such as the cloud detection model proposed in [5], show a drop in detection accuracy when directly applied to Proba-V images.

In this work, we propose a DA methodology that brings the state-of-the-art computer vision DA work of Hoffman et al., CyCADA [13], to the remote sensing field. In particular, we propose to use cycle-consistent generative adversarial networks (CycleGAN [14]) to find a DA transformation from the Proba-V domain to the Landsat-8 upscaled domain that removes the noise and saturation of Proba-V images while preventing artifacts in the resulting images. One of the main advantages of the proposed methodology is that it is unpaired, i.e. it does not require a paired dataset of simultaneous and collocated Landsat-8 and Proba-V images to be trained. This is crucial for applications such as cloud detection, since cloud presence and location vary greatly between acquisitions. This can also be seen in Fig. 1: even though the acquisitions are the closest possible in time for Landsat-8 and Proba-V, cloud location changes significantly between them. This would make a paired approach unfeasible. Following

the proposed methodology, Proba-V images are enhanced and are shown to be statistically more similar to the Landsat-8 upscaled ones. In addition, from a transfer learning perspective, it is important to remark that the cloud detection models applied to Proba-V are trained using only Landsat-8 images and their ground truth. In this context, a boost in cloud detection accuracy is shown for Proba-V when the proposed adversarial domain adaptation transformation is applied.

The rest of the paper is organized as follows: in section II, we discuss related work in cloud detection and domain adaptation in RS; in section III, we detail the proposed methodology to upscale Landsat-8 images and to train the domain adaptation network; section IV describes the Proba-V and Landsat-8 datasets on which the experiments are carried out. Finally, sections V and VI contain the experimental results and discussion and the conclusions, respectively.

II. Related work

There is a huge amount of work in both cloud detection and domain adaptation in the remote sensing literature. We discuss related work in both fields with a particular focus on approaches that deal with data coming from different sensors.

A. Transfer learning for Cloud detection

Cloud detection has lately been dominated by deep learning approaches where, given a sufficiently large corpus of manually annotated images with the corresponding ground truth cloud masks, a network based on spatial convolutions is trained by back-propagation. Fully convolutional neural networks (FCNN) [15], most of them based on the U-Net architecture [16], produce very accurate results and have the advantage that they can be applied to images of arbitrary size with a fast inference time. Works such as Jeppesen et al. [17], Mohajerani and Saeedi [18], [19], Li et al. [20], or Yang et al. [21] tackle cloud detection in Landsat-8 using fully convolutional neural networks trained on publicly available manually annotated datasets. They all show very high cloud detection accuracy, outperforming the operational Landsat-8 cloud detection algorithm, FMask [22]. Hence, our work seeks to transfer those accurate cloud detection models to other satellite data with a minimal drop in performance. There are some very recent works that propose to transfer a FCNN cloud detection model between different sensors. For instance, in Wieland et al. [23], a FCNN is trained with contrast and brightness data augmentation on the Landsat-8 SPARCS dataset [24] and is tested on Sentinel-2, Landsat-7 and Landsat-8 images. Results show similar performance of the model on the three sensors, which suggests that the Sentinel-2 and Landsat sensors are very similar and thus the data shift problem is not so relevant. On the other hand, in the work of Segal et al. [25], a FCNN is trained on a manually annotated collection of World-View-2 images over the Fiji islands and tested on both World-View-2 and Sentinel-2 imagery. In this case, a significant drop in performance is observed in the Sentinel-2 domain; in


order to correct this gap, the authors propose a simple domain adversarial method which obtains good results but is still far from the accuracy obtained on World-View-2 data. The work of Shendryk et al. [26] also proposes transfer learning, in this case using PlanetScope and Sentinel-2 imagery. The proposed network classifies patches as cloudy or clear instead of providing a full segmentation mask at pixel level. Nevertheless, a small gap in performance is observed between results in the source domain (PlanetScope) and the target domain (Sentinel-2). Finally, in our previous work [5], we showed that transfer learning from Proba-V to Landsat-8 and from Landsat-8 to Proba-V produces accurate results on a par with the FMask [22] model on Landsat-8 and surpassing the operational cloud detection model [27] for Proba-V, respectively. However, a significant gap between transfer learning approaches and state-of-the-art models trained with data from the same domain still exists, which is the focus of the present work.

B. Domain adaptation

The remote sensing community has traditionally addressed DA taking advantage of a deep understanding of the imaging sensors and exploiting their physical and optical characteristics to provide well-calibrated products [28]. Despite the efforts to provide a good calibration, small differences are found in retrieved radiance values due to different spectral response functions, saturation effects, mixed pixels, etc. [11], [29], [30]. To this end, several works propose sensor inter-calibrations or harmonizations to correct biases between the bands of different sensors [31]–[35]. These approaches train models on paired data using either real spectra of both satellites (retrieved at the same location and close in time) or simulated radiances. As mentioned earlier, paired approaches cannot be applied to cloud detection due to the extreme variability of cloud location between acquisitions. Among unpaired approaches, histogram matching [36] aligns the observed radiance distributions of both satellites using the cumulative density function of the data. Histogram matching is fast and reliable and thus we use it as a baseline against which to compare our method. More complex unpaired methods include multivariate matching [37], graph matching [38] or manifold alignment [39].
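The histogram-matching baseline can be sketched per spectral band using the empirical CDFs of the two sensors. The following is a minimal NumPy sketch of this generic technique, not the exact implementation used in [36]:

```python
import numpy as np

def histogram_match(source, reference):
    """Map `source` values so their empirical CDF matches `reference`.

    Both inputs are 1-D arrays of TOA reflectances from one spectral band;
    the matching is applied independently per band.
    """
    src_vals, src_idx, src_counts = np.unique(
        source, return_inverse=True, return_counts=True)
    # Empirical CDF position of each distinct source value.
    src_cdf = np.cumsum(src_counts) / source.size
    # Invert the reference CDF at those quantiles.
    ref_sorted = np.sort(reference)
    ref_quantiles = np.linspace(0.0, 1.0, reference.size)
    matched_vals = np.interp(src_cdf, ref_quantiles, ref_sorted)
    return matched_vals[src_idx]
```

After matching, the transformed source band has approximately the same radiance distribution as the reference band, which is what makes it a fast unpaired baseline: no simultaneous, collocated acquisitions are needed.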

All the methods discussed so far focus only on the spectral information of images, disregarding the spatial dimension. In order to account for spatial changes, recent works propose DA using convolutional neural networks (CNN). Most of these works use Generative Adversarial Networks (GAN) [40] to align source and target distributions. Those works can be divided into feature-level DA and pixel-level DA. In feature-level or discriminative DA [41], [42], the model to be transferred is jointly trained with its normal loss and to make its internal representations (i.e. activations at some layers) invariant to the input distribution. Hence, feature-level DA requires retraining the model with that extra penalty. An example of feature-level DA is the work of Segal et al. [25] previously discussed in section II-A. In pixel-level DA [43], [44] (also called image-to-image DA), extra networks are trained to transform images between domains. Hence, pixel-level DA is independent of the transferred model and thus it could

Fig. 2. Transfer learning and adaptation scheme: Landsat-8 and Proba-V datasets and how they are transformed between the three different domains. The transformations look for adaptation between the domains: U is the upscaling transformation applied to Landsat-8 to resemble the Proba-V instrument characteristics (sec. III-A); and A adapts from the Proba-V domain to the Landsat-8 upscaled domain (sec. III-C).

be applied to other problems with the same inputs. Our work falls into the pixel-level DA framework: we assume that the cloud detection model trained in the source domain is fixed and thus we focus on finding a DA transformation from the target to the source domain (section III).

Regarding the definition and the types of domains in remote sensing, works such as [45]–[48] consider data from a single sensor, where the source and target domains are represented by images from different locations or different acquisition times. DA works where domains are represented by different sensors are scarce; for instance, the work of Benjdira et al. [49] tackles urban segmentation in aerial imagery of two cities acquired with two different cameras. They obtain good results even though differences in the spectral bands and spatial resolution of the instruments are not taken into account. Even though there are some works using GANs to transfer learning between different sensors, most of them involve training the classifiers using some labeled samples from the target domain. This is the case of works that tackle DA between SAR and optical images [50]–[52]. It is important to remark that we are dealing with unsupervised domain adaptation [2], [41] (also known as transductive transfer learning [53]), which assumes there is no labeled data available in the target domain.

III. Methodology

We assume two independent datasets from two different sensors are given, but we only have labels for one dataset. The main idea is to be able to use the data from the labeled dataset in order to design algorithms that solve problems in the unlabeled dataset.

In our particular case, we have images for Landsat-8 (L8) with the corresponding ground truth cloud masks (binary labels identifying cloudy or clear pixels), {XL8, yL8}; and we only have Proba-V (PV) images without ground truth, XPV. Therefore, we want to perform cloud detection on XPV using algorithms trained with {XL8, yL8}. Since we know the technical specifications of Landsat-8 and Proba-V, we can design an upscaling algorithm to convert images captured by Landsat-8 to resemble Proba-V spectral and spatial characteristics. This upscaling could work quite well, and actually classical remote sensing


approaches follow this methodology to combine or perform transfer learning across different existing satellites. However, this upscaling transformation is not perfect, since it is based on the pre-launch characterization of the instruments and is always susceptible to being affected by diverse uncertainty sources. Therefore, an extra adaptation step could be used in order to transform the Proba-V images before applying the transfer learning algorithms. In this work, we are going to explore how to design this extra step by using Generative Adversarial Networks (GANs).

In Fig. 2, we show the proposed adaptation scheme. The upscaling transformation, U, converts the Landsat-8 labeled data to a domain where it has similar spatio-spectral properties to Proba-V (i.e. the same number of bands and same spatial resolution). We can use the Landsat-8 upscaled (LU) data in order to train an algorithm; in our case we are going to design a cloud detection algorithm using fully convolutional neural networks. While this model could be applied directly to Proba-V images, we will show that an extra adaptation step, A, applied to the Proba-V images to make them resemble the upscaled domain even more, can improve the similarity between the training and the testing data and therefore improve the performance of the cloud detection algorithm. Note that, if no adaptation is used, A is the identity function. In the following subsections, we detail both the transformation U, based on the instrument characteristics, and the transformation A, which is based on CycleGANs.
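In code, the scheme of Fig. 2 is simply a composition of functions. The sketch below uses placeholder stand-ins for A and the cloud detection model; all names here are illustrative, not taken from the released implementation:

```python
import numpy as np

def identity_A(x_pv):
    """No adaptation: A is the identity function."""
    return x_pv

def predict_clouds(x_pv, f_lu, adapt=identity_A):
    """Cloud mask for a Proba-V image using a detector trained on LU data."""
    return f_lu(adapt(x_pv))

# Stand-in detector: thresholds the mean reflectance over the 4 bands.
toy_f_lu = lambda x: x.mean(axis=-1) > 0.5
```

Training the adaptation A (sec. III-C) only changes the `adapt` argument; the detector `f_lu` itself stays frozen, which is what makes pixel-level DA independent of the transferred model.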

A. Upscaling transformation from Landsat-8 to Proba-V

In this section, we describe the upscaling transformation (U transformation in Fig. 2). In a standard transfer learning approach from one sensor to another, the first step is to transform Landsat-8 images to resemble Proba-V images in terms of spectral and spatial properties. Figure 3 shows the proposed upscaling from Landsat-8 to Proba-V using the instrumental characteristics of both sensors. First, we select the spectral bands from Landsat-8 that overlap with Proba-V in terms of the spectral response functions (SRF) of both instruments. Figure 4 shows the SRF of overlapping bands in Landsat-8 (dashed) and Proba-V (solid). The SWIR and RED bands present the best agreement. However, the NIR SRF of Proba-V is wider and its peak is not aligned with the Landsat-8 B5 band, which might lead to differences in the retrieved radiance. Finally, the BLUE band of Proba-V overlaps with two different Landsat-8 bands. Therefore, the contribution of the Landsat-8 B1 and B2 bands is weighted according to the overlapping area of the SRFs: 25% and 75% for B1 and B2, respectively.
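The spectral selection above amounts to picking Landsat-8 B4, B5 and B6 directly and blending B1/B2 for the blue channel. A sketch with the weights stated in the text (the function name is ours):

```python
import numpy as np

def probav_like_bands(b1, b2, b4, b5, b6):
    """Combine Landsat-8 bands into Proba-V-like BLUE/RED/NIR/SWIR order.

    Each argument is a 2-D array of TOA reflectances at the same resolution.
    BLUE mixes B1 and B2 by SRF overlap: 25% B1 + 75% B2.
    """
    blue = 0.25 * b1 + 0.75 * b2
    return np.stack([blue, b4, b5, b6], axis=-1)
```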

Then, the selected bands of the Landsat-8 image are scaled to match the spatial properties of Proba-V. The 30m resolution Landsat-8 bands are upscaled to the 333m resolution of Proba-V. This upscaling takes into account the optical characteristics of the Proba-V sensor and the re-sampling of the 333m product described in [10]. First, the point spread function (PSF) of each Proba-V spectral band is used to convert the Landsat-8 observations to the nominal Proba-V spatial resolution at nadir. The ground sampling distance (GSD) for the Proba-V center camera is about 96.9m for the BLUE, RED and NIR channels, while the SWIR center camera resolution is 184.7m [10]. The SWIR PSF is about twice as wide as the PSF of the other bands, which stresses the fact that a distinct spatial adaptation might be applied to each band. The PSFs of the bands are modeled as 2-dimensional Gaussian filters, which are applied to the 30m resolution Landsat-8 bands. The filtered image is upscaled to the nominal 90m resolution at nadir by taking 1 out of every 3 pixels. Finally, Lanczos interpolation is applied to upscale the image to the final 333m Proba-V resolution. Notice that Lanczos is the interpolation method used in the Proba-V ground segment processing to upscale the acquired raw Proba-V data to the 333m Plate Carrée grid [10]. Ground truth labels, yL8 at 30m, were also scaled to get a Landsat-8 upscaled dataset at 333m: {XLU, yLU}.

B. Transfer Cloud Detection Model

The cloud detection model trained on the Landsat-8 upscaled dataset is a FCNN classifier based on the simplified U-Net architecture described in [5]. This model is trained on the Landsat-8 upscaled dataset ({XLU, yLU}); hence, it takes as input a 4-band 333m resolution image and produces a binary cloud mask. Therefore, it could be applied directly to Proba-V images. Nevertheless, as explained before, statistical differences between Landsat-8 upscaled and Proba-V images mean that the performance of this model is not as good as expected. This effect is related to the different sensor spectral response functions, saturation effects, radiometric calibration, modulation transfer functions, or mixed pixels. For instance, as shown in Fig. 1, Proba-V contains many saturated pixels, especially in the blue channel, which is a known issue. This suggests that an extra domain adaptation step could be added to improve the transfer learning results reported in [5].

C. Generative Adversarial Domain Adaptation

In this section, we describe the training process for the extra adaptation transformation (A in Fig. 2) that we propose to improve the performance of the transferred models. This training process is based on the Generative Adversarial Network (GAN) [40] framework.

The main idea of GANs is to simultaneously train two networks, a generative one and a discriminative one, with opposite objectives. This adversarial training fits a data generator that minimizes the Jensen-Shannon divergence between the real and the generated data distributions. An extension of the original GAN formulation, the conditional GAN [54], was proposed to train a model that generates samples from a conditional distribution. One application of conditional GANs is the Generative Adversarial Domain Adaptation (GADA) proposed in [13], [42], [44]. In those works, the conditional GAN formulation was modified to solve domain adaptation problems.

Probably the most complete approach to domain adaptation based on GANs is the one proposed in CyCADA [13]. Unlike classical GANs, where adaptation is performed in one direction only, this approach proposes


[Figure: pipeline from Landsat-8 (11 spectral bands, 30m spatial resolution) through a Spectral Transformation (SRF), a Spatial Transformation (PSF) yielding 4 spectral bands at 90m spatial resolution, and a Spatial Interpolation (Lanczos) to the Landsat-8 upscaled product (4 spectral bands, 333m spatial resolution), matching the Proba-V characteristics.]

Fig. 3. Upscaling transformation (U in Fig. 2) applied to Landsat-8 in order to resemble the Proba-V instrument characteristics.

[Figure: spectral response functions over 450–1650 nm of Proba-V BLUE, RED, NIR and SWIR (solid) and Landsat-8 B1 Coastal Aerosol, B2 Blue, B4 Red, B5 NIR and B6 SWIR1 (dashed).]

Fig. 4. Spectral response of Landsat-8 and Proba-V channels.

a double simultaneous adaptation between the two domains. This makes it possible to include several consistency terms in order to impose restrictions on the two adaptation directions. Our approach has a similar structure to CyCADA (Fig. 5). It has two generators and two discriminators: GLU→PV, GPV→LU, DPV, DLU. We are interested in using GPV→LU to adapt the Proba-V images to better match the upscaled ones that have been used to train the cloud detection algorithm, i.e. A ≡ GPV→LU.

On the one hand, the discriminators are trained to minimize the binary cross-entropy loss between the real and the generated images:

$$\mathcal{L}_D(D_{LU}) = \sum_i -\log\big(D_{LU}(X^i_{LU})\big) - \log\big(1 - D_{LU}(G_{PV\to LU}(X^i_{PV}))\big)$$

$$\mathcal{L}_D(D_{PV}) = \sum_i -\log\big(D_{PV}(X^i_{PV})\big) - \log\big(1 - D_{PV}(G_{LU\to PV}(X^i_{LU}))\big)$$

On the other hand, the generators are trained to fool the discriminators by minimizing the adversarial loss:

$$\mathcal{L}_{GAN}(G_{PV\to LU}) = \sum_i -\log\big(D_{LU}(G_{PV\to LU}(X^i_{PV}))\big)$$

$$\mathcal{L}_{GAN}(G_{LU\to PV}) = \sum_i -\log\big(D_{PV}(G_{LU\to PV}(X^i_{LU}))\big)$$
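Numerically, both losses reduce to binary cross-entropy terms on discriminator outputs in (0, 1). A toy NumPy sketch of the two quantities above (in practice the sums run over mini-batches of image patches):

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    """L_D: push D towards 1 on real images and 0 on generated ones."""
    return np.sum(-np.log(d_real) - np.log(1.0 - d_fake))

def generator_adv_loss(d_fake):
    """L_GAN: the generator tries to make D output 1 on generated images."""
    return np.sum(-np.log(d_fake))
```

The non-saturating −log D(G(x)) generator form shown here follows the equations above; it gives stronger gradients early in training than maximizing log(1 − D(G(x))).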

In this work, in order to ensure consistency between the real and the generated images, three extra penalties are added to the standard GAN generator loss: the identity consistency loss, the cycle loss, and the segmentation consistency loss. Firstly, the identity consistency loss, introduced in our previous work [4], is added to make the TOA reflectances of the input similar to those of the output:

$$\mathcal{L}_{id}(G_{PV\to LU}) = \sum_i \big\|X^i_{PV} - G_{PV\to LU}(X^i_{PV})\big\|_1$$

$$\mathcal{L}_{id}(G_{LU\to PV}) = \sum_i \big\|X^i_{LU} - G_{LU\to PV}(X^i_{LU})\big\|_1$$

Secondly, the cycle consistency loss, proposed in [54], is added to both generators to force them to act approximately as inverse functions of each other:

$$\mathcal{L}_{cyc}(G_{PV\to LU}, G_{LU\to PV}) = \sum_i \big\|X^i_{PV} - G_{LU\to PV}(G_{PV\to LU}(X^i_{PV}))\big\|_1 + \sum_i \big\|X^i_{LU} - G_{PV\to LU}(G_{LU\to PV}(X^i_{LU}))\big\|_1$$

Finally, we additionally include a segmentation consistencyloss that takes advantage of the cloud detection modeltrained in the LU domain. We apply this model to both LUand PV images even though the cloud detection classifier(fLU) is trained using only LU images, the idea is that theLU classifier can act as a rough supervisor in the Proba-V domain. This approach is also taken in CyCADA [13].The selected semantic segmentation loss is the Kullback-Leibler divergence between the cloud probabilities of themodel fLU applied to real and generated images:

$$\mathcal{L}_{seg}(G_{PV\to LU}) = \sum_i \mathrm{KL}\big(f_{LU}(X^i_{PV}) \,\|\, f_{LU}(G_{PV\to LU}(X^i_{PV}))\big)$$

$$\mathcal{L}_{seg}(G_{LU\to PV}) = \sum_i \mathrm{KL}\big(f_{LU}(X^i_{LU}) \,\|\, f_{LU}(G_{LU\to PV}(X^i_{LU}))\big)$$

Therefore, the final loss of each generator is the weighted sum of the four described losses:

$$\mathcal{L}(G_{PV\to LU}) = \lambda_{GAN}\mathcal{L}_{GAN}(G_{PV\to LU}) + \lambda_{id}\mathcal{L}_{id}(G_{PV\to LU}) + \lambda_{cyc}\mathcal{L}_{cyc}(G_{PV\to LU}, G_{LU\to PV}) + \lambda_{seg}\mathcal{L}_{seg}(G_{PV\to LU})$$

$$\mathcal{L}(G_{LU\to PV}) = \lambda_{GAN}\mathcal{L}_{GAN}(G_{LU\to PV}) + \lambda_{id}\mathcal{L}_{id}(G_{LU\to PV}) + \lambda_{cyc}\mathcal{L}_{cyc}(G_{PV\to LU}, G_{LU\to PV}) + \lambda_{seg}\mathcal{L}_{seg}(G_{LU\to PV})$$


Fig. 5. Scheme of the forward passes for the training procedure of the proposed cycle consistent adversarial domain adaptation method. The four networks (G_{PV→LU}, G_{LU→PV}, D_{PV}, D_{LU}) have a different color. Losses are depicted with circles and their fill color corresponds to the color of the network that they penalize.

The weight parameters are set to λ_cyc = λ_id = 5 and λ_seg = λ_GAN = 1, so that the losses are of the same magnitude. In addition, the two discriminators are regularized using a 0-centered gradient penalty with a weight of 10 [55]. In section V, we conduct several experiments setting some of these weights to zero in order to quantify the importance of each of these terms. Notice that, by setting some of these hyper-parameters to zero, we obtain different adversarial domain adaptation proposals from the literature. In particular, if we set λ_id = 0 we get the original CyCADA of Hoffman et al. [13]. By setting λ_cyc = λ_seg = 0 we get the approach of our previous work [4]. Finally, by setting λ_cyc = λ_seg = λ_id = 0, we get the classical GAN approach. Details about the training procedure and the particular network architectures of the generators G (fully convolutional neural networks) and discriminators D (convolutional neural networks) of the proposed adaptation model can be found in Appendix A. Additionally, the implemented code is available at https://github.com/IPL-UV/pvl8dagans.
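The 0-centered gradient penalty on real images [55] can be sketched in PyTorch as follows (an illustrative implementation under our naming; the released code may use a different framework):

```python
import torch

def r1_gradient_penalty(discriminator, x_real, weight=10.0):
    """Zero-centered (R1-style) gradient penalty evaluated on real images:
    penalizes the squared norm of the discriminator's input gradient,
    which regularizes GAN training."""
    x = x_real.detach().clone().requires_grad_(True)
    scores = discriminator(x)
    # Gradient of the summed scores w.r.t. the input images.
    grad, = torch.autograd.grad(scores.sum(), x, create_graph=True)
    per_sample = grad.pow(2).flatten(start_dim=1).sum(dim=1)
    return weight * 0.5 * per_sample.mean()
```

The penalty is added to each discriminator's loss, discouraging sharp decision boundaries around real samples.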

From both a methodological and an operational perspective, the proposed approach has an important benefit: it does not require simultaneous and collocated pairs of Landsat-8, X^i_{LU}, and Proba-V, X^i_{PV}, images. Having coincident pairs from sensors on different platforms would be impossible in our case. Note that cloud presence and location within an image vary highly even for small time differences. This problem prevents the use of other approaches such as Canonical Correlation Analysis [56] or directly learning a generic transformation from X^i_{PV} to X^i_{LU} [31].

IV. Manually Annotated Datasets

Transfer learning from Landsat-8 to Proba-V is supported by the fact that there are several open access datasets with manually labeled clouds for the Landsat-8 mission. In this work, we use three of them that have a large coverage of acquisitions across different dates, latitudes, and landscapes. The Biome dataset [57], released for the Landsat-8 validation study of Foga et al. [58], is the largest among them. It consists of 96 full acquisitions covering the different biomes on Earth. The SPARCS dataset, collected in the study of Hughes and Hayes [59], contains 80 1,000×1,000 patches from different Landsat-8 acquisitions. Finally, the 38-Clouds dataset of Mohajerani and Saeedi [18] has 38 full scenes mostly located in North


Fig. 6. Location of Landsat-8 and Proba-V products with manually annotated ground truth cloud masks: PV24 [5], Biome [57], 38-Clouds [18], and SPARCS [24]. For the Biome [57] and 38-Clouds [18] datasets we additionally downloaded Proba-V images from the same location and acquisition time if available.

America. Images and ground truth cloud masks from these datasets have been upscaled to match Proba-V spectral and spatial properties, following the procedure described in section III-A. Hence, when we refer to those datasets, we assume 4-band images and ground truth at 333 m. For testing the proposed cloud detection approach in Proba-V, since there are no publicly available datasets, we use the PV24 dataset created by the authors in [60] and extensively curated in [5]. The PV24 dataset contains 24 full Proba-V images at 333 m resolution and their corresponding manually annotated cloud masks. Figure 6 shows the locations of the products of the aforementioned datasets. It is important to remark that we only use data from the Biome dataset for training the cloud detection models based on FCNN (section III-B); the other three datasets (SPARCS, 38-Clouds, and PV24) are only used for testing the models.

On the other hand, to train the proposed domain adaptation model based on CycleGANs (section III-C), a set of 181 Proba-V products from the same locations and season as the Biome dataset has been selected. Using those Proba-V images and the Landsat-8 upscaled images from the Biome dataset, we created the Biome Proba-V pseudo-simultaneous dataset, which contains 37,310 pairs of patches of 64×64 pixels, used to train the proposed DA method. Notice that, in this dataset, the pairs of images, one coming from Proba-V and the other from Landsat-8, are from the same location and close-in-time acquisitions when available¹. The same approach is followed to create the 38-Clouds Proba-V pseudo-simultaneous dataset, which is only used for testing the domain adaptation results. Images of this dataset, together with the results of the proposed domain adaptation and cloud detection models, are available at https://isp.uv.es/projects/cloudsat/pvl8dagans.

Finally, it is important to point out that the images in this work are operational level-1TP products for Landsat-8 and level-2A products for Proba-V. These products have been pre-processed to top of atmosphere (TOA) reflectance for both Landsat-8 [9] and Proba-V [61].

¹Some images from the Biome dataset predate the beginning of the Proba-V mission catalog; in these cases, we use images from the same day of year in the following year.

V. Experimental Results

This section is divided in two parts. In the first one, we analyze the radiometric and spatial properties of the resulting images. The purpose is to assess the quality of the proposed transformation. We show that the proposed DA transformation produces reliable images (i.e., images without artifacts) that are statistically similar to Landsat-8 upscaled images. Additionally, we show that pixels flagged as good radiometric quality in Proba-V are changed much less by the proposed DA transformation. In the second part, we analyze the impact of DA on the cloud detection performance. Cloud detection results in the source domain (Landsat-8 upscaled) are compared with results in the target domain (Proba-V), with and without the DA transformation. In addition, we conduct an ablation study to show the relative importance of each of the proposed loss terms included to fit the generator network.

A. Domain Adaptation of Input Images

As explained in section IV, we trained the DA method described in section III-C using Proba-V and Landsat-8 upscaled patches from the Biome Proba-V pseudo-simultaneous dataset. The generator and discriminator networks are trained simultaneously with mini-batch stochastic gradient descent (see details in sec. A). Afterwards, the trained Proba-V to Landsat-8 upscaled generator G_{PV→LU} (A in Fig. 2) is evaluated on the 38-Clouds Proba-V pseudo-simultaneous dataset (i.e., the G_{PV→LU} network is applied to all Proba-V images in the dataset). Figure 7 shows the distribution of TOA reflectance values for each of the bands without the domain adaptation step (green) and after applying G_{PV→LU} (orange) for all the Proba-V images in the dataset. One of the interesting results from these distributions is that the characteristic saturation in the Blue and Red bands of Proba-V disappears in the adapted images. In addition, the shape of the distribution of the adapted data is more similar to the shape of the pseudo-simultaneous Landsat-8 upscaled images (blue).

Visual examples of the trained DA network are shown in Fig. 8. The first column shows the Proba-V image, the second the adapted Proba-V image using our DA method, and the third the pseudo-simultaneous Landsat-8 upscaled image (LU). In the first row we can see that the locations of clouds in the Proba-V image are preserved after the transformation while saturated blue values are removed. This provides a cloud appearance (and radiance) more similar to the pseudo-simultaneous LU images (third column). In the second row, we can see slightly sharper edges in the DA transformed image compared to the original Proba-V image. This is because the Landsat-8 upscaled images have components of higher spatial frequency than Proba-V. This was also pointed out in the pair of images at the bottom of Fig. 1. In order to test this hypothesis, 64×64 pixel patches were extracted from the 38-Clouds Proba-V pseudo-simultaneous dataset. For


Fig. 7. TOA reflectance distribution on each of the spectral bands (Blue, Red, NIR, SWIR) for Proba-V images (green), Proba-V images transformed using the proposed DA method (orange), and pseudo-simultaneous Landsat-8 Upscaled images (blue). Values measured across all image pairs in the 38-Clouds pseudo-simultaneous dataset.

each patch we computed the 2D Fast Fourier Transform for each of the four spectral bands. Finally, the amplitude of the signal at each frequency is converted to decibels (dB) and averaged across all patches (Fig. 9). As pointed out before, Proba-V images have fewer high frequency components, whereas the average frequency amplitudes of the adapted images are more similar to the Landsat-8 upscaled ones. This highlights the spatio-spectral nature of the proposed method: it does not only learn spectral changes between bands (colors) but also spatial relations.
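The per-band frequency analysis above can be sketched with NumPy's FFT routines (an illustrative helper, assuming a stack of single-band patches):

```python
import numpy as np

def mean_amplitude_spectrum_db(patches, eps=1e-12):
    """Mean 2D FFT amplitude in decibels over a stack of (n, h, w) patches,
    with the zero frequency shifted to the center of the spectrum."""
    spectra = np.fft.fftshift(np.fft.fft2(patches), axes=(-2, -1))
    # 20*log10 converts the amplitude to dB; eps avoids log of zero.
    return (20.0 * np.log10(np.abs(spectra) + eps)).mean(axis=0)
```

Comparing these averaged spectra for original, adapted, and upscaled Landsat-8 patches reveals the relative lack of high-frequency content in Proba-V images.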

Finally, Fig. 10 shows the difference in TOA reflectance between the original Proba-V images and the adapted ones (X_{PV} − G_{PV→LU}(X_{PV})) for all the pixels in the 38-Clouds Proba-V pseudo-simultaneous dataset and for each of the 4 Proba-V bands. In this case, we have stratified the pixels using the per-pixel radiometric quality flag available in the status map (SM) of Proba-V products (see p. 67 of the Proba-V User Manual [61]). This quality indicator is a binary mask for each of the 4 Proba-V channels; pixels are flagged as bad quality for different reasons, including detector saturation [12]. In the Proba-V images of the 38-Clouds Proba-V pseudo-simultaneous dataset, approximately 30% of pixels in the blue band have a reported bad quality, in contrast to 5% for the red band and 0.5% for the NIR and SWIR. One can see that differences in TOA reflectance for saturated pixels are higher, whereas good quality pixels change much less.

B. Domain Adaptation for Cloud Detection

In order to evaluate the DA methodology for the cloud detection application, we trained a FCNN on the Biome dataset. To account for the uncertainty in the weight initialization and the ordering of the batches, we trained 10 copies of the network using different random seed initializations. This procedure is also followed in [5]. Table I shows the cloud detection accuracy on the source domain using the SPARCS and 38-Clouds datasets. We see that the overall accuracy is relatively high and the networks are not much affected by the weight initialization.

TABLE I
Accuracy for test images in the source Landsat-8 Upscaled domain. Results averaged over 10 U-Net networks trained with different random initializations.

Dataset    | min   | max   | mean  | std
SPARCS     | 91.67 | 92.46 | 92.12 | 0.20
38-Clouds  | 90.32 | 91.92 | 91.31 | 0.47

Table II shows the results in the target domain (Proba-V) using the PV24 dataset with the trained DA transformation G_{PV→LU} (called full DA in the table) and without it (called no DA). We also include the results of the ablation study, where we have set some of the weights of the generator losses to zero, and results using histogram matching [36] for domain adaptation as in [47]. In addition, results are compared with the FCNN trained on original Proba-V images and ground truths (PV-trained), which serves as an upper bound reference, and with the operational Proba-V cloud detection algorithm (v101) [27]. First of all, we see that the proposed DA method increases the mean overall accuracy and reduces the standard deviation of the metrics compared with direct transfer learning (no DA) or with adjusting the reflectance of each band with histogram matching. Secondly, we see minor reductions in accuracy when some losses are not used; in particular, the segmentation loss and the identity loss have a strong influence on the result. We also see that when the segmentation and the identity losses are set to zero (λ_seg = 0, λ_id = 0) the cloud detection accuracy decreases abruptly. This is because, without those losses, the generators are not constrained to maintain the original radiance values and colors. A quick look at the generated images displayed in Fig. 11 shows that the generators without these two loss terms can invert typical clear and cloud spectra.
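The per-band histogram matching baseline in Table II can be sketched as follows (a generic implementation of the classical technique, not the code of [36]):

```python
import numpy as np

def histogram_match(source, reference):
    """Remap source band values so that their empirical CDF matches the
    reference band's CDF (applied independently per spectral band)."""
    s_vals, s_idx, s_counts = np.unique(
        source.ravel(), return_inverse=True, return_counts=True)
    r_vals, r_counts = np.unique(reference.ravel(), return_counts=True)
    s_cdf = np.cumsum(s_counts) / source.size
    r_cdf = np.cumsum(r_counts) / reference.size
    # Interpolate each source quantile into the reference value range.
    matched = np.interp(s_cdf, r_cdf, r_vals)
    return matched[s_idx].reshape(source.shape)
```

Unlike the proposed DA network, this baseline only adjusts each band's marginal distribution and cannot model spatial differences between the sensors, which is consistent with its lower accuracy in Table II.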

Figure 12 shows the cloud detection accuracy with the proposed DA transformation (full DA) and without DA (no DA). In this case, results are stratified using the quality flag available in the SM band of Proba-V. In the PV24 dataset, 16.13% of pixels have at least one value in a band flagged as having bad quality. Within


Fig. 8. Left: Proba-V images (PV). Center: Proba-V adapted with the G_{PV→LU} transformation (A in Fig. 2). Right: Landsat-8 upscaled images (LU in Fig. 2). Top: acquisitions one day apart over Nueva Esparta island in the Caribbean Sea. Bottom: acquisitions four minutes apart over Lake Mead in North America.

TABLE II
Accuracy of different DA approaches for cloud detection over the PV24 dataset. Results averaged over 10 FCNN networks trained with different random initializations.

                              | min   | max   | mean  | std
PV-trained [5]                | 94.83 | 95.15 | 94.98 | 0.09
full DA                       | 91.87 | 93.10 | 92.42 | 0.37
λ_id = 0                      | 91.87 | 93.02 | 92.34 | 0.36
λ_seg = 0                     | 91.15 | 91.86 | 91.47 | 0.24
λ_cyc = 0, λ_seg = 0          | 90.35 | 91.26 | 90.79 | 0.32
λ_seg = 0, λ_id = 0           | 33.99 | 36.10 | 35.20 | 0.63
Histogram Matching [36]       | 89.09 | 91.31 | 90.18 | 0.72
no DA                         | 89.05 | 91.88 | 90.41 | 0.82
Proba-V operational v101 [27] | -     | -     | 82.97 | -

bad quality pixels, 96.8% are cloudy pixels. We see that, on the one hand, if the DA transformation is not used, the accuracy of the networks varies wildly, especially for bad quality pixels. These differences in accuracy among models trained on the same data indicate that the networks are extrapolating in those regions. On the other hand, when the DA transformation is used, all the networks correctly identify most of the bad quality pixels.
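The stratification used in Fig. 12 amounts to computing accuracy separately over the good and bad quality pixel sets; a minimal sketch (the helper name and interface are ours):

```python
import numpy as np

def stratified_accuracy(pred, truth, bad_quality):
    """Cloud mask accuracy computed separately for pixels flagged as good
    and as bad radiometric quality in the Proba-V SM band."""
    pred, truth, bad_quality = map(np.asarray, (pred, truth, bad_quality))
    return {
        "good": float((pred[~bad_quality] == truth[~bad_quality]).mean()),
        "bad": float((pred[bad_quality] == truth[bad_quality]).mean()),
    }
```

Comparing the "bad" entry across the 10 trained networks, with and without DA, exposes the extrapolation behavior discussed above.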

Finally, Fig. 13 shows some cherry-picked examples of cloud detection with the proposed methodology. On each row we show: the pseudo-simultaneous Landsat-8 Upscaled image, the original Proba-V image, the Proba-V image after the domain adaptation G_{PV→LU}, the cloud mask using the DA image as input, and the cloud mask obtained without the domain adaptation. The first row shows a completely cloudy image with several blue saturated pixels in the original Proba-V image. We see that the DA image removes those saturated values and helps the cloud detection model to correctly predict all pixels. If no DA transformation is employed, the saturated values in Proba-V hinder the performance of the model, with some cloud misclassifications. The second row shows an acquisition over the Canyon de Chelly, in North America. We see again that saturated values in the blue band disappear after the DA transformation; in this case, this helps to reduce the false positives in the bottom and upper left parts of the image. In the third row, we see an easier case where the cloud masks, with and without DA, are both accurate. Finally, in the fourth row, a very challenging example of thin clouds over snowy mountains is shown. In this case, the DA method better captures the thin cloud at the top of the image; however, it produces some false positives in mixed pixels where snow is melting. For results of all the methods shown in Table II over all the images in the 38-Clouds Proba-V pseudo-simultaneous dataset, we refer the reader to the web application at https://isp.uv.es/projects/cloudsat/pvl8dagans.

VI. Discussion and Conclusions

The main motivation for this study has been to propose a domain adaptation algorithm to improve transfer learning from Landsat-8 to Proba-V for cloud detection. However, the obtained transformation is application-independent, since its aim is to reduce spatial and spectral differences between the two sensors' image datasets. It is worth noting that the main objective of the Proba-V mission was to ensure continuity and fill the gap between the SPOT Vegetation (VGT) and the Sentinel-3 missions [62]. These multi-mission approaches are necessary to provide long time-series of surface vegetation products with complete Earth coverage, and require a high radiometric consistency across sensor datasets. For instance, differences between Proba-V and VGT2 spectral responses were of the same order as between VGT1 and VGT2 [63]. Also, the ESA Sentinel-3 synergy vegetation products replicate the characteristics of the 1-km SPOT VGT products using spectral remapping and co-location techniques. In this context, the proposed domain adaptation model is useful for any general purpose cross-sensor application. In addition, it directly learns from the satellite images and is based on sound statistical methods. Therefore, the transformation is suitable for general use and could be applied to any remote sensing problem in which Proba-V could take advantage of Landsat-8 data.

Fig. 9. 2D Fourier transform in dB for each of the four spectral channels (Blue, Red, NIR, SWIR) averaged across all 64×64 image patches in the 38-Clouds Proba-V pseudo-simultaneous dataset. Left: Proba-V images. Center: Proba-V images adapted using the proposed domain adaptation method. Right: pseudo-simultaneous Landsat-8 upscaled images.

Obtained results have shown that the proposed domain adaptation transformation, in addition to reducing the difference between the TOA reflectance distributions, also unintentionally fixes some radiometry artifacts. Looking at the resulting distributions, one can see that the characteristic saturation in the Blue and Red channels of Proba-V disappears in the adapted images (see Fig. 7). Moreover, although the developed model does not distinguish explicitly between good and bad pixels of the Proba-V products quality flag, results show that good quality pixels are changed much less than bad quality pixels (see Fig. 10). On the one hand, this result agrees with [11], where good quality pixels are similar between Landsat-8 and Proba-V, which implies that their TOA radiance calibration is quite good. On the other hand, the proposed adaptation method only changes those good pixels within the range of the radiometric error reported in [11] or in [12] (see Fig. 10). However, Proba-V products present a significant number of bad quality pixels: between 20% and 30% in the blue channel and around 5% in the red one. This can eventually have an important impact on derived products, since we are usually expected to provide results over the whole image. For instance, removing or ignoring those bad pixels is not feasible for methods using the spatial context, such as CNNs, since the output for a pixel depends on the surrounding pixels. Therefore, the DA method

Fig. 10. Differences in TOA reflectance for each band (Blue, Red, NIR, SWIR) between Proba-V images before and after applying the proposed DA transformation (X_{PV} − G_{PV→LU}(X_{PV})), stratified by the good/bad quality flag.


Fig. 11. Example of color inversion when neither the segmentation loss nor the identity loss are included. Left: Proba-V image. Right: adapted image with the generator G_{PV→LU} trained with λ_seg = 0, λ_id = 0.

Fig. 12. Cloud detection accuracy in the PV24 dataset of the 10 cloud detection U-Net models with different weight initializations, with and without the proposed DA transformation. Pixels are stratified according to the quality indicator available in the SM flag of Proba-V images.

improves the TOA reflectance image resemblance across sensors and, in this particular case, significantly increases the number of pixels that can be further processed, i.e., corrected bad quality pixels (see Fig. 12).

In addition, the proposed adversarial domain adaptation model has been modified to specifically improve the performance of a transfer learning cloud detection problem. In particular, the cost function used to train the DA network has been modified by including a dedicated term forcing similar cloud detection results across domains. Results show that, when the proposed transformation is applied, cloud detection models trained using only Landsat-8 data increase cloud detection accuracy in Proba-V. It is worth noting that results without the application-dependent term (λ_seg = 0) are good enough. However, it is important to include either λ_seg > 0 or λ_id > 0 in order to constrain the method and to avoid artifacts in the adapted images.

The proposed adaptation framework can be extended in two ambitious directions. On the one hand, it would be possible to learn a domain adaptation transformation directly from Landsat-8 to Proba-V, without previously applying the upscaling transformation, which converted the Landsat-8 images to have spatio-spectral properties similar to Proba-V (number of bands and spatial resolution). However, this approach would imply solving the more challenging super-resolution problem when transforming Proba-V to Landsat-8 in our cyclic GAN adaptation framework. On the other hand, an interesting option to explore would be to apply the transformation from top of atmosphere to surface reflectance data. In this case, the obtained transformation would be equivalent to learning an atmospheric correction transformation relying on the image data only. In addition, as mentioned before, the proposed framework can be applied to any other pair of similar sensors, such as Proba-V and Sentinel-2 and Sentinel-3.

Summarizing, in this paper, a CycleGAN architecture has been proposed to train a domain adaptation method between Proba-V and upscaled Landsat images. The proposal includes two generators, two discriminators, and four different penalties. The GAN generator is used to modify the Proba-V images to better resemble the upscaled Landsat images that have also been used to train a cloud detection algorithm. Results on original Proba-V images demonstrate that a higher cloud detection accuracy is achieved when the proposed model is used for the adaptation.

Appendix A

This appendix presents the details about the network architectures and the training procedure of the generators and discriminators of the proposed generative adversarial adaptation model. It also details the networks and training configuration of the cloud detection models. The implementation is available at https://github.com/IPL-UV/pvl8dagans.

We use the same network architecture for all generators G (Fig. 14 top). In particular, G is a 5-layer fully convolutional neural network. It consists of:
• 2 layers: convolution with 64 separable filters of size 3×3, ReLU activation, and batch normalization.
• 2 layers: convolution with 64 separable filters of size 3×3 with a dilation rate equal to 2, ReLU activation, and batch normalization.
• 1 layer: 1×1 convolution with a 4-channel output.
We use residual connections between blocks and before the final layer.
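The generator above can be sketched in PyTorch (a simplified illustration: the paper's residual connections appear to concatenate feature maps, which we replace here with a single additive skip from input to output; the released implementation may differ):

```python
import torch
import torch.nn as nn

class SepConv(nn.Module):
    """3x3 depthwise-separable convolution + batch norm + ReLU."""
    def __init__(self, c_in, c_out, dilation=1):
        super().__init__()
        pad = dilation  # keeps spatial size for a 3x3 kernel
        self.block = nn.Sequential(
            nn.Conv2d(c_in, c_in, 3, padding=pad, dilation=dilation, groups=c_in),
            nn.Conv2d(c_in, c_out, 1),
            nn.BatchNorm2d(c_out),
            nn.ReLU(),
        )

    def forward(self, x):
        return self.block(x)

class Generator(nn.Module):
    """5-layer fully convolutional generator: 2 separable convs, 2 dilated
    separable convs, and a final 1x1 conv back to 4 bands."""
    def __init__(self, bands=4, width=64):
        super().__init__()
        self.body = nn.Sequential(
            SepConv(bands, width),
            SepConv(width, width),
            SepConv(width, width, dilation=2),
            SepConv(width, width, dilation=2),
        )
        self.head = nn.Conv2d(width, bands, 1)

    def forward(self, x):
        # Additive residual connection from input to output.
        return x + self.head(self.body(x))
```

Because the network is fully convolutional, it can be applied to images of any size at inference time, not only to the 64×64 training patches.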

As in the case of the generators, both discriminators D have the same architecture: a 5-layer convolutional neural network adapted from [54] (Fig. 14 bottom). It consists of:
• 4 layers: 4×4 convolution, leakyReLU activation, and batch normalization. The number of filters starts at 8 for the first convolution and grows by a factor of two in every layer. The convolutions are applied with a stride of 2, thus reducing the spatial size of the input by this factor.
• 1 layer: 1×1 convolution with 1 output channel and a sigmoid activation.
The output of the discriminators can be interpreted as the probability of an image being real. Hence, the discriminator is trained to provide close-to-zero values for images generated by G and close-to-one values for real satellite images.
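A PyTorch sketch of this discriminator follows (illustrative; with a 64×64 input, the four stride-2 convolutions yield a 4×4 grid of per-patch probabilities):

```python
import torch
import torch.nn as nn

def make_discriminator(bands=4):
    """4 strided 4x4 convolutions (8 -> 16 -> 32 -> 64 filters) followed by
    a 1x1 convolution with sigmoid; outputs a grid of real/fake probabilities."""
    layers, c_in = [], bands
    for c_out in (8, 16, 32, 64):
        layers += [nn.Conv2d(c_in, c_out, 4, stride=2, padding=1),
                   nn.BatchNorm2d(c_out),
                   nn.LeakyReLU(0.2)]
        c_in = c_out
    layers += [nn.Conv2d(64, 1, 1), nn.Sigmoid()]
    return nn.Sequential(*layers)
```

Scoring overlapping receptive fields rather than the whole image (a PatchGAN-style design) lets the discriminator focus on local texture and radiometry, which is what the adaptation must correct.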

The proposed networks (G_{PV→LU}, G_{LU→PV}, D_{PV}, and D_{LU}) were trained simultaneously using 64×64 patches


Fig. 13. From left to right: Landsat-8 Upscaled image (LU), Proba-V image (PV), Proba-V adapted to Landsat-8 upscaled (PV → LU), cloud mask from the adapted image (with DA), and cloud mask without domain adaptation.

Fig. 14. Generator (top) and discriminator (bottom) architectures. Implementation details available at https://github.com/IPL-UV/pvl8dagans.

with stochastic gradient descent on their respective losseswith a batch size of 48. Networks were trained for 25

Fig. 15. Simplified U-Net architecture used for the cloud detection model. Implementation details available at https://github.com/IPL-UV/pvl8dagans.

epochs in the Proba-V pseudo-simultaneous dataset, which corresponds to 14,574 steps where the weights are updated. In order to ensure convergence of the GAN training procedure, we regularized the discriminators using a 0-centered gradient penalty on the real images [55] with a weight of 10. We used the Adam [64] optimizer with a learning rate of 10⁻⁴ to update the weights of the networks at each step. Additionally, we apply data augmentation in the form of 90-degree rotations and horizontal and vertical flips.
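The geometric augmentations can be sketched as a single NumPy helper (an illustrative implementation; the helper name and interface are ours):

```python
import numpy as np

def random_augmentation(patch, rng):
    """Random multiple-of-90-degree rotation plus random horizontal and
    vertical flips, applied identically to all bands of an (h, w, bands) patch."""
    patch = np.rot90(patch, k=int(rng.integers(4)), axes=(0, 1))
    if rng.random() < 0.5:
        patch = patch[::-1, :, :]  # vertical flip
    if rng.random() < 0.5:
        patch = patch[:, ::-1, :]  # horizontal flip
    return patch
```

These transformations only permute pixel positions, so the TOA reflectance values, and hence the image statistics the GAN must learn, are left untouched.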


For the cloud detection model f_{LU}, we used the same simplified U-Net architecture as in [5]. Figure 15 shows the configuration of layers; we used only two subsampling steps and separable convolutions [65] to reduce the number of trainable parameters and floating-point operations (96k parameters and 2.18M FLOPs). The cloud detection networks are trained for 250k steps using batches of 64 overlapping patches of 32×32 pixels from the Biome dataset (upscaled to 333 m as described in sec. IV). We used a learning rate of 10⁻⁴ and the Adam [64] optimizer.

References

[1] “UCS Satellite Database,” https://www.ucsusa.org/resources/satellite-database, accessed: 2020-04-25.

[2] D. Tuia, C. Persello, and L. Bruzzone, “Domain adaptation forthe classification of remote sensing data: An overview of recentadvances,” IEEE Geoscience and Remote Sensing Magazine,vol. 4, pp. 41–57, 06 2016.

[3] A. Torralba and A. A. Efros, “Unbiased look at dataset bias,”in CVPR 2011, Jun. 2011, pp. 1521–1528.

[4] G. Mateo-García, V. Laparra, and L. Gómez-Chova, “Domain Adaptation of Landsat-8 and Proba-V Data Using Generative Adversarial Networks for Cloud Detection,” in IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium, Jul. 2019, pp. 712–715, ISSN: 2153-6996.

[5] G. Mateo-García, V. Laparra, D. López-Puigdollers, and L. Gómez-Chova, “Transferring deep learning models for cloud detection between Landsat-8 and Proba-V,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 160, pp. 1–17, Feb. 2020.

[6] B. Banerjee and S. Chaudhuri, “Hierarchical subspace learningbased unsupervised domain adaptation for cross-domain classi-fication of remote sensing images,” IEEE Journal of SelectedTopics in Applied Earth Observations and Remote Sensing,vol. 10, no. 11, pp. 5099–5109, Nov 2017.

[7] D. H. Svendsen, L. Martino, M. Campos-Taberner, F. J. García-Haro, and G. Camps-Valls, “Joint Gaussian processes for biophysical parameter retrieval,” IEEE Transactions on Geoscience and Remote Sensing, vol. 56, no. 3, pp. 1718–1727, March 2018.

[8] A. Wolanin, G. Mateo-García, G. Camps-Valls, L. Gómez-Chova, M. Meroni, G. Duveiller, Y. Liangzhi, and L. Guanter, “Estimating and understanding crop yields with explainable deep learning in the Indian Wheat Belt,” Environmental Research Letters, vol. 15, no. 2, p. 024019, Feb. 2020.

[9] U.S. Geological Survey, “Landsat 8 Data UsersHandbook,” USGS, Tech. Rep. LSDS-1574, April 2019,https://www.usgs.gov/media/files/landsat-8-data-users-handbook.

[10] W. Dierckx, S. Sterckx, I. Benhadj, S. Livens, G. Duhoux,T. Van Achteren, M. Francois, K. Mellab, and G. Saint,“PROBA-V mission for global vegetation monitoring: standardproducts and image quality,” Int. Journal of Remote Sensing,vol. 35, no. 7, pp. 2589–2614, 2014.

[11] S. Sterckx and E. Wolters, “Radiometric Top-of-AtmosphereReflectance Consistency Assessment for Landsat 8/OLI,Sentinel-2/MSI, PROBA-V, and DEIMOS-1 over Libya-4 andRadCalNet Calibration Sites,” Remote Sensing, vol. 11, no. 19,p. 2253, Jan. 2019.

[12] S. Sterckx, W. Dierckx, S. Adriaensen, and S. Livens, “PROBA-V Commissioning Report Annex 1-Radiometric CalibrationResults,” VITO, Tech. Rep. Technical Note, 11/2013, Nov2013. [Online]. Available: https://earth.esa.int/documents/700255/1929094/US-20+Annex1-RadiometricCalibartion-v11.pdf/389c059f-4808-4f92-9642-2cab5a4450cb

[13] J. Hoffman, E. Tzeng, T. Park, J.-Y. Zhu, P. Isola, K. Saenko, A. Efros, and T. Darrell, "CyCADA: Cycle-Consistent Adversarial Domain Adaptation," in ICML 2018, Jul. 2018, pp. 1989–1998.

[14] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, "Unpaired image-to-image translation using cycle-consistent adversarial networks," in Computer Vision (ICCV), 2017 IEEE International Conference on, 2017.

[15] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2015, pp. 3431–3440.

[16] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional Networks for Biomedical Image Segmentation," in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, ser. LNCS. Springer, Cham, Oct. 2015, pp. 234–241.

[17] J. H. Jeppesen, R. H. Jacobsen, F. Inceoglu, and T. S. Toftegaard, "A cloud detection algorithm for satellite imagery based on deep learning," Remote Sensing of Environment, vol. 229, pp. 247–259, Aug. 2019.

[18] S. Mohajerani and P. Saeedi, "Cloud-Net: An end-to-end cloud detection algorithm for Landsat 8 imagery," in IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium, July 2019, pp. 1029–1032.

[19] S. Mohajerani, T. A. Krammer, and P. Saeedi, "A Cloud Detection Algorithm for Remote Sensing Images Using Fully Convolutional Neural Networks," in 2018 IEEE 20th International Workshop on Multimedia Signal Processing (MMSP), Aug. 2018, pp. 1–5.

[20] Z. Li, H. Shen, Q. Cheng, Y. Liu, S. You, and Z. He, "Deep learning based cloud detection for medium and high resolution remote sensing images of different sensors," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 150, pp. 197–212, Apr. 2019.

[21] J. Yang, J. Guo, H. Yue, Z. Liu, H. Hu, and K. Li, "CDnet: CNN-Based Cloud Detection for Remote Sensing Imagery," IEEE Transactions on Geoscience and Remote Sensing, pp. 1–17, 2019.

[22] Z. Zhu, S. Wang, and C. E. Woodcock, "Improvement and expansion of the Fmask algorithm: cloud, cloud shadow, and snow detection for Landsats 4–7, 8, and Sentinel 2 images," Remote Sensing of Environment, vol. 159, no. Supplement C, pp. 269–277, Mar. 2015.

[23] M. Wieland, Y. Li, and S. Martinis, "Multi-sensor cloud and cloud shadow segmentation with a convolutional neural network," Remote Sensing of Environment, vol. 230, p. 111203, Sep. 2019.

[24] U.S. Geological Survey, "L8 SPARCS Cloud Validation Masks," U.S. Geological Survey, data release, 2016. [Online]. Available: https://landsat.usgs.gov/sparcs

[25] M. Segal-Rozenhaimer, A. Li, K. Das, and V. Chirayath, "Cloud detection algorithm for multi-modal satellite imagery using convolutional neural-networks (CNN)," Remote Sensing of Environment, vol. 237, p. 111446, Feb. 2020.

[26] Y. Shendryk, Y. Rist, C. Ticehurst, and P. Thorburn, "Deep learning for multi-modal classification of cloud, shadow and land cover scenes in PlanetScope and Sentinel-2 imagery," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 157, pp. 124–136, Nov. 2019.

[27] C. Toté, E. Swinnen, S. Sterckx, S. Adriaensen, I. Benhadj, M.-D. Iordache, L. Bertels, G. Kirches, K. Stelzer, W. Dierckx, L. Van den Heuvel, D. Clarijs, and F. Niro, "Evaluation of PROBA-V Collection 1: Refined Radiometry, Geometry, and Cloud Screening," Remote Sensing, vol. 10, no. 9, p. 1375, Aug. 2018. [Online]. Available: https://www.mdpi.com/2072-4292/10/9/1375

[28] S. Liang, Quantitative Remote Sensing of Land Surfaces. USA: Wiley-Interscience, 2003.

[29] E. Mandanici and G. Bitelli, "Preliminary Comparison of Sentinel-2 and Landsat 8 Imagery for a Combined Use," Remote Sensing, vol. 8, no. 12, p. 1014, Dec. 2016.

[30] C. Revel, V. Lonjou, S. Marcq, C. Desjardins, B. Fougnie, C. C.-D. Luche, N. Guilleminot, A.-S. Lacamp, E. Lourme, C. Miquel, and X. Lenot, "Sentinel-2A and 2B absolute calibration monitoring," European Journal of Remote Sensing, vol. 52, no. 1, pp. 122–137, Jan. 2019.

[31] Y. Zhao, L. Ma, C. Li, C. Gao, N. Wang, and L. Tang, "Radiometric cross-calibration of Landsat-8/OLI and GF-1/PMS sensors using an instrumented sand site," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 11, no. 10, pp. 3822–3829, Oct. 2018.

[32] R. Houborg and M. F. McCabe, "A Cubesat enabled Spatio-Temporal Enhancement Method (CESTEM) utilizing Planet, Landsat and MODIS data," Remote Sensing of Environment, vol. 209, pp. 211–226, May 2018. [Online]. Available: https://linkinghub.elsevier.com/retrieve/pii/S0034425718300786

[33] M. Claverie, J. Ju, J. G. Masek, J. L. Dungan, E. F. Vermote, J.-C. Roger, S. V. Skakun, and C. Justice, "The Harmonized Landsat and Sentinel-2 surface reflectance data set," Remote Sensing of Environment, vol. 219, pp. 145–161, Dec. 2018.

[34] H. K. Zhang, D. P. Roy, L. Yan, Z. Li, H. Huang, E. Vermote, S. Skakun, and J.-C. Roger, "Characterization of Sentinel-2A and Landsat-8 top of atmosphere, surface, and nadir BRDF adjusted reflectance and NDVI differences," Remote Sensing of Environment, vol. 215, pp. 482–494, Sep. 2018. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0034425718301883

[35] D. Scheffler, D. Frantz, and K. Segl, "Spectral harmonization and red edge prediction of Landsat-8 to Sentinel-2 using land cover optimized multivariate regressors," Remote Sensing of Environment, vol. 241, p. 111723, May 2020.

[36] R. C. Gonzalez and R. E. Woods, Digital Image Processing (3rd Edition). USA: Prentice-Hall, Inc., 2006.

[37] S. Inamdar, F. Bovolo, L. Bruzzone, and S. Chaudhuri, "Multidimensional Probability Density Function Matching for Preprocessing of Multitemporal Remote Sensing Images," IEEE Transactions on Geoscience and Remote Sensing, vol. 46, no. 4, pp. 1243–1252, Apr. 2008.

[38] D. Tuia, J. Muñoz-Marí, L. Gómez-Chova, and J. Malo, "Graph Matching for Adaptation in Remote Sensing," IEEE Transactions on Geoscience and Remote Sensing, vol. 51, no. 1, pp. 329–341, Jan. 2013.

[39] D. Tuia, M. Volpi, M. Trolliet, and G. Camps-Valls, "Semisupervised Manifold Alignment of Multimodal Remote Sensing Images," IEEE Transactions on Geoscience and Remote Sensing, vol. 52, no. 12, pp. 7708–7720, Dec. 2014.

[40] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," in Advances in Neural Information Processing Systems 27. Curran Associates, Inc., 2014, pp. 2672–2680.

[41] Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. March, and V. Lempitsky, "Domain-Adversarial Training of Neural Networks," Journal of Machine Learning Research, vol. 17, no. 59, pp. 1–35, 2016. [Online]. Available: http://jmlr.org/papers/v17/15-239.html

[42] E. Tzeng, J. Hoffman, K. Saenko, and T. Darrell, "Adversarial Discriminative Domain Adaptation," in CVPR 2017, Jul. 2017, pp. 2962–2971.

[43] M.-Y. Liu, T. Breuel, and J. Kautz, "Unsupervised Image-to-Image Translation Networks," in Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, Eds. Curran Associates, Inc., 2017, pp. 700–708. [Online]. Available: http://papers.nips.cc/paper/6672-unsupervised-image-to-image-translation-networks.pdf

[44] K. Bousmalis, N. Silberman, D. Dohan, D. Erhan, and D. Krishnan, "Unsupervised Pixel-Level Domain Adaptation with Generative Adversarial Networks," in CVPR 2017, Jul. 2017, pp. 95–104.

[45] A. Elshamli, G. W. Taylor, A. Berg, and S. Areibi, "Domain Adaptation Using Representation Learning for the Classification of Remote Sensing Images," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 10, no. 9, pp. 4198–4209, Sep. 2017.

[46] S. Song, H. Yu, Z. Miao, Q. Zhang, Y. Lin, and S. Wang, "Domain Adaptation for Convolutional Neural Networks-Based Remote Sensing Scene Classification," IEEE Geoscience and Remote Sensing Letters, vol. 16, no. 8, pp. 1324–1328, Aug. 2019.

[47] O. Tasar, S. L. Happy, Y. Tarabalka, and P. Alliez, "ColorMapGAN: Unsupervised Domain Adaptation for Semantic Segmentation Using Color Mapping Generative Adversarial Networks," IEEE Transactions on Geoscience and Remote Sensing, pp. 1–16, 2020.

[48] Y. Koga, H. Miyazaki, and R. Shibasaki, "A Method for Vehicle Detection in High-Resolution Satellite Images that Uses a Region-Based Object Detector and Unsupervised Domain Adaptation," Remote Sensing, vol. 12, no. 3, p. 575, Jan. 2020.

[49] B. Benjdira, Y. Bazi, A. Koubaa, and K. Ouni, "Unsupervised Domain Adaptation Using Generative Adversarial Networks for Semantic Segmentation of Aerial Images," Remote Sensing, vol. 11, no. 11, p. 1369, Jan. 2019.

[50] F. Ye, W. Luo, M. Dong, H. He, and W. Min, "SAR Image Retrieval Based on Unsupervised Domain Adaptation and Clustering," IEEE Geoscience and Remote Sensing Letters, vol. 16, no. 9, pp. 1482–1486, Sep. 2019.

[51] L. Wang, X. Xu, Y. Yu, R. Yang, R. Gui, Z. Xu, and F. Pu, "SAR-to-optical image translation using supervised cycle-consistent adversarial networks," IEEE Access, vol. 7, pp. 129136–129149, 2019.

[52] L. Liu, Z. Pan, X. Qiu, and L. Peng, "SAR target classification with CycleGAN transferred simulated samples," in IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium, July 2018, pp. 4411–4414.

[53] S. J. Pan and Q. Yang, "A Survey on Transfer Learning," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345–1359, Oct. 2010.

[54] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, "Image-to-image translation with conditional adversarial networks," in CVPR 2017, 2017.

[55] L. Mescheder, S. Nowozin, and A. Geiger, "Which training methods for GANs do actually converge?" in International Conference on Machine Learning (ICML), 2018.

[56] L. Denaro and C. Lin, "Hybrid canonical correlation analysis and regression for radiometric normalization of cross-sensor satellite imagery," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 13, pp. 976–986, 2020.

[57] U.S. Geological Survey, "L8 Biome Cloud Validation Masks," U.S. Geological Survey, data release, 2016. [Online]. Available: http://doi.org/10.5066/F7251GDH

[58] S. Foga, P. L. Scaramuzza, S. Guo, Z. Zhu, R. D. Dilley, T. Beckmann, G. L. Schmidt, J. L. Dwyer, M. J. Hughes, and B. Laue, "Cloud detection algorithm comparison and validation for operational Landsat data products," Remote Sensing of Environment, vol. 194, no. Suppl. C, pp. 379–390, Jun. 2017.

[59] M. J. Hughes and D. J. Hayes, "Automated Detection of Cloud and Cloud Shadow in Single-Date Landsat Imagery Using Neural Networks and Spatial Post-Processing," Remote Sensing, vol. 6, no. 6, pp. 4907–4926, May 2014.

[60] L. Gómez-Chova, G. Mateo-García, J. Muñoz-Marí, and G. Camps-Valls, "Cloud detection machine learning algorithms for PROBA-V," in IGARSS 2017, Jul. 2017, pp. 2251–2254.

[61] E. Wolters, W. Dierckx, M.-D. Iordache, and E. Swinnen, "PROBA-V products user manual," QWG, Tech. Rep. Technical Note, 16/03/2018, March 2018. [Online]. Available: http://www.vito-eodata.be/PDF/image/PROBAV-Products_User_Manual.pdf

[62] C. Toté and E. Swinnen, "Extending the Spot/Vegetation–Proba-V Archive with Sentinel-3: a Preliminary Evaluation," in IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium, 2018, pp. 8707–8710.

[63] S. Sterckx, S. Adriaensen, W. Dierckx, and M. Bouvet, "In-Orbit Radiometric Calibration and Stability Monitoring of the PROBA-V Instrument," Remote Sensing, vol. 8, no. 7, p. 546, Jul. 2016. [Online]. Available: https://www.mdpi.com/2072-4292/8/7/546

[64] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," in 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015, pp. 1–13.

[65] F. Chollet, "Xception: Deep Learning with Depthwise Separable Convolutions," in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jul. 2017, pp. 1800–1807.