12
REVIEW OF SCIENTIFIC INSTRUMENTS 89, 074902 (2018) Improving underwater localization accuracy with machine learning Lynn T. Rauchenstein, Abhinav Vishnu, Xinya Li, and Zhiqun Daniel Deng a) Pacific Northwest National Laboratory, Richland, Washington 99352, USA (Received 7 November 2017; accepted 9 June 2018; published online 2 July 2018) Machine learning classification and regression algorithms were applied to calibrate the localization errors of a time-difference-of-arrival (TDOA)-based acoustic sensor array used for tracking salmon passage through a hydroelectric dam on the Snake River, Washington, USA. The locations of stationary and mobile acoustic tags were first tracked using the approximate maximum likelihood algorithm. Next, ensembles of classification trees successfully identified and filtered data points with large localization errors. This prefiltering step allowed the creation of a machine-learned regression model function, which decreased the median distance error by 50% for the stationary tracks and by 34% for the mobile tracks. It also extended the previous range of sub-meter localization accuracy from 100 m to 250 m horizontal distance from the dam face (the receivers). Median distance errors in the depth direction were especially decreased, falling from 0.49 m to 0.04 m in the stationary tracks and from 0.38 m to 0.07 m in the mobile tracks. These methods would have application to the calibration of error in any TDOA-based sensor network with a steady environment and array configuration. Published by AIP Publishing. https://doi.org/10.1063/1.5012687 I. INTRODUCTION Underwater acoustic telemetry is used frequently in oil and natural gas exploration, monitoring underwater plate tec- tonic movements and underwater vehicle navigation, and tracking the behavior and migratory patterns of aquatic and marine animals. 13 In the Pacific Northwest (USA), one of its most important applications has been tracking endangered salmon. Several species of Pacific salmonids in the Columbia River Basin have been listed as endangered or threatened. 4 The recovery of endangered salmon species has received critical attention over the last 20 years because of its high economic and environmental importance. 57 Salmonid life history involves migration of juveniles to the ocean, where they grow for 1-3 years to adults and migrate back upriver to their natal tributaries to spawn. 8 The fish are studied as they pass through hydroelectric dams, which can have a detri- mental impact on survival of migrating salmonids and adult fish. Downstream migrating fish may be injured or killed in turbines, dam spillways, or fish bypass systems. There is high interest in implementing/improving existing measures to increase survival, ensuring safe upstream and downstream fish passage. 913 Opportunities for improving fish survival arise during the design, operation, and evaluation of new or replacement dam components. Optimizing these systems requires knowledge of fish behaviors, such as seasonal movement patterns, prefer- ence of water depth or temperature, foraging behavior, habitat utilization, spawning behavior, and site fidelity. 1417 Accurate knowledge of fish behavior in forebays is especially important to track the success of fish diverting systems and to know depth distribution during the approach. The latter factor is impor- tant because of the possibility of damage to swim bladders a) Author to whom correspondence should be addressed: zhiqun.deng@ pnnl.gov. Tel.: 1-509-372-6120. Fax: 1-509-372-6089. (barotrauma) during dam passage when fish are exposed to rapid decompression. 18,19 The Juvenile Salmon Acoustic Telemetry System (JSATS) has been used to monitor survival and observe behavior of juvenile salmonids passing through eight large hydroelectric facilities within the Federal Columbia River Power System en route to the Pacific Ocean since 2006. 2022 Developed by the US Army Corps of Engineers, Portland District (Oregon, USA), these arrays of hydrophones receive coded transmis- sions from acoustic transmitters implanted in fish. Accuracy is important when evaluating fish survival and behavior during passage through these facilities as depth and location before passage are influential factors in survival. 23,24 Advances have been made in the accuracy of fish track- ing, including the development of advanced algorithms for localization, detection, and deployment schemes. 25 The sub- meter tracking accuracy is currently possible within 100 m horizontal distance from a dam 26 and is helpful when high- resolution three dimensional tracking results are needed to investigate the detailed behaviors of fish in the forebays prior to passage. 27 Monte Carlo simulations of errors in the local- ization accuracy have shown that the error pattern can vary predictably as a function of depth, angle, and distance from a dam. These error patterns are unique to the receiver array configuration and to the environment of the surrounding study site. The application of a calibrating function to counteract these errors can greatly increase the accuracy of localization. Such methodologies to improve the accuracy would have wider applications in fields of underwater and land animal track- ing, wireless sensor network (WSN) localization, intelligent transportation systems, radar, sonar, and autonomous robotics. As a computationally viable and robust approach, machine learning (ML) methods have been increasingly applied to provide innovative solutions to localization. 28,29 ML tech- niques were adopted to enhance the performance of existing localization methods, for instance, increase the precision as 0034-6748/2018/89(7)/074902/12/$30.00 89, 074902-1 Published by AIP Publishing.

Improving underwater localization accuracy with machine ...€¦ · a signal conditioning amplifier, a data acquisition computer with two 16-bit digital signal processing cards,

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Improving underwater localization accuracy with machine ...€¦ · a signal conditioning amplifier, a data acquisition computer with two 16-bit digital signal processing cards,

REVIEW OF SCIENTIFIC INSTRUMENTS 89, 074902 (2018)

Improving underwater localization accuracy with machine learningLynn T. Rauchenstein, Abhinav Vishnu, Xinya Li, and Zhiqun Daniel Denga)

Pacific Northwest National Laboratory, Richland, Washington 99352, USA

(Received 7 November 2017; accepted 9 June 2018; published online 2 July 2018)

Machine learning classification and regression algorithms were applied to calibrate the localizationerrors of a time-difference-of-arrival (TDOA)-based acoustic sensor array used for tracking salmonpassage through a hydroelectric dam on the Snake River, Washington, USA. The locations of stationaryand mobile acoustic tags were first tracked using the approximate maximum likelihood algorithm. Next,ensembles of classification trees successfully identified and filtered data points with large localizationerrors. This prefiltering step allowed the creation of a machine-learned regression model function,which decreased the median distance error by 50% for the stationary tracks and by 34% for the mobiletracks. It also extended the previous range of sub-meter localization accuracy from 100 m to 250 mhorizontal distance from the dam face (the receivers). Median distance errors in the depth directionwere especially decreased, falling from 0.49 m to 0.04 m in the stationary tracks and from 0.38 m to0.07 m in the mobile tracks. These methods would have application to the calibration of error in anyTDOA-based sensor network with a steady environment and array configuration. Published by AIPPublishing. https://doi.org/10.1063/1.5012687

I. INTRODUCTION

Underwater acoustic telemetry is used frequently in oiland natural gas exploration, monitoring underwater plate tec-tonic movements and underwater vehicle navigation, andtracking the behavior and migratory patterns of aquatic andmarine animals.1–3 In the Pacific Northwest (USA), one ofits most important applications has been tracking endangeredsalmon. Several species of Pacific salmonids in the ColumbiaRiver Basin have been listed as endangered or threatened.4

The recovery of endangered salmon species has receivedcritical attention over the last 20 years because of its higheconomic and environmental importance.5–7 Salmonid lifehistory involves migration of juveniles to the ocean, wherethey grow for 1-3 years to adults and migrate back upriverto their natal tributaries to spawn.8 The fish are studied asthey pass through hydroelectric dams, which can have a detri-mental impact on survival of migrating salmonids and adultfish. Downstream migrating fish may be injured or killedin turbines, dam spillways, or fish bypass systems. There ishigh interest in implementing/improving existing measures toincrease survival, ensuring safe upstream and downstream fishpassage.9–13

Opportunities for improving fish survival arise during thedesign, operation, and evaluation of new or replacement damcomponents. Optimizing these systems requires knowledge offish behaviors, such as seasonal movement patterns, prefer-ence of water depth or temperature, foraging behavior, habitatutilization, spawning behavior, and site fidelity.14–17 Accurateknowledge of fish behavior in forebays is especially importantto track the success of fish diverting systems and to know depthdistribution during the approach. The latter factor is impor-tant because of the possibility of damage to swim bladders

a)Author to whom correspondence should be addressed: [email protected]. Tel.: 1-509-372-6120. Fax: 1-509-372-6089.

(barotrauma) during dam passage when fish are exposed torapid decompression.18,19

The Juvenile Salmon Acoustic Telemetry System (JSATS)has been used to monitor survival and observe behavior ofjuvenile salmonids passing through eight large hydroelectricfacilities within the Federal Columbia River Power Systemen route to the Pacific Ocean since 2006.20–22 Developed bythe US Army Corps of Engineers, Portland District (Oregon,USA), these arrays of hydrophones receive coded transmis-sions from acoustic transmitters implanted in fish. Accuracyis important when evaluating fish survival and behavior duringpassage through these facilities as depth and location beforepassage are influential factors in survival.23,24

Advances have been made in the accuracy of fish track-ing, including the development of advanced algorithms forlocalization, detection, and deployment schemes.25 The sub-meter tracking accuracy is currently possible within 100 mhorizontal distance from a dam26 and is helpful when high-resolution three dimensional tracking results are needed toinvestigate the detailed behaviors of fish in the forebays priorto passage.27 Monte Carlo simulations of errors in the local-ization accuracy have shown that the error pattern can varypredictably as a function of depth, angle, and distance froma dam. These error patterns are unique to the receiver arrayconfiguration and to the environment of the surrounding studysite. The application of a calibrating function to counteractthese errors can greatly increase the accuracy of localization.Such methodologies to improve the accuracy would have widerapplications in fields of underwater and land animal track-ing, wireless sensor network (WSN) localization, intelligenttransportation systems, radar, sonar, and autonomous robotics.As a computationally viable and robust approach, machinelearning (ML) methods have been increasingly applied toprovide innovative solutions to localization.28,29 ML tech-niques were adopted to enhance the performance of existinglocalization methods, for instance, increase the precision as

0034-6748/2018/89(7)/074902/12/$30.00 89, 074902-1 Published by AIP Publishing.

Page 2: Improving underwater localization accuracy with machine ...€¦ · a signal conditioning amplifier, a data acquisition computer with two 16-bit digital signal processing cards,

074902-2 Rauchenstein et al. Rev. Sci. Instrum. 89, 074902 (2018)

a post-optimizer,30 eliminate the need for anchor points inWSNs,29 estimate the ranging error directly from the receivedwave-form,31 and using neural networks to improve root-mean-square error (RMSE) of node localization in a densesensor network.32

To calibrate the error in the system, the error pat-tern must be accurately modeled. Localization error pat-terns depend linearly and nonlinearly on many factors andmay be influenced by features such as the acoustic signal’stime of arrival (TOA) at each receiver, signal-to-noise ratio(SNR), water temperature and temperature gradients, mul-tipath interference, and channel shape. The structural formof the desired calibration function is generally unknown andmay be very difficult or even too complex to form analyt-ically. In order to generate an accurate model function oferror for use in calibration, machine learning regression algo-rithms are a promising tool. These techniques use experimen-tal data to supervise the task of fitting an unknown modelto known outputs.29 The method involves starting with thedata (pairs of input/output vectors) and constructing a com-bination of elementary functions to produce a more complexmodel function that accurately describes the input/output map-ping of the data. Machine learning methods differ in theway of constructing the combination of elementary func-tions and generate approximate model functions to describethe input/output mapping.33 This process works as an opti-mization procedure that the approximation can be performedusing the weighted sum (which may not be this simple inML methods) of the elementary functions. Machine learn-ing techniques, the combination of the elementary functions,are able to learn and predict no matter how random, com-plex, and nonlinear the model function is. In this sense,machine learning methods perform universal function approx-imation.33 To enable the successful application of regressionalgorithms, it is necessary to first identify and remove out-lying data points. Regression on data containing large out-liers can actually increase the overall errors of the system.Classification algorithms such as support vector machines(SVMs) and ensembles of classification trees can be usedto identify and discard outliers based on input vectors ofdata from potentially important features. Thus, the calibra-tion task must consist of two parts: classification followed byregression.

In this study, we developed a calibration model (classi-fication and regression) for data outputs from TDOA-basedlocalization in a controlled field environment. To investigatethe performance of the proposed calibration model, the majorobjective is to reduce the range error of the tracked positions.As a post-calibrator, this model can enhance the accuracyof three dimensional localization in any other TDOA-basedsensor networks deployed in a stable environment.

II. METHODSA. Study area

The Little Goose Dam (LGS) is an electricity-generatingdam on the Snake River, 113 river kilometers upstream fromthe confluence with the main stem of the Columbia River in

south-central Washington state (USA). The dam is 809 m wideand 30.5 m tall. It is outfitted with a 6 unit powerhouse, an8 bay spillway, a navigation lock, and an adult fish ladder.

Cabled JSATS arrays were deployed along the upstreamdam face using a design detailed by Skalski et al.34 A sin-gle JSATS cabled receiver system consists of one to fourcabled hydrophones [Fig. 1(b), Model SC001, Sonic Con-cepts, Inc., Bothell, WA, USA]. The hydrophone sensitivityis about −182 dB re 1 V/µPa at 416.7 kHz. It also includesa signal conditioning amplifier, a data acquisition computerwith two 16-bit digital signal processing cards, detection soft-ware, decoding software, a global positioning system (GPS)card, and a GPS antenna.3,22,35,36 Before being deployed, eachhydrophone was evaluated in an acoustic tank lined with ane-choic materials at the PNNL Bio-Acoustics and Flow Labora-tory (BFL).37 BFL is accredited by the American Associationfor Laboratory Accreditation (A2LA) to ISO/IEC 17025:2005,which is the international standard for calibration and testinglaboratories.

Two hydrophones were deployed per pier, at 2 depths.Different depths were chosen at the powerhouse and spill-way, respectively, to facilitate the 3-D tracking range throughthe water column. Shallow hydrophones were deployed about3.5 m deep. Deep powerhouse hydrophones were about 25 mdeeper than the shallow phones, while deep spillway phoneswere 8-9 m deeper than the shallow phones. The JSATS arrayat the Little Goose Dam includes 8 cabled receiver systems,totaling 32 hydrophones. Each system (3-4 hydrophones) isindicated in Fig. 1(a) by yellow or orange lines. Field testingin 2012 and 2013 showed that the sub-meter tracking accuracyof individually tagged fish was achievable in the dam fore-bay when fish were within 100 m horizontal distance to thedam face.

B. Field testing

Field testing was conducted on March 19-20, 2013. Test-ing was performed using a 2.7 m long remotely operatedboat designed specifically for the purpose of evaluating thelocalization accuracy on JSATS.3,35 The boat was powered bytwo 50-lb thrust electric trolling motors. 9 acoustic beaconswere attached via a 3-m-long steel pipe mounted vertically tothe boat [Fig. 1(c)]. Beacons’ positions were at fixed depthsbetween 0 and 2 m below the boat (beacons were attachedto a stick perpendicular to the boat whose length is limiteddue to the river environment). Each beacon has a unique tagcode and fixed pulse repetition interval (PRI, i.e., the timebetween successive signal transmissions) which are used bythe JSATS detecting and decoding systems to identify the sig-nal source. The accurate location of the beacons was obtainedusing a real-time kinematic-GPS system (Trimble GeoEx-plorer, Trimble Navigation Ltd., Sunnyvale, California) anddepth sensor (HOBO U20-001-03, Onset Computer Corpora-tion, Bourne, Massachusetts). The antenna for the GPS waslocated 1.0 m above the water surface. The rated accuracyof the GPS was 10 cm. Water temperature (for sound speed)was measured as a function of time using the depth sensor.Springtime testing conditions meant that water temperaturevariation with depth in the river was low, so a single water

Page 3: Improving underwater localization accuracy with machine ...€¦ · a signal conditioning amplifier, a data acquisition computer with two 16-bit digital signal processing cards,

074902-3 Rauchenstein et al. Rev. Sci. Instrum. 89, 074902 (2018)

FIG. 1. Deployment of JSATShydrophones and remote boat used infield testing at the Little Goose Dam(LGS) on the Snake River. LGS hasa powerhouse with 6 turbine unitsdivided by 7 piers on the south sideof the dam, whereas the spillway onthe north side has 8 bays dived by9 piers. (a) Forebay view of locations ofhydrophones deployed at two differentelevations at each pier nose. (b) Themodel of JSTAS hydrophone used.(c) Remote-controlled boat beinglowered into the forebay of LGS.

temperature value was used to compute sound speed for allsound propagation measures. In summer when more watertemperature variability exists in the water column, the local-ization algorithm can be run to incorporate estimated temper-ature gradients to allow for variability in the speed of soundtravel.

Two separate data sets were collected, the first with bea-cons held in stationary positions and the second with beaconstraveling in the forebay. In the stationary treatment, the boatwas held as still as possible for at least 3 min to provide anadequate number of transmissions. The boat was held at a totalof 31 test locations spaced 25, 50, 100, 150, 200, and 250 mupstream from the dam. In the mobile treatment, the boat tracedlinear paths between 3 m and 150 m upstream from the dam,running parallel to the dam face to mimic the potential search-ing and detouring behaviors of fish. In both cases, a coordinatesystem was defined with its origin located at the approximatecenter of the JSATS receiving array, with z = 0 at the watersurface. The x axis traces the dam face, and the y axis followsthe river. The z axis was defined with the positive axis pointingdownward into the water.

The beacons transmitted signals at a carrier frequency of416.7 kHz. The received signals were bandpass filtered aroundthe carrier frequency and amplified. Transmission detection,decoding, and data storage were carried out following themethods described in the study by Weiland et al.3 and Ingra-ham et al.36 From the data files collected by JSATS, portionswith no transmissions of tag signals were selected and consid-ered to contain only background noise. From these, a frequencyrange between 375.0 and 458.0 kHz was selected, so the

bandwidth of the signal becomes 20% of the carrier frequency.Using the laboratory calibration data, the filtered voltage datawere used to calculate the pressure spectral density, whichwas then integrated over the frequency range described aboveto compute the background noise level.

C. Localization

Acoustic signals received at the JSATS hydrophone arraywere localized by TDOA using the 3D approximate maxi-mum likelihood (AML) algorithm developed by Li et al.26

The 3D AML algorithm is an expansion of the 2D AML algo-rithm developed by Chan et al.38 TDOA measurements werecalculated as the time difference between signal detection atthe first receiving hydrophone (the reference hydrophone) anddetection at each subsequent hydrophone. TDOA localizationin 3-D requires signal reception by at least four receivers inorder to calculate three TDOA measurements. These TDOAvalues can then be used to solve a determined system ofthree nonlinear equations for the three unknown signal sourcelocation variables x, y, and z. Signal reception at more thanfour receivers creates an overdetermined system of equationswhich cannot be solved exactly. The AML method resolvesthis problem by using an iterative least squares approach toestimate the most likely source location, using the inverse ofthe noise covariance as a weighting matrix. This estimatordiffers from exact solvers39–41 in that it can solve nonlinearlocalization equations while taking into account uncertaintiesin measurement values due to noise. Such unavoidable smallerrors exist in all measured input variables, including TDOA,

Page 4: Improving underwater localization accuracy with machine ...€¦ · a signal conditioning amplifier, a data acquisition computer with two 16-bit digital signal processing cards,

074902-4 Rauchenstein et al. Rev. Sci. Instrum. 89, 074902 (2018)

surveyed positions of the hydrophones, and the signal propa-gation velocity, and can significantly impact the localizationaccuracy.40,42–44 The signal propagation velocity was esti-mated with a high accuracy from a well-established polyno-mial equation dependent on water temperature.45 The AMLestimator can achieve the Cramer-Rao lower bound for accu-racy, especially when there are more than 4 hydrophonesdetecting the same transmission. It is thus considered to bean optimal estimator.

SNR (signal-to-noise ratio) is related to the distancebetween the transmitter and the hydrophone. The probabilityof multipath interference increases for longer hydrophone-transmitter distances. To reduce localization errors, the weight-ing terms can be improved by assigning higher weights tohydrophones closer to a transmitter. The weight least squaresolution is used to initialize the AML iterations.26 Thus, thisinitial solution is used to calculate the distances betweenarray hydrophones and the transmitter. The original weight-ing matrix in the AML algorithm uses the covariance matrixof TDOA measurements. The covariance matrix of TDOAis adjusted by inversely multiplying the estimated distances.Location estimates outputted from the AML algorithm werepost-filtered to remove obvious errors (i.e., outliers along rea-sonable swimming trajectory) from the data. Tracked locationswere checked to ensure that coordinates lay within the physi-cal boundaries of the river. Next, the median filter was appliedto smooth the tracks.

D. Machine learning1. Classification

Because regression is fundamentally sensitive to outliers,an effort was made to first identify and remove outliers abovea cut-off magnitude of range error using binary classifica-tion algorithms. The range error was defined as the differencebetween the range calculated from the dam origin to the AML-estimated beacon location and the range from the origin tothe measured GPS location. Every observation was assigneda classification of “0” or “1” if its range error magnitude wasabove or below this cut-off value, respectively.

The database for this study consisted of observationsfrom 17354 received stationary beacon transmissions and11970 mobile transmissions. For each of these transmis-sions, an observation vector x exists consisting of elementsx = [x1, . . ., xn] (the features), along with the correspondingrange error classification C = “0” or C = “1.” A classifier isa learned function C(x) that takes the values of various fea-tures and predicts the class that the observation belongs to. Inan applied tracking scenario, GPS ground-truth data will notbe available. Instead, the accuracy of the AML track must beconcluded from available measurements. The features chosento form the observation vectors included TDOA and SNR ateach hydrophone, the AML estimate of range, and the iden-tity of the reference hydrophone. These data were assembledinto a feature matrix X with one row for each observation andone column for each feature. Factors that can be important tolocalization but which did not vary among observations takenover the time period of the study, such as water temperature,were omitted.

For numerical reasons, it is important that each column(feature values) in feature matrix is normalized to confine val-ues to an order of magnitude around 1. The AML-estimatedrange was normalized to the logarithmic domain, and SNR wasstandardized to the mean. TDOA values varied over a rangefrom 10−7 to 10−1 s, posing a challenge to normalization. Toaccommodate this wide range, TDOA features used were lim-ited to receipts by the first N hydrophones, where values ofN were explored between 6 and 32 (all hydrophones). Thesevalues were then normalized to the logarithmic domain.

A classifier function includes a number of parameterswhich must first be learned from a training data set. Oncetrained, the classifier’s predictive ability can be validated bytesting on another subset of previously unseen observations,the test data set. For the training set, it is desirable to includethe same number of observations in each class. If not, it is pos-sible that the classifier leaning algorithm favors the class withthe most observations and thus is harmful to other classes.Under this situation, the accuracy result may not reflect thetruth. For example, in a case that 9 of 10 observations are inone class and 1 of 10 is in the other, 80% accuracy may not bedesired result since the classifier predicts the most numerousclass by default.46 In this study, only 3.9% of the stationarypoints and 4.9% of the mobile points had a range error greaterthan the cut-off range error value of 3 m. A desired size forthe training data set was approximately 10% of all observa-tions, but including not more than half of the high error points.This is because some high error points must be reserved forthe testing data set. Within the training set, approximately halfof observations in each class were chosen to lie within 0.6 m(20%) of the range error cutoff to ensure high training resolu-tion at the classification boundary. The training data set wasconstructed by sampling data points with replacement withineach of the four categories: class “0” near the classificationboundary, class “1” near the classification boundary, class “0”far from the classification boundary, and class “1” far fromthe classification boundary. When duplicate data points wereremoved, the final training data sets contained 8.0% of all sta-tionary and 6.4% of all mobile transmissions. The testing dataset contained all remaining data points.

To validate the reproducibility of the chosen classificationmethod, the training/testing steps were repeated 10 times forthe algorithm which had produced the most accurate classifiermodel, using different randomly selected subsets of the fulldata set. Each of these 10 validation sets was subdivided into2/3 training and 1/3 testing sets, and the accuracy of the newlygenerated model was recorded. Finally, the best classificationmodel was validated 10 times against a test set of 10 000 points,all with a classification of “1” to ensure low misclassificationrates of good data.

Both the classification and regression model functionswere constructed using Matlab r2014a with the OptimizationToolbox (MathWorks, Natick, MA, USA). Because a signaltransmission was frequently detected by fewer than half ofthe total available hydrophones, the observation data set com-prises a sparse matrix. Non-numeric fill values were used toindicate non-detection of a signal at a receiver. Both supportvector machine (SVM) and ensemble classification algorithmswere tried. SVM is a classification method with two main

Page 5: Improving underwater localization accuracy with machine ...€¦ · a signal conditioning amplifier, a data acquisition computer with two 16-bit digital signal processing cards,

074902-5 Rauchenstein et al. Rev. Sci. Instrum. 89, 074902 (2018)

FIG. 2. Flow diagram of the target calibration process, which uses learnedclassification and regression models to adjust AML tracks without GPS. ∗Themost successful feature matrix included TDOA and AML estimation of range.

components: a kernel function and a set of support vectorspopulated by the training data. SVM algorithms attempt togenerate a hyperplane in the feature space that maximally sep-arates the training points (“1” points on the positive side of theplane and “0” points on the negative side). The hyperplaneconstitutes the decision boundary.47 The two classificationensemble algorithms in the Optimization library with the abil-ity to process sparse matrices were the boosting algorithmsLPBoost and TotalBoost. Both of these algorithms rely onbinary decision tree learners to form the classification func-tions and operate by attempting to maximize the minimal mar-gin of the training set. LPBoost does this through an iterativesequence of linear programming problems, while TotalBoostoptimizes using quadratic programming.48

Machine learning is an intrinsically experimental process.Each algorithm was systematically evaluated by tuning impor-tant constructor parameters, such as the number and types oflearners to use (e.g., classification trees vs. surrogate splitsfor the ensemble algorithms), instructions for how to definedecision tree branch points, inclusion/exclusion of cost matri-ces, and types of elementary function to use in constructinghyperplanes. The number of permutations for these parame-ters would be impossible to evaluate exhaustively and mustbe explored by trial and error. For a specific type of classifier,there may be many possible parameter settings that lead toequally good results.

The quality of the model function produced was assessedusing two measures: the classification accuracy and the numberof misclassified “bad” data points (class “0”) left behind by theprefilter. Both metrics can be represented by a confusion matrixof the form

Confusion Matrix =

[0 classified as 0 0 classified as 11 classified as 0 1 classified as 1

]. (1)

A high-level description of the end-goal calibration pro-cess is shown in Fig. 2. After the initial rounds of modeltraining, calibration will be carried out without any GPS data.

A separate, more detailed explanation of the classificationmodel training steps is provided in Fig. 3.

2. Regression

In order to mitigate the errors in localization, modelsof the patterns of range and bearing error were generatedvia regression machine learning. In regression, the goal isto infer an unobserved scalar (F ∈ R) which depends on aset of observations (s ∈ Rn). In this study, three regressionmodels were generated for the spherical coordinates azimuth,elevation, and range, respectively. Observation vectors s con-sist of elements s = [s1, . . ., sn] containing features believedto be predictive of the output F(s). The features chosen toform s in this case were the AML-estimated location coor-dinates, converted from Cartesian to spherical to take advan-tage of symmetries observed in the Monte Carlo patterns oferror. Thus, every observation vector was a 3-vector consist-ing of azimuth, elevation, and range. Every observation vectormapped to a corresponding true output scalar F (GPS-derivedazimuth, elevation, or range).

The data set in the regression part of calibration wassmaller than that in the classification part because some num-ber of observations had been filtered out by the classifier. Inaddition to predicting the class of each observation, the clas-sifier returns a score indicating the confidence value (CV ) ofthe classification. It was hypothesized that regression modelsbuilt from the most confidently classified observations wouldbe the most accurate. To test this, multiple regression mod-els were constructed using a variety of training sets. In thefirst scenario, the training set consisted of half of all obser-vations. Next, training sets were constructed using half ofall observations where CV exceeded the 20th, 40th, 60th,or 80th percentile. Test sets for all five trials consisted ofall of the remaining observations. Features were normalizedto an order of magnitude around 1 to improve numericalperformance.

The regression algorithms tested included single regres-sion trees and ensembles of regression trees. Ensemble regres-sion tree algorithms in Matlab include “LSBoost,” “Bag,”and “TreeBagger,” which was set to invoke Breiman’s Ran-dom Forest algorithm.49 Regression trees predict numericresponses to data based on binary splits. The algorithms forconstructing these trees usually work top-down by choosing afeature at each step that best splits the set of observations. Dif-ferent algorithms use different metrics for measuring “best.”These generally measure the homogeneity of the target vari-able within the subsets.50 A graphic depiction of the regressionmodel training methods is given in Fig. 4.

FIG. 3. Flow diagram of the classification model trainingprocess.

Page 6: Improving underwater localization accuracy with machine ...€¦ · a signal conditioning amplifier, a data acquisition computer with two 16-bit digital signal processing cards,

074902-6 Rauchenstein et al. Rev. Sci. Instrum. 89, 074902 (2018)

FIG. 4. Flow diagram of the regression model trainingprocess.

III. RESULTS

SVM failed as a classifier for both the stationary andmobile beacon field data sets. The algorithms uniformlyterminated early with high classification errors. This may bedue to the sparse matrix conditions of the observation data. Theclassification ensemble algorithms TotalBoost and LPBoost,on the other hand, had a higher accuracy and were compara-ble to each other under most run conditions. Parameters tunedin these algorithms included inclusion/exclusion of surrogatesplits in the tree learner type and cost matrices. In almostevery case, the confusion matrices with and without surrogatesplits were identical, indicating that the constructed ensem-bles of decision trees were the same. Inclusion of a 5:1 costmatrix decreased the classification error in some cases andincreased it in others as more class “1” points were classifiedas “0.”

Varying the number of signal detections per observationhad a minor effect. The best models using few, half, or all ofthe available hydrophones varied by only 2.2% (1.1%) in theclassification accuracy for stationary (mobile) transmissionsalthough in both cases the accuracy increased with increasingnumber of hydrophones. Because of the nature of the errorsin range, only values at or below 3 m were considered for therange error classification cutoff. This number was limited bythe number of bad data points as the number of bad pointsabove 2, 3, 4, and 5 m error, respectively, were 7%, 4.9%,3.9%, and 3.4% of the mobile tracks and 7.8%, 3.9%, 2.6%,and 2.0% of the stationary tracks.

A. Stationary beacons1. Classification

The best classifier model was constructed using theLPBoost algorithm and a feature matrix including TDOAfrom all 32 receivers and AML-estimated range, but exclud-ing SNR and the identity of the reference hydrophone. Inthis run condition, LPBoost did not include a cost matrix andperformed identically with and without surrogate splits. Themodel achieved 100% accuracy on the training data set and85.3% on the test data set. The confusion matrix is

Confusion Matrix =

[97 51

2297 13 497

]. (2)

This indicates that a total of 2394 observations were filteredfrom the test data set, or 15% of all observations. Locations ofthe removed and remaining beacon location observations arepresented in Fig. 5.

The median absolute value and root-mean-square (RMS)of range error as estimated by the uncalibrated AML trackswere 0.44 m and 3.20 m, respectively. After the best classi-fier was applied, these values dropped to 0.39 m and 1.02 m,respectively. The median error and RMSE for the incorrectlyfiltered observations (class “1” miscategorized as class “0” inthe test set) were 0.59 and 0.97 m, respectively. The medianerror and RMSE for the correctly filtered observations (class“0” classified as class “0”) were 5.13 and 15.55 m, respectively.

The best classifier was validated for the accuracy andreproducibility. The model was applied to 10 sets of 1400 datapoints chosen according to methods described in Sec. II D 1.The model performed reproducibly across all runs, with a clas-sification accuracy ranging around 71%. Next the classifier wasapplied to 10 validation sets of 10 000 points all of class “1.”The median and mean classification accuracy of these runswere both 86%.

2. Regression

The best range error regression model was found to be aninvocation of Breiman’s Random Forest algorithm, a type ofregression tree method, constructed using a training set with aCV of 0. This means that the training set consisted of half of allobservations. Quality of the regression models was evaluatedby comparing the RMSE of the test sets. Results from thebest regression model, evaluated at each confidence value, arepresented in Table I. Before calibration, 76.2% of AML trackshad the sub-meter range error. After range calibration, 93.3%of AML tracks had the sub-meter range error.

Separate regression models were constructed for eachof the three spherical coordinates—azimuth, elevation, andrange. The best models for all three were likewise an invocationof Breiman’s Random Forest algorithm although algorithmtuning led to different parameter details for each. Distanceerrors (DE) of the original AML location estimates werecompared with DE from range-calibrated estimates (“rangecalibrated”) and from range-azimuth-elevation calibrated esti-mates (“fully calibrated”) in order to evaluate the magnitudeof the gains from the second two cycles of calibration. These

Page 7: Improving underwater localization accuracy with machine ...€¦ · a signal conditioning amplifier, a data acquisition computer with two 16-bit digital signal processing cards,

074902-7 Rauchenstein et al. Rev. Sci. Instrum. 89, 074902 (2018)

FIG. 5. Time history of locations of the 9 stationary bea-cons: (a) GPS locations of all observations, (b) GPSlocations of class “1” observations, i.e., observations keptby the classifier, (c) GPS locations of class “0” obser-vations, i.e., observations removed by the classifier, and(d) AML location estimates for the class “0” observations.

results are based on the application of the regression modelsto the test (validation) data sets.

Before calibration, 55.6% of AML location estimateshad the sub-meter total distance error (DEtotal). After rangecalibration, 72.5% had the sub-meter DEtotal, and after fullcalibration, 83.1% had the sub-meter DEtotal. The medianDEtotal for uncalibrated AML tracks, range calibrated tracks,and fully calibrated tracks, respectively, was 0.90, 0.69, and0.50 m. RMS of the distance errors (RMSDE) was 5.33, 2.11,and 1.90 m, respectively. In the x direction, the medianDEx was 0.33, 0.26, and 0.33 m and RMSDEx was 4.05,1.73, and 1.75 m, respectively. The median DEy was 0.37,0.19, and 0.22 m and RMSDEy was 3.00, 0.68, and 0.74 m,respectively. The median DEz was 0.49, 0.46, and 0.04 m,

and RMSDEz was 1.71, 1.00, and 0.12 m, respectively. Inall cases, calibration reduced DEtotal from the AML tracks(Fig. 6).

B. Mobile beacons1. Classification

The best classifier model was constructed using theLPBoost algorithm and a feature matrix including TDOA fromall 32 receivers and AML-estimated range, but excluding SNRand the identity of the reference hydrophone. In this run con-dition, LPBoost included a 5:1 cost matrix and performed bestwith surrogate splits. The model achieved 100.0% accuracyon the training data set and 92.7% on the test data set. The

TABLE I. Median absolute error and RMSE (in meters) of range estimates for the stationary beacons under 4treatments: the original AML algorithm, AML + a classification prefilter, AML + regression (no prefilter), andAML with prefilter + regression. When treatments included regression, distinct candidate models were tested usingtraining sets of all data exceeding a given confidence value (CV ), selected by percentile.

Regression withUncalibrated AML Classification only Regression only classification

CV (%) Median RMS Median RMS Median RMS Median RMS

0 0.44 3.33 0.39 1.29 0.24 3.28 0.22 1.250.2 0.28 3.98 0.27 2.590.4 0.42 5.91 0.40 4.930.6 1.78 13.65 1.55 12.910.8 9.90 38.36 9.35 37.21

Page 8: Improving underwater localization accuracy with machine ...€¦ · a signal conditioning amplifier, a data acquisition computer with two 16-bit digital signal processing cards,

074902-8 Rauchenstein et al. Rev. Sci. Instrum. 89, 074902 (2018)

FIG. 6. (a) DEtotal , (b) DEx , (c) DEy, and (d) DEz ofdifferent stages of calibration for the stationary bea-cons. AML = all AML tracks. AML “1” = AML tracksaccepted by the prefilter. AML “0” = AML tracksremoved by the prefilter. REG1 = the range-calibratedtest set. REG2 = the fully calibrated (range + bearing)test set. Box edges represent Q1 and Q3, and whiskersextend to approximately ±2.7σ.

confusion matrix is

Confusion Matrix =

[209 52769 10 174

]. (3)

This indicates that a total of 978 observations were fil-tered from the test data set or 8.7% of the total observations.The locations of the removed and remaining observations arevisualized in Fig. 7.

The median absolute value and RMS of range error of theuncalibrated AML tracks were 0.41 m and 3.07 m, respectively.After the best classifier was applied, these values dropped to0.38 m and 1.35 m, respectively. Median error and RMSE ofthe incorrectly filtered observations (class “1” miscategorizedas class “0” in the test set) were 0.66 and 1.00 m, respectively.The median error and RMSE of the correctly filtered observa-tions (class “0” classified as class “0”) were 7.97 and 11.23 m,respectively.

The best classifier was validated for accuracy and repro-ducibility. The model was applied to 10 new validation setsof 770 data points chosen according to methods described inSec. II D 1. The model performed reproducibly across all runs,with the classification accuracy of the best run at 83% and amean of 80%. Next the classifier was applied to 10 sets of10 000 points all of class “1.” Classification accuracy meanand median values were both 93%.

2. Regression

The best range error regression model was found to be anensemble of regression trees, constructed using a training setwith a CV of 0. This means that the training set consisted ofhalf of all observations. Quality of the regression models wasevaluated by comparing the RMSE of the test sets. Results fromthe best regression model, evaluated at each CV, are presentedin Table II. Before calibration, 84.4% of AML tracks had the

sub-meter range error. After range calibration, the sub-meterrange error increased to 92.4%.

Separate regression models were constructed for eachof the three spherical coordinates—azimuth, elevation, andrange. The best models for the azimuth and elevation were bothinvocations of Breiman’s Random Forest algorithm althoughalgorithm tuning led to different parameter details for each.Distance errors of the original AML location estimates werecompared with DE from range-calibrated and fully calibratedestimates in order to evaluate the magnitude of the gains fromthe second two cycles of calibration. These results are based onthe application of the regression models to the test (validation)data set observations.

Before calibration, 66.2% of AML location estimates hadthe sub-meter DEtotal. After range calibration, 73.7% had thesub-meter DEtotal, and after full calibration, 82.1% had thesub-meter DEtotal. The median DEtotal for uncalibrated AMLtracks, range calibrated tracks, and fully calibrated tracks,respectively, was 0.78, 0.68, and 0.53 m. RMSDE was 3.70,1.49, and 1.40 m, respectively. In the x direction, the medianDEx was 0.26, 0.26, and 0.30 m and RMSDEx was 2.25, 1.13,and 1.15 m, respectively. The median DEy was 0.42, 0.28, and0.29 m, and RMSDEy was 2.82, 0.74, and 0.78 m, respectively.The median DEz was 0.38, 0.35, and 0.07 m, and RMSDEz was0.80, 0.62, and 0.13 m, respectively. In all cases, calibrationreduced DEtotal from the AML tracks (Fig. 8).

IV. DISCUSSION

The ability to model and calibrate complex errors in agiven system helps to ease design constraints in some impor-tant ways. Localization using TDOA is highly sensitive tothe geometric arrangement of the receiving sensors, includingthe size of the array. Besides improving the accuracy of large

Page 9: Improving underwater localization accuracy with machine ...€¦ · a signal conditioning amplifier, a data acquisition computer with two 16-bit digital signal processing cards,

074902-9 Rauchenstein et al. Rev. Sci. Instrum. 89, 074902 (2018)

FIG. 7. Time history of locations of the 9 mobile bea-cons: (a) GPS locations of all observations, (b) GPSlocations of class “1” observations, i.e., observations keptby the classifier, (c) GPS locations of class “0” obser-vations, i.e., observations removed by the classifier, and(d) AML location estimates for the class “0” observations.

systems with favorable geometries, as has been demonstratedhere, error calibration may enable the deployment of subopti-mal array geometries in cases where field conditions requireit. Error modeling and calibration may also enable the designof much smaller and less expensive systems. Some small,portable arrays have already been developed for responsiveacoustic tracking of, e.g., marine mammals.51 However, adop-tion of small arrays for passive acoustic tracking has remainedelusive due to limitations in the range and accuracy.

In addition to generally improving the localization accu-racy, ML calibration significantly extended the area of thestudy field where distance errors were less than 1 m. Beforecalibration, the sub-meter accuracy was possible within100 m lateral distance from the dam face. The median dis-tance error for tracks in the 200-250 m range was 1.56 m witha RMSE of 7.5 m. After calibrating, the median distance errorfor tracks in the 200-250 m range was only 0.9 m with a RMSEof 1.3 m.

TABLE II. Median absolute error and RMSE (in meters) of range estimates for the mobile beacons under 4treatments: the original AML algorithm, AML + a classification prefilter, AML + regression (no prefilter), andAML with prefilter + regression. When treatments included regression, distinct candidate models were tested usingtraining sets of all data exceeding a given confidence value (CV ), selected by percentile.

Regression withUncalibrated AML Classification only Regression only classification

CV (%) Median RMS Median RMS Median RMS Median RMS

0 0.41 3.07 0.38 1.36 0.23 1.67 0.27 1.000.2 0.26 2.29 0.30 1.610.4 0.38 2.21 0.40 1.660.6 0.75 4.94 0.72 4.130.8 6.69 20.40 5.61 16.42

Page 10: Improving underwater localization accuracy with machine ...€¦ · a signal conditioning amplifier, a data acquisition computer with two 16-bit digital signal processing cards,

074902-10 Rauchenstein et al. Rev. Sci. Instrum. 89, 074902 (2018)

FIG. 8. (a) DEtotal , (b) DEx , (c) DEy, and (d) DEz of dif-ferent stages of calibration for the mobile beacons. AML= all AML tracks. AML “1” = AML tracks accepted bythe prefilter. AML “0” = AML tracks removed by the pre-filter. REG1 = the range-calibrated test set. REG2 = thefully calibrated (range + bearing) test set. Box edges rep-resent Q1 and Q3, and whiskers extend to approximately±2.7σ.

While the total distance error was improved overall,the improvement was especially high in the depth direction.Both the median error and RMSE of calibrated tracks in thez direction were less than 8% (18%) of their precalibratedvalues for stationary (mobile) tracks. This improvement can-not be explained by the constant depths of the 9 beaconsused. While the beacons depths were constant in the Carte-sian plane, the regression model was trained and appliedusing spherical coordinates, which varied widely in all threevariables.

Tracking errors were also highest where beacons were ata sharp angle to the dam (i.e., close to the dam but near theshorelines). While the overall RMSE of uncalibrated mobilebeacons was 3.7 m, the RMSE at the lower shore was 7.8 m.These edge regions contain areas of interest, such as the fishladder, where fish behaviors like far-field approach path andmilling are of interest. ML calibration successfully reducedthese high-error regions to the sub-meter accuracy.

Much of the increase in the accuracy along the aforemen-tioned river edge regions came from the successful classifica-tion step, where the ML algorithms were able to recognize badtracks using only TDOA and AML-estimated range (Figs. 5and 7). This flags the data in the region as high error which isuseful information to a researcher who may choose to incor-porate this knowledge in her treatment of the data or to discardit altogether.

Discarding data highlights the trade-off between the effi-ciency and accuracy. The classification step flagged 15% (9%)of all of the data points as the high error for stationary (mobile)tracks, and these points were not included in the regressionstep. This prefiltering is necessary for successful regression.Tables I and II show that without the classification prefilter,regression would have reduced the median range error to54.5% instead of 50% (56% instead of 66%) of its uncal-ibrated level for the stationary (mobile) data sets. RMS of

the range error would only have dropped to 98.5% instead of37.5% (54% instead of 32.5%) of its uncalibrated value. Thisdecrease in RMSE is the strongest indication of the impor-tance of the prefilter. However, prefiltering poses problems ifthe number of filtered points is too high. This may be adjustedby experimenting with the classification cut-off value, or itmay also be addressed by reinserting the flagged “bad” datapoints into the data set after regression has taken place in orderto provide a complete tracking trajectory. In the future, trackscould be further refined with a Kalman or Gauss-Newton fil-ter to help smooth away the remaining outliers left by theclassification step.

Machine learning results may be interpolated but notextrapolated. In this case, the model should be applied onlywithin the spatial and time domains of the study site, whichwere up to 250 m from the dam, and for the 2 study dates inMarch. During a different season, differences in conditions likewater temperatures and gradients would certainly require newcalibration. Future work should be conducted to extend a cal-ibration model over a longer time span, which would includevariations in water temperatures. Likewise, it was found thatthe stationary and mobile models were not interchangeable.This is likely due to the major differences in the distribution ofdata points over the spatial domain for observations collectedin the two treatments.

V. CONCLUSIONS

Machine learning using classification and regression algo-rithms was a successful method for modeling and subsequentlycalibrating the localization errors in a large, dam-deployedunderwater acoustic tracking system. The reduction in errorwas especially pronounced in the depth direction. A learnedclassification model successfully prefiltered the majority ofhigh error data points with a moderate efficiency trade-off of

Page 11: Improving underwater localization accuracy with machine ...€¦ · a signal conditioning amplifier, a data acquisition computer with two 16-bit digital signal processing cards,

074902-11 Rauchenstein et al. Rev. Sci. Instrum. 89, 074902 (2018)

15% (9%) total points removed for the stationary (mobile)tracks. This enabled the successful training of a regressionmodel, which significantly improved the localization accu-racy and extended the range of the sub-meter tracking accu-racy by an additional 150 m when applied to new sets ofunseen validation data. The calibration model presented hereis site-specific and restricted to the time domain for whichit was made. In order to create a calibration model whichmight be applied across months or years, more data wouldbe needed to provide variability of other important parame-ters, such as water temperature gradients. For future work,we will expand the machine learning method to additionalsteps of the tracking process and refine the AML algorithm formore realistic applications such as tracking vocalizing marineanimals.

ACKNOWLEDGMENTS

This study was funded by the U.S. Department of EnergyWater Power Technologies Office and Analysis in Motion Ini-tiative at the Pacific Northwest National Laboratory (PNNL).The data collection was funded by the U.S. Army Corps ofEngineers Walla Walla District. The study was conducted atPNNL, operated by Battelle for the U.S. Department of Energyunder Contract No. DE-AC05-76RL01830.

1N. H. Kussat, C. D. Chadwell, and R. Zimmerman, “Absolute positioning ofan autonomous underwater vehicle using GPS and acoustic measurements,”IEEE J. Oceanic Eng. 30, 153–164 (2005).

2F. N. Spiess et al., “Precise GPS/acoustic positioning of seafloor refer-ence points for tectonic studies,” Phys. Earth Planet. Inter. 108, 101–112(1998).

3M. A. Weiland et al., “A cabled acoustic telemetry system for detecting andtracking juvenile salmon: Part 1. Engineering design and instrumentation,”Sensors 11, 5645–5660 (2011).

4W. Nehlsen, J. E. Williams, and J. A. Lichatowich, “Pacific salmon at thecrossroads: Stocks at risk from California, Oregon, Idaho, and Washington,”Fisheries 16, 4–21 (1991).

5C. Macilwain, “Sceptics and salmon challenge scientists,” Nature 385, 668(1997).

6J. Wakefield, “Record salmon populations disguise uncertain future,” Nature411, 226 (2001).

7R. A. Lovett, “As salmon stage disappearing act, dams may too,” Science284, 574–575 (1999).

8B. J. Burke et al., “Multivariate models of adult Pacific salmon returns,”PLoS One 8, e54134 (2013).

9G. F. Cada, “The development of advanced hydroelectric turbines to improvefish passage survival,” Fisheries 26, 14–23 (2001).

10M. Odeh and G. Sommers, “New design concepts for fish friendly turbines,”Int. J. Hydropower Dams 7, 64–70 (2000).

11D. A. Neitzel et al., “Survival estimates for juvenile fish subjected toa laboratory-generated shear environment,” Trans. Am. Fish. Soc. 133,447–454 (2004).

12Z. Deng et al., “Evaluation of fish-injury mechanisms during exposure toturbulent shear flow,” Can. J. Fish. Aquat. Sci. 62, 1513–1522 (2005).

13Z. Deng, T. J. Carlson, G. R. Ploskey, M. C. Richmond, and D. D. Dauble,“Evaluation of blade-strike models for estimating the biological perfor-mance of Kaplan turbines,” Ecol. Modell. 208, 165–176 (2007).

14J. Metcalfe and G. Arnold, “Tracking fish with electronic tags,” Nature 387,665–666 (1997).

15J.-M. Roussel, A. Haro, and R. Cunjak, “Field test of a new method fortracking small fishes in shallow rivers using passive integrated transponder(PIT) technology,” Can. J. Fish. Aquat. Sci. 57, 1326–1329 (2000).

16C. G. Meyer, K. N. Holland, B. M. Wetherbee, and C. G. Lowe, “Movementpatterns, habitat utilization, home range size and site fidelity of whitesaddlegoatfish, Parupeneus porphyreus, in a marine reserve,” Environ. Biol. Fishes59, 235–242 (2000).

17S. J. Cooke et al., “Tracking animals in freshwater with electronic tags: Past,present and future,” Anim. Biotelem. 1, 5 (2013).

18V. Tsvetkov, D. Pavlov, and V. Nezdoliy, “Changes of hydrostatic pres-sure lethal to young of some freshwater fish,” J. Ichthyol. 12, 307–318(1972).

19R. S. Brown et al., “Pathways of barotrauma in juvenile salmonids exposedto simulated hydroturbine passage: Boyle’s law vs. Henry’s law,” Fish. Res.121-122, 43–50 (2012).

20G. L. Rutz, M. D. Sholtis, N. S. Adams, and J. W. Beeman, “Investigationof methods for successful installation and operation of juvenile salmonacoustic telemetry system (JSATS) hydrophones in the Willamette River,Oregon, 2012,” U.S. Geological Survey Report 2014-1112, 2014.

21E. L. Rechisky, D. W. Welch, A. D. Porter, M. C. Jacobs-Scott, andP. M. Winchell, “Influence of multiple dam passage on survival of juve-nile Chinook salmon in the Columbia River estuary and coastal ocean,”Proc. Natl. Acad. Sci. U. S. A. 110, 6883–6888 (2013).

22G. A. Mcmichael et al., “The juvenile salmon acoustic telemetry system:A new tool,” Fisheries 35, 9–22 (2010).

23C. C. Coutant and R. R. Whitney, “Fish behavior in relation to pas-sage through hydropower turbines: A review,” Trans. Am. Fish. Soc. 129,351–380 (2000).

24Z. Deng, T. J. Carlson, J. P. Duncan, M. C. Richmond, and D. D. Dauble,“Use of an autonomous sensor to evaluate the biological performance of theadvanced turbine at Wanapum Dam,” J. Renewable Sustainable Energy 2,053104 (2010).

25X. Li, Z. D. Deng, L. T. Rauchenstein, and T. J. Carlson, “Contributedreview: Source-localization algorithms and applications using time of arrivaland time difference of arrival measurements,” Rev. Sci. Instrum. 87, 041502(2016).

26X. Li et al., “A 3D approximate maximum likelihood solver for local-ization of fish implanted with acoustic transmitters,” Sci. Rep. 4, 7215(2014).

27X. Li et al., “Migration depth and residence time of juvenile salmonids inthe forebays of hydropower dams prior to passage through turbines or juve-nile bypass systems: Implications for turbine-passage survival,” Conserv.Physiol. 3(1), cou064 (2015).

28D. A. Tran, X. Nguyen, and T. Nguyen, “Machine learning based local-ization,” in Localization Algorithms and Strategies for Wireless SensorNetworks: Monitoring and Surveillance Techniques for Target Tracking (IGIGlobal, 2009), pp. 302–320.

29M. A. Alsheikh, S. Lin, D. Niyato, and H. P. Tan, “Machine learning inwireless sensor networks: Algorithms, strategies, and applications,” IEEECommun. Surv. Tutorials 16, 1996–2018 (2014).

30M. Di and E. M. Joo, “A survey of machine learning in wireless sensor net-works from networking and application perspectives,” in 2007 6th Interna-tional Conference on Information, Communications and Signal Processing(IEEE, 2007), pp. 1–5.

31H. Wymeersch, S. Marano, W. M. Gifford, and M. Z. Win, “A machinelearning approach to ranging error mitigation for UWB localization,” IEEETrans. Commun. 60, 1719–1728 (2012).

32P. Singh and S. Agrawal, “TDOA based node localization in WSN using neu-ral networks,” in 2013 International Conference on Communication Systemsand Network Technologies (CSNT) (IEEE, 2013), pp. 400–404.

33A. Zielesny, From Curve Fitting to Machine Learning: An Illustrative Guideto Scientific Data Analysis and Computational Intelligence (Springer, 2011),pp. 237–242.

34J. Skalski et al., “BiOp performance testing: Passage and survival of yearlingand subyearling Chinook salmon and juvenile steelhead at Little Goosedam, 2012,” Report PNNL-22140, Pacific Northwest National Laboratory,Richland, Washington, 2013.

35Z. D. Deng et al., “A cabled acoustic telemetry system for detecting andtracking juvenile salmon: Part 2. Three-dimensional tracking and passageoutcomes,” Sensors 11, 5661–5676 (2011).

36J. Ingraham et al., “A fast and accurate decoder for underwater acoustictelemetry,” Rev. Sci. Instrum. 85, 074903 (2014).

37Z. Deng, M. Weiland, T. Carlson, and M. B. Eppard, “Design and instrumen-tation of a measurement and calibration system for an acoustic telemetrysystem,” Sensors 10, 3090–3099 (2010b).

38Y.-T. Chan, H. Y. C. Hang, and P.-C. Ching, “Exact and approximate max-imum likelihood localization algorithms,” IEEE Trans. Veh. Technol. 55,10–16 (2006).

39J. L. Spiesberger and K. M. Fristrup, “Passive localization of calling animalsand sensing of their acoustic environment using acoustic tomography,” Am.Nat. 135, 107–153 (1990).

Page 12: Improving underwater localization accuracy with machine ...€¦ · a signal conditioning amplifier, a data acquisition computer with two 16-bit digital signal processing cards,

074902-12 Rauchenstein et al. Rev. Sci. Instrum. 89, 074902 (2018)

40M. Wahlberg, B. Møhl, and P. T. Madsen, “Estimating source position accu-racy of a large-aperture hydrophone array for bioacoustics,” J. Acoust. Soc.Am. 109, 397–406 (2001).

41R. Bucher and D. Misra, “A synthesizable VHDL model of the exact solu-tion for three-dimensional hyperbolic positioning system,” VLSI Des. 15,507–520 (2002).

42J. E. Ehrenberg and T. W. Steig, “A method for estimating the ‘positionaccuracy’ of acoustic fish tags,” ICES J. Mar. Sci. 59, 140–149 (2002).

43H. Ren et al., “Localization of Southern Resident killer whales using twostar arrays to support marine renewable energy,” in Oceans 2012 (IEEE,2012), pp. 1–7.

44G. Mao, B. Fidan, and B. D. Anderson, “Wireless sensor network localiza-tion techniques,” Comput. Networks 51, 2529–2553 (2007).

45W. Marczak, “Water as a standard in the measurements of speed of soundin liquids,” J. Acoust. Soc. Am. 102, 2776–2779 (1997).

46F. Pereira, T. Mitchell, and M. Botvinick, “Machine learning clas-sifiers and fMRI: A tutorial overview,” NeuroImage 45, S199–S209(2009).

47N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vec-tor Machines and Other Kernel-Based Learning Methods (CambridgeUniversity Press, 2000), pp. 93–122.

48M. Warmuth, J. Liao, and G. Ratsch, “Totally corrective boosting algo-rithms that maximize the margin,” in International Conference on MachineLearning (ICML’06) (ACM, New York, 2006), pp. 1001–1008.

49L. Breiman, “Random forests,” Mach. Learn. 45, 5–32 (2001).50L. Breiman, J. Friedman, R. Olshen, and C. Stone, Introduction to Tree

Classification in Classification and Regression Trees (Wadsworth, 1984),pp. 18–59.

51W. Au and D. Herzing, “Echolocation signals of wild Atlantic spotteddolphin (Stenella frontalis),” J. Acoust. Soc. Am. 113, 598–604 (2003).