
Self-Adjusting Method for Efficient GPS Tracklog Compression

Marcell Feher
Budapest University of Technology and Economics
Budapest, Hungary
Email: [email protected]

Bertalan Forstner PhD
Budapest University of Technology and Economics
Budapest, Hungary
Email: [email protected]

Abstract—Data reduction algorithms operating on GPS tracklogs are widely used in cartographic applications, because the raw dataset usually contains a large amount of redundancy and an unnecessarily high number of measurement points. The existing methods for compressing tracklogs are either too simplistic or intended to be used on powerful computers, whereas our research focuses on mobile environments, where both memory and computational power are very limited. In this paper we introduce a modified version of a well-known line generalization algorithm, which aims to reach the best trade-off between complexity and accuracy in embedded systems, mostly smartphones.

I. INTRODUCTION

Given a trajectory that consists of geographical positions, path compression algorithms generate a subset of the original data points that approximates the original measured path, such that the points of the reduced path meet a given criterion. The general problem is called curve simplification: an original polyline with a set of N vertices is replaced by a polyline with a set of M vertices, where M ⊆ N, while the error is kept within a specified tolerance. This problem occurs in a variety of fields, most notably cartography, computer graphics and computer vision.

A. Context of Research

Let us assume that users of a mobile social network want to arrange a meeting. The traditional way of doing this is to call each other and try to agree on mutually acceptable coordinates for the meeting. If there are only a few participants, this method may work, but with a growing number of participants it becomes hopeless.

The proposed service we are working on, entitled MeetYouThere, is planned to be a plugin for social network applications running on modern smartphones and tablets, which calculates the coordinates of the best possible meetings based on the movement habits of the participating users. It determines the time and location coordinates at which people will be closest to each other, according to the repeating patterns of their movements. These so-called routines are learned with machine learning; however, the actual method is beyond the scope of this paper.

The abstract workflow of the service is:

1) Location tracking and compressing: A constantly running service tracks and records the geographical position of the user as time-stamped locations. At certain times, a compression function reduces the size of the raw dataset.

2) Routine learning: Based on the GPS records, an algorithm learns the repeating movements of the user and stores a single trajectory, which is refined over time. It also identifies the frequency of repetition as well as other parameters of the so-called routines. The representation we use allows calculating whether a routine will occur on an arbitrary day, which is vital when the system looks for the trajectories of participants on the desired day of the meeting.

3) Meeting request: When a user would like to arrange a meeting with his acquaintances, he submits the list of participants and, optionally, the desired day of the meeting. If no date is given, the system searches from the current time to a few days ahead.

4) Calculating best meeting points: In this step, the system collects the routines of all participants and compares them. The algorithm determines the locations and moments in time when the participants will be closest to each other on the potential days of the meeting. If the search is successful, the calculated meeting points are listed for the initiating user.

In this paper we introduce an algorithm for step one.

II. RELATED WORK

There are several methods for polyline simplification in the literature that can be used for GPS tracklog compression as well. The algorithms are classified into two main groups based on when they can be applied: batch and online compression techniques. Batch algorithms are suitable when the whole tracklog is available (i.e. after the data collection is finished), while online methods can compress the data during its recording (location tracking) and do not require looking ahead. The method we introduce in this paper is applied after the data is acquired; therefore, in this section we focus on batch compression algorithms.

The simplest batch reduction method is Uniform Sampling [1], which keeps every i-th location (e.g. every 10th, 20th or 50th) of the original trajectory and discards all others. The idea behind this technique is that the original trajectory is itself a sample of the true path, so a uniform sample of it is still an approximation of the real trajectory. This method is computationally very efficient; however, it does not necessarily capture the valuable locations of the trajectory and often produces a very low-quality simplification.

CogInfoCom 2013 • 4th IEEE International Conference on Cognitive Infocommunications • December 2–5, 2013 , Budapest, Hungary

978-1-4799-1546-0/13/$31.00 ©2013 IEEE

A more sophisticated and very well-known method in this field is the Douglas-Peucker algorithm, which was introduced independently by numerous researchers [2], [3], [4], [5], [6], [7]. It approximates a sequence of input points by a single line segment from the first to the last point. If the farthest point of the original sequence is more distant from this segment than a predefined tolerance threshold, the sequence is split into two at this most erroneous point, and the method calls itself recursively on both subsequences. Otherwise (if the distance does not exceed the threshold), the sequence is replaced by its two endpoints. Fig. 1 demonstrates the algorithm, and its outline is shown in Algorithm 1. Regarding its time cost, in the best case it requires Ω(n) steps, the worst-case cost is Θ(nm), and the expected time is Θ(n log n), where n denotes the number of original points and m the number of points in the approximation.

Fig. 1. First two steps of the Douglas-Peucker algorithm. It produces the approximated trajectory recursively, adding the most distant point in each step. In Step 1, p9 is selected and added to the set, then in Step 2, p3 is added. Image credit: Wang-Chien Lee and John Krumm

Algorithm 1 DouglasPeuckerReduction(startPointIndex, endPointIndex, tolerance)

startPoint = originalPath[startPointIndex]
endPoint = originalPath[endPointIndex]
currentMaxDistance = 0.0
farthestPointIndex = 0
for all currentPoint in originalPath strictly between startPointIndex and endPointIndex do
    currentDistance = calculatePerpendicularDistance(startPoint, endPoint, currentPoint)
    if currentDistance > currentMaxDistance then
        currentMaxDistance = currentDistance
        farthestPointIndex = index of currentPoint
    end if
end for
if currentMaxDistance > tolerance then
    Add originalPath[farthestPointIndex] to the compressed path
    DouglasPeuckerReduction(startPointIndex, farthestPointIndex, tolerance)
    DouglasPeuckerReduction(farthestPointIndex, endPointIndex, tolerance)
end if
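The procedure outlined in Algorithm 1 can be sketched in runnable Python as follows. This is our illustration, not the authors' implementation: points are assumed to be (x, y) tuples in the input's own coordinate system (GPS degrees here), the perpendicular distance is measured to the infinite line through the segment endpoints, and the recursion is replaced by an explicit stack so that long tracklogs cannot exceed Python's recursion limit. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y), e.g. (latitude, longitude) in degrees


def perpendicular_distance(start: Point, end: Point, p: Point) -> float:
    """Euclidean distance from p to the line through start and end."""
    dx, dy = end[0] - start[0], end[1] - start[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        # Degenerate segment (start == end): use the point-to-point distance.
        return math.hypot(p[0] - start[0], p[1] - start[1])
    # |2-D cross product| / segment length = distance to the line.
    return abs(dx * (p[1] - start[1]) - dy * (p[0] - start[0])) / length


def douglas_peucker(path: List[Point], tolerance: float) -> List[Point]:
    """Return the points of path kept by Douglas-Peucker with the given tolerance."""
    if len(path) < 3:
        return list(path)
    keep = [False] * len(path)
    keep[0] = keep[-1] = True  # the endpoints are always part of the result
    stack = [(0, len(path) - 1)]
    while stack:
        start, end = stack.pop()
        max_dist, farthest = 0.0, start
        for i in range(start + 1, end):  # points strictly between the endpoints
            d = perpendicular_distance(path[start], path[end], path[i])
            if d > max_dist:
                max_dist, farthest = d, i
        if max_dist > tolerance:
            keep[farthest] = True
            stack.append((start, farthest))
            stack.append((farthest, end))
    return [p for p, kept in zip(path, keep) if kept]
```

For example, douglas_peucker([(0, 0), (1, 0.05), (2, -0.05), (3, 0), (4, 3), (5, 0)], 0.5) keeps (0, 0), (3, 0), (4, 3) and (5, 0): the small wiggles below the tolerance disappear, while the corner and the peak survive.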

The result of the Douglas-Peucker algorithm is not guaranteed to be optimal, since its heuristic always selects the most deviating point. To obtain an optimal solution, the area between the original and the approximated polylines must be minimized, which is usually done using the Bellman method [8]. It was invented to approximate a continuous function f(x) by a finite polyline in one dimension; however, it can easily be generalized to two dimensions and applied to GPS tracklog compression. The main drawback of this method is that it relies on dynamic programming, which greatly increases the complexity of the code; this should be avoided in embedded and mobile environments.
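The dynamic-programming idea behind the Bellman method [8] can be sketched as below. This is our own illustrative adaptation, not the formulation of [8]: it fixes the number of segments k in advance and uses the sum of squared perpendicular distances of the skipped points as the error, a simple stand-in for the area criterion. The function names are hypothetical. Even this short version needs two tables and a triple loop plus a per-segment error evaluation, which illustrates the cost and code-size argument made above.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def segment_error(path: List[Point], i: int, j: int) -> float:
    """Sum of squared distances of path[i+1..j-1] to the line through path[i], path[j]."""
    (x1, y1), (x2, y2) = path[i], path[j]
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    total = 0.0
    for x, y in path[i + 1:j]:
        if length == 0.0:
            d = math.hypot(x - x1, y - y1)
        else:
            d = abs(dx * (y - y1) - dy * (x - x1)) / length
        total += d * d
    return total


def optimal_polyline(path: List[Point], k: int) -> Tuple[float, List[int]]:
    """Pick k segments (k + 1 kept indices, both endpoints included) minimizing total error."""
    n = len(path)
    if not 1 <= k <= n - 1:
        raise ValueError("k must be between 1 and len(path) - 1")
    inf = math.inf
    # cost[m][j]: minimal error of covering path[0..j] with m segments ending at j
    cost = [[inf] * n for _ in range(k + 1)]
    parent = [[-1] * n for _ in range(k + 1)]
    cost[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n):
            for i in range(m - 1, j):
                if cost[m - 1][i] == inf:
                    continue
                c = cost[m - 1][i] + segment_error(path, i, j)
                if c < cost[m][j]:
                    cost[m][j], parent[m][j] = c, i
    # Follow the parent pointers back from the last point.
    indices, j = [n - 1], n - 1
    for m in range(k, 0, -1):
        j = parent[m][j]
        indices.append(j)
    return cost[k][n - 1], indices[::-1]
```

With the naive error evaluation this runs in O(k · n³) time, compared with the single greedy pass per split of Douglas-Peucker.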

To compare our proposed method with existing algorithms, we used a widely known, de facto standard GPS dataset called GeoLife Trajectories. It was collected by Microsoft Research Asia over a period of more than three years from more than 180 subjects. The resulting dataset contains almost 18,000 trajectories with a total moving distance of 1.2 million kilometers. These data have been used by numerous researchers working with GPS tracklogs, for example in [9], [10] and [11]. The trajectories are stored as series of time-stamped decimal GPS coordinates.
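To illustrate this storage format, the sketch below reads one GeoLife trajectory file into time-stamped coordinates. It assumes the PLT layout as we understand it from the GeoLife user guide (six header lines, then comma-separated latitude, longitude, a constant zero, altitude in feet, a day count since 1899-12-30, and the date and time as strings); the function name is hypothetical, and the field order should be checked against the dataset documentation before use.

```python
from datetime import datetime
from typing import Iterable, List, Tuple

HEADER_LINES = 6  # assumed: PLT files start with six header lines without points


def parse_plt(lines: Iterable[str]) -> List[Tuple[datetime, float, float]]:
    """Parse GeoLife PLT lines into (timestamp, latitude, longitude) tuples."""
    records = []
    for line_no, line in enumerate(lines):
        if line_no < HEADER_LINES:
            continue
        fields = line.strip().split(",")
        if len(fields) < 7:
            continue  # skip blank or malformed lines
        # Assumed fields: lat, lon, 0, altitude (feet), days since 1899-12-30, date, time
        timestamp = datetime.strptime(f"{fields[5]} {fields[6]}", "%Y-%m-%d %H:%M:%S")
        records.append((timestamp, float(fields[0]), float(fields[1])))
    return records
```

Because the function accepts any iterable of strings, an open file object can be passed to it directly.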

III. CONTRIBUTION

Of the methods for compressing tracklogs described in the previous section, the Douglas-Peucker algorithm is the most suitable for a mobile environment: uniform sampling would result in a very coarse approximation, and the complexity of the Bellman method would drain the battery of the mobile device very quickly. Therefore, we decided to use the Douglas-Peucker algorithm as the basis of our compression method.

The Douglas-Peucker algorithm does not restrict the distance measure in which the tolerance parameter is given and which is evaluated throughout the path approximation phase. When compressing GPS tracklogs, the following two options arise:

1) Use the perpendicular Euclidean distance as the distance measure. This requires the tolerance parameter to be given as a difference in GPS degrees.

2) Use the Haversine formula as the distance measure. This requires the tolerance parameter to be given in meters or kilometers.

In the mobile environment and the application field we described, both options have advantages and drawbacks. Calculating the perpendicular Euclidean distance is very fast, since it does not require trigonometric functions, but the tolerance parameter must be given in the same domain as the input data, namely as a difference in GPS degrees. This is not a convenient way to parameterize the algorithm.

Using the Haversine formula (see Algorithm 2) for computing distances is much more complex. Calculating the cross-track distance, which is needed by the Douglas-Peucker algorithm, involves almost thirty trigonometric function calls as well as square root computations to obtain the distance in meters between a point and a line segment on the surface of the Earth. The big advantage of this method is that the tolerance parameter can be given in meters, which is our desired way of defining the tolerance.

M. Feher and B. Forstner • Self-Adjusting Method for Efficient GPS Tracklog Compression

Algorithm 2 Calculate the distance of two GPS coordinates with the Haversine formula (angles in radians; atan2(y, x) is the two-argument arctangent)

R = 6371 (mean radius of the Earth in kilometers)
a = sin²(ΔLat/2) + cos(Lat1) ∗ cos(Lat2) ∗ sin²(ΔLon/2)
c = 2 ∗ atan2(√a, √(1 − a))
return R ∗ c

Algorithm 3 Calculate the initial bearing between two GPS coordinates

a = sin(ΔLon) ∗ cos(Lat2)
b = cos(Lat1) ∗ sin(Lat2) − sin(Lat1) ∗ cos(Lat2) ∗ cos(ΔLon)
return atan2(a, b)

Algorithm 4 Calculate the cross-track distance on the Earth's surface

R = 6371 (mean radius of the Earth in kilometers)
a = sin(haversineDistance(lineStart, point)/R)
b = sin(bearing(lineStart, point) − bearing(lineStart, lineEnd))
return arcsin(a ∗ b) ∗ R
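A Python sketch of Algorithms 2 to 4 is shown below, assuming coordinates are (latitude, longitude) pairs in decimal degrees that are converted to radians internally; the function names are ours. The cross-track result is signed (negative when the point lies to the left of the direction from the line start to the line end), so a Haversine-based Douglas-Peucker step would compare its absolute value with the tolerance.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, as in Algorithms 2 and 4


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points (Algorithm 2)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    a = min(1.0, a)  # guard against rounding slightly above 1
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    return EARTH_RADIUS_KM * c


def initial_bearing_rad(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Initial bearing from point 1 to point 2 in radians (Algorithm 3)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_lambda = math.radians(lon2 - lon1)
    y = math.sin(d_lambda) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(d_lambda))
    return math.atan2(y, x)


def cross_track_km(line_start, line_end, point) -> float:
    """Signed distance of point from the great circle through line_start and line_end (Algorithm 4)."""
    delta13 = haversine_km(*line_start, *point) / EARTH_RADIUS_KM  # angular distance
    theta13 = initial_bearing_rad(*line_start, *point)
    theta12 = initial_bearing_rad(*line_start, *line_end)
    s = math.sin(delta13) * math.sin(theta13 - theta12)
    return math.asin(max(-1.0, min(1.0, s))) * EARTH_RADIUS_KM  # clamp for rounding
```

Each cross-track evaluation calls haversine_km once and initial_bearing_rad twice, which is where the trigonometric cost discussed in the text comes from.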

In this paper we show how to combine the minimal computational requirements of the perpendicular Euclidean distance measure with the convenience of setting the tolerance parameter in meters when running the Douglas-Peucker path compression algorithm.

When the perpendicular Euclidean distance measure is used, the value it produces is a distance in the coordinate system of the input data, in this case GPS degrees. This unit does not match our goal of parameterizing the algorithm with a value given in meters. To enable a meter-based tolerance, we have to transform the tolerance from the domain of meters to the domain of GPS degrees.

A. Converting meters value to degrees

To convert a distance given in meters into a difference in degrees, we first have to examine the geographic coordinate system of the Earth. The planet is modeled as an ellipsoid whose surface is divided by lines of latitude and longitude (see Fig. 2).

Fig. 2. Longitudes and latitudes of the Earth

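When converting a distance given in meters, the latitude must be taken into consideration, because the length of one degree depends on it: one degree of longitude spans about 111.320 kilometers at the equator and shrinks to zero at the poles, while one degree of latitude varies only between about 110.574 and 111.694 kilometers. To calculate the degree difference corresponding to a given meter difference at a given latitude, the following formula is used, whose denominator approximates the length of one degree of latitude in meters:

degreesDiff = metersDiff / (111132.954 − 559.822 ∗ cos(2 ∗ latitude) + 1.175 ∗ cos(4 ∗ latitude))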
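The conversion formula can be written as a small Python function. This is a sketch rather than the authors' code: the function name is hypothetical, the latitude is assumed to be given in decimal degrees, and the denominator is the series approximation of the length of one degree of latitude in meters with the coefficients quoted above.

```python
import math


def meters_to_degrees(meters: float, latitude_deg: float) -> float:
    """Convert a distance in meters into a degree difference at the given latitude."""
    phi = math.radians(latitude_deg)
    # Length of one degree of latitude at latitude phi, in meters.
    meters_per_degree = (111132.954
                         - 559.822 * math.cos(2 * phi)
                         + 1.175 * math.cos(4 * phi))
    return meters / meters_per_degree
```

For example, a 10-meter tolerance at latitude 47.5° (roughly that of Budapest) corresponds to about 9.0 × 10⁻⁵ degrees.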

Using this formula, the convenient meter-based tolerance parameter is converted to the parameter space of the input data (GPS coordinates) and can therefore be used in the Douglas-Peucker algorithm with the very fast perpendicular Euclidean distance measure. Since the length of one degree depends on the latitude, we use the maximum latitude value of the original tracklog to calculate a single global tolerance.

Examples of the compression output are shown in Fig. 3. The original path contains 168 points; with the tolerance set to 10, 25 and 50 meters, the reduced trajectories contain 24, 10 and 5 points, respectively. The original path is drawn in red, the compressed path in blue, and the points retained in the compressed path are marked with black triangles.

Fig. 3. Examples of GPS tracklog compression output, using meter-based tolerance inputs of 10, 25 and 50 meters

The mobile-optimized GPS tracklog compression method is shown in Algorithm 5.


Algorithm 5 Mobile-optimized Douglas-Peucker GPS tracklog compression

maxLatitude = getMaxLatitude(originalPath)
toleranceInDegrees = calculateDegreeDiffAtLatitude(toleranceInMeters, maxLatitude)
douglasPeuckerReduction(originalPath, toleranceInDegrees)
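Algorithm 5 can be sketched end to end in Python as below. The block is self-contained, so it includes a meters-to-degrees helper and a plain Douglas-Peucker routine; all names are hypothetical, and reading "maximum latitude" as the latitude farthest from the equator is our assumption. Inside the Douglas-Peucker loop no trigonometric function is evaluated; the only trigonometry is the single tolerance conversion per tracklog.

```python
import math
from typing import List, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def meters_to_degrees(meters: float, latitude_deg: float) -> float:
    """Meter distance -> degree difference, using the conversion formula from the text."""
    phi = math.radians(latitude_deg)
    return meters / (111132.954 - 559.822 * math.cos(2 * phi) + 1.175 * math.cos(4 * phi))


def perpendicular_distance(a: LatLon, b: LatLon, p: LatLon) -> float:
    """Euclidean distance (in degrees) from p to the line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    return abs(dx * (p[1] - a[1]) - dy * (p[0] - a[0])) / length


def douglas_peucker(path: List[LatLon], tolerance_deg: float) -> List[LatLon]:
    """Plain Douglas-Peucker using an explicit stack instead of recursion."""
    if len(path) < 3:
        return list(path)
    keep = [False] * len(path)
    keep[0] = keep[-1] = True
    stack = [(0, len(path) - 1)]
    while stack:
        start, end = stack.pop()
        max_dist, farthest = 0.0, start
        for i in range(start + 1, end):
            d = perpendicular_distance(path[start], path[end], path[i])
            if d > max_dist:
                max_dist, farthest = d, i
        if max_dist > tolerance_deg:
            keep[farthest] = True
            stack.extend([(start, farthest), (farthest, end)])
    return [p for p, kept in zip(path, keep) if kept]


def compress_tracklog(path: List[LatLon], tolerance_m: float) -> List[LatLon]:
    """Convert the meter tolerance once, then run Douglas-Peucker in degree space."""
    if not path:
        return []
    # Assumption: "maximum latitude" = latitude farthest from the equator; the
    # formula depends only on cos(2*phi) and cos(4*phi), so the sign is irrelevant.
    max_latitude = max(abs(lat) for lat, _ in path)
    tolerance_deg = meters_to_degrees(tolerance_m, max_latitude)
    return douglas_peucker(path, tolerance_deg)
```

With a 10-meter tolerance, a middle point about 1 meter off the straight line between its neighbours is dropped, while one about 110 meters off is kept.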

IV. RESULTS

We carried out several experiments to compare the speed of our method with that of the traditional Douglas-Peucker algorithm using Haversine-formula-based distance calculation. The test suite was a randomly sampled subset of the GeoLife 1.3 dataset: we used 100 trajectories, each of which was compressed by both the Haversine-based method and our tolerance transformation method, with maximum distance settings of 10, 25, 50 and 100 meters.

The experiments showed an average speedup of 26% for our proposed input parameter transformation method compared to the Haversine-formula-based distance calculation (see Fig. 4).

V. CONCLUSIONS

When compressing GPS tracklogs, the most suitable algorithm in a mobile environment is the Douglas-Peucker method. However, the traditional GPS distance measure used to evaluate each step is not well suited to use cases where the computational power of the device is as limited as that of a mobile phone.

In this paper we proposed a method for transforming the input parameter space of the Douglas-Peucker algorithm so that the much faster Euclidean distance calculation can be applied when compressing GPS tracklogs. We compared our algorithm to the traditional GPS distance measure (Haversine formula) and observed an average speedup of 26%, which is promising for future use in mobile environments.

ACKNOWLEDGMENT

This work was partially supported by the European Union and the European Social Fund through project FuturICT.hu (grant no.: TAMOP-4.2.2.C-11/1/KONV-2012-0013) organized by VIKING Zrt. Balatonfured.

This work is connected to the scientific program of the "Development of quality-oriented and harmonized R+D+I strategy and functional model at BME" project. This project is supported by the New Szechenyi Plan (Project ID: TAMOP-4.2.1/B-09/1/KMR-2010-0002).

REFERENCES

[1] M. Potamias, K. Patroumpas, and T. Sellis, “Sampling trajectory streams with spatiotemporal criteria,” in Scientific and Statistical Database Management, 2006. 18th International Conference on, pp. 275–284, IEEE, 2006.

[2] D. H. Ballard, “Strip trees: A hierarchical representation for curves,” Communications of the ACM, vol. 24, no. 5, pp. 310–321, 1981.

[3] D. H. Douglas and T. K. Peucker, “Algorithms for the reduction of the number of points required to represent a digitized line or its caricature,” Cartographica: The International Journal for Geographic Information and Geovisualization, vol. 10, no. 2, pp. 112–122, 1973.

[4] R. O. Duda, P. E. Hart, et al., Pattern Classification and Scene Analysis, vol. 3. New York: Wiley, 1973.

[5] T. Pavlidis, Structural Pattern Recognition, Springer Series in Electrophysics, vol. 1. Berlin: Springer, 1977.

[6] U. Ramer, “An iterative procedure for the polygonal approximation of plane curves,” Computer Graphics and Image Processing, vol. 1, no. 3, pp. 244–256, 1972.

[7] K. Turner, Computer Perception of Curved Objects Using a Television Camera. PhD thesis, University of Edinburgh, 1974.

[8] R. Bellman, “On the approximation of curves by line segments using dynamic programming,” Communications of the ACM, vol. 4, no. 6, p. 284, 1961.

[9] Y. Zheng, L. Zhang, X. Xie, and W.-Y. Ma, “Mining interesting locations and travel sequences from GPS trajectories,” in Proceedings of the 18th International Conference on World Wide Web, WWW ’09, (New York, NY, USA), pp. 791–800, ACM, 2009.

[10] Y. Zheng, Y. Chen, Q. Li, X. Xie, and W.-Y. Ma, “Understanding transportation modes based on GPS data for web applications,” ACM Trans. Web, vol. 4, pp. 1:1–1:36, Jan. 2010.

[11] Y. Zheng, X. Xie, and W.-Y. Ma, “GeoLife: A collaborative social networking service among user, location and trajectory,” IEEE Data Eng. Bull., vol. 33, no. 2, pp. 32–39, 2010.


Fig. 4. Results of comparing the runtime of Haversine formula-based (values denoted by X marks) and input parameter transformation-based (denoted by O marks) Douglas-Peucker GPS tracklog compression. The average speedup of our method is 26%.
