

Spatiotemporal Data Warehouse for Vehicle Supervision
Grid Time–Indexed Cube Approach

Ji-Hua Hu, Zhi-Feng Cheng, Cheng-Zhi Zhan, and Wei Tang

Research Center for Intelligent Transportation Systems, and Guangdong Provincial Key Laboratory of Intelligent Transportation Systems, School of Engineering, Sun Yat-Sen University, Guangzhou 510006, China. Corresponding author: Z.-F. Cheng, [email protected].

Transportation Research Record: Journal of the Transportation Research Board, No. 2399, Transportation Research Board of the National Academies, Washington, D.C., 2013, pp. 54–62. DOI: 10.3141/2399-06

The large amounts of spatiotemporal data generated by vehicle supervision systems cannot be efficiently managed by ordinary databases, mainly because of long query responses. To overcome the limitations of ordinary databases, this paper proposes a new approach, the grid time (GT)–indexed cube, which is a spatial grid–indexed, adaptive grid-based, trajectory-supported warehouse for spatiotemporal data. The GT cube partitions an embedded space–time into a set of size-fixed grids to form a cube that continues to grow throughout a constant time interval. Each grid is assigned an identifier composed of its coordinates and start time, and an aggregated value for each grid is stored in the grid records, regardless of the temporal length of the queries. Additionally, the basic grid structure of the GT cube remains unchanged at each time interval. Instead, this method refines the grid in a selected region to handle data skewing by adaptively partitioning the grid into subgrids. After extensive performance studies were conducted with spatiotemporal data from the main vehicle supervision system of Guangdong Province, China, it was observed that the GT cube achieved higher query performance than ordinary data storage technologies under various operational conditions, was easily applicable in practice, and demonstrated compatibility with traditional databases.

Data warehousing is generally understood as an integrated and time-varying collection of data and is used primarily in strategic decision making via online analytical processing (OLAP) techniques (1). A spatiotemporal data warehouse is a geometric and time-varying collection of data that supports spatiotemporal OLAP operations for extraction of spatiotemporal information (2–8). In the near future, spatiotemporal data warehousing will play a crucial role in assisted decision making for such applications as traffic supervision systems, transportation management (9), and digital battlefields.

Because of the success and popularity of the use of the Global Positioning System (GPS) in vehicle supervision, the volume of stored spatiotemporal data has increased sharply. Ordinary databases encounter difficulty with large volumes of spatiotemporal data and are unable to operate smoothly under such situations. To improve performance efficiency, many supervision systems divided spatiotemporal databases into small isolated parts or slowed the frequency of data collection, thus damaging the integrity of the spatiotemporal database, failing to meet the vehicle supervision demand, and creating obstacles for many continuous analyses and data-mining operations. Furthermore, there is a need for a spatiotemporal data warehouse that can answer analytical queries more efficiently for applications such as vehicle tracing and traffic jam recognition.

Despite the importance of the spatiotemporal data warehouse, a limited number of proposals have been put forth to tackle this problem. The first work on this topic was based on variants of the R-tree with preaggregation techniques for spatial dimensioning (10) and employed the B+ tree to index information with respect to temporal dimension (11). Each entry of the R-tree contained a pointer to the corresponding B-tree that stored historical aggregated data for a given entry. Another work proposed the spatiotemporal cube (ST cube), which stores data along grids indexed by the Hilbert curve and supports spatiotemporal aggregation queries through the use of the prefix sum approach (12, 13).

Several critical limitations exist in the previous frameworks for spatiotemporal data warehouses. The multitree framework incurs a high management cost because of the combination of the R-tree and B-tree. In particular, if a small number of objects undergoes a change in position between two consecutive time stamps, the entire R-tree must be updated. This means that all of the corresponding B-trees also must be updated, which in turn leads to a significant amount of computing overhead. The multitree framework requires too high a volume of disk access during a tree traversal for aggregation queries to successfully render online processing. In addition, the performance of a multitree index tends to deteriorate as time progresses, which makes it difficult for a multitree index to keep pace with certain operational requirements. As for the ST cube, although this method is more efficient than those based on the R-tree, it operates on a single disk whose total storage volume is limited and must be launched from the start without the support of ordinary databases. Thus, this approach is unable to make full use of ordinary database technologies.

Furthermore, neither of these frameworks possesses direct spatial information. For a multitree framework, the spatial boundaries of the R-tree grids are changeable and are able to overlap with each other. For the ST cube, the grid boundaries lie behind the Hilbert value, which leads to extra computation cost when this model needs to clarify the grid location of a vehicle at a certain time.

To overcome the limitations of the previous models, a new index structure for spatiotemporal data warehouses, the grid time (GT)–indexed cube, is proposed. The objectives in developing this method are to minimize management costs and storage requirements without sacrificing query performance and to make full use of the potential of ordinary databases.


The GT cube is an adaptive grid- and time-based index structure that is designed to handle aggregation queries over spatiotemporal data. The GT cube partitions an embedded space–time into a set of size-fixed grids to form a cube that continues to grow throughout a constant time interval. Each grid is assigned an identifier (ID) composed of its coordinates and start time, and its aggregated value is stored in the grid records, regardless of the temporal length of the queries. The GT cube does not alter its basic grid structure at each time interval. Instead, it is able to refine the grids in certain regions to handle data skewing by adaptively partitioning the grids into smaller subgrids. Such partitioned subgrids can appear and disappear anywhere in the grid as vehicles move in the space and the skewed data regions change over time.

The remaining sections of this paper are organized as follows: The next section presents a literature review, which is followed by a description of the structure of the GT cube model and related concepts. Then the database design and complexity analysis are presented. That section is followed by the introduction of the query methods to provide a discussion of the basic functions of the spatiotemporal data warehouse. A presentation of experiments and validation follows. The final section offers conclusions and directions for future work.

RELATED WORK

The previous research on this topic can be divided into two parts. The first part is based primarily on hierarchical index structures that consist of a single R-tree variant and numerous B-trees. The second part covers use of an ST cube. This section presents a survey of previous studies and their extensions for handling spatiotemporal data.

Multitree Framework

The data in a warehouse are conceptually modeled as a data cube constructed from a subset of attributes in a multidimensional database (14). Certain attributes are chosen as dimensions or functional attributes that describe the domain of attributes such as vehicles and organizations. Other attributes are selected as measurement attributes whose values are of particular interest (e.g., miles). Following the definition above, a data cube can be created in which one dimension corresponds to time and another to space, and the measurement attributes are placed in grids in this two-dimensional table.

A limited number of proposals have been aimed at supporting the spatiotemporal warehouse design, including the RB-tree, the aR-tree, and the a3DR-tree (10, 15, 16). However, these attempts contain similar flaws, including wastage of storage space, large index sizes, and poor query efficiency.

Papadias et al. were the first researchers to propose a framework for spatiotemporal data warehouses (11). They presented a multitree structure consisting of a single R-tree and numerous B-trees (as many as the number of entries of the R-tree). The aggregate RB-tree (aRB-tree) employed the B-trees to store historical aggregated data for the entries of the aR-trees. In particular, each R-tree entry had the form <MBR, ChildPtr, value, BtreePtr>, where BtreePtr was a pointer to the B-tree that maintained the historical data for the corresponding entry.

In typical situations, the aRB-tree requires excessive disk access to process online aggregation queries in practical applications. Moreover, the B-trees contain no one-dimensional order and are scattered all over the table; as a result, additional search time is incurred. Although the aRB-tree achieves simultaneous indexing on both the spatial and temporal dimensions to accelerate aggregation queries, the space requirements of the aRB-tree are still problematic for practical applications. For instance, in the case of an aR-tree with 100,000 entries, the aRB-tree requires 100,000 B-trees, which in turn require an immense amount of secondary storage. The aRB-tree may therefore violate the cost constraints for storage or updating, or both.

In vehicle supervision applications, the spatial dimension is dynamic. This dynamic behavior can further aggravate the innate problems of the aRB-tree because a new version of the aR-tree must be created for each change. To avoid duplication of subtrees not affected by the change, Nascimento and Silva employed the HR-tree (historical R-tree) and the 3DR-tree (15) instead of the aR-tree (16). The aHRB-tree and a3DRB-tree also inherit the problems of the aRB-tree. In addition, these two multitree structures incur additional overhead needed to maintain the dynamic information of the spatial dimension.

These problems show that multitree structures based on hierarchical access methods (or data-partitioning methods) are not well suited for spatiotemporal data warehouses because they cannot fulfill the needs of vehicle supervision.

ST Cube

The ST cube partitioned an embedded space into N grids of fixed size. Each grid of the ST cube was represented by the Hilbert value of its center and was arranged on the disk in the order of its Hilbert value (12, 13). Next, these grids were arranged in chronological order to compose a cube. In addition to the Hilbert order in the spatial dimension, the ordering along the temporal dimension imposed a total ordering on the grids. Once the total ordering was imposed on the grids, a grid of interest was accessible instantaneously; this feature facilitated better performance for aggregation queries.

In the two-dimensional Euclidean space $[0, 1] \times [0, 1]$, the Hilbert curve divided the space into $4^k$ grids according to its level k. Next, the grids divided according to the level of the Hilbert curve could be sequentially mapped into a one-dimensional space along the Hilbert curve. Specifically, for the Hilbert curve of level k, the grids were numbered in the sequence of $0, 1, \ldots, 4^k - 1$ along a given curve. The sequence number assigned to each grid was known as the Hilbert value. Each grid of the ST cube, which corresponded to a disk page, contained entries of the form <value>. The spatiotemporal information of the entry was not explicitly stored (the intent being to store as little information as possible in each grid) but could be inferred from a location inside the total-ordered grid. In references to a specific entry, the notation E[h, t] was used, where h was the Hilbert value of a grid and t denoted the time stamp. The Hilbert value of a grid was the output of a hash function whose inputs were the coordinates of the center of a grid.
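The numbering of grids along a level-k Hilbert curve can be illustrated with the widely used iterative coordinate-to-index conversion. The following is a minimal sketch and not the implementation of the ST cube authors; the function name `hilbert_value` and the use of integer cell indices (rather than grid-center coordinates) are assumptions made for illustration.

```python
def hilbert_value(k: int, x: int, y: int) -> int:
    """Return the Hilbert value (0 .. 4**k - 1) of cell (x, y) on a level-k curve.

    Assumption: x and y are integer cell indices in [0, 2**k - 1].
    """
    n = 1 << k      # number of cells per side
    d = 0           # accumulated position along the curve
    s = n >> 1      # side length of the quadrant examined at this step
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate or flip the quadrant so the sub-curve keeps a consistent orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d
```

Consecutive Hilbert values always belong to spatially adjacent cells, which is the locality property that makes the curve attractive for laying grids out on disk.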

Certain limitations of the ST cube are evident in this description. First, the ST cube stands on a single disk with limited total storage volume and cannot fulfill large storage demands. Second, because all data are stored on a single disk, although the model may conserve search time for a disk page when retrieving a data set in a sequential mode, it is not effective for use in a parallel mode. The ST cube therefore cannot fulfill the needs of vehicle supervision, and a new approach is required.


GT Cube Model

GT Cube Grid

In general, a vehicle supervision area can be described as a two-dimensional Euclidean space with bounds of latitude and longitude (lat1, long1; lat2, long2). If the area is indexed with a grid, the space is divided into M × N grids (along with the nearby space), where M is the number of rows and N is the number of columns. Each grid has an index value denoted by the grid center (latCenter, longCenter) or corner coordinates. The GT cube grid is composed of a fixed-size grid (the floor), a given time interval (the height), and a square column that serves as the basic unit of data storage. In other words, in the space dimension, the data are managed by the grid area, which can be denoted as ΔS, the ID of which is a string composed of the latitude and longitude. In the time dimension, the data are managed by the time interval, denoted as ΔT. Therefore, every ΔS × ΔT represents a three-dimensional GT cube grid, denoted as ΔC, the ID of which can be described by the grid ID and the start time. The grid ID is the output of a function whose inputs are the coordinates of the corner of a grid and its spatial resolution. The function ( f ) is defined as follows:

$$g = f(x, y, r) \tag{1}$$

where x = latitude, y = longitude, r = spatial resolution of the grid, and g = value of the grid ID.

For example, let the corner coordinates of a grid be (23.5032, 113.5756); if the value of r is 0.1°, g is 23.5-113.5; if r is 0.01°, g is 23.50-113.57. With this function, the GT cube not only can assign the value of a grid ID simply but also can find the grid that each object belongs to.
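The grid ID function can be sketched in a few lines. This is an illustrative reading of Equation 1 that reproduces the two examples in the text, under the assumptions that coordinates are truncated downward to the resolution and that r is a power of ten passed as a decimal string; the name `grid_id` is hypothetical.

```python
from decimal import Decimal, ROUND_FLOOR


def grid_id(x: float, y: float, r: str) -> str:
    """Grid ID g = f(x, y, r): truncate latitude x and longitude y to resolution r.

    Assumption: r is a decimal string that is a power of ten, e.g. "0.1" or "0.01".
    Decimal arithmetic avoids binary floating-point surprises at cell boundaries.
    """
    step = Decimal(r)
    lat = Decimal(str(x)).quantize(step, rounding=ROUND_FLOOR)
    lon = Decimal(str(y)).quantize(step, rounding=ROUND_FLOOR)
    return f"{lat}-{lon}"


print(grid_id(23.5032, 113.5756, "0.1"))   # 23.5-113.5
print(grid_id(23.5032, 113.5756, "0.01"))  # 23.50-113.57
```

Because the ID is computed directly from a position, the same function serves both to name a grid and to find the grid to which an object belongs.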

In contrast to the work of Choi et al. (13), in the present study, each GT cube grid serves as a bucket in which to store spatiotemporal data during time interval ΔT; the bucket can be a physical or logical table. Let Mn be the maximum number of entries that will fit in one grid, similar to the maximum number of records in a physical table with reasonable query efficiency (e.g., results are returned in 2 s for a common query operation). Therefore, Mn may vary according to a specific database management system (e.g., Oracle, SQL Server).

Let C[g, T] be the GT cube grid and E[g, t] be the data of the time slice; then all data of the time slice during ΔT compose C[g, T], as follows:

$$C[g, T] = \bigcup_{t \in T} E[g, t] \tag{2}$$

where T is a larger time interval and t is a smaller time interval. As with the vehicle supervision area, let B[b] be the data collection of the supervision area during ΔT, where b is the identifier of B in the time dimension, as in Equation 3; then the GT cube G is modeled as a set of the grids B[b], as shown in Equation 4.

$$B[b] = \bigcup_{\text{index} = (0, 0)}^{(M-1,\, N-1)} C[g, T] \tag{3}$$

$$G = \bigcup B[b] \tag{4}$$

The corresponding definitions are shown in Figure 1a.

Grid Refinement

As discussed previously, the GT cube grid can use a physical or logical table to store data. The record number of the table should not exceed Mn; otherwise, the efficiency of the GT cube will decrease sharply. The table is referred to as a “reasonable table.” The threshold Mn is equal to the total number of data records that can be stored in a reasonable table; this number varies with the type of database.

FIGURE 1 Diagram of (a) structure of GT cube and (b) grid refinement. [Panel a labels: data of time slice E[g, t]; GT-Cube C[g, T]; data collection during Δt, B[b]; GT-Cube G; axes x, y, and t. Panel b: grids numbered 0–15 at times T1 and T2, with grid 6 partitioned into subgrids 6.1, 6.2, and 6.3 at T2.]

Vehicle position and status are dynamic by nature, and the distribution of vehicles over the supervision area is unpredictable and becomes skewed over time. Skewed data can cause a grid in a dense region to become overgrown (in size), which in turn renders the grid unable to meet the need for efficiency during data access. If the record number of a grid exceeds Mn to a large extent, grid partitioning will occur. Depending on the rate of increase in the number of records, the grid is partitioned into two, four, or 16 subgrids, as illustrated in Figure 1b. At time T1, all record numbers are less than Mn. However, at time T2, the record number of the sixth grid exceeds Mn, mainly in the area of the bottom right corner. Therefore, the grid is partitioned as shown in Figure 1b. A point of difference from the work of Choi et al. (12) is that a subgrid can itself be partitioned if its record number exceeds Mn by a large margin. Grid partitioning can occur in the spatial dimension, as outlined above, and can also occur in the time dimension; the partition types in the time dimension are one-dimensional but otherwise similar to those of the spatial dimension.
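The refinement rule can be sketched as a recursive split. The example below is a simplified illustration and not the authors' procedure: it always splits into four equal subgrids (the text also allows two or 16), holds records in memory instead of database tables, and adds a hypothetical `max_depth` guard so that many records at one identical position cannot cause endless splitting.

```python
from dataclasses import dataclass, field


@dataclass
class Grid:
    """A grid covering [lat0, lat1) x [lon0, lon1) together with its point records."""
    lat0: float
    lon0: float
    lat1: float
    lon1: float
    records: list = field(default_factory=list)   # (lat, lon) tuples
    children: list = field(default_factory=list)  # subgrids created by refinement


def refine(grid: Grid, mn: int, depth: int = 0, max_depth: int = 4) -> None:
    """Partition a grid into four subgrids if it holds more than mn records."""
    if len(grid.records) <= mn or depth >= max_depth:
        return
    mid_lat = (grid.lat0 + grid.lat1) / 2
    mid_lon = (grid.lon0 + grid.lon1) / 2
    grid.children = [
        Grid(grid.lat0, grid.lon0, mid_lat, mid_lon),
        Grid(grid.lat0, mid_lon, mid_lat, grid.lon1),
        Grid(mid_lat, grid.lon0, grid.lat1, mid_lon),
        Grid(mid_lat, mid_lon, grid.lat1, grid.lon1),
    ]
    # Move every record into the subgrid that contains it.
    for lat, lon in grid.records:
        idx = (2 if lat >= mid_lat else 0) + (1 if lon >= mid_lon else 0)
        grid.children[idx].records.append((lat, lon))
    grid.records = []
    for child in grid.children:
        refine(child, mn, depth + 1, max_depth)  # a subgrid may be partitioned again
```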

Data Query

Data queries include spatially related queries and time-related queries. As illustrated in Figure 1b, a spatially related query attempts to find the GT cube grids that overlap with the query bounds and to access data in each grid. The subsequent steps are ordinary database operations that can be executed in parallel. If subgrids exist, they also can be identified through the overlap function, and the data access method is similar to that described earlier.
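For a regular M × N grid, the overlap test reduces to index arithmetic. The sketch below is an assumption-laden illustration of such an overlap function: the name `overlapping_grids`, the lower-left `origin`, and the square grid size `r` in degrees are all hypothetical, and subgrids are ignored.

```python
import math


def overlapping_grids(origin, r, m, n, query):
    """Return the (row, col) indices of the grids that overlap the query bounds.

    origin: (lat, lon) of the lower-left corner of the supervision area
    r:      grid size in degrees; m, n: number of rows and columns
    query:  (lat_min, lon_min, lat_max, lon_max)
    """
    lat0, lon0 = origin
    lat_min, lon_min, lat_max, lon_max = query
    # Clamp the index range of the query rectangle to the grid extent.
    row_lo = max(0, math.floor((lat_min - lat0) / r))
    row_hi = min(m - 1, math.floor((lat_max - lat0) / r))
    col_lo = max(0, math.floor((lon_min - lon0) / r))
    col_hi = min(n - 1, math.floor((lon_max - lon0) / r))
    return [(i, j)
            for i in range(row_lo, row_hi + 1)
            for j in range(col_lo, col_hi + 1)]
```

Each returned index identifies one grid table, so the per-grid lookups that follow are independent of one another.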

WAREHOUSE DESIGN

To construct the GT cube that has been described, a warehouse design that includes designs for a spatiotemporal data table, an index table, and related functions is required.

Design of Spatiotemporal Index Table

The data table is designed according to the GT cube and is divided into four parts:

• Historical table. The historical table, which is established according to a given granularity, is used primarily to record detailed historical information for every monitored vehicle.

• Current vehicle table. The structure of the current vehicle table is similar to that of the historical data information table and contains information for the current vehicle’s travel state during the monitoring period.

• Space–time cube index table. The space–time cube index table serves as metadata; it primarily provides the general information for the GT cube grid table. The information includes the spatial scope, time scope, and number of records. The structure of the table is presented in Table 1.

• Doubly linked list table. The doubly linked list table records the vehicle’s travel between GT cube grids. The structure of this table is presented in Table 2.

Data Storage and Indexing

The data storage is primarily divided into two parts: storage of historical data and storage of new data.

Storage of Historical Data

Data are first divided by spatial area; the data’s geographic coordinates are used to search for the corresponding grid, whose number refers to a specific data table in which the data will later be stored. The form of the table name is “tb_r + row number of the grid + column number of the grid + level of the grid + start time,” where tb_r is the table reference. Other related information for the table (e.g., table name, grid number, spatial scope, record number) is recorded simultaneously. Next, a decision must be made as to whether the table contains more records than the threshold Mn. If the answer is no, the data are stored; if the answer is yes, the tables are split by the grid refinement methods previously discussed. Therefore, new tables are created, and their information is recorded in the tb_index table. The recorded information includes basic items such as table name, grid number, start and end time, and grid level.
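The naming and threshold check can be sketched as follows. This is a hedged illustration only: the underscore separators and the `YYYYMMDDHH` time format are assumptions (the text gives the components of the name but not their exact encoding), and in-memory dictionaries stand in for the database tables and for tb_index.

```python
from datetime import datetime


def table_name(row: int, col: int, level: int, start: datetime) -> str:
    """Compose a grid table name: tb_r + row + column + level + start time.

    Assumption: fields are joined with underscores and the start time is
    written as YYYYMMDDHH; neither detail is specified in the text.
    """
    return f"tb_r_{row}_{col}_{level}_{start:%Y%m%d%H}"


def store(tables: dict, index: dict, name: str, record: dict, mn: int) -> bool:
    """Store a record unless the table already holds mn records.

    Returns False when the table is full, so that the caller can apply
    grid refinement and retry with the name of the new subgrid table.
    """
    rows = tables.setdefault(name, [])
    if len(rows) >= mn:
        return False
    rows.append(record)
    # Keep the metadata entry (a stand-in for a tb_index row) up to date.
    index[name] = {"TABLENAME": name, "RECORDNUM": len(rows)}
    return True
```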

TABLE 1 Structure of tb_index

Field         Description
TABLENAME     Name of table of grid
STARTTIME     Start time of grid
ENDTIME       End time of grid
MAPEXTENT     Spatial extension of grid
GRIDTIER      Time interval number of grid
RECORDNUM     Record number of grid
GRIDROW       Row number of grid
GRIDCOLUMN    Column number of grid

TABLE 2 Doubly Linked List Table for Vehicle Trajectory

Field         Description
CARPLATE      Car license plate number
CHANGETIME    Time of transition between grids
CHANGETYPE    Transition type: 0 = entering new grid, 1 = leaving old grid
GRIDROW       Row number of transition grid
GRIDCOLUMN    Column number of transition grid

Finally, vehicle path management information is assembled by using the vehicle’s license plate number and the positions and times at which the vehicle moves in or out of a GT cube grid. The changes in the position of the vehicle between GT cube grids are clearly delineated, and the time and positions at which the vehicle moves in and out of a GT cube grid are recorded as a doubly linked list. In this method, a query of the vehicle path during a given time interval can be performed in two steps:

1. Searching the records of the vehicle in the doubly linked list to find the GT cube grids at which the vehicle had arrived during the time interval and

2. Searching all of the related data tables of the GT cube grid to construct a path in the time dimension.
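The two steps of the path query can be sketched over in-memory stand-ins for the tables. The transition rows use the field names of the doubly linked list table; the position-record fields (`TIME` and so on) and the function name `path_query` are assumptions. The sketch also includes the grid that the vehicle already occupied when the interval began, which the two steps imply but do not spell out.

```python
def path_query(transitions, grid_tables, plate, t_start, t_end):
    """Return the vehicle's records in [t_start, t_end], ordered by time.

    transitions: list of dicts with CARPLATE, CHANGETIME, CHANGETYPE,
                 GRIDROW, GRIDCOLUMN (0 = entering, 1 = leaving)
    grid_tables: {(row, col): [ {"CARPLATE": ..., "TIME": ...}, ... ]}
    """
    # Step 1: find the grids the vehicle occupied during the interval.
    current = None   # grid occupied when the interval starts, if any
    grids = []
    own = sorted((t for t in transitions if t["CARPLATE"] == plate),
                 key=lambda t: t["CHANGETIME"])
    for tr in own:
        key = (tr["GRIDROW"], tr["GRIDCOLUMN"])
        if tr["CHANGETIME"] < t_start:
            if tr["CHANGETYPE"] == 0:
                current = key
            elif current == key:
                current = None
        elif tr["CHANGETIME"] <= t_end and tr["CHANGETYPE"] == 0 and key not in grids:
            grids.append(key)
    if current is not None and current not in grids:
        grids.insert(0, current)
    # Step 2: search only those grid tables and assemble the path by time.
    points = [rec for key in grids for rec in grid_tables.get(key, [])
              if rec["CARPLATE"] == plate and t_start <= rec["TIME"] <= t_end]
    return sorted(points, key=lambda rec: rec["TIME"])
```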

Storage of New Data

In the case of new data, the current vehicle information table is established first and used to record the vehicle’s current positions during a predetermined time interval. The data-receiving platform stores the data in this data table for convenient acquisition of update information for the vehicle. At the same time, the information is also stored in the historical data table. This process contains three main steps:

1. Adding the information into the data table corresponding to its GT cube grid division (which means storing the data into the corresponding data blocks),

2. Deciding whether the number of records in the data table is over the threshold Mn, and

3. Refreshing the doubly linked list for the vehicle path according to the position changes of the vehicle, if necessary.
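The three steps can be sketched for a single incoming GPS record. This is a simplified illustration under stated assumptions: grids are identified by integer (row, column) indices derived from a hypothetical grid size `r`, dictionaries and lists stand in for the database tables, and `last_grid` is a helper map (not part of the described design) that remembers each vehicle's most recent grid.

```python
import math


def ingest(record, grid_tables, last_grid, transitions, mn, r=0.1):
    """Process one new record; return True if its grid table now exceeds mn.

    record: {"CARPLATE": ..., "TIME": ..., "LAT": ..., "LON": ...}
    """
    key = (math.floor(record["LAT"] / r), math.floor(record["LON"] / r))
    # Step 1: add the record to the table of its GT cube grid.
    grid_tables.setdefault(key, []).append(record)
    # Step 2: decide whether the table is over the threshold Mn.
    over = len(grid_tables[key]) > mn
    # Step 3: refresh the doubly linked list if the vehicle changed grids.
    plate = record["CARPLATE"]
    prev = last_grid.get(plate)
    if prev != key:
        if prev is not None:
            transitions.append({"CARPLATE": plate, "CHANGETIME": record["TIME"],
                                "CHANGETYPE": 1, "GRIDROW": prev[0], "GRIDCOLUMN": prev[1]})
        transitions.append({"CARPLATE": plate, "CHANGETIME": record["TIME"],
                            "CHANGETYPE": 0, "GRIDROW": key[0], "GRIDCOLUMN": key[1]})
        last_grid[plate] = key
    return over
```

A True result signals that the caller should trigger grid refinement for that table.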

Indexing

The indexing process primarily uses the GT cube grid index table and the doubly linked list. The index table primarily makes use of the grid numbers to index the corresponding names of the data tables and the spatial scopes that they represent. The vehicle’s doubly linked list table is primarily used to record the changes between grids of the vehicle’s path and to identify the vehicle by the license plate number and time interval in which the data table of the historical data for a certain time period is saved (Figure 2).

Time Complexity Analysis

The time complexity of data storage and access may change greatly according to different indexing methods. For simplicity, an additional index of the traditional data table and GT cube was not used in this study; instead, an analysis of the associated time complexities is presented.

The traditional method of data storage places all of the historical data in one data table, so that the time complexity of searching one record is approximately

$$O(N) = \frac{N + 1}{2} \tag{5}$$

where N is the total number of records in the data table and O is the time complexity.
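The value in Equation 5 matches the expected number of comparisons in a sequential scan without an index. As a short check, under the assumption (not stated in the text) that the target is equally likely to be any one of the N items scanned, the average position examined is

```latex
\frac{1}{N}\sum_{i=1}^{N} i \;=\; \frac{1}{N}\cdot\frac{N(N+1)}{2} \;=\; \frac{N+1}{2}
```

so the cost grows linearly with N, which is why a single ever-growing table becomes slow.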

FIGURE 2 Flowchart to establish the doubly linked list. [Flowchart steps: input car plate; get the first record sorted by time; write into doubly linked list; modify the reference set R_Set; get the next record; test whether the record is equal to the reference value in R_Set (if no, write into doubly linked list); test whether the record is the last one (if no, get the next record; if yes, end).]

For the GT cube model, there are two tables that must be searched when the operations described above are performed: the index table of a GT cube grid where the metadata of the grid are stored and the table for a GT cube grid where the detailed information for the vehicle is stored. Therefore, the time complexity of querying one record (Or) includes three parts, as shown in Equations 6 through 8.

$$O_r(G, T) = O(L) + O(C) + O(M) \qquad (6)$$

where O(L), O(C), and O(M) are time complexities of a doubly linked list query, GT cube grid index query, and the query in a grid, respectively. Because the index table of a GT cube grid is relatively static, its grid information can be appended to the doubly linked list table, and there is no need to include the index table when querying the vehicle’s position. The time complexity can be simplified as follows:

$$O_r(G, T) = O(L) + O(M) \qquad (7)$$

By the same analysis, the time complexity of a path query ($O_p$) can be defined as follows:

$$O_p(G, T) = O_p(L) + O_p(M) \times n \qquad (8)$$

where n is the number of tables of the GT cube grids that must be searched. These tables can be searched synchronously, and, therefore, if each GT cube grid has its own disk, the time complexity will decrease sharply. This process can be described by the following equation:

$$O_p(G, T) = O_p(L) + O_p(M) \qquad (9)$$
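Equations 8 and 9 differ only in whether the n grid tables are searched one after another or at the same time. The sketch below expresses both cases in Python; the function name `path_query_cost`, its parameters, and the example numbers are hypothetical, and the parallel case assumes, as Equation 9 does, that all grid tables cost the same to search and that merging the results is free.

```python
def path_query_cost(list_cost: float, grid_cost: float, n_grids: int,
                    parallel: bool = False) -> float:
    """Estimated path-query cost in the spirit of Equations 8 and 9.

    list_cost : cost of searching the doubly linked list table, O_p(L)
    grid_cost : cost of searching one grid table, O_p(M)
    n_grids   : number of grid tables the path touches, n
    parallel  : True if every grid table is searched at the same time
    """
    if n_grids < 1:
        raise ValueError("a path query touches at least one grid table")
    if parallel:
        return list_cost + grid_cost            # Equation 9
    return list_cost + grid_cost * n_grids      # Equation 8


# Illustrative numbers only: 5 grid tables, each costing 1000 units to search.
serial = path_query_cost(100, 1000, 5)
concurrent = path_query_cost(100, 1000, 5, parallel=True)
```

With these illustrative inputs the serial estimate is 5100 units and the parallel estimate is 1100 units, which shows why the number of grids visited dominates the serial cost.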

Basic Functions

The basic functions of the GT cube are queries for vehicle position and vehicle path, in addition to the statistics for vehicles in a given district or on a specific road.

Position Query

Queries for a vehicle’s spatial position fall into two cases: the position at the current moment and the position at a certain historical moment. First, the system decides whether the user is searching for information for the current moment. If so, the information is accessed directly in the current vehicle table and displayed on the map. If the query is for historical data, the search time is compared with the entry and exit times of each GT cube grid saved in the doubly linked lists. The GT cube grid that the vehicle occupied at the search time is then identified, and the information is retrieved directly from the related table. The entire process of the query is shown in Figure 3.
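The two branches of the position query can be sketched as follows. This is a minimal in-memory illustration, not the system's code: dictionaries stand in for the current vehicle table, the doubly linked list table, and the grid tables, and the function name `position_query`, the table layouts, and the convention that `query_time=None` means "now" are all assumptions.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

Position = Tuple[float, float]        # (longitude, latitude)
Visit = Tuple[float, float, str]      # (time_in, time_out, grid_table_name)
Fix = Tuple[float, float, float]      # (timestamp, longitude, latitude)


def position_query(
    plate: str,
    query_time: Optional[float],
    current_table: Dict[str, Position],
    visit_lists: Dict[str, List[Visit]],
    grid_tables: Dict[str, Dict[str, List[Fix]]],
) -> Optional[Position]:
    """Return the vehicle's position, or None if nothing is recorded."""
    if query_time is None:
        # Current-moment query: read the newest position directly.
        return current_table.get(plate)
    # Historical query: find the grid the vehicle occupied at query_time.
    for time_in, time_out, table_name in visit_lists.get(plate, []):
        if time_in <= query_time <= time_out:
            fixes = grid_tables[table_name].get(plate, [])  # sorted by timestamp
            times = [t for t, _, _ in fixes]
            i = bisect_right(times, query_time)  # fixes[:i] are at or before query_time
            if i == 0:
                return None
            _, lon, lat = fixes[i - 1]
            return (lon, lat)
    return None
```

Only one grid table is read for a historical query, which is the property that Equation 7 relies on.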

Vehicle Path Query

A query of the vehicle’s path in a certain period is equivalent to a query of the vehicle’s position at multiple moments. This process includes two steps:

1. Identification of how many GT cube grids the vehicle has visited. In this step, the start and end times are used as conditions to search the doubly linked list table for the related GT cube grids and the corresponding tables.

2. A search of the corresponding tables of the GT cube grids for the detailed positions. In this step, all corresponding tables are searched for the vehicle’s detailed positions and other information in chronological order. Next, the information is organized along the time dimension, and the path information of the vehicle in the given time interval is acquired.
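The two steps above can be sketched in Python as follows. As before, this is an illustration under assumptions rather than the authors' implementation: dictionaries stand in for the doubly linked list table and the grid tables, and the name `path_query` and the row layouts are hypothetical.

```python
from typing import Dict, List, Tuple

Visit = Tuple[float, float, str]      # (time_in, time_out, grid_table_name)
Fix = Tuple[float, float, float]      # (timestamp, longitude, latitude)


def path_query(
    plate: str,
    start: float,
    end: float,
    visit_lists: Dict[str, List[Visit]],
    grid_tables: Dict[str, Dict[str, List[Fix]]],
) -> List[Fix]:
    """Return the vehicle's fixes with start <= timestamp <= end, in time order."""
    # Step 1: grid tables whose stay interval overlaps the query window.
    tables: List[str] = []
    for time_in, time_out, table_name in visit_lists.get(plate, []):
        overlaps = time_in <= end and time_out >= start
        if overlaps and table_name not in tables:  # a revisited grid is read once
            tables.append(table_name)
    # Step 2: collect the matching rows from each table, then order them by time.
    path: List[Fix] = []
    for table_name in tables:
        for fix in grid_tables[table_name].get(plate, []):
            if start <= fix[0] <= end:
                path.append(fix)
    path.sort(key=lambda fix: fix[0])
    return path
```

The loop in step 2 is where the n grid tables of Equation 8 are read; each iteration is independent of the others, so the reads could be issued concurrently.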

EXPERIMENT AND VALIDATION

GT Cube Construction

According to the design that has been presented, supervision data from the system of key monitoring vehicles in Guangdong Province, China, were selected as the test data. The total number of data points from November 9, 2011, to December 3, 2011, was 2,595,636,719 (about 2.6 billion). During this time, the number of active vehicles was 30,692. The latitude of Guangdong Province is between 20.2° north and 25.5° north, and the longitude is between 109.75° east and 117.33° east. The strategies used to build the GT cube were as follows:

1. In the spatial dimension, the grids were divided by using ΔX = 0.3° and ΔY = 0.3°, which meet the need for space–time variation frequency in the vehicle path as well as the requirements for data display in the spatial grid. (With regard to the frequency of space–time variation, an excessively high frequency will easily lead to a large number of search levels and low efficiency, but too low a frequency will allow the number of data points to grow rapidly, which will make the dividing time shorter and the frequency of division higher.)

2. In the time dimension, the granularity was set to ΔT = 1 day, and the threshold value Mn was 20,000,000 records from the vehicle’s GPS data, a quantity that meets the need for query efficiency. After the granularity was determined, the historical table could be divided into several tables according to the grid divisions.
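The first-level division described in these two items can be sketched as a mapping from a GPS record to a grid key. The grid size, study-area bounds, and start date come from the text; the function name `grid_key`, the choice of the southwest corner as origin, and the (column, row, day) numbering are assumptions made for illustration. The second-level split of grids that exceed the threshold Mn is not modeled.

```python
import math
from datetime import date, datetime
from typing import Tuple

LON_MIN, LAT_MIN = 109.75, 20.2   # southwest corner of the study area (from the text)
DX = DY = 0.3                     # spatial grid size in degrees (from the text)
T0 = date(2011, 11, 9)            # first day of the test data (from the text)


def grid_key(lon: float, lat: float, when: datetime) -> Tuple[int, int, int]:
    """Map a GPS record to an assumed (column, row, day) first-level grid key."""
    col = math.floor((lon - LON_MIN) / DX)   # steps of 0.3 degrees eastward
    row = math.floor((lat - LAT_MIN) / DY)   # steps of 0.3 degrees northward
    day = (when.date() - T0).days            # time granularity of 1 day
    return col, row, day


# A record near Guangzhou on the second day of the data set.
key = grid_key(113.3, 23.1, datetime(2011, 11, 10, 8, 30))
```

Because the grid boundaries are fixed multiples of ΔX and ΔY, the spatial extent of a grid can be read directly from its key.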

The GT cube contained 7,387 grids in the first level and 93 grids in the second level; the number of data points in each grid table remained under the threshold of 20,000,000. If a grid table exceeded this threshold, it was further divided into small tables. During the process of building the GT cube, the metadata of each grid were recorded, and the grid index table was created. The index was constructed on the basis of the grid ID and the time interval. At the same time, the doubly linked list tables were also created and the position changes of the vehicles between grids were recorded. The total number of records was 2,487,006.

FIGURE 3 Position query (Y = yes; N = no). [Flowchart nodes: input query time and plate number; is it the current time?; query the newest car information table; query the doubly linked list table; query the cell table; query the car location; present the result on map.]

The time complexity analysis of the GT cube was performed as follows:

1. For the traditional method of data storage (i.e., daily storage of individual vehicle data by using the built-in index of the database management system, so that the data table contains about 100 million records), the time complexity was calculated with Equation 10:

$$O(N) = \frac{1 + 2{,}595{,}636{,}719}{2} = 1{,}297{,}818{,}360 \qquad (10)$$

2. The time complexity of the GT cube ($O_{\text{GT-Cube}}$) was calculated as follows:

$$O_{\text{GT-Cube}}(N) = (1 + 2{,}487{,}006) + \frac{1 + 7{,}480}{2} + 20{,}000{,}000 = 22{,}490{,}747.5 \qquad (11)$$

Comparing Equations 10 and 11, the GT cube reduces the estimated search cost from about 1.3 billion to about 22.5 million, a decrease by a factor of about 58.
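The arithmetic behind Equations 10 and 11 can be checked with a few lines of Python. The constants are the figures reported in the text; the variable names are hypothetical, and Equation 11 is read here as the sum of the doubly linked list record count plus one, half of the grid count plus one, and the grid-table threshold, which is the reading that reproduces the printed total of 22,490,747.5.

```python
N_RECORDS = 2_595_636_719      # all GPS records in the test data
LINK_RECORDS = 2_487_006       # records in the doubly linked list tables
N_GRIDS = 7_387 + 93           # first-level plus second-level grids
GRID_THRESHOLD = 20_000_000    # maximum records per grid table (Mn)

# Equation 10: average cost of scanning the single traditional table.
traditional = (1 + N_RECORDS) / 2

# Equation 11: linked list term + grid index term + one grid table at its limit.
gt_cube = (1 + LINK_RECORDS) + (1 + N_GRIDS) / 2 + GRID_THRESHOLD

ratio = traditional / gt_cube  # how many times smaller the GT cube estimate is

print(f"traditional: {traditional:,.1f}")
print(f"GT cube:     {gt_cube:,.1f}")
print(f"ratio:       {ratio:.1f}")
```

Because the grid-table term is taken at the threshold, the GT cube figure behaves as an upper estimate for that term rather than an average.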

Query for Vehicle Location and Path

Position Query

First, 160 vehicles were randomly chosen, their positions were queried in the traditional database and the GT cube, and the time costs were subsequently compared. A statistical graph of the resulting time cost is shown in Figure 4; the average time spent was 0.0437 s for the traditional method and 0.0367 s for the GT cube model.

The comparison shows that for a majority of the vehicles, a shorter time was required to query their positions under the GT cube, but for a subset of vehicles, the result was slightly better for queries in the traditional database. There are two reasons for this observation. First, in the traditional database all vehicle data are stored in one table in no particular order and no additional index exists, so the time cost of a position query varies greatly. Second, a position query in the GT cube must access two additional levels of index, which requires additional time. It is therefore reasonable that a few queries in the traditional database cost less time than the corresponding queries in the GT cube. However, the number of records in a GT cube grid table is far smaller than that of a traditional data table, and, therefore, data processing under the GT cube is advantageous overall.

Fifty vehicles were then chosen as a test group, and their simultaneous positions were queried in a traditional database and in the GT cube. The experiment was repeated 50 times, and the mean of the time costs was computed for the traditional database and for the GT cube. The mean value of the traditional method was 3.23 s and that of the GT cube model was 0.78 s, as shown in Figure 5. On the basis of the comparison, it may be concluded that the time cost of a query in the GT cube is much shorter.

Path Query

Test groups of 10, 20, 30, 40, and 50 vehicles were chosen separately, and their simultaneous paths were queried in the traditional database and in the GT cube with a dynamic time span (from 1 to 24 h). Their time costs are shown in Figure 6; the GT cube model has greater efficiency and meets the goal of improving the efficiency of a path query.

Summary

As shown in these three sets of statistical data, the efficiency of data inquiry in the GT cube presents a distinct improvement. Therefore, the GT cube has been proven to be practical in dealing with large amounts of spatiotemporal data from monitored vehicles.

FIGURE 4 Comparison of time spent on one vehicle positioning inquiry. [Plot: time (s) versus serial number of vehicle; series: traditional method and GT-cube model.]

CONCLUSIONS AND FUTURE WORK

A vehicle supervision platform generates an immense amount of spatiotemporal data that cannot be efficiently managed by an ordinary database. This paper proposed a new approach for spatiotemporal data warehouses, the GT cube, which is a spatial grid–indexed, adaptive grid-based, trajectory-supported cube characterized by three dimensions, including time. Underlying concepts of the GT cube, such as the GT cube grid and the adaptive grid, were also analyzed. The design of the data table structure for spatiotemporal data on the basis of the GT cube model was presented, and the data storage method and indexing schemes were described. Finally, examples from experimental spatiotemporal data queries provided proof of the effectiveness of the design.

From the experiments, the following conclusions are drawn:

1. The grid-indexed GT cube and the doubly linked list table significantly enhanced the query efficiency.

2. The spatiotemporal data warehouse that was designed and implemented is easily applicable in practice and is compatible with ordinary databases and parallel computation.

Future studies will use a greater amount of vehicle supervision data to test and implement the spatiotemporal data warehouse and will attempt to optimize the initial grid division of the GT cube and its efficiency.

ACKNOWLEDGMENTS

This study was supported by the National Natural Science Foundation of China, the National 863 Plans Projects, and the 2011 Work Safety Special Fund of Guangdong Province. The authors are grateful for the contributions of Xianwei Wang of the Geography and Planning School, Sun Yat-Sen University.

FIGURE 5 Comparison of time cost of multivehicle positioning inquiry. [Plot: time (s) versus serial number of test group; series: traditional method and GT-cube model.]

FIGURE 6 Average time cost of multivehicle path query. [Plot: time (s) versus time interval (hour), 1 to 24; series: traditional method and GT-cube for 10, 20, 30, 40, and 50 vehicles.]

REFERENCES

1. Gosain, A., and S. Mann. Object Oriented Multidimensional Model for a Data Warehouse with Operators. International Journal of Database Theory and Application, Vol. 3, No. 4, 2010, pp. 35–40.

2. Birant, D., and A. Kut. ST-DBSCAN: An Algorithm for Clustering Spatial–Temporal Data. Data & Knowledge Engineering, Vol. 60, No. 1, 2007, pp. 208–221.

3. Matejícek, L., P. Engst, and Z. Janour. A GIS-Based Approach to Spatio-Temporal Analysis of Environmental Pollution in Urban Areas: A Case Study of Prague’s Environment Extended by LIDAR Data. Ecological Modelling, Vol. 199, No. 3, 2006, pp. 261–277.

4. Sengupta, R., and C. Yan. A Hybrid Spatio-Temporal Data Model and Structure (HST-DMS) for Efficient Storage and Retrieval of Land Use Information. Transactions in GIS, Vol. 8, No. 3, 2004, pp. 351–366.

5. Laurini, R. Real Time Spatio-Temporal Databases. Transactions in GIS, Vol. 5, No. 2, 2001, pp. 87–97.

6. Huang, S. M., T. H. Chou, and J. L. Seng. Data Warehouse Enhancement: A Semantic Cube Model Approach. Information Sciences, Vol. 177, No. 11, 2007, pp. 2238–2254.

7. Bogorny, V., B. Kuijpers, and L. O. Alvares. A Spatio-temporal Data Mining Query Language for Moving Object Trajectories. Technical Report TR-357. Instituto de Informatica, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil, 2008.

8. Moreno, F., J. A. Echeverri Arias, and B. Manrique Losada. A Conceptual Spatio-Temporal Multidimensional Model. Revista Ingenierías Universidad de Medellín, Vol. 9, No. 17, 2011, pp. 175–184.

9. Demirel, H. A Dynamic Multi-Dimensional Conceptual Data Model for Transportation Applications. ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 58, No. 5, 2004, pp. 301–314.

10. Beckmann, N., H.-P. Kriegel, R. Schneider, and B. Seeger. The R*-tree: An Efficient and Robust Access Method for Points and Rectangles. Proc., 1990 ACM SIGMOD International Conference on Management of Data, Association for Computing Machinery, New York, 1990, pp. 322–331.

11. Papadias, D., P. Kalnis, J. Zhang, and Y. Tao. Efficient OLAP Operations in Spatial Data Warehouses. Advances in Spatial and Temporal Databases, Vol. 2121, 2001, pp. 443–459.

12. Choi, W., B. Moon, and S. Lee. Adaptive Cell-Based Index for Moving Objects. Data & Knowledge Engineering, Vol. 48, No. 1, 2004, pp. 75–101.

13. Choi, W., D. Kwon, and S. Lee. Spatio-Temporal Data Warehouses Using an Adaptive Cell-Based Approach. Data & Knowledge Engineering, Vol. 59, No. 1, 2006, pp. 189–207.

14. Gray, J., S. Chaudhuri, A. Bosworth, A. Layman, D. Reichart, and M. Venkatrao. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals. Data Mining and Knowledge Discovery, Vol. 1, No. 1, 1997, pp. 29–53.

15. Nascimento, M. A., and J. R. O. Silva. Towards Historical R-Trees. Proc., 1998 ACM Symposium on Applied Computing, Association for Computing Machinery, New York, 1998, pp. 235–240.

16. Theodoridis, Y., M. Vazirgiannis, and T. Sellis. Spatio-Temporal Indexing for Large Multimedia Applications. Proc., Third IEEE International Conference on Multimedia Computing and Systems, 1996, IEEE, New York, pp. 441–448.

The Geographic Information Science and Applications Committee peer-reviewed this paper.