
How can we store GPS data efficiently?

Posted time: May 19, 2017, 2:15 PM
Mobile devices have become quite popular in recent years and the use of GPS is also increasingly common. You can see GPS technology applied in functions like real-time location and historical location data in taxi apps, or in path tracking functions in running apps. Many developers have run into problems with designing system architectures to store and query GPS data efficiently.
A popular internet product faces heavy traffic and must withstand sudden, huge surges while maintaining low latency, high stability, and the ability to scale in real time as it grows. Those in charge must also keep cost in mind. As a result, it is no simple matter to design a system that meets all of these demands.
In this article, we will design the GPS feature of a riding app to illustrate several architectures and their differences.
First we need to clarify the riding app's basic functions:
1. During the ride, the app records the GPS path
2. You can view the user's current location, riding path history, as well as the maximum speed, average speed, ride duration, and other statistics all in the mobile app
3. Once the ride is complete, you can view your own riding history
4. You can share the current path with friends, family members or your spouse in real time
5. The operator needs to be able to store and analyze users' paths and riding statistics
Storage system
First of all, let's look at the characteristics of such a product:
1. The user base is large. Famous products may have millions or even more users.
2. There are clear-cut peaks and valleys. For example, the morning and the evening are peak periods, and the early morning is a valley period.
3. The amount of data stored is relatively large, and it depends closely on the usage scenarios and the size of the user base.
4. Products may experience an explosive growth period. The system needs to be quickly scalable.
5. The data is time-sensitive. Users may need to query a range by time, or query the ride path with the starting time and ending time unknown.
6. Low cost.
7. High stability, especially when writing.
These characteristics translate into the following storage requirements:
1. Support for high-concurrency writes, up to millions or even tens of millions of TPS.
2. Pay as you go.
3. Large storage capacity in a single table, preferably with no limit on table size: petabyte level.
4. Real-time horizontal scalability.
5. Supports range queries.
6. Low cost.
7. High stability: SLA protection.
Tablestore is a NoSQL database that Alibaba Cloud developed in-house for shared storage. It is a storage system designed for big data and offers 10 GB of capacity for free. Tablestore can meet all of the above requirements. It also provides multi-version data, TTL (time to live), auto-increment IDs, incremental data channels (the stream feature) and other features.
After the storage system is determined, let's look at the outline.

Table structure design
Tablestore supports up to four primary key columns, each of which can be one of three types: string, binary, or long integer. Attribute columns are schema-free and can be modified freely.
In the current scheme, the Tablestore table can be designed in this way:

After the table structure is designed, we can take a look at how to save the path data:
For example, the raw data is:
At 10:10:10 on May 20, 2017, Will was riding a bicycle at the Broken Bridge on West Lake in Hangzhou. The riding speed was 5 m/s, the wind speed was 2 m/s, and the temperature was 20 degrees centigrade. Will had ridden eight kilometers by that time.
The data stored in Tablestore is:
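As a rough illustration of the row layout, the sketch below builds the primary key and attribute columns for the sample above in plain Python. It does not call the Tablestore SDK. The four-part key follows the range-query keys used later in this article (md5 prefix, user ID, task ID, timestamp); the column names, the user ID "will" and the task ID "ride-001" are assumptions made for the example, and the md5 prefix is presumably there to spread users across partitions. Longitude and latitude would be added as further attribute columns in the same way.

```python
import hashlib
from datetime import datetime, timedelta, timezone


def partition_prefix(user_id: str) -> str:
    """Return the first four hex characters of md5(user_id)."""
    return hashlib.md5(user_id.encode("utf-8")).hexdigest()[:4]


def build_row(user_id: str, task_id: str, ts_ms: int, **attrs):
    """Build (primary_key, attribute_columns) for one GPS sample."""
    primary_key = [
        ("prefix", partition_prefix(user_id)),  # assumed column name
        ("user_id", user_id),
        ("task_id", task_id),
        ("timestamp", ts_ms),                   # long integer, milliseconds
    ]
    attribute_columns = sorted(attrs.items())   # schema-free columns
    return primary_key, attribute_columns


# 10:10:10 on May 20, 2017, Hangzhou local time (UTC+8).
sample_time = datetime(2017, 5, 20, 10, 10, 10,
                       tzinfo=timezone(timedelta(hours=8)))
primary_key, attribute_columns = build_row(
    "will",                                # hypothetical user ID
    "ride-001",                            # hypothetical task ID
    int(sample_time.timestamp() * 1000),
    speed_mps=5.0,
    wind_speed_mps=2.0,
    temperature_c=20.0,
    distance_m=8000.0,
)
```

Keeping the timestamp as the last primary key column means that all samples of one ride are stored next to each other in time order, which is what makes the range queries described later cheap.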

GPS path storage
• Highlights:
o Store the metadata recorded during the user's ride in the form of tables
• Mobile phone client:
o After the user starts the app, the mobile phone sends a piece of data every five seconds to the app server. This message includes the current GPS data and meta values (longitude, latitude, speed, distance and so on).
• Server end
o After receiving the message from the client, the app server first calibrates the GPS data in the message and then saves it to the GPS path table in Tablestore. The primary key holds the user ID, the task ID, and the timestamp; the attribute columns hold the longitude, the latitude, the elapsed time, the distance and so on. Each sample is stored as one row.
o After user data reaches the app server, the server can calibrate it first and then write it to Tablestore. However, the write is the critical path, while cleaning and calibration are not, so it is often better to make the two asynchronous. The Tablestore stream feature supports this: the data is written directly to Tablestore, the app server reads the new data in real time through the stream, and the cleaned and calibrated result is written back to Tablestore. Because stream records are kept for 12 hours, users are not affected even if the cleaning and calibration service is unavailable for a short period of time.
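The split between the critical write path and the asynchronous calibration path can be sketched as follows. This is only an in-memory model: a dict stands in for the GPS path table and a `queue.Queue` stands in for the stream channel, so none of the names below are Tablestore SDK calls, and the speed clamp in `calibrate` is a made-up cleaning rule for illustration.

```python
import queue
import threading

table = {}               # stand-in for the GPS path table: key -> row
stream = queue.Queue()   # stand-in for the stream channel
lock = threading.Lock()


def write_raw(key, row):
    """Critical path: persist the raw sample, then publish a change record."""
    with lock:
        table[key] = dict(row, calibrated=False)
    stream.put(key)


def calibrate(row):
    """Hypothetical cleaning rule: clamp the speed to 0..30 m/s."""
    fixed = dict(row)
    fixed["speed"] = min(max(row["speed"], 0.0), 30.0)
    fixed["calibrated"] = True
    return fixed


def calibration_worker():
    """Non-critical path: read change records and write cleaned rows back."""
    while True:
        key = stream.get()
        if key is None:          # sentinel used to stop the worker
            stream.task_done()
            break
        with lock:
            table[key] = calibrate(table[key])
        stream.task_done()
```

If the worker is stopped for a while, `write_raw` keeps succeeding and the pending keys simply wait in the queue, which mirrors the point above that a short calibration outage does not affect users.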
GPS path query
• You can query the full path history using the Tablestore GetRange interface.
• Mobile phone end:
o To query the path history of the current ride and the current location, a range query (GetRange) is all you need: query the GPS data from the earliest time of this task to the current time. For the first query, the starting key can be (md5(user_id).sub(0,4), user_id, task_id, MIN) and the ending key can be (md5(user_id).sub(0,4), user_id, task_id, NOW). After the first query completes, save the NOW value; the ending key of each query then serves as the starting key of the next one. Repeat this until the end of the ride.
• Server end
o After receiving the query request from the mobile end, the server converts it into a Tablestore range query (GetRange) and returns the ride path history.
o For previous rides, you can use the same range query to get the path history.
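The incremental polling described above can be modeled with a small in-memory table that keeps its keys sorted, as sketched below. `PathTable`, `prefix_of` and `poll_path` are illustrative names and not part of any SDK. The stand-in `get_range` treats the start key as inclusive and the end key as exclusive, which is the assumption that lets the previous ending key be reused as the next starting key without returning a sample twice.

```python
import bisect
import hashlib

MIN_TS = 0  # stands in for the MIN marker in the start key


def prefix_of(user_id):
    """First four hex characters of md5(user_id)."""
    return hashlib.md5(user_id.encode("utf-8")).hexdigest()[:4]


class PathTable:
    """In-memory stand-in for a table ordered by primary key."""

    def __init__(self):
        self._keys = []   # sorted (prefix, user_id, task_id, timestamp)
        self._rows = {}

    def put_row(self, key, attrs):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = attrs

    def get_range(self, start_key, end_key):
        """Return rows with start_key <= key < end_key, in key order."""
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, end_key)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]


def poll_path(table, user_id, task_id, last_end_ts, now_ts):
    """Fetch samples in [last_end_ts, now_ts) and return the new cursor."""
    p = prefix_of(user_id)
    rows = table.get_range((p, user_id, task_id, last_end_ts),
                           (p, user_id, task_id, now_ts))
    return rows, now_ts
```

The client starts with `MIN_TS` as the cursor and passes the returned cursor back in on every later call, so each poll only transfers the samples recorded since the previous one.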
Query GPS paths of other users
• Highlights:
o Pull mode: The GPS path data is stored in the specific user ID only and other users need to pull the data to read it.
o The new table: shard_gps_table records the shared user ID, task ID and other information.
o Encryption: ID and other values designed for the sharing process are encrypted to prevent malicious impersonation.
• Mobile phone end:
o After User A starts the app, A can share the current ride with a friend, User B; that is, A's user_id and task_id are sent to B.
o After B's client gets the user_id and task_id of User A, he or she can use the range query to get the real-time location and path of User A in this ride.
• Server end:
o If User A shares the data with User B through a QR code or a link, User B's client obtains User A's user ID and task ID (encryption can be used to prevent the IDs from being tampered with). User B can then go to the server side to obtain the real-time path of User A in this ride. In this case, the sharing history can be stored permanently in the shard_gps_table table, or you can choose not to persist it.
o If User A wants to share the data with User B inside the app, User A sends the sharing record to the server, which stores it in the shard_gps_table table and then sends User B a notification containing a special ID value. This value may be an encrypted form of User A's user ID and task ID, so that the user ID and task ID cannot be read from the value itself.
o After receiving the notification, User B can check whether a matched sharing record exists in Table shard_gps_table using this special ID value on the server end. If so, the corresponding user ID and task ID of User A will be sent to User B who can then read the path of User A with the help of the data.
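A minimal sketch of the in-app sharing flow is shown below. Note that it swaps in a random opaque token for the special ID instead of an encrypted value: the effect is the same in that the token reveals nothing about User A's IDs, but it is a different technique from the encryption mentioned above. The dict stands in for the shard_gps_table table, and the function and field names are assumptions.

```python
import secrets

# Stand-in for shard_gps_table: share_id -> sharing record.
shard_gps_table = {}


def create_share(owner_id, task_id, recipient_id):
    """Server side: store the sharing record and return the special ID."""
    share_id = secrets.token_urlsafe(16)   # random, carries no user data
    shard_gps_table[share_id] = {
        "user_id": owner_id,
        "task_id": task_id,
        "shared_with": recipient_id,
    }
    return share_id


def resolve_share(share_id, requester_id):
    """Server side: return (user_id, task_id) only for a matching record."""
    record = shard_gps_table.get(share_id)
    if record is None or record["shared_with"] != requester_id:
        return None
    return record["user_id"], record["task_id"]
```

User B's client would send the special ID from the notification to the server, receive User A's user ID and task ID if the record matches, and then run the usual range query on the path table.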
Group GPS path query
Users sometimes will ride in groups. In this case, group members may wish to see the real-time locations and paths of everyone in one map, and the group leader may also want to use the data as reference to arrange the location and time for a rest. In such scenarios, you can implement the following method:
• Highlights:
o All members use the same task ID
o Pull mode (Data is written to the user ID of each individual, and data with the same task ID of other members is read)
o Use the new table: group_user_table to record the current users and past users of each group (the Tablestore multiple version feature can save the times that multiple users joined or left the group)
• Mobile phone end:
o The group leader initiates a group and creates a task ID. Other members join the group to get the task ID
o Each member records his or her own GPS path data (the first two primary keys are: user_id and task_id) during the ride
o If a member exits and then joins the group again, the same task ID will be used, and the past and future GPS paths will be associated
• Server end:
o Use the new table (group_user_table) to record the member ID under each task ID. Here the multiple version feature can be used to record the multiple events of users joining and leaving.
o When you want to view the GPS path of the entire group, you can first view all the member IDs through the task ID, and then get the trajectory data of all members using the Tablestore BatchGetRow or GetRange.
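The two-step lookup (members by task ID, then one path query per member) can be sketched with in-memory stand-ins as below. The dicts replace group_user_table and the GPS path table, and the list of join and leave events per member only imitates what the multiple version feature would keep; all function names are illustrative.

```python
from collections import defaultdict

# Stand-in for group_user_table: task_id -> {user_id: [(ts, event), ...]}
group_user_table = defaultdict(dict)
# Stand-in for the GPS path table: (user_id, task_id) -> [(ts, lon, lat)]
gps_rows = defaultdict(list)


def record_membership(task_id, user_id, event, ts):
    """Append a 'join' or 'leave' event, keeping earlier events as versions."""
    group_user_table[task_id].setdefault(user_id, []).append((ts, event))


def record_point(user_id, task_id, ts, lon, lat):
    """Each member writes only to his or her own rows (pull mode)."""
    gps_rows[(user_id, task_id)].append((ts, lon, lat))


def group_paths(task_id):
    """Look up all members of the task, then read each member's path."""
    members = group_user_table.get(task_id, {})
    return {
        user_id: sorted(gps_rows.get((user_id, task_id), []))
        for user_id in members
    }
```

Because a member who leaves and rejoins keeps the same task ID, `group_paths` returns that member's samples from before and after the gap as one path.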
Member route exception warning
Some members may lag behind or take a wrong route during group riding or a large-scale competition. In this case, it is necessary for the organizers to be notified. This can be easily implemented by leveraging the Tablestore stream feature to issue warnings on rider exceptions.
• Highlights:
o Use the Tablestore stream feature [to be launched in May]
o Pull mode: Some apps may hold nationwide activities that are attended by hundreds of thousands or even millions of users. The advantage of the pull mode shines under just such conditions.
• Server end:
o After the GPS trajectory data of each member is written to Tablestore, Tablestore persists it and then routes the new and updated data to the stream channel in real time. The app server can read the incoming GPS data in real time or periodically using the interfaces provided by the SDKs, and then compare it against the safe zones. If a position is outside the safe zone, or a member lags far behind the normal pace, a warning text message can be sent to the organizer. Alibaba Cloud Message Service can be used for this.
• Mobile phone end:
o After the organizer receives a warning that a member is lagging behind or out of the safe zone, he or she can check the path history and current location of that member immediately and arrange for someone to contact the member and avoid danger.
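The check that the app server runs on each incoming sample might look like the sketch below. It assumes the safe zone is defined as "within a given distance of some point on the planned route" and that lagging is measured against the leader's ridden distance; both definitions, the thresholds, and the field names are assumptions for illustration. Distances are computed with the haversine formula on a spherical Earth.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def check_member(point, route, max_offset_m, leader_distance_m, max_lag_m):
    """Return a list of warning codes for one GPS sample.

    point: dict with "lat", "lon" and "distance_m" (distance ridden so far)
    route: list of (lat, lon) waypoints describing the planned route
    """
    warnings = []
    offset = min(haversine_m(point["lat"], point["lon"], lat, lon)
                 for lat, lon in route)
    if offset > max_offset_m:
        warnings.append("off_route")
    if leader_distance_m - point["distance_m"] > max_lag_m:
        warnings.append("lagging")
    return warnings
```

A non-empty result would trigger the warning message to the organizer. Measuring against waypoints rather than route segments is a simplification, so the waypoints need to be dense enough for the chosen offset threshold.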
Follow VIPs  
Where there is a community, it is very possible that a VIP might stream a riding event through the app, with his or her GPS path displayed on the side, and hundreds of thousands or even millions of followers watching or sending awards.
• Highlights:
o Pull mode: This is the only logical choice for handling such a large task. Since the previous features all use pull mode, implementing it here is simple.
• Mobile phone end:
• Server end:
o Very similar to previous process logic, we'll skip it here.
Operation analysis
• Full-data analysis:
o SQL: With the SQL feature of MaxCompute 2.0 (formerly ODPS), data in Tablestore can be read directly without being exported, saving time and enabling faster results.
o Self-owned job: With DataX, you can export the full data set to MaxCompute with high concurrency for analysis and processing. After the processing is done, you can use DataX again to write the results back to Tablestore.
o Suitable for batch processing.
• Incremental-data analysis:
o With the Tablestore stream feature, you can get the data involved in newly added, updated and deleted events by users in the last 12 hours, as well as real-time changes of data for analysis and processing.
o Suitable for real-time processing.
Direct write scheme
The above method can already meet the demands for high concurrency and low latency. Can we optimize this architecture further? The answer is yes. In the above scheme, the mobile phone app can write data directly into Tablestore without relaying it through the app server, which reduces the pressure on the app server.

The five steps are:
1. After the user starts the app, the mobile phone requests from the app server the permission to write data into Tablestore.
2. After the app server approves the request and agrees to grant write permission to this user, it applies to Alibaba Cloud's STS service for temporary write permission to Tablestore and an authorization period.
3. After receiving the request, the STS service generates temporary credentials, consisting of a temporary AccessKeyId, a temporary AccessKeySecret, and a temporary token, and returns them to the app server.
4. After the app server receives the STS result, it will return it to the app on the mobile phone end.
5. After the app on the mobile phone receives the temporary token, it can start to write data to Tablestore. The temporary token will expire after a while. If you want to write data after the token has expired, you can continue to apply for a new token.
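The five steps can be modeled as below to show where the expiry check and renewal happen. Everything here is a stand-in: `FakeSTS` only imitates the shape of the credentials described in step 3 and is not the real STS API, and the class names, the 900-second lifetime and the `now` parameter (passed explicitly so the flow can be followed without waiting) are assumptions.

```python
import secrets
from dataclasses import dataclass


@dataclass
class TempCredentials:
    access_key_id: str
    access_key_secret: str
    security_token: str
    expires_at: float          # seconds, same clock as `now` below


class FakeSTS:
    """Stand-in for the STS service (step 3)."""

    def issue(self, ttl_seconds, now):
        return TempCredentials(
            access_key_id="STS." + secrets.token_hex(8),
            access_key_secret=secrets.token_hex(16),
            security_token=secrets.token_hex(32),
            expires_at=now + ttl_seconds,
        )


class AppServer:
    """Approves the user and relays the STS result (steps 2 and 4)."""

    def __init__(self, sts, allowed_users, ttl_seconds=900):
        self.sts = sts
        self.allowed_users = set(allowed_users)
        self.ttl_seconds = ttl_seconds

    def request_write_permission(self, user_id, now):
        if user_id not in self.allowed_users:
            raise PermissionError(f"{user_id} may not write GPS data")
        return self.sts.issue(self.ttl_seconds, now)


class MobileClient:
    """Requests credentials and renews them after expiry (steps 1 and 5)."""

    def __init__(self, user_id, server):
        self.user_id = user_id
        self.server = server
        self._creds = None

    def credentials(self, now):
        if self._creds is None or now >= self._creds.expires_at:
            self._creds = self.server.request_write_permission(
                self.user_id, now)
        return self._creds
```

The client calls `credentials(now)` before each batch of writes. The app server is contacted only when no valid credentials are cached, so its load scales with the number of token renewals rather than with the number of GPS samples.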
In addition to reducing the pressure on the app server, the above method can decouple the most critical GPS write process from the app server and make the critical path independent, which is its biggest advantage. As a result, even if the app server suffers a fault and becomes unavailable, the user's GPS write process will not be affected.
In the above sections we have discussed how to use Tablestore to store GPS data in various scenarios, and several important applications of the stream feature. Despite covering a wide range of information, I only provided a brief explanation of each section to keep the length short. If you are interested, I am happy to go into more detail for any section later.