Taxi companies store trip logs in Alibaba Cloud Log Service and mine useful information based on reliable storage and rapid statistical calculations. This topic describes how taxi companies mine useful information from the data stored in Alibaba Cloud Log Service.

Taxi companies record details for each trip including the time when a passenger gets in and out, latitude and longitude, distance of the trip, payment method, payment amount, and tax amount. Detailed data greatly facilitates the operation of taxi companies. For example, the companies can determine the running intervals in peak hours and dispatch more vehicles to the areas where more taxis are needed. With the help of the data, the requirement of passengers can be met in a timely manner and drivers can have higher incomes. This improves the efficiency of the whole society.

Sample data:

RatecodeID:  1VendorID:  2__source__:  11.164.232.105    __topic__:  dropoff_latitude:  40.743995666503906    dropoff_longitude:  -73.983505249023437extra:  0    fare_amount:  9    improvement_surcharge:  0.3    mta_tax:  0.5    passenger_count:  2    payment_type:  1    pickup_latitude:  40.761466979980469    pickup_longitude:  -73.96246337890625    store_and_fwd_flag:  N    tip_amount:  1.96    tolls_amount:  0    total_amount:  11.76    tpep_dropoff_datetime:  2016-02-14 11:03:13    tpep_dropoff_time:  1455418993    tpep_pickup_datetime:  2016-02-14 10:53:57    tpep_pickup_time:  1455418437    trip_distance:  2.02
Sample data

Common statistics

Before query and analysis, you must enable and configure the index feature. For more information, see Enable and configure the index feature for a Logstore.

  • Run the following statement to count the number of passengers boarding taxis during the day and determine the peak hours:
     *| select count(1) as deals, sum(passenger_count) as passengers,    
     (tpep_pickup_time %(24*3600)/3600+8)%24 as time        
     group by (tpep_pickup_time %(24*3600)/3600+8)%24 order by time limit 24
    Passenger number per hour

    As shown in the preceding figure, the peak hours are generally the morning hours when people go to work and the evening hours when people get off work. Based on this data, taxi companies can dispatch more vehicles accordingly.

  • Run the following statement to collect statistics about the average trip distance in different time periods:
    *| select  avg(trip_distance)  as trip_distance, 
    (tpep_pickup_time %(24*3600)/3600+8)%24 as time         
    group by (tpep_pickup_time %(24*3600)/3600+8)%24 order by time limit 24
    Average trip distance

    Passengers tend to take a longer trip during certain time periods of the day, so taxi companies need to dispatch more vehicles.

  • Run the following statement to calculate the average trip duration (in minutes) and the time required for per unit of mileage (in seconds), and determine during which time period of the day taxis experience more traffic:
    *| select  avg(tpep_dropoff_time-tpep_pickup_time)/60  as driving_minutes, 
    (tpep_pickup_time %(24*3600)/3600+8)%24 as time  
    group by (tpep_pickup_time %(24*3600)/3600+8)%24 order by time limit 24   
    Rush hours
    *| select  sum(tpep_dropoff_time-tpep_pickup_time)/sum(trip_distance)  as driving_minutes, 
    (tpep_pickup_time %(24*3600)/3600+8)%24 as time        
    group by (tpep_pickup_time %(24*3600)/3600+8)%24 order by time limit 24
    Rush hours 2 More vehicles must be dispatched during peak hours.
  • Run the following statement to calculate average taxi fares during different time periods and determine the hours with more income:
    *| select  avg(total_amount)  as dollars, 
    (tpep_pickup_time %(24*3600)/3600+8)%24 as time 
    group by (tpep_pickup_time %(24*3600)/3600+8)%24 order by time limit 24
    Average taxi fares

    The average taxi fares per customer are higher around 4:00 AM, so financially challenged drivers can consider providing services during this time period.

  • Run the following statement to view the distribution of payment amounts:
    *| select case when total_amount < 1 then 'bill_0_1'  
    when total_amount < 10 then 'bill_1_10' 
    when total_amount < 20 then 'bill_10_20' 
    when total_amount < 30 then 'bill_20_30' 
    when total_amount < 40 then 'bill_30_40' 
    when total_amount < 50 then 'bill_10_50' 
    when total_amount < 100 then 'bill_50_100' 
    when total_amount < 1000 then 'bill_100_1000' 
    else 'bill_1000_'  end 
    as bill_level , count(1) as count group by 
    case when total_amount < 1 then 'bill_0_1'
    when total_amount < 10 then 'bill_1_10' 
    when total_amount < 20 then 'bill_10_20' 
    when total_amount < 30 then 'bill_20_30' 
    when total_amount < 40 then 'bill_30_40' 
    when total_amount < 50 then 'bill_10_50' 
    when total_amount < 100 then 'bill_50_100' 
    when total_amount < 1000 then 'bill_100_1000' 
    else 'bill_1000_'  end 
    order by count desc 
    Distribution of the payment amount As shown in the preceding figure, the payment amount of most transactions ranges from USD 1 to USD 20.