Abstract

In a congested large-scale subway network, the distribution of passenger flow in space and time is very complex. Accurately estimating passenger path choice is essential for understanding this distribution and for improving the level of operational service. The availability of automated fare collection (AFC) data, timetables, and network topology data opens up a new opportunity to study this topic with multisource data. This study proposes a probability model that estimates each individual passenger’s path choice from multisource data and explicitly considers the impact of the time-varying network state (e.g., path travel time) on path choice. First, according to the number and characteristics of the candidate paths of each OD (origin-destination) pair, the AFC data of specific kinds of OD pairs are selected to estimate the distributions of passengers’ walking time and of the waiting time at each platform. Then, based on the composition of path travel time, its real-time probability distribution is formulated with the distributions of walking time, waiting time, and in-vehicle time as parameters. Next, a membership function is introduced to evaluate the dependence between a passenger’s travel time and the real-time travel time distribution of each candidate path, and the path with the largest membership degree is taken as the passenger’s choice. Finally, a case study with Beijing Subway data verifies the effectiveness of the model. We compare and analyse the path estimation results obtained with and without the time-varying characteristics of the network state. The results indicate that a passenger’s path choice behavior is affected by the time-varying network state and that our model can quantify this state and its impact on passenger path choice.

1. Introduction

To alleviate the pressure on urban public transport caused by the increasing demand for urban travel, more and more cities, such as Shanghai, Beijing, Paris, and Tokyo, have built large-scale subway networks; in China alone, about 20 cities operate more than 100 kilometers of subway. Compared with other sustainable means of transportation [1-4], the subway offers large capacity [5] and high reliability and is very popular with urban residents [6, 7]. While the subway brings convenient travel services to passengers [8], the expansion of the network and the influx of passengers have also brought new problems to subway operation, such as crowding [9, 10], train utilization efficiency [11], ticket revenue allocation among operators [12, 13], and the uncertainty of passenger flow distribution caused by the diversity of passenger path choice. To improve service quality and operation efficiency, operators urgently need to know the distribution of passenger flow in time and space [14]. In a large-scale subway network, estimating the passenger path, that is, identifying which lines passengers actually take and at which stations they transfer, is the basis for calculating the passenger flow distribution. Therefore, estimating the passenger path is a subject of great practical significance.

It is very difficult to estimate the passenger travel path in large-scale subway networks. On the one hand, in order to improve passengers’ travel experience, subway operators usually adopt a “seamless transfer mode”; that is, passengers do not need to tap out and tap in when transferring between lines. In this case, if there are multiple candidate paths between an OD (origin-destination) pair, the passenger’s path choice cannot be observed [15] or recorded. On the other hand, passenger path choice is affected by many factors, including travel time, number of transfers, departure headway, walking time, waiting time, and crowding. The traditional and widely used way to capture the impact of these factors is to calibrate the parameters of a logit model, built on the utility maximization criterion, with questionnaire data. Raveau et al. [16] proposed a multinomial logit model to study the path choice behavior of subway passengers, in which the utility function considers many factors that affect passengers’ path choice. Later, in order to characterize the impact of the correlation between candidate paths on path choice, Raveau et al. [17] extended their research to a C-logit model including a “commonality factor,” which better handles the calculation error caused by path overlapping. References [12, 18] improved the accuracy of the logit model by adding the transfer cost to the model and by establishing a multiclass utility function according to passenger category, respectively. However, calibrating parameters with questionnaires has many limitations: (1) collecting data through surveys is costly in both time and resources [19]; (2) because of the influence of survey location and respondents, questionnaire results are often limited in scale and diversity [13]; and (3) some key factors (such as congestion and comfort) are difficult to quantify [20]. These limitations restrict the practical application of the logit model.

To overcome the limitations of traditional methods that rely on questionnaires, scholars have tried to calibrate the parameters of the logit model with data-driven methods. Based on AFC (automated fare collection) data, Sun et al. [13] proposed an integrated Bayesian inference approach to study passenger path choice behavior. The core of this approach is still the logit model, but unlike the traditional method, it requires very limited information as input (e.g., passenger travel time from AFC data) and still provides comprehensive posterior knowledge of passenger path choice. Xu et al. [21] then extended this model to consider train crowding. However, the above studies did not consider the impact of time-dependent parameters (e.g., the train timetable) and the time-varying network state (e.g., path travel time) on passenger path choice.

In order to further consider the impact of time dependence and the time-varying network state on passenger path choice behavior, Zhou and Xu [22] combined AFC data and train timetable data to infer passengers’ path choice. The gap between the time at which a passenger walking at the fastest speed arrives at the platform and the departure of the first train that he/she can catch was defined as the “surplus time.” Then, a probability function was constructed from the “surplus time” of each path’s “boarding plan.” Finally, the path with the highest probability was assigned to the passenger. However, the authors did not consider the impact of the waiting time distribution on path travel time, which is an important factor affecting passenger path choice. Similarly, Zhao et al. [23] proposed a probability model that converts the likelihood of passengers choosing different paths into the probability of taking different trains, and the method was verified with data collected from the Shenzhen metro system. Li et al. [24] proposed a synchronous clustering method based on passenger “pure” travel time to calculate passenger path selection. The “pure” travel time is the time that remains after the access/egress walking time and waiting time are removed from the travel time. However, removing the waiting time may reduce the accuracy of this method in a crowded subway network. Wu et al. [25] presented a density peaks clustering algorithm (DPCA) to infer passengers’ path choice. First, passengers travelling between the same OD pair are clustered by DPCA according to their travel time. Then, the theoretical travel time of each candidate path is estimated considering the uncertain walking time and transfer time. Finally, each cluster (with its passengers) is matched to a candidate path according to the similarity between the cluster center value and the theoretical travel time of the candidate path.

The above studies have made a great contribution to estimating passenger path choice with data-driven methods, but they can still be improved. First, the probability distribution characteristics of time data are very important [26, 27], yet these studies tend to rely on assumptions rather than statistical analysis. For example, passenger travel time is assumed to follow a normal or uniform distribution, which affects the accuracy and applicability of the model. Second, affected by passenger flow and crowding [10, 28], the network state (e.g., path travel time) has significant time-varying characteristics, which ultimately affect the travel choice and travel time of passengers. Existing studies rarely consider the impact of this feature on passenger path choice explicitly. For example, suppose an OD pair has two candidate paths, p1 and p2. The travel time of p1 is 5 minutes in off-peak hours and 10 minutes in peak hours, and the corresponding travel time of p2 is 10 minutes and 15 minutes. In this case, if only the passenger’s travel time (e.g., 10 minutes) is used, it is difficult to infer the passenger’s path choice accurately, because 10 minutes matches p2 in off-peak hours and p1 in peak hours. If the passenger’s departure time is also known (e.g., peak hours), then p1 is the most likely choice. The time-varying characteristics therefore help to estimate the passenger path.

To address the research gap mentioned above, this study proposes a probability model which can explicitly consider the influence of time-varying characteristics on estimating the passenger path. The model takes AFC, train timetable, and network topology data as inputs. First, the distribution characteristics of path travel time components such as walk time and waiting time are studied. Then, in order to characterize the time-varying state, a method to calculate the distribution of path real-time travel time is presented. Finally, a membership function that can consider both the passenger travel time and the real-time travel time distribution of the candidate path is introduced to infer the passenger path choice. To highlight the novelty of our work, some relevant studies are summarized in Table 1 and compared with our study, in terms of passenger travel time (A), the real-time travel time of path (B), the probability distribution of travel time (C), model and data. The detailed contributions of this study are summarized as follows:
(1) We proposed a multisource data-driven method to estimate passenger walking time and waiting time and tested the probability distribution characteristics of relevant time samples by the improved Kolmogorov-Smirnov (KS) method.
(2) According to the composition of path travel time and the distribution characteristics of walking time and waiting time, we presented a calculation method of path travel time distribution. On this basis, we further presented the calculation method of path real-time travel time distribution by calculating the time slot of waiting time distribution, which explicitly considers the time-varying state of the network.
(3) A membership function is proposed to evaluate the correlation between the passenger travel time and the real-time travel time distribution of each candidate path, and the path with the largest membership degree is assigned to the passenger. In this way, the previous one-dimensional method that only considers passenger travel time can be extended to two dimensions; that is, both passenger travel time and passenger departure time are considered.
(4) Based on the data collected from Beijing Subway, the influence of time-varying characteristics of the route on estimating passenger path choice is analysed.

The rest of the paper is organized as follows. Section 2 states the problem and the necessary assumptions. Section 3 first introduces the method for estimating the probability distributions of walking time and waiting time and then gives the general formula of the path travel time distribution and the passenger path estimation method. In Section 4, we apply the proposed model to the Beijing Subway network as a case study and compare the results obtained with and without the time-varying characteristics of the network state. Finally, Section 5 concludes the study, summarizes our main findings, and discusses future research directions.

2. Problem Description

In the subway system, passengers’ transaction information is recorded in the AFC system, including the check-in time, check-out time, origin station, and destination station, but it does not include their transfer stations or path information. Therefore, when there are multiple candidate paths between an OD pair, it is impossible to identify the passenger path directly from AFC data. As shown in Figure 1, there are two paths from XD to CWM: (1) XD-XWM-CWM and (2) XD-DD-CWM. Both include one transfer, and their travel distances are similar, so passengers’ path choice cannot be identified directly.

The passenger travel time (obtained from the check-in and check-out times in AFC data) is the time spent by a passenger on a specific path between an OD pair. Hence, passenger travel time can be regarded as an observation of the travel time of this path. From a statistical point of view, when a random variable obeys a certain probability distribution, an observation of this variable should, with high probability, fall within the confidence interval of the distribution. In other words, if we know the travel time probability distribution of each candidate path, we can calculate the most likely path of a passenger according to the confidence level of the passenger travel time in the travel time distribution of each path.

The observation set is the basis for estimating the probability distribution of path travel time. For the ODs with only one candidate path, the travel time of passengers between this OD can be regarded as the travel time observation value of this unique path. However, when there are multiple paths between ODs, due to the lack of path information in AFC data, the travel time observation set of each candidate path between such ODs cannot be directly obtained. In other words, AFC data cannot directly estimate the path travel time distribution between such ODs. Therefore, how to estimate the path travel time distribution between multipath ODs through the single path OD (with a determined observation set) is the key to our study.

To that end, we first classify the OD pairs in the network. According to the number of candidate paths and the path characteristics, the OD pairs can be divided into three categories: (1) a type-I OD has only one candidate path, and passengers do not need to transfer; (2) a type-II OD has only one candidate path, but the O station and D station are not on the same line, so passengers have to transfer; (3) a type-III OD has multiple candidate paths. Taking the part of the Beijing Subway network in Figure 1 as an example, NLSL-FXM is a type-I OD, NLSL-FCM is a type-II OD, and NLSL-CWM and FCM-JGM are type-III ODs.
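The three categories can be sketched in a few lines of Python. This is only an illustration: the `CandidatePath` type, the function name, and the line labels "A" and "B" are hypothetical, and a path is described solely by the sequence of lines it uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidatePath:
    """One candidate path, described only by the lines it uses, in order."""
    lines: tuple  # e.g. ("A", "B") means one transfer from line A to line B

    @property
    def transfers(self) -> int:
        return len(self.lines) - 1


def classify_od(candidate_paths):
    """Return "I", "II" or "III" following the three categories in the text."""
    if not candidate_paths:
        raise ValueError("an OD pair needs at least one candidate path")
    if len(candidate_paths) > 1:
        return "III"  # several candidate paths: the choice is not observed
    if candidate_paths[0].transfers == 0:
        return "I"    # unique path without transfer
    return "II"       # unique path with at least one transfer


if __name__ == "__main__":
    print(classify_od([CandidatePath(("A",))]))
    print(classify_od([CandidatePath(("A", "B"))]))
    print(classify_od([CandidatePath(("A", "B")), CandidatePath(("A", "C"))]))
```

Only type-I pairs give travel time observations that can be attributed to a single path without any transfer, which is why they are used for calibration in the following sections.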

According to the above analysis, the path travel time observation set of a type-I OD can be obtained directly, so our goal is to infer the passenger path choice between type-III ODs from type-I ODs. To bridge these two types of OD, we need to analyse the composition of passengers’ travel time. As shown in Figure 2, a passenger’s travel time on a specific path includes in-vehicle time, walking time, and waiting time; the path travel time of every kind of OD is composed of these parts. Therefore, we can use the observation set of type-I ODs to estimate the time components and then use these components to estimate the travel time distribution of each path between type-III ODs. Finally, the passenger path choice is estimated from the passenger travel time and the probability distribution of path travel time. Trains usually run according to the timetable, so the in-vehicle time is fixed. Walking time can be divided into access time, transfer time, and egress time. The egress time can be calculated from the timetable data and passenger check-out data. However, waiting time and access/transfer time cannot be distinguished directly. Considering that walking time is proportional to walking distance for a given walking speed, if we assume that the walking speed of the same passenger is consistent within a trip, we can calculate the access time and transfer time from the egress time and the channel distances. This assumption is reasonable: although in practice some passengers may take longer to walk because of channel congestion at check-in or transfer, the extended time can be regarded as part of the waiting time. The waiting time of passengers is affected by factors such as train departure frequency and the number of people waiting at the platform and is highly random; that is, the waiting time at the platform has a significant time-varying characteristic. In this way, each time component can be decomposed and estimated.

To sum up, based on AFC data, the train timetable, and the walking distances of subway platforms, this paper proposes a passenger path estimation model that considers time-varying travel time. First, based on type-I ODs, the passenger walking time distribution and the waiting time distribution of each platform are estimated. Then, according to the path constituent elements, a method to restore the travel time distribution of each candidate path between type-III ODs is proposed. Finally, taking the passenger check-in time, the travel time, and the real-time travel time distribution of each candidate path as parameters, a membership function is introduced to estimate passenger path choice. The calculation flow is shown in Figure 3. To facilitate the subsequent analysis and modeling, the following assumptions are made:
(i) A1: Egress time is not affected by passenger traffic congestion and depends only on passenger walking speed and walking distance.
(ii) A2: The walking speed of the same passenger is consistent in different stages of a trip.
(iii) A3: The distance between the ticket gates and the platform is the same at the same station.
(iv) A4: The time passengers spend boarding and alighting the train is ignored.

3. Methodology

3.1. Probability Distribution of Walking Time

According to the previous analysis, walking time includes three kinds: access time, transfer time, and egress time. The egress time can be obtained from the passenger travel records between type-I OD. Therefore, the estimation method of egress time distribution is given first. Then, the calculation method of access time and transfer time distribution is derived based on the egress time distribution.

Consider a passenger travelling between a type-I OD pair, with a recorded check-in time at the origin station and a recorded check-out time at the destination station. Figure 4 shows all the possible trips of this passenger. The horizontal axis is the passenger’s travel time, and the vertical axis represents the passenger’s location. The access time is the time the passenger spends walking from the entry ticket gate to the origin platform. Waiting time is the dwell time the passenger spends at the platform before boarding. For each train, the egress time is the time between the arrival of the train at the destination platform and the passenger’s check-out. Figure 4 shows three potential trips (trains 2, 3, and 4) that run between the check-in time and the check-out time, but the access/egress time that train 2 leaves for the passenger does not satisfy the walking speed consistency assumption (A2), so only trains 3 and 4 are feasible.

When a passenger has more than one feasible trip, each trip j corresponds to its own egress time, namely, the passenger’s check-out time minus the arrival time of train j at the destination platform (equation (1)).

According to the previous assumption that the walking speed of a passenger is consistent, the access time that corresponds to a given egress time equals the egress time multiplied by a distance ratio (equation (2)), where the distance ratio is the entrance channel distance of the origin platform divided by the exit channel distance of the destination platform.

Obviously, if trip j is feasible for the passenger, the departure time of train j from the origin platform must be no earlier than the passenger’s check-in time plus the corresponding access time (equation (3)).

Meanwhile, Figure 4 shows that if a passenger has only one feasible trip, the egress time is uniquely determined by equation (1). Therefore, the egress time probability distribution of the corresponding platform can be fitted from the egress times of such passengers. However, having only one feasible trip means that the passenger’s check-out time lies between the arrival time of his/her unique feasible trip and the arrival time of the next train at the destination platform (equation (4)).

Equation (4) means that the egress walking time of such passengers cannot exceed the train headway at the platform where they alight, which may bias the estimation of walking time. In the Beijing Subway, the off-peak headway is about 4 minutes, which is long enough for most passengers to walk from the platform to the ticket gate. Therefore, using the AFC data of passengers with only one feasible trip in off-peak hours effectively reduces the bias in estimating the egress time. Since walking speed varies relatively little across periods, we take the off-peak walking time distribution to represent the walking time distribution of the whole day.
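The feasibility check described above can be sketched as follows. The function and field names are hypothetical, all times are in seconds, and the timetable values are invented for illustration; each train is given as a pair (departure time from the origin platform, arrival time at the destination platform).

```python
from typing import NamedTuple


class FeasibleTrip(NamedTuple):
    train_index: int
    egress: float   # seconds from train arrival to check-out
    access: float   # seconds from check-in to reaching the origin platform
    waiting: float  # seconds spent on the origin platform before boarding


def feasible_trips(check_in, check_out, trains, distance_ratio):
    """Enumerate the trains a passenger could have taken.

    distance_ratio is the entrance channel distance divided by the exit
    channel distance, so access = distance_ratio * egress under assumption A2.
    """
    result = []
    for j, (departure, arrival) in enumerate(trains):
        egress = check_out - arrival
        if egress < 0:
            continue  # the train arrives after the passenger has checked out
        access = distance_ratio * egress
        waiting = departure - check_in - access
        if waiting < 0:
            continue  # the passenger could not reach the platform in time
        result.append(FeasibleTrip(j, egress, access, waiting))
    return result


if __name__ == "__main__":
    timetable = [(60, 460), (320, 720), (540, 940)]  # invented values
    for trip in feasible_trips(0, 1000, timetable, distance_ratio=1.0):
        print(trip)
```

In this invented example the first train is rejected because the long egress time it implies would require an access walk longer than the time available before its departure, which mirrors the role of train 2 in Figure 4. A passenger whose list contains exactly one trip contributes one unambiguous egress time sample.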

The egress walking times of all passengers between type-I OD pairs who satisfy equations (1) and (3) and have exactly one feasible trip during off-peak hours form the egress time sample space of the corresponding platform. Referring to existing research [13, 27, 30, 31], this study uses the normal distribution [13, 30], the lognormal distribution [31], and the gamma distribution [27] to fit the egress time of the 644 platforms in the Beijing Subway network and uses the improved KS (Kolmogorov-Smirnov) method [32] to test the fitting results. Table 2 gives the corresponding KS test statistics.
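A minimal sketch of the fitting step is given below, using only the Python standard library and synthetic data. It fits a lognormal and a normal distribution by maximum likelihood and computes the plain one-sample KS statistic for each. The improved KS procedure of [32] is not reproduced here, and the gamma candidate is omitted for brevity; the sample parameters (4.0, 0.5) are assumptions chosen only to generate test data.

```python
import math
import random


def fit_lognormal(samples):
    """Maximum likelihood estimates of the logarithmic mean and std."""
    logs = [math.log(x) for x in samples]
    mu = sum(logs) / len(logs)
    var = sum((v - mu) ** 2 for v in logs) / len(logs)
    return mu, math.sqrt(var)


def fit_normal(samples):
    """Maximum likelihood estimates of the mean and std."""
    mean = sum(samples) / len(samples)
    var = sum((x - mean) ** 2 for x in samples) / len(samples)
    return mean, math.sqrt(var)


def normal_cdf(x, mean, std):
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def lognormal_cdf(x, mu, sigma):
    return normal_cdf(math.log(x), mu, sigma) if x > 0 else 0.0


def ks_statistic(samples, cdf):
    """Largest gap between the empirical CDF and a fitted CDF."""
    xs = sorted(samples)
    n = len(xs)
    gap = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        gap = max(gap, (i + 1) / n - f, f - i / n)
    return gap


if __name__ == "__main__":
    rng = random.Random(0)
    egress = [rng.lognormvariate(4.0, 0.5) for _ in range(2000)]  # synthetic
    mu, sigma = fit_lognormal(egress)
    mean, std = fit_normal(egress)
    print("lognormal KS:", ks_statistic(egress, lambda x: lognormal_cdf(x, mu, sigma)))
    print("normal KS:   ", ks_statistic(egress, lambda x: normal_cdf(x, mean, std)))
```

A smaller statistic indicates a closer fit. Because the parameters are estimated from the same samples, the standard KS critical values do not apply directly, which is one reason a modified test is needed.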

The results show that the lognormal distribution fits the sample space best. Therefore, we assume that passengers’ egress time at a platform follows a lognormal distribution with parameters $\mu$ and $\sigma$, the logarithmic mean and standard deviation of the egress walking time, which can be estimated from the egress time sample space by MLE (maximum likelihood estimation). The occurrence probability of a given egress time is then obtained by discretizing the fitted probability density with a time granularity $\delta$ (such as 1 second), as in equation (5), where the probability density function of the egress time has the lognormal form $f(t) = \frac{1}{t\sigma\sqrt{2\pi}} \exp\left(-\frac{(\ln t - \mu)^2}{2\sigma^2}\right)$ for $t > 0$ (equation (6)).

If the influence of channel congestion and other factors on the access time is not considered, then according to equation (2), the access time sample space of a platform is the egress time sample space multiplied by the walking distance coefficient.

Suppose the access time of each platform also obeys a lognormal distribution, described by the logarithmic mean and standard deviation of the access time of that platform. In practice, the extension of access/transfer walking time caused by congestion can be regarded as part of the waiting time that follows it. Therefore, this treatment does not affect the estimation of feasible trips.

As a special kind of access time (the walking time passengers spend between the arrival platform and the next departure platform), the transfer time is also mainly affected by the channel distance. Therefore, the distribution of the transfer time between two platforms can be calculated by the same method and is described by the logarithmic mean and standard deviation of the walking time from the arrival platform to the departure platform.
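Two steps of this subsection lend themselves to a short sketch: discretizing a fitted lognormal distribution with a time granularity, and deriving the access (or transfer) time distribution from the egress time distribution. Both functions below are illustrative assumptions. The discretization takes the probability mass of an interval one granularity wide, which is one possible reading of equation (5). The scaling relies on the standard property that multiplying a lognormal variable by a positive constant adds the logarithm of that constant to the logarithmic mean and leaves the logarithmic standard deviation unchanged.

```python
import math


def lognormal_cdf(x, mu, sigma):
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def interval_probability(t, delta, mu, sigma):
    """Probability that the walking time falls in [t, t + delta)."""
    return lognormal_cdf(t + delta, mu, sigma) - lognormal_cdf(t, mu, sigma)


def scale_lognormal(mu, sigma, distance_ratio):
    """Parameters of distance_ratio * X when X is lognormal(mu, sigma)."""
    return mu + math.log(distance_ratio), sigma


if __name__ == "__main__":
    mu_egress, sigma_egress = 4.0, 0.3  # invented egress parameters
    mu_access, sigma_access = scale_lognormal(mu_egress, sigma_egress, 1.5)
    print("P(60 s <= egress < 61 s):",
          interval_probability(60, 1.0, mu_egress, sigma_egress))
    print("access parameters:", mu_access, sigma_access)
```

With a distance ratio of 1.5, an access time of 90 seconds has the same cumulative probability as an egress time of 60 seconds, which is what assumption A2 implies.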

3.2. Probability Distribution of Waiting Time

The uncertainty of platform waiting time is mainly manifested in two aspects. On the one hand, the waiting time at the same platform varies in different periods. On the other hand, different platforms’ waiting time is different in the same period. The waiting passengers include the new check-in passengers, transfer passengers, and stranded passengers. Generally speaking, the boarding probability of the waiting passengers on the platform has nothing to do with their types. The earlier the passengers arrive at the platform, the easier for them to choose the vantage position and more likely to board, especially when the arriving train’s remaining capacity is not enough to load all of them. Therefore, the waiting time distribution of new passengers can represent all passengers’ waiting time distribution on the platform.

It can be seen from Figure 4 that when a passenger takes trip j, his/her waiting time at the departure platform is the departure time of train j from that platform minus the passenger’s check-in time and access time (equation (7)).

Substituting equations (1) and (2) into the above equation gives equation (8), in which the access time is replaced by the distance ratio multiplied by the egress time, and the arrival time of train j at the destination platform equals its departure time from the origin platform plus its running time between the two platforms. In an urban rail transit network, the train’s running time between two platforms on the same line is usually fixed. Therefore, the waiting time in equation (8) can be written as a function of the egress time.

Consider the probability of the waiting time of a passenger at the platform when he/she takes trip j. From the above analysis, this probability equals the likelihood of the passenger’s corresponding egress walking time (equation (9)), where the egress walking time probability is given by equation (5).

The probabilities in equation (9) are normalized so that they sum to 1 over the passenger’s feasible trips.

The discrete value of the waiting time is equal to the product of the waiting time generated by each feasible trip and its corresponding normalized probability. These values form the sample space of the platform waiting time (equation (10)).

Since the platform waiting time changes rapidly, we split the operation time into short periods (such as 30 minutes), which gives 22600 periods in total for the 644 platforms in the Beijing Subway network. Similarly, the waiting time samples are fitted with different probability distributions [13, 27, 30, 31], and the fitting results are tested by the improved KS method [32]. Table 3 shows the KS test statistics of the waiting time distribution.

The test results show that the lognormal distribution is the best fit for the samples, so we assume that the waiting time at each platform in each period follows a lognormal distribution whose logarithmic mean and standard deviation are estimated from the corresponding waiting time sample space. For platforms with few passengers, a fixed value (such as the headway of the corresponding period) is used as the waiting time.
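The waiting time estimation of this subsection can be sketched as below. The sketch assumes that the waiting time sample of a passenger is the probability-weighted sum over his/her feasible trips, with weights proportional to the fitted egress time density (the common time granularity cancels in the normalization). That reading of equations (9) and (10), the function names, and all numbers are assumptions for illustration.

```python
import math
from collections import defaultdict


def lognormal_pdf(t, mu, sigma):
    if t <= 0:
        return 0.0
    z = (math.log(t) - mu) / sigma
    return math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2.0 * math.pi))


def expected_waiting_time(trips, mu_egress, sigma_egress):
    """trips: (waiting_time, egress_time) pairs of one passenger's feasible trips."""
    weights = [lognormal_pdf(egress, mu_egress, sigma_egress) for _, egress in trips]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no feasible trip has a positive egress probability")
    return sum(wait * w / total for (wait, _), w in zip(trips, weights))


def group_by_period(samples, period_seconds=1800):
    """samples: (check_in_time, waiting_time) pairs; returns period index -> waits."""
    periods = defaultdict(list)
    for check_in, wait in samples:
        periods[int(check_in // period_seconds)].append(wait)
    return dict(periods)


if __name__ == "__main__":
    # One passenger with two feasible trips (invented numbers, in seconds).
    print(expected_waiting_time([(40.0, 280.0), (480.0, 60.0)], 4.0, 0.5))
    print(group_by_period([(0, 10.0), (1799, 20.0), (1800, 30.0)]))
```

The per-period lists returned by `group_by_period` are the samples to which a waiting time distribution would then be fitted for each platform.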

3.3. Probability Distribution of Path Travel Time

The above analysis gives the travel time distribution of each component of a path. For convenience of expression, we number the sections (subpaths) of a path in chronological order. According to Figure 2, the travel time of each candidate path of a type-III OD pair can then be written as the sum of the travel times of its subpaths (equation (11)), where the subpaths are indexed by their order along the path.

According to the previous analysis, the subpath travel times can be regarded as mutually independent. Beaulieu et al. [33] pointed out that the sum of lognormal random variables can be well approximated by another lognormal random variable. Therefore, the path travel time is taken to follow a lognormal distribution whose parameters are the logarithmic mean and standard deviation of the path travel time.

According to the Wilkinson method [33], the first and second moments of the path travel time equal the first and second moments of the sum of the subpath travel times, respectively (equations (12) and (13)).

By the properties of the lognormal distribution, a variable with logarithmic mean $\mu$ and logarithmic standard deviation $\sigma$ has first moment $\exp(\mu + \sigma^2/2)$ and second moment $\exp(2\mu + 2\sigma^2)$ (equations (14) and (15)).

Substituting equations (14) and (15) into equations (12) and (13), respectively, expresses both moments of the path travel time in terms of the lognormal parameters.

Since the subpath travel times are independent of each other, their covariances are 0. Therefore, the logarithmic mean and standard deviation of the path travel time can be derived from the logarithmic expectations and standard deviations of the subpath travel times (equations (18) and (19)).
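The moment-matching step can be sketched as follows. The function matches the first two moments of a sum of independent lognormal components with a single lognormal distribution, as the Wilkinson method does. The `fixed_time` argument, which adds a deterministic in-vehicle time to the mean without adding variance, is an assumption of this sketch, as are the example numbers.

```python
import math


def fenton_wilkinson(components, fixed_time=0.0):
    """Approximate a sum of independent lognormal times by one lognormal.

    components: (mu_i, sigma_i) pairs of the random subpath times.
    fixed_time: deterministic part of the path travel time, in the same unit.
    Returns (mu, sigma) of the approximating lognormal distribution.
    """
    mean = fixed_time + sum(math.exp(m + 0.5 * s * s) for m, s in components)
    variance = sum((math.exp(s * s) - 1.0) * math.exp(2.0 * m + s * s)
                   for m, s in components)
    second_moment = variance + mean * mean  # independence: covariances are 0
    sigma_sq = math.log(second_moment) - 2.0 * math.log(mean)
    mu = 2.0 * math.log(mean) - 0.5 * math.log(second_moment)
    return mu, math.sqrt(sigma_sq)


if __name__ == "__main__":
    # Access walk, platform wait and egress walk (invented parameters, seconds),
    # plus 900 s of in-vehicle time taken from the timetable.
    parts = [(4.0, 0.3), (5.0, 0.4), (3.5, 0.2)]
    print(fenton_wilkinson(parts, fixed_time=900.0))
```

A quick sanity check is that a single component is returned unchanged and that the mean implied by the result equals the sum of the component means.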

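The probability density function of the path travel time then has the lognormal form $f(t) = \frac{1}{t\sigma\sqrt{2\pi}} \exp\left(-\frac{(\ln t - \mu)^2}{2\sigma^2}\right)$ for $t > 0$, where $\mu$ and $\sigma$ here denote the logarithmic mean and standard deviation of the path travel time obtained above.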

3.4. Path Inference Method


Before estimating the passenger’s path choice, it is necessary to estimate the travel time distribution of each candidate path. Consider the theoretical travel time of a candidate path between a type-III OD pair. According to equation (11), it is the sum of the in-vehicle time, walking time, and waiting time under ideal conditions, so it is usually shorter than the actual travel time on that path. Hence, a path can be a feasible path for a passenger only if the passenger’s travel time is not less than the theoretical travel time of the path.

To calculate the real-time travel time distribution of a path, it is necessary to estimate the real-time waiting time distribution at each of its departure platforms, that is, to determine the time slot in which the passenger arrives at each platform. A path may include several departure platforms (the origin platform and any transfer platforms), and the passenger reaches each of them at some time offset after check-in (see Figure 4). From equation (11), this offset can be approximated by the sum of the mean travel times of the subpaths that precede the platform.

The period of interest is divided into equal time slots of fixed length. The slot in which the passenger arrives at a platform is determined by the check-in time plus the corresponding offset; the waiting time at the platform then follows the lognormal distribution whose logarithmic mean and standard deviation were estimated for that platform and that slot. Substituting these parameters into equations (18) and (19) gives the parameters of the real-time travel time distribution of the path at the passenger’s departure time.
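The slot lookup can be sketched as below. The function names, the dictionary layout keyed by (platform, slot index), the 30-minute slot length, and the numbers are all assumptions for illustration; times are seconds after midnight.

```python
import math


def lognormal_mean(mu, sigma):
    """Mean of a lognormal variable, used as the mean subpath travel time."""
    return math.exp(mu + 0.5 * sigma ** 2)


def arrival_offset(mean_subpath_times):
    """Approximate time from check-in to arrival at a departure platform."""
    return sum(mean_subpath_times)


def time_slot(check_in, offset, slot_seconds=1800.0, period_start=0.0):
    """Index of the slot in which the passenger reaches the platform."""
    return int((check_in + offset - period_start) // slot_seconds)


def waiting_parameters(platform, check_in, offset, table, slot_seconds=1800.0):
    """Look up (mu, sigma) of the waiting time for a platform and slot."""
    return table[(platform, time_slot(check_in, offset, slot_seconds))]


if __name__ == "__main__":
    # Check-in at 07:50; access walk, first wait and first ride precede
    # the transfer platform (invented mean times, in seconds).
    offset = arrival_offset([120.0, 180.0, 900.0])
    table = {("transfer platform", 16): (5.0, 0.4)}  # hypothetical estimates
    print(time_slot(28200.0, offset))
    print(waiting_parameters("transfer platform", 28200.0, offset, table))
```

The parameters returned for each departure platform would then replace the static waiting time parameters when the path travel time distribution is assembled, which is what makes the distribution depend on the passenger’s departure time.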

According to the above analysis, there is a specific dependence between passengers’ travel time and the candidate paths’ real-time travel time. And the membership function in fuzzy set theory can better deal with this situation. For example, literature [34] proposed a fuzzy power Heronian function to deal with multicriteria decision-making problems. Considering that the passenger travel time of the same OD pair obeys the lognormal distribution, this study constructs a lognormal distribution membership function in the following form to estimate the similarity between the passenger’s travel time and the real-time travel time distribution of each candidate path. That is, the membership degree is equal to the discrete value of the passenger travel time on the probability distribution of each candidate path.where is the time granularity, such as 1 second; and are the logarithmic mean and standard deviation of real-time travel time distribution of path at time , respectively. Denote , which represents the difference between the logarithm of passenger travel time and the logarithmic mean of the path real-time travel time, the smaller is, the greater is. When is the same, the greater is, the greater is, indicating that the passenger travel time has higher confidence in the real-time travel time distribution of path .
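A minimal sketch of the membership degree and the maximum-membership choice is given below. It assumes that the membership degree is the lognormal density at the passenger’s travel time multiplied by the time granularity, and that paths whose theoretical travel time exceeds the passenger’s travel time are excluded. The example reuses the two-path illustration from the Introduction (5 and 10 minutes off-peak, 10 and 15 minutes in peak hours); the logarithmic standard deviation of 0.1, the theoretical travel times, and the use of the logarithm of each travel time as the logarithmic mean are invented simplifications.

```python
import math


def lognormal_pdf(t, mu, sigma):
    if t <= 0:
        return 0.0
    z = (math.log(t) - mu) / sigma
    return math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2.0 * math.pi))


def membership(travel_time, mu, sigma, delta=1.0):
    """Membership degree of a travel time in a path's real-time distribution."""
    return lognormal_pdf(travel_time, mu, sigma) * delta


def infer_path(travel_time, candidates):
    """candidates: path id -> (theoretical_time, mu, sigma) at departure time."""
    degrees = {
        path: membership(travel_time, mu, sigma)
        for path, (theoretical, mu, sigma) in candidates.items()
        if travel_time >= theoretical  # infeasible paths are skipped
    }
    if not degrees:
        return None
    return max(degrees, key=degrees.get)


if __name__ == "__main__":
    off_peak = {"p1": (270.0, math.log(300.0), 0.1), "p2": (570.0, math.log(600.0), 0.1)}
    peak = {"p1": (270.0, math.log(600.0), 0.1), "p2": (570.0, math.log(900.0), 0.1)}
    print("off-peak:", infer_path(600.0, off_peak))
    print("peak:    ", infer_path(600.0, peak))
```

The same 10-minute travel time is matched to different paths depending on the departure period, which is the two-dimensional effect the model is designed to capture.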

According to the principle of maximum membership degree, the path assigned to a passenger travelling between a type-III OD pair, considering the time-varying characteristics of the network state, is the feasible candidate path with the largest membership degree.

4. Case Study and Comparison

To illustrate and verify the model, we apply it to the Beijing Subway network. The network (as of 2017) consists of 19 lines with a total length of 608 km and serves 370 stations, including 56 transfer stations. It carries about 5.4 million trips per day. Most passengers pay for their tickets by smart card or mobile phone, and the AFC system records the transactions, including the check-in and check-out stations and the corresponding times.

We use the AFC data of a typical weekday in October 2017 for the model application. As the punctuality rate of train operation exceeds 99.9% (according to the report of Beijing Rail Transit Operation Co., Ltd.), we use the planned timetable of the same day as the input timetable. Walking distances are mainly obtained in two ways: (1) Baidu map (map.baidu.com) and (2) field investigation.

We select a typical OD pair (TTYB-CYM) for analysis and comparison, as shown in Figure 5. The relevant walking distances are indicated in the figure (the access walking distance, the transfer distance, and the egress walking distance). Yen’s algorithm [35] is adopted to generate candidate paths. To generate a high-quality path set in the complex Beijing Subway network, we add two auxiliary rules to Yen’s algorithm: (1) a dominance coefficient, that is, the ratio of the travel time of a candidate path to that of the shortest path should be within a reasonable threshold; otherwise, the generation stops; (2) a relative transfer number limit, that is, compared with the number of transfers on the shortest path, the number of transfers on a candidate path should be within a reasonable threshold; otherwise, the generation stops. In this case, we set the dominance coefficient and the relative transfer number to 1.5 and 2, respectively. Hence, there are three candidate paths between this OD pair, namely:

Path 1: TTYB → YHG (Line 5 to Line 2) → CYM; 1 transfer; the theoretical travel time is about 2179 seconds.

Path 2: TTYB → DS (Line 5 to Line 6) → CYM; 1 transfer; the theoretical travel time is about 2430 seconds.

Path 3: TTYB → LSQ (Line 5 to Line 13) → DZM (Line 13 to Line 2) → CYM; 2 transfers; the theoretical travel time is about 2540 seconds.
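The two auxiliary rules can be illustrated with a short Python sketch applied to paths that a k-shortest-path routine has already produced. It is not the authors’ code: the names `CandidatePath` and `filter_candidates` are hypothetical, both rules are assumed to compare against the shortest path with inclusive thresholds, and “Path 4” is an invented entry added only to show the stopping behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidatePath:
    name: str
    travel_time_s: float
    transfers: int


def filter_candidates(paths, dominance_coefficient=1.5, relative_transfer_limit=2):
    """Keep paths in increasing travel-time order until one breaks a rule."""
    if not paths:
        return []
    ordered = sorted(paths, key=lambda p: p.travel_time_s)
    shortest = ordered[0]
    kept = []
    for path in ordered:
        too_slow = path.travel_time_s / shortest.travel_time_s > dominance_coefficient
        too_many_transfers = path.transfers - shortest.transfers > relative_transfer_limit
        if too_slow or too_many_transfers:
            break  # assumed reading of "otherwise, stop"
        kept.append(path)
    return kept


candidates = [
    CandidatePath("Path 1", 2179, 1),  # values quoted in the text
    CandidatePath("Path 2", 2430, 1),
    CandidatePath("Path 3", 2540, 2),
    CandidatePath("Path 4", 3400, 2),  # hypothetical extra path, ratio about 1.56
]

kept = filter_candidates(candidates)
print([p.name for p in kept])
```

With the thresholds 1.5 and 2 used in this case, the three paths listed above pass both rules, while the hypothetical fourth path exceeds the dominance coefficient and ends the search.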

4.1. Waiting Time Estimation

The platform waiting time is an important indicator to measure the performance of the subway network. It can reflect the network congestion and passengers’ dwell time in different time periods. Figure 6 shows the average waiting time at different transfer platforms between TTYB-CYM. The observation period is 6:00 ∼ 16:00, and the peak hours are 7:00 ∼ 9:00.

We found that in the peak hours (7:00 ∼ 9:00), the waiting times at YHG and LSQ are close to their respective train headways. That is, most passengers can board the first train that arrives after they reach the platform. The waiting times at DS and DZM are significantly longer than the corresponding train headways, which indicates that passengers may be stranded. This may be caused by platform overcrowding or by the limited capacity of arriving trains. Whatever the reason, passengers’ choice behavior will be affected. In addition, over the whole observation period, the trends of waiting time and train headway are not consistent, indicating that the waiting time is affected by many factors and has an obvious random characteristic.

4.2. Path Inference Results and Comparison

Figure 7 illustrates the paths’ travel time and the result of passenger path inference in the observation period. The horizontal axis represents the observation period, and the vertical axis represents the time. The solid line represents the travel time of each candidate path between TTYB-CYM, and the dot/triangle/diamond represents the passengers’ path choice. For example, the dot indicated by the red arrow is the AFC data of a passenger and the path inference result. His/her check-in time and travel time are 7:05:31 and 2487 s, respectively, and the choice estimated by our model is path 1.

We can observe that (1) the travel time of path 1 and path 2 increases significantly during peak hours, while that of path 3 changes less; (2) most passengers choose path 1 and path 2, which have a short travel time, and few passengers choose path 3, which is consistent with the field survey; and (3) the distribution of passenger travel time is compact in peak hours and relatively scattered in other periods. Over the whole observation period, the average passenger travel time in peak hours is significantly longer than that in off-peak hours. Therefore, it is necessary to take the time-varying characteristics of path travel time into consideration when calculating passenger path choice.

Next, two comparative scenarios are designed to further analyse the impact of time-varying characteristics on estimating passenger path choice. In scenario 1, we apply our model to the Beijing Subway network. In scenario 2, we apply a static model to estimate passengers’ path choice, in which the time-varying waiting time is replaced by the line’s headway. Table 4 shows the comparison of passenger path inference results under the two scenarios.

Scenario 1: The calculation result shows that more than 90% of passengers tend to choose path 1 and path 2, which have a shorter travel time and fewer transfers, during peak hours, and that passengers prefer path 1 to path 2. Echoing this result, it can be seen from Figures 5 and 6 that the transfer distance and the waiting time of path 1 (platform YHG) are better than those of path 2 (platform DS).

Scenario 2: The result shows that more passengers choose path 2 during peak hours, which is inconsistent with the field survey. The reason for this phenomenon is that the waiting time in peak hours is much longer than the headway (Figure 6). Hence, if the time-varying waiting time is not considered, the path travel time will be underestimated, which leads to the illusion that passengers prefer the path with a longer travel time. In the off-peak hours, the proportion of passengers choosing path 1 increases but is still less than 50%, while the proportion of passengers choosing path 3 is close to 20%, which is not reasonable either. According to our field survey, the transfer distance and travel time of path 3 are very unfriendly to passengers, so 20% is not a reliable proportion. This indicates that the waiting time is a key component of path travel time and is very important for estimating the passenger path.
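The mechanism behind the scenario 2 result can be reproduced with a small Python sketch. It is an illustration under stated assumptions rather than the model itself: every component time below is hypothetical, path travel time is taken as the sum of walking, in-vehicle, and waiting time, and the path is chosen as the one whose travel time is closest to the observation in log scale, which coincides with the maximum-membership rule only when all paths share the same logarithmic standard deviation.

```python
import math


def path_travel_time(walk_s, in_vehicle_s, wait_s):
    """Path travel time as the sum of its three components."""
    return walk_s + in_vehicle_s + wait_s


def closest_path(observed_s, path_times):
    """Pick the path whose travel time is nearest to the observation in log scale."""
    return min(
        path_times,
        key=lambda name: abs(math.log(observed_s) - math.log(path_times[name])),
    )


# Hypothetical peak-hour components in seconds (not taken from the paper).
paths = {
    "path 1": {"walk_s": 400, "in_vehicle_s": 1600, "headway_s": 180, "peak_wait_s": 450},
    "path 2": {"walk_s": 450, "in_vehicle_s": 1700, "headway_s": 280, "peak_wait_s": 600},
}

# Scenario 1: time-varying waiting time estimated from data.
time_varying = {
    name: path_travel_time(c["walk_s"], c["in_vehicle_s"], c["peak_wait_s"])
    for name, c in paths.items()
}
# Scenario 2: waiting time replaced by the headway (static model).
static = {
    name: path_travel_time(c["walk_s"], c["in_vehicle_s"], c["headway_s"])
    for name, c in paths.items()
}

observed_s = 2487  # travel time of the example passenger in Figure 7
choice_time_varying = closest_path(observed_s, time_varying)
choice_static = closest_path(observed_s, static)
print("time-varying:", time_varying, "->", choice_time_varying)
print("static:      ", static, "->", choice_static)
```

In this toy setting the static model underestimates both path travel times, so the observed 2487 s looks like a trip on the nominally slower path 2, whereas the time-varying travel times assign the same trip to path 1.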

In summary, in both peak and off-peak hours, the proportion of passengers choosing path 1 and path 2 exceeds 90% in both scenarios, which indicates that passengers tend to choose the path with fewer transfers and a shorter travel time. In scenario 1, the proportion of passengers who choose path 1 in peak hours is lower than that in off-peak hours, while the trend for the other two paths is the opposite. According to the user equilibrium principle, all passengers tend to choose the “advantage” path with a shorter travel time and fewer transfers, which leads to a sharp increase in waiting time and congestion/crowding on the “advantage” path. Then, the passengers who are sensitive to congestion/crowding or waiting time will seek the so-called “disadvantage” path instead, until no passenger can find a better path than the current one. In other words, in the peak hours, due to the waiting time and congestion/crowding, the attraction of path 1 is reduced, so some passengers choose path 2 or path 3 instead. At this time, passengers’ path choices reach an “equilibrium” state. In off-peak hours, with the decrease of passenger flow, the influence of waiting time and congestion/crowding on path 1 is weakened, so the passenger path choice reaches a new “equilibrium” state. That is, in off-peak hours, the proportion of passengers choosing path 1 is higher than that in peak hours. In short, passenger path choice is a complex time-varying dynamic equilibrium. When calculating passenger paths, the influence of the time-varying network state on path choice must be considered.

5. Conclusions

In this study, we propose a probability model to infer passenger path choice in which the time-varying characteristics of network state are explicitly considered. The model takes AFC, train timetable, and network topology data as input and can provide a number of network performance indicators, including passenger path choice, waiting time distribution, and path travel time probability distribution. To that end, we propose a method to estimate the real-time travel time probability distribution of a path based on multisource data. On this basis, a lognormal membership function that can simultaneously consider passenger travel time and network state is introduced to infer the passenger path choice. Finally, a case study was conducted with the data collected from Beijing Subway to demonstrate the effectiveness of our proposed model. The case study shows that (1) the path travel time has significant time-varying characteristics (such as the change of waiting time); (2) compared with the method that does not consider the time-varying characteristics, the model proposed in this study can estimate the passenger path choice more accurately.

In general, this study fills gaps in the existing studies in the following two aspects: (1) a method for estimating the probability distribution of walking time and waiting time based on multisource data is presented, and the probability distribution is tested by an improved KS method [32]. The results show that the passenger travel time in Beijing Subway follows a lognormal distribution. (2) The time-varying state is explicitly considered in the estimation method of passenger path choice; that is, compared with the existing studies, the proposed model can consider both the passenger travel time and the network state when the passenger departs.

Practically, it is easy to apply the proposed model to the path choice or passenger flow calculation in the subway system. Once the input information (e.g., AFC and other data) is available, subway network performance indicators such as path choice proportion and path travel time distribution can be obtained quickly. Specifically, it takes 12 minutes to estimate the paths of all passengers of one day in Beijing Subway. Compared with traditional methods, on the one hand, this method is convenient and efficient, so it can meet the needs of daily operation; on the other hand, because this method is based on actual data rather than questionnaires, it can bring more operation benefits and managerial insights from the “current facts.”

Nevertheless, this study still has some limitations: (1) in the case study, we use the train timetable rather than the real train movement data, which will still affect the accuracy of the method to a certain extent. (2) When the travel volume of the station is small (e.g., the suburban station at night), there may be a large error between the estimated value of waiting time distribution and the actual value. In this case, the estimated value can be replaced by the headway of the corresponding time period or corrected by the multiday cumulative AFC data of the station. (3) In this study, only one membership function is used to infer the passenger path choice. In the subsequent work, other advanced decision functions can be introduced for comparisons, such as fuzzy (monotone) measure [36] and Fermatean fuzzy group decision-making [37].
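The fallback described in limitation (2) can be sketched in Python as follows. This is only an illustration of the rule and not part of the published method: the function name `estimate_mean_wait`, the threshold `min_samples=30`, and the reduction of the waiting time distribution to its mean are all assumptions made for this sketch.

```python
from statistics import mean


def estimate_mean_wait(today_samples_s, headway_s, multiday_samples_s=(), min_samples=30):
    """Return (mean waiting time in seconds, source of the estimate).

    Uses same-day samples when there are enough of them, otherwise pools
    multiday samples, and falls back to the headway when data stay sparse.
    """
    if len(today_samples_s) >= min_samples:
        return mean(today_samples_s), "same-day AFC"
    pooled = list(today_samples_s) + list(multiday_samples_s)
    if len(pooled) >= min_samples:
        return mean(pooled), "multiday AFC"
    return float(headway_s), "headway"


# Hypothetical samples for a low-volume suburban station at night.
sparse = [100] * 5
print(estimate_mean_wait(sparse, headway_s=600))
print(estimate_mean_wait(sparse, headway_s=600, multiday_samples_s=[200] * 30))
```

The order of the two fallbacks (multiday data before headway) is a design choice of this sketch; the text presents them as alternatives without ranking them.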

In the future, we will expand our research in the following aspects: (1) we will study a data-fusion-based passenger path estimation model that combines questionnaire data or video data, and (2) we will study the probability of passengers taking different trains on the basis of the current work, so as to better estimate train utilization, platform crowding, etc. In addition, with the increasing scale of subway networks, it is also an interesting direction to study passenger travel behavior based on network characteristics [38] or network topology index [39].

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The research was funded by the National Natural Science Foundation of China (91746201).