Abstract

In intelligent transportation, predicting user locations in location-based social networks makes it possible to estimate the density of people flow and adjust traffic in time, which keeps traffic smooth, reduces fuel consumption and greenhouse gas emissions, and helps build a green, circular, low-carbon transportation system. This paper proposes a Markov chain position prediction model based on multidimensional correction (MDC-MCM). Firstly, we extract the transition information from the user’s historical check-in sequence as a location-to-location transition graph. Secondly, the influence of check-in period, space distance, and other factors on the position prediction is linearly weighted and merged with the position prediction of the n-order Markov chain to construct MDC-MCM. Finally, we conduct a comprehensive performance evaluation of MDC-MCM using the dataset collected from Brightkite. Experimental results show that MDC-MCM achieves better location prediction results than other advanced location prediction techniques.

1. Introduction

With the development of the world’s industrial economy, the rapid increase in population, and unrestrained production and lifestyles, the world’s climate faces increasingly serious problems. Greenhouse gas emissions are increasing, and the earth’s ozone layer is facing an unprecedented crisis. Catastrophic climate events have repeatedly occurred around the globe, seriously endangering the living environment, health, and safety of human beings. Meanwhile, the combination of communication networks and positioning systems has formed a new type of social network, the location-based social network (LBSN) [1]. In a location-based social network, people can share their location and location-related information at any time through communication devices, which is known as checking in (or signing in). These data can be used to predict user locations, friend relationships, and personal behavior patterns [2–4]. User location prediction is of great use in intelligent transportation. It can predict the density of people flow so that corresponding adjustments can be made in time to keep traffic smooth [5], reduce fuel consumption, reduce greenhouse gas emissions, and help build a green, circular, low-carbon transportation system. In addition, it also plays an important role in smart cities and in research on epidemic transmission.

Currently, many methods of position prediction have emerged. Among them, Yuan et al. [6] explored the influence of time and space on location prediction in location-based social networks. Ye et al. [7] used power law distribution to model spatial factors and combined user preferences and friend relationships to predict location. Cheng et al. [8] used a first-order Markov chain based on the influence of the most recently visited location on the next location and integrated the matrix decomposition method to predict the location. Based on the high-order influence of n-order weighted Markov chain, Zhang and Chow [9] combined time and space with friend relationship and popularity factors for location prediction.

In this paper, we adopt the n-order Markov chain [10], additionally consider the check-in period, space distance, friend relationship, and popularity of check-in points, and propose a Markov chain position prediction model based on multidimensional correction (MDC-MCM) that predicts user locations in location-based social networks (LBSNs).

In short, our contribution in this work has three aspects. Firstly, we link user location prediction in location-based social networks with intelligent transportation to help build a green, circular, low-carbon transportation system. Secondly, the Markov chain position prediction model based on multidimensional correction (MDC-MCM) jointly considers the check-in time period, spatial distance, friend relationship, and check-in point popularity, so the dimensions considered are more comprehensive. Finally, we evaluate the proposed location prediction method on the Brightkite dataset. The experimental results show that the proposed method has better prediction performance than the other methods.

The rest of the paper is organized as follows. Section 2 describes the Markov chain position prediction model based on multidimensional correction in detail. In Section 3, we evaluate the proposed model on the Brightkite dataset and discuss the results. Finally, Section 4 draws the conclusion and describes future work.

2. MDC-MCM

2.1. LLTG Diagram

Graph [11]. A data structure composed of a set of vertices V and a set of relations (edges) E between the vertices, defined as Graph = (V, E).

Out Degree [11]. The number of edges associated with a vertex is called its degree. In a directed graph, the out-degree of a vertex is the number of arcs starting from that vertex.

Location to Location Transition Graph (LLTG). An LLTG is a directed graph $G = (L, E)$ that contains a set of vertices L and a set of edges E. Each vertex $l_i \in L$ represents a point of interest and has an out-degree, denoted as $Out(l_i)$, and the transition frequency from $l_i$ to $l_j$ is denoted as $W(l_i, l_j)$. For example, in Figure 1, the out-degrees of the location nodes are 8, 3, 7, and 0, respectively.

It can be seen from Figure 1 that the LLTG describes the transition frequency from one location node to another and the out-degree of each node.

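The transition probability is the probability of moving from one location node to another; the transition probability from $l_i$ to $l_j$ is recorded as $TP(l_i \to l_j)$. It is the transition frequency from $l_i$ to $l_j$, written $W(l_i, l_j)$, divided by the out-degree of $l_i$, written $Out(l_i)$. Considering that the out-degree of a location node may be 0, we assume that the transition probability of a location node with out-degree 0 is 1:

$$TP(l_i \to l_j) = \begin{cases} \dfrac{W(l_i, l_j)}{Out(l_i)}, & Out(l_i) > 0 \\ 1, & Out(l_i) = 0 \end{cases}$$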
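The construction of the transition graph and its transition probabilities can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function names `build_lltg` and `transition_probability` are illustrative, the out-degree is taken to be the total number of outgoing transitions of a node, and the value 1 for a node without outgoing transitions follows one reading of the convention stated above.

```python
from collections import Counter, defaultdict


def build_lltg(sequence):
    """Count transitions between consecutive check-in locations."""
    freq = defaultdict(Counter)  # freq[a][b] = number of observed a -> b transitions
    for a, b in zip(sequence, sequence[1:]):
        freq[a][b] += 1
    nodes = sorted(set(sequence))
    # Assumption: out-degree = total number of outgoing transitions of the node.
    out_degree = {a: sum(freq[a].values()) for a in nodes}
    return nodes, freq, out_degree


def transition_probability(freq, out_degree, a, b):
    """Transition frequency divided by out-degree."""
    if out_degree[a] == 0:
        return 1.0  # convention for nodes with out-degree 0, as read from the text
    return freq[a][b] / out_degree[a]


# Hypothetical check-in sequence of one user.
nodes, freq, out_degree = build_lltg(["A", "B", "A", "C", "A", "B"])
p_ab = transition_probability(freq, out_degree, "A", "B")
```

Storing the counts in a `Counter` per source node keeps unobserved transitions at frequency 0 without any special handling.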

2.2. n-Order Markov Chain

A Markov chain [12] is a sequence of random variables $X_1$, $X_2$, $X_3$, and so on. The range of these variables, that is, the set of all their possible values, is called the state space. If the state corresponding to time n is $X_n$, then $X_{n+1}$ is regarded as a function of $X_1$, …, $X_n$; such a chain is known as an n-order Markov chain [10], which has n-order memory. The matrix composed of the transition probabilities is the transition probability matrix.

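Assuming that user u has m location nodes and is at location node $l_i$ at time n, the transition probability matrix is as follows:

$$P = \begin{pmatrix} p_{11} & \cdots & p_{1m} \\ \vdots & \ddots & \vdots \\ p_{m1} & \cdots & p_{mm} \end{pmatrix}$$

Among them, $p_{ij}$ is the transition probability from location node $l_i$ to location node $l_j$.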

The probability distribution vector of the initial state is as follows: . Then, the probability distribution vector of user u going to each location node at time n + 1 is as follows:
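As a simplified illustration of how a transition probability matrix turns a current state distribution into a distribution over next locations, the sketch below performs one step of a first-order chain. The first-order simplification, the function name `next_distribution`, and the example matrix are assumptions made for illustration only; the n-order model described here conditions on a longer history.

```python
def next_distribution(current, matrix):
    """One step of a first-order chain: row vector times transition matrix."""
    m = len(matrix)
    return [sum(current[i] * matrix[i][j] for i in range(m)) for j in range(m)]


# Hypothetical 3-location example: the user is at location 0 with certainty.
P = [
    [0.0, 0.5, 0.5],
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
]
start = [1.0, 0.0, 0.0]
after_one_step = next_distribution(start, P)
```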

2.3. Check-In Time Period

Studies have shown that users’ check-in behavior largely follows regular temporal patterns [13]. Therefore, analyzing the data from the perspective of time is essential to improve the accuracy of position prediction. Using the Brightkite dataset, we plot the distribution of user check-ins by day of the week and by hour of the day (Figures 2 and 3).

From Figure 2, it is found that the proportion of check-ins varies periodically over the week. The number of check-ins from Monday to Thursday is relatively even, the number of check-ins on Friday and Saturday increases significantly and is highest on Saturday, and the number of check-ins on Sunday is similar to that from Monday to Thursday.

From Figure 3, it is found that the proportion of check-ins changes periodically over the hours of the day. From 0:00, the number of user check-ins shows a downward trend until it reaches its minimum at about 10 am. The number of check-ins then increases and peaks at about 7 pm, after which it fluctuates within a small range. According to this pattern, a day is divided into three time intervals: interval 1, interval 2, and interval 3. Let $H = \{1, 2, 3\}$ denote these intervals; the corresponding time ranges are 0:00–10:00, 10:00–19:00, and 19:00–24:00.

We consider the day of the week and the time interval jointly when predicting the user’s check-in location.

Define the probability of the user u checking in at location $l_i$ in the time interval h as

$$P_h(u, l_i) = \frac{c_h(u, l_i)}{\sum_{j=1}^{m} c_h(u, l_j)}$$

Among them, h is an element of the previously defined set of time intervals, m is the size of the location set, and $c_h(u, l_i)$ is the frequency with which the user u checked in at location $l_i$ in the time interval h.
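The per-interval check-in probability can be sketched as follows. This is a minimal sketch, not the authors’ implementation: the function names are illustrative, check-ins are assumed to be given as (hour, location) pairs for a single user, and assigning the boundary hours 10 and 19 to the later interval is an assumption.

```python
from collections import Counter


def time_interval(hour):
    """Map an hour (0-23) to interval 1, 2, or 3 using the ranges in the text."""
    if hour < 10:
        return 1  # 0:00-10:00
    if hour < 19:
        return 2  # 10:00-19:00
    return 3      # 19:00-24:00


def interval_probabilities(checkins, interval):
    """Relative check-in frequency per location within one time interval."""
    counts = Counter(loc for hour, loc in checkins if time_interval(hour) == interval)
    total = sum(counts.values())
    if total == 0:
        return {}  # the user never checked in during this interval
    return {loc: n / total for loc, n in counts.items()}


# Hypothetical check-ins of one user.
checkins = [(8, "home"), (9, "cafe"), (9, "home"), (12, "office"), (20, "bar")]
morning = interval_probabilities(checkins, 1)
```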

Therefore, the check-in probability of the user u at the location node on the t day of the week can also be obtained:

Among them, is the number of check-ins in interval h on the t day of the week and is the total number of check-ins on the t day of the week.

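To simplify the calculation, the obtained probability is subjected to min-max normalization processing [14]:

$$P' = \frac{P - P_{\min}}{P_{\max} - P_{\min}}$$

where $P_{\min}$ and $P_{\max}$ are the minimum and maximum of the obtained probabilities.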
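Min-max normalization rescales a list of values linearly so that the smallest becomes 0 and the largest becomes 1. A minimal sketch follows; the function name and the handling of a constant input (mapped to zeros) are assumptions, since the text does not cover that case.

```python
def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant vector carries no ranking information.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


normalized = min_max_normalize([2.0, 4.0, 6.0])
```

Note that the normalized values generally no longer sum to 1, so they act as scores rather than as a probability distribution.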

Then, the probability distribution vector of user u going to each location node at time n + 1 is as follows:

2.4. Spatial Distance

Because the spatial distance between two consecutive check-in points varies, it is necessary to estimate the distribution of this distance.

The sampling data of the space are collected from the check-in set D as shown in the following:

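Among them, the distance d between two check-in points with latitudes $\varphi_1$, $\varphi_2$ and longitudes $\lambda_1$, $\lambda_2$ (in radians) is given by the Haversine distance formula [15]:

$$d = 2r \arcsin\left(\sqrt{\sin^2\left(\frac{\varphi_2 - \varphi_1}{2}\right) + \cos\varphi_1 \cos\varphi_2 \sin^2\left(\frac{\lambda_2 - \lambda_1}{2}\right)}\right)$$

where r is the radius of the Earth, about 6371 km.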
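The Haversine distance can be computed directly from the latitude and longitude fields of two check-ins. The sketch below uses the Earth radius given above; the function name `haversine_km` and the use of degrees as input are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of longitude along the equator.
d = haversine_km(0.0, 0.0, 0.0, 1.0)
```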

Assuming that the spatial distance d between two consecutive check-in points approximately obeys the power law distribution [16], the probability density formula of the power law distribution is as follows:

According to the maximum likelihood estimation method [17], we can estimate from sample D
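A minimal sketch of such an estimate is given below. It assumes the continuous power-law density p(d) = ((α − 1) / d_min) (d / d_min)^(−α) for d ≥ d_min, whose maximum likelihood estimate is α = 1 + n / Σ ln(d_i / d_min). The exact parameterization used in this paper may differ, and the lower cutoff `d_min` and the function names are illustrative assumptions.

```python
import math


def fit_power_law(distances, d_min):
    """Maximum likelihood estimate of the exponent of a continuous power law."""
    tail = [d for d in distances if d >= d_min]
    log_sum = sum(math.log(d / d_min) for d in tail)
    if log_sum == 0:
        raise ValueError("need at least one distance strictly above d_min")
    return 1.0 + len(tail) / log_sum


def power_law_pdf(d, alpha, d_min):
    """Density of the assumed power law at distance d >= d_min."""
    return (alpha - 1) / d_min * (d / d_min) ** (-alpha)


# Hypothetical distances (km) between consecutive check-ins.
alpha = fit_power_law([1.0, 2.5, 4.0, 12.0, 30.0], d_min=1.0)
```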

Using the Brightkite dataset, we plot the probability density against the spatial distance between two consecutive check-in points, as shown in Figure 4.

In Figure 4, we find that the empirical probability density of the spatial distance between two consecutive check-in points closely matches the estimated power law distribution. This shows that our hypothesis is reasonable: the spatial distance d between two consecutive check-in points can be regarded as obeying the power law distribution.

Assuming that user u has m location nodes and is now at time n and the location node is , the probability of going to each node is as follows:

Among them, is the i-th sign-in point.

To simplify the calculation, the obtained probability is subjected to min-max normalization processing [14].

Then, the probability distribution vector of user u going to each location node at time n + 1 is as follows:

2.5. Friendship

Based on previous research [9], a user’s check-in points are related to the user’s friends, and different friends have different influences. To measure the influence of different friends, we introduce the Jaccard coefficient to measure the similarity between the user and each friend.

2.5.1. Jaccard Coefficient

The Jaccard coefficient [18] is widely used in the field of information retrieval. It is often used as an index to measure the similarity of two objects, that is, to judge the probability that a certain characteristic is shared by two objects. Here, the characteristic is a common friend: the coefficient is the ratio of the number of friends shared by two users to the total number of distinct friends of the two users. The formula is as follows:

$$J(i, j) = \frac{|\Gamma(i) \cap \Gamma(j)|}{|\Gamma(i) \cup \Gamma(j)|}$$

Among them, $\Gamma(i)$ is the set of neighbors of user node i and $\Gamma(j)$ is the set of neighbors of user node j. The larger the Jaccard coefficient value, the higher the similarity between friends and the closer the relationship.
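The Jaccard coefficient of two users can be computed from their friend sets with plain set operations. In the sketch below, the function name and the value 0 for two users without any friends are assumptions.

```python
def jaccard(friends_i, friends_j):
    """Shared friends divided by all distinct friends of the two users."""
    union = friends_i | friends_j
    if not union:
        return 0.0  # assumption: no friends at all means no measurable similarity
    return len(friends_i & friends_j) / len(union)


# Hypothetical friend sets: two shared friends out of four distinct ones.
similarity = jaccard({"a", "b", "c"}, {"b", "c", "d"})
```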

Assuming that user u has m location nodes and p friends and he is now at time n and his location node is , then the probability that a friend will influence user u’s check-in at location node is as follows:

Among them, represents the check-in frequency of the k-th friend of the user u at the location node .

To simplify the calculation, the obtained influence probability is subjected to min-max normalization processing [14].

Then, the probability distribution vector of user u going to each location node at time n + 1 is as follows:

2.6. Popularity of Check-In Points

The popularity of check-in points can greatly affect the prediction of the user’s next check-in location. The popularity of a check-in point can be determined directly from the historical check-in frequency of the user u at that location.

Assuming that user u has m location nodes and is now at time n and the location node is , the probability of going to each node is as follows:

Among them, represents the historical check-in frequency of the user u at the check-in node .

To simplify the calculation, the obtained probability is subjected to min-max normalization processing [14].

Then, the probability distribution vector of user u going to each location node at time n + 1 is as follows:

The linear weighted fusion of the various predicted probabilities that affect the next check-in position proposed above is used to obtain a Markov chain position prediction model based on multidimensional correction (MDC-MCM). The probability distribution vector of each check-in point of user u at time n + 1 is as follows:

Among them, , , , and are all correction coefficients.
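The linear weighted fusion can be sketched as follows. This is an illustration under assumptions rather than the exact model: the Markov chain scores are taken with weight 1, each correction vector is added with its own coefficient, and the names `fuse`, `top_k`, and the example coefficient values are hypothetical.

```python
def fuse(markov, corrections, weights):
    """Add weighted correction vectors to the Markov chain scores."""
    fused = list(markov)
    for name, vector in corrections.items():
        w = weights[name]  # correction coefficient for this factor
        fused = [f + w * v for f, v in zip(fused, vector)]
    return fused


def top_k(scores, locations, k):
    """Return the k locations with the highest fused scores."""
    ranked = sorted(zip(scores, locations), key=lambda pair: pair[0], reverse=True)
    return [loc for _, loc in ranked[:k]]


# Hypothetical scores for three locations and two correction factors.
markov = [0.5, 0.25, 0.25]
corrections = {"time": [0.0, 1.0, 0.0], "distance": [1.0, 0.0, 0.5]}
weights = {"time": 0.25, "distance": 0.25}  # placeholder values, not the tuned ones
scores = fuse(markov, corrections, weights)
best = top_k(scores, ["A", "B", "C"], 2)
```

Keeping the coefficients in a dictionary makes it easy to tune one factor at a time or to disable a factor by setting its coefficient to 0.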

3. Experiment

In this section, the proposed model is compared with state-of-the-art location prediction techniques, and the precision and recall are obtained on the Brightkite dataset [2].

3.1. Brightkite Dataset

The Brightkite dataset consists of user check-in data from the Brightkite LBSN check-in website. Each check-in record in the Brightkite dataset has the format <userid, check-in time, latitude, longitude, locationid>. Brightkite is the second-largest check-in site after Foursquare. The statistics of the dataset are shown in Table 1.

The data in Table 1 need to be preprocessed to ensure the quantity and quality of the data. In the preprocessing, to prevent sparse data from affecting the experimental results, users with fewer than ten check-ins and points of interest with fewer than ten check-ins in total are filtered out. The check-in data are then divided into a training set and a test set according to check-in time: the first 80% of the check-in data are used as the training set, and the last 20% are used as the test set. In the experiment, the Markov chain position prediction model based on multidimensional correction is built on the training set and used to predict the test data.
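The preprocessing and split can be sketched as follows. The sketch assumes check-ins are given as (user, timestamp, location) tuples, applies the two filters in a single pass, and splits all check-ins globally by time; whether the split is global or per user is not stated here, and the function names are illustrative.

```python
from collections import Counter


def filter_sparse(checkins, min_count=10):
    """Drop check-ins of users or locations with fewer than min_count check-ins."""
    user_counts = Counter(user for user, _, _ in checkins)
    loc_counts = Counter(loc for _, _, loc in checkins)
    return [
        c for c in checkins
        if user_counts[c[0]] >= min_count and loc_counts[c[2]] >= min_count
    ]


def chronological_split(checkins, train_fraction=0.8):
    """Earliest check-ins form the training set, the rest the test set."""
    ordered = sorted(checkins, key=lambda c: c[1])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


# Hypothetical data: one active user at one popular place, one sparse user.
data = [("u1", t, "p1") for t in range(10)] + [("u2", 99, "p2")]
train, test = chronological_split(filter_sparse(data))
```

A single filtering pass can leave a user slightly below the threshold once sparse locations are removed; repeating the filter until nothing changes would enforce both thresholds strictly.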

3.2. Evaluation Technology

We compare the Markov chain location prediction model based on multidimensional correction (MDC-MCM) with previous location recommendation techniques. The methods compared are as follows.

STI. This method considers time and space factors and independently predicts the user’s preference for location nodes in each time interval, assuming that users are more inclined to visit nearby points of interest [6].

USG. This method models spatial factors with a power law distribution and combines them with user preferences and friend relationships [7].

FMC. This method is based on a first-order Markov chain, which uses the influence of the most recently visited location on the next location and incorporates the matrix factorization method [8].

AMC. This method uses a sequence prediction algorithm based on an n-order weighted Markov chain, combined with a simple weight decay method, so that the recommendation results favor places that are closer [19].

LORE. This method uses a high-order sequential influence based on an n-order weighted Markov chain and combines time and space with friend relationships and popularity factors [9].

MDC-MCM. The MDC-MCM proposed in this paper is based on the high-order sequential influence of the n-order Markov chain and combines the check-in period, space distance, friend relationship, and check-in point popularity factors.

3.3. Performance Metrics

To evaluate the performance of each method, we selected two metrics, precision [20] and recall [20], as follows:

$$\text{precision} = \frac{1}{N} \sum_{u} \frac{hit_u}{k_u}, \qquad \text{recall} = \frac{1}{N} \sum_{u} \frac{hit_u}{|T_u|}$$

Among them, N is the number of users to be predicted, $hit_u$ is the number of predicted hits for user u, $k_u$ is the number of locations predicted for user u, and $T_u$ is the set of locations visited by user u in the test set.
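The two metrics can be computed per user and averaged, as in the sketch below. It assumes that every user receives exactly k predicted locations, that every user has a non-empty test set, and that the metrics are averaged over users; the function name and the example data are hypothetical.

```python
def precision_recall_at_k(predictions, visited, k):
    """Average per-user precision and recall of the top-k predicted locations.

    predictions: user -> ranked list of predicted locations
    visited:     user -> set of locations visited in the test set
    """
    precision = recall = 0.0
    for user, ranked in predictions.items():
        hits = len(set(ranked[:k]) & visited[user])
        precision += hits / k
        recall += hits / len(visited[user])
    n = len(predictions)
    return precision / n, recall / n


# Hypothetical predictions for two users.
predictions = {"u1": ["a", "b", "c", "d"], "u2": ["x", "y", "z", "w"]}
visited = {"u1": {"a", "c"}, "u2": {"q", "y", "r", "s"}}
p_at_2, r_at_2 = precision_recall_at_k(predictions, visited, 2)
```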

3.4. Result

The number of next positions (top-k) for each prediction is set from 1 to 20. The correction coefficients are adjusted repeatedly on the training set until their final values are obtained. With these coefficients, better prediction results are obtained on the test set, and the precision and recall curves are plotted together with those of the other position prediction techniques. The results are shown in Figures 5 and 6.

3.5. Analysis

Here, we analyze the experimental results.

3.5.1. The Number of Recommended Check-In Points (Top-k)

In Figures 5 and 6, it can be observed that as the number of recommended check-in points (top-k) increases, the precision gradually decreases and the recall gradually increases. This is in line with expectations. If the locations visited by the user are already among the recommended check-in points, further increasing top-k only adds recommendations that the user does not visit, so the proportion of visited locations among the recommended check-in points becomes lower and the precision decreases. At the same time, the more check-in points are recommended, the more likely the places the user visits are among them, and the greater the recall.

3.5.2. The Effect of Different Factors on Recommendation Results

In Figures 5 and 6, comparing the prediction curves of the FMC and STI methods shows that the time factor plays an important role in location prediction. Comparing the curves of the STI and USG methods shows that the friend relationship plays an important role. Comparing the curves of the USG and AMC methods shows that spatial distance plays an important role. Comparing the curves of the AMC and LORE methods shows that the popularity of the check-in point plays an important role. MDC-MCM models the sequential influence with the n-order Markov chain and also considers the influence of check-in period, space distance, friend relationship, and check-in point popularity, which is why MDC-MCM outperforms the other location prediction algorithms. However, because MDC-MCM uses an n-order Markov chain and has many correction parameters, each run takes a long time and parameter adjustment is cumbersome.

4. Conclusion

This paper proposes a Markov chain position prediction model based on multidimensional correction (MDC-MCM). First, MDC-MCM utilizes the high-order sequential influence based on the n-order Markov chain to consider all positions and transition probabilities in the user’s check-in history. In addition, MDC-MCM combines the influence of check-in period, space distance, friendship, and popularity of check-in points. Finally, the experimental results on the Brightkite dataset show that MDC-MCM predicts positions well. In the future, we will consider making predictions first at the level of communities and then within each community to reduce the computational workload. In addition, we will consider deploying the model on a distributed computing platform, which would shorten the running time and make it easier to adjust the correction parameters.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was partly supported by the Fundamental Research Funds for the Central Universities (no. N182304010), the Natural Science Foundation of Liaoning Province (no. 20170520333), and the Natural Science Foundation of Hebei Province (no. F2019501012).