Abstract

Point-of-interest (POI) recommendations are a popular form of personalized service in which users share their POI location and related content with their contacts in location-based social networks (LBSNs). The similarity and relatedness between users of the same POI type are frequently used for trajectory retrieval, but most existing works rely on explicit characteristics drawn from all users’ check-in records without considering individual activities. We propose a POI recommendation method that attempts to optimally recommend POI types to serve multiple users. The proposed method aims to predict the destination POIs of a user and to search for similar users in the same regions of interest, thus optimizing the user acceptance rate for each recommendation. The proposed method employs a variable-order Markov model to determine the distribution of a user’s POIs based on his or her travel histories in LBSNs. To further enhance the user’s experience, we also apply latent Dirichlet allocation to cluster the topics related to “Travel” and connect users who have social links or similar interests. The probability of POIs based on users’ historical trip data and interests in the same topics can then be calculated. The system provides a list of the recommended destination POIs ranked by their probabilities. Using two real-world datasets from New York City, we demonstrate that our method outperforms collaborative-filtering-based and other methods in terms of both accuracy and recall. The proposed POI recommendation algorithms can be deployed in certain online transportation systems and can serve over 100,000 users.

1. Introduction

The check-in behaviors in location-based social networks (LBSNs) have become a new lifestyle component for millions of users who share their point-of-interest (POI) locations with their contacts in such LBSNs and provide geo-tagged user posts, photos, and micropayments [1]. The functionality of LBSNs has become increasingly sophisticated in recent years and now includes numerous user-centric services. Among these, personalized POI recommendations [2, 3], which suggest places such as cinemas, restaurants, and tourist attractions that are predicted to be of personal interest to users, have become an increasingly important service that can greatly enhance the travel experience of users [4, 5]. As a result, POI recommendations have received increasing attention from both industry and academia [3]. However, previous studies usually fit a POI recommendation model to all the collected check-in data. Such a model does not fully capture user behavior in different scenarios because of the heterogeneity introduced by interuser and intrauser differences [6]. Here, we investigate the tendency of users to travel under different patterns (both spatial and semantic), as well as their tendency to select POIs based on their interests and social links. In addition, existing POI recommendations have generally sought to discover unknown POI types for a user from his or her contacts [7]. However, these recommendations can involve very sparse data [8]. Moreover, they fail to make use of the semantics of POI data for LBSN users who have interests and travel habits similar to those of the target user but who are not among the user’s contacts.

Based on the above considerations, in this paper we propose a personal POI recommendation method based on destination prediction. First, the check-in behaviors of individuals are analyzed from the distribution of a user’s POIs based on their travel histories in LBSNs. A variable-order Markov model is employed to predict the intended POI types [9], with consideration of the semantics in the spatial layout, which serves as the key constraint of the recommendation process. This overcomes the weaknesses associated with existing POI recommendations by not only linking users to the POIs they have visited but also identifying the influence of users’ current locations on their future movements and taking the type of the next destination as the POI type to be recommended. Second, groups of users [10] with similar preferences are obtained from users’ historical trip data and their interests in the same topics [11]. The probability of each POI intended by a user can thus be calculated using the suggestions from the communities. Finally, a list of the recommended destination POIs ranked by probability is provided.

Discovering the travel communities alleviates the problem of data sparsity [12] while simultaneously alleviating the weaknesses of existing methods based on the check-in data collected from the contacts of users, which ignore the fact that many users like interacting with people with different social links [13]. In addition, social interaction is used in conjunction with movement-related information to improve recommendations by detecting the communities, which enhances the users’ experiences. An analysis employing real-world datasets demonstrates that the use of activity patterns and social interaction for real-life communities facilitates a significant increase in the number of candidate POIs, which contributes to more accurate POI recommendations to individuals relative to ratings-based POI recommendations.

Our key contributions are summarized as follows.

(i) We generalize POI recommendations by detecting communities from social interactions and semantics in the spatial movements.

(ii) We solve the personalized POI recommendation by taking the types of each user’s historical POIs as the candidates. The prediction model is trained to calculate the probability of users’ intended POIs based on departure longitude and latitude and on departure time.

(iii) We evaluate the proposed method against other existing recommendation techniques on two real-world datasets. Experimental results demonstrate that our approach accurately discovers real grouping behaviors, recommends the POIs of greatest interest to the target users in both test cases, and outperforms existing algorithms.

The remainder of this paper is organized as follows. Pertinent research specific to existing POI recommendation techniques employed by LBSNs is presented in Section 2. In Section 3, the POI prediction model, with the integrated travel community, is proposed. The travel-community-based recommendation algorithm is derived in Section 4. In Section 5, numerical results are provided to demonstrate the advantages of the proposed method over two other algorithms. Concluding remarks are given in Section 6.

2. Related Work

The datasets for POI recommendations usually include global-positioning-system- (GPS-) based trajectory data and the check-in data of LBSNs [14]. While numerous studies [15-17] have developed POI recommendations based on GPS trajectory data, these approaches first mined the sequence of semantic POIs visited, which is represented by the check-in data. Check-in data also provide additional information markers (e.g., social interactions, POI types, or semantics in the spatial layout) that are especially useful in capturing latent relationships among users of the same POI type. Therefore, POI recommendation based on check-in data is greatly favored by researchers, and numerous studies have been conducted [1, 18-20].

Increasingly sophisticated POI recommendations have been developed based on check-in data, although each has characteristic weaknesses. For example, Berjani and Strufe [21] applied a collaborative filtering (CF) model with check-in data for conducting a POI recommendation. Unfortunately, differences in the number of times a user checks in at the various locations are ignored, which makes it impossible to fully discover and rank the users by their interests based on locations. Shi et al. [22] recommended POI locations based on a category-related regular matrix using the historical locations visited by users. However, without considering the current travel activity of users, that method cannot suggest where a user should go next. Ye et al. [23] used the ratings provided by friends in conjunction with the social distance among friends to provide a POI recommendation, without any information regarding the user’s travel interests. Ference et al. [24] extended a CF model with users’ current locations and social interactions, but their method recommends a POI based only on the travel distance and searches for similar users considering only their social influence [25]. Therefore, it did not work well for sparse datasets. A Bayes classification was used by Jing-jin to calculate the check-in probability of users for specific locations in the future using the historical check-in spots, under a distance-based constraint [26]. However, this method did not identify the influences from other users on the recommendation, which accordingly reduced the accuracy of the recommendation. Ye et al. [27] constructed the diffusion process on multiple information sources (i.e., people’s interests, social influences, and spatial proximity) to improve the accuracy of their proposed recommendation. However, this method can provide only a general list of the intended POIs, without regard to the location of the user at the present moment.

3. Travel Community Discovery from Predicted Semantic POI

3.1. Semantic POI Prediction

The sequence of semantic POIs visited is especially useful in capturing latent relationships among community members [28]. A POI-related model describes the temporal activity pattern of real-life users. It records all POIs visited by user u over a 1 d period as a trajectory $Traj_u = \{p_1, p_2, \ldots, p_n\}$, where $p_i$ represents a spot from the trajectory database. Each spot is a tuple $p_i = (lat_i, lon_i, t_i, name_i, type_i)$, which is defined according to the latitude ($lat_i$) and longitude ($lon_i$) of the check-in, the check-in time ($t_i$), the name of the POI ($name_i$), and the POI type ($type_i$). All m historical trajectories of user u are collected in $TD_u = \{Traj_u^{id} \mid id = 1, \ldots, m\}$, where id distinguishes trajectories.

We applied the variable-order Markov model to predict the POI destination [29]. The set of historical POIs of a user, abstracted from the user’s set of historical trajectories $TD_u$, is given as HPOI = $\{p_1, p_2, \ldots, p_k\}$. Given $p_i$ in HPOI, if $i > N$, then the Nth-order context model of $p_i$ refers to the sequence of length N that has $p_i$ as the next POI; that is, $c_N(p_i) = p_{i-N} \cdots p_{i-1}$. By looking for the trajectories in $TD_u$ that contain a sequence matching $c_N(p_i)$, we can predict the probability distribution of $p_i$ from the number of trajectories observed, based on the prediction by partial matching (PPM) model. We calculate the probability of the next POI destination of $c_N(p_i)$ based on the context model using

Here, $C(p_i \mid c_N(p_i))$ represents the number of occurrences of $c_N(p_i)$ in $TD_u$ that have $p_i$ as the destination, $S(c_N(p_i))$ denotes the total number of occurrences of $c_N(p_i)$ in $TD_u$ over all destinations, and $A(c_N(p_i))$ describes the set of destination types that follow the same contextual sequence $c_N(p_i)$ in $TD_u$.

Equation (1) shows that, given a $p_i$ in HPOI, if there are trajectories that contain its context $c_N(p_i)$ followed by $p_i$, the probability that $p_i$ is the next POI destination is determined and returned by $P(p_i \mid c_N(p_i))$. Otherwise, the model decreases N by 1 and updates the context model to $c_{N-1}(p_i)$ for predicting $p_i$ with the escape probability $P(\mathrm{esc} \mid c_N(p_i))$. The “esc” in (3) indicates the escape code, which is provided in the PPM model to control the search until N equals 0.

where $P(\mathrm{esc} \mid c_N(p_i))$ indicates the prediction probability of the escape code of $c_N(p_i)$, and $E(c_N(p_i))$ denotes the frequency of the escape code of $c_N(p_i)$.

We then decrease N by 1, identify the number of occurrences of $c_{N-1}(p_i)$ in $TD_u$, and search for the trajectories in $TD_u$ that contain the same $c_{N-1}(p_i)$. If there is no such sequence, we continue to decrease the frequency of the escape code until the context model is found in the r-th round. The probability is then calculated as

If no trajectory matching the context model is found while the frequency of the escape code equals 1, we assign the prediction probability as

We can then determine the type of the predicted POI as the one with the maximum probability. In the following section, social interaction is used in conjunction with movement-related information to recommend POIs of the same type. The similarity and relatedness between users of the same POI types are identified from their social links or similar interests in the topics. The POIs of the same type suggested by the communities are finally provided.
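The back-off behavior described in this subsection can be illustrated with a minimal Python sketch. It is not the exact estimator of this paper: it assumes the PPM "method C" convention (the escape count of a context equals its number of distinct followers), applies exclusion of already-scored types, and renormalizes at the end. The function names `train_contexts` and `predict_next` and the toy trajectories are hypothetical.

```python
from collections import Counter, defaultdict


def train_contexts(trajectories, max_order):
    """Count which POI type follows every context of length 0..max_order."""
    counts = defaultdict(Counter)
    for traj in trajectories:
        for i, nxt in enumerate(traj):
            for n in range(max_order + 1):
                if i - n < 0:
                    break  # not enough history for a context this long
                counts[tuple(traj[i - n:i])][nxt] += 1
    return counts


def predict_next(counts, recent, max_order, alphabet):
    """Return {poi_type: probability} for the next destination."""
    probs = {}
    escape_mass = 1.0  # probability passed down to shorter contexts
    seen = set()       # types already scored at a longer context
    for n in range(min(max_order, len(recent)), -1, -1):
        context = tuple(recent[len(recent) - n:])
        followers = {t: c for t, c in counts.get(context, {}).items()
                     if t not in seen}
        if not followers:
            continue  # nothing new in this context: escape with probability 1
        total = sum(followers.values())
        distinct = len(followers)  # escape count (assumed PPM method C)
        for poi_type, c in followers.items():
            probs[poi_type] = escape_mass * c / (total + distinct)
        escape_mass *= distinct / (total + distinct)
        seen.update(followers)
    unseen = [t for t in alphabet if t not in seen]
    for poi_type in unseen:  # final fallback: uniform over never-seen types
        probs[poi_type] = escape_mass / len(unseen)
    norm = sum(probs.values())
    return {t: p / norm for t, p in probs.items()} if norm else {}


# Toy daily trajectories of one user, written as sequences of POI types.
history = [
    ["home", "cafe", "office"],
    ["home", "cafe", "office"],
    ["home", "cafe", "gym"],
    ["gym", "cafe", "bar"],
]
poi_types = {"home", "cafe", "office", "gym", "bar"}
model = train_contexts(history, max_order=2)
distribution = predict_next(model, ["home", "cafe"], 2, poi_types)
predicted_type = max(distribution, key=distribution.get)
```

The type with the maximum probability (`predicted_type`) plays the role of the predicted POI type that constrains the later recommendation step.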

3.2. Community Detection

Modeling Social Interests. We use the topics of interest to users [30] to define their similarity. We define $D_u$ as the set of social-media data posted by user $u$. Latent Dirichlet Allocation (LDA) [31, 32] is then applied to learn the topics of $D_u$ after word splitting, stop-word filtering, and part-of-speech tagging. A vector $\theta_u$ is then established for the intended topics corresponding to user u. For users $u$ and $v$, the similarity of social interests is denoted $\mathrm{sim}_{S}(u, v)$ and is expressed by the following equation, where a larger value indicates that users u and v have more similar interests:
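As an illustration of comparing two users' topic vectors, the following minimal sketch uses cosine similarity. This measure is an assumption chosen for the example and may differ from the similarity actually used in this work; the function name and the sample topic vectors are hypothetical.

```python
import math


def cosine_similarity(theta_u, theta_v):
    """Cosine of the angle between two topic vectors of equal length."""
    if len(theta_u) != len(theta_v):
        raise ValueError("topic vectors must have the same length")
    dot = sum(a * b for a, b in zip(theta_u, theta_v))
    norm_u = math.sqrt(sum(a * a for a in theta_u))
    norm_v = math.sqrt(sum(b * b for b in theta_v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a user with no topic mass is similar to nobody
    return dot / (norm_u * norm_v)


# Hypothetical topic distributions over three "Travel" topics.
theta_alice = [0.7, 0.2, 0.1]
theta_bob = [0.6, 0.3, 0.1]
theta_carol = [0.05, 0.05, 0.9]
sim_alice_bob = cosine_similarity(theta_alice, theta_bob)
sim_alice_carol = cosine_similarity(theta_alice, theta_carol)
```

With topic distributions as input, the value lies in [0, 1], so a threshold on it can be used to decide whether two users share interests.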

Modeling Travel Preference. Our previous study suggested modeling of the users’ movement-related information using a heterogeneous information network (HIN) [33, 34]. We identify the similarity between two users with the same travel preference in the HIN by a SimRank [35] model in a random-walk process [36].

The LBSN is first modeled as a heterogeneous information network $H = (U \cup POIs, E)$, where the travel information in an LBSN refers to the check-in behaviors. Here, U is the set of users and POIs is the set of all POIs. $E = E_{UU} \cup E_{UP} \cup E_{PP}$ denotes the set of all undirected edges in the network, where $E_{UU}$ represents the relation between users, indicating that each connected user pair has similar travel preferences, $E_{UP}$ represents the check-in behavior of users, and $E_{PP}$ connects POIs of the same type. The similarity of travel between users depends on whether two users can meet each other while randomly walking in the network H. Based on the lengths of the paths through which u and v meet and the number of times they meet, the similarity $\mathrm{sim}_{T}(u, v)$ is calculated using

where $t_u$ and $t_v$ denote the two random walks that start from u and v, respectively. Suppose that they first meet at node x in H, and the lengths of the two tracks from their respective origins to x are defined as l(t). Given the two random-walk paths $t_u$ and $t_v$, the probabilities of a user walking along $t_u$ and $t_v$ are $P(t_u)$ and $P(t_v)$, respectively, where O(x) denotes the nearby locations of node x. The probability of u and v meeting via $t_u$ and $t_v$ is then $P(t_u) \cdot P(t_v)$. To calculate $\mathrm{sim}_{T}(u, v)$ from the random-walk perspective, all paths in H whose length is less than or equal to a given maximum length are detected, together with the probability that a user walks along each path. Given a node, the similarity between two users that have the same destination and length of paths is thus determined via (8).
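To make the random-walk notion of similarity concrete, the sketch below runs the classic iterative SimRank fixed point on a small undirected user-POI graph. This is a substitute for illustration only: it is not the bounded-length path enumeration described above, and the decay factor, the iteration count, and the toy edges are assumptions.

```python
def build_neighbors(edges):
    """Turn an undirected edge list into {node: set of adjacent nodes}."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return neighbors


def simrank(neighbors, decay=0.8, iterations=10):
    """Iterative SimRank: two nodes are similar if their neighbors are similar."""
    nodes = list(neighbors)
    sim = {a: {b: 1.0 if a == b else 0.0 for b in nodes} for a in nodes}
    for _ in range(iterations):
        updated = {a: {} for a in nodes}
        for a in nodes:
            for b in nodes:
                if a == b:
                    updated[a][b] = 1.0
                elif not neighbors[a] or not neighbors[b]:
                    updated[a][b] = 0.0
                else:
                    total = sum(sim[i][j]
                                for i in neighbors[a] for j in neighbors[b])
                    updated[a][b] = decay * total / (
                        len(neighbors[a]) * len(neighbors[b]))
        sim = updated
    return sim


# Toy check-in edges: u1 and u2 visit the same POIs, u3 visits another one.
edges = [("u1", "p1"), ("u1", "p2"), ("u2", "p1"), ("u2", "p2"), ("u3", "p3")]
travel_sim = simrank(build_neighbors(edges))
```

In this toy graph, two walkers starting at `u1` and `u2` can meet at `p1` or `p2`, whereas walkers starting at `u1` and `u3` can never meet, so the first pair receives the higher score.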

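A direct way to combine the two information sources for community discovery is to obtain a unified similarity by a weighted combination of the similarity matrices as follows:

$$\mathrm{sim}(u, v) = \alpha \, \mathrm{sim}_{S}(u, v) + \beta \, \mathrm{sim}_{T}(u, v) \tag{9}$$

where $\mathrm{sim}_{S}$ and $\mathrm{sim}_{T}$ are the social-interest and travel-preference similarities, $\alpha \geq 0$, $\beta \geq 0$, and $\alpha + \beta = 1$. We can thus obtain a set Community(u) of the N users that are most similar to u.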
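The weighted combination and the selection of the N most similar users can be sketched as follows. The sketch assumes that the two weights sum to 1, so that a single weight `alpha` on the social similarity suffices; the function names and the sample similarity values are hypothetical.

```python
def unified_similarity(sim_social, sim_travel, alpha):
    """Convex combination of the social and travel similarities."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * sim_social + (1.0 - alpha) * sim_travel


def community(u, candidates, sim_social, sim_travel, alpha, n):
    """Return the n candidates most similar to u under the unified score."""
    scored = [
        (unified_similarity(sim_social[u][v], sim_travel[u][v], alpha), v)
        for v in candidates if v != u
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # ties broken by name
    return [v for _, v in scored[:n]]


# Hypothetical pairwise similarities of user "u" to three other users.
social = {"u": {"a": 0.9, "b": 0.1, "c": 0.5}}
travel = {"u": {"a": 0.2, "b": 0.8, "c": 0.5}}
balanced = community("u", ["a", "b", "c"], social, travel, alpha=0.5, n=2)
travel_only = community("u", ["a", "b", "c"], social, travel, alpha=0.0, n=2)
```

Setting `alpha` to 0 or 1 reduces the score to a single similarity source, which is the degenerate case examined in the hyperparameter study.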

4. Travel-Community-Based POI Recommendation

The POI recommendation algorithm proposed in this study combines the predicted POIs using personal historical trip data with the candidate POIs of the same types as those of predicted POIs generated by the detected travel communities. Such a candidate POI set is then obtained.

To determine the potential POI locations of greatest interest to community members, we first measure how much a user prefers a location by the number of times that user checks in at the given location. In general, the more often a user checks in, the more interested he or she is in the location. However, the raw number of check-ins of a user at a given location cannot by itself be an indicator of interest in that location because it fails to account for the number of times the user may check in at other locations. As such, we seek to measure the relative degree of interest of a user among various POIs of the same type. Therefore, in this study we define the degree of interest of user v in location $l$ as the proportion of the number of check-ins of v at $l$, represented as $c_{v,l}$, to the total number of check-ins of v at all locations of the same type as $l$; it is expressed as follows:

where $\bar{c}_{v}$ denotes the average number of check-ins of v at locations of the same type as $l$, and $\sigma_{v}^{2}$ denotes the variance in the number of check-ins of v at locations of the same type as $l$.

The degree of interest of user u in location $l$ is then expressed as

With (7), we can provide a list of the recommended POIs from ranked by the degree of interest of user u. Algorithm 1 recommends POIs based on users’ historical trip data and community members who have similar social interests.

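Algorithm 1: Travel-community-based POI recommendation.
Input: U: all users in the system
  u, v: users
  type(p): the POI type of POI p
  POI(v): the POIs of user v
  C(u): set of candidate POIs
  I(u): set of degrees of interest of u
Output: R(u): set of POIs u is interested in
begin
  for each u in U do
    predict the destination POI type of u with the variable-order Markov model
    get Community(u)
    C(u) ← ∅
    for each v in Community(u) do
      for each p in POI(v) do
        if (type(p) = predicted type of u)
          C(u) ← C(u) ∪ {p};
        end if
      end for
    end for
    for each p in C(u) do
      for each v in Community(u) do
        if (p in POI(v))
          compute the degree of interest of v in p
          add it, weighted by the similarity of u and v, to the degree of interest of u in p
          store the degree of interest of u in p in I(u)
        end if
      end for
    end for
    R(u) ← top-N POIs of C(u) ranked by I(u)
    return R(u)
  end for
end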
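The candidate filtering and ranking steps of this section can be sketched in Python as follows. The sketch takes the degree of interest to be the share of a user's check-ins at a POI among that user's check-ins at POIs of the same type, as the prose describes, and it aggregates community members' interests with a similarity-weighted sum. Both the omission of the mean and variance terms and the aggregation rule are assumptions, and all names and sample data are hypothetical.

```python
from collections import defaultdict


def degree_of_interest(checkins, poi_type):
    """Share of a user's check-ins at each POI within that POI's type."""
    totals = defaultdict(int)
    for poi, count in checkins.items():
        totals[poi_type[poi]] += count
    return {poi: count / totals[poi_type[poi]]
            for poi, count in checkins.items() if totals[poi_type[poi]] > 0}


def recommend(predicted_type, community_sim, checkins_by_user, poi_type, top_n):
    """Rank community POIs of the predicted type by weighted interest."""
    scores = defaultdict(float)
    for v, sim in community_sim.items():
        interest = degree_of_interest(checkins_by_user.get(v, {}), poi_type)
        for poi, share in interest.items():
            if poi_type[poi] == predicted_type:  # keep candidates only
                scores[poi] += sim * share
    return sorted(scores, key=lambda p: (-scores[p], p))[:top_n]


# Hypothetical community of the target user with unified similarities.
poi_type = {"cafeA": "cafe", "cafeB": "cafe", "gymA": "gym"}
checkins_by_user = {
    "v1": {"cafeA": 3, "cafeB": 1, "gymA": 5},
    "v2": {"cafeB": 2},
}
community_sim = {"v1": 0.5, "v2": 0.5}
ranked = recommend("cafe", community_sim, checkins_by_user, poi_type, top_n=2)
```

Here `predicted_type` stands for the output of the destination prediction step, so only POIs of that type enter the candidate set before ranking.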

5. Results and Discussion

5.1. Dataset

We evaluated the performance of our algorithms on two check-in datasets centered on New York City, namely, Foursquare and Gowalla. Each dataset includes both check-in records and reviews. Both datasets were subjected to preprocessing, where false check-in data were removed, such that the data of a single check-in by a user during a day were collected. After preprocessing, the Foursquare dataset contained 3,357 users, 3,543 POI locations, and a total of 168,297 check-ins, whereas the Gowalla dataset contained 5,419 users, 6,742 POI locations, and a total of 330,724 check-ins.

5.2. Evaluation of Recommendation Performance

Two performance indices, accuracy and recall, were used to assess the recommendation performance of the proposed TC-based POI recommendation algorithm. These indices compare distinct relationships between R(u) and the set of POI locations actually visited by a user u according to the actual check-in data (i.e., T(u)). The accuracy and recall of the POI recommendations for u are defined as follows:

$$\mathrm{Accuracy}(u) = \frac{|R(u) \cap T(u)|}{|R(u)|}, \qquad \mathrm{Recall}(u) = \frac{|R(u) \cap T(u)|}{|T(u)|}$$

Here, Accuracy(u) reflects the accuracy of the recommendation and refers to the proportion of locations in the recommendation results that users actually visit in the future compared to the total number of POI locations recommended. Recall(u) reveals the comprehensiveness of the recommendation and refers to the number of locations in the recommendation results that users actually visit in the future compared to the total number of POIs that the users actually visit in the future. Accuracy and recall are mutually constrained, and a comprehensive utilization of the two can provide an objective evaluation of the prediction results.
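As a worked illustration of the two indices, the following minimal sketch computes them for one user from a recommended list and a set of actually visited POIs. The function name and the sample POI identifiers are hypothetical, and returning 0 for an empty set is an assumption made to avoid division by zero.

```python
def accuracy_and_recall(recommended, visited):
    """Return (accuracy, recall) of a recommendation list for one user."""
    rec = set(recommended)
    vis = set(visited)
    hits = len(rec & vis)  # recommended POIs that were actually visited
    accuracy = hits / len(rec) if rec else 0.0
    recall = hits / len(vis) if vis else 0.0
    return accuracy, recall


# Four POIs recommended, three actually visited, two in common.
accuracy, recall = accuracy_and_recall(["a", "b", "c", "d"], ["b", "d", "e"])
```

In this example two of the four recommended POIs were visited and two of the three visited POIs were recommended, which shows how lengthening the list tends to raise recall while lowering accuracy.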

The recommendation performance using the destination prediction proposed in this study was verified by comparison with two other typical POI recommendation algorithms, in terms of accuracy and recall. The accuracy and recall of the three algorithms, in which the top-N (N=5, 10, 15, and 20) POIs are suggested, are shown in Figure 1 for the two datasets.

It can be seen from Figure 1 that the TC-based POI recommendation algorithm performed better than the other two algorithms on both datasets for all values of N considered. In particular, its accuracy is 15% higher than that of the baseline CF-based algorithm. This result is reasonable because the use of the travel community alleviates the problem caused by data sparsity, leading to an improved POI recommendation in sparse and complex networks like LBSNs. The social interaction is also integrated into the recommendations, making the suggestions more personalized.

We also employ the LSTM (Long Short-Term Memory) algorithm to predict the destination. As illustrated in Table 1, the accuracy of the LSTM-based POI recommendation algorithm is far lower than that of the proposed method. The reason is that the LSTM-based algorithm is likely to hinder mobility recognition without knowledge of the latent semantic relationships between two near neighbors. Moreover, using the various travel trajectories as the input of the LSTM-based algorithm may yield suboptimal predictions because of the differences in the lengths of the travel trajectories.

We study the hyperparameter $\alpha$, which is the trade-off term for combining the social-interest and travel information. The result is shown in Figure 2, where N is 5. We use the weight to combine the two kinds of information for a fair comparison. The prediction performance is very low (usually less than 0.3) when $\alpha = 0$ or $\alpha = 1$, that is, when relying on a single proximity-related metric. As shown in Figure 2(a), when we increase the weight of the social information, the performance of our algorithm improves, but after the weight reaches 0.5, the performance starts to decline slightly. This is because the direct combination of the two kinds of similarity matrices can lead to a stable community detection solution. As shown in Figure 2(b), the best performance is obtained at the weight setting at which both objectives are combined most appropriately.

6. Conclusions

POI recommendations play a key role in attracting users in LBSNs. The algorithm proposed in this paper aims to optimally recommend POI types to serve multiple users. First, the intended POIs of an individual are analyzed according to their historical trip data, and a variable-order Markov model is employed to predict the types of potential POI locations for the user. Second, a degree of interest is defined to discover the community and the set of POIs according to the social links and travel preferences between users. Two types of POI information are then combined to rank the candidate POIs for a top-N recommendation. The results of experiments employing real-world datasets demonstrate that the proposed algorithm provides better accuracy and recall than two other typical POI recommendation algorithms. However, the performance of the algorithm would benefit from further studies to model the temporal information for mining user behavior. In addition, the weighted combination in (9) would lead to limited flexibility in processing real data. Owing to the problem of community detection with multiple similarity matrices, we plan to perform multisource diffusion modeling to guarantee the maximal consistency of different data manifolds and effective information fusion.

Data Availability

The authors declare that the data supporting the findings of this study are available within the paper or from the authors upon reasonable request.

Disclosure

This work was presented at the 2nd International Workshop on Social Computing (IWSC’18).

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was funded by the NSFC under Grant 61303041, Funds of Key Scientific and Technological Innovation Team of the Shanxi Province, China, under Grant 2017KCT-29, and Funds for International Scientific and Technological Cooperation Project of the Shanxi Province under Grant 2017KW-015. Lei Tang thanks LetPub (www.letpub.com) for its linguistic assistance during the preparation of this manuscript.